quicknotes/specs/readlist.md

# Readlist Specification
## Overview
The Readlist module allows users to save web articles for later reading. It provides a "read it later" service similar to Pocket or Instapaper, with automatic content extraction and a clean reading experience.
## Data Model
### ReadLaterItem
The `ReadLaterItem` entity has the following attributes:
| Field | Type | Description |
|-------|------|-------------|
| ID | string | Unique identifier for the item (UUID) |
| URL | string | Original URL of the article |
| Title | string | Title of the article |
| Content | string | Extracted HTML content of the article |
| Description | string | Brief description or excerpt of the article |
| CreatedAt | timestamp | When the item was saved |
| UpdatedAt | timestamp | When the item was last updated |
| ReadAt | timestamp | When the item was marked as read (null if unread) |
| ArchivedAt | timestamp | When the item was archived (null if not archived) |
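
The table above could map to a Go struct along the following lines. This is a minimal sketch, not the module's actual definition: the JSON tag names, the `NewReadLaterItem` constructor, and the `newUUID` helper are all hypothetical, and it assumes a random (version 4) UUID because the spec only says "UUID". Nullable timestamps are modeled as `*time.Time`, so `nil` stands for "unread" or "not archived".

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// ReadLaterItem mirrors the field table. JSON tag names are assumptions;
// the spec does not define a wire format.
type ReadLaterItem struct {
	ID          string     `json:"id"`
	URL         string     `json:"url"`
	Title       string     `json:"title"`
	Content     string     `json:"content"`
	Description string     `json:"description"`
	CreatedAt   time.Time  `json:"created_at"`
	UpdatedAt   time.Time  `json:"updated_at"`
	ReadAt      *time.Time `json:"read_at"`     // nil while unread
	ArchivedAt  *time.Time `json:"archived_at"` // nil while not archived
}

// newUUID returns a random version 4 UUID built from crypto/rand.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// NewReadLaterItem creates an unread, unarchived item for rawURL.
// Title, Content, and Description stay empty until extraction runs.
func NewReadLaterItem(rawURL string) (ReadLaterItem, error) {
	id, err := newUUID()
	if err != nil {
		return ReadLaterItem{}, fmt.Errorf("generate id: %w", err)
	}
	now := time.Now().UTC()
	return ReadLaterItem{ID: id, URL: rawURL, CreatedAt: now, UpdatedAt: now}, nil
}
```

Using pointers for `ReadAt` and `ArchivedAt` keeps "never set" distinct from the zero time, which matches the "null if unread" wording in the table.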
## Features
### Item Management
1. **Save Article**: Users can save articles by providing a URL
2. **View Article**: Users can view saved articles in a clean, reader-friendly format
3. **Mark as Read**: Users can mark articles as read
4. **Archive Article**: Users can archive articles to remove them from the main list
5. **Delete Article**: Users can delete articles permanently
6. **List Articles**: Users can view a list of all saved articles
### Content Extraction
1. **Automatic Extraction**: The system automatically extracts the main content from web pages
2. **Title Extraction**: The system extracts the title of the article
3. **Description Extraction**: The system extracts a brief description or excerpt of the article
4. **HTML Cleaning**: The system cleans the HTML to provide a distraction-free reading experience
### Filtering and Sorting
1. **Filter by Status**: Users can filter articles by read/unread status
2. **Filter by Archive**: Users can filter articles by archived/unarchived status
3. **Sort by Date**: Users can sort articles by date saved or date read
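
The filtering and sorting rules above can be sketched in Go as follows. The `Item`, `Filter`, `SortBySaved`, `SortByRead`, and `at` names are hypothetical, and two behaviors are assumptions the spec does not state: sorting is newest first, and unread items are placed last when sorting by date read.

```go
package main

import (
	"sort"
	"time"
)

// Item carries only the fields that filtering and sorting need.
type Item struct {
	Title      string
	CreatedAt  time.Time
	ReadAt     *time.Time // nil while unread
	ArchivedAt *time.Time // nil while not archived
}

// Filter selects items by status. A nil field means "do not filter on this".
type Filter struct {
	Read     *bool
	Archived *bool
}

// Apply returns the items that match every non-nil condition in f.
func (f Filter) Apply(items []Item) []Item {
	out := make([]Item, 0, len(items))
	for _, it := range items {
		if f.Read != nil && (it.ReadAt != nil) != *f.Read {
			continue
		}
		if f.Archived != nil && (it.ArchivedAt != nil) != *f.Archived {
			continue
		}
		out = append(out, it)
	}
	return out
}

// SortBySaved orders items in place by CreatedAt, newest first.
func SortBySaved(items []Item) {
	sort.SliceStable(items, func(i, j int) bool {
		return items[i].CreatedAt.After(items[j].CreatedAt)
	})
}

// SortByRead orders items in place by ReadAt, most recently read first.
// Unread items (nil ReadAt) keep their relative order at the end.
func SortByRead(items []Item) {
	sort.SliceStable(items, func(i, j int) bool {
		a, b := items[i].ReadAt, items[j].ReadAt
		if a == nil || b == nil {
			return a != nil && b == nil
		}
		return a.After(*b)
	})
}

// at is a fixture helper that converts Unix seconds to a time pointer.
func at(sec int64) *time.Time {
	t := time.Unix(sec, 0)
	return &t
}
```

For example, with `no := false`, `Filter{Read: &no}.Apply(items)` returns only unread items, and setting both `Read` and `Archived` combines the two conditions.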
## API Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/readlist | List all read later items |
| POST | /api/readlist | Save a new article |
| GET | /api/readlist/:id | Get a specific article by ID |
| PUT | /api/readlist/:id | Update a specific article |
| DELETE | /api/readlist/:id | Delete a specific article |
| PUT | /api/readlist/:id/read | Mark an article as read |
| PUT | /api/readlist/:id/unread | Mark an article as unread |
| PUT | /api/readlist/:id/archive | Archive an article |
| PUT | /api/readlist/:id/unarchive | Unarchive an article |
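
To illustrate how the endpoint table maps requests to operations, here is a router-agnostic sketch that resolves a method and path to an action name. The `Route` type, the `Resolve` function, and the action strings are hypothetical; a real implementation would more likely register these paths with whatever HTTP router the project uses.

```go
package main

import "strings"

// Route identifies the operation a request maps to.
type Route struct {
	Action string // e.g. "list", "create", "get", "read"
	ID     string // empty for collection routes
}

const base = "/api/readlist"

// Resolve maps an HTTP method and path to a Route from the endpoint table.
// It reports false when no endpoint matches.
func Resolve(method, path string) (Route, bool) {
	if !strings.HasPrefix(path, base) {
		return Route{}, false
	}
	rest := strings.TrimPrefix(path, base)
	if rest != "" && !strings.HasPrefix(rest, "/") {
		return Route{}, false // e.g. /api/readlistx
	}
	rest = strings.Trim(rest, "/")
	var parts []string
	if rest != "" {
		parts = strings.Split(rest, "/")
	}
	switch len(parts) {
	case 0: // collection: /api/readlist
		switch method {
		case "GET":
			return Route{Action: "list"}, true
		case "POST":
			return Route{Action: "create"}, true
		}
	case 1: // single item: /api/readlist/:id
		switch method {
		case "GET":
			return Route{Action: "get", ID: parts[0]}, true
		case "PUT":
			return Route{Action: "update", ID: parts[0]}, true
		case "DELETE":
			return Route{Action: "delete", ID: parts[0]}, true
		}
	case 2: // status change: /api/readlist/:id/<action>
		if method == "PUT" {
			switch parts[1] {
			case "read", "unread", "archive", "unarchive":
				return Route{Action: parts[1], ID: parts[0]}, true
			}
		}
	}
	return Route{}, false
}
```

Under this sketch, `Resolve("PUT", "/api/readlist/42/archive")` yields the action `archive` with ID `42`, while an unsupported method such as `PATCH` is rejected.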
## Frontend Routes
| Route | Description |
|-------|-------------|
| /readlist | List of saved articles |
| /readlist/:id | View a specific article |
## Implementation Details
### Content Extraction
The system uses the `go-readability` library to extract content from web pages:
```go
// ParseURL fetches r.URL and fills Title, Content, and Description
// from the extracted article. The fetch times out after 30 seconds.
func (r *ReadLaterItem) ParseURL() error {
	article, err := readability.FromURL(r.URL, 30*time.Second)
	if err != nil {
		return fmt.Errorf("failed to parse URL: %w", err)
	}
	r.Title = article.Title
	r.Content = article.Content
	r.Description = article.Excerpt
	return nil
}
```
### HTML Sanitization
The extracted HTML content is sanitized to remove potentially harmful elements and provide a consistent reading experience:
1. Remove JavaScript and other active content
2. Preserve images, links, and basic formatting
3. Apply a consistent style to the content
### Status Management
The system tracks the status of articles using nullable timestamp fields:
1. `ReadAt`: When set, indicates the article has been read
2. `ArchivedAt`: When set, indicates the article has been archived
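
The nullable-timestamp approach above can be sketched as a handful of methods. The method names are hypothetical, and two behaviors are assumptions rather than spec requirements: repeated calls are idempotent (marking an already-read item keeps its original `ReadAt`), and each status change also refreshes `UpdatedAt`.

```go
package main

import "time"

// ReadLaterItem shows only the fields involved in status tracking;
// the remaining fields from the data model are omitted here.
type ReadLaterItem struct {
	UpdatedAt  time.Time
	ReadAt     *time.Time // nil while unread
	ArchivedAt *time.Time // nil while not archived
}

// IsRead reports whether the item has been marked as read.
func (r *ReadLaterItem) IsRead() bool { return r.ReadAt != nil }

// IsArchived reports whether the item has been archived.
func (r *ReadLaterItem) IsArchived() bool { return r.ArchivedAt != nil }

// MarkRead sets ReadAt to the current time unless it is already set.
func (r *ReadLaterItem) MarkRead() {
	if r.ReadAt == nil {
		now := time.Now().UTC()
		r.ReadAt = &now
		r.UpdatedAt = now
	}
}

// MarkUnread clears ReadAt.
func (r *ReadLaterItem) MarkUnread() {
	if r.ReadAt != nil {
		r.ReadAt = nil
		r.UpdatedAt = time.Now().UTC()
	}
}

// Archive sets ArchivedAt to the current time unless it is already set.
func (r *ReadLaterItem) Archive() {
	if r.ArchivedAt == nil {
		now := time.Now().UTC()
		r.ArchivedAt = &now
		r.UpdatedAt = now
	}
}

// Unarchive clears ArchivedAt.
func (r *ReadLaterItem) Unarchive() {
	if r.ArchivedAt != nil {
		r.ArchivedAt = nil
		r.UpdatedAt = time.Now().UTC()
	}
}
```

Because the two fields are independent, an item can be read and archived at the same time, and each state can be reversed without affecting the other.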
## User Interface
### Article List
- Displays a list of saved articles with titles, descriptions, and dates
- Provides filters for read/unread and archived/unarchived status
- Includes buttons for marking as read, archiving, and deleting
### Article Viewer
- Displays the article content in a clean, reader-friendly format
- Preserves images and links from the original article
- Provides buttons for marking as read, archiving, and returning to the list
### Save Form
- Input field for the URL to save
- Automatic extraction of content after submission
- Preview of the extracted content before saving
## Shiori Import
The Readlist module supports importing bookmarks from a Shiori instance. Users can migrate or synchronize their bookmarks by connecting to a Shiori service with their credentials.
### API Endpoint
- `POST /api/readlist/import/shiori`: Accepts a JSON payload containing `url`, `username`, and `password`. The backend authenticates with the Shiori instance, fetches bookmarks, and creates corresponding read later items.
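
As an illustration of the payload described above, the following sketch decodes and validates the request body before any call to Shiori is made. The `ShioriImportRequest` type and `ParseShioriImport` function are hypothetical names, and the validation rules (absolute http or https URL, non-empty credentials) are assumptions; the spec only lists the three fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
)

// ShioriImportRequest is the JSON payload accepted by the import endpoint.
type ShioriImportRequest struct {
	URL      string `json:"url"`
	Username string `json:"username"`
	Password string `json:"password"`
}

// ParseShioriImport decodes body and checks that all three fields are usable.
func ParseShioriImport(body []byte) (ShioriImportRequest, error) {
	var req ShioriImportRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return ShioriImportRequest{}, fmt.Errorf("invalid JSON payload: %w", err)
	}
	// Assumed rule: the Shiori address must be an absolute http(s) URL.
	u, err := url.Parse(req.URL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return ShioriImportRequest{}, errors.New("url must be an absolute http or https URL")
	}
	// Assumed rule: both credentials are required.
	if req.Username == "" || req.Password == "" {
		return ShioriImportRequest{}, errors.New("username and password are required")
	}
	return req, nil
}
```

Rejecting malformed input up front lets the handler return a specific error message without contacting the Shiori instance.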
### Frontend Integration
- A form in the readlist UI accepts Shiori credentials. The readlist store includes an `importFromShiori` method that sends a request to the endpoint and processes the response, updating the list of saved articles accordingly.
### Error Handling
- Both the backend and frontend provide clear error messages if authentication or bookmark retrieval fails.