# Readlist Specification

## Overview
The Readlist module allows users to save web articles for later reading. It provides a "read it later" service similar to Pocket or Instapaper, with automatic content extraction and a clean reading experience.
## Data Model

### ReadLaterItem

The `ReadLaterItem` entity has the following attributes:

| Field | Type | Description |
|-------|------|-------------|
| ID | string | Unique identifier for the item (UUID) |
| URL | string | Original URL of the article |
| Title | string | Title of the article |
| Content | string | Extracted HTML content of the article |
| Description | string | Brief description or excerpt of the article |
| CreatedAt | timestamp | When the item was saved |
| UpdatedAt | timestamp | When the item was last updated |
| ReadAt | timestamp | When the item was marked as read (null if unread) |
| ArchivedAt | timestamp | When the item was archived (null if not archived) |
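
The table maps onto a Go struct fairly directly. The sketch below is one possible shape, not the module's actual definition. The JSON tag names, the use of `*time.Time` so that `nil` stands for the table's null, and the hypothetical `NewReadLaterItem` constructor (which generates a random version 4 UUID with `crypto/rand`) are all assumptions.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// ReadLaterItem mirrors the data model table. JSON tag names are assumed;
// ReadAt and ArchivedAt are pointers so nil can mean "unread" / "not archived".
type ReadLaterItem struct {
	ID          string     `json:"id"`
	URL         string     `json:"url"`
	Title       string     `json:"title"`
	Content     string     `json:"content"`
	Description string     `json:"description"`
	CreatedAt   time.Time  `json:"createdAt"`
	UpdatedAt   time.Time  `json:"updatedAt"`
	ReadAt      *time.Time `json:"readAt"`
	ArchivedAt  *time.Time `json:"archivedAt"`
}

// newUUIDv4 returns a random RFC 4122 version 4 UUID string.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// NewReadLaterItem (hypothetical) creates an unread, unarchived item for
// rawURL. Title, Content, and Description are filled in later by extraction.
func NewReadLaterItem(rawURL string) (*ReadLaterItem, error) {
	id, err := newUUIDv4()
	if err != nil {
		return nil, fmt.Errorf("generate id: %w", err)
	}
	now := time.Now().UTC()
	return &ReadLaterItem{
		ID:        id,
		URL:       rawURL,
		CreatedAt: now,
		UpdatedAt: now,
	}, nil
}
```

With pointer fields, `encoding/json` encodes a `nil` `ReadAt` or `ArchivedAt` as `null`, which matches the table's wording.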
## Features

### Item Management

1. **Save Article**: Users can save articles by providing a URL
2. **View Article**: Users can view saved articles in a clean, reader-friendly format
3. **Mark as Read**: Users can mark articles as read
4. **Archive Article**: Users can archive articles to remove them from the main list
5. **Delete Article**: Users can delete articles permanently
6. **List Articles**: Users can view a list of all saved articles
### Content Extraction

1. **Automatic Extraction**: The system automatically extracts the main content from web pages
2. **Title Extraction**: The system extracts the title of the article
3. **Description Extraction**: The system extracts a brief description or excerpt of the article
4. **HTML Cleaning**: The system cleans the HTML to provide a distraction-free reading experience
### Filtering and Sorting

1. **Filter by Status**: Users can filter articles by read/unread status
2. **Filter by Archive**: Users can filter articles by archived/unarchived status
3. **Sort by Date**: Users can sort articles by date saved or date read
## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/readlist | List all read-later items |
| POST | /api/readlist | Save a new article |
| GET | /api/readlist/:id | Get a specific article by ID |
| PUT | /api/readlist/:id | Update a specific article |
| DELETE | /api/readlist/:id | Delete a specific article |
| PUT | /api/readlist/:id/read | Mark an article as read |
| PUT | /api/readlist/:id/unread | Mark an article as unread |
| PUT | /api/readlist/:id/archive | Archive an article |
| PUT | /api/readlist/:id/unarchive | Unarchive an article |
## Frontend Routes

| Route | Description |
|-------|-------------|
| /readlist | List of saved articles |
| /readlist/:id | View a specific article |
## Implementation Details

### Content Extraction

The system uses the `go-readability` library to extract content from web pages:

```go
import (
	"fmt"
	"time"

	// Module path assumed; the spec names only `go-readability`.
	"github.com/go-shiori/go-readability"
)

// ParseURL fetches r.URL (30-second timeout) and fills Title, Content,
// and Description from the extracted article.
func (r *ReadLaterItem) ParseURL() error {
	article, err := readability.FromURL(r.URL, 30*time.Second)
	if err != nil {
		return fmt.Errorf("failed to parse URL: %w", err)
	}

	r.Title = article.Title
	r.Content = article.Content
	r.Description = article.Excerpt
	return nil
}
```
### HTML Sanitization

The extracted HTML content is sanitized to remove potentially harmful elements and provide a consistent reading experience:

1. Remove JavaScript and other active content
2. Preserve images, links, and basic formatting
3. Apply a consistent style to the content
### Status Management

The system tracks the status of articles using nullable timestamp fields:

1. `ReadAt`: When set, indicates the article has been read
2. `ArchivedAt`: When set, indicates the article has been archived
## User Interface

### Article List

- Displays a list of saved articles with titles, descriptions, and dates
- Provides filters for read/unread and archived/unarchived status
- Includes buttons for marking as read, archiving, and deleting

### Article Viewer

- Displays the article content in a clean, reader-friendly format
- Preserves images and links from the original article
- Provides buttons for marking as read, archiving, and returning to the list

### Save Form

- Input field for the URL to save
- Automatic extraction of content after submission
- Preview of the extracted content before saving