Magpie HTML - v0.1.3

    Interface Article

    Gathered article data.

    This interface represents the complete gathered data from an article page, including the authoritative URL, raw HTML, and extracted content. It will be extended incrementally with more properties.

    interface Article {
        url: URL;
        html: string;
        text: string;
        content?: string;
        title?: string;
        description?: string;
        image?: URL;
        language?: string;
        region?: string;
        internalLinks: URL[];
        externalLinks: URL[];
        wordCount: number;
        readingTime: number;
    }
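    As an illustration of how the required and optional fields fit together, here is a minimal sketch of a consumer that builds a one-line summary from an Article. The interface is redeclared locally so the snippet compiles on its own. The summarize function and the example values are hypothetical and are not part of Magpie HTML.

```typescript
// Local copy of the Article interface, so this sketch is self-contained.
interface Article {
  url: URL;
  html: string;
  text: string;
  content?: string;
  title?: string;
  description?: string;
  image?: URL;
  language?: string;
  region?: string;
  internalLinks: URL[];
  externalLinks: URL[];
  wordCount: number;
  readingTime: number;
}

// Hypothetical helper: optional fields are checked before use.
function summarize(article: Article): string {
  // Fall back to the hostname when no title was extracted.
  const title = article.title ?? article.url.hostname;
  // Combine language and region into a locale string when both exist.
  const locale = article.language
    ? article.region
      ? `${article.language}-${article.region}`
      : article.language
    : "unknown";
  return `${title} (${locale}, ${article.wordCount} words, ~${article.readingTime} min)`;
}

// Hypothetical example values.
const example: Article = {
  url: new URL("https://example.com/post"),
  html: "<html></html>",
  text: "",
  language: "en",
  region: "US",
  internalLinks: [],
  externalLinks: [],
  wordCount: 400,
  readingTime: 2,
};

console.log(summarize(example)); // "example.com (en-US, 400 words, ~2 min)"
```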

    Properties

    url: URL

    Authoritative URL for the article.

    Uses the canonical URL if present; otherwise, the final URL after redirects.
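    The selection rule can be sketched as follows. This assumes the canonical URL comes from a `<link rel="canonical">` href that may be relative and is resolved against the final URL. The function name and the handling of blank or malformed hrefs are illustrative assumptions, not Magpie HTML's actual code.

```typescript
// Hypothetical sketch: prefer the canonical URL, else use the final URL after redirects.
function resolveAuthoritativeUrl(
  canonicalHref: string | undefined,
  finalUrl: URL,
): URL {
  if (canonicalHref && canonicalHref.trim() !== "") {
    try {
      // Relative canonical hrefs are resolved against the final URL (assumption).
      return new URL(canonicalHref.trim(), finalUrl);
    } catch {
      // Malformed href: fall through to the final URL (assumption).
    }
  }
  return finalUrl;
}

const finalUrl = new URL("https://example.com/news/story?utm_source=feed");
console.log(resolveAuthoritativeUrl("/news/story", finalUrl).href); // "https://example.com/news/story"
console.log(resolveAuthoritativeUrl(undefined, finalUrl).href); // "https://example.com/news/story?utm_source=feed"
```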

    html: string

    Raw HTML content of the article page (UTF-8).

    The complete HTML source, fetched and decoded to UTF-8. Useful for custom processing or caching.
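    Decoding raw response bytes into a string can be sketched with the standard `TextDecoder`. The charset label is assumed to come from the Content-Type header or a `<meta charset>` tag, and falling back to UTF-8 for unrecognized labels is an assumption, not documented library behavior. The function name is hypothetical.

```typescript
// Hypothetical sketch: decode raw bytes using a declared charset, defaulting to UTF-8.
function decodeHtml(bytes: Uint8Array, charset?: string): string {
  try {
    // TextDecoder throws a RangeError for labels it does not recognize.
    return new TextDecoder(charset ?? "utf-8").decode(bytes);
  } catch {
    // Unknown label: fall back to UTF-8 (assumption).
    return new TextDecoder("utf-8").decode(bytes);
  }
}

const bytes = new TextEncoder().encode("<p>Grüße</p>");
console.log(decodeHtml(bytes, "utf-8")); // "<p>Grüße</p>"
console.log(decodeHtml(bytes, "not-a-charset")); // "<p>Grüße</p>"
```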

    text: string

    Plain text content extracted from the HTML.

    Converted automatically from the HTML by the htmlToText function, which removes all tags, decodes entities, and preserves document structure with appropriate line breaks.
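    The kind of transformation described above can be approximated with a few regular expressions. This is a deliberately naive, hypothetical `naiveHtmlToText`, not Magpie HTML's `htmlToText`: it recognizes only a handful of block tags and named entities, and a production implementation should use a real HTML parser.

```typescript
// Hypothetical, regex-based approximation of HTML-to-text conversion.
// A Map avoids accidental matches on Object.prototype keys such as "constructor".
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
  ["nbsp", " "],
]);

// Converts a numeric character reference; invalid code points become U+FFFD.
function codePointToString(cp: number): string {
  return Number.isInteger(cp) && cp >= 0 && cp <= 0x10ffff
    ? String.fromCodePoint(cp)
    : "\uFFFD";
}

function naiveHtmlToText(html: string): string {
  return html
    // Drop script and style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // <br> becomes a newline.
    .replace(/<br\s*\/?>/gi, "\n")
    // Closing tags of a few block elements end a line.
    .replace(/<\/(p|div|h[1-6]|li|tr)>/gi, "\n")
    // Remove all remaining tags.
    .replace(/<[^>]+>/g, "")
    // Decode decimal and hexadecimal character references.
    .replace(/&#(\d+);/g, (_, dec: string) => codePointToString(parseInt(dec, 10)))
    .replace(/&#x([0-9a-f]+);/gi, (_, hex: string) => codePointToString(parseInt(hex, 16)))
    // Decode known named entities; leave unknown ones untouched.
    .replace(/&([a-z]+);/gi, (match, name: string) => NAMED_ENTITIES.get(name.toLowerCase()) ?? match)
    // Collapse horizontal whitespace and excess blank lines.
    .replace(/[ \t]+/g, " ")
    .replace(/ *\n */g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

console.log(naiveHtmlToText("<h1>Title</h1><p>Fish &amp; chips<br>today</p>"));
```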

    content?: string

    Cleaned article content (plain text).

    Extracted as cleaned HTML by Mozilla Readability, then converted to plain text with htmlToText for proper formatting. This is the main article body, without navigation, ads, or other clutter. Undefined if Readability extraction fails.
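    The fallback behavior can be sketched as a small pipeline. To keep the snippet self-contained, the Readability step and the HTML-to-text step are passed in as functions; `ReadabilityLike` mirrors only the `content` field of a Readability parse result. The names, and treating an exception or an empty result as a failure, are assumptions.

```typescript
// Minimal shape of a Readability parse result (only the field used here).
interface ReadabilityLike {
  content: string | null | undefined; // cleaned article HTML
}

// Hypothetical sketch: returns plain-text article content, or undefined on failure.
function extractContent(
  parse: () => ReadabilityLike | null,
  toText: (html: string) => string,
): string | undefined {
  let result: ReadabilityLike | null;
  try {
    result = parse();
  } catch {
    return undefined; // An exception counts as extraction failure (assumption).
  }
  // A null result or missing content means extraction failed.
  if (!result || !result.content) return undefined;
  const text = toText(result.content).trim();
  // Whitespace-only output is also treated as a failure (assumption).
  return text === "" ? undefined : text;
}
```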

    title?: string

    Article title.

    Extracted by Mozilla Readability if available. Falls back to metadata (Schema.org, OpenGraph, Twitter Card, HTML title) if Readability extraction fails or returns an empty title.

    description?: string

    Article description/excerpt.

    Taken from Mozilla Readability's excerpt if available. Falls back to metadata (OpenGraph, Twitter Card, HTML meta description) if the Readability excerpt is empty or extraction fails.
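    The title and description fallbacks follow the same pattern: take the first non-empty candidate in priority order. A minimal sketch, assuming the metadata has already been parsed into a flat object; the `PageMetadata` field names and the helper functions are hypothetical.

```typescript
// Hypothetical, already-parsed page metadata; field names are illustrative.
interface PageMetadata {
  schemaOrgHeadline?: string;
  ogTitle?: string;
  twitterTitle?: string;
  htmlTitle?: string;
  ogDescription?: string;
  twitterDescription?: string;
  metaDescription?: string;
}

// Returns the first candidate that is non-empty after trimming.
function firstNonEmpty(...candidates: (string | undefined)[]): string | undefined {
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return undefined;
}

// Readability title first, then Schema.org, OpenGraph, Twitter Card, HTML <title>.
function pickTitle(readabilityTitle: string | undefined, meta: PageMetadata): string | undefined {
  return firstNonEmpty(readabilityTitle, meta.schemaOrgHeadline, meta.ogTitle, meta.twitterTitle, meta.htmlTitle);
}

// Readability excerpt first, then OpenGraph, Twitter Card, HTML meta description.
function pickDescription(readabilityExcerpt: string | undefined, meta: PageMetadata): string | undefined {
  return firstNonEmpty(readabilityExcerpt, meta.ogDescription, meta.twitterDescription, meta.metaDescription);
}
```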

    image?: URL

    Article key visual (image) URL, taken from the best available source.

    Priority: Schema.org NewsArticle/Article (largest) > OpenGraph > Twitter Card > largest Apple Touch Icon > favicon. Returns the URL object of the best visual representation of the article.
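    The priority order can be expressed as a list of sources checked in turn. A sketch, assuming each candidate has already been collected with optional pixel dimensions; `ImageCandidate`, the source names, and picking the largest image by pixel area within every source (the documentation only says "largest" for Schema.org and Apple Touch Icons) are illustrative assumptions.

```typescript
type ImageSource = "schemaOrg" | "openGraph" | "twitter" | "appleTouchIcon" | "favicon";

// Hypothetical candidate collected from the page.
interface ImageCandidate {
  source: ImageSource;
  url: URL;
  width?: number;
  height?: number;
}

// Documented order, highest priority first.
const IMAGE_PRIORITY: ImageSource[] = ["schemaOrg", "openGraph", "twitter", "appleTouchIcon", "favicon"];

// Pixel area; candidates without dimensions count as 0 (assumption).
const area = (c: ImageCandidate): number => (c.width ?? 0) * (c.height ?? 0);

function pickImage(candidates: ImageCandidate[]): URL | undefined {
  for (const source of IMAGE_PRIORITY) {
    const group = candidates.filter((c) => c.source === source);
    if (group.length === 0) continue;
    // Within one source, prefer the largest image (simplification, see above).
    return group.reduce((best, c) => (area(c) > area(best) ? c : best)).url;
  }
  return undefined;
}
```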

    language?: string

    Primary language code (ISO 639-1).

    Extracted from the HTML lang attribute, the Content-Language meta tag, or the OpenGraph locale. Returns a lowercase two-letter ISO 639-1 code (e.g., 'en', 'de', 'fr').
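    A sketch of the normalization step, assuming the three sources are checked in the order listed, that Content-Language may carry a comma-separated list, and that OpenGraph locales use an underscore (for example `en_US`). The function name and the two-letter validation are illustrative.

```typescript
// Hypothetical sketch: returns a lowercase ISO 639-1 code from the first usable source.
function detectLanguage(
  htmlLang?: string,
  contentLanguage?: string,
  ogLocale?: string,
): string | undefined {
  // Sources are checked in the documented order (assumption).
  for (const raw of [htmlLang, contentLanguage, ogLocale]) {
    if (!raw) continue;
    // Content-Language may list several languages; use the first (assumption).
    const first = (raw.split(",")[0] ?? "").trim();
    // The primary subtag precedes '-' (BCP 47) or '_' (OpenGraph locale).
    const primary = (first.split(/[-_]/)[0] ?? "").toLowerCase();
    // Accept only two-letter codes; three-letter codes such as 'eng' are skipped.
    if (/^[a-z]{2}$/.test(primary)) return primary;
  }
  return undefined;
}
```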

    region?: string

    Region/country code (ISO 3166-1 alpha-2).

    Extracted from language tags such as 'en-US' or 'de-DE'. Returns an uppercase two-letter ISO 3166-1 alpha-2 code (e.g., 'US', 'GB', 'DE').
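    Finding the region in a tag can be sketched using the BCP 47 subtag layout: after the primary language come optional extlang (3 letters) and script (4 letters) subtags, then the region (2 letters or 3 digits), with variants and extensions after that. The function name, accepting `_` as a separator, and ignoring numeric regions such as `419` are assumptions.

```typescript
// Hypothetical sketch: extracts an uppercase ISO 3166-1 alpha-2 region from a language tag.
function extractRegion(tag: string | undefined): string | undefined {
  if (!tag) return undefined;
  // Skip the primary language subtag; accept '-' or '_' separators (assumption).
  const subtags = tag.trim().split(/[-_]/).slice(1);
  for (const subtag of subtags) {
    // A single-character subtag starts an extension or private-use section.
    if (subtag.length === 1) break;
    // Extlang, script, and variant subtags are never two letters long,
    // so a two-letter subtag here is the region.
    if (/^[a-z]{2}$/i.test(subtag)) return subtag.toUpperCase();
  }
  // No alpha-2 region found (numeric UN M.49 regions are ignored).
  return undefined;
}
```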

    internalLinks: URL[]

    Internal links found in the article (same domain/subdomain).

    Links pointing to pages within the same domain. Automatically excludes the current article URL. All URLs are absolute and normalized.

    externalLinks: URL[]

    External links found in the article (different domains).

    Links pointing to external domains (useful for citations, references). All URLs are absolute and normalized.
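    Splitting links into internal and external sets can be sketched by resolving every href against the article URL and comparing hostnames. This assumes exact hostname equality defines "internal"; the library's treatment of subdomains may differ. Dropping fragments, skipping non-HTTP(S) links, and de-duplicating by href are also assumptions, and `classifyLinks` is a hypothetical name.

```typescript
interface LinkSets {
  internalLinks: URL[];
  externalLinks: URL[];
}

// Hypothetical sketch; "internal" means an exact hostname match (assumption).
function classifyLinks(hrefs: string[], articleUrl: URL): LinkSets {
  // The article itself, without fragment, is excluded from internal links.
  const self = new URL(articleUrl.href);
  self.hash = "";
  // Maps keyed by href de-duplicate links (assumption).
  const internal = new Map<string, URL>();
  const external = new Map<string, URL>();
  for (const href of hrefs) {
    let url: URL;
    try {
      url = new URL(href, articleUrl); // resolve relative links to absolute URLs
    } catch {
      continue; // skip malformed hrefs
    }
    // Skip mailto:, javascript:, and other non-web schemes.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    url.hash = ""; // normalization: drop fragments (assumption)
    if (url.hostname === articleUrl.hostname) {
      if (url.href !== self.href) internal.set(url.href, url);
    } else {
      external.set(url.href, url);
    }
  }
  return {
    internalLinks: Array.from(internal.values()),
    externalLinks: Array.from(external.values()),
  };
}
```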

    wordCount: number

    Word count of the article.

    Calculated from content (the Readability-cleaned text) if available; otherwise, from text (the full page text). Words are counted as whitespace-separated tokens.
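    A minimal sketch of the counting rule, assuming "available" means content is defined and not blank. Both function names are hypothetical.

```typescript
// Hypothetical sketch: counts whitespace-separated tokens.
function countWords(text: string): number {
  const trimmed = text.trim();
  // An empty string would otherwise split into one empty token.
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Prefer the Readability-cleaned content; fall back to the full page text.
function articleWordCount(content: string | undefined, text: string): number {
  const hasContent = content !== undefined && content.trim() !== ""; // assumption
  return countWords(hasContent ? content : text);
}
```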

    readingTime: number

    Estimated reading time in minutes.

    Calculated from wordCount at an average reading speed of 200 words per minute, with a minimum of 1 minute.
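    The calculation is a division followed by a lower bound. The documentation does not state how fractional minutes are rounded, so this sketch assumes rounding up: 1,250 words give 1250 / 200 = 6.25 minutes, reported here as 7. The constant and function names are hypothetical.

```typescript
// Average reading speed stated in the documentation.
const WORDS_PER_MINUTE = 200;

// Hypothetical sketch: rounds up to whole minutes (rounding mode is an assumption),
// and never returns less than 1.
function readingTimeMinutes(wordCount: number): number {
  return Math.max(1, Math.ceil(wordCount / WORDS_PER_MINUTE));
}
```

With Math.round instead of Math.ceil, the same 1,250 words would yield 6 minutes; the lower bound of 1 applies either way.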