Magpie HTML - v0.1.3

    Interface Website

    Gathered website data.

    This interface represents the complete gathered data from a website, including the authoritative URL and all extracted metadata. It will be extended incrementally with more properties.

    interface Website {
        url: URL;
        feeds: URL[];
        title?: string;
        description?: string;
        image?: URL;
        icon?: URL;
        language?: string;
        region?: string;
        html: string;
        text: string;
        internalLinks: URL[];
        externalLinks: URL[];
    }
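
    As a minimal sketch of consuming this shape, the snippet below redeclares the interface locally so it is self-contained and formats a one-line summary. The `describeWebsite` helper is hypothetical and not part of the library; how a `Website` value is obtained is not covered on this page, so none is fetched here.

```typescript
// Local copy of the interface documented above, so the sketch compiles on its own.
interface Website {
  url: URL;
  feeds: URL[];
  title?: string;
  description?: string;
  image?: URL;
  icon?: URL;
  language?: string;
  region?: string;
  html: string;
  text: string;
  internalLinks: URL[];
  externalLinks: URL[];
}

// Hypothetical helper: optional fields must be checked before use.
function describeWebsite(site: Website): string {
  const title = site.title ?? site.url.hostname;
  let locale = "unknown";
  if (site.language) {
    locale = site.region ? `${site.language}-${site.region}` : site.language;
  }
  const linkCount = site.internalLinks.length + site.externalLinks.length;
  return `${title} [${locale}] feeds=${site.feeds.length} links=${linkCount}`;
}
```
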

    Properties

    url: URL

    Authoritative URL for the page.

    Uses canonical URL if present, otherwise the final URL after redirects.
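
    The fallback rule above can be sketched as follows. This is an illustration under assumptions, not the library's code: `pickAuthoritativeUrl` is a hypothetical name, and resolving a relative canonical link against the final URL is an assumed behavior.

```typescript
// Hypothetical helper: prefer the canonical link, else the post-redirect URL.
function pickAuthoritativeUrl(canonicalHref: string | undefined, finalUrl: URL): URL {
  if (canonicalHref) {
    try {
      // Resolve relative canonical values against the final URL (assumption).
      return new URL(canonicalHref, finalUrl);
    } catch {
      // Malformed canonical value: fall through to the final URL.
    }
  }
  return finalUrl;
}
```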

    feeds: URL[]

    Discovered feed URLs (RSS, Atom, JSON Feed) as URL objects.

    title?: string

    Page title (cleaned, from best available source).

    Collects titles from multiple sources, cleans them, and picks the longest.

    Sources: OpenGraph, Twitter Card, HTML title tag, first H1.
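
    A minimal sketch of the "collect, clean, pick the longest" rule, which applies equally to descriptions. The `pickLongest` name is hypothetical, and "cleaning" is assumed here to mean collapsing whitespace and trimming; the library may clean differently.

```typescript
// Hypothetical helper: returns the longest non-empty candidate after cleaning.
function pickLongest(candidates: Array<string | undefined>): string | undefined {
  let best: string | undefined;
  for (const raw of candidates) {
    if (raw === undefined) continue;
    // Assumed cleaning step: collapse runs of whitespace, then trim.
    const cleaned = raw.replace(/\s+/g, " ").trim();
    if (cleaned.length === 0) continue;
    // Strict comparison keeps the earliest candidate on ties.
    if (best === undefined || cleaned.length > best.length) {
      best = cleaned;
    }
  }
  return best;
}
```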

    description?: string

    Page description (from best available source).

    Collects descriptions from metadata and picks the longest.

    Sources: OpenGraph, Twitter Card, HTML meta description.

    image?: URL

    Page key visual/image URL (from best available source).

    Priority: OpenGraph > Twitter Card > Largest Apple Touch Icon > Favicon

    Returns the URL object of the best visual representation of the site.
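
    The priority chain can be read as "first candidate that exists wins". The sketch below assumes the candidates have already been extracted into a hypothetical `ImageCandidates` object; neither that type nor `pickImage` is part of the documented API.

```typescript
// Hypothetical container for already-extracted candidates.
interface ImageCandidates {
  openGraph?: URL;
  twitterCard?: URL;
  appleTouchIcon?: URL; // assumed to be the largest one already
  favicon?: URL;
}

// Walks the documented priority order and returns the first defined URL.
function pickImage(c: ImageCandidates): URL | undefined {
  return c.openGraph ?? c.twitterCard ?? c.appleTouchIcon ?? c.favicon;
}
```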

    icon?: URL

    Best available icon/favicon for the site.

    Priority: Largest Apple Touch Icon > Safari mask icon > Favicon > Shortcut icon > MS tile > Fluid icon

    Returns the highest quality icon available, preferring modern, high-resolution formats.
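
    How "largest" is determined is not stated here. One plausible approach, sketched below as an assumption, compares the pixel area declared in each link's `sizes` attribute (space-separated `WIDTHxHEIGHT` tokens). The `IconLink` type and both function names are hypothetical.

```typescript
// Hypothetical shape for an icon <link> element's relevant attributes.
interface IconLink {
  href: string;
  sizes?: string; // e.g. "180x180" or "57x57 114x114"
}

// Largest declared area in a sizes attribute; 0 if missing or unparsable (e.g. "any").
function iconArea(sizes: string | undefined): number {
  if (!sizes) return 0;
  let max = 0;
  for (const token of sizes.trim().split(/\s+/)) {
    const m = /^(\d+)x(\d+)$/i.exec(token);
    if (m) {
      max = Math.max(max, Number(m[1]) * Number(m[2]));
    }
  }
  return max;
}

// Returns the icon with the largest declared area; the first one wins ties.
function pickLargestIcon(icons: IconLink[]): IconLink | undefined {
  let best: IconLink | undefined;
  let bestArea = -1;
  for (const icon of icons) {
    const area = iconArea(icon.sizes);
    if (area > bestArea) {
      best = icon;
      bestArea = area;
    }
  }
  return best;
}
```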

    language?: string

    Primary language code (ISO 639-1).

    Extracted from HTML lang attribute, content-language meta tag, or OpenGraph locale. Normalized to lowercase ISO 639-1 format (e.g., 'en', 'de', 'fr', 'ja').

    region?: string

    Region code (ISO 3166-1 alpha-2).

    Only present if the language includes a region specifier. Normalized to uppercase ISO 3166-1 alpha-2 format (e.g., 'US', 'GB', 'DE').
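
    The normalization of `language` and `region` might look roughly like the sketch below. It assumes the raw value is a tag such as `en-US` (HTML `lang`) or `en_US` (OpenGraph locale) and that only two-letter subtags are accepted; `parseLocale` is a hypothetical name, not a documented function.

```typescript
// Hypothetical helper: splits a locale tag into normalized language and region.
function parseLocale(tag: string): { language?: string; region?: string } {
  const [first = "", ...rest] = tag.trim().split(/[-_]/);
  const result: { language?: string; region?: string } = {};
  // Accept only two-letter language codes (ISO 639-1), lowercased.
  if (!/^[a-z]{2}$/i.test(first)) return result;
  result.language = first.toLowerCase();
  // The first two-letter subtag after the language is treated as the region.
  const region = rest.find((part) => /^[a-z]{2}$/i.test(part));
  if (region) {
    result.region = region.toUpperCase();
  }
  return result;
}
```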

    html: string

    Raw HTML content of the page (UTF-8).

    The complete HTML source after fetching and decoding to UTF-8. Useful for custom processing or caching.

    text: string

    Plain text content extracted from the HTML.

    Automatically converted from HTML using the htmlToText function. Removes all tags, decodes entities, and preserves document structure with appropriate line breaks.

    internalLinks: URL[]

    Internal links found on the page (same domain, excluding current URL).

    All links are URL objects. The current page URL is excluded to avoid self-references. Useful for site crawling and navigation analysis.

    externalLinks: URL[]

    External links found on the page (different domains).

    All links are URL objects. Useful for analyzing outbound links, citations, and external resources.
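
    The internal/external split described for `internalLinks` and `externalLinks` can be sketched as below. This assumes "same domain" means an exact hostname match and that only `http:`/`https:` links are kept; the library's actual rule may differ (for example in how it treats subdomains). `tryParseUrl` and `splitLinks` are hypothetical names.

```typescript
// Hypothetical helper: parses href against a base, returning undefined on failure.
function tryParseUrl(href: string, base: URL): URL | undefined {
  try {
    return new URL(href, base);
  } catch {
    return undefined;
  }
}

// Classifies raw href values relative to the page they were found on.
function splitLinks(
  pageUrl: URL,
  hrefs: string[],
): { internalLinks: URL[]; externalLinks: URL[] } {
  const internalLinks: URL[] = [];
  const externalLinks: URL[] = [];
  for (const href of hrefs) {
    const link = tryParseUrl(href, pageUrl);
    if (!link) continue;
    // Skip mailto:, javascript:, tel:, and similar schemes (assumption).
    if (link.protocol !== "http:" && link.protocol !== "https:") continue;
    if (link.hostname === pageUrl.hostname) {
      // Exclude the current page itself to avoid self-references.
      if (link.href !== pageUrl.href) internalLinks.push(link);
    } else {
      externalLinks.push(link);
    }
  }
  return { internalLinks, externalLinks };
}
```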