Authoritative URL for the article.
Raw HTML content of the article page (UTF-8).
Plain text content extracted from the HTML.
content (optional): Cleaned article content (plain text).
title (optional): Article title.
Extracted from Mozilla Readability if available. Falls back to metadata (Schema.org, OpenGraph, Twitter Card, HTML title) if Readability extraction fails or title is empty.
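The fallback chain described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the `TitleSources` shape, its field names, and the `resolveTitle` function are hypothetical, and the order of the metadata sources is assumed to match the order in which they are listed.

```typescript
// Hypothetical container for title candidates; the field names are assumptions
// made for this sketch and do not come from the documented interface.
interface TitleSources {
  readability?: string | null;
  schemaOrg?: string | null;
  openGraph?: string | null;
  twitterCard?: string | null;
  htmlTitle?: string | null;
}

// Returns the first non-empty candidate, trying Readability first and then
// the metadata sources in the order they are listed (assumed order).
function resolveTitle(sources: TitleSources): string | undefined {
  const candidates = [
    sources.readability,
    sources.schemaOrg,
    sources.openGraph,
    sources.twitterCard,
    sources.htmlTitle,
  ];
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) {
      return trimmed;
    }
  }
  // No source produced a usable title, so the property stays unset.
  return undefined;
}
```

A whitespace-only Readability title is treated as empty here, so the next available source is used instead.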
description (optional): Article description/excerpt.
image (optional): Article key visual/image URL (from best available source).
Priority: Schema.org NewsArticle/Article (largest) > OpenGraph > Twitter Card > largest Apple Touch Icon > Favicon. Returns the URL object of the best visual representation of the article.
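One way to read the priority order is as a ranked list of sources, with the largest image chosen within a source. The sketch below assumes that reading; `ImageCandidate`, `SOURCE_PRIORITY`, and `pickImage` are hypothetical names, and the `area` field (pixel area, when known) is an assumption used only to express "largest".

```typescript
// Hypothetical source labels, ordered from most to least preferred.
type ImageSource =
  | "schemaOrg"
  | "openGraph"
  | "twitterCard"
  | "appleTouchIcon"
  | "favicon";

// Hypothetical candidate record; `area` is the pixel area if it is known.
interface ImageCandidate {
  source: ImageSource;
  url: string;
  area?: number;
}

const SOURCE_PRIORITY: ImageSource[] = [
  "schemaOrg",
  "openGraph",
  "twitterCard",
  "appleTouchIcon",
  "favicon",
];

// Picks the candidate from the highest-priority source that has any entries.
// Within that source, the candidate with the largest known area wins.
function pickImage(candidates: ImageCandidate[]): URL | undefined {
  for (const source of SOURCE_PRIORITY) {
    const matches = candidates.filter((c) => c.source === source);
    if (matches.length === 0) {
      continue;
    }
    const best = matches.reduce((a, b) =>
      (b.area ?? 0) > (a.area ?? 0) ? b : a
    );
    // new URL() throws on malformed input; callers may want to guard this.
    return new URL(best.url);
  }
  return undefined;
}
```

Returning a `URL` instance rather than a string mirrors the description above, which speaks of a URL object.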
language (optional): Primary language code (ISO 639-1).
region (optional): Region/country code (ISO 3166-1 alpha-2).
Internal links found in the article (same domain/subdomain).
External links found in the article (different domains).
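The internal/external split can be illustrated with the standard `URL` class. This is a sketch under stated assumptions: `classifyLinks` is a hypothetical helper, "same domain/subdomain" is interpreted as "the link's hostname equals the page's hostname or is a subdomain of it", and non-HTTP(S) links such as `mailto:` are skipped. The actual matching rule may differ.

```typescript
// Hypothetical result shape for this sketch.
interface ClassifiedLinks {
  internal: string[];
  external: string[];
}

// Splits hrefs into internal and external links relative to the page URL.
// Assumption: a link is internal if its hostname equals the page hostname
// or ends with "." + page hostname (i.e. it is a subdomain of the page host).
function classifyLinks(hrefs: string[], pageUrl: string): ClassifiedLinks {
  const page = new URL(pageUrl);
  const result: ClassifiedLinks = { internal: [], external: [] };

  for (const href of hrefs) {
    let link: URL;
    try {
      // Relative hrefs are resolved against the page URL.
      link = new URL(href, page);
    } catch {
      continue; // Skip hrefs that cannot be parsed.
    }
    if (link.protocol !== "http:" && link.protocol !== "https:") {
      continue; // Skip mailto:, tel:, javascript:, and similar schemes.
    }
    const sameHost =
      link.hostname === page.hostname ||
      link.hostname.endsWith("." + page.hostname);
    (sameHost ? result.internal : result.external).push(link.href);
  }
  return result;
}
```

A stricter comparison based on the registrable domain would need a public suffix list, which this sketch deliberately leaves out.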
Word count of the article.
Estimated reading time in minutes.
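The two values above are related: reading time is typically derived from the word count. The sketch below shows one common approach. The function names are hypothetical, and the default of 200 words per minute is an assumption, since the reading speed used for the estimate is not stated here.

```typescript
// Counts whitespace-separated words in plain text.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Estimates reading time in whole minutes, rounding up.
// The 200 words-per-minute default is an assumed value for this sketch.
function estimateReadingTime(wordCount: number, wordsPerMinute = 200): number {
  if (wordCount <= 0) {
    return 0;
  }
  return Math.max(1, Math.ceil(wordCount / wordsPerMinute));
}
```

Whitespace splitting works reasonably for space-delimited languages; text in languages without word separators would need a different word-counting strategy.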
Gathered article data.
Remarks
This interface represents the complete gathered data from an article page, including the authoritative URL, raw HTML, and extracted content. It will be extended incrementally with more properties.