Article URL as string or URL object
Gathered article data including URL, content, metadata, language, and links
This high-level convenience method fetches an article page and extracts its data. It handles encoding detection and redirects, and returns all article data in a single object.
This method will be extended incrementally to include metadata extraction, content extraction, and more.
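The redirect and encoding handling described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual internals of gatherArticle: the helper names (charsetFromContentType, decodeBody, fetchPage) are hypothetical, and the charset is taken from the Content-Type header alone, whereas a real implementation might also inspect meta tags or a byte-order mark.

```javascript
// Hypothetical helper: pull the charset label out of a Content-Type header.
// Falls back to 'utf-8' when the header is missing or names no charset.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType ?? '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Hypothetical helper: decode raw bytes into a JavaScript string.
// TextDecoder throws a RangeError for unknown labels, so fall back to UTF-8.
function decodeBody(bytes, contentType) {
  const charset = charsetFromContentType(contentType);
  try {
    return new TextDecoder(charset).decode(bytes);
  } catch {
    return new TextDecoder('utf-8').decode(bytes);
  }
}

// Hypothetical helper: fetch a page, follow redirects, and decode the body.
// response.url holds the final URL after any redirects were followed.
async function fetchPage(url) {
  const response = await fetch(url, { redirect: 'follow' });
  const bytes = new Uint8Array(await response.arrayBuffer());
  const html = decodeBody(bytes, response.headers.get('content-type'));
  return { url: response.url, html };
}
```

Decoding from bytes rather than calling response.text() matters here because response.text() always decodes as UTF-8, which would garble pages served in other encodings.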
// Fetch an article and get its data
const article = await gatherArticle('https://example.com/article');
console.log(article.url); // Final URL after redirects
console.log(article.html); // Raw HTML content (UTF-8)
console.log(article.text); // Plain text converted from the full-page HTML
console.log(article.content); // Cleaned article content (Readability + htmlToText)
console.log(article.title); // Article title (from Readability or metadata)
console.log(article.description); // Article excerpt or description
console.log(article.image); // Article key visual / lead image (from best source)
console.log(article.language); // Language code (ISO 639-1, e.g., 'en')
console.log(article.region); // Region code (ISO 3166-1 alpha-2, e.g., 'US')
console.log(article.internalLinks); // Array of internal link URLs
console.log(article.externalLinks); // Array of external link URLs
console.log(article.wordCount); // Word count (from content or text)
console.log(article.readingTime); // Estimated reading time in minutes
Gather article data from a URL in one convenient call.
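To make the derived fields in the example more concrete, here is a minimal sketch of how internalLinks, externalLinks, wordCount, and readingTime might be computed. The function names, the same-hostname rule for "internal", the 200 words-per-minute rate, and the one-minute minimum are all assumptions for illustration; the source does not state how gatherArticle computes these values.

```javascript
// Hypothetical helper: split hrefs into internal and external absolute URLs.
// Assumption: a link is "internal" when its hostname matches the page's hostname.
function classifyLinks(hrefs, pageUrl) {
  const base = new URL(pageUrl);
  const internalLinks = new Set();
  const externalLinks = new Set();
  for (const href of hrefs) {
    let resolved;
    try {
      resolved = new URL(href, base); // resolves relative hrefs against the page URL
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    // Ignore non-web schemes such as mailto: or javascript:
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue;
    resolved.hash = ''; // drop fragments so in-page anchors deduplicate
    const bucket = resolved.hostname === base.hostname ? internalLinks : externalLinks;
    bucket.add(resolved.href);
  }
  return { internalLinks: [...internalLinks], externalLinks: [...externalLinks] };
}

// Hypothetical helper: count whitespace-separated words.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Hypothetical helper: reading time in whole minutes, rounded up.
// Assumptions: 200 words per minute and a minimum of one minute.
function estimateReadingTime(wordCount, wordsPerMinute = 200) {
  return Math.max(1, Math.ceil(wordCount / wordsPerMinute));
}
```

Using Sets keeps each link list free of duplicates while preserving first-seen order. Whitespace splitting is a rough word count that suits space-delimited languages; text in languages such as Chinese or Japanese would need a different segmentation approach.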