Magpie HTML - v0.1.3

    Function extractContent

    • Extract article content from HTML.

      Parameters

      Returns ContentResult

      Extraction result (success or failure)

      Uses Mozilla Readability to extract clean article content from a pre-parsed Document. This function never throws; it always returns a ContentResult, so callers check the success flag instead of wrapping the call in try/catch.

      Error handling:

      • Returns success: false for any extraction failure
      • Categorizes errors by type for better handling
      • Includes extraction time even for failures
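
      Example:

      import { parseHTML } from '../utils/html-parser.js';
      import { extractSEO } from '../metadata/index.js';
      // extractContent is assumed to be in scope; its import path is not shown here.

      // html is a raw HTML string obtained elsewhere (for example, a fetched page body).
      const doc = parseHTML(html);
      const metadata = extractSEO(doc);
      const content = extractContent(doc, {
        baseUrl: 'https://example.com/article',
        charThreshold: 300,
        checkReadability: true,
      });

      // No try/catch needed: failures are reported through the result object.
      if (content.success) {
        console.log(content.title);
        console.log(content.wordCount);
        console.log(`${content.readingTime} min read`);
      } else {
        console.error(content.error);
      }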
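
      Because the function reports failures through its return value, the result can be modeled as a discriminated union keyed on success. The following is a minimal sketch of that pattern, not the library's actual type definitions. The types ContentSuccess, ContentFailure, and ContentResultSketch and the helper describeResult are hypothetical stand-ins; the field names errorType and extractionTime, and the assumption that error is a string, are guesses based on the error-handling notes above.

```typescript
// Hypothetical stand-ins for the library's result type. Only `success`,
// `title`, `wordCount`, `readingTime`, and `error` appear in the original
// example; `errorType` and `extractionTime` are assumed names.
type ContentSuccess = {
  success: true;
  title: string;
  wordCount: number;
  readingTime: number; // minutes
  extractionTime: number; // milliseconds (assumed unit)
};

type ContentFailure = {
  success: false;
  error: string; // assumed to be a message string
  errorType: string; // assumed category label
  extractionTime: number; // reported even on failure
};

type ContentResultSketch = ContentSuccess | ContentFailure;

// Narrowing on `success` gives type-safe access to the fields of each branch.
function describeResult(result: ContentResultSketch): string {
  if (result.success) {
    return `${result.title}: ${result.wordCount} words, ${result.readingTime} min read (${result.extractionTime} ms)`;
  }
  return `extraction failed [${result.errorType}] after ${result.extractionTime} ms: ${result.error}`;
}
```

      A caller could pass the value returned by extractContent to a helper like describeResult and log the string, provided the real ContentResult has a compatible shape.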