Magpie HTML - v0.1.3

    Function extractLinks

    • Extract links from a parsed HTML document.

      Parameters

      • doc: Document

        Parsed HTML document

      • Optional baseUrl: string | URL | null

        Base URL for resolving relative links and determining whether each link is internal or external

      • options: LinksExtractionOptions = {}

        Extraction options for filtering and categorization

      Returns LinksMetadata

      Links metadata with categorized links

      Extracts all <a href> links together with their metadata, and supports filtering options. Suited to crawlers, SEO analysis, and link discovery.

      Features:

      • Internal/external link categorization
      • Rel attribute filtering (nofollow, ugc, sponsored, etc.)
      • Automatic URL normalization
      • Hash link filtering
      • Scheme filtering (only http/https)
      • Deduplication
      • Link text extraction

      Example:

      const doc = parseHTML(htmlString);
      const links = extractLinks(doc, 'https://example.com');

      // Get all internal links (same origin)
      console.log(links.internal);

      // Get external links, excluding nofollow
      const linksNoFollow = extractLinks(doc, 'https://example.com', {
        scope: 'external',
        excludeRel: ['nofollow']
      });

      // Crawler use case: get followable links
      // (baseUrl is assumed to be defined by the caller)
      const crawlLinks = extractLinks(doc, baseUrl, {
        excludeRel: ['nofollow', 'ugc', 'sponsored'],
        includeHashLinks: false
      });
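
The categorization, scheme filtering, hash link filtering, and deduplication features listed above can be pictured with a small standalone sketch built only on the standard WHATWG URL class. This is not Magpie HTML's implementation: the function name classifyHrefs and the ClassifiedLink type are hypothetical, and the exact normalization rules the library applies may differ.

```typescript
// Illustrative sketch only; classifyHrefs and ClassifiedLink are hypothetical
// names and do not come from Magpie HTML.
type Category = 'internal' | 'external';

interface ClassifiedLink {
  url: string;
  category: Category;
}

function classifyHrefs(hrefs: string[], baseUrl: string): ClassifiedLink[] {
  const base = new URL(baseUrl);
  const seen = new Set<string>();
  const result: ClassifiedLink[] = [];

  for (const href of hrefs) {
    // Hash link filtering: skip pure in-page anchors such as "#top".
    if (href.trim().startsWith('#')) continue;

    let resolved: URL;
    try {
      // Relative hrefs are resolved against the base URL.
      resolved = new URL(href, base);
    } catch {
      continue; // skip hrefs that cannot be parsed
    }

    // Scheme filtering: keep only http and https.
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue;

    // Simple normalization (an assumption of this sketch): drop the fragment
    // so "/about" and "/about#team" count as the same link.
    resolved.hash = '';

    // Deduplication on the normalized URL string.
    const key = resolved.href;
    if (seen.has(key)) continue;
    seen.add(key);

    // Internal means same origin (scheme + host + port) as the base URL.
    const category: Category =
      resolved.origin === base.origin ? 'internal' : 'external';
    result.push({ url: key, category });
  }

  return result;
}
```

Given the hrefs '/about', '/about#team', 'https://other.org/x', 'mailto:a@b.c', and '#top' with the base 'https://example.com/', this sketch keeps two links: one internal and one external. Comparing origins rather than hostnames means a different port or scheme on the same host is treated as external, which matches the "same origin" wording in the example above.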