Magpie HTML - v0.1.3

    Interface ContentExtractionOptions

    Options for content extraction.

    interface ContentExtractionOptions {
        baseUrl?: string;
        charThreshold?: number;
        maxElemsToParse?: number;
        keepClasses?: boolean;
        classesToPreserve?: string[];
        disableJSONLD?: boolean;
        checkReadability?: boolean;
        debug?: boolean;
    }

    Properties

    baseUrl?: string

    Base URL for resolving relative links and images. Highly recommended for proper link resolution.
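    To illustrate why a base URL matters, the sketch below resolves relative references with the standard WHATWG `URL` constructor. The helper `resolveAgainstBase` and the example URLs are assumptions made for illustration; they are not part of Magpie HTML, and the library's own resolution logic may differ.

```typescript
// Hypothetical helper (not part of Magpie HTML): shows how a base URL
// turns relative references into absolute ones via the standard URL API.
function resolveAgainstBase(reference: string, baseUrl?: string): string {
  if (baseUrl === undefined) {
    // Without a base, a relative reference cannot be resolved; return it unchanged.
    return reference;
  }
  return new URL(reference, baseUrl).href;
}

// Example values, assumed for illustration only.
const base = "https://example.com/articles/post.html";

// Root-relative reference: resolved against the origin of the base.
const absoluteImage = resolveAgainstBase("/img/cover.png", base);

// Path-relative reference: resolved against the directory of the base.
const absoluteSibling = resolveAgainstBase("next.html", base);

// No base URL supplied: the reference stays relative.
const unresolved = resolveAgainstBase("/img/cover.png");
```

    When no base URL is supplied, relative links and image sources in the extracted HTML have nothing to resolve against, which is the likely reason the option is described as highly recommended.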

    charThreshold?: number

    Minimum character count for article content. Content shorter than this is considered too short to be an article.

    Default: 500
    
    maxElemsToParse?: number

    Maximum number of elements to parse. Set to 0 for no limit.

    Default: 0
    
    keepClasses?: boolean

    Whether to preserve CSS classes in extracted HTML.

    Default: false
    
    classesToPreserve?: string[]

    CSS classes to preserve when keepClasses is false.
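    The relationship between `keepClasses` and `classesToPreserve` can be sketched as a filter over a `class` attribute value. The helper `filterClassAttribute` is hypothetical and only mirrors the behavior described here; it is not Magpie HTML's implementation.

```typescript
// Hypothetical helper (not Magpie HTML's code): models the documented
// relationship between keepClasses and classesToPreserve.
function filterClassAttribute(
  classAttribute: string,
  keepClasses: boolean = false,
  classesToPreserve: string[] = [],
): string {
  if (keepClasses) {
    // All classes are kept, so the allow-list is irrelevant.
    return classAttribute;
  }
  const preserved = new Set(classesToPreserve);
  return classAttribute
    .split(/\s+/)
    .filter((name) => name.length > 0 && preserved.has(name))
    .join(" ");
}

// keepClasses is false: only the listed class survives.
const kept = filterClassAttribute("hero caption ad-slot", false, ["caption"]);

// keepClasses is false and nothing is listed: every class is dropped.
const stripped = filterClassAttribute("hero caption ad-slot");

// keepClasses is true: the attribute is left as-is.
const untouched = filterClassAttribute("hero caption ad-slot", true);
```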

    disableJSONLD?: boolean

    Whether to skip JSON-LD parsing for metadata.

    Default: false
    
    checkReadability?: boolean

    Whether to check that the content is probably readerable before extraction. If true and the content is not readerable, extraction returns early with a failure.

    Default: false
    
    debug?: boolean

    Enable debug logging.

    Default: false
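
    As a summary of the defaults listed above, the following sketch merges caller-supplied options over the documented default values. The interface is copied locally so the block is self-contained; `DOCUMENTED_DEFAULTS` and `withDocumentedDefaults` are hypothetical names, and how Magpie HTML applies defaults internally is not stated in this reference.

```typescript
// Local copy of the documented interface, so the sketch compiles on its own.
interface ContentExtractionOptions {
  baseUrl?: string;
  charThreshold?: number;
  maxElemsToParse?: number;
  keepClasses?: boolean;
  classesToPreserve?: string[];
  disableJSONLD?: boolean;
  checkReadability?: boolean;
  debug?: boolean;
}

// Defaults as documented. baseUrl and classesToPreserve have no documented
// default, so they are left unset here.
const DOCUMENTED_DEFAULTS: ContentExtractionOptions = {
  charThreshold: 500,
  maxElemsToParse: 0, // 0 means no limit
  keepClasses: false,
  disableJSONLD: false,
  checkReadability: false,
  debug: false,
};

// Hypothetical helper: caller-supplied values override the documented defaults.
// Note that an explicitly passed `undefined` would also override a default.
function withDocumentedDefaults(
  options: ContentExtractionOptions = {},
): ContentExtractionOptions {
  return { ...DOCUMENTED_DEFAULTS, ...options };
}

// Example values, assumed for illustration only.
const resolved = withDocumentedDefaults({
  baseUrl: "https://example.com/articles/post.html",
  charThreshold: 200,
});
```

    Here `resolved.charThreshold` is the caller's 200, while the options that were not supplied fall back to the documented defaults.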