SitemapLoaderParams: interface representing the parameters for initializing a SitemapLoader.

Properties

allowUrlPatterns: undefined | (string | RegExp)[]
caller: AsyncCaller
chunkSize: number

The size to chunk the sitemap URLs into for scraping.

Defaults to 300.
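
As an illustration of what chunking the sitemap URLs means, here is a minimal sketch. The helper name `chunkUrls` and the example.com URLs are assumptions made for this example and are not part of the library; the loader's own batching logic may differ.

```typescript
// Hypothetical helper (not part of the library): split a list into
// consecutive batches of at most `chunkSize` items.
function chunkUrls<T>(items: T[], chunkSize = 300): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// 650 placeholder URLs split into batches of 300, 300, and 50.
const sitemapUrls = Array.from({ length: 650 }, (_, i) => `https://example.com/page-${i}`);
const urlBatches = chunkUrls(sitemapUrls);
console.log(urlBatches.map((batch) => batch.length)); // [ 300, 300, 50 ]
```

A smaller chunk size means more, smaller scraping rounds; a larger one means fewer rounds with more URLs in flight at once.
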
timeout: number

The timeout for the fetch request, in milliseconds. Defaults to 10 seconds (10000 ms).

webPath: string
headers?: HeadersInit

The headers to use in the fetch request.

selector?: SelectorType

The selector to use to extract the text from the document. Defaults to "body".

textDecoder?: TextDecoder

The text decoder to use to decode the response. Defaults to UTF-8.
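
To show why a custom text decoder can matter, the following sketch decodes the same word from two byte encodings with the standard `TextDecoder` API. The byte arrays are made up for this example; how the loader passes the decoder along internally is not shown here.

```typescript
// UTF-8 is the documented default; the bytes below spell "café" in UTF-8.
const utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);
const utf8Text = new TextDecoder("utf-8").decode(utf8Bytes);

// The same word encoded as UTF-16LE needs a matching decoder.
const utf16Bytes = new Uint8Array([0x63, 0x00, 0x61, 0x00, 0x66, 0x00, 0xe9, 0x00]);
const utf16Text = new TextDecoder("utf-16le").decode(utf16Bytes);

console.log(utf8Text, utf16Text); // café café
```

If a site serves pages in an encoding other than UTF-8, supplying a decoder for that encoding avoids garbled characters in the extracted text.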

Methods

  • A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.

    Returns Promise<{
        load: (content: string | AnyNode | AnyNode[] | Buffer, options?: null | CheerioOptions, isDocument?: boolean) => CheerioAPI;
    }>

    A Promise that resolves to an object containing the load function from the Cheerio library.
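
The dynamic-import pattern described above can be sketched roughly as follows. The helper name `importLoad` and its error message are assumptions for illustration and do not reproduce the library's actual code.

```typescript
// Hypothetical helper, not the library's code: dynamically import a module by
// name and return its `load` export, raising a descriptive error on failure.
async function importLoad(moduleName: string = "cheerio"): Promise<{ load: unknown }> {
  try {
    const mod = await import(moduleName);
    // `load` is undefined here if the module exists but does not export it.
    return { load: mod.load };
  } catch {
    throw new Error(
      `Could not import "${moduleName}". Is it installed (for example via npm)?`
    );
  }
}
```

Importing lazily like this keeps the dependency optional: the error only surfaces when a caller actually needs the module.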

  • Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.

    Parameters

    • urls: string[]

      An array of URLs to fetch and load.

    • caller: AsyncCaller
    • timeout: undefined | number
    • Optional textDecoder: TextDecoder
    • Optional options: CheerioOptions & {
          headers?: HeadersInit;
      }

    Returns Promise<CheerioAPI[]>

    A Promise that resolves to an array of CheerioAPI instances.
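
To make the roles of the `timeout`, `textDecoder`, and `headers` parameters concrete, here is a simplified sketch of fetching several URLs concurrently. It is not the library's implementation: the names `FetchLike` and `fetchAllText` are hypothetical, the fetcher is injected so the sketch runs without network access, and it returns decoded strings rather than CheerioAPI instances.

```typescript
// Assumed minimal fetcher shape, so a fake can be injected for testing.
type FetchLike = (
  url: string,
  init: { headers?: Record<string, string>; signal: AbortSignal }
) => Promise<{ arrayBuffer(): Promise<ArrayBuffer> }>;

// Hypothetical sketch: fetch every URL concurrently, abort any request that
// exceeds `timeoutMs`, and decode each response body with `textDecoder`.
async function fetchAllText(
  urls: string[],
  fetchFn: FetchLike,
  timeoutMs = 10_000,
  textDecoder: TextDecoder = new TextDecoder("utf-8"),
  headers?: Record<string, string>
): Promise<string[]> {
  return Promise.all(
    urls.map(async (url) => {
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      try {
        const response = await fetchFn(url, { headers, signal: controller.signal });
        return textDecoder.decode(await response.arrayBuffer());
      } finally {
        clearTimeout(timer); // always release the timer, even on failure
      }
    })
  );
}
```

In a real setting the injected fetcher could be the global `fetch`, and each decoded string would then be handed to Cheerio's `load` function to obtain a CheerioAPI instance.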