Website context

Link a website as an AI knowledge source and Your Office AI crawls it, generates a summary, and indexes it so the assistant can cite it alongside your uploaded documents. Website sources use the same retrieval pipeline as Knowledge. Manage them from the Website context tab in Integrations or from Knowledge.

The website-context ingest pipeline, from a URL to cited answers:

  1. Add a URL: register the source
  2. Crawl: within your caps
  3. AI summary: map-reduce, always
  4. Index: per RAG mode
  5. Cited in chat: grounded answers

Linking a website

  1. Open Website context

    In Integrations, switch to the Website context tab (or add a website source from Knowledge).

  2. Paste the URL

    Enter the site or page you want the AI to use as a source.

  3. Set the crawl caps and RAG mode

    Choose how broadly to crawl and which representation to index. Sensible defaults are pre-filled.

  4. Ingest

    Your Office AI crawls within the caps, generates the AI summary, and indexes the source for retrieval.

An AI summary, always

Whatever mode you pick, Your Office AI always generates a real AI summary of the source at ingest time. It's produced by a map-reduce over the crawled page chunks — each chunk is summarised, then the summaries are combined — not by naive truncation. That means even a large site distils into a faithful overview the AI can reason over.
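
The map-reduce idea described above can be sketched roughly as follows. This is a minimal illustration, not Your Office AI's actual implementation: the function names, the `group_size` parameter, and the toy `first_sentence` summariser (a stand-in for a real model call) are all assumptions made for the example.

```python
from typing import Callable, List


def map_reduce_summary(
    chunks: List[str],
    summarize: Callable[[str], str],
    group_size: int = 4,  # hypothetical knob: summaries merged per combine call
) -> str:
    """Summarise each chunk, then combine the summaries until one remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if not chunks:
        return ""
    # Map step: one summary per crawled chunk.
    summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: merge summaries in small groups until a single one is left.
    while len(summaries) > 1:
        summaries = [
            summarize("\n".join(summaries[i:i + group_size]))
            for i in range(0, len(summaries), group_size)
        ]
    return summaries[0]


def first_sentence(text: str) -> str:
    """Toy stand-in for a model call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


if __name__ == "__main__":
    pages = ["Alpha is first. More detail.", "Beta is second. More detail."]
    print(map_reduce_summary(pages, first_sentence))
```

In a real pipeline `summarize` would call a language model. Reducing in groups keeps each combine call small even when a site produces many chunks, which is presumably why a large site can still be condensed into one overview.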

ℹ️
Summary and index are separate

The summary is generated every time. The RAG mode only decides what gets stored in the retrieval index — the summary, the full text, or both.

Three RAG modes

The RAG mode controls which representation of the crawled pages lands in the knowledge index used for retrieval:

| Mode | What's indexed | When to use it |
| --- | --- | --- |
| Summary (default), `summary` | Only the AI summary of the source; the smallest index. | Best when you want the gist of a site for grounding without indexing every page. |
| Summary + full, `summaryPlusFull` | Both the AI summary and the full page text. | Best when you want high-level grounding plus the ability to retrieve exact passages. |
| Full, `full` | Only the full page text, with no summary document. | Best when you need precise, passage-level retrieval over everything crawled. |
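
As a rough sketch of how the three modes differ, the function below picks which documents would go to the retrieval index for a given mode. Only the mode identifiers (`summary`, `summaryPlusFull`, `full`) come from the table above; the function name and the document dictionary shape are hypothetical and chosen just for illustration.

```python
from typing import Dict, List, Optional

RAG_MODES = ("summary", "summaryPlusFull", "full")


def documents_to_index(
    mode: str, summary: str, pages: Dict[str, str]
) -> List[Dict[str, Optional[str]]]:
    """Return the documents that a given RAG mode would place in the index.

    `pages` maps each crawled URL to its full page text. The summary is
    assumed to exist already, since it is generated regardless of mode.
    """
    if mode not in RAG_MODES:
        raise ValueError(f"unknown RAG mode: {mode!r}")
    docs: List[Dict[str, Optional[str]]] = []
    # Summary document: indexed in "summary" and "summaryPlusFull".
    if mode in ("summary", "summaryPlusFull"):
        docs.append({"kind": "summary", "url": None, "text": summary})
    # Full page text: indexed in "full" and "summaryPlusFull".
    if mode in ("full", "summaryPlusFull"):
        for url, text in pages.items():
            docs.append({"kind": "page", "url": url, "text": text})
    return docs
```

Under these assumptions, `summary` mode yields a single document no matter how many pages were crawled, while `full` yields one document per page and no summary document.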

Crawl caps

Crawling is bounded by three caps you set per source. They're validated on save, and the source remembers them, so you can always see the breadth it was last crawled with:

| Cap | What it controls |
| --- | --- |
| Max pages | The most pages a single ingest or refresh will fetch from the site. |
| Max bytes per page | How much raw text is indexed per page when full text is used. |
| Crawl depth | How many link hops to follow from the registered URL. Depth 0 means no crawling: only the page you registered is indexed. |
💡
Depth 0 = single page

Set crawl depth to 0 to index just the page you registered without following any links — handy for a single documentation page or article.
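
To make the three caps concrete, here is a small sketch of a breadth-first crawl bounded by them. It runs over an in-memory link graph rather than the network, and everything in it is an assumption for illustration: the `CrawlCaps` class, its field names, the specific validation rules, and the choice to count the registered page toward the page limit are not taken from the product.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CrawlCaps:
    max_pages: int
    max_bytes_per_page: int
    crawl_depth: int

    def __post_init__(self) -> None:
        # Hypothetical validation rules; the real limits are not documented here.
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        if self.max_bytes_per_page < 1:
            raise ValueError("max_bytes_per_page must be at least 1")
        if self.crawl_depth < 0:
            raise ValueError("crawl_depth must be 0 or greater")


def crawl(start_url: str, links: Dict[str, List[str]], caps: CrawlCaps) -> List[str]:
    """Breadth-first crawl over a link graph, bounded by depth and page count."""
    visited = [start_url]  # the registered page is always included
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= caps.crawl_depth:
            continue  # depth 0 means the registered page's links are never followed
        for target in links.get(url, []):
            if len(visited) >= caps.max_pages:
                return visited
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
    return visited


def truncate_page(text: str, caps: CrawlCaps) -> str:
    """Keep at most max_bytes_per_page bytes of UTF-8 text."""
    data = text.encode("utf-8")[: caps.max_bytes_per_page]
    # Drop a trailing partial character if the cut landed mid-codepoint.
    return data.decode("utf-8", errors="ignore")
```

With `crawl_depth=0` the crawl returns only the registered URL, matching the single-page behaviour described above. Raising the depth widens the crawl one link hop at a time until `max_pages` is reached.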

Using a website source

Once indexed, a website source behaves like any other knowledge:

  • When answering, the assistant retrieves the most relevant passages via pgvector semantic search and cites them.
  • Attach it as context in chat with #, the same way you attach a knowledge folder or document.
  • A re-ingest refreshes the crawl, regenerates the summary, and updates the index.
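
The semantic search mentioned in the list above ranks indexed passages by how close their embedding vectors are to the query's embedding. The sketch below shows that ranking in plain Python. It is only an illustration: in the product the search runs inside Postgres through pgvector, the distance metric actually used is not stated here (cosine similarity is an assumption), and the function names and toy two-dimensional vectors are made up for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_passages(
    query_vec: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k passages most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in indexed]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    index = [
        ("Pricing page passage", [1.0, 0.0]),
        ("Support page passage", [0.0, 1.0]),
        ("Plans page passage", [0.9, 0.1]),
    ]
    for text, score in top_passages([1.0, 0.0], index, k=2):
        print(f"{score:.3f}  {text}")
```

The passages returned by a search like this are what the assistant would quote and cite. Which passages exist to be found depends on the RAG mode chosen at ingest.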
ℹ️
Part of Knowledge

Website sources share the Knowledge retrieval pipeline end to end. For document uploads, embedding models, and folder sharing, see Knowledge Base.