🌐 Website context

Link a website as an AI knowledge source, and Your Office AI crawls it, generates a summary, and indexes it so the assistant can cite it alongside your uploaded documents. Website sources use the same retrieval pipeline as Knowledge. Manage them from the Website context tab in Integrations or from Knowledge.

🔗 Add a URL: Register the source
🕷️ Crawl: Within your caps
📝 AI summary: Map-reduce, always
🧠 Index: Per RAG mode
💬 Cited in chat: Grounded answers

From a URL to cited answers — the website-context ingest pipeline.

Linking a website

1. Open Website context
   In Integrations, switch to the Website context tab (or add a website source from Knowledge).
2. Paste the URL
   Enter the site or page you want the AI to use as a source.
3. Set the crawl caps and RAG mode
   Choose how broadly to crawl and which representation to index. Sensible defaults are pre-filled.
4. Ingest
   Your Office AI crawls within the caps, generates the AI summary, and indexes the source for retrieval.

An AI summary, always

Whichever mode you pick, Your Office AI always generates a real AI summary of the source at ingest time. The summary is produced by a map-reduce over the crawled page chunks (each chunk is summarised, then the summaries are combined), not by naive truncation. That means even a large site distils into a faithful overview the AI can reason over.
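
The map-reduce idea can be sketched as follows. This is a minimal illustration, not Your Office AI's actual implementation: `chunk_text`, `map_reduce_summary`, the character-based chunking, and the `toy_summarise` stand-in for a model call are all assumptions made for the example.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summary(
    pages: List[str],
    summarise: Callable[[str], str],
    chunk_size: int = 2000,  # illustrative value, not a documented default
) -> str:
    """Summarise every chunk (map), then summarise the joined summaries (reduce)."""
    # Map step: one partial summary per chunk, so no crawled text is skipped.
    partials = [
        summarise(chunk)
        for page in pages
        for chunk in chunk_text(page, chunk_size)
    ]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    # Reduce step: combine the partial summaries into a single overview.
    return summarise("\n".join(partials))


def toy_summarise(text: str) -> str:
    """Hypothetical stand-in for a model call: keep the first sentence of each line."""
    lines = [line for line in text.splitlines() if line.strip()]
    return " | ".join(line.split(".")[0].strip() for line in lines)
```

Unlike truncation, which would keep only the start of the first page, every chunk contributes to the final summary here. A real summariser would replace `toy_summarise`, and a very large site might need more than one reduce pass.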

ℹ️

The summary and the index are separate

The summary is generated every time. The RAG mode only decides what gets stored in the retrieval index — the summary, the full text, or both.

Three RAG modes

The RAG mode controls which representation of the crawled pages lands in the knowledge index used for retrieval:

Mode	What's indexed	When to use it
Summary (default) `summary`	Only the AI summary of the source — the smallest index.	Best when you want the gist of a site for grounding without indexing every page.
Summary + full `summaryPlusFull`	Both the AI summary and the full page text.	Best when you want high-level grounding plus the ability to retrieve exact passages.
Full `full`	Only the full page text — no summary document.	Best when you need precise, passage-level retrieval over everything crawled.
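
As a rough sketch of the table above, the mode can be read as a switch over which documents are written to the index. The function name `documents_to_index` and the `(kind, text)` tuples are hypothetical; only the three mode identifiers come from this page.

```python
from typing import List, Tuple

# The three mode identifiers listed in the table.
RAG_MODES = ("summary", "summaryPlusFull", "full")


def documents_to_index(mode: str, summary: str, pages: List[str]) -> List[Tuple[str, str]]:
    """Return the (kind, text) pairs that would be stored for a given mode."""
    if mode not in RAG_MODES:
        raise ValueError(f"unknown RAG mode: {mode!r}")
    docs: List[Tuple[str, str]] = []
    # The summary is always generated; it is only indexed in these two modes.
    if mode in ("summary", "summaryPlusFull"):
        docs.append(("summary", summary))
    # Full page text is indexed in these two modes.
    if mode in ("summaryPlusFull", "full"):
        docs.extend(("page", text) for text in pages)
    return docs
```

For a two-page crawl, `summary` yields one document, `full` yields two, and `summaryPlusFull` yields three.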

Crawl caps

Crawling is bounded by three caps you set per source. They're validated on save, and the source stores them, so you can always see the breadth it was last crawled with:

Cap	What it controls
Max pages	The most pages a single ingest or refresh will fetch from the site.
Max bytes per page	How much raw text is indexed per page when full text is used.
Crawl depth	How many link hops to follow from the registered URL, same-origin only. Crawl depth only matters once max pages allows more than one page.
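
Validation on save might look something like the sketch below. The class name `CrawlCaps`, its field names, and the specific rules (at least one page, at least one byte, non-negative depth) are assumptions for illustration; this page does not state the actual limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlCaps:
    """Hypothetical record of the three caps stored with a website source."""

    max_pages: int
    max_bytes_per_page: int
    crawl_depth: int

    def __post_init__(self) -> None:
        # Assumed rules only: the real product limits are not documented here.
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        if self.max_bytes_per_page < 1:
            raise ValueError("max_bytes_per_page must be at least 1")
        if self.crawl_depth < 0:
            raise ValueError("crawl_depth must not be negative")
```

Because the record is frozen, the values saved with a source are exactly the ones that passed validation.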

💡

Max pages 1 = single page

Set max pages to 1 to index just the page you registered without following any links — handy for a single documentation page or article. Crawl depth still applies, but a max-pages cap of 1 stops the crawl before it can follow anything.
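
How the three caps interact can be sketched with a breadth-first crawl over a small in-memory site. Everything here is illustrative: the `Site` mapping replaces real HTTP fetching, the breadth-first order is an assumption, and the origin check compares scheme and host as written without normalising default ports.

```python
from collections import deque
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

# Hypothetical in-memory site: URL -> (page text, outgoing links).
Site = Dict[str, Tuple[str, List[str]]]


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share the same scheme and host[:port] as written."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)


def crawl(site: Site, start: str, max_pages: int, max_bytes: int, depth: int) -> Dict[str, str]:
    """Breadth-first crawl from start, bounded by the three caps."""
    fetched: Dict[str, str] = {}
    queue = deque([(start, 0)])
    seen = {start}
    # Max pages: stop as soon as the page budget is used up.
    while queue and len(fetched) < max_pages:
        url, hops = queue.popleft()
        if url not in site:
            continue
        text, links = site[url]
        # Max bytes per page: truncate on bytes, dropping a partial trailing character.
        fetched[url] = text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
        # Crawl depth: do not follow links from pages already at the hop limit.
        if hops >= depth:
            continue
        for link in links:
            if link not in seen and same_origin(start, link):
                seen.add(link)
                queue.append((link, hops + 1))
    return fetched
```

With `max_pages=1` the loop exits right after the registered URL is fetched, whatever the depth, which matches the single-page behaviour described in the tip.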

Using a website source

Once indexed, a website source behaves like any other knowledge source:

The assistant retrieves and cites the most relevant passages when answering, via pgvector semantic search.
Attach it as context in chat with #, the same way you attach a knowledge folder or document.
A re-ingest refreshes the crawl, regenerates the summary, and updates the index.
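
Semantic search ranks stored passages by how close their embeddings are to the query embedding. The sketch below shows that ranking in plain Python, assuming cosine distance (the quantity pgvector's `<=>` operator computes); the metric the product actually uses is not stated here, and the function names and two-dimensional toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query: Sequence[float], passages: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k passages closest to the query embedding."""
    ranked = sorted(passages, key=lambda pid: cosine_distance(query, passages[pid]))
    return ranked[:k]
```

In a pgvector-backed index the same ordering is done inside the database, so only the top few passages are returned to the assistant for citation.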

ℹ️

Part of Knowledge

Website sources share the Knowledge retrieval pipeline end to end. For document uploads, embedding models, and folder sharing, see Knowledge Base.
