Link a website as an AI knowledge source and Your Office AI crawls it, generates a summary, and indexes it so the assistant can cite it alongside your uploaded documents. It's the same retrieval pipeline as Knowledge — manage your sources from the Website context tab in Integrations or from Knowledge.
1. In Integrations, switch to the Website context tab (or add a website source from Knowledge).
2. Enter the site or page you want the AI to use as a source.
3. Choose how broadly to crawl and which representation to index. Sensible defaults are pre-filled.
4. Your Office AI crawls within the caps, generates the AI summary, and indexes the source for retrieval.
Whatever mode you pick, Your Office AI always generates a real AI summary of the source at ingest time. It's produced by a map-reduce over the crawled page chunks — each chunk is summarised, then the summaries are combined — not by naive truncation. That means even a large site distils into a faithful overview the AI can reason over.
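
The map-reduce idea described above can be sketched as follows. This is a minimal illustration, not Your Office AI's actual implementation: the function names, the fixed-size character chunking, the `fan_in` grouping, and the `summarise` callback (a stand-in for a model call) are all assumptions made for the example.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Split text into fixed-size character chunks (hypothetical chunking rule)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summary(
    pages: List[str],
    summarise: Callable[[str], str],  # stand-in for a model call (assumption)
    chunk_size: int = 2000,
    fan_in: int = 8,
) -> str:
    """Summarise every chunk, then combine summaries until one remains."""
    if chunk_size < 1 or fan_in < 2:
        raise ValueError("chunk_size must be >= 1 and fan_in must be >= 2")

    # Map step: each chunk is summarised independently.
    summaries = [
        summarise(chunk)
        for page in pages
        for chunk in chunk_text(page, chunk_size)
    ]
    if not summaries:
        return ""

    # Reduce step: merge groups of summaries until a single overview is left.
    while len(summaries) > 1:
        summaries = [
            summarise("\n".join(summaries[i:i + fan_in]))
            for i in range(0, len(summaries), fan_in)
        ]
    return summaries[0]
```

Because every chunk passes through the map step, content near the end of a large site contributes to the overview, which is the difference from truncating the input to fit a single prompt.
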
The summary is generated every time. The RAG mode only decides what gets stored in the retrieval index — the summary, the full text, or both.
The RAG mode controls which representation of the crawled pages lands in the knowledge index used for retrieval:
| Mode | What's indexed | When to use it |
|---|---|---|
| Summary (default) `summary` | Only the AI summary of the source (the smallest index). | Best when you want the gist of a site for grounding without indexing every page. |
| Summary + full `summaryPlusFull` | Both the AI summary and the full page text. | Best when you want high-level grounding plus the ability to retrieve exact passages. |
| Full `full` | Only the full page text, with no summary document. | Best when you need precise, passage-level retrieval over everything crawled. |
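
To make the table concrete, here is a small sketch of how the mode could select what lands in the index. Only the mode values `summary`, `summaryPlusFull`, and `full` come from the table; the `IndexDoc` type and `documents_to_index` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List

# Mode values as listed in the table above.
MODES = ("summary", "summaryPlusFull", "full")


@dataclass
class IndexDoc:
    kind: str  # "summary" or "page" (hypothetical labels)
    text: str


def documents_to_index(mode: str, summary: str, pages: List[str]) -> List[IndexDoc]:
    """Return the documents a given RAG mode would store for retrieval."""
    if mode not in MODES:
        raise ValueError(f"unknown RAG mode: {mode!r}")

    docs: List[IndexDoc] = []
    # The summary always exists; the mode decides whether it is indexed.
    if mode in ("summary", "summaryPlusFull"):
        docs.append(IndexDoc("summary", summary))
    if mode in ("summaryPlusFull", "full"):
        docs.extend(IndexDoc("page", page) for page in pages)
    return docs
```

With two crawled pages, `summary` yields one document, `summaryPlusFull` yields three, and `full` yields two.
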
Crawling is bounded by three caps you set per source. They're validated on save, and the source stores them so you can see the limits it was last crawled with:
| Cap | What it controls |
|---|---|
| Max pages | The most pages a single ingest or refresh will fetch from the site. |
| Max bytes per page | How much raw text is indexed per page when full text is used. |
| Crawl depth | How many link hops to follow from the registered URL. Depth 0 means don't crawl — index only the page you registered. |
Set crawl depth to 0 to index just the page you registered without following any links — handy for a single documentation page or article.
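
The way the three caps interact can be sketched with a breadth-first crawl over an in-memory site. Everything here is an assumption for illustration: the `CrawlCaps` and `crawl` names, the validation bounds, and the choice to trim text as each page is fetched. The documentation above only states that the caps exist and are validated on save.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical in-memory site: url -> (page text, outgoing links).
Site = Dict[str, Tuple[str, List[str]]]


@dataclass(frozen=True)
class CrawlCaps:
    max_pages: int
    max_bytes_per_page: int
    crawl_depth: int

    def validate(self) -> None:
        # Assumed bounds; the real validation rules are not documented here.
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        if self.max_bytes_per_page < 1:
            raise ValueError("max_bytes_per_page must be at least 1")
        if self.crawl_depth < 0:
            raise ValueError("crawl_depth must be 0 or greater")


def crawl(site: Site, start_url: str, caps: CrawlCaps) -> Dict[str, str]:
    """Breadth-first crawl from start_url, bounded by the three caps."""
    caps.validate()
    seen = {start_url}
    queue = deque([(start_url, 0)])
    fetched: Dict[str, str] = {}

    while queue and len(fetched) < caps.max_pages:
        url, depth = queue.popleft()
        if url not in site:
            continue  # unreachable page; skip it
        text, links = site[url]
        # Keep at most max_bytes_per_page bytes of text for this page.
        raw = text.encode("utf-8")[: caps.max_bytes_per_page]
        fetched[url] = raw.decode("utf-8", errors="ignore")
        # Follow links only while still within the allowed number of hops.
        if depth < caps.crawl_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return fetched
```

With `crawl_depth=0` the loop fetches the registered URL and never enqueues its links, which matches the single-page behaviour described above.
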
Once indexed, a website source behaves like any other knowledge:
#, the same way you attach a knowledge folder or document.
Website sources share the Knowledge retrieval pipeline end to end. For document uploads, embedding models, and folder sharing, see Knowledge Base.