Glean-Bot

01 /Overview

Glean-Bot indexes web pages for an AI-powered search product operated by Glean. Unlike a pure training crawler, AI search crawlers are designed to drive users back to the original source via citations and links.

The crawl pattern looks similar to a traditional search engine: regular, broad, and bounded by your robots.txt directives. The difference is that ranking is done by an LLM, not a classic ranking algorithm.

Allowing Glean-Bot is generally how your site stays discoverable inside AI answer engines. The traffic it sends back is small but high-intent: users who clicked a citation usually wanted exactly what you wrote.


02 /Identification

Match the User-Agent header on incoming requests against the pattern below.

regex

Glean-Bot

For higher confidence, also verify the source IP against the operator's published ranges where they are available. UA strings can be spoofed; IP ownership is harder to fake.
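The User-Agent match described above can be sketched in a few lines of Python. This is a minimal illustration, not an official detection routine: the case-insensitive flag, the function name `claims_to_be_glean_bot`, and the sample header values are all assumptions made for the example.

```python
import re

# Pattern from the Identification section. Case-insensitive matching is an
# assumption; the source only lists the literal token "Glean-Bot".
GLEAN_BOT_UA = re.compile(r"Glean-Bot", re.IGNORECASE)


def claims_to_be_glean_bot(user_agent):
    """Return True if the User-Agent header contains the Glean-Bot token.

    This only shows what the request claims to be; the header can be spoofed.
    """
    return bool(user_agent) and GLEAN_BOT_UA.search(user_agent) is not None


# Hypothetical header values, for illustration only.
print(claims_to_be_glean_bot("Mozilla/5.0 (compatible; Glean-Bot/1.0)"))
print(claims_to_be_glean_bot("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))
```

A match here is best treated as a label for logging and routing, not as proof of origin.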

03 /Control

The polite way is a robots.txt rule. Compliant agents respect it; the others ignore it.

robots.txt

User-agent: Glean-Bot
Disallow: /
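To check what a rule like this does before deploying it, Python's standard `urllib.robotparser` can evaluate it locally. The sketch below parses the rule from a string instead of fetching it; the helper name `is_allowed`, the `example.com` URLs, and the `OtherBot` agent are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from this section, one directive per line.
ROBOTS_TXT = """\
User-agent: Glean-Bot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Evaluate robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and OtherBot are placeholders for illustration.
print(is_allowed(ROBOTS_TXT, "Glean-Bot", "https://example.com/docs/"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/docs/"))
```

With this rule, the first check reports that Glean-Bot is disallowed, while an agent with no matching group stays allowed by default.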


04 /Technical fingerprint

Renders JavaScript: not specified

IP verification: User-Agent only

Crawl frequency: Continuous

Honors robots.txt: Yes

Honors Crawl-delay: Yes

05 /Expected behavior

Expect regular, predictable indexing similar to a traditional search engine. Glean-Bot respects robots.txt, including Crawl-delay, and is the channel through which your content becomes citable inside AI answers.

Common questions

Should I let Glean-Bot through?

In most cases, yes. AI search crawlers cite and link back. Allowing is how your content becomes discoverable inside AI answers. If volume gets noisy, rate-limit it before you block it outright.
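Since the fingerprint above lists Crawl-delay as honored, one low-effort way to rate-limit before blocking is a `Crawl-delay` line in robots.txt. The sketch below uses Python's standard `urllib.robotparser` to confirm how such a rule parses. The 10-second value and the helper name `crawl_delay_for` are assumptions for illustration, and Crawl-delay is a non-standard extension that other crawlers may ignore.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical throttling rule: allow everything, but request a 10-second gap
# between fetches. The value 10 is an arbitrary example, not a recommendation.
ROBOTS_TXT = """\
User-agent: Glean-Bot
Crawl-delay: 10
Allow: /
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(ROBOTS_TXT, "Glean-Bot"))
print(crawl_delay_for(ROBOTS_TXT, "OtherBot"))
```

Agents without a matching group get no delay from this file, so the rule throttles Glean-Bot without affecting other crawlers.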

Does blocking Glean-Bot affect my Google rankings?

No. Glean-Bot feeds Glean's AI answer engine, which is a separate distribution channel from classical search. Blocking it removes you from citations inside Glean's product, but Google and Bing keep ranking you the same.

How do I confirm a request is really from Glean-Bot?

Check the User-Agent header in your access logs against the pattern in the Identification section. Keep in mind that the User-Agent is easy to fake, so this check tells you "the traffic claims to be Glean-Bot", not "the traffic is genuinely Glean-Bot". If you need stronger guarantees, add a reverse-DNS check or wait for Glean to publish IP ranges.
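A reverse-DNS check usually means forward-confirmed reverse DNS: resolve the IP to a hostname, confirm the hostname belongs to a trusted domain, then resolve that hostname back and confirm it returns the same IP. The sketch below shows the shape of that check only. The suffix `.crawler.example.com` is a made-up placeholder, since no verified Glean-Bot hostnames are given here, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import socket

# ASSUMPTION: placeholder suffix. No verified Glean-Bot hostnames are listed
# in this entry, so replace this only with a value the operator documents.
TRUSTED_SUFFIXES = (".crawler.example.com",)


def _reverse_lookup(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    """A-record lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                        reverse=_reverse_lookup, forward=_forward_lookup):
    """Forward-confirmed reverse DNS: PTR must match a trusted suffix and
    the hostname must resolve back to the same IP."""
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not hostname.endswith(suffixes):
            return False
        return ip in forward(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False


# Offline demonstration with fake resolvers and a documentation-range IP.
fake_reverse = lambda ip: "bot-1.crawler.example.com"
fake_forward = lambda host: ["203.0.113.7"]
print(is_verified_crawler("203.0.113.7", reverse=fake_reverse, forward=fake_forward))
```

The default forward lookup handles IPv4 only; an IPv6-aware version would need `socket.getaddrinfo` instead.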

How is Glean-Bot different from Googlebot?

Both crawl the web, but they feed completely different surfaces. Googlebot powers Google Search, where you compete for ten blue links. Glean-Bot powers Glean's AI answer engine, where you compete for one of a handful of citations in a written-out paragraph. The crawl mechanics are similar; the consumption pattern is not.

What's the cleanest way to control Glean-Bot?

Two layers. Robots.txt for the polite crawlers that read it, and rules at your CDN or edge for the ones that don't. Rankly's Agent Experience handles both from a single config, so you can allow, block, rate-limit, or serve a stripped-down version per bot. Agent Analytics handles the observation half so you know which bots are actually worth a rule.
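The second layer, an edge rule, can be as simple as a per-bot token bucket that returns 429 once a budget is used up. The sketch below is a generic illustration of that idea and not any vendor's API: the class `TokenBucket`, the function `decide`, and the rate and capacity numbers are all hypothetical.

```python
import re
import time

GLEAN_BOT_UA = re.compile(r"Glean-Bot", re.IGNORECASE)


class TokenBucket:
    """Refills at rate_per_sec tokens per second, up to capacity."""

    def __init__(self, rate_per_sec, capacity, now=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self._now = now
        self._last = now()

    def allow(self):
        """Consume one token if available; return whether the request passes."""
        current = self._now()
        elapsed = current - self._last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self._last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def decide(user_agent, bucket):
    """Return 200 to pass the request through, or 429 to throttle it.

    Only requests whose User-Agent matches Glean-Bot are rate-limited.
    """
    if not GLEAN_BOT_UA.search(user_agent or ""):
        return 200
    return 200 if bucket.allow() else 429


# Hypothetical budget: 1 request per second with a burst of 2.
bucket = TokenBucket(rate_per_sec=1, capacity=2)
print([decide("Glean-Bot/1.0", bucket) for _ in range(3)])
```

Throttling with 429 keeps the crawler able to index at a slower pace, which fits the advice to rate-limit before blocking outright.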
