LlamaIndex-Agent
LlamaIndex-Agent is a live-fetch agent operated by LlamaIndex. It does not crawl the web on a schedule. It hits your site only when an end-user asks the underlying AI a question that requires fresh information from a specific page.
Traffic is bursty and unpredictable. A single trending topic can send hundreds of LlamaIndex-Agent requests in an hour, then nothing for days. Each request typically reads one or two pages, not your whole site.
Allowing LlamaIndex-Agent is how your content becomes part of LlamaIndex's answers. Blocking it means users asking that AI about your topic will be answered using someone else's content instead.
See LlamaIndex-Agent on your own site
Match the User-Agent header on incoming requests against the pattern below.
regex
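For higher confidence, also verify the source IP against the operator's published ranges where they exist. User-Agent strings can be spoofed; IP ownership is harder to fake. For LlamaIndex-Agent, User-Agent matching is currently the only verification method listed.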
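As a rough illustration of the User-Agent check, here is a minimal Python sketch. It assumes the bot identifies itself with the token "LlamaIndex-Agent" somewhere in the header; the exact published pattern is not reproduced here, and the names `LLAMAINDEX_AGENT_RE` and `claims_llamaindex_agent` are hypothetical.

```python
import re
from typing import Optional

# Assumption: the User-Agent contains the token "LlamaIndex-Agent".
# The operator's exact pattern may differ; adjust the regex if so.
LLAMAINDEX_AGENT_RE = re.compile(r"LlamaIndex-Agent", re.IGNORECASE)


def claims_llamaindex_agent(user_agent: Optional[str]) -> bool:
    """Return True if the header claims to be LlamaIndex-Agent.

    This only checks what the request says about itself. A spoofed
    User-Agent passes this check too.
    """
    if not user_agent:
        return False
    return LLAMAINDEX_AGENT_RE.search(user_agent) is not None
```

A match means the request claims to be the bot, nothing more. Treat it as a label for routing and analytics, not as proof of origin.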
Renders JavaScript: Sometimes
IP verification: User-Agent only
Crawl frequency: Burst, user-driven
Honors robots.txt: Yes
Honors Crawl-delay: Varies
Should I let LlamaIndex-Agent through?
In most cases, yes. Live-fetch agents drive citations inside AI answers, so allowing LlamaIndex-Agent keeps your content in the conversation. If volume gets noisy, rate-limit it before you block it outright.
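One common way to rate-limit bursty traffic is a token bucket, which lets short bursts through while capping the sustained rate. The sketch below is illustrative only: the `TokenBucket` class and its `rate` and `capacity` parameters are hypothetical, and in practice this logic usually lives in your CDN or edge configuration rather than in application code.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A request that is refused would typically get an HTTP 429 response. Whether the bot backs off on 429 is not stated here, so watch your logs after enabling a limit.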
Does blocking LlamaIndex-Agent affect my Google rankings?
No. LlamaIndex-Agent fetches a page only when a user is actively asking LlamaIndex a question. It has nothing to do with how Google or Bing rank you. The cost of blocking is that LlamaIndex can't quote your content in its answer.
How do I confirm a request is really from LlamaIndex-Agent?
Check the User-Agent header in your access logs and match it against the pattern listed above. The User-Agent is easy to fake, so this check tells you "the traffic claims to be LlamaIndex-Agent", not "the traffic is genuinely LlamaIndex-Agent". If you need stronger guarantees, use a reverse-DNS check if the operator supports one, or wait for LlamaIndex to publish IP ranges.
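To see which source IPs are making the claim, you can scan your access logs. The sketch below assumes logs in the common "combined" format used by Apache and nginx, and assumes the User-Agent contains the token "LlamaIndex-Agent". The function name `claimed_hits_by_ip` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Assumption: this token appears in the bot's User-Agent string.
CLAIMED_TOKEN = "llamaindex-agent"


def claimed_hits_by_ip(lines: Iterable[str]) -> Counter:
    """Count requests per source IP whose User-Agent claims to be the bot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE_RE.match(line)
        if match and CLAIMED_TOKEN in match.group("ua").lower():
            hits[match.group("ip")] += 1
    return hits
```

A handful of IPs producing most of the claimed traffic is worth a closer look, but the counts alone do not prove or disprove that the requests are genuine.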
Does a LlamaIndex-Agent visit count as a real user visit?
Sort of. There is a human asking LlamaIndex a question on the other end, but they never load your page in their own browser. They see whatever LlamaIndex quotes back, usually a snippet plus a citation link. Count it as upstream attention rather than as a session.
What's the cleanest way to control LlamaIndex-Agent?
Two layers: robots.txt for the polite crawlers that read it, and rules at your CDN or edge for the ones that don't. Rankly's Agent Experience handles both from a single config, so you can allow, block, rate-limit, or serve a stripped-down version per bot. Agent Analytics handles the observation half so you know which bots are actually worth a rule.
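For the robots.txt layer, it helps to test a rule set before deploying it. The sketch below uses Python's standard `urllib.robotparser` to check what a sample robots.txt would permit. The rules, the `/private/` path, and the helper `is_allowed` are hypothetical, and the sketch assumes the bot matches the robots.txt token "LlamaIndex-Agent".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: keep the bot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: LlamaIndex-Agent
Disallow: /private/
Allow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Python's parser applies the first matching rule within a group, so the `Disallow` line is placed before the broader `Allow`. Other parsers may resolve conflicts differently, for example by longest match, so check your rules against the behavior you expect from each bot.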