Google-Extended
AI Training
Verify Google-Extended IP Address
Verify whether an IP address truly belongs to Google, using official verification methods. Enter both the IP address and the User-Agent from your logs for the most accurate bot verification.
Google-Extended is a special user-agent that allows website owners to control whether their publicly accessible content can be used to train and improve Google's AI models, including products like Gemini. It does not crawl the web itself; instead, it serves as a policy signal interpreted by Google's AI systems. Site owners can allow or block AI training access by configuring robots.txt rules for Google-Extended. Blocking this agent does not affect Google Search ranking, crawling, or indexing. Its purpose is purely governance: giving publishers a transparent way to manage how their content contributes to Google's AI research and model development.
User Agent Examples
Google does not document any specific user-agent string that Google-Extended sends in HTTP requests.
Robots.txt Configuration for Google-Extended
Google-Extended
Use this identifier in your robots.txt User-agent directive to target Google-Extended.
Recommended Configuration
Our recommended robots.txt configuration for Google-Extended:
User-agent: Google-Extended
Allow: /
Completely Block Google-Extended
Prevent this bot from crawling your entire site:
User-agent: Google-Extended
Disallow: /
Completely Allow Google-Extended
Allow this bot to crawl your entire site:
User-agent: Google-Extended
Allow: /
Block Specific Paths
Block this bot from specific directories or pages:
User-agent: Google-Extended
Disallow: /private/
Disallow: /admin/
Disallow: /api/
Allow Only Specific Paths
Block everything but allow specific directories:
User-agent: Google-Extended
Disallow: /
Allow: /public/
Allow: /blog/
Set Crawl Delay
Limit how frequently Google-Extended can request pages (in seconds):
User-agent: Google-Extended
Allow: /
Crawl-delay: 10
Note: Google does not officially document support for the Crawl-delay directive for this agent.
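The rules above can be sanity-checked locally with Python's standard `urllib.robotparser`. This is only a rough check: Python's parser applies first-match semantics, whereas Google applies the most specific matching rule, so edge cases involving mixed Allow/Disallow paths may differ.

```python
from urllib import robotparser

def allows_google_extended(robots_txt: str, url: str) -> bool:
    """Rough check of whether a robots.txt body permits Google-Extended
    to use a given URL. Python's parser uses first-match semantics,
    while Google uses the most specific rule, so treat this as a sketch."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Google-Extended", url)

block_all = "User-agent: Google-Extended\nDisallow: /\n"
allow_all = "User-agent: Google-Extended\nAllow: /\n"

print(allows_google_extended(block_all, "https://example.com/page"))  # False
print(allows_google_extended(allow_all, "https://example.com/page"))  # True
```

Because Google-Extended is only a robots.txt policy signal, this kind of local check is the entire feedback loop; there is no crawler behavior to observe afterwards.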
Frequently Asked Questions
- What is Google-Extended?
- Google-Extended is a control-oriented user-agent defined by Google that allows website owners to manage whether their content can be used for AI model training. It does not function as a traditional crawler and does not actively send requests to websites. Instead, it acts as a policy signal interpreted when Google systems process robots.txt rules. You will not typically see Google-Extended generating traffic in website logs.
- Is Google-Extended a legitimate bot, or is it commonly spoofed?
- Google-Extended is an official Google-defined user-agent, but it is not a crawling bot and therefore not typically seen in server requests. As a result, spoofing is less relevant compared to active crawlers, though malicious actors could still misuse the name in headers. Since it does not generate traffic, any requests claiming to be Google-Extended should be treated with skepticism. As always, user-agent strings alone cannot verify authenticity.
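Since the legitimate agent sends no requests, any appearance of the name in access logs can be flagged directly. A minimal sketch (the log lines below are hypothetical examples in combined log format):

```python
# Flag any access-log line whose user-agent claims to be Google-Extended.
# Because the real agent never sends requests, every match is suspect.
sample_log = [  # hypothetical combined-log-format lines
    '203.0.113.5 - - [10/Jan/2025:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Jan/2025:12:00:01 +0000] "GET /a HTTP/1.1" 200 128 "-" "Google-Extended"',
]

suspect = [line for line in sample_log if "google-extended" in line.lower()]
for line in suspect:
    print("suspect IP:", line.split()[0])
```

In practice you would stream your real access log through the same filter and treat each matching source IP as a candidate for blocking.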
- How can I verify that a request is really coming from Google-Extended?
- There is generally nothing to verify because Google-Extended does not make HTTP requests to websites. If you see traffic using this user-agent in logs, it is likely not legitimate. Standard verification methods (reverse DNS, forward DNS, IP range checks) apply only to actual Google crawlers. In this case, user-agent-based detection is not applicable because the agent functions as a robots.txt policy identifier, not a network client.
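For actual Google crawlers such as Googlebot, the reverse-then-forward DNS check mentioned above looks roughly like this sketch. It does not apply to Google-Extended, which has no IPs of its own; the hostname suffixes checked here are the ones Google documents for its real crawlers.

```python
import socket

def verify_google_crawler(ip: str) -> bool:
    """Reverse-then-forward DNS check for real Google crawlers.
    Google-Extended itself never connects, so there is no IP to verify."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False
    # Google's documented crawler hostnames end in googlebot.com or google.com.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

A loopback address such as 127.0.0.1 fails the hostname check (or the reverse lookup entirely), so the function returns False for it, as it does for any spoofed source.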
- Should I allow or block Google-Extended on my website?
- Allowing or blocking Google-Extended is a policy decision rather than a traffic-control decision. It determines whether your publicly accessible content may be used for Google's AI model training. Blocking may be appropriate if:
  - You do not want your content used in AI training datasets
  - Your content is proprietary or sensitive
  - You require stricter content usage controls
  Allowing it enables participation in Google's AI ecosystem but is entirely optional.
- How can I control or block Google-Extended using robots.txt or other methods?
- Google-Extended is controlled exclusively via robots.txt directives. You can add a rule to your robots.txt, as shown above, to signal that your content should not be used for AI training. No WAF, rate limiting, or IP blocking is needed, since it does not generate server requests.
- How often does Google-Extended crawl websites, and can it impact server performance?
- Google-Extended does not crawl websites and does not generate HTTP requests. It has no crawl frequency, bandwidth usage, or request rate. As a result, it has zero impact on server performance. Any perceived traffic under this name is not from the legitimate Google-Extended agent.
- What happens if I block Google-Extended? SEO, visibility, and feature impact explained.
- Blocking Google-Extended does not affect search indexing, rankings, or standard crawling by Google Search. Its impact is limited to AI-related usage: your content will not be used for training or improving Google AI models. This is strictly a content usage control, not an SEO setting.
- Does Google-Extended collect, scrape, or use my content for training or reuse?
- Google-Extended itself does not collect or scrape content. Instead, it defines whether Google systems are permitted to use publicly accessible content for AI training and model improvement. If allowed, content may be included in training datasets; if blocked, it is excluded. It does not store or process content independently, and it does not function as a crawler or indexing system.