Every website needs a robots.txt file — yet it remains one of the most misunderstood files in SEO. Get it wrong and you could accidentally block Googlebot from crawling your entire site. Get it right and you can guide crawlers efficiently, protect sensitive areas, reduce server load, and even block AI training bots from scraping your content.


What Is a robots.txt File?

A robots.txt file is a plain text file located at the root of your domain — always at yourdomain.com/robots.txt. It communicates with web crawlers (robots) using the Robots Exclusion Protocol, telling them which pages or directories they should or should not access.

Here is the simplest valid robots.txt:

User-agent: *
Disallow:

This allows all bots to crawl everything on your site. The User-agent: * applies to all bots, and Disallow: with nothing after it means no paths are disallowed.
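You can check this behavior locally with Python's standard-library robots.txt parser (a quick sketch; the rules string mirrors the minimal file above):

```python
from urllib.robotparser import RobotFileParser

# The simplest valid robots.txt: applies to all bots, disallows nothing
rules = """User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An empty Disallow means every path is crawlable for every bot
print(parser.can_fetch("Googlebot", "/any/page.html"))  # True
print(parser.can_fetch("AnyOtherBot", "/"))             # True
```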

robots.txt Syntax Rules
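In brief: directives take the form Field: value, one per line; # starts a comment; a record is one or more User-agent lines followed by its Allow and Disallow rules; paths are case-sensitive and relative to the site root; and major engines additionally support * wildcards and $ end-of-URL anchors. An annotated sketch (the example paths are illustrative):

```
# A record: User-agent line(s) followed by rules
User-agent: *            # rules below apply to all bots
Disallow: /private/      # paths are case-sensitive, relative to the root
Allow: /private/public/  # Allow can re-open a subpath of a Disallow
Disallow: /*.pdf$        # * = any characters, $ = end of URL (major engines)

Sitemap: https://example.com/sitemap.xml  # absolute URL, outside any record
```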

Common robots.txt Example

# Allow all search engine bots
User-agent: *
Allow: /
# Block admin and private areas
Disallow: /admin/
Disallow: /private/
Disallow: /wp-admin/

# Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml

Step-by-Step: Create Your robots.txt

1. Open the Robots.txt Generator. Go to webtoolsz.com/robots-txt-generator. The tool starts with a default User-agent block for all bots.

2. Add your rules. Click "Add User-agent Block" for each bot group. For each block, add Disallow or Allow rules for specific paths like /admin/, /checkout/, or /private/.

3. Set your sitemap URL. Paste your sitemap URL in the Sitemap field (e.g. https://yourdomain.com/sitemap.xml). This helps Google discover all your pages.

4. Download and upload. Click "Download" to save the file as robots.txt. Upload it to the root of your web server (the same folder as your index.html or index.php).

Pro Tip: After uploading, verify it is accessible at yourdomain.com/robots.txt in your browser. Then submit it in Google Search Console under Settings → robots.txt to confirm Google can read it.
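Before uploading, you can run a quick local sanity check on the file. The sketch below is not a full validator, just a check that every non-comment line is a recognized "Field: value" directive (the field list and sample strings are illustrative):

```python
import re

# Directive fields recognized by this sketch (not exhaustive)
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def check_robots_txt(text: str) -> list:
    """Return a list of (line_number, message) warnings."""
    warnings = []
    for i, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # blank or comment-only line is fine
        m = re.match(r"([A-Za-z-]+)\s*:\s*(.*)", line)
        if not m:
            warnings.append((i, "not a 'Field: value' directive"))
        elif m.group(1).lower() not in KNOWN_FIELDS:
            warnings.append((i, f"unknown field '{m.group(1)}'"))
    return warnings

good = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n"
bad = "User-agent *\nDissalow: /admin/\n"
print(check_robots_txt(good))  # no warnings
print(check_robots_txt(bad))   # missing colon + misspelled field
```

A typo like "Dissalow" fails silently in production (bots simply ignore unknown fields), so catching it before upload is worth the ten lines.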

Blocking AI Training Bots in robots.txt

Since 2023, several AI companies have released named bots that you can block to prevent your content from being used in AI training datasets. Add separate User-agent blocks for each:

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

Note that these companies state that their bots respect robots.txt, but compliance is voluntary: robots.txt is a request, not an access control, and it cannot be technically enforced.
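You can confirm locally that blocks like these affect only the named bots. The sketch below (using Python's standard-library parser) mirrors the AI-bot blocks above plus an allow-all default; note that Google-Extended is an AI-training control token, not the search crawler, so Googlebot is unaffected:

```python
from urllib.robotparser import RobotFileParser

# AI crawlers blocked, everything else allowed
rules = """User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("GPTBot", "/article.html"))     # False: blocked
print(parser.can_fetch("Googlebot", "/article.html"))  # True: unaffected
```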

Generate Your robots.txt — Free

Visual builder. Block specific bots, set crawl-delay, add sitemap URL. Download in one click.

Open Robots.txt Generator

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells web crawlers which pages or sections they are allowed or not allowed to crawl. It uses the Robots Exclusion Protocol, a long-standing convention that was formalized as RFC 9309 in 2022 and is followed by all major search engines.

Does robots.txt block pages from Google's index?

No. Disallowing a URL in robots.txt stops Googlebot from crawling that URL, but it does not remove an already-indexed page from search results. To remove a page from Google's index, use a noindex meta tag (or an X-Robots-Tag response header) or the Removals tool in Google Search Console.
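The standard noindex tag goes in the page's head. Note how the two mechanisms interact: Googlebot must be able to crawl a page to see the tag, so a URL that is disallowed in robots.txt cannot be reliably noindexed.

```html
<meta name="robots" content="noindex">
```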

What is Crawl-delay in robots.txt?

Crawl-delay tells a bot how many seconds to wait between successive requests to your server. It reduces server load from aggressive crawlers. Note: Googlebot ignores Crawl-delay entirely and adjusts its crawl rate automatically, but bots such as Bingbot and Yandex do honor the directive.
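For a bot that honors it, such as Bingbot, the directive sits inside that bot's record, with the value in whole seconds:

```
User-agent: Bingbot
Crawl-delay: 10
```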

Can I block AI training bots with robots.txt?

Yes. Several AI companies have named bots you can block: GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google AI), and CCBot (Common Crawl). Add a User-agent block for each with Disallow: / to opt out of AI training data collection.
