Every website needs a robots.txt file — yet it remains one of the most misunderstood files in SEO. Get it wrong and you could accidentally block Googlebot from crawling your entire site. Get it right and you can guide crawlers efficiently, protect sensitive areas, reduce server load, and even block AI training bots from scraping your content.


What Is a robots.txt File?

A robots.txt file is a plain text file located at the root of your domain — always at yourdomain.com/robots.txt. It communicates with web crawlers (robots) using the Robots Exclusion Protocol, telling them which pages or directories they should or should not access.

Here is the simplest valid robots.txt:

User-agent: *
Disallow:

This allows all bots to crawl everything on your site. The User-agent: * applies to all bots, and Disallow: with nothing after it means no paths are disallowed.
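You can check this behavior locally with Python's standard-library robots.txt parser (a quick sketch; the rules string mirrors the minimal file above):

```python
from urllib.robotparser import RobotFileParser

# The simplest valid robots.txt: applies to all bots, disallows nothing
rules = """User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An empty Disallow means every path is crawlable for every bot
print(parser.can_fetch("Googlebot", "/any/page.html"))  # True
print(parser.can_fetch("AnyOtherBot", "/"))             # True
```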

robots.txt Syntax Rules
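In brief: directives take the form Field: value, one per line; # starts a comment; a record is one or more User-agent lines followed by its Allow and Disallow rules; paths are case-sensitive and relative to the site root; and major engines additionally support * wildcards and $ end-of-URL anchors. An annotated sketch (the example paths are illustrative):

```
# A record: User-agent line(s) followed by rules
User-agent: *            # rules below apply to all bots
Disallow: /private/      # paths are case-sensitive, relative to the root
Allow: /private/public/  # Allow can re-open a subpath of a Disallow
Disallow: /*.pdf$        # * = any characters, $ = end of URL (major engines)

Sitemap: https://example.com/sitemap.xml  # absolute URL, outside any record
```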

Common robots.txt Example

# Allow all search engine bots
User-agent: *
Allow: /
# Block admin and private areas
Disallow: /admin/
Disallow: /private/
Disallow: /wp-admin/

# Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml

Step-by-Step: Create Your robots.txt

1. Open the Robots.txt Generator. Go to webtoolsz.com/robots-txt-generator. The tool starts with a default User-agent block for all bots.

2. Add your rules. Click "Add User-agent Block" for each bot group. For each block, add Disallow or Allow rules for specific paths like /admin/, /checkout/, or /private/.

3. Set your sitemap URL. Paste your sitemap URL in the Sitemap field (e.g. https://yourdomain.com/sitemap.xml). This helps Google discover all your pages.

4. Download and upload. Click "Download" to save the file as robots.txt. Upload it to the root of your web server (the same folder as your index.html or index.php).

Pro Tip: After uploading, verify it is accessible at yourdomain.com/robots.txt in your browser. Then submit it in Google Search Console under Settings → robots.txt to confirm Google can read it.
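Before uploading, you can run a quick local sanity check on the file. The sketch below is not a full validator, just a check that every non-comment line is a recognized "Field: value" directive (the field list and sample strings are illustrative):

```python
import re

# Directive fields recognized by this sketch (not exhaustive)
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def check_robots_txt(text: str) -> list:
    """Return a list of (line_number, message) warnings."""
    warnings = []
    for i, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # blank or comment-only line is fine
        m = re.match(r"([A-Za-z-]+)\s*:\s*(.*)", line)
        if not m:
            warnings.append((i, "not a 'Field: value' directive"))
        elif m.group(1).lower() not in KNOWN_FIELDS:
            warnings.append((i, f"unknown field '{m.group(1)}'"))
    return warnings

good = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n"
bad = "User-agent *\nDissalow: /admin/\n"
print(check_robots_txt(good))  # no warnings
print(check_robots_txt(bad))   # missing colon + misspelled field
```

A typo like "Dissalow" fails silently in production (bots simply ignore unknown fields), so catching it before upload is worth the ten lines.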

Blocking AI Training Bots in robots.txt

Since 2023, several AI companies have released named bots that you can block to prevent your content from being used in AI training datasets. Add separate User-agent blocks for each:

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

Note that these companies state that their bots respect robots.txt, but compliance is voluntary: robots.txt is a request, not an access control, and it cannot be technically enforced.
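You can confirm locally that blocks like these affect only the named bots. The sketch below (using Python's standard-library parser) mirrors the AI-bot blocks above plus an allow-all default; note that Google-Extended is an AI-training control token, not the search crawler, so Googlebot is unaffected:

```python
from urllib.robotparser import RobotFileParser

# AI crawlers blocked, everything else allowed
rules = """User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("GPTBot", "/article.html"))     # False: blocked
print(parser.can_fetch("Googlebot", "/article.html"))  # True: unaffected
```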

Generate Your robots.txt — Free

Visual builder. Block specific bots, set crawl-delay, add sitemap URL. Download in one click.

Open Robots.txt Generator

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells web crawlers which pages or sections they are allowed or not allowed to crawl. It uses the Robots Exclusion Protocol, a long-standing convention that was formalized as RFC 9309 in 2022 and is followed by all major search engines.

Does robots.txt block pages from Google's index?

No. Disallowing a URL in robots.txt stops Googlebot from crawling that URL, but it does not remove an already-indexed page from search results. To remove a page from Google's index, use a noindex meta tag (or an X-Robots-Tag response header) or the Removals tool in Google Search Console.
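The standard noindex tag goes in the page's head. Note how the two mechanisms interact: Googlebot must be able to crawl a page to see the tag, so a URL that is disallowed in robots.txt cannot be reliably noindexed.

```html
<meta name="robots" content="noindex">
```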

What is Crawl-delay in robots.txt?

Crawl-delay tells a bot how many seconds to wait between successive requests to your server. It reduces server load from aggressive crawlers. Note: Googlebot ignores Crawl-delay entirely and adjusts its crawl rate automatically, but bots such as Bingbot and Yandex do honor the directive.
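For a bot that honors it, such as Bingbot, the directive sits inside that bot's record, with the value in whole seconds:

```
User-agent: Bingbot
Crawl-delay: 10
```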

Can I block AI training bots with robots.txt?

Yes. Several AI companies have named bots you can block: GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google AI), and CCBot (Common Crawl). Add a User-agent block for each with Disallow: / to opt out of AI training data collection.
