How to Configure WordPress robots.txt: What to Allow and What to Block

The robots.txt file tells search engine crawlers and other bots which parts of your WordPress site they are allowed to access. A correctly configured robots.txt keeps crawlers focused on content you want indexed, prevents them from wasting crawl budget on admin pages and duplicate URLs, and gives you a place to block specific bots you do not want crawling your site. A misconfigured robots.txt can accidentally block Google from indexing your entire site.

How WordPress Handles robots.txt

WordPress generates a virtual robots.txt file automatically if no physical robots.txt file exists in your site’s root directory. By default, this virtual file disallows /wp-admin/ but allows /wp-admin/admin-ajax.php, which some plugins and themes call from the front end. Everything else is allowed.

Yoast SEO adds a robots.txt editor under Yoast SEO, Tools, File Editor that lets you edit the robots.txt directly from WordPress admin without SFTP access. On WP Engine, you can also edit robots.txt directly via SFTP or SSH, or create a physical robots.txt file in your site root that overrides WordPress’s virtual one.

A Sensible WordPress robots.txt Configuration

Here is a well-configured robots.txt for a standard WordPress site:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /search/
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /wp-login.php

Sitemap: https://yourdomain.com/sitemap.xml

Breaking this down:

/wp-admin/: Keeps crawlers out of the admin area, while the Allow rule keeps admin-ajax.php reachable because some front-end functionality depends on it
/?s= and /search/: Search result pages that create duplicate content for crawlers
/cart/ and /checkout/: WooCommerce dynamic pages with no indexable content
/my-account/: Private user account pages
/wp-login.php: The login page has no indexable content and no reason to be crawled
Sitemap: Points crawlers to your XML sitemap; replace the URL with your sitemap’s actual location

Note that /wp-includes/ is deliberately not blocked: it contains scripts and styles that Google needs in order to render your pages.
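To sanity-check rules like these before deploying them, a small script can evaluate a few URLs against the file. The following is a minimal sketch, not a full robots.txt parser: it assumes plain path prefixes (no * or $ wildcards) and applies the "longest matching rule wins, Allow wins ties" precedence that Google documents. The names SAMPLE, parse_rules, and is_allowed are hypothetical, and SAMPLE is an abbreviated copy of the rules for illustration.

```python
# Abbreviated copy of the sample rules, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /cart/
Disallow: /wp-login.php
"""


def parse_rules(robots_txt, agent="*"):
    """Collect (directive, path) pairs from the group(s) naming `agent`."""
    rules, active, in_header = [], False, False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            matches = value.lower() == agent.lower()
            # Consecutive User-agent lines share one group of rules.
            active = (active or matches) if in_header else matches
            in_header = True
        else:
            in_header = False
            if active and field in ("allow", "disallow") and value:
                rules.append((field, value))
    return rules


def is_allowed(rules, path):
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if path.startswith(prefix):
            n = len(prefix)
            if n > best_len or (n == best_len and directive == "allow"):
                best_len, allowed = n, directive == "allow"
    return allowed


if __name__ == "__main__":
    rules = parse_rules(SAMPLE)
    for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php",
                 "/?s=hosting", "/blog/hello-world/"):
        print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Because Allow: /wp-admin/admin-ajax.php is longer than Disallow: /wp-admin/, the more specific rule wins and the AJAX endpoint stays reachable while the rest of the admin area is blocked.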

Common robots.txt Mistakes That Hurt SEO

Disallowing everything. The most catastrophic mistake: a Disallow: / rule under User-agent: * tells every compliant crawler to stay away from every page. This sometimes happens when a site is set to discourage search engines in WordPress Settings (Settings, Reading, Search Engine Visibility) during development and the setting is not reversed before launch. Always check robots.txt after launching a new site.
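A post-launch check for this mistake is easy to script. The sketch below assumes you already have the robots.txt contents as a string; the function name blocks_everything is hypothetical, and the check is deliberately simple: it only looks for a bare Disallow: / inside a User-agent: * group and does not account for Allow rules that might partially override it.

```python
def blocks_everything(robots_txt):
    """Return True if a `User-agent: *` group contains a bare `Disallow: /`."""
    applies_to_all, in_header = False, False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            matches = value == "*"
            # Consecutive User-agent lines belong to the same group.
            applies_to_all = (applies_to_all or matches) if in_header else matches
            in_header = True
        else:
            in_header = False
            if applies_to_all and field == "disallow" and value == "/":
                return True
    return False


if __name__ == "__main__":
    launch_blocker = "User-agent: *\nDisallow: /\n"
    print(blocks_everything(launch_blocker))
```

To run it against a live site, the text could be fetched first with urllib.request.urlopen from the Python standard library, using your own domain's /robots.txt URL.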

Blocking CSS and JavaScript. Google needs to render your pages to assess them properly. Blocking /wp-content/ or /wp-includes/ in robots.txt prevents Google from loading your site’s styles and scripts, so it may see a broken or unstyled page, which can hurt how your content is assessed and ranked. Do not block these directories.

Blocking the staging site incompletely. Your WP Engine staging URL (staging.wpengine.com) is set to noindex by default. If you have a custom staging domain, ensure it has a robots.txt that disallows all crawlers, or add a noindex header, to prevent duplicate content issues.

Using robots.txt to hide sensitive content. Robots.txt is publicly accessible. Listing a private URL in robots.txt does not hide it; it only asks compliant crawlers not to crawl it. Anyone can read your robots.txt and see the disallowed URLs. Use authentication or server-level access control for genuinely private content.

Allowing Specific AI Crawlers

AI search platforms (OpenAI’s GPTBot, Anthropic’s ClaudeBot, Perplexity’s PerplexityBot) crawl the web to train models and power AI search responses. You can control their access via robots.txt. To allow all AI crawlers (the default if you do not add specific rules):

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

To block a specific AI crawler:

User-agent: GPTBot
Disallow: /
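If you manage rules for several AI crawlers, generating the groups from two lists keeps the file consistent. This is a minimal sketch: the function name ai_crawler_rules is hypothetical, the user-agent tokens are the ones named above, and the output is meant to be appended to your existing robots.txt rather than to replace it.

```python
def ai_crawler_rules(allow=(), block=()):
    """Build one robots.txt group per bot: Allow: / or Disallow: /."""
    both = set(allow) & set(block)
    if both:
        raise ValueError(f"bot listed as both allowed and blocked: {sorted(both)}")
    groups = [f"User-agent: {bot}\nAllow: /" for bot in allow]
    groups += [f"User-agent: {bot}\nDisallow: /" for bot in block]
    # Groups are separated by a blank line, one directive per line.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(ai_crawler_rules(allow=["ClaudeBot", "PerplexityBot"], block=["GPTBot"]))
```

Each directive lands on its own line, which is the form crawlers expect.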

Whether to allow AI crawlers is a content strategy decision. Being crawled by AI search platforms can increase the likelihood that your content is cited in AI-generated search responses. For the broader AI search visibility topic, see Why WP Engine, where this is covered in the context of the site’s overall search strategy.

Frequently Asked Questions

How do I edit robots.txt on WP Engine?

The simplest method is via Yoast SEO’s file editor (Yoast SEO, Tools, File Editor) from WordPress admin. Alternatively, access your site via SFTP or SSH and edit the robots.txt file in the root directory directly. You can also create a physical robots.txt file and upload it to the site root, which overrides WordPress’s virtual robots.txt.

Does robots.txt affect Google rankings?

Indirectly. Crawlers cannot read the content of disallowed pages, so those pages are unlikely to rank, although a disallowed URL can still appear in results without a description if other pages link to it. Blocking CSS and JavaScript can prevent Google from rendering your pages properly, which may affect how it assesses content quality. A well-configured robots.txt that focuses crawl budget on valuable pages and keeps crawlers away from duplicate or low-value URLs can have a small positive effect on overall site crawl efficiency.

What is the difference between robots.txt disallow and noindex?

Disallow in robots.txt tells crawlers not to visit a URL. Noindex (set via a meta tag or HTTP header) tells crawlers they can visit the URL but should not include it in search results. For most purposes, noindex is more reliable for preventing indexing because disallow can cause URLs to appear in search results without description text if they are linked from other pages Google can see. Avoid combining the two on the same URL: a crawler that is disallowed from a page never fetches it, so it never sees the noindex directive.
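To confirm which pages actually carry a noindex directive, you can inspect both places it may be set. The sketch below uses only the Python standard library and assumes you already have a page's response headers (as a dict) and its HTML; the names has_noindex and _RobotsMetaParser are hypothetical. It is a simplification: it ignores bot-specific forms such as a googlebot meta tag or a user-agent prefix in the header value.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers, html):
    """True if noindex appears in the X-Robots-Tag header or a robots meta tag."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex({}, page))
```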

Fast, Indexable, Secure WordPress Hosting

A well-configured robots.txt combined with WP Engine’s managed security infrastructure gives your site the best foundation for healthy crawling and strong search performance.
