robots.txt
robots.txt is a plain text file placed at the root of a domain (yourdomain.com/robots.txt) that tells search engine crawlers which pages or sections of a site they may crawl. It is typically the first file a crawler requests when visiting a site. Common directives include "User-agent" (which crawlers a rule block applies to), "Disallow" (paths that should not be crawled), "Allow" (exceptions within a disallowed path), and "Sitemap" (the location of the XML sitemap).

The file is purely advisory: well-behaved crawlers honor it, but malicious crawlers can ignore it. It also controls crawling rather than indexing; a page blocked in robots.txt can still appear in search results if other sites link to it, so a noindex directive is the better tool for keeping a page out of the index. Common uses include keeping crawlers out of admin or login pages, conserving crawl budget on large sites, pointing crawlers to the XML sitemap, and blocking crawling of duplicate or low-value content.
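A minimal sketch shows how these directives fit together; the paths and sitemap URL here are hypothetical examples, and real rules depend on the site's structure:

User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /admin/public-page.html
Sitemap: https://yourdomain.com/sitemap.xml

In this sketch, the asterisk applies the block to all crawlers, the Disallow lines keep crawlers out of the /admin/ and /cart/ paths, the Allow line carves out a single exception within /admin/, and the Sitemap line points crawlers to the XML sitemap.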