{"id":2021,"date":"2024-11-19T08:52:13","date_gmt":"2024-11-19T08:52:13","guid":{"rendered":"https:\/\/www.forestsoftware.co.uk\/blog\/?p=2021"},"modified":"2024-11-18T17:09:43","modified_gmt":"2024-11-18T17:09:43","slug":"how-to-use-robots-txt-for-small-business-websites-a-comprehensive-guide","status":"publish","type":"post","link":"https:\/\/www.forestsoftware.co.uk\/blog\/2024\/11\/how-to-use-robots-txt-for-small-business-websites-a-comprehensive-guide\/","title":{"rendered":"How to Use Robots.txt for Small Business Websites: A Comprehensive Guide"},"content":{"rendered":"<span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 4<\/span> <span class=\"rt-label rt-postfix\">minutes : <\/span><\/span><h1>How to Use Robots.txt for Small Business Websites: A Comprehensive Guide<\/h1>\n<p>If you run a small business website, chances are you&#8217;re always looking for ways to optimise your online presence and boost your search engine rankings. One tool that often gets overlooked is the humble <strong>robots.txt file<\/strong>. It\u2019s simple, powerful, and can help you manage how search engines interact with your website.<\/p>\n<p>This blog post will explain what robots.txt is, why it\u2019s important, and how small business owners can use it effectively. By the end of this guide, you\u2019ll understand how to create and implement a robots.txt file that works for your specific needs, as well as some of the pitfalls to watch out for.<\/p>\n<p><!--more--><\/p>\n<h2><strong>What is Robots.txt?<\/strong><\/h2>\n<p>The robots.txt file is a simple text file placed in the root directory of your website (e.g., <code>www.yourwebsite.com\/robots.txt<\/code>). Its primary function is to provide instructions to web crawlers (also known as bots or spiders) on how to navigate your site. 
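<\/p>\n<p>For example, a very small site might get by with nothing more than this (a sketch \u2013 the paths and domain are placeholders):<\/p>\n<pre><code>User-agent: *\nDisallow: \/wp-admin\/\n\nSitemap: https:\/\/www.yourwebsite.com\/sitemap.xml\n<\/code><\/pre>\n<p>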
These crawlers are used by search engines like Google, Bing, and others to index your web pages for search results.<\/p>\n<p>The robots.txt file is part of the <strong><a href=\"https:\/\/developers.google.com\/search\/blog\/2019\/07\/rep-id\" target=\"_blank\" rel=\"noopener\">Robots Exclusion Protocol<\/a> (REP)<\/strong>, a set of web standards that help manage how bots interact with your site. While these instructions are not enforceable, most reputable crawlers, including Googlebot, will respect them.<\/p>\n<h2><strong>Why is Robots.txt Important?<\/strong><\/h2>\n<p>For small business websites, the robots.txt file offers several key benefits:<\/p>\n<ol>\n<li><strong>Control Over Crawling<\/strong><br \/>\nSearch engines may crawl and index parts of your site you don\u2019t want to appear in search results, such as admin pages, duplicate content, or temporary files. Robots.txt lets you block these areas from being crawled.<\/li>\n<li><strong>Optimising Crawl Budget<\/strong><br \/>\nSearch engines allocate a finite &#8220;crawl budget&#8221; for your site, which is the number of pages they\u2019ll crawl during a session. By preventing bots from wasting time on irrelevant pages, you ensure they focus on your important content.<\/li>\n<li><strong>Preventing Overload<\/strong><br \/>\nExcessive crawling by bots can strain your website server, particularly for smaller sites with limited hosting resources. Robots.txt can help you limit bot activity.<\/li>\n<li><strong>Safeguarding Sensitive Information<\/strong><br \/>\nAlthough sensitive information should never be publicly accessible, robots.txt can act as an additional layer of protection by directing bots away from such areas.<\/li>\n<\/ol>\n<h2><strong>How to Create a Robots.txt File<\/strong><\/h2>\n<p>Creating a robots.txt file (the name should be all lowercase) is relatively straightforward. Here\u2019s how to do it step-by-step:<\/p>\n<h4>1. 
<strong>Open a Text Editor<\/strong><\/h4>\n<p>You can use any plain text editor, such as Notepad (Windows) or TextEdit (Mac). Avoid word processors such as Google Docs, which can add hidden formatting that breaks the file.<\/p>\n<h4>2. <strong>Write the Rules<\/strong><\/h4>\n<p>The basic syntax for robots.txt consists of directives and user-agents. Here\u2019s a breakdown:<\/p>\n<ul>\n<li><strong>User-agent<\/strong>: Specifies the bot you\u2019re addressing (e.g., <code>User-agent: Googlebot<\/code>). Use <code>*<\/code> to apply the rule to all bots.<\/li>\n<li><strong>Disallow<\/strong>: Blocks bots from accessing specific pages or directories.<\/li>\n<li><strong>Allow<\/strong>: Overrides a disallow rule for specific subdirectories or files.<\/li>\n<li><strong>Sitemap<\/strong>: Points bots to your XML sitemap.<\/li>\n<\/ul>\n<p>Example:<\/p>\n<pre><code>Sitemap: https:\/\/www.yourwebsite.com\/sitemap.xml\n\nUser-agent: *\nDisallow: \/directory\/\nDisallow: \/demo\/\nDisallow: \/*article$ ## block pages such as \/xxx\/yyy\/zzz\/article or \/xxx\/article, but not URLs that merely contain it, such as \/www\/article\/post\n<\/code><\/pre>\n<h4>3. <strong>Save the File<\/strong><\/h4>\n<p>Save the file as <code>robots.txt<\/code> in <strong>plain text format<\/strong>. Ensure the name is all lowercase, as it is case-sensitive.<\/p>\n<h4>4. <strong>Upload to Your Website<\/strong><\/h4>\n<p>Use an FTP client or your hosting provider\u2019s file manager to upload the robots.txt file to the root directory of your site. 
It should be accessible at <code>www.yourwebsite.com\/robots.txt<\/code> or, if your site <a href=\"https:\/\/www.forestsoftware.co.uk\/blog\/2024\/09\/why-www-and-non-www-website-addresses-are-not-the-same\/\">doesn&#8217;t use a www address<\/a>, at <code>yourwebsite.com\/robots.txt<\/code>.<\/p>\n<h2><strong>Common Use Cases for Small Businesses<\/strong><\/h2>\n<p>Here are some practical scenarios where small business websites can benefit from using robots.txt:<\/p>\n<h3>1. <strong>Blocking Internal Pages<\/strong><\/h3>\n<p>Prevent search engines from crawling internal or admin pages that hold no value for users or search rankings.<br \/>\nExample:<\/p>\n<pre><code>User-agent: *\nDisallow: \/admin\/\nDisallow: \/login\/\nDisallow: \/*page$ ## block all addresses that end in page\n<\/code><\/pre>\n<h3>2. 
<strong>Hiding Duplicate Content<\/strong><\/h3>\n<p>If you have duplicate content due to pagination or session IDs, use robots.txt to block these pages.<br \/>\nExample:<\/p>\n<pre><code>User-agent: *\nDisallow: \/category\/page\/\n<\/code><\/pre>\n<h3>3. <strong>Allowing Specific Files<\/strong><\/h3>\n<p>If you block a directory but want bots to crawl specific files within it, use the <code>Allow<\/code> directive.<br \/>\nExample:<\/p>\n<pre><code>User-agent: *\nDisallow: \/images\/\nAllow: \/images\/logo.png\n<\/code><\/pre>\n<h3>4. 
<strong>Pointing to Your Sitemap<\/strong><\/h3>\n<p>Including the location of your XML sitemap helps bots discover your site\u2019s structure.<br \/>\nExample:<\/p>\n<pre><code>Sitemap: https:\/\/www.yourwebsite.com\/sitemap.xml\n<\/code><\/pre>\n<h3>5. <strong>Managing Crawl Frequency<\/strong><\/h3>\n<p>To prevent overloading your server, use the <code>Crawl-delay<\/code> directive to request the interval between visits (in seconds). Note that not all bots respect it \u2013 Bingbot does, but Googlebot ignores it.<br \/>\nExample:<\/p>\n<pre><code>User-agent: Bingbot\nCrawl-delay: 10\n<\/code><\/pre>\n<h2><strong>Best Practices for Robots.txt<\/strong><\/h2>\n<p>While robots.txt is a powerful tool, it must be used carefully. Here are some best practices to follow:<\/p>\n<ol>\n<li><strong>Don\u2019t Rely on Robots.txt for Security<\/strong><br \/>\nRobots.txt does not enforce access restrictions. 
Sensitive data should be protected with server-side measures like passwords or encryption. Also bear in mind that some bad bots may use the robots.txt file as a hint of places to look for &#8220;interesting&#8221; information.<\/li>\n<li><strong>Test Your File<\/strong><br \/>\nUse tools like Google Search Console\u2019s robots.txt report to ensure your robots.txt file is correctly configured and won\u2019t block essential pages.<\/li>\n<li><strong>Avoid Blocking CSS and JavaScript<\/strong><br \/>\nSearch engines need to crawl your CSS and JavaScript files to render your site properly. Blocking them can harm your rankings.<\/li>\n<li><strong>Keep It Simple<\/strong><br \/>\nAvoid overly complex rules. A clear, concise robots.txt file is easier to manage and less prone to errors.<\/li>\n<li><strong>Monitor Changes<\/strong><br \/>\nRegularly review your robots.txt file, especially after redesigning your site or adding new content.<\/li>\n<li><strong>Comment the File<\/strong><br \/>\nYou can add comments to the file using a # (although I tend to use ## to make it stand out); anything after the sign, up to the end of the line, will be ignored.<\/li>\n<\/ol>\n<h2><strong>Mistakes to Avoid<\/strong><\/h2>\n<ul>\n<li><strong>Accidentally Blocking Important Pages<\/strong><br \/>\nDouble-check your file to ensure critical pages (e.g., homepage, product pages) are not blocked.<\/li>\n<li><strong>Forgetting to Update<\/strong><br \/>\nAs your site evolves, your robots.txt file should reflect those changes.<\/li>\n<li><strong>Using Wildcards Incorrectly<\/strong><br \/>\nBe cautious when using <code>*<\/code> and <code>$<\/code> to define patterns, as a misconfiguration could block more than intended.<\/li>\n<\/ul>\n<h2><strong>How to Check Your Robots.txt File<\/strong><\/h2>\n<p>Once your robots.txt file is live, verify its functionality by accessing it directly in your browser (e.g., <code>www.yourwebsite.com\/robots.txt<\/code>). 
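<\/p>\n<p>If you\u2019re comfortable with a little code, you can also sanity-check your rules offline with Python\u2019s built-in <code>urllib.robotparser<\/code> module. This is just a sketch \u2013 the rules and URLs are placeholders \u2013 and note that Python\u2019s parser doesn\u2019t understand the <code>*<\/code> and <code>$<\/code> wildcard extensions, so keep the test rules to simple path prefixes:<\/p>\n<pre><code>from urllib import robotparser\n\n# Parse the same rules you plan to put in robots.txt\nrules = [\n    \"User-agent: *\",\n    \"Disallow: \/admin\/\",\n    \"Disallow: \/login\/\",\n]\n\nparser = robotparser.RobotFileParser()\nparser.parse(rules)\n\n# A blocked area returns False; an ordinary page returns True\nprint(parser.can_fetch(\"*\", \"https:\/\/www.yourwebsite.com\/admin\/settings\"))\nprint(parser.can_fetch(\"*\", \"https:\/\/www.yourwebsite.com\/products\/\"))\n<\/code><\/pre>\n<p>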
You can also use Google Search Console to see how Google interprets the file and identify any issues.<\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p>For small business websites, a well-configured robots.txt file is an invaluable tool. It helps optimise your crawl budget, prevent the indexing of unwanted content, and improve your site\u2019s overall search engine performance. By understanding how to use robots.txt effectively, you\u2019ll gain greater control over your website\u2019s interaction with search engines.<\/p>\n<p>Take the time to review your site\u2019s structure, decide what should and shouldn\u2019t be crawled, and create a robots.txt file that aligns with your goals. With this simple but powerful file in place, you\u2019ll be one step closer to a more streamlined and optimised online presence.<\/p>\n<p>If you\u2019ve not yet created a robots.txt file, why wait? Start today and take control of how your website is seen by the search engines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 4<\/span> <span class=\"rt-label rt-postfix\">minutes : <\/span><\/span>How to Use Robots.txt for Small Business Websites: A Comprehensive Guide If you run a small business website, chances are you&#8217;re always looking for ways to optimise your online presence and boost your search engine rankings. One tool that often gets overlooked is the humble robots.txt file. 
It\u2019s simple, powerful, and can help you manage [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,5,3],"tags":[],"class_list":["post-2021","post","type-post","status-publish","format-standard","hentry","category-business-advice","category-computers","category-seo"],"_links":{"self":[{"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/posts\/2021","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/comments?post=2021"}],"version-history":[{"count":0,"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/posts\/2021\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/media?parent=2021"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/categories?post=2021"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.forestsoftware.co.uk\/blog\/wp-json\/wp\/v2\/tags?post=2021"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}