How to Use Robots.txt for Small Business Websites: A Comprehensive Guide

If you run a small business website, chances are you’re always looking for ways to optimise your online presence and boost your search engine rankings. One tool that often gets overlooked is the humble robots.txt file. It’s simple, powerful, and can help you manage how search engines interact with your website.

This blog post will explain what robots.txt is, why it’s important, and how small business owners can use it effectively. By the end of this guide, you’ll understand how to create and implement a robots.txt file that works for your specific needs, and what some of the pitfalls can be.

What is Robots.txt?

The robots.txt file is a simple text file placed in the root directory of your website (e.g., www.yourwebsite.com/robots.txt). Its primary function is to provide instructions to web crawlers (also known as bots or spiders) on how to navigate your site. These crawlers are used by search engines like Google, Bing, and others to index your web pages for search results.

The robots.txt file is part of the Robots Exclusion Protocol (REP), a set of web standards that help manage how bots interact with your site. While these instructions are not enforceable, most reputable crawlers, including Googlebot, will respect them.

Why is Robots.txt Important?

For small business websites, the robots.txt file offers several key benefits:

  1. Control Over Crawling
    Search engines may crawl and index parts of your site you don’t want to appear in search results, such as admin pages, duplicate content, or temporary files. Robots.txt lets you block these areas from being crawled.
  2. Optimising Crawl Budget
    Search engines allocate a finite “crawl budget” for your site, which is the number of pages they’ll crawl during a session. By preventing bots from wasting time on irrelevant pages, you ensure they focus on your important content.
  3. Preventing Overload
    Excessive crawling by bots can strain your website server, particularly for smaller sites with limited hosting resources. Robots.txt can help you limit bot activity.
  4. Safeguarding Sensitive Information
    Although sensitive information should never be publicly accessible, robots.txt can act as an additional layer of protection by directing bots away from such areas.

How to Create a Robots.txt File

Creating a robots.txt file (the name should be all lowercase) is relatively straightforward. Here’s how to do it step-by-step:

1. Open a Text Editor

You can use any plain text editor, such as Notepad (Windows) or TextEdit (Mac). Avoid word processors and online tools like Google Docs, as they can add hidden formatting that stops the file being plain text.

2. Write the Rules

The basic syntax for robots.txt consists of directives and user-agents. Here’s a breakdown:

  • User-agent: Specifies the bot you’re addressing (e.g., User-agent: Googlebot). Use * to apply the rule to all bots.
  • Disallow: Blocks bots from accessing specific pages or directories.
  • Allow: Overrides a disallow rule for specific subdirectories or files.
  • Sitemap: Points bots to your XML sitemap.

Example:

Sitemap: https://www.yourwebsite.com/sitemap.xml

User-agent: *
Disallow: /directory/
Disallow: /demo/
Disallow: /*article$ ## block pages such as /xxx/yyy/zzz/article or /xxx/article but not ones at lower levels such as /www/article/post

3. Save the File

Save the file as robots.txt in plain text format. Ensure the name is all lowercase, as it is case-sensitive.

4. Upload to Your Website

Use an FTP client or your hosting provider’s file manager to upload the robots.txt file to the root directory of your site. It should be accessible at www.yourwebsite.com/robots.txt or, if your site doesn’t use a www address, at yourwebsite.com/robots.txt.
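
Once it’s uploaded, it’s worth confirming that the file is actually being served. The short Python sketch below is just one way to do that; it assumes your site lives at https://www.yourwebsite.com (swap in your own address) and simply fetches the file and prints what a crawler would receive:

# Minimal check that robots.txt is live and served as plain text.
# Assumes the address https://www.yourwebsite.com; replace it with your own domain.
from urllib.request import urlopen

with urlopen("https://www.yourwebsite.com/robots.txt") as response:
    print(response.status)                       # expect 200
    print(response.headers.get("Content-Type"))  # expect something like text/plain
    print(response.read().decode("utf-8"))       # the rules crawlers will see

Simply visiting the address in your browser achieves much the same thing; the script is just handy if you look after several sites.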

Common Use Cases for Small Businesses

Here are some practical scenarios where small business websites can benefit from using robots.txt:

1. Blocking Internal Pages

Prevent search engines from crawling internal or admin pages that hold no value for users or search rankings.
Example:

User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /*page$ ## block all addresses that end in page

2. Hiding Duplicate Content

If you have duplicate content due to pagination or session IDs, use robots.txt to block these pages.
Example:

User-agent: *
Disallow: /category/page/

3. Allowing Specific Files

If you block a directory but want bots to crawl specific files within it, use the Allow directive.
Example:

User-agent: *
Disallow: /images/
Allow: /images/logo.png

4. Pointing to Your Sitemap

Including the location of your XML sitemap helps bots discover your site’s structure.
Example:

Sitemap: https://www.yourwebsite.com/sitemap.xml

5. Managing Crawl Frequency

To prevent overloading your server, you can use the Crawl-delay directive to request a minimum interval (in seconds) between visits. Not all bots respect it; Googlebot ignores the directive, for example, while Bing and some others support it.
Example:

User-agent: Bingbot
Crawl-delay: 10

Best Practices for Robots.txt

While robots.txt is a powerful tool, it must be used carefully. Here are some best practices to follow:

  1. Don’t Rely on Robots.txt for Security
    Robots.txt does not enforce access restrictions. Sensitive data should be protected with server-side measures such as passwords or encryption. Also bear in mind that some bad bots may treat the robots.txt file as a list of hints about where to look for “interesting” information.
  2. Test Your File
    Use Google Search Console’s robots.txt report (which replaced the old robots.txt Tester) to ensure your robots.txt file is correctly configured and won’t block essential pages.
  3. Avoid Blocking CSS and JavaScript
    Search engines need to crawl your CSS and JavaScript files to render your site properly. Blocking them can harm your rankings.
  4. Keep It Simple
    Avoid overly complex rules. A clear, concise robots.txt file is easier to manage and less prone to errors.
  5. Monitor Changes
    Regularly review your robots.txt file, especially after redesigning your site or adding new content.
  6. Comment the file
    You can add comments to the file using a # (although I tend to use ## to make it stand out); anything after the # is ignored until the end of the line.

Mistakes to Avoid

  • Accidentally Blocking Important Pages
    Double-check your file to ensure critical pages (e.g., homepage, product pages) are not blocked.
  • Forgetting to Update
    As your site evolves, your robots.txt file should reflect those changes.
  • Using Wildcards Incorrectly
    Be cautious when using * and $ to define patterns, as a misconfiguration could block more than intended.

How to Check Your Robots.txt File

Once your robots.txt file is live, verify its functionality by accessing it directly in your browser (e.g., www.yourwebsite.com/robots.txt). You can also use Google Search Console to see how Google interprets the file and identify any issues.
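
If you’re comfortable with a little scripting, Python’s standard library also includes a basic robots.txt parser that can show how your rules may be interpreted. The sketch below is purely illustrative: it assumes your file sits at https://www.yourwebsite.com/robots.txt and that /admin/ is one of your disallowed paths. Note that this built-in parser follows the original standard and doesn’t understand the * and $ wildcards, so treat it as a rough check rather than a definitive one.

# Rough check of how crawlers might interpret your rules,
# using Python's built-in parser (urllib.robotparser).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.yourwebsite.com/robots.txt")  # assumed location
rp.read()  # fetch and parse the live file

# "*" means any user-agent; the paths below are hypothetical examples.
print(rp.can_fetch("*", "https://www.yourwebsite.com/"))        # expect True
print(rp.can_fetch("*", "https://www.yourwebsite.com/admin/"))  # expect False if /admin/ is disallowed
print(rp.crawl_delay("Bingbot"))                                # the Crawl-delay value, or None if not set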

Conclusion

For small business websites, a well-configured robots.txt file is an invaluable tool. It helps optimise your crawl budget, keep crawlers away from unwanted areas of your site, and improve your site’s overall search engine performance. By understanding how to use robots.txt effectively, you’ll gain greater control over your website’s interaction with search engines.

Take the time to review your site’s structure, decide what should and shouldn’t be crawled, and create a robots.txt file that aligns with your goals. With this simple but powerful file in place, you’ll be one step closer to a more streamlined and optimised online presence.

If you’ve not yet created a robots.txt file, why wait? Start today and take control of how your website is seen by the search engines.
