
    If your site, despite the measures you have taken, still isn't showing up in search results, the problem most likely lies in indexing. For a site to be found in a search engine, Google's robots must discover it, crawl it and index it. The robots.txt file is a key element that determines which pages of the site the robots can access. How do you build it correctly?


    Robots.txt file – what is it and why is it needed?

    The robots.txt file is a text file placed in the root folder of a domain. It contains mainly rules that determine which pages robots may access and which we have blocked. It is worth noting that the directives placed in the file are only a hint for bots scanning the site and will not be respected in every case.

    A well-built robots.txt file will, above all, save crawl budget. By limiting access to pages that are of little value from the user's point of view, its directives steer the robots toward the URLs we care about more.

    Robots.txt is a good way to save crawl budget.

    What to put in robots.txt file?

    To create a robots.txt file, we can use an ordinary text editor such as Notepad. This approach requires knowing how the file is structured and writing all the rules by hand.

    If we want to automate this process, we can use a robots.txt generator or create the file directly from the CMS. The latter is usually the best option, because rules are added automatically and the file is kept up to date.


    In order for the file to work properly, it is necessary to include some basic elements. 

    The directives included in the file should follow the Robots Exclusion Protocol (REP) standard, which is read by search engine bots.

    User-agent

    When creating a robots.txt file, we can either address our directives to all indexing robots or focus on one specific addressee. The most common solution is to target the directives at the robots of all search engines.

    The directive we should then use is:

    User-agent: *

    A properly constructed rule aimed specifically at Google bots looks like this:

    User-agent: Googlebot

    Disallow and Allow

    These are rules that determine which URLs and directories certain robots can access. 

    By default, indexing bots can visit every URL on the site. Given the aforementioned crawl budget, it is a good idea to block access to certain subpages.

    This is what the Disallow directive is used for. 

    User-agent: Googlebot

    Disallow: /wp-admin/

    So what do we need the Allow rule for? 

    In almost any situation we may run into an exception, and bot access to a website's subpages is no different. If there is a URL inside a blocked directory that we do want to let indexing robots into, we can use the Allow directive.

    The correct structure in such a case is as follows:

    User-agent: Googlebot

    Disallow: /wp-admin/

    Allow: /wp-admin/admin-ajax.php

    We can also split the file into separate groups, with different rules for the bots of different search engines, as in the example below.
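    For illustration, a file with separate groups for Googlebot and Bingbot (the blocked paths here are purely examples) could look like this:

    User-agent: Googlebot
    Disallow: /wp-admin/

    User-agent: Bingbot
    Disallow: /wp-admin/
    Disallow: /search/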

    Sitemap.xml file

    The robots.txt file is also an ideal place to put the URL of the sitemap in XML format. This makes it much easier for indexing robots to reach the site's subpages. An XML sitemap supports the SEO process and reflects the hierarchical structure of the site.

    According to Google's guidelines, the sitemap URL should be a complete (absolute) address so that search engine robots read it correctly.

    Sitemap: https://nazwa-domeny.pl/plik-sitemapy.xml
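    Putting the previous pieces together, a minimal robots.txt for a typical WordPress-based site could look like the sketch below (the paths and domain are only examples, not a ready-made recipe):

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

    Sitemap: https://nazwa-domeny.pl/plik-sitemapy.xml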

    How to check robots.txt file?

    After creating the file, it's a good idea to check that all the information in it is correct and structured in a way that search engine robots can read. We can do this in the Google Search Console tool. Just log in and go to https://www.google.com/webmasters/tools/robots-testing-tool.

    The tool retrieves the robots.txt file that resides on the domain and lets you check whether a given URL is blocked or allowed by the relevant directives. It will also indicate which rule is responsible for the result.
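    If we prefer a programmatic check, a short sketch using Python's standard urllib.robotparser module can answer a similar question (the domain and paths below are placeholders; note that this parser applies rules in a simpler, first-match fashion, so edge cases may differ from Google's interpretation):

    from urllib import robotparser

    # Point the parser at the robots.txt of the domain we want to check
    # (example.com is a placeholder).
    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # downloads and parses the file

    # Ask whether a given user agent may fetch a given URL.
    print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/"))  # False if /wp-admin/ is disallowed
    print(rp.can_fetch("Googlebot", "https://example.com/blog/"))      # True if no rule matches this path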

    Is robots.txt necessary?

    What if you don’t plan to block robots’ access to any subpage?

    The robots.txt file is an integral part of the site. If a scanning robot doesn't find it at the expected URL, it will receive a 404 error (page does not exist) and may rate the site as poorly optimized for SEO.

    If you do not want to block access to any subpages, you can use the directive:

    User-agent: *

    Allow: /

    This will let the robots know that the domain has a robots.txt file and that it is structured correctly. 

    Rules of robots.txt file

    There are quite a few rules to follow when creating a robots.txt file. It is worth remembering the most important ones:

    • the file must be named robots.txt and located in the root directory of the domain
    • it should be a text file encoded in UTF-8
    • the file size limit according to Google's guidelines is 500 kB; once the limit is exceeded, the remaining content is ignored
    • directives should follow the generally accepted format directive: [path]
    • we can put comments in the file: just put a # sign in front of the comment and robots will ignore it, as in the example below
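    As an illustration of the last point, a commented rule (the path is only an example) might look like this:

    # Block the admin panel for all bots
    User-agent: *
    Disallow: /wp-admin/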

    When creating a website or starting SEO work on it, let's keep basic elements such as the robots.txt file in mind. With little effort, we can improve the indexing of the site, which will translate into more traffic and visibility.


    Karolina Jastrzebska

    The author of the post is Karolina Jastrzebska. She started her adventure with SEO in 2021. She currently works as an SEO Specialist at Up More.