Robots.txt | Seo Galway
Robots.txt is a plain text file, but a very effective one for dealing with duplicate content issues.
- One element of your Search Engine Optimisation arsenal.
- In theory, it "bars" compliant robots or spiders from accessing and recording the pages or directories that you select.
- Applicable to both plain old HTML and CMS arrangements, but usually more useful for the latter.
- It's a plain text file, like Notepad would create, with no special formatting but with special rules (standards) for expressing your intentions. Its name is one of them: the file must be called robots.txt.
User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /cgi-bin/
Disallow: /documentation/
Disallow: /images/
Disallow: /include/
Disallow: /javascripts/
Disallow: /lang/
Disallow: /libs/
Disallow: /temp/
Disallow: /templates/
Disallow: /index.php?search=*
Disallow: /*?search=*
In the sample code above, taken from another site, you may notice that some folders are "protected", along with some "search" URLs and their variations (run through index.php in this case).
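To see how a compliant robot might read the folder rules, here is a minimal sketch using Python's standard `urllib.robotparser`. The host `www.example.com`, the helper name `is_allowed` and the shortened rule list are assumptions for illustration only. The standard-library parser matches paths by prefix, so the wildcard "search" rules are left out of this example.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the folder rules from the sample (illustrative only).
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /templates/
"""


def is_allowed(path, rules=RULES, agent="*"):
    """Return True if a compliant robot may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # www.example.com is a placeholder host; only the path matters here.
    return parser.can_fetch(agent, "https://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/admin/login.php", "/backup/site.zip", "/about.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

A check like this only tells you what a well-behaved robot should do. It does not stop a robot that chooses to ignore the file.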
Duplicate Content - this robots.txt file is particularly useful for sites which use sorting (alphabetical, oldest, newest and so on), where the page is the same (the very same content and meta tags) but is served under a different, dynamically generated URL. This can be an issue for the search engines, which may see it as duplicate content.
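As a rough illustration of how wildcard rules could catch sorted URLs, the sketch below turns `Disallow` patterns into regular expressions, treating `*` as "any characters" and a trailing `$` as "end of URL". The `sort` parameter name, the two rules and the function names are hypothetical, and not every robot honours wildcards. The sketch also ignores `Allow` lines and rule precedence.

```python
import re

# Hypothetical rules for a site whose sort order is passed as ?sort=...
SORT_RULES = ["/*?sort=", "/*&sort="]


def rule_to_regex(pattern):
    """Translate one Disallow pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where "*" stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url_path, patterns=SORT_RULES):
    """Return True if any pattern matches from the start of url_path."""
    return any(rule_to_regex(p).match(url_path) for p in patterns)


if __name__ == "__main__":
    for url in ("/products.php", "/products.php?sort=newest",
                "/products.php?page=2&sort=az"):
        print(url, "blocked" if is_disallowed(url) else "allowed")
```

With rules like these, the unsorted page stays open to robots while its sorted variations are excluded, so only one copy of the content is recorded.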
Other considerations:
- Naughty robots will ignore this file. Others may use it to locate folders (the paths listed are relative to the root).