The beginning of WordPress SEO should start with optimizing your WordPress robots.txt file. The goal is to prevent as much duplicate content as possible, because WordPress out of the box produces a lot of it. Duplicate content can cause search engines to penalize your site’s search engine rankings, PageRank, and crawl rate.
Robots.txt

[Image: Robots.txt]
The WordPress robots.txt file gives search engine “robots” instructions to follow when they crawl your WordPress blog. These instructions tell search engines not to crawl non-relevant files, folders, images, and duplicate content. Excluding non-relevant paths such as “/wp-admin/”, “/wp-content/”, and “/wp-includes/” saves bandwidth and speeds up the crawling process when search engines access your site.

A robots.txt file is a simple text file which can be created with any plain-text editor, such as Windows Notepad, and then placed in the root directory of your site.
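As a minimal sketch, the same file can be produced programmatically; the rules shown here are just a small excerpt of the full example below, and the filename and contents follow the conventions described in this post:

```python
# Minimal sketch: create a robots.txt file programmatically and
# verify that it round-trips as plain ASCII text.
rules = (
    "# robots.txt for http://www.yourdomain.com/\n"
    "User-agent: *\n"
    "Disallow: /wp-admin/\n"
)

# robots.txt must be plain ASCII text, named exactly "robots.txt"
with open("robots.txt", "w", encoding="ascii") as f:
    f.write(rules)

with open("robots.txt", encoding="ascii") as f:
    content = f.read()

print(content)
```

Whether you use a script or Notepad, the result is identical: what matters is the exact filename and its location at the domain root.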

WordPress robots.txt file example one: WordPress installed in the root directory, /.

# robots.txt for http://www.yourdomain.com/
#  PARTIAL access (Googlebot)
User-agent: Googlebot

Disallow: /*?
Disallow: /*.css$
Disallow: /*.inc$
Disallow: /*.js$
Disallow: /*.php$
Disallow: /category/*/*
Disallow: /comment-page/*
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /wp-

User-agent: *
Disallow: /cgi-bin/
Disallow: /archives/
Disallow: /category/
Disallow: /comment-page
Disallow: /feed
Disallow: /feed/
Disallow: /page/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /wp-login.php
Disallow: /index.php

# Edited last, on 04-05-2009
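You can sanity-check the plain-prefix rules from the example above with Python’s standard-library robots.txt parser. Note that `urllib.robotparser` does not implement the Googlebot-style `*`/`$` wildcard extensions, so only the ordinary prefix rules are exercised in this sketch:

```python
from urllib.robotparser import RobotFileParser

# Only the plain-prefix rules from the example; the stdlib parser
# ignores wildcard patterns such as "/*?" and "/*.css$".
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/wp-admin/options.php"))  # False: blocked
print(rp.can_fetch("*", "/2009/04/my-article/"))   # True: crawlable
```

A quick check like this catches the most common mistake: a rule that accidentally blocks your article pages.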

WordPress robots.txt file example two: WordPress installed in a sub-directory, such as yourdomain.com/blog/.

# robots.txt for http://www.yourdomain.com/

#  PARTIAL access (Googlebot)
User-agent: Googlebot
Disallow: /blog/*?
Disallow: /blog/*.css$
Disallow: /blog/*.inc$
Disallow: /blog/*.js$
Disallow: /blog/*.php$
Disallow: /blog/category/*/*
Disallow: /blog/comment-page/*
Disallow: /blog/*/feed/$
Disallow: /blog/*/feed/rss/$
Disallow: /blog/*/trackback/$
Disallow: /blog/wp-

User-agent: *
Disallow: /blog/archives/
Disallow: /blog/category/
Disallow: /blog/comment-page
Disallow: /blog/feed
Disallow: /blog/feed/
Disallow: /blog/page/
Disallow: /blog/tag/
Disallow: /blog/trackback/
Disallow: /blog/wp-admin/
Disallow: /blog/wp-content/
Disallow: /blog/wp-includes/
Disallow: /blog/index.php

# Edited last, on 04-05-2009

The “User-agent: *” line means that the instructions which follow apply to all search engine bots/spiders crawling your site.

The “Disallow: /wp-” rule excludes search engines from crawling all files and folders whose names start with “wp-”, which prevents the indexing of duplicate content.

The “Disallow: /*?” rule excludes from the index any URL that contains a “?”.

The “$” at the end of a pattern anchors the match at the end of the URL. For example, “/*.css$” matches all files that end with “.css”.

The “#” character marks a comment line for your own reference; search engine spiders do not read lines beginning with “#”.
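The wildcard behavior described above can be illustrated with a short sketch. Note that `pattern_blocks` is a hypothetical helper written for this post, not part of any library, and it only approximates how Googlebot interprets these patterns:

```python
import re

def pattern_blocks(pattern: str, path: str) -> bool:
    """Check whether a Googlebot-style Disallow pattern matches a URL path.

    '*' matches any run of characters; a trailing '$' anchors the match
    at the end of the path. Illustrative sketch only, not Google's code.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes the regex '.*'
    regex = "^" + ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.search(regex, path) is not None

print(pattern_blocks("/*?", "/2009/04/post/?replytocom=5"))    # True
print(pattern_blocks("/*.css$", "/wp-content/style.css"))      # True
print(pattern_blocks("/*.css$", "/wp-content/style.css?v=2"))  # False
print(pattern_blocks("/wp-", "/wp-admin/"))                    # True
```

Notice the last case: without a trailing “$”, a Disallow rule is a simple prefix match, which is why “/wp-” covers /wp-admin/, /wp-content/, and /wp-includes/ at once.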

Additional notes:
* The above robots.txt configuration focuses on keeping as much PageRank as possible on your main page and article pages.
* Make sure that you name the file exactly “robots.txt” and upload it as ASCII text into your website’s root directory (or sub-directory).
* I have split the robots.txt file into two groups: one for “User-agent: Googlebot” and one for “User-agent: *”, which covers all other bots.
* You can also install the SearchStatus extension if you use the Firefox browser. It has many SEO options, and it lets you look at other sites’ robots.txt files if they are not hidden. This way you can find a nice PR6 site in your niche and see what it uses for its WordPress robots.txt configuration.
* Make sure that your robots.txt file validates. Most people skip this step, but it is a good idea. To validate your file, visit “Robots.txt Checker”:
http://tool.motoricerca.info/robots-checker.phtml
