


06.23.09

How To Effectively Use Robots.txt Files With Your Site

By Patrick Hare

On many occasions customers come to us with the complaint that they can't be found. They either had rankings on all search engines and suddenly disappeared, or never were seen in the first place. Believing that they are the victims of a ban in the search engines, they come to us for search engine optimization advice. In many cases, the culprit is found in the robots.txt file, in the form of the classic:

User-agent: *
Disallow: /

(Special Note: Using this directive will make your site disappear from the search engines!) The forward slash after Disallow tells the engines to ignore every file on the site. The solution to this problem is to delete the forward slash, leaving the Disallow line empty, which tells search engines that everything is fair game. If you use Google Webmaster Tools, it will tell you that the robots file is preventing your site from being indexed. Webmasters often upload this file accidentally, or forget to take it down when a dev site goes live. The directive effectively tells every honest search engine spider to stop reading your site and go away. Note that unethical spiders that scrape for phone numbers, email addresses, and content will not even bother to look at your robots.txt file, unless they are programmed to look for the files you don't want found. If you want to keep out spiders run by dishonest people, the robots.txt file is probably not going to help you, so you should look to server-level exclusions.
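
The difference between the two forms can be checked locally before a file goes live. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The `example.com` URL and the `allows_everyone` helper name are illustrative assumptions, and real search engines may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allows_everyone(robots_txt: str, url: str = "http://example.com/index.html") -> bool:
    """Return True if a generic crawler ("*") may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


# The classic accidental lockout: the slash disallows the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /"
# The same file with the slash deleted: nothing is disallowed.
ALLOW_ALL = "User-agent: *\nDisallow:"

print(allows_everyone(BLOCK_ALL))  # slash present: crawling is refused
print(allows_everyone(ALLOW_ALL))  # slash removed: crawling is permitted
```

A check like this only models well-behaved robots. As noted above, scrapers that ignore robots.txt are unaffected by either version.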

Depending on the complexity of your site, the robots.txt file can be modified to support your SEO initiatives. If you have a series of pages in a shopping cart, forum, or section that you want to exclude, you can disallow a specific directory:

Disallow: /Example

If you have multiple directories, you would just add them to the list:

User-agent: *
Disallow: /Example
Disallow: /secret_plans
Disallow: /things_we_do_not_want_the_world_to_know
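
To see which URLs a list like this actually covers, the rules can be run through Python's standard-library `urllib.robotparser`. This is a sketch under assumptions: the sample paths and the `blocked_paths` helper are hypothetical, and the file is written with `User-agent: *` so that the rules apply to every robot.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /Example
Disallow: /secret_plans
Disallow: /things_we_do_not_want_the_world_to_know
"""


def blocked_paths(robots_txt, paths, agent="*"):
    """Return the subset of `paths` that `agent` is not allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Hypothetical paths chosen to show prefix and case behavior.
candidates = [
    "/Example/cart.html",      # inside a disallowed directory
    "/Examples-gallery.html",  # merely shares the "/Example" prefix
    "/example/cart.html",      # same name, different case
    "/secret_plans/2009.pdf",
    "/index.html",
]

for path in blocked_paths(ROBOTS_TXT, candidates):
    print(path)
```

Rules are matched as case-sensitive path prefixes, so `Disallow: /Example` also catches `/Examples-gallery.html` but not `/example/cart.html`. Writing the rule with a trailing slash (`/Example/`) restricts it to the directory itself.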

Or you can use a newer wildcard format that disallows pages with certain phrases or string segments in them. If you wanted to disallow all the pages with a session ID in them, you could use a directive that says:

Disallow: /*sessionid

Keep in mind that this will effectively shut search engines out of these pages, so make sure your string is long and specific enough that it does not accidentally blind the engines to pages you want found. The wildcard disallow is ideal for people who have bought a site and then discovered that it was a parked domain with thousands of "junk" pages installed by a previous owner. Even after those pages have been removed from your site, it can take months for Google to notice that they no longer exist. By excluding them in your robots file, the removal of those cached pages can take less time.
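
The risk of a too-short wildcard string can be tested before the rule is published. Python's built-in robots parser does not handle `*` inside paths, so the sketch below translates the rule into a regular expression instead. It assumes the commonly documented semantics (`*` matches any run of characters, and the rule is matched as a prefix from the start of the path). The function name and sample URLs are hypothetical.

```python
import re


def wildcard_rule_blocks(rule_path: str, url_path: str) -> bool:
    """Return True if a Disallow value containing '*' matches `url_path`.

    Assumed semantics: '*' stands for any run of characters, and the
    rule only has to match the beginning of the path (prefix match).
    """
    # Escape the literal pieces, then rejoin them with ".*" for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    # re.match anchors at the start of the string, giving prefix matching.
    return re.match(pattern, url_path) is not None


# The rule from the article catches session URLs but leaves other pages alone.
print(wildcard_rule_blocks("/*sessionid", "/cart/view.php?sessionid=abc123"))
print(wildcard_rule_blocks("/*sessionid", "/help/assessment.html"))

# A shorter string also catches a page that merely contains "sess".
print(wildcard_rule_blocks("/*sess", "/help/assessment.html"))
```

Running candidate rules against a list of the site's real URLs in this way shows which pages would be hidden before the file is uploaded.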

In the past, people have disallowed the /images directory, but we don't normally recommend this. Image and universal search features on search engines allow your images to get indexed, and this leads to traffic. One of our clients made a substantial number of sales through image search, so excluding this directory should be done with some thought.

If you want to exclude certain search engines, or direct them away from certain directories, it is easy to set up separate exclusion rules in the file. For instance, excluding Yahoo! (which uses the "Slurp" robot) from seeing a directory would be done this way:
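
A per-robot group can be sketched and checked as follows. This is a minimal illustration rather than the article's own listing: the directory name `/private_directory` and the `can_crawl` helper are hypothetical, and the check uses Python's standard-library `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group for Yahoo!'s Slurp, one for everyone else.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private_directory

User-agent: *
Disallow:
"""


def can_crawl(agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Only Slurp is turned away from the directory; other robots use the "*" group.
for agent in ("Slurp", "Googlebot"):
    print(agent, can_crawl(agent, "/private_directory/report.html"))
```

A robot that finds a group naming it follows that group instead of the `*` group, so any rules meant for every robot need to be repeated in the robot-specific group as well.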

Continue reading this article.

About the Author:
Patrick Hare has been managing online and offline marketing projects since 1999. From 2005 to present, he has been with Scottsdale Arizona’s Web.com Search Agency (formerly Submitawebsite). Patrick provides Search Engine Optimization and Marketing advice to in-house customers and Web.com Jacksonville’s web design group.
About DevWebProNL
DevWebProNL is for professional developers ... those who build and manage applications and sophisticated websites. DevWebProNL delivers via news and expert advice New Strategies In Development.











-- DevWebProNL is an iEntry, Inc. publication --
iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509
© 2009 iEntry Inc. All Rights Reserved
