Houston Web Site Marketing, Houston Search Engine Optimization, Houston SEO
Domino Marketing© 25319 Wingfield Lane Spring, Texas 77373 Telephone 281-353-8992
Welcome to Domino Marketing©! Houston web site marketing. Our purpose is to help you maximize your
presence online and achieve high visibility for your web site, with the help of our Houston search
engine optimization training experts in Texas.
You can learn search engine optimization by self-study E-Book or by attending one of our seminars.
Call now for details: 281-353-8992, toll free 1-888-275-9840
How to create a Robots.txt file
Robots, spiders, crawlers, user-agents: whatever you
want to call them, they are your website's best friends. Actually, they are more like your
website's publishing agent. You want spiders to visit your site as often as possible so they can
bring any new content you've published to the eyes of the users who are searching for it.
There is no
doubt that making their job easier is in your best interest, and the robots.txt file lets you
do just that by telling all of your 8-legged visitors where they may look across your whole domain.
It also lets you tell the spiders to stay away from the pages you want to keep out of the
index, which means they won't follow the links on those pages either. (For whatever purpose you may have...) There is a META tag
that does this task too, but it's not as comprehensive, and it isn't used by all search engines.
Clearly, the best 'spider guider' is a robots.txt file that you make for your web site.
How to make sure your spiders see exactly what you want them to:
- First we have to decide what kind of Robots.txt file you need. Do you have any content on your site
that you want to hide from them? Temporarily or permanently? Perhaps you have some pages still under
construction that you don't want them to see YET? Think for a moment about what exactly you want your
spiders to see, and which links you'd like them to follow.
- If you simply want your entire site open to all spiders like ours is,
create a simple .TXT file that has nothing but the following 2 lines of
simple code inside it:
User-agent: *
Disallow:
- Copy this file to the root folder of your server, named "robots.txt" (without the quotes, and all in
lower case). In case you don't know what I mean by your "root" folder, it's the folder on your server
that has your index.html file in it... Sometimes servers call it the 'Public HTML' folder.
- If you DO have some pages or links to hide, or even just want to tell particular user-agents (Spiders)
not to index your site, then it gets more difficult, so we have to add more to the code above.
Again, once you have your finished text file, save it to the root or 'public html' folder of your web
server, in the same directory as your index.html file.
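- Well-behaved spiders will now go where you want them to. (The robots.txt file is a request, not a
lock, so a badly behaved spider can still ignore it.) Need to make sure it works like you
designed it? Test the file against a few of your own page addresses before you rely on it.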
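One way to check a draft before uploading it is to run it through the robots.txt parser that ships with Python. This is only a rough sketch: the draft file and the "/under-construction/" directory are made-up examples, and real search engine spiders may interpret edge cases differently than Python's parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: the whole site is open except one unfinished directory.
DRAFT = """\
User-agent: *
Disallow: /under-construction/
"""


def allowed(robots_txt, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/under-construction/new-page.html"):
        verdict = "crawlable" if allowed(DRAFT, path) else "blocked"
        print(path, "->", verdict)
```

Swap in the text of your own file and a handful of your own page paths. If a page you want indexed shows up as blocked, fix the file before you copy it to your server.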
Why spider guiding is relevant to SEO
Many people who are still building parts of their
website use the robots.txt file to HIDE their new pages until they are complete. (They instruct the
spiders to stay away from that directory or file, and then change the file back to normal
afterwards.) I've also heard of people who create a links directory for link-swapping purposes,
and then put a line in their robots.txt file that tells spiders NOT to follow the
links from that page. Don't get caught doing this! People who do so are not only cheating the
people they swapped links with, but are likely to be caught by the search engines themselves.
It's dishonest, and search engines don't like it. (It makes their reporting harder.) In essence, such
people are promising to "vote for" the popularity of another website, and then they turn around behind the
search engines' backs and recall their votes before they can help anyone.
If you have traded links with someone and they do
this to the page your link is on, search engines won't see your 'vote,' and so they have in
effect cheated you. Don't just assume that they've done it on purpose to recall their votes,
though. You can check first by viewing their robots.txt file. Just open a browser and type:
http://www.theirdomain.com/robots.txt
(Replacing "theirdomain.com" with their actual domain, of course.) It works just like yours.
If you see that they've instructed their spiders to leave out your link's
page on their site, then perhaps a polite reminder by way of email would be in order. If they still
don't link to you after that, however, simply remove your link to them. You can do better. Someone who
uses this tactic obviously knows what they are doing and how unethical it is.
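If you trade links with a lot of sites, the browser check can be scripted. The sketch below is an assumption-laden example, not a finished tool: "theirdomain.com" and the "/links/partners.html" page are placeholders, and check_partner() needs a live Internet connection, so it is defined here but not run.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(any_page_url):
    """Build the robots.txt address for the site that hosts `any_page_url`."""
    return urljoin(any_page_url, "/robots.txt")


def hidden_from(robots_txt, page_url, agent="googlebot"):
    """Return True if the robots.txt text tells `agent` to skip `page_url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, page_url)


def check_partner(page_url, agent="googlebot"):
    """Download the partner's live robots.txt and test the page (needs network)."""
    parser = RobotFileParser()
    parser.set_url(robots_url(page_url))
    parser.read()
    return not parser.can_fetch(agent, page_url)


if __name__ == "__main__":
    # Made-up robots.txt that hides a links directory from every spider.
    sample = "User-agent: *\nDisallow: /links/\n"
    page = "http://www.theirdomain.com/links/partners.html"
    print(robots_url(page))
    print("hidden" if hidden_from(sample, page) else "visible")
```

A "hidden" result only tells you the page is blocked. It doesn't tell you why, so the polite email is still the right first step.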
In general, however, many people find that spiders
just aren't "smart enough" to find all of their site's contents on their own. For that reason, it makes
sense to tell any visiting spiders to index all pages, and follow all links. It "gives them no excuse" to
miss valuable links and pages.
The importance of robots.txt
Although the robots.txt file is a very important file if you want to have a good ranking on search engines, many Web sites don't offer this file.
If your Web site doesn't have a robots.txt file yet, read on to learn how to create one. If you already have a robots.txt file, read our tips to make sure that it doesn't contain errors.
What is robots.txt?
When a search engine crawler comes to your site, it will look for a special file on your site. That file is called robots.txt, and it tells the search engine spider which Web pages of your site should be indexed and which Web pages should be ignored.
The robots.txt file is a simple text file (no HTML), that must be placed in your root directory, for example:
- http://www.yourwebsite.com/robots.txt
How do I create a robots.txt file?
As mentioned above, the robots.txt file is a simple text file. Open a simple text editor to create it. The content of a robots.txt file consists of so-called "records".
A record contains the information for a particular search engine. Each record consists of two kinds of fields: one User-agent line and one or more Disallow lines. Here's an example:
User-agent: googlebot
Disallow: /cgi-bin/
This robots.txt file would allow the "googlebot", which is the search engine spider of Google, to retrieve every page from your site except for files from the "cgi-bin" directory. All files in the "cgi-bin" directory will be ignored by googlebot.
The Disallow command matches by prefix, much like a wildcard at the end of the path. If you enter
User-agent: googlebot
Disallow: /support
both "/support-desk/index.html" and "/support/index.html" as well as all other files in the "support" directory would not be indexed by googlebot.
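You can see the prefix matching for yourself with the robots.txt parser in Python's standard library. This is a small sketch using the record above. The "/sales/index.html" path and the "otherbot" spider are invented for the comparison, and Python's parser is only a stand-in for how a real spider behaves.

```python
from urllib.robotparser import RobotFileParser

RECORD = ["User-agent: googlebot", "Disallow: /support"]


def may_fetch(agent, path):
    """Return True if `agent` may fetch `path` under RECORD."""
    parser = RobotFileParser()
    parser.parse(RECORD)
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/support/index.html",
                 "/support-desk/index.html",
                 "/sales/index.html"):
        print("googlebot", path, may_fetch("googlebot", path))
    # The record names only googlebot, so other spiders are not restricted.
    print("otherbot", "/support/index.html",
          may_fetch("otherbot", "/support/index.html"))
```

If you only want to block the "support" directory itself, write "Disallow: /support/" with the trailing slash, so that "/support-desk/" stays open.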
If you leave the Disallow line blank, you're telling the search engine that all files may be indexed. In any case, you must enter a Disallow line for every User-agent record.
If you want to give all search engine spiders the same rights, use the following robots.txt content:
User-agent: *
Disallow: /cgi-bin/
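If you maintain several records, you may find it easier to generate the file than to type it. The helper names below (build_record and build_robots_txt) are made up for this sketch. It simply writes one User-agent line followed by its Disallow lines, with a blank line between records.

```python
def build_record(agent, disallowed_paths):
    """Return one record: a User-agent line plus at least one Disallow line."""
    lines = [f"User-agent: {agent}"]
    if disallowed_paths:
        lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    else:
        # Every record needs a Disallow line; an empty one allows everything.
        lines.append("Disallow:")
    return "\n".join(lines)


def build_robots_txt(records):
    """Join (agent, paths) pairs into file text, one blank line between records."""
    return "\n\n".join(build_record(agent, paths) for agent, paths in records) + "\n"


if __name__ == "__main__":
    text = build_robots_txt([
        ("googlebot", ["/cgi-bin/"]),
        ("*", ["/cgi-bin/", "/support"]),
    ])
    print(text)
    # To save it, write `text` to a file named robots.txt in your root folder.
```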
Where can I find user agent names?
You can find user agent names in your log files by checking for requests to robots.txt. Most often, all search engine spiders should be given the same rights. In that case, use "User-agent: *" as mentioned above.
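Here is one possible way to pull those names out of a log file. It assumes your server writes the common "combined" access log format, where the user agent is the last quoted field. The three sample lines are invented for illustration, so check the pattern against your own logs before trusting the results.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent parts of a
# combined-format log line (an assumption about your server's log layout).
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(log_lines):
    """Count the user agents that requested /robots.txt."""
    agents = Counter()
    for line in log_lines:
        match = LINE.search(line)
        if match and match.group("path").split("?")[0] == "/robots.txt":
            agents[match.group("agent")] += 1
    return agents


# Made-up sample lines, not taken from a real log.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '157.55.39.1 - - [10/Oct/2023:14:02:11 +0000] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '203.0.113.7 - - [10/Oct/2023:14:05:40 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    for agent, hits in robots_txt_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```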
Things you should avoid
If you don't format your robots.txt file properly, some or all files of your Web site might not get indexed by search engines. To avoid this, do the following:
- Don't use comments in the robots.txt file
Although comments are allowed in a robots.txt file, they might confuse some search engine spiders.
"Disallow: /support # Don't index the support directory" might be misinterpreted as "Disallow: /support#Don't index the support directory".
- Don't use white space at the beginning of a line. For example, don't write
    User-agent: *
    Disallow: /support
but
User-agent: *
Disallow: /support
- Don't change the order of the commands. If you want your robots.txt file to work, keep the User-agent line first. Don't write
Disallow: /support
User-agent: *
but
User-agent: *
Disallow: /support
- Don't use more than one directory in a Disallow line. Do not use the following
User-agent: *
Disallow: /support /cgi-bin/ /images/
Search engine spiders cannot understand that format. The correct syntax for this is
User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
- Be sure to use the right case. The file names on your server are case sensitive. If the name of your directory is "Support", don't write "support" in the robots.txt file.
- Don't list all files. If you want a search engine spider to ignore all files in a particular directory, you don't have to list all files. For example:
User-agent: *
Disallow: /support/orders.html
Disallow: /support/technical.html
Disallow: /support/helpdesk.html
Disallow: /support/index.html
You can replace this with
User-agent: *
Disallow: /support
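- Be careful with the "Allow" command
The original robots.txt standard has no "Allow" command. Google and some other major search engines understand it, but not every spider does, so the safest approach is to mention only the files and directories that you don't want to be indexed. All other files will be indexed automatically if they are linked on your site.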
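Several of the mistakes in this list can be caught by a short script before you upload the file. The checker below is only a sketch. The function name lint_robots and its messages are made up, and it looks for just four of the problems above: comments, leading white space, a Disallow line that comes before any User-agent line, and more than one path on a Disallow line.

```python
def lint_robots(text):
    """Return (line_number, problem) pairs for common robots.txt mistakes."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            seen_agent = False  # a blank line ends the current record
            continue
        if raw[0] in " \t":
            problems.append((number, "white space at the beginning of the line"))
        line = raw.strip()
        if "#" in line:
            problems.append((number, "comment"))
            line = line.split("#", 1)[0].strip()
            if not line:
                continue  # the whole line was a comment
        field, _, value = line.partition(":")
        field = field.strip().lower()
        if field == "user-agent":
            seen_agent = True
        elif field == "disallow":
            if not seen_agent:
                problems.append((number, "Disallow before User-agent"))
            if len(value.split()) > 1:
                problems.append((number, "more than one path in a Disallow line"))
    return problems


if __name__ == "__main__":
    bad = (
        "Disallow: /support\n"
        "User-agent: *\n"
        "  Disallow: /cgi-bin/ /images/\n"
        "Disallow: /tmp # temp files\n"
    )
    for number, problem in lint_robots(bad):
        print(f"line {number}: {problem}")
```

A clean report doesn't prove the file is correct. It can't tell, for example, whether you used the right upper or lower case for a directory name.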
Tips and tricks:
1. How to allow all search engine spiders to index all files
Use the following content for your robots.txt file if you want to allow all search engine spiders to index all files of your Web site:
User-agent: *
Disallow:
2. How to prevent all spiders from indexing any file
If you don't want search engines to index any file of your Web site, use the following:
User-agent: *
Disallow: /
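The two files in tips 1 and 2 differ by a single character, so it is worth double-checking which one you have. A quick sketch with Python's built-in robots.txt parser shows the difference. The spider name "anybot" and the test path are placeholders.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # tip 1: empty Disallow
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # tip 2: Disallow the root


def can_crawl(robots_txt, path, agent="anybot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print("allow-all file:", can_crawl(ALLOW_ALL, "/any/page.html"))
    print("block-all file:", can_crawl(BLOCK_ALL, "/any/page.html"))
```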
3. Where to find more complex examples
If you want to see more complex examples of robots.txt files, view the robots.txt files of big Web sites by adding "/robots.txt" to their domain names in your browser.
Your Web site should have a proper robots.txt file if you want to have good rankings on search engines. Search engines can only give you a good ranking if they know what to do with your pages.
We Will Help You
Whatever you need to promote your Internet presence, Domino Marketing© will help you.
We can help you take your web site to the next level by providing SEO training experts
and a number of other services including Search Engine Optimization, Hosting, E-commerce,
Search Engine Placement, Customer Support Tools, and a Free 60-day Email Marketing Trial.
For more information click here and a web consultant will contact you with specific pricing and strategies via email or phone.