08 Nov 2003

Making Sure Your Site Is Crawler-Friendly
I couldn’t find any “meaty” questions for this week’s newsletter, so I thought I’d just talk generally about what makes a site “crawler-friendly.” I used to call this “search-engine-friendly,” but my friend Mike Grehan convinced me that the more accurate phrase is “crawler-friendly,” because it’s the search engine crawlers (or spiders) that your site needs to buddy up to, as opposed to the search engine itself.

So, how do you make sure your site is on good terms with the crawlers? Well, it always helps to first buy it a few drinks. But, since that’s not usually possible, your next-best bet is to design your site with the crawlers in mind. The search engine spiders are primitive beings, and although they are constantly being improved, for best results you should always choose simplicity over complexity.

What this means is that cutting-edge designs are generally not the best way to go. Interestingly enough, your site visitors may agree. Even though we SEO geeks have cable modems and DSL, our site visitors probably don’t. Slow-loading Flash sites, for example, may stop the search engine spiders right in their tracks. There’s nothing of interest on the average Flash site to a search engine spider anyway, so they’re certainly not going to wait for it to download!

Besides Flash, there are a number of “helpful” features being thrown into site designs these days that can sadly be the kiss of death to a site’s overall spiderability. For instance, sites that require a session ID to track visitors may never receive any visitors to begin with — at least not from the search engines. If your site or shopping cart requires session IDs, check Google right now to see if your pages are indexed. (Do an allinurl:yourdomainhere.com in Google’s search box and see what shows up.) If you see that Google has only one or two pages indexed, your session IDs may be the culprit. There are workarounds for this, as I have seen many sites that use session IDs get indexed; however, the average programmer/designer may not even know it’s a problem.
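
You can script a rough version of this check yourself. Here’s a minimal sketch in Python (the URL and the list of session-parameter names are illustrative assumptions, not a definitive list) that fetches a page the way a cookieless spider would and flags any links carrying session-style IDs:

```python
import re
import urllib.request

# Placeholder URL -- point this at a page on your own site.
URL = "http://www.example.com/catalog.html"

# Common session-ID parameter names (illustrative, not exhaustive).
SESSION_PARAMS = re.compile(r"(PHPSESSID|jsessionid|sid|sessionid)=", re.IGNORECASE)

html = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")

# Pull out every href and flag the ones that embed a session ID.
for href in re.findall(r'href="([^"]+)"', html, re.IGNORECASE):
    if SESSION_PARAMS.search(href):
        print("Possible session ID in link:", href)
```

If that prints anything, a spider following your links is probably being handed a fresh session ID on every visit, which is exactly the situation that keeps pages out of the index.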

Another common source of grief in getting your pages thoroughly crawled is using the exact same Title tag on every page of your site. Sometimes this happens out of Webmaster laziness, but often it’s because a default Title tag is automatically pulled in by a content management system (CMS). If you have this problem, it’s well worth taking the time to fix it.
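
If you’d like to audit this quickly, a short script can compare the Title tags across a handful of your URLs. A minimal sketch, assuming you substitute your own page addresses for the placeholders:

```python
import re
import urllib.request
from collections import defaultdict

# Placeholder URLs -- swap in real pages from your own site.
URLS = [
    "http://www.example.com/",
    "http://www.example.com/products.html",
    "http://www.example.com/contact.html",
]

titles = defaultdict(list)
for url in URLS:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = match.group(1).strip() if match else "(no title)"
    titles[title].append(url)

# Any Title shared by two or more pages is a candidate for fixing.
for title, pages in titles.items():
    if len(pages) > 1:
        print(f"Duplicate Title {title!r} on {len(pages)} pages: {pages}")
```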

Most CMSs have workarounds that let you add a unique Title tag instead of pulling up the same one for each page. Usually the programmers simply never realized it was important, so it was never done. The cool thing is that with dynamically generated pages you can often set your templates to pull a particular sentence from each page and plug it into your Title field. A nice little “trick” is to make sure each page has a headline at the top that uses your most important keyword phrases. Once you’ve got that, you can set your CMS to pull it out and use it for your Titles as well.
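
In template terms, the trick is tiny. Here’s a sketch of the idea, assuming your CMS lets you run a snippet of code per page (the function name, site name, and fallback behavior here are hypothetical choices, not any particular CMS’s API):

```python
import re

def title_from_headline(page_html, site_name="Example Widgets"):
    # Reuse the page's keyword-rich headline as its Title tag;
    # fall back to the site name alone if no headline is found.
    # (site_name is an illustrative default, not a real site.)
    match = re.search(r"<h1[^>]*>(.*?)</h1>", page_html, re.IGNORECASE | re.DOTALL)
    if match:
        headline = re.sub(r"<[^>]+>", "", match.group(1)).strip()
        return f"{headline} - {site_name}"
    return site_name

# A page whose headline carries the keyword phrase:
print(title_from_headline("<h1>Discount Blue Widgets</h1><p>...</p>"))
# -> Discount Blue Widgets - Example Widgets
```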

Another reason I’ve seen for pages not being crawled is because they are set to require a cookie when a visitor gets to the page. Well guess what, folks? Spiders don’t eat cookies! (Sure, they like beer, but they hate cookies!) No, you don’t have to remove your cookies to get crawled. Just don’t force-feed them to anyone and everyone. As long as they’re not required, your pages should be crawled just fine.
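
It’s easy to simulate a cookie-hating spider, since a bare HTTP request carries no cookies at all. A minimal sketch (the URL is a placeholder, and the User-Agent string is just a made-up label):

```python
import urllib.request

# Placeholder URL -- try a page you suspect is demanding a cookie.
URL = "http://www.example.com/members/welcome.html"

# urllib sends no cookies by default, just like a spider on its first visit.
request = urllib.request.Request(URL, headers={"User-Agent": "crawl-check/0.1"})
response = urllib.request.urlopen(request)
body = response.read().decode("utf-8", errors="replace")

print("Status:", response.status)
print("What a cookieless visitor sees first:")
print(body[:200])
```

If the output turns out to be an “enable cookies” page instead of your content, the spiders are hitting the same dead end.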

What about the use of JavaScript? We’ve often heard that JavaScript is unfriendly to the crawlers. This is partly true and partly false. Nearly every site I look at these days uses some sort of JavaScript in its code, and it’s certainly not bad in and of itself. As a rule of thumb, if you’re using JavaScript for mouseover effects and that sort of thing, just check that the HTML code for the links also uses the traditional <a> tag. As long as that’s there, you’ll most likely be fine. For extra insurance, you can place any JavaScript links into the <noscript> tag, put text links at the bottom of your pages, and create a visible link to a sitemap page that contains links to all your other important pages. It’s definitely not overkill to do *all* of those things!
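
A quick way to scan for this is to list the links that exist only as javascript: calls and therefore give the spider nothing to follow. A minimal sketch (the URL is a placeholder):

```python
import re
import urllib.request

URL = "http://www.example.com/"  # placeholder -- point at your own page

html = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")

# javascript: and empty hrefs leave a crawler with nowhere to go.
for href in re.findall(r'<a[^>]+href="([^"]*)"', html, re.IGNORECASE):
    if href.lower().startswith("javascript:") or href in ("", "#"):
        print("Link a spider can't follow:", href)
```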

There are plenty more things you can worry about where your site’s crawlability is concerned, but those are the main ones I’ve been seeing lately. One day, I’m sure that any type of page under the sun will be crawler-friendly, but for now, we’ve still gotta give our little arachnid friends some help.

One tool I use to spot potential crawler problems is the Lynx browser. Generally, if your pages can be viewed and clicked through in Lynx (a text-only browser that predates today’s graphical browsers), then a search engine spider should also be able to make its way around. That isn’t written in stone, but it’s at least one way of discovering potential problems you may be having. It’s not foolproof, however: I just checked my forum in Lynx and it shows a blank page, yet the forum gets spidered and indexed by the search engines without a problem.
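
If you’d rather script the Lynx check than click through it interactively, Lynx’s -dump flag prints the rendered text, followed by a numbered list of the links it found, to standard output. A sketch, assuming the lynx binary is installed and on your PATH:

```python
import subprocess

# -dump renders the page as plain text with a numbered link list at the
# end -- roughly what a text-only crawler has to work with.
result = subprocess.run(
    ["lynx", "-dump", "http://www.example.com/"],  # placeholder URL
    capture_output=True,
    text=True,
)
print(result.stdout)
```

A blank dump, or a link list missing your important pages, is the same warning sign you’d see browsing in Lynx directly.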

This is a good time to remind you that when you think your site isn’t getting spidered completely, check out lots of things before jumping to any conclusions.

Jill

Author Bio:
Jill Whalen of High Rankings is an internationally recognized search engine marketing consultant and editor of the free weekly email newsletter, the High Rankings Advisor.

She specializes in search engine optimization, SEO consultations and seminars. Jill’s handbook, “The Nitty-gritty of Writing for the Search Engines,” teaches business owners how and where to place relevant keyword phrases on their Web sites so that they make sense to users and gain high rankings in the major search engines.