Site Rankings Gone?
As you may already know, a good part of my job is researching how the organic search engines work. Figuring out how their algorithms rank pages is crucial to our day-to-day operations. Occasionally, we come across sites which seem to defy explanation: they have proper optimization, good internal linking and so on, yet seem to be getting penalized by engines such as Google. Today, I’m going to explain how we began researching one particular problem, in hopes that if it happens to you, you will know what to do.
The first indication that there was a problem with the site was when its PageRank in the Google toolbar disappeared, seemingly overnight. This happened soon after a new URL was put up on an existing site. We assumed, as is usual, that Google hadn’t yet been able to associate the new URL with the “old” content. That is, Google was still expecting to see the old URL associated with the content. We advised the client that it would likely take a few weeks for Google to associate the new URL with the site.
When sufficient time had passed without progress, we had to look more closely to see what the issues were. As I mentioned above, everything looked OK. Optimization was in place, and there wasn’t any over-optimization happening. Internal linking was good, and there was good use of a properly constructed site map. So we had to dig deeper, going beyond on-the-page factors to see if we could figure out what else was causing the problem.
The first thing I looked for…
The first thing I looked for was the existence of a robots.txt file. In many cases, an improperly coded robots file will exclude some, or all, search engine spiders from indexing. In this case a robots.txt was not being used, so I ruled this out.
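For readers who want to run this kind of check themselves, here is a minimal sketch using Python's standard `urllib.robotparser` module. The helper name `is_blocked`, the two sample robots.txt bodies and the example.com URL are assumptions made up for illustration; they are not taken from the client's site.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that accidentally shuts out every spider.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# An empty Disallow line means nothing is off limits.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

page = "http://www.example.com/index.html"  # placeholder URL
print(is_blocked(BLOCK_ALL, "Googlebot", page))
print(is_blocked(ALLOW_ALL, "Googlebot", page))
```

The only difference between the two samples is a single slash, which is why a miscoded robots file is worth ruling out first.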
I then checked to see if there were robots Meta tags in the head of the HTML. These tags do much the same job as the robots.txt file. That is, they tell the spiders which pages they can and cannot index, but on a page-by-page basis rather than site-wide as with robots.txt. Again, an improperly coded robots Meta tag can exclude part or all of a site from getting indexed. Again, this was not the case. Although this site does use a Meta robots tag, it was coded properly. In fact, the same tag existed on the “old” site and wasn’t an issue then.
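A check like this can also be scripted. The sketch below uses Python's standard `html.parser` module to pull the directives out of any robots Meta tag on a page; the class name, the helper `robots_directives` and the sample markup are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.append(item.strip().lower())


def robots_directives(html: str) -> list:
    """Return the lower-cased robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up page that tells spiders to stay away.
sample = '<html><head><meta name="robots" content="NOINDEX, nofollow"></head></html>'
print(robots_directives(sample))
```

If "noindex" turns up in the result for a page that should rank, the tag is the problem; an empty list or "index, follow" is harmless.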
Next, I checked the log files to see if the spiders had been visiting the site. They had been there on a regular basis, as recently as a few days earlier, as a matter of fact.
Seeing that everything was coded properly, and that spiders had been visiting the site, piqued my interest. How is it that spiders are able to see the site (as indicated by their visits), yet the site is not showing up in the index and still has a PageRank of 0, months after the change?
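As a rough illustration of this log check, the following sketch counts spider visits per day in an access log. It assumes the common Apache-style log layout with the date in square brackets and the user agent at the end of the line. The sample lines are invented, and matching on the user-agent string is only a first approximation, since that string can be faked.

```python
import re
from collections import Counter

# Matches the date part of a timestamp such as [12/Nov/2004:06:25:14 -0700]
LOG_DATE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def spider_visits_by_day(lines, spider="Googlebot"):
    """Count log lines per day whose text mentions the given spider name."""
    visits = Counter()
    for line in lines:
        if spider.lower() in line.lower():
            match = LOG_DATE.search(line)
            if match:
                visits[match.group(1)] += 1
    return visits


# Invented log lines, for illustration only.
sample_log = [
    '66.249.65.1 - - [12/Nov/2004:06:25:14 -0700] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.65.1 - - [12/Nov/2004:06:25:20 -0700] "GET /sitemap.html HTTP/1.0" 200 2048 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '10.0.0.7 - - [13/Nov/2004:09:01:02 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '66.249.65.1 - - [14/Nov/2004:03:11:45 -0700] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
visits = spider_visits_by_day(sample_log)
print(dict(visits))
```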
Some more digging…
So I did some more digging. I checked Google for the old URL. Upon viewing the cached version of the old URL, a theory began to form.
The cached pages were actually the current content of the new site. In other words, Google was somehow associating the old URL with the new site. So I did some more checking. I did a whois lookup and found that the old URL was still registered. I then pinged it and found that it resolved to a new IP address, yet when I tried to connect to it using my web browser, it came up as a 404 (page not found) error.
I pinged the new site and found that its IP address was different: it was the IP address the site had when it used the “old” URL. This still didn’t explain why the new site had no PageRank or indexed pages while the “old” URL was showing pages from the new site, but it did give me some clues.
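The same comparison can be scripted instead of pinging by hand. In the sketch below, `socket.gethostbyname` from Python's standard library does the real lookup. The hostnames and addresses are placeholders, and a stand-in resolver is passed in so the example runs without a network connection; drop the `resolve` argument to query real DNS.

```python
import socket


def compare_hosts(old_host, new_host, resolve=socket.gethostbyname):
    """Resolve both hostnames and report whether they share an IP address."""
    old_ip = resolve(old_host)
    new_ip = resolve(new_host)
    return {"old_ip": old_ip, "new_ip": new_ip, "same_ip": old_ip == new_ip}


# Stand-in resolver with made-up addresses so the example runs offline.
FAKE_DNS = {
    "old-example.com": "203.0.113.10",  # hypothetical: old URL, now on a new IP
    "new-example.com": "192.0.2.25",    # hypothetical: new URL, on the original IP
}
result = compare_hosts("old-example.com", "new-example.com", resolve=FAKE_DNS.__getitem__)
print(result)
```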
We already know that, in order to save time, most search engines do not perform a DNS query every time they visit a site. They tend to try to connect directly to the site via its IP address. Only if they can’t reach the site that way do they perform a DNS query to find its IP.
In the case of this site, Google hasn’t needed to perform a DNS query because, from its point of view, the “old” site still exists. It can connect to the site via IP and so is associating the “new” site with the “old” URL.
This also explains why the “new” site is showing a PageRank of 0 with no pages indexed: Google has also resolved the “new” site to the same IP, which it thinks belongs to the “old” URL. Once it visits the new site and realizes that the new and old sites are identical, it gives preference to the “old” site because it predates the new one.
Confused yet?
Let me put it in other terms. Since the “old” site has been around longer, it has built up a reputation on the web. When the client replaced the URL, they wiped out that reputation. But no one told Google that the old site was gone. Had Google performed a DNS query, it would have found that the old site had in fact been moved, but since it found a site with the same content at the same IP, it assumes that this is the site with the reputation.
Along comes a new site with the exact same content and no reputation, and of course the first thing Google assumes is that the site owner is trying to spam the engine, so it penalizes the new site. Hence the lack of indexed pages and the PageRank of 0.
To resolve this issue, we will try a variety of things. First will be either a 301 redirect (approved by Google to help spiders understand that a site has moved) or another on-the-page redirect, such as a Meta refresh or a hyperlink on the “old” URL. These efforts help reinforce to Google that the “old” site has been replaced by the “new” site.
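In practice a 301 is usually set up in the web server's configuration, but the mechanics are easy to see in a small sketch. The Python WSGI application below answers every request to the old URL with a permanent redirect to the same path on the new one. The `NEW_SITE` address and the function name `redirect_app` are assumptions for illustration, not the client's actual setup.

```python
NEW_SITE = "http://www.new-example.com"  # hypothetical address of the "new" site


def redirect_app(environ, start_response):
    """WSGI app for the old URL: send every request to the same path on the new site."""
    path = environ.get("PATH_INFO", "") or "/"
    query = environ.get("QUERY_STRING", "")
    location = NEW_SITE + path + ("?" + query if query else "")
    # 301 tells browsers and spiders alike that the move is permanent.
    start_response("301 Moved Permanently",
                   [("Location", location), ("Content-Length", "0")])
    return [b""]


# Exercise the app directly with a fake request, no server needed.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = redirect_app({"PATH_INFO": "/products.html", "QUERY_STRING": "id=7"},
                    fake_start_response)
print(captured["status"], captured["headers"]["Location"])
```

Keeping the path and query string intact matters here, because it lets each old page point at its direct replacement rather than dumping every visitor on the new home page.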
If this doesn’t work, our next step will be to request that the old URL be removed from the index. This is a last resort, as we would rather the engine figure it out on its own. If we find that Google still can’t figure out that there is a new site, we will definitely request the URL removal.
In addition, to help speed things along, we need to ensure that all other links, such as ODP directory listings, point to the correct URL and not the old domain. This will reinforce to the search engines that the “old” site no longer exists and that the “new” site is a valid site that isn’t spamming the engines.
Author Bio:
Rob Sullivan
Production Manager
Searchengineposition.com
Search Engine Positioning Specialists