How to Fix Crawling Errors on Your Website
Your website seems to be performing well, with a steady stream of visitors arriving on a regular basis. Everything looks perfect, but scratch the surface a little and you will realise how much trouble may be waiting for you. Log into your Google Webmaster Tools account and you may be surprised to find yourself greeted by thousands of crawl errors. Crawl errors, if ignored, can threaten the visibility of a website, so you need to do everything possible to fix them as soon as possible.
So how on earth are you supposed to get through this seemingly insurmountable task? You need a strategy to make your life simpler, and in this article we are going to show you how to build one:
Do Not Overlook It
Some webmasters firmly believe that crawl errors have nothing to do with a website's online presence. This is not the case: crawl errors are very much related to your website's health. In fact, most search engines set a crawl budget for each site, which means you need to take extra measures to ensure that the bot spends its time crawling the important pages of your website rather than worthless 404 error pages.
HTTP Errors
HTTP errors are usually generated whenever the Google bot faces a problem while crawling the website. On their own they are not necessarily alarming, but if the number of such errors keeps rising, you need to contact your web hosting service provider, since these errors are mostly made up of 403 HTTP header responses, and the host should take care of them. Moz has published an awesome blog post on HTTP errors; do read it.
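When a crawl report dumps raw status codes on you, it helps to sort them into client-side and server-side problems before deciding whom to contact. The sketch below uses Python's standard `http.HTTPStatus` to label codes; the categories and wording are illustrative, not an official Google classification.

```python
from http import HTTPStatus

def describe_error(code):
    """Return a short, human-readable label for an HTTP status code
    pulled from a crawl-error report."""
    phrase = HTTPStatus(code).phrase
    if 400 <= code < 500:
        family = "client error (check the page or its permissions)"
    elif 500 <= code < 600:
        family = "server error (ask your hosting provider)"
    else:
        family = "not an error"
    return f"{code} {phrase}: {family}"

print(describe_error(403))  # 403 Forbidden is the most common case here
print(describe_error(500))  # a server-side problem for your host to fix
```

A 403 is a permissions problem you or your host can fix in the server configuration, while 5xx codes almost always need the hosting provider's attention.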
Errors In Sitemaps
This is a common issue. Sometimes you simply forget to remove error pages from your XML sitemap and then submit that same sitemap to Google via Webmaster Tools. Make sure all the URLs in your website's XML sitemap are up and running and that none of them return a 404 error or a 301 redirect. If you manage this, you will not face the same issue again. Webmasters at Web Designing X, MotoCMS 3.0 and the like believe that you need to check the sitemap periodically so that it does not contain error pages.
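A periodic sitemap check like the one described above is easy to script. The sketch below, using only Python's standard library, pulls every `<loc>` entry out of a sitemap and includes a helper for checking each URL's status code against your live site; the `example.com` URLs are placeholders for your own.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Extract all <loc> entries from an XML sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

def check_url(url):
    """Return the HTTP status code for a URL; 200 means 'up and running'."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

# A minimal sitemap for illustration -- substitute your site's real one.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/contact</loc></url>
</urlset>"""

for url in sitemap_urls(sample):
    print(url)  # run check_url(url) on each against your live site
```

Any URL for which `check_url` returns something other than 200 should be removed from the sitemap or redirected before you resubmit it to Google.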
Not Followed
This is a common error caused by redirection problems. If the Google bot finds a link that leads into a redirection loop, it may treat it as an error and list it in the Not Followed section. There are a few ways to fix this nagging problem. Make sure the pages redirect properly, using a 301 permanent server-side response code. Equally important, redirecting pages should point to pages that return a 200 OK server-side response code; if the landing URL itself returns a 301 or 302 response code, you have a double redirection, which is certainly not good for the health of your website.
It is also better to use absolute links rather than relative links, as relative links can themselves lead to a bunch of error pages in some odd cases, such as when an error page returns a 200 OK server-side response code.
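The redirect rules above boil down to a simple audit of the status codes seen along a chain. The sketch below classifies a chain given as a list of status codes (each hop's code, ending with the final page); gathering those codes from your server is left to your crawler of choice, so this is purely the decision logic.

```python
def audit_redirect_chain(status_codes):
    """Classify a redirect chain, e.g. [301, 200] is one permanent hop
    landing on a live page, which is what we want."""
    if not status_codes:
        return "empty chain"
    hops, final = status_codes[:-1], status_codes[-1]
    if any(code == 302 for code in hops):
        return "uses 302: switch to a 301 permanent redirect"
    if len(hops) > 1:
        return "double redirect: point the first hop straight at the final URL"
    if final != 200:
        return f"chain ends in {final}: the redirect target is not a live page"
    return "ok: single 301 landing on a 200 page" if hops else "no redirect"

print(audit_redirect_chain([301, 200]))       # the healthy case
print(audit_redirect_chain([301, 301, 200]))  # double redirection
print(audit_redirect_chain([302, 200]))       # temporary instead of permanent
print(audit_redirect_chain([301, 404]))       # redirect into a dead page
```

Each flagged chain maps directly to one of the fixes described above: collapse double redirects, replace 302s with 301s, and make sure every redirect lands on a page that answers 200 OK.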
Not Found
This is another common error, by and large caused by 404 errors on your website. 404 errors can arise in different ways: you may have deleted a page of your website and forgotten to redirect it to a suitable replacement; there may be a typo in a URL; or, while migrating from one domain to another, the internal linking structure may lead to 404 server-side response codes. Prestashop has published a great article on this topic. Do read it.
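The standard fix for deleted or mistyped URLs is a 301 redirect from each old path to its replacement. Assuming an Apache server, the sketch below turns a hypothetical old-to-new URL map into `Redirect 301` lines for an `.htaccess` file; the paths are made-up examples, and other servers (nginx, IIS) use different syntax.

```python
def redirect_rules(url_map):
    """Emit Apache mod_alias 'Redirect 301' lines so that deleted or
    moved pages stop returning 404 to visitors and crawlers."""
    return [f"Redirect 301 {old} {new}" for old, new in sorted(url_map.items())]

# Hypothetical examples: a moved blog post and a typo'd URL people keep hitting.
moved = {
    "/old-blog/post-1": "/blog/post-1",
    "/abouts-us": "/about-us",
}

for rule in redirect_rules(moved):
    print(rule)
```

After deploying the rules, request one of the old URLs and confirm it answers with a 301 pointing at the new location rather than a 404.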
Restricted by robots.txt
This is not really an error; it is informational in nature. It lists the URLs that are blocked from search engines via the robots.txt file. So you need to check that you are not blocking any internal pages inadvertently; if you find any mistakes, remove the offending rule from the robots.txt file to fix the issue. Yoast has published an article on how to deal with this problem effectively. Check it out here.
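You can check which pages your robots.txt blocks without waiting for a crawl, using Python's standard `urllib.robotparser`. The robots.txt contents and URLs below are hypothetical; paste in your site's actual file and the pages you care about.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt -- substitute your site's actual file.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /blog/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for url in ["https://www.example.com/blog/my-post",
            "https://www.example.com/contact"]:
    allowed = rp.can_fetch("Googlebot", url)
    print(url, "-> crawlable" if allowed else "-> blocked by robots.txt")
```

Here the `Disallow: /blog/` rule would inadvertently block every blog post; if important pages show up as blocked, delete the rule responsible and resubmit the file.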
Soft 404
This kind of error is generated when a 404 error page returns a 200 OK server-side response code. You need to make sure that whenever a requested page does not exist, the server returns a 404 response code along with the error-page content in the body copy.
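A quick way to hunt for soft 404s is to compare each page's status code with what its body says. The sketch below flags the mismatch from a (status, body) pair; the error markers are illustrative guesses, so tune them to the wording of your own error template, and fetch the real status and body with your HTTP client of choice.

```python
# Phrases that typically appear in an error template (adjust for your site).
ERROR_MARKERS = ("page not found", "404", "does not exist")

def looks_like_soft_404(status_code, body_text):
    """A soft 404: the body reads like an error page, but the server
    still answers 200 OK, so crawlers treat it as a real page."""
    body = body_text.lower()
    return status_code == 200 and any(m in body for m in ERROR_MARKERS)

print(looks_like_soft_404(200, "<h1>Page not found</h1>"))  # soft 404
print(looks_like_soft_404(404, "<h1>Page not found</h1>"))  # proper 404
print(looks_like_soft_404(200, "<h1>Welcome</h1>"))         # real page
```

Any URL flagged this way should be reconfigured so the server sends a genuine 404 status code alongside the error-page content.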
Timeout
If any of your web pages takes too long to load, the Google bot will fail to crawl it; the page will be treated as a timeout error and flagged accordingly in Google Webmaster Tools. You need to check the server log to get more details about the error. DNS lookup timeouts and URL timeouts are some of the common types of timeout errors.
Unreachable
This error can be generated by several factors. For example, the server may return an internal server error, or there may be DNS issues. In some cases, URLs are labeled Unreachable when the robots.txt file blocks the crawler from visiting a certain page.
Originally posted 2015-10-06 23:05:36.