As you can imagine, running a website monitoring service means we see a lot of website failure scenarios. While every outage is unique, there are a lot of similarities. We thought it would be interesting to look at aggregate data to see what the most common failure scenarios are, starting with the status codes we get back when a website goes down.
As you can see, the largest number of failures don't have a status code assigned at all. This means there was a failure sometime before we were able to get a status code response from the target server. This could mean we failed to look up the server address, there was a network failure somewhere between the monitoring server and the target server, or there was a failure when attempting to open the connection with the server. If we take out this data point, you can see the status code data in more detail.
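To make the "no status code" bucket more concrete, here is a minimal Python sketch of how a check might tell these early failures apart. It is an illustration only, not how our monitoring servers are actually implemented: the function names `classify_failure` and `check` and the category labels are hypothetical, and the mapping from exception types to causes is a simplifying assumption.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an error raised before any HTTP response to a rough failure stage.

    The category names are hypothetical labels chosen for this sketch.
    """
    # urllib wraps low-level errors in URLError; unwrap to the underlying cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, socket.gaierror):
        return "dns_lookup_failed"        # could not look up the server address
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed_out"                # no answer within the time limit
    if isinstance(exc, ConnectionRefusedError):
        return "connection_refused"       # server reachable, connection not accepted
    if isinstance(exc, OSError):
        return "other_network_error"      # e.g. unreachable network, connection reset
    return "unknown"


def check(url, timeout=10):
    """Return (status_code, failure_stage); exactly one of the two is None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        # The server did answer, just with an error status such as 404 or 503.
        return err.code, None
    except Exception as err:
        # No status code at all: the failure happened earlier in the request.
        return None, classify_failure(err)
```

Calling `check` on a URL returns either a status code or one of the labels above, which is roughly the split between the status code failures and the "no status code" failures discussed here.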
Looking at just the status code errors, we see two broad ranges of issues. The first falls into the 400 range. 401 (Unauthorized) and 403 (Forbidden) are access errors, which means either the request lacks valid credentials, the current user doesn't have permission to access that particular resource, or the web server doesn't have access to the underlying file or service. 404 is the common Not Found error, which means that the resource may have been moved or deleted. If you see any of these errors on a site that was working fine, you may want to check to make sure something hasn't been changed.
Because this scenario is so common, we've created a tool to help you track down broken links on your site. You can periodically run our Vigil Broken Link Checker to look through your entire site for issues like these.
The second group of errors we see is in the 500 range. A 500 error is an Internal Server Error and usually indicates a failure or crash in an underlying component. It can also happen if a resource such as a database is not available. You will want to check your server's log files to see what might be going on.
A 503 (Service Unavailable) error, much like the closely related 502 (Bad Gateway) error, typically occurs when your site is sitting behind another service acting as an intermediary. This can be a proxy or, most often, a load balancer that takes incoming requests and feeds them to a number of backend servers. A 503 error may indicate that there are no backend servers available to fulfill a request, or that the gateway itself is overloaded. If you are using a large shared hosting provider, you may see an error like this at times.
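The status codes discussed above can be summarized in a small lookup. The following Python sketch is only an illustration: the `HINTS` table and the `describe` function are hypothetical names, and the hint text simply restates the explanations in this post. The standard reason phrases come from Python's built-in `http.HTTPStatus`.

```python
from http import HTTPStatus

# Hypothetical troubleshooting hints, paraphrased from the explanations above.
HINTS = {
    401: "check credentials and permissions",
    403: "check user permissions and the web server's access to the file or service",
    404: "the resource may have been moved or deleted",
    500: "check server logs; a component may have crashed or a database may be unavailable",
    502: "check the proxy or load balancer and the backends behind it",
    503: "no backend may be available, or the gateway may be overloaded",
}


def describe(code):
    """Return a one-line summary: standard phrase, error family, and a hint."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # not a status code Python knows about
    if 400 <= code < 500:
        family = "400 range"
    elif 500 <= code < 600:
        family = "500 range"
    else:
        family = "not an error range"
    hint = HINTS.get(code, "no specific hint")
    return f"{code} {phrase} ({family}): {hint}"


if __name__ == "__main__":
    for status in (403, 404, 500, 503):
        print(describe(status))
```

A table like this is easy to extend as new failure patterns show up, and it keeps the advice for each code in one place.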
Hopefully you found this information interesting. In a future installment we will look in more detail at that "other" category: all the failures that occur before we get a status code.