Last night, routine maintenance of Sweden’s top-level domain, .se, went seriously wrong, introducing an error that made DNS lookups for all .se domain names fail. The Swedish Internet effectively stopped working at that point: Swedish (.se) websites could not be reached, email to Swedish domain names stopped working, and for many users these problems still persist.
According to our sources inside the Swedish web hosting industry, the .se zone, the central record for the .se top-level domain, broke at 21:45 local time and was not returned to normal until 22:43 local time.
However, since DNS lookups are cached externally by Internet service providers (ISPs) and web hosting companies, the problems remained even after that. It wasn’t until around 23:30 local time last night that the major Swedish ISPs had flushed their own DNS caches, meaning that they cleared away the broken results so that new DNS lookups could start working properly again. If they had not done this, the problem could have persisted for up to 24 hours.
There are still a large number of smaller ISPs that have not yet fixed the problem. It is also likely that ISPs outside of Sweden are not aware of the incident, so the effects of the problem may remain there as well.
We (Pingdom) are based in Sweden, so we have witnessed the massive effects of this incident firsthand and also the widespread frustration from end users. The incident is also receiving a significant amount of media attention.
What exactly happened?
The problem occurred during planned maintenance of the .se domain. The .SE registry used an incorrectly configured script to update the .se zone, which introduced an error into every single .se domain name.
We have spoken to a number of industry insiders and what happened is that when updating the data, the script did not add a terminating “.” to the DNS records in the .se zone. That trailing dot is necessary in the settings for DNS to understand that “.se” is the top-level domain. It is a seemingly small detail, but without it, the whole DNS lookup chain broke down.
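To illustrate why the trailing dot matters, here is a minimal Python sketch of the usual zone-file convention: a name that ends in a dot is treated as absolute, while a name without one is treated as relative and gets the zone origin appended. We do not know exactly which records were affected or how the registry's script works, so the names below (`example.se`, `ns1`) and the `qualify` helper are hypothetical and only show the general mechanism.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name using the common master-file convention.

    Hypothetical helper for illustration; not the .SE registry's code.
    """
    if name.endswith("."):
        # Trailing dot: the name is already fully qualified.
        return name
    if name == "@":
        # "@" is shorthand for the zone origin itself.
        return origin
    # No trailing dot: the name is relative, so the origin is appended.
    return f"{name}.{origin}"


ORIGIN = "se."  # the zone being edited

# With the trailing dot, the name stays as written.
print(qualify("ns1.example.se.", ORIGIN))  # ns1.example.se.

# Without it, the origin is appended and the name no longer exists.
print(qualify("ns1.example.se", ORIGIN))   # ns1.example.se.se.
```

A resolver following a record like the second one would go looking for a name under "se.se", which is why a single missing character can break the whole lookup chain.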
The problems were made worse by the fact that DNS lookups are cached externally. Since DNS lookups are cached for a certain time and the .se zone has a 24-hour time-to-live (the length of time that information is cached by external DNS servers), the problem could last for up to 24 hours for some users.
Once the problem had been corrected, the solution was to “flush” the cache of external DNS servers, i.e. empty their cache, but this can only be done by those who control the DNS servers, usually ISPs and web hosting companies. End users have little control over this and are left at the mercy of their ISPs.
The implications
Pingdom monitors the uptime of tens of thousands of websites for our customers, and we often see downtime due to DNS problems. These problems are very common all over the world, but usually it’s a single domain name that has been incorrectly configured or the DNS servers of a single web host that are having problems. An entire top-level domain breaking is exceptionally rare.
Problems that affect an entire top-level zone have very wide-ranging effects, as can be seen from the .se incident. There are just over 900,000 .se domain names, and every single one of these was affected.
Imagine the same thing happening to the .com domain, which has over 80 million domain names. Although not all of these are actually in use by websites or for email, the effects would still be huge and cause an unprecedented amount of downtime across the entire Internet.
Update: According to a statement issued by the .SE registry the problem started at 21:45 local time, not 21:19 as we previously noted from our source. Changed this accordingly.