How to keep your site up in a disaster with a DNS redirect plan

In a previous article, I discussed how using Border Gateway Protocol (BGP) to redirect routes can be an effective method for maintaining Web services in the event that the host Web site is rendered inaccessible.

While cumbersome, this method is effective if implemented properly. However, a simpler method using the Domain Name System (DNS) to reroute Web server queries can also provide a basic Web presence when the primary data center is offline. I prepared a plan for this method and soon had to put it into place to help a partner company in dire straits. Here's how I did it.

Tweaking DNS

Thus far, I've focused on rerouting the IP address of the Web server to another network. Any time you fiddle with route tables, bad things can, and often do, happen. By going the opposite route and changing the IP address to which the www A (address) record points, you aren't constrained by the location of the temporary Web server. It can literally be placed anywhere, without rerouting IP networks.

The DNS solution requires some preparation work as well, and while not too difficult, it is vital. DNS relies on primary and secondary servers to be authoritative for a particular domain. So long as at least one of these servers is located outside of the disaster zone and is published as authoritative for the zone, the www A record can be changed and propagated to other DNS servers. In this case, the assumption is made that all authoritative DNS servers for the zone except one (the remote secondary DNS server) are located on the affected network and are therefore inaccessible.

Hopefully, during the disaster the domain administrator has some network connectivity (be it cell modem, land line or in a physical location not affected by the outages) for accessing the remote secondary DNS server. If the remote site is running Berkeley Internet Name Domain (BIND) on Linux, Secure Shell (SSH) needs to be enabled; Windows running the Microsoft DNS server should have Remote Desktop enabled.

If access isn't possible, a remote site administrator should be accessible via phone to relay instructions. If voice communications aren't possible, the remote administrator can be instructed in advance on the changes necessary to point the URL to the new location. Obviously it's important to communicate these plans (including passwords) before a disaster.

Ideally, the remote location should also be the host of the emergency Web server (such as Apache on Linux, IIS on Microsoft or another Web server package) so the remote administrator can post emergency messages as well as make the DNS changes, but this isn't necessary. In other words, the two can be separate, and this actually allows for several Web sites to be prepared in advance.

So what are the changes? First, if the remote DNS server isn't the primary for the domain (and in most cases it shouldn't be in a normal environment), promote it to be the primary. Second, change the www A record to the new IP address. In the Microsoft DNS console, right-click the record and change its properties. With BIND, the nsupdate utility can be used to delete the www record and insert a new one.
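To make the BIND step concrete, here is a minimal Python sketch that builds the command text nsupdate would read to delete the old www A record and add the new one. The server name, zone name and IP address are placeholders chosen for illustration (they do not come from the incident described here), and the sketch assumes the server has been configured to accept dynamic updates.

```python
def build_nsupdate_script(server, zone, name, new_ip, ttl=3600):
    """Return nsupdate commands that replace the A record for `name`."""
    lines = [
        f"server {server}",                     # DNS server that receives the update
        f"zone {zone}",                         # zone that contains the record
        f"update delete {name} A",              # drop the existing A record(s)
        f"update add {name} {ttl} A {new_ip}",  # publish the emergency address
        "send",                                 # transmit the update
    ]
    return "\n".join(lines) + "\n"


# All values below are hypothetical placeholders.
script = build_nsupdate_script(
    server="ns2.example.com",
    zone="example.com",
    name="www.example.com.",
    new_ip="203.0.113.10",
)
print(script)
```

The resulting text can be saved to a file and passed to nsupdate, typically together with a key file (the -k option) if the server requires authenticated updates.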

Time To Live

Regardless of DNS platform, the Time To Live (TTL) value for the record should be set to 3600. This is the number of seconds that other DNS servers are told to cache the www record. To fully understand this, a brief review of how DNS works is necessary.

If a user on the Internet wishes to access the Web site, the user puts its URL in the Web browser, and the computer then sends a query to the DNS server(s) configured on the client machine to resolve the host name to an IP address. Note that clients don't usually manually configure their DNS parameters; this is accomplished via Dynamic Host Configuration Protocol (DHCP).

If the DNS server at the client's ISP doesn't hold the record in its cache, it will query one of the published DNS servers for the domain. The amount of time that a record is cached is determined by the TTL. By default, most DNS servers assign a TTL of 86400 to all records; this is the number of seconds before the cached entry expires. This default is found in the Start of Authority (SOA) record or, in BIND zone files, in the $TTL directive.
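The caching behavior just described can be illustrated with a toy model. This is only a sketch of the idea, not how any real resolver is implemented: the class name, the host name and both IP addresses are invented for the example, and time is passed in as a plain number of seconds.

```python
class CachingResolver:
    """Toy model of an ISP resolver that caches answers until the TTL expires."""

    def __init__(self, authoritative):
        self.authoritative = authoritative  # name -> (ip, ttl_seconds)
        self.cache = {}                     # name -> (ip, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]                # still fresh: answer from the cache
        ip, ttl = self.authoritative[name]  # missing or expired: ask the source
        self.cache[name] = (ip, now + ttl)
        return ip


# Hypothetical zone data with the one-day default TTL.
zone = {"www.example.com": ("192.0.2.10", 86400)}
isp = CachingResolver(zone)

first = isp.resolve("www.example.com", now=0)      # cached until t=86400
zone["www.example.com"] = ("203.0.113.10", 86400)  # emergency change at the source
during = isp.resolve("www.example.com", now=3600)  # an hour later: old IP from cache
after = isp.resolve("www.example.com", now=86400)  # cache expired: new IP
print(first, during, after)
```

In this model the ISP's server keeps handing out the old address until its cached copy expires, which is why the TTL in force before the change matters more than the TTL set on the new record.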

The significance of the TTL should be obvious, as it determines how fast a DNS change propagates throughout the Internet. Since 86,400 seconds denotes one day, it may be preferable to configure the normal www A record with a TTL of 3,600 seconds. That way, it can be assured that the new www A record will propagate across the entire Internet within an hour. This is a must when the goal is to establish an emergency Web presence as quickly as possible.
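The arithmetic is easy to check. The short Python sketch below converts the two TTL values into durations and computes the latest moment at which a cached copy of the old record could still be in use; the change time is an arbitrary example, and the calculation assumes that caches honor the published TTL.

```python
from datetime import datetime, timedelta


def latest_cache_expiry(changed_at, ttl_seconds):
    """Worst case: a resolver cached the old record just before the change."""
    return changed_at + timedelta(seconds=ttl_seconds)


changed_at = datetime(2006, 1, 1, 9, 0)  # hypothetical time of the DNS change
for ttl in (86400, 3600):
    duration = timedelta(seconds=ttl)
    print(ttl, "seconds =", duration, "-> old record gone by",
          latest_cache_expiry(changed_at, ttl))
```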

Other DNS TTL settings for zone positive and negative response caching may also be tweaked. These will be contained in the SOA record. However, since enabling a temporary Web server requires altering only one record it often is best to leave the entire zone defaults as they are.

You must ensure, though, that the serial number in the SOA for the zone in which the www record resides is incremented. Usually this occurs automatically when using the proper tool for making record changes (for example, nsupdate in Linux). However, if manually editing the zone file, the serial number must be incremented and the DNS server process restarted.
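For manual edits, the serial number only has to increase. Many administrators use a date-based YYYYMMDDNN format, which is a convention rather than something DNS requires; the sketch below assumes that convention, and the function name is made up for illustration.

```python
from datetime import date


def next_serial(current, today):
    """Return a serial number larger than `current`, date-based when possible."""
    dated = int(today.strftime("%Y%m%d")) * 100  # e.g. 2006010200
    # If the zone was already edited today, or the old serial is ahead of
    # today's date, fall back to adding one so the value still increases.
    return max(dated, current + 1)


print(next_serial(2006010100, date(2006, 1, 2)))  # first edit of the day
print(next_serial(2006010200, date(2006, 1, 2)))  # second edit the same day
```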

From this point, getting the necessary information out is as simple as editing an HTML file on the new Web server. This isn't a time for flashy graphics or databases. Simple communication of status, plans and emergency contact phone numbers can go a long way on the road to recovery.
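As an illustration of how little the emergency page needs, the following Python sketch generates a plain HTML file containing a status message and contact numbers. It is one possible way to produce the file, not the method used in the incident described in this article, and the company name, message and phone number are placeholders.

```python
from html import escape


def render_status_page(company, status, contacts):
    """Build a bare-bones HTML status page; contacts is a list of (label, number)."""
    items = "\n".join(
        f"    <li>{escape(label)}: {escape(number)}</li>"
        for label, number in contacts
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(company)} - Service Status</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(company)}</h1>\n"
        f"  <p>{escape(status)}</p>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n</html>\n"
    )


# Placeholder content for illustration only.
page = render_status_page(
    "Example Corp",
    "Our data center is offline. Updates will be posted here.",
    [("Emergency hotline", "+1 555 0100")],
)
print(page)
```

The output can be saved as the default document (for example, index.html) on whichever Web server hosts the emergency site.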

The Test

Shortly after I created and implemented this plan, a situation arose in which a partner company was taken off-line because of a massive power and facilities disruption. This company hosted its primary DNS server onsite, and its ISP managed the company's secondary DNS server. The partner company, having heard of my plan, asked me if there was a method to quickly redirect its Web site so as to provide timely emergency information.

I contacted the ISP and explained to their DNS administrator the DNS redirect concept. While the company's www A record TTL was set for the default of one day, it was clear that Web hosting would not be possible at the damaged site for possibly a week. So even a delay of up to a day in the redirect was preferable to being off-line for a week.

The ISP declined to host a temporary Web site. Therefore, I suggested creating a temporary server on the main corporate site and redirecting the www record to that server's IP. Again, in this case, a hefty Web server was not necessary, as a simple page explaining the situation was all that was needed.

A Microsoft Windows Server 2003 machine that had been running IIS as a test Web server was recommissioned as the emergency Web server. Remote access for the partner company's Web administrator was granted via Remote Desktop. The DNS changes were made, and within a couple of hours the Web site was up. Granted, because of the TTL issue, some users couldn't access the partner company's Web site at first. But as the DNS changes propagated, more clients were able to access the page, until complete access was attained within a day.

After the emergency passed, I asked the ISP to change the www A record to the original IP address, and because the emergency record's TTL was 3600, the new information propagated within an hour. The crisis was over, normal operations resumed, and some business continuity was maintained during the crisis because critical information was available during the shutdown.


While I am sure there are multiple methods to provide basic emergency Web services, for me the DNS redirect method proved to be the most economical and easiest to implement. As I stated in the beginning of the previous article, I wished that there was a plan available to follow when I was tasked to prepare for such a situation. By relating these experiences, my goal is to fill that void by providing two possible road maps to achieve an acceptable solution.

Copyright © 2006 IDG Communications, Inc.
