Speaking with quite a few network engineers over the last few months, I was shocked by the lack of real understanding of the Domain Name System (DNS). It shocked me because DNS is the one application-level function that is entirely network-based. DNS is the foundation of the Internet and, from a network engineer’s perspective, should be the foundation of the enterprise. If routing, firewalls and switching are working but DNS is not, things are still down. I can’t tell you how many times I have heard, “the network is up, DNS is down, but that’s not my job.”
If I achieve anything with this post, I hope it is that the average network engineer can help troubleshoot DNS alongside their system administrator counterparts, if not lead the effort. Here’s the bottom line up front: DNS is your job, period. Every network engineer, from the most junior to the most senior, must know the “ins and outs” of DNS as well as they know OSPF or Spanning Tree.
DNS – the basic architecture
At its simplest, DNS is a host asking for a name to be resolved to an IP address (IPv4 or IPv6). Pro Tip: when IPv6 is available, the host will typically send two queries: one for an IPv6 address (AAAA query) and one for an IPv4 address (A query). These queries have to be resolved by servers upstream, which can be Active Directory domain controllers or standalone recursive DNS servers (BIND, for example). Here’s a description of the other record types you will commonly see:
- CNAME – also referred to as an alias. It points one name at another hostname, which holds the real address records.
- MX – this is the mail exchanger record. It identifies the mail server for a domain. Email needs both this record and good reverse DNS to work properly.
- NS – name server record. These records list the authoritative servers for the domain. One of them is usually the server named in the Start of Authority (SOA) record, but not always.
- SOA – Start of Authority. This record names the primary name server for the zone and carries the zone’s serial number and timers. Records are added or removed on that primary; the other authoritative servers pick up the changes through zone transfers.
- PTR – Pointer Records. These are very important records for use with Reverse DNS. Many Internet services, mail servers in particular, use Reverse DNS to check that a request comes from a real or trusted domain. The record name is the address written backwards: IPv4 address 10.0.0.125 is 125.0.0.10.in-addr.arpa and IPv6 address 2001:db8::1:1 is 1.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa (one label per hex digit, with every zero written out).
- SRV – Service Record. These let a client find the host and port that provide a given service. Active Directory relies on them heavily, so you will mostly find these records inside your AD enterprise, and only outside of it if you are doing things like external AD trusts.
- DNSKEY and RRSIG – DNSSEC records. DNSKEY publishes the zone’s public keys and RRSIG carries the signature over a set of records, so a validating resolver can authenticate the answer. These are foundational to DNSSEC. If they expire or are revoked, odds are DNS is broken for anyone who validates. See a case study of that from NASA’s DNSSEC rollout.
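To make the record types above a little more concrete, here is a minimal sketch of what a query looks like on the wire, using only the Python standard library. The type codes are the registered DNS values; the function name `build_query`, the fixed query ID and the hostname are my own choices for illustration. It only builds the bytes and does not send anything.

```python
import struct

# Numeric type codes for the record types discussed above.
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12, "MX": 15,
          "AAAA": 28, "SRV": 33, "RRSIG": 46, "DNSKEY": 48}


def build_query(name: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: a 12-byte header plus one question."""
    # Header: ID, flags (only "recursion desired" set), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT and ARCOUNT all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: every label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS (1 = IN, the Internet class).
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)


# A dual-stack host asks the same question twice, once per address family.
for rrtype in ("AAAA", "A"):
    packet = build_query("www.example.com", rrtype)
    print(rrtype, len(packet), "bytes", packet.hex())
```

The two packets differ only in the two QTYPE bytes, which is why a capture on a dual-stack network shows DNS queries arriving in near-identical pairs.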
What a forward look-up domain zone may look like
For example, let’s use the domain example.com. Using the same forward lookup file that comes with BIND we can parse through how all this works. See below:
    $TTL 86400 ; 24 hours could have been written as 24h or 1d
    ; $TTL used for all RRs without explicit TTL value
    $ORIGIN example.com.
    @  1D  IN  SOA ns1.example.com. hostmaster.example.com. (
                   2002022401 ; serial
                   3H ; refresh
                   15 ; retry
                   1w ; expire
                   3h ; minimum
                   )
           IN  NS     ns1.example.com. ; in the domain
           IN  NS     ns2.smokeyjoe.com. ; external to domain
           IN  MX  10 mail.another.com. ; external mail provider
    ; server host definitions
    ns1    IN  A      192.168.0.1  ;name server definition
    www    IN  A      192.168.0.2  ;web server definition
    ftp    IN  CNAME  www.example.com.  ;ftp server definition
    ; non server domain hosts
    bill   IN  A      192.168.0.3
    fred   IN  A      192.168.0.4
- The first item in the zone file is TTL, or Time to Live. This means how long the zone’s record should stay cached within a local DNS resolver. If the Internet didn’t have this then it would be overwhelmed with DNS traffic constantly looking for the same address. That is why in most default settings 86400 (seconds) is standard. This means 1 Day.
- The SOA definition is next. The SOA in this case is ns1. There are other name servers (ns2), but only one server is the start of authority.
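- The serial is a number that must increase with every update to the zone. A common convention is the date plus a two-digit revision (YYYYMMDDnn), as in 2002022401. Secondary servers compare it with their own copy to decide whether to transfer the zone, so mismatched serials are a good place to look when servers disagree, for example when zone transfers feed other devices such as F5’s DNS Express zones.
- The remaining SOA values are timers: how often secondaries check for changes (refresh), how soon they try again after a failed check (retry), when they stop answering if the primary stays unreachable (expire), and how long a “no such name” answer may be cached (minimum).
- Within the zone all of the records are listed, including the MX record. An MX record always carries a preference value, and the lowest number is the preferred server. Additional servers are conventionally given higher values in increments of 10. If two servers have the same preference they are “round-robined,” meaning mail is spread roughly evenly between them.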
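The timer shorthand and the serial check described above can be sketched in a few lines of Python. This is an illustration under my own assumptions, not how BIND implements it: the helper names `to_seconds` and `serial_is_newer` are made up, only single-unit values such as 3H or 1w are handled, and the serial comparison follows the 32-bit serial number arithmetic of RFC 1982.

```python
import re

# Seconds per unit suffix used in zone files (case-insensitive).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def to_seconds(value: str) -> int:
    """Convert a zone-file time such as '86400', '3H' or '1w' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdw]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"not a simple zone-file time value: {value!r}")
    number, unit = match.groups()
    # A bare number is already in seconds.
    return int(number) * UNIT_SECONDS[unit or "s"]


def serial_is_newer(candidate: int, current: int) -> bool:
    """True if 'candidate' is ahead of 'current' in 32-bit serial arithmetic."""
    diff = (candidate - current) % 2**32
    return 0 < diff < 2**31


# SOA timers copied from the example.com zone above.
soa = {"refresh": "3H", "retry": "15", "expire": "1w", "minimum": "3h"}
for field, raw in soa.items():
    print(f"{field:8} {raw:>3} = {to_seconds(raw)} seconds")

# A secondary holding 2002022401 should transfer once the primary moves ahead.
print(serial_is_newer(2002022402, 2002022401))
```

If the primary and a secondary report the same serial but different data, someone edited the zone without bumping the serial, which is exactly the kind of fragmentation worth checking for first.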
What a sample reverse look-up zone may look like
    $TTL 86400 ; 24 hours could have been written as 24h or 1d
    $ORIGIN 0.168.192.IN-ADDR.ARPA.
    @  1D  IN  SOA ns1.example.com. mymail.example.com. (
                   2002022401 ; serial
                   3H ; refresh
                   15 ; retry
                   1w ; expire
                   3h ; minimum
                   )
    ; server host definitions
    1      IN  PTR    ns1.example.com.
    2      IN  PTR    www.example.com.
    ; non server domain hosts
    3      IN  PTR    bill.example.com.
    4      IN  PTR    fred.example.com.
- The zone begins with defining the scope. For this example, it’s the 192.168.0.0/24 subnet which is represented as 0.168.192.in-addr.arpa.
- Each host is then listed by its last octet (1 for 192.168.0.1), and instead of an A record it gets a PTR (pointer record) to the hostname.
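Since the reverse zone is just the forward zone turned around, the PTR lines can be derived from the A records. Here is a rough sketch using Python’s standard ipaddress module; the `FORWARD` table and the `ptr_lines` helper are hypothetical names of mine, and the output uses fully qualified owner names rather than the short 1, 2, 3 form shown in the zone file.

```python
import ipaddress

# Host addresses copied from the example.com forward zone.
FORWARD = {
    "ns1": "192.168.0.1",
    "www": "192.168.0.2",
    "bill": "192.168.0.3",
    "fred": "192.168.0.4",
}


def ptr_lines(hosts: dict, domain: str, network: str) -> list:
    """Build PTR lines for every host whose address falls inside 'network'."""
    net = ipaddress.ip_network(network)
    lines = []
    # Sort by address so the zone reads in numeric order.
    for host, addr in sorted(hosts.items(),
                             key=lambda kv: ipaddress.ip_address(kv[1])):
        ip = ipaddress.ip_address(addr)
        if ip in net:
            # reverse_pointer yields e.g. '1.0.168.192.in-addr.arpa'
            lines.append(f"{ip.reverse_pointer}. IN PTR {host}.{domain}.")
    return lines


for line in ptr_lines(FORWARD, "example.com", "192.168.0.0/24"):
    print(line)
```

Generating one zone from the other is a cheap way to keep forward and reverse records from drifting apart.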
What about Reverse DNS in IPv6?
    ; 2001:db8::/48
    ;
    ; Zone file built with the IPv6 Reverse DNS zone builder
    ; http://rdns6.com/
    ;
    $TTL 1h ; Default TTL
    @ IN SOA ns1.example.com. admin.example.com. (
        2013122601 ; serial
        1h ; slave refresh interval
        15m ; slave retry interval
        1w ; slave copy expire time
        1h ; NXDOMAIN cache time
        )
    ;
    ; domain name servers
    ;
    @ IN NS ns1.example.com.
    ; IPv6 PTR entries
    1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa. IN PTR host1.example.com.
    2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa. IN PTR host2.example.com.
- Very similar, but each “.” separates one nibble (a single hex digit) of the address. So in 2001:1:1::1, “2001” is a 16-bit chunk and “2” is a nibble. Every zero hidden by the “::” shorthand has to be written out, which gives 32 labels in all. I bet you can’t wait to type this stuff out! Fortunately, most DNS tools out there just ask you for the address and do all the heavy lifting.
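If you want to see the heavy lifting for yourself, the nibble reversal is only a few lines of Python with the standard ipaddress module. Treat this as a sketch: the helper names `nibbles`, `ptr_name` and `reverse_zone` are mine, and `reverse_zone` assumes the prefix length is a multiple of 4 so the zone cut falls on a nibble boundary.

```python
import ipaddress


def nibbles(address: str) -> list:
    """Return the 32 hex digits of an IPv6 address, most significant first."""
    # .exploded writes out every zero, e.g. 2001:0db8:0000:...:0001
    return list(ipaddress.IPv6Address(address).exploded.replace(":", ""))


def ptr_name(address: str) -> str:
    """Full PTR owner name: all 32 nibbles reversed, then ip6.arpa."""
    return ".".join(reversed(nibbles(address))) + ".ip6.arpa."


def reverse_zone(prefix: str) -> str:
    """Name of the ip6.arpa zone that covers a nibble-aligned prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen % 4:
        raise ValueError("prefix length must be a multiple of 4")
    kept = nibbles(str(net.network_address))[: net.prefixlen // 4]
    return ".".join(reversed(kept)) + ".ip6.arpa."


print(reverse_zone("2001:db8::/48"))
print(ptr_name("2001:db8::1"))
```

For the /48 in the zone file above, only the first 12 nibbles form the zone name; the remaining 20 are what each PTR entry has to spell out.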
Fun with DIG – DNS troubleshooting
Dig is a powerful tool that is installed by default, or is a package away, on most GNU/Linux and UNIX systems. The basic command is “dig” followed by a hostname (or an address with -x for a reverse lookup). Everything else is the cool stuff.
So let’s look at a few sample troubleshooting steps with DNS using dig:
- Dig all the records of the domain and see the response
    dig ANY www.tachyondynamics.com

    ; <<>> DiG 9.8.1-P1 <<>> ANY www.tachyondynamics.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48166
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 3, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;www.tachyondynamics.com. IN ANY

    ;; ANSWER SECTION:
    www.tachyondynamics.com. 12470 IN A 108.39.81.50
    www.tachyondynamics.com. 12470 IN AAAA 2001:470:e073:20:dcfd:16ee:baff:d0d0

    ;; AUTHORITY SECTION:
    tachyondynamics.com. 109670 IN NS ns2.dreamhost.com.
    tachyondynamics.com. 109670 IN NS ns1.dreamhost.com.
    tachyondynamics.com. 109670 IN NS ns3.dreamhost.com.

    ;; Query time: 4 msec
    ;; SERVER: 127.0.0.1#53(127.0.0.1)
    ;; WHEN: Thu Dec 26 16:02:12 2013
    ;; MSG SIZE rcvd: 149
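The question section shows what we asked for (“anything”), and the answer section holds both an A (IPv4) and an AAAA (IPv6) record. We also get the NS records for the domain in the authority section. Keep in mind that an ANY query sent to a caching resolver returns only what it happens to have cached, and many servers now give minimal answers to ANY (RFC 8482), so ask for specific record types when it matters.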
- Dig for only the mail servers under the domain
    dig MX tachyondynamics.com

    ; <<>> DiG 9.8.1-P1 <<>> MX tachyondynamics.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22207
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 7, AUTHORITY: 3, ADDITIONAL: 1

    ;; QUESTION SECTION:
    ;tachyondynamics.com. IN MX

    ;; ANSWER SECTION:
    tachyondynamics.com. 14400 IN MX 30 ASPMX2.GOOGLEMAIL.com.
    tachyondynamics.com. 14400 IN MX 30 ASPMX3.GOOGLEMAIL.com.
    tachyondynamics.com. 14400 IN MX 30 ASPMX4.GOOGLEMAIL.com.
    tachyondynamics.com. 14400 IN MX 30 ASPMX5.GOOGLEMAIL.com.
    tachyondynamics.com. 14400 IN MX 10 ASPMX.L.GOOGLE.com.
    tachyondynamics.com. 14400 IN MX 20 ALT1.ASPMX.L.GOOGLE.com.
    tachyondynamics.com. 14400 IN MX 20 ALT2.ASPMX.L.GOOGLE.com.

    ;; AUTHORITY SECTION:
    tachyondynamics.com. 109425 IN NS ns1.dreamhost.com.
    tachyondynamics.com. 109425 IN NS ns3.dreamhost.com.
    tachyondynamics.com. 109425 IN NS ns2.dreamhost.com.

    ;; ADDITIONAL SECTION:
    ns1.dreamhost.com. 14367 IN A 66.33.206.206

    ;; Query time: 76 msec
    ;; SERVER: 127.0.0.1#53(127.0.0.1)
    ;; WHEN: Thu Dec 26 16:06:17 2013
    ;; MSG SIZE rcvd: 293
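Dig returns the MX records in whatever order the server hands them out, so the preference values are easy to misread. As a small sketch, the answer section above can be pasted into Python and grouped by preference; the function name `mx_by_preference` is my own and the parsing assumes the six-column layout dig printed here.

```python
from itertools import groupby

# Answer section copied from the dig MX output above.
ANSWER_SECTION = """\
tachyondynamics.com. 14400 IN MX 30 ASPMX2.GOOGLEMAIL.com.
tachyondynamics.com. 14400 IN MX 30 ASPMX3.GOOGLEMAIL.com.
tachyondynamics.com. 14400 IN MX 30 ASPMX4.GOOGLEMAIL.com.
tachyondynamics.com. 14400 IN MX 30 ASPMX5.GOOGLEMAIL.com.
tachyondynamics.com. 14400 IN MX 10 ASPMX.L.GOOGLE.com.
tachyondynamics.com. 14400 IN MX 20 ALT1.ASPMX.L.GOOGLE.com.
tachyondynamics.com. 14400 IN MX 20 ALT2.ASPMX.L.GOOGLE.com.
"""


def mx_by_preference(answer_text: str) -> list:
    """Group MX exchangers by preference, most preferred (lowest) first."""
    records = []
    for line in answer_text.splitlines():
        fields = line.split()
        # Columns: name, TTL, class, type, preference, exchanger
        if len(fields) == 6 and fields[3] == "MX":
            records.append((int(fields[4]), fields[5]))
    records.sort()
    return [(pref, [host for _, host in group])
            for pref, group in groupby(records, key=lambda r: r[0])]


for preference, hosts in mx_by_preference(ANSWER_SECTION):
    print(preference, ", ".join(hosts))
```

Mail goes to the preference 10 server first; the two servers at 20 and the four at 30 are fallbacks, and servers sharing a value share the load.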
Why should network engineers even care?
More ways in which to use Dig are available here. The bottom line is that network engineering teams need to get smarter on DNS, because DNS problems often reveal more systemic issues that may be hurting the network, like:
- Firewalls blocking EDNS. This is Extension Mechanisms for DNS (EDNS0). Needed by DNSSEC and helpful for the larger answers that come with IPv6, it lets DNS responses over UDP grow from the original 512-byte limit to whatever size the client advertises, commonly 4096 bytes. Some firewall inspection policies (*cough Cisco ASA*) will drop anything over 512 bytes by default.
- Looking at common DNS server logs can reveal other EDNS issues upstream or downstream in the network:
success resolving ... (query etc) ... after reducing the advertised EDNS UDP packet size to 512 octets
success resolving ... (another query etc.) ... after disabling EDNS
- Some of the DNS servers configured in DHCP may no longer exist. A dead IPv4 or IPv6 resolver address makes clients wait for a timeout before trying the next server, and users notice that as slowness.
- IPv6. Need I say more? If you are doing or are about to do IPv6, you need working DNS or you will drive yourself crazy.
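For the EDNS fallback messages quoted in the list above, a quick way to see how often a resolver is being forced to shrink or drop EDNS is to count those lines in its log. This is only a sketch: the function name `edns_fallbacks` is mine, the pattern matches just the two message endings shown above, and the third sample line is made up to show that unrelated lines are ignored.

```python
import re

# Matches the two fallback endings quoted in the log lines above.
EDNS_FALLBACK = re.compile(
    r"after (?:reducing the advertised EDNS UDP packet size to "
    r"(?P<size>\d+) octets|disabling EDNS)"
)


def edns_fallbacks(lines) -> dict:
    """Count log lines where the resolver reduced or disabled EDNS."""
    counts = {"reduced": 0, "disabled": 0}
    for line in lines:
        match = EDNS_FALLBACK.search(line)
        if match is None:
            continue
        # The 'size' group is only set for the "reducing" variant.
        counts["reduced" if match.group("size") else "disabled"] += 1
    return counts


SAMPLE = [
    "success resolving ... (query etc) ... after reducing the advertised "
    "EDNS UDP packet size to 512 octets",
    "success resolving ... (another query etc.) ... after disabling EDNS",
    "an unrelated log line (hypothetical)",
]

print(edns_fallbacks(SAMPLE))
```

A steady stream of either message usually points at something in the path dropping large UDP DNS packets, which puts the problem squarely back on the network team.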