Bind: zone transfer failed

Problem:


Domain name resolution stopped working without apparent cause; DNS lookups in a specific zone are no longer possible.

Problems like this may occur if the DNS database backend contains wrong or corrupted entries. They may appear all of a sudden because the DNS server caches may keep a working copy of the zone for some time, so the failure only shows up later.

You may find different error messages in /var/log/syslog and/or /var/log/daemon.log:

1. “dns_sdb_put… failed for”



Jun  3 13:21:30 master named[4191]: zone domain.test/IN: Transfer started.
Jun  3 13:21:30 master named[4191]: transfer of 'domain.test/IN' from 127.0.0.1#7777: connected using 127.0.0.1#44046
Jun  3 13:21:30 master named[4172]: LDAP sdb zone 'domain.test': dns_sdb_put... failed for client.domain.test.
Jun  3 13:21:30 master named[4172]: LDAP sdb zone 'domain.test': dns_sdb_put... failed for client.domain.test.
Jun  3 13:21:30 master named[4191]: transfer of 'domain.test/IN' from 127.0.0.1#7777: failed while receiving responses: SERVFAIL

2. “zone … : NS has no address records”:

Aug  8 12:30:19 master named[31910]: zone domain.test/IN: NS 'ns.domain.test' has no address records (A or AAAA)
Aug  8 12:30:19 master named[31910]: zone domain.test/IN: NS 'ns.domain.test' has no address records (A or AAAA)
Aug  8 12:30:19 master named[31910]: zone domain.test/IN: not loaded due to errors. 

3. “refresh: unexpected rcode (SERVFAIL) from master 127.0.0.1#7777 (source 0.0.0.0#0)”

Aug 29 11:16:47 master named[4708]: zone 68.39.10.in-addr.arpa/IN: refresh: unexpected rcode (SERVFAIL) from master 127.0.0.1#7777 (source 0.0.0.0#0)
Aug 29 11:16:47 master named[4708]: zone 177.38.10.in-addr.arpa/IN: refresh: unexpected rcode (SERVFAIL) from master 127.0.0.1#7777 (source 0.0.0.0#0)
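
When a log contains many such lines, it can help to sort them by message type first. The following is a minimal Python sketch, not part of any Univention tool; the regular expressions are derived only from the three example messages above, and the category names and the function name classify are made up for illustration.

```python
import re

# Patterns derived from the three example messages in this article.
# The category names (dictionary keys) are hypothetical labels.
PATTERNS = {
    "sdb_put_failed": re.compile(
        r"dns_sdb_put\.\.\. failed for (?P<name>\S+)"),
    "ns_without_address": re.compile(
        r"NS '(?P<name>[^']+)' has no address records"),
    "refresh_servfail": re.compile(
        r"zone (?P<name>\S+)/IN: refresh: unexpected rcode \(SERVFAIL\)"),
}


def classify(line):
    """Return (category, name) for a known message type, otherwise None.

    'name' is the record, nameserver, or zone mentioned in the message.
    """
    for category, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return category, match.group("name")
    return None
```

Feeding each line of /var/log/syslog or /var/log/daemon.log through classify and collecting the results that are not None gives a quick overview of which records and zones are affected.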

Solution:

To find and fix the corrupted records, trigger the zone transfer manually and look at the corresponding log entries.

1. Manually trigger zone transfer

univention-bind reads its DNS data from the LDAP backend. The zones are then transferred to univention-bind-proxy for caching. To trigger the transfer from univention-bind, use port 7777 as the DNS port:

dig @127.0.0.1 -p 7777 domain.test axfr
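
If several zones need to be checked, the same dig call can be repeated from a script. This is only a sketch: it assumes dig is installed and that the backend listens on 127.0.0.1 port 7777 as described above; the function names axfr_command and run_transfers are hypothetical. The script does not interpret the result, since the relevant messages end up in the log files.

```python
import subprocess


def axfr_command(zone, server="127.0.0.1", port=7777):
    """Build the dig call used above for one zone."""
    return ["dig", "@" + server, "-p", str(port), zone, "axfr"]


def run_transfers(zones):
    """Run one zone transfer per zone and return dig's output per zone.

    The output is returned unchanged; check syslog/daemon.log afterwards.
    """
    outputs = {}
    for zone in zones:
        result = subprocess.run(axfr_command(zone),
                                capture_output=True, text=True)
        outputs[zone] = result.stdout
    return outputs
```

For example, run_transfers(["domain.test", "68.39.10.in-addr.arpa"]) would trigger a transfer for both zones used in the examples.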

2. Analyze /var/log/syslog and find records

In the first example, the problematic record is client.domain.test or is related to it.
A good first step is to look at the corresponding alias record via udm-cli:

udm dns/alias list \
  --superordinate zoneName=domain.test,cn=dns,dc=domain,dc=test \
  --filter cname=client.domain.test.

If there is no obvious problem, have a look at all related objects as well.
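
With many failing records, the names can be pulled out of the log and turned into the matching udm calls. The sketch below assumes the log format shown in the first example; the function names are hypothetical, and the LDAP base (dc=domain,dc=test in this article) has to be passed in, because it differs per installation.

```python
import re

# Matches the "dns_sdb_put... failed for" lines from the first example.
FAILED = re.compile(
    r"LDAP sdb zone '(?P<zone>[^']+)': "
    r"dns_sdb_put\.\.\. failed for (?P<record>\S+)")


def failed_records(lines):
    """Return unique (zone, record) pairs in order of first appearance."""
    pairs = []
    for line in lines:
        match = FAILED.search(line)
        if match:
            pair = (match.group("zone"), match.group("record"))
            if pair not in pairs:
                pairs.append(pair)
    return pairs


def udm_alias_command(zone, record, ldap_base):
    """Build the udm call from above for one zone and record."""
    return ["udm", "dns/alias", "list",
            "--superordinate", "zoneName=%s,cn=dns,%s" % (zone, ldap_base),
            "--filter", "cname=%s" % record]
```

The log usually repeats the same failure several times; failed_records reports each zone and record combination only once.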

In the second example, the cause is evidently a nonexistent nameserver ns.domain.test configured for the zone domain.test. If the error message is not that clear, as in the third example, it may help to raise the debug level and investigate the messages surrounding the failure message, for example:

ucr set dns/debug/level=99
/etc/init.d/bind9 restart
grep -B10 "not loaded due to errors" /var/log/daemon.log 
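
The same kind of context search can be done in Python, for example when the log excerpt is processed further by a script. This is a rough sketch with a hypothetical function name; unlike grep -B10, it returns one separate block per match and does not merge overlapping context.

```python
from collections import deque


def lines_before(lines, needle, before=10):
    """Return one block per matching line.

    Each block holds up to 'before' preceding lines followed by the
    matching line itself.
    """
    history = deque(maxlen=before)  # sliding window of earlier lines
    blocks = []
    for line in lines:
        if needle in line:
            blocks.append(list(history) + [line])
        history.append(line)
    return blocks
```

A possible call would be lines_before(open("/var/log/daemon.log"), "not loaded due to errors"), assuming the log is readable by the user running the script.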

A common cause is a missing or broken nameserver or contact record in a DNS forward or reverse zone. To find (and possibly fix) those records, the attached script might be of use.

If you encounter messages like the following, take a closer look at the DNS host record object of the nameservers configured for the problematic zone(s). They might be corrupt or missing some attributes (like dNSTTL/zonettl).

Nov 19 08:51:01 master named[15984]: zone 177.38.10.in-addr.arpa/IN: could not find NS and/or SOA records
Nov 19 08:51:01 master named[15984]: zone 177.38.10.in-addr.arpa/IN: has 0 SOA records
Nov 19 08:51:01 master named[15984]: zone 177.38.10.in-addr.arpa/IN: has no NS records
Nov 19 08:51:01 master named[15984]: zone 177.38.10.in-addr.arpa/IN: not loaded due to errors.
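
To see at a glance which zones are affected and why, the zone messages can be grouped per zone. The sketch below relies only on the "zone <name>/IN: <message>" layout visible in the log excerpts of this article; the function names are hypothetical.

```python
import re

# "named[PID]: zone <name>/IN: <message>" as seen in the excerpts above.
ZONE_MSG = re.compile(r"named\[\d+\]: zone (?P<zone>\S+)/IN: (?P<msg>.+)")


def messages_by_zone(lines):
    """Map each zone to its distinct messages, in order of appearance."""
    grouped = {}
    for line in lines:
        match = ZONE_MSG.search(line)
        if not match:
            continue
        messages = grouped.setdefault(match.group("zone"), [])
        message = match.group("msg").strip()
        if message not in messages:
            messages.append(message)
    return grouped


def unloaded_zones(lines):
    """Return the zones that logged 'not loaded due to errors'."""
    return [zone for zone, messages in messages_by_zone(lines).items()
            if any("not loaded due to errors" in m for m in messages)]
```

For the four lines above, unloaded_zones would report 177.38.10.in-addr.arpa, and messages_by_zone lists the missing SOA and NS records as the reasons.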

attached files: check-dns-zone-syntax.py (3,4 KB)
