From Fedora Project Wiki

Revision as of 03:43, 8 December 2012 by Pavlix (talk | contribs) (Conclusion)

Flag AI_ADDRCONFIG considered harmful

As far as I know, AI_ADDRCONFIG was added for the following reasons:

  • Some buggy DNS servers would be confused by AAAA requests
  • Optimization of DNS queries to only ask for useful addresses

Currently, I'm aware of several documents that define AI_ADDRCONFIG:

  • POSIX.1-2008: useless but harmless
  • RFC 3493 (informational): useless but (partially) breaks IPv4/IPv6 localhost
  • RFC 2553 (obsolete informational): useless but hopefully harmless
  • man getaddrinfo: like RFC 3493

The current glibc getaddrinfo() code doesn't behave strictly according to any of these definitions including its own manual page. Under some conditions it fails to translate literal addresses and non-DNS names like localhost, localhost4, localhost6, and any other names you put in /etc/hosts (e.g. the hostname). In Fedora, there was a patch that further broke link-local IPv6 addresses, but it has been removed recently.

I first ran into this on a laptop with virtualization, but you can hit the problem very easily even as an ordinary user. The symptom is an unexpected failure of software that uses node-local TCP/IP communication.

Problem statement

The choice whether to use AI_ADDRCONFIG is made by developers of software that uses TCP/IP networking. Those developers cannot always anticipate whether the software will be used for node-local, link-local or global scope networking, just as they cannot anticipate whether the software will connect using an IPv4 or IPv6 address. The getaddrinfo() function is here to provide a universal interface independent of address family and scope.

There is a huge number of critical or less critical services that can be accessed globally, through a link-local IPv6 address or through one of the two localhost addresses. If localhost is broken, you never know what else will break because of it. It can be a file service such as NFS, FTP or HTTP, a remote access protocol such as SSH, a database service, a mail service, a system configuration service, a print service or anything else.

The proper function of AI_ADDRCONFIG requires that:

1) The usual processing of all node-local and link-local names and addresses is preserved as long as the respective addresses are present.

2) The global name resolution is not affected by the existence or nonexistence of node-local and link-local addresses.

Unfortunately, the current implementation of getaddrinfo() mostly follows the informational RFC 3493, which fails on both #1 and #2. Filtering addresses based on the existence of a global address of that family is one big mistake. That way you filter out addresses that are by no means global. And RFC 3493 doesn't even get that right, as it says to ignore loopback addresses instead of saying to count only global addresses. That means link-local addresses are treated as global.
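The difference between the two counting rules can be sketched in a few lines of Python. This is only an illustration, not the glibc code: the function names are made up for this example, "global" is simplified to mean neither loopback nor link-local, and the list of local addresses is a hypothetical host.

```python
import ipaddress

def scope_of(address):
    """Classify an address literal as node-local, link-local or global.

    "global" is used loosely: anything that is neither loopback
    nor link-local.
    """
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "node-local"
    if ip.is_link_local:
        return "link-local"
    return "global"

def family_enabled_rfc3493(local_addresses, version):
    """RFC 3493-style rule: any non-loopback address enables the family."""
    return any(
        ipaddress.ip_address(a).version == version and scope_of(a) != "node-local"
        for a in local_addresses
    )

def family_enabled_global_only(local_addresses, version):
    """Alternative rule: only global addresses enable the family."""
    return any(
        ipaddress.ip_address(a).version == version and scope_of(a) == "global"
        for a in local_addresses
    )

# Hypothetical host: both loopback addresses plus one link-local IPv6 address.
local = ["127.0.0.1", "::1", "fe80::1"]
for version in (4, 6):
    print(
        "IPv{}:".format(version),
        "rfc3493-style =", family_enabled_rfc3493(local, version),
        "global-only =", family_enabled_global_only(local, version),
    )
```

On this host the RFC 3493-style rule treats IPv6 as configured (because of the link-local address) and IPv4 as not configured, while the global-only rule treats neither family as configured.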

Also, the standards are unclear on whether a global address assigned to a loopback interface is considered a loopback address. I would say no, as it's not a node-local address, but does everyone read the standards the same way? Standards shouldn't be written like fiction books, where everyone uses their imagination to fill in the missing pieces.

That said, AI_ADDRCONFIG is all about heuristics, about avoiding both false negatives and false positives. Only the routing decision can be used as a test of whether a particular host is considered potentially reachable. And the routing decision is the right place to make the final decision whether an address should be contacted.

Potential benefits

The potential benefits of AI_ADDRCONFIG are more than questionable. If it hasn't been a problem that AI_ADDRCONFIG doesn't even work in most cases (see the tests), why should it be a problem if it's just ignored?

If the benefit of not querying DNS records you don't need is important enough to have a special flag for it, it should not do anything else than that. It should particularly not do any filtering of non-DNS results, otherwise you can be sure you'll get into problems.

Not querying IPv6 records is only really useful in an IPv4-only network and vice versa. The (recommended) behavior of such a flag should be precisely specified and should be done exactly the way it's described in the documentation (or its behavior should be precisely described in the documentation).

I don't see any benefits at all in filtering non-DNS results. Applications using getaddrinfo() cycle through all the results and try to connect() to each address until one succeeds (or all of them have been tried). This works for both TCP and UDP. For unreachable hosts, connect() simply fails.
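The loop described above looks roughly like this in Python. It is a minimal sketch for TCP; the function name connect_first is made up for this example, and the standard library's socket.create_connection() already does much the same thing.

```python
import socket

def connect_first(host, port):
    """Try each getaddrinfo() result in order; return the first connected socket."""
    last_error = None
    # No AI_ADDRCONFIG here: every result is tried and connect() decides.
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, socket.AF_UNSPEC, socket.SOCK_STREAM
    ):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as error:
            # Address family not supported on this host: try the next result.
            last_error = error
            continue
        try:
            sock.connect(sockaddr)
        except OSError as error:
            # Unreachable or refused: close the socket and try the next result.
            sock.close()
            last_error = error
            continue
        return sock
    raise last_error or OSError("getaddrinfo() returned no addresses")
```

An address of a family that cannot be reached costs one failed connect() and nothing more, which is why filtering the results in advance buys so little.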

Tests

Tested with glibc 2.16.0.

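#!/usr/bin/python3
# Resolve literal addresses, non-DNS names and DNS names with AI_ADDRCONFIG
# and print the resulting addresses (or the resolver error) for each.
from socket import *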
hosts = [
    None,
    "localhost",
    "127.0.0.1",
    "localhost4",
    "::1",
    "localhost6",
    "195.47.235.3",
    "2a02:38::1001",
    "info.nix.cz",
    "www.google.com",
]
for host in hosts:
    print("getaddrinfo host=\"{}\" hints.ai_flags=AI_ADDRCONFIG:".format(host))
    try:
        for item in getaddrinfo(host, "http", AF_UNSPEC, SOCK_STREAM, SOL_TCP, AI_ADDRCONFIG):
            # item is (family, type, proto, canonname, sockaddr); print the address only.
            print("  {}".format(item[4][0]))
    except gaierror as error:
        print("  !! {} !!".format(error))

The desired result may not be well defined in this case. For now I'm using a simple definition that says:

1) Don't break non-DNS results. You never know when you need them.

2) Filter DNS results based on the presence of global IPv4 and global IPv6 addresses (with a simplified definition of global that means not node-local and not link-local).

Feel free to offer better definitions of what constitutes a desired result.
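To make the simple definition concrete, here is a minimal Python sketch of it. It works on lists of address strings rather than on real getaddrinfo() results, the function names and all addresses in the example are hypothetical, and returning no DNS results at all when the host has no global address of either family is my assumption, as the definition doesn't cover that case.

```python
import ipaddress

def counts_as_global(address):
    """Simplified definition of global: not node-local and not link-local."""
    ip = ipaddress.ip_address(address)
    return not (ip.is_loopback or ip.is_link_local)

def desired_filter(results, from_dns, local_addresses):
    """Apply the two rules to a list of resolved address strings.

    results         -- addresses produced by name resolution
    from_dns        -- True if the results came from DNS
    local_addresses -- addresses configured on the local host
    """
    # Rule 1: never touch non-DNS results (literals, /etc/hosts names).
    if not from_dns:
        return list(results)
    # Rule 2: keep only DNS results whose family has a global local address.
    families = {
        ipaddress.ip_address(a).version
        for a in local_addresses
        if counts_as_global(a)
    }
    return [r for r in results if ipaddress.ip_address(r).version in families]

# Hypothetical host with global IPv4 and only link-local IPv6.
local = ["127.0.0.1", "::1", "fe80::1", "192.0.2.10"]
print(desired_filter(["::1", "127.0.0.1"], False, local))
print(desired_filter(["192.0.2.80", "2001:db8::80"], True, local))
```

The first call leaves the localhost addresses untouched; the second drops the IPv6 result because the host has no global IPv6 address.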

The documented result is what follows from the manual page. Note that the manual page's definition of AI_ADDRCONFIG is roughly the same as that of RFC 3493 but substantially different from that of POSIX.1-2008.

Host with only 127.0.0.1 and ::1 addresses

Desired result: All addresses and all non-DNS names should work.

Documented result: Nothing should work.

Actual result: Same as desired result, different from documented result.

Broken addresses: None (127.0.0.1, ::1 according to documentation).

Host with 127.0.0.1, ::1 and at least one link-local IPv6 address

Desired result: All addresses and all non-DNS names should work.

Documented result: Only IPv6 addresses should work. Non-DNS names should only give IPv6 addresses.

Actual result: Same as documented result, different from desired result.

Broken addresses: 127.0.0.1

Host with global IPv4, link-local IPv6 (and DNS)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv4 addresses.

Documented result: Unlimited address resolution (like without AI_ADDRCONFIG).

Actual result: Same as documented, different from desired.

Host with global IPv4 (and DNS), without link-local IPv6 (like non-ethernet links)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv4 addresses.

Documented result: Only IPv4 addresses should work. Both non-DNS and DNS names should only give IPv4 addresses.

Actual result: Same as documented, different from desired.

Broken addresses: ::1

Host with global IPv6 (and DNS)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv6 addresses.

Documented result: Only IPv6 addresses should work. Both non-DNS and DNS names should only give IPv6 addresses.

Actual result: Same as documented result, different from desired result.

Broken addresses: 127.0.0.1

Host with both IPv4 and IPv6 addresses (and DNS, of course)

Desired and documented result: Unlimited address resolution (like without AI_ADDRCONFIG).

Actual result: Same as desired and documented. Everything works.

Conclusion

The whole idea of filtering out non-DNS addresses is flawed and unfortunate.

Proposed solutions:

1a) Remove all code that deals with AI_ADDRCONFIG, effectively disabling it in the general getaddrinfo() code (patch).

1b) Modify the code to disable all the filtering while keeping the gethostbyname* function selection which in turn affects DNS queries.

2a) Remove AI_ADDRCONFIG in all software that uses it. Deprecate AI_ADDRCONFIG and prevent/reject modifications that add it to any software. Can be used together with #1a.

2b) Implement workarounds over AI_ADDRCONFIG in all software.

3) Implement getaddrinfo() in the name service switch (which is a good idea in itself). Implement AI_ADDRCONFIG in the DNS plugin. This must be used together with #1a to have any effect.

I (Pavel Šimerda) favor solution #3 (together with #1a, of course), which is the cleanest but also the most difficult to implement (thanks to Tore Anderson for the idea). As a temporary solution, #1b would be a logical improvement of the current situation. I would even have a problem with plain #1a, which assumes that DNS optimizations are not necessary.

Solution #2a is advocated by Michal Kubeček from SUSE and was also proposed as an option by Tore Anderson, as was solution #2b. I don't like either of them.

More resources:

Examples of software using AI_ADDRCONFIG