From Fedora Project Wiki

Revision as of 03:02, 8 December 2012

Flag AI_ADDRCONFIG considered harmful

As far as I know, AI_ADDRCONFIG was added for the following reasons:

  • Some buggy DNS servers would be confused by AAAA requests
  • Optimization of DNS queries to only ask for useful addresses

Currently, I'm aware of several documents that define AI_ADDRCONFIG:

  • POSIX.1-2008: useless but harmless
  • RFC 3493 (informational): useless but (partially) breaks IPv4/IPv6 localhost
  • RFC 2553 (obsolete informational): useless but hopefully harmless
  • man getaddrinfo: like RFC 3493

The current glibc getaddrinfo() code doesn't behave strictly according to any of these definitions including its own manual page. Under some conditions it fails to translate literal addresses and non-DNS names like localhost, localhost4, localhost6, and any other names you put in /etc/hosts (e.g. the hostname). In Fedora, there was a patch that further broke link-local IPv6 addresses, but it has been removed recently.

I first ran into this on a laptop with virtualization, but you can hit the problem very easily even as an ordinary user. The symptom is an unexpected failure of software that uses node-local TCP/IP communication.

Problem statement

The choice to use AI_ADDRCONFIG is made by the developers of software that uses TCP/IP networking. Those developers cannot always anticipate whether the software will be used for node-local, link-local or global networking, nor whether IPv4 or IPv6 will be used.

There is a huge number of critical or less critical services that can be accessed globally, through a link-local IPv6 address or through one of the two localhost addresses. If localhost is broken, you never know what else breaks. It can be a file service such as NFS, FTP or HTTP, a remote access protocol such as SSH, a database service, a mail service, a system configuration service, a print service or anything else.

The proper function of AI_ADDRCONFIG requires that:

1) The usual processing of all node-local and link-local names and addresses is preserved as long as the respective addresses are present.

2) The global name resolution is not affected by the existence or nonexistence of node-local and link-local addresses.

Unfortunately, the current implementation of getaddrinfo() mostly follows the informational RFC 3493, which fails in both #1 and #2. Filtering addresses based on existence of a global address of that family is one big mistake. That way you filter out addresses that are by no means global. And RFC 3493 doesn't get even that right, as it says to ignore loopback addresses instead of saying only count global addresses. That means link-local addresses are treated as global.
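
To make the difference between the two counting rules concrete, here is a minimal Python sketch. It is not glibc's code: the function names configured_rfc3493 and configured_global_only are made up for illustration, and the host's address list is written by hand rather than read from the system.

```python
import ipaddress

def configured_rfc3493(addresses, version):
    """RFC 3493 style rule: a family counts as configured if the host
    has at least one non-loopback address of that family."""
    return any(
        a.version == version and not a.is_loopback
        for a in map(ipaddress.ip_address, addresses)
    )

def configured_global_only(addresses, version):
    """Alternative rule: only count addresses that are neither
    node-local (loopback) nor link-local."""
    return any(
        a.version == version and not a.is_loopback and not a.is_link_local
        for a in map(ipaddress.ip_address, addresses)
    )

# Hypothetical host: loopback addresses plus one link-local IPv6 address.
host = ["127.0.0.1", "::1", "fe80::1"]
for version in (4, 6):
    print("IPv{}: rfc3493={} global_only={}".format(
        version,
        configured_rfc3493(host, version),
        configured_global_only(host, version)))
```

On this example host the RFC 3493 style rule treats IPv6 as configured purely because of the link-local address, while IPv4 is treated as absent, which is how 127.0.0.1 ends up filtered out.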

Also, the standards are unclear on whether a global address assigned to a loopback interface is considered a loopback address. I would say no, as it's not a node-local address, but does everyone read the standards the same way? Standards shouldn't be written like fiction books, where everyone uses their imagination to fill in the missing pieces.

That said, AI_ADDRCONFIG is all about heuristics, about avoiding both false negatives and false positives. Only the routing decision can be used to test whether a particular host is considered potentially reachable.
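
One way to ask the kernel for its routing decision without sending any traffic is to connect() a UDP socket to the candidate address. The following is a rough sketch of that idea, not something getaddrinfo() is documented to do for AI_ADDRCONFIG; has_route is a hypothetical helper name and the port number is arbitrary.

```python
import socket

def has_route(address, port=9):
    """Return True if the kernel accepts a UDP connect() to address.

    connect() on a datagram socket sends no packets; it only makes the
    kernel pick a route and a source address for the destination.
    """
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.connect((address, port))
        return True
    except OSError:
        # No route, or the address family is not available at all.
        return False

for candidate in ("127.0.0.1", "::1", "195.47.235.3", "2a02:38::1001"):
    print(candidate, has_route(candidate))
```

The result depends on the routing table of the machine running it, so a loopback-only host and a dual-stack host will print different answers for the global addresses.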

Potential benefits

The potential benefits of AI_ADDRCONFIG are more than questionable. If it hasn't been a problem that AI_ADDRCONFIG doesn't even work in most cases (see the tests), why should it be a problem if it's just ignored?

If the benefit of not querying DNS records you don't need is important enough to have a special flag for it, the flag should really *not* do anything other than that. It should not do any filtering of non-DNS results, otherwise you can be sure you'll run into problems.

Not querying IPv6 records is only really useful in an IPv4-only network and vice versa. The (recommended) behavior of such a flag should be precisely specified and should be done exactly the way it's described in the documentation.

I don't see *any* benefits at all in filtering non-DNS results. Applications using getaddrinfo() cycle through all the results and try to connect() to each address until one succeeds (or all of them have been tried). This works for both TCP and UDP. For unreachable hosts, connect() just fails.
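
The usual client loop looks roughly like the sketch below. The helper name connect_first is invented for this example, and the throwaway listener on 127.0.0.1 exists only so that the demonstration does not depend on any external network.

```python
import socket

def connect_first(host, port):
    """Try every address returned by getaddrinfo() until one connects."""
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as error:
            # Address family not supported on this host; try the next one.
            last_error = error
            continue
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as error:
            # Unreachable or refused; remember the error and move on.
            last_error = error
            sock.close()
    raise last_error or OSError("getaddrinfo returned no addresses")

# Demonstration against a throwaway listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

client = connect_first("127.0.0.1", port)
peer = client.getpeername()
print("connected to", peer)
client.close()
listener.close()
```

No AI_ADDRCONFIG is passed here: an address that cannot be reached simply costs one failed connect() before the loop moves on. Python's own socket.create_connection() follows the same pattern.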

Tests

Tested with glibc 2.16.0.

#!/usr/bin/python3
import sys
from socket import *
hosts = [
    None,
    "localhost",
    "127.0.0.1",
    "localhost4",
    "::1",
    "localhost6",
    "195.47.235.3",
    "2a02:38::1001",
    "info.nix.cz",
    "www.google.com",
]
for host in hosts:
    print("getaddrinfo host=\"{}\" hints.ai_flags=AI_ADDRCONFIG:".format(host))
    try:
        for item in getaddrinfo(host, "http", AF_UNSPEC, SOCK_STREAM, SOL_TCP, AI_ADDRCONFIG):
            print("  {}".format(item[4][0]))
    except gaierror as error:
        print("  !! {} !!".format(error))

The desired result may not be well defined in this case. For now I'm using a simple definition that says:

1) Don't break non-DNS results. You never know when you need them.

2) Filter DNS results based on the presence of global IPv4 and global IPv6 addresses (with a simplified definition of global that means not node-local and not link-local).

Feel free to offer better definitions of what constitutes a desired result.
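
The two rules of this working definition can be written down as a small Python sketch. Everything here is illustrative: counts_as_global and desired_filter are hypothetical names, the caller has to say whether the results came from DNS, and the local address list is supplied by hand (192.0.2.10 is a documentation address standing in for a global IPv4 address).

```python
import ipaddress

def counts_as_global(address):
    """Simplified definition: not node-local and not link-local."""
    a = ipaddress.ip_address(address)
    return not a.is_loopback and not a.is_link_local

def desired_filter(results, from_dns, local_addresses):
    """Apply the two rules to a list of resolved addresses.

    results         -- addresses produced by name resolution
    from_dns        -- True if they came from DNS, False for literals
                       and names from /etc/hosts
    local_addresses -- addresses assigned to this host
    """
    if not from_dns:
        # Rule 1: never touch non-DNS results.
        return list(results)
    # Rule 2: keep only families for which the host has a global address.
    families = {ipaddress.ip_address(a).version
                for a in local_addresses if counts_as_global(a)}
    return [r for r in results
            if ipaddress.ip_address(r).version in families]

# Hypothetical host with global IPv4 and only link-local IPv6.
local = ["127.0.0.1", "::1", "192.0.2.10", "fe80::1"]
dns_results = desired_filter(["195.47.235.3", "2a02:38::1001"], True, local)
hosts_results = desired_filter(["127.0.0.1", "::1"], False, local)
print(dns_results)
print(hosts_results)
```

For this host the DNS answer is reduced to its IPv4 address, while the localhost results pass through unchanged.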

The documented result is what follows from the manual page. Note that the manual page's definition of getaddrinfo() is roughly the same as RFC 3493 but substantially different from POSIX.1-2008.

Host with only 127.0.0.1 and ::1 names

Desired result: All addresses and all non-DNS names should work.

Documented result: Nothing should work.

Actual result: Same as desired result, different from documented result.

Broken addresses: None (127.0.0.1, ::1 according to documentation).

Host with 127.0.0.1, ::1 and at least one link-local IPv6 address

Desired result: All addresses and all non-DNS names should work.

Documented result: Only IPv6 addresses should work. Non-DNS names should only give IPv6 addresses.

Actual result: Same as documented result, different from desired result.

Broken addresses: 127.0.0.1

Host with global IPv4, link-local IPv6 (and DNS)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv4 addresses.

Documented result: Unlimited address resolution (like without AI_ADDRCONFIG).

Actual result: Same as documented, different from desired.

Host with global IPv4 (and DNS), without link-local IPv6 (like non-ethernet links)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv4 addresses.

Documented result: Only IPv4 addresses should work. Both non-DNS and DNS names should only give IPv4 addresses.

Actual result: Same as documented, different from desired.

Broken addresses: ::1

Host with global IPv6 (and DNS)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv6 addresses.

Documented result: Only IPv6 addresses should work. Both non-DNS and DNS names should only give IPv6 addresses.

Actual result: Same as documented result, different from desired result.

Broken addresses: 127.0.0.1

Host with both IPv4 and IPv6 addresses (and DNS, of course)

Desired and documented result: Unlimited address resolution (like without AI_ADDRCONFIG).

Actual result: Same as desired and documented. Everything works.

Making AI_ADDRCONFIG useful

A possible solution for the first problem (that AI_ADDRCONFIG is useless) is to treat link-local addresses the same as loopback (or node-local) addresses. But this is even more harmful.

Fedora's glibc was patched to do exactly that. The consequence was that even link-local IPv6 stopped working when a global IPv6 address was absent. And what would we have link-local addresses for if they didn't work without global addresses? This patch has already been reverted.

Conclusion

The whole idea of filtering out non-DNS addresses is flawed and breaks many things, including IPv4 and IPv6 literals. There is no reason to filter them out.

Proposed solutions:

1) Make getaddrinfo() ignore AI_ADDRCONFIG. It has not been working for years and nobody cared enough to fix it, so there is a substantial probability that it's not needed. Remove the code that implements it (patch).

1b) Make getaddrinfo() ignore AI_ADDRCONFIG only when filtering the results, but keep its behavior for the gethostbyname* function selection, which affects DNS results. The resulting behavior is something between #1 and #3.

2) Patch all software to avoid using AI_ADDRCONFIG. Follow new development, and prevent/reject modifications that add it. This is impractical.

3) Only process AI_ADDRCONFIG in the nsswitch DNS plugin. This requires implementing getaddrinfo() in nsswitch, which is required for zeroconf networking anyway. Use solution (1) as a temporary fix. Locally assigned addresses looked up through local DNS would still fail.
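
Until the library itself changes, the effect of solution #1 can be approximated inside a single application by clearing the flag before the call. This is only a sketch of that workaround, not the glibc patch mentioned above; getaddrinfo_ignoring_addrconfig is a hypothetical wrapper name.

```python
import socket

def getaddrinfo_ignoring_addrconfig(host, port, family=0, type=0,
                                    proto=0, flags=0):
    """Call socket.getaddrinfo() with AI_ADDRCONFIG cleared from flags."""
    return socket.getaddrinfo(host, port, family, type, proto,
                              flags & ~socket.AI_ADDRCONFIG)

# The caller still passes AI_ADDRCONFIG; the wrapper drops it.
# AI_NUMERICHOST keeps this demonstration free of DNS lookups.
results = getaddrinfo_ignoring_addrconfig(
    "127.0.0.1", 80, type=socket.SOCK_STREAM,
    flags=socket.AI_ADDRCONFIG | socket.AI_NUMERICHOST)
for item in results:
    print(item[4][0])
```

All other flags pass through untouched, so existing call sites can switch to the wrapper without further changes.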

Notes: Solution #2 is advocated by Michal Kubeček from SUSE. The third solution is the outcome of long discussions between me (Pavel Šimerda) and Tore Anderson, who explained to me the original purpose of AI_ADDRCONFIG. I would have no problem with just doing #1.

More resources:

Examples of software using AI_ADDRCONFIG