From Fedora Project Wiki

Revision as of 17:19, 10 December 2012

Current implementation of AI_ADDRCONFIG considered harmful

AI_ADDRCONFIG was added in order to optimise DNS query traffic, so that only useful addresses are queried for. In other words, an IPv4-only node should not query its upstream resolver for IN AAAA resource records, while an IPv6-only node should not query for IN A resource records.

AI_ADDRCONFIG is defined in several places:

  • POSIX.1-2008
  • RFC 3493 (informational)
  • RFC 2553 (obsolete informational)
  • man getaddrinfo: like RFC 3493

The current glibc getaddrinfo() code doesn't behave strictly according to any of these definitions, including its own manual page.

The choice of whether to use AI_ADDRCONFIG is made by the developers of software that uses TCP/IP networking. It is not enabled by default.
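As an illustration of how an application opts in, here is a minimal sketch using Python's socket module, which wraps the C library's getaddrinfo(). The helper name resolve() and its use_addrconfig parameter are made up for this example; only socket.getaddrinfo() and socket.AI_ADDRCONFIG come from the library.

```python
import socket


def resolve(host, service, use_addrconfig=False):
    """Return the addresses getaddrinfo() yields for host.

    resolve() is a hypothetical helper; the flag is only passed
    when the caller explicitly asks for it.
    """
    flags = socket.AI_ADDRCONFIG if use_addrconfig else 0
    infos = socket.getaddrinfo(
        host, service, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, flags
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return [info[4][0] for info in infos]
```

Calling resolve("localhost", "http") and resolve("localhost", "http", use_addrconfig=True) on the same machine is a quick way to see whether the flag changes the result set there.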

Problem statement

The proper function of AI_ADDRCONFIG requires that:

  1. The usual processing of all node-local and link-local names and addresses is preserved as long as the respective addresses are present.
  2. The global name resolution is not affected by the existence or non-existence of node-local and link-local addresses.
  3. IN AAAA DNS queries should not be transmitted from a node with no global IPv6 address, and vice versa: IN A queries should not be transmitted from a node with no global IPv4 address.

Unfortunately, the current implementation of getaddrinfo() mostly follows the informational RFC 3493, which fails requirements #1 and #2 and partially fails #3.

The standards are unclear on whether a global address assigned to a loopback interface is considered a loopback address. The current implementation does not consider it to be one.

AI_ADDRCONFIG is a best-effort heuristic to determine whether a node is IPv4-only, IPv6-only, or dual-stacked. While only a routing lookup can be used as a definitive test whether or not a particular destination host is considered potentially reachable, AI_ADDRCONFIG's heuristics are applied before the actual address of the destination is known, so a routing lookup cannot be used.

Problem 1: Node-local and link-local networking

Software developers cannot always anticipate whether their software will be used for node-local networking, link-local networking or global scope networking, just as they cannot anticipate whether the software will connect using an IPv4 or IPv6 address. The getaddrinfo() function is here to provide a universal interface independent of address family and scope.

There is a huge number of critical or less critical services that can be accessed globally, through a link-local IPv6 address or through one of the two localhost addresses. If localhost is broken, you never know what else will break because of it. The affected service can be a file service (including NFS, FTP and HTTP), a remote access protocol (including SSH), a database service, a mail service, a system configuration service, a print service or anything else.

Filtering getaddrinfo()'s result set based on non-existence of a global address of that family is a mistake, as this will filter out addresses that are not global.

In particular, symptoms of this problem are:

  • On IPv4-only nodes, getaddrinfo() with AI_ADDRCONFIG will fail to yield any results for nodenames such as ::1, fe80::1%eth0, and localhost6.
  • On IPv6-only nodes, getaddrinfo() with AI_ADDRCONFIG will fail to yield any results for nodenames such as 127.0.0.1 and localhost4.
  • On a single-stack node, getaddrinfo() with AI_ADDRCONFIG will fail to yield both IPv4 and IPv6 results for nodenames such as localhost and the system hostname (assuming it's present in /etc/hosts).

This leads to bug reports from users such as:

Problem 2: IN AAAA DNS query suppression from Ethernet-connected IPv4-only hosts

The current implementation of AI_ADDRCONFIG treats IPv6 link-local addresses as an indicator that IN AAAA DNS queries should not be suppressed. On Ethernet, an IPv6 link-local address is usually configured automatically on every interface, even when the interface is not connected to a network with any IPv6 service. This defeats the purpose of AI_ADDRCONFIG, as IN AAAA DNS queries are transmitted even though the host really has no IPv6 connectivity.

This leads to bug reports from users such as:

A patch was applied to Fedora that attempted to improve the DNS filtering logic by ignoring IPv6 link-local addresses when determining whether or not to apply AI_ADDRCONFIG. While this patch solved the IN AAAA DNS query suppression problem, it aggravated the problem described in bug 808147. It was therefore eventually reverted.
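The difference between counting and ignoring link-local addresses can be sketched with Python's ipaddress module. This is a simplified model of the heuristic as described in this section, not glibc's actual code; the function name, its parameter and the sample addresses are illustrative assumptions.

```python
import ipaddress


def has_ipv6_indicator(addresses, count_link_local=True):
    """Model: does any configured address suggest IPv6 connectivity?

    Hypothetical helper. Loopback addresses never count; link-local
    addresses count only when count_link_local is True.
    """
    for text in addresses:
        ip = ipaddress.ip_address(text)
        if ip.version != 6 or ip.is_loopback:
            continue
        if ip.is_link_local and not count_link_local:
            continue
        return True
    return False


# Illustrative addresses of an IPv4-only host on Ethernet: the
# fe80:: address is autoconfigured although no IPv6 service exists.
ethernet_ipv4_only = [
    "127.0.0.1",
    "::1",
    "192.0.2.10",
    "fe80::21a:2bff:fe3c:4d5e",
]
```

With count_link_local=True the model reports an IPv6 indicator for this host, so IN AAAA queries would be sent; with count_link_local=False it does not, which corresponds to the idea behind the reverted patch.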

Benefits of AI_ADDRCONFIG

The perceived benefits of AI_ADDRCONFIG (if implemented perfectly) are:

  • Reduction of network traffic from single-stack nodes.
  • Reduction of DNS server load generated by single-stacked nodes.
  • Potential reduction of getaddrinfo()'s run-time on single-stack nodes, as it would only need to wait for one DNS response instead of two.
  • Avoidance of bugs in single-stack DNS servers that do not correctly cope with queries for the "opposite" record type.

Note that all these benefits relate exclusively to the suppression of DNS queries.

No benefits associated with glibc's current filtering of getaddrinfo()'s result set have been identified. The filtering may even discard results that originated from DNS in the first place, e.g., results that were cached by NSCD from an earlier call to getaddrinfo() that did not use AI_ADDRCONFIG.

Applications using getaddrinfo() will usually loop through all the results and try connect()/sendto() for each address until one succeeds (or all have been tried). This works for both TCP and UDP. For unreachable hosts, connect()/sendto() simply fails. Each entry in getaddrinfo()'s result set includes the entry's address family, so an application that is only interested in a specific address family may skip to the next result immediately if the entry's address family isn't the desired one. For these reasons, it is considered harmless to return all available results to the application, even if it requested ai_flags = AI_ADDRCONFIG (as long as it did not additionally specify ai_family = AF_INET or AF_INET6, that is).
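The loop described above might look like the following sketch for TCP. The function name connect_first() and its parameters are hypothetical; the pattern itself (try each result, skip unwanted families, stop at the first success) is the one the paragraph describes.

```python
import socket


def connect_first(host, port, family=socket.AF_UNSPEC, flags=0, timeout=5.0):
    """Return a socket connected to the first reachable address.

    Hypothetical helper: tries every getaddrinfo() result in order
    and raises OSError if none of them can be connected to.
    """
    last_error = None
    results = socket.getaddrinfo(
        host, port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, flags
    )
    for af, socktype, proto, _canonname, sockaddr in results:
        if family != socket.AF_UNSPEC and af != family:
            continue  # not the family the caller wants; skip cheaply
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as error:
            # Unreachable or refused: remember the error, try the next one.
            last_error = error
            if sock is not None:
                sock.close()
    raise last_error or OSError("no usable address for {!r}".format(host))
```

An unreachable address only costs one failed connect() attempt here, which is why extra entries in the result set do no harm to such a caller.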

Tests

Tested with glibc 2.16.0.

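#!/usr/bin/python3
# Resolve a set of literals and names with AI_ADDRCONFIG and print the results.
from socket import AF_UNSPEC, AI_ADDRCONFIG, SOCK_STREAM, SOL_TCP, gaierror, getaddrinfo

hosts = [
    None,  # no node name: getaddrinfo() returns the loopback addresses
    "localhost",
    "127.0.0.1",
    "localhost4",
    "::1",
    "localhost6",
    "195.47.235.3",
    "2a02:38::1001",
    "info.nix.cz",
    "www.google.com",
]
for host in hosts:
    print('getaddrinfo host="{}" hints.ai_flags=AI_ADDRCONFIG:'.format(host))
    try:
        for item in getaddrinfo(host, "http", AF_UNSPEC, SOCK_STREAM, SOL_TCP, AI_ADDRCONFIG):
            # item[4] is the sockaddr tuple; its first element is the address.
            print("  {}".format(item[4][0]))
    except gaierror as error:
        print("  !! {} !!".format(error))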

The desired result may not be well defined in this case. The simple definition used here is as follows:

1) Don't filter any non-DNS results under any circumstance.

2) Filter DNS queries based on the presence of global IPv4 and global IPv6 addresses (with a simplified definition of global that means not node-local and not link-local).
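The two rules above can be sketched as a small model. The function names are hypothetical, and the handling of a host with no global address at all (no DNS queries) is an assumption that follows a strict reading of rule 2.

```python
import ipaddress


def is_global(text):
    """Simplified 'global': neither node-local nor link-local."""
    ip = ipaddress.ip_address(text)
    return not ip.is_loopback and not ip.is_link_local


def dns_query_types(addresses):
    """Rule 2: choose DNS record types from the global addresses."""
    versions = {
        ipaddress.ip_address(text).version
        for text in addresses
        if is_global(text)
    }
    types = []
    if 4 in versions:
        types.append("A")
    if 6 in versions:
        types.append("AAAA")
    return types


def filter_non_dns_results(results):
    """Rule 1: results from non-DNS sources are never filtered."""
    return list(results)
```

For example, a host with 127.0.0.1, ::1, a global IPv4 address and an fe80:: link-local address would query only for IN A records under this model, while ::1 and localhost6 would still resolve because they never reach DNS.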

The documented result is what follows from the manual page. Note that the manual page's definition is roughly the same as RFC 3493 but substantially different from POSIX.1-2008.

Host with only 127.0.0.1 and ::1 addresses

Desired result: All addresses and all non-DNS names should work.

Documented result: Nothing should work.

Actual result: Same as desired result, different from documented result.

Broken addresses: None (127.0.0.1, ::1 according to documentation).

Host with 127.0.0.1, ::1 and at least one link-local IPv6 address

Desired result: All addresses and all non-DNS names should work.

Documented result: Only IPv6 addresses should work. Non-DNS names should only give IPv6 addresses.

Actual result: Same as documented result, different from desired result.

Broken addresses: 127.0.0.1

Host with global IPv4, link-local IPv6 (and DNS)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv4 addresses.

Documented result: Unlimited address resolution (like without AI_ADDRCONFIG).

Actual result: Same as documented, different from desired.

Host with global IPv4 (and DNS), without link-local IPv6 (like non-Ethernet links)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv4 addresses.

Documented result: Only IPv4 addresses should work. Both non-DNS and DNS names should only give IPv4 addresses.

Actual result: Same as documented, different from desired.

Broken addresses: ::1

Host with global IPv6 (and DNS)

Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv6 addresses.

Documented result: Only IPv6 addresses should work. Both non-DNS and DNS names should only give IPv6 addresses.

Actual result: Same as documented result, different from desired result.

Broken addresses: 127.0.0.1

Host with both IPv4 and IPv6 addresses (and DNS, of course)

Desired and documented result: Unlimited address resolution (like without AI_ADDRCONFIG).

Actual result: Same as desired and documented. Everything works.

Conclusions

  • Filtering out non-DNS addresses from getaddrinfo()'s result set is flawed and unfortunate.
  • Using IPv6 link-local addresses as an indicator to issue IN AAAA queries is flawed, as they will be present on most IPv4-only hosts with connected Ethernet interfaces.

Proposed solutions

1a) Remove all code that deals with AI_ADDRCONFIG, effectively disabling it in the general getaddrinfo() code (patch). Pros: Solves all known problematic cases relating to filtering of (non-DNS) results. Cons: Breaks the DNS query suppression functionality. Undermines applications that are consciously using AI_ADDRCONFIG.

1b) Modify the code to disable all the result set filtering while keeping the gethostbyname* function selection which in turn affects suppression of DNS queries. Patch here. Pros: Solves problematic cases relating to filtering of IP literal lookups. May be combined with #4 to improve IN AAAA DNS query suppression logic. Cons: None?

2a) Remove AI_ADDRCONFIG in all software that uses it. Deprecate AI_ADDRCONFIG and prevent/reject modifications that add it to any software. Can be used together with #1a. Pros: Solves all known problematic cases relating to filtering of (non-DNS) results. Cons: Same as for #1a, and in addition it would be a monumental task, especially considering that AI_ADDRCONFIG is a cross-platform feature.

2b) Implement workarounds over AI_ADDRCONFIG in all software. Pros and cons: Same as for #2a.

3) Implement getaddrinfo() in the name service switch (which is a good idea in itself). Implement AI_ADDRCONFIG in the DNS plugin. This must be used together with #1a, to bring any effect. Pros: Solves problematic cases relating to filtering of IP literal lookups and non-DNS hostnames. May be combined with #4 to improve IN AAAA DNS query suppression logic. Cons: None?

4) Ignore any link-local IPv6 addresses when determining whether to apply AI_ADDRCONFIG logic on otherwise IPv4-only nodes (patch). Pros: Makes the DNS filtering logic work as expected on hosts connected to IPv4-only Ethernet segments. Cons: Breaks getaddrinfo() for IPv6 node- or link-local nodenames on hosts connected to IPv4-only Ethernet segments.
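To give an idea of what an application-level workaround in the sense of #2b could look like, here is one possible sketch: retry without AI_ADDRCONFIG when the flagged lookup yields nothing. The function name is hypothetical, and this is only one of several conceivable workarounds, not something prescribed by any of the proposals.

```python
import socket


def getaddrinfo_with_fallback(host, port,
                              family=socket.AF_UNSPEC,
                              socktype=socket.SOCK_STREAM):
    """Look up host with AI_ADDRCONFIG, retrying without it on failure.

    Hypothetical workaround: keeps DNS query suppression in the common
    case, but still resolves names such as ::1 on an IPv4-only node.
    """
    try:
        return socket.getaddrinfo(
            host, port, family, socktype, 0, socket.AI_ADDRCONFIG
        )
    except socket.gaierror:
        # The flagged lookup produced no results; ask again unfiltered.
        return socket.getaddrinfo(host, port, family, socktype, 0, 0)
```

Note that this only helps when the filtered lookup fails outright. It does not restore addresses that were silently dropped from a non-empty result set, such as one of the two families for localhost.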

Various people have various preferences on how to approach the problem:

Pavel Šimerda: Favors solution #3 (implying #1a), which is cleanest but the most difficult to implement. As a temporary solution, #1b would be a logical improvement of the current situation. He would not even have a problem with plain #1a, which assumes that DNS optimizations are not necessary.

Michal Kubeček (SuSE): Advocates solution #2a, as well as solution #2b.

Tore Anderson: Favours #3 or #1b - ideally combined with #4. Opposed to #1a and #2a because they will prevent applications from using/requesting DNS query filtering. Believes #2a and #2b are unfeasible.

More resources:

Examples of software using AI_ADDRCONFIG