How to correctly parse IPv6 addresses (in C and Java)
I recently started to do some bug fixing in GNU Netcat. One of the things I worked on was better support for IPv6. In principle, IPv6 support was added to GNU Netcat quite some time ago on the trunk (aka 0.8-cvs), but it turned out that it doesn’t really work. After fixing two obvious bugs (c8c0234, 714dcc5), I stumbled over another interesting issue.
One experiment I wanted to do with Netcat was to connect to another host over IPv6 using a link-local address. With IPv6, a link-local address is assigned automatically to each interface that has a MAC address (i.e. all Ethernet interfaces, but not the loopback interface). The IPv6 address is derived from the MAC address and is unique (because MAC addresses are unique). E.g. an interface with the MAC address 08:00:27:84:0b:e2 would get the following IPv6 address: fe80::a00:27ff:fe84:be2.
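To make the derivation concrete, here is a minimal sketch of the classic modified EUI-64 scheme: the fe80::/64 prefix, the MAC address split in half with ff:fe inserted in the middle, and the universal/local bit of the first byte flipped. The helper name mac_to_link_local is made up for this illustration, and note that many current systems are configured to use randomized or stable-privacy interface identifiers instead, so a real interface may not carry exactly this address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Netcat): build the modified EUI-64
 * link-local address for a 48-bit MAC address and write its text form
 * into buf. Returns buf on success, NULL on failure (as inet_ntop does). */
const char *mac_to_link_local(const unsigned char mac[6], char *buf, socklen_t buflen)
{
    struct in6_addr addr;

    assert(mac != NULL && buf != NULL);
    memset(&addr, 0, sizeof addr);

    addr.s6_addr[0] = 0xfe;           /* fe80::/64 link-local prefix */
    addr.s6_addr[1] = 0x80;

    addr.s6_addr[8] = mac[0] ^ 0x02;  /* flip the universal/local bit */
    addr.s6_addr[9] = mac[1];
    addr.s6_addr[10] = mac[2];
    addr.s6_addr[11] = 0xff;          /* ff:fe goes between the two halves */
    addr.s6_addr[12] = 0xfe;
    addr.s6_addr[13] = mac[3];
    addr.s6_addr[14] = mac[4];
    addr.s6_addr[15] = mac[5];

    return inet_ntop(AF_INET6, &addr, buf, buflen);
}
```

Called with the bytes 08 00 27 84 0b e2 and a buffer of INET6_ADDRSTRLEN characters, this should yield the address quoted above.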
The problem with link-local addresses is that because of the way they are defined, the routing code in the operating system has no clue which interface it has to use in order to send packets to such an address. Here is where zone IDs come into play. The zone ID (also called scope ID) is a new feature in IPv6 that has no equivalent in IPv4. Basically, in the case considered here, it identifies the interface through which packets have to be sent (but the concept is more general).
Together with the concept of zone ID, the IPv6 specification also introduced a distinct notation to represent an address with an associated zone ID:
<address>%<zone_id>
In the case considered here, the zone ID is simply the interface name (at least, that is how it works on Linux and Mac OS X). E.g. assuming that the remote host with MAC address 08:00:27:84:0b:e2 is attached to the same network as the eth0 interface on the local host, the complete address including the zone ID would be:
fe80::a00:27ff:fe84:be2%eth0
This address can indeed be used with programs such as SSH to connect to the remote host. Unfortunately that didn’t work with GNU Netcat:
$ netcat -6 fe80::a00:27ff:fe84:be2%eth0 22
Error: Couldn't resolve host "fe80::a00:27ff:fe84:be2%eth0"
That raises the question of how to correctly parse host parameters (passed on the command line or read from a configuration file) such that IPv6 addresses with zone IDs are recognized. It turns out that Netcat was using the following strategy:
- Attempt to use inet_pton to parse the host parameter as an IPv4 or IPv6 address.
- If the host parameter is neither parsable as an IPv4 address nor an IPv6 address, assume that it is a host name and use gethostbyname to look up the corresponding address.
The problem with that strategy is that although inet_pton and gethostbyname both support IPv6 addresses, they don’t understand zone IDs. That is to be expected because both functions produce an in6_addr structure, but the zone ID is part of the corresponding socket address structure sockaddr_in6.
To fully support IPv6, several enhancements have been introduced in the Unix socket APIs. In our context the getaddrinfo function is the most relevant one. It is able to parse IP addresses and to translate host names, but in contrast to inet_pton and gethostbyname it produces sockaddr_in6 (or sockaddr_in) structures and fully supports zone IDs.
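As an illustration, a getaddrinfo based replacement could look roughly like the sketch below. The function name resolve_endpoint and the choice of TCP (SOCK_STREAM) are assumptions made for the example, and for brevity it keeps only the first result, whereas robust code would try each returned address in turn.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve host and service into one socket address.
 * Returns 0 on success or a getaddrinfo error code (see gai_strerror). */
int resolve_endpoint(const char *host, const char *service,
                     struct sockaddr_storage *out, socklen_t *outlen)
{
    struct addrinfo hints, *res;
    int rc;

    assert(host != NULL && out != NULL && outlen != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* assumption: a TCP connection */

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0)
        return rc;

    /* Copy the whole socket address; for IPv6 this includes sin6_scope_id. */
    memcpy(out, res->ai_addr, res->ai_addrlen);
    *outlen = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}
```

For an input such as fe80::a00:27ff:fe84:be2%eth0 with service 22, the zone ID ends up in the sin6_scope_id field of the resulting sockaddr_in6 (as an interface index), and the structure can be passed to connect as is.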
In conclusion, to write C code that supports all types of IP addresses, including IPv6 addresses with zone IDs, use the following approach:
- Don’t use inet_pton and gethostbyname; always use getaddrinfo.
- Don’t assume that the information to connect to a remote host can be stored separately as a host address (in_addr or in6_addr) and a port number: that is only true for IPv4, but not for IPv6. Instead you should always use a socket address so that the zone ID can be stored as well. Obviously there are use cases where the host address and port number need to be processed at different places in the code (consider e.g. a port scanner that takes a host address/name and a port range). In those cases, you can still use getaddrinfo, but with a NULL value for the service argument. You then have to store the partially filled socket address and complete the port number later.
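The second guideline, for the port scanner style use case, might be sketched as follows. Both helper names (resolve_host and set_port) are invented for this example, and TCP is again assumed. The host is resolved once with a NULL service, which leaves the port at 0, and the port is filled in later for whichever address family was returned.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve the host only. The service argument is NULL,
 * so the port in the returned socket address is left at 0. */
int resolve_host(const char *host, struct sockaddr_storage *out, socklen_t *outlen)
{
    struct addrinfo hints, *res;
    int rc;

    assert(host != NULL && out != NULL && outlen != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;  /* assumption: TCP */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;
    memcpy(out, res->ai_addr, res->ai_addrlen);  /* first result only */
    *outlen = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}

/* Hypothetical helper: complete a partially filled socket address.
 * Returns 0 on success, -1 for an unsupported address family. */
int set_port(struct sockaddr_storage *ss, unsigned short port)
{
    assert(ss != NULL);
    switch (ss->ss_family) {
    case AF_INET:
        ((struct sockaddr_in *)ss)->sin_port = htons(port);
        return 0;
    case AF_INET6:
        ((struct sockaddr_in6 *)ss)->sin6_port = htons(port);
        return 0;
    default:
        return -1;
    }
}
```

A scanner would call resolve_host once and then, for each port in the range, copy the stored sockaddr_storage, call set_port on the copy and hand it to connect. The zone ID travels along in the socket address the whole time.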
Unfortunately, fixing existing code to respect those guidelines may require some extensive changes.
Interestingly, things are much easier and much more natural in Java. In fact, Java considers that the zone ID is part of the host address (an Inet6Address instance in this case) so that the socket address (InetSocketAddress) simply comprises a host address and port number, exactly as in IPv4. This means that any code that uses the standard InetAddress.getByName method to parse an IP address will automatically support IPv6 addresses with zone IDs. Note that this is true even for code not specifically written with IPv6 support in mind (and even for code written before the introduction of IPv6 support in Java 1.4), unless of course the code casts the returned InetAddress to an Inet4Address or is not prepared to encounter a : in the host address, e.g. because it uses it as a separator between the host address and the port number.