|
|
|
|
|
|||||||||||||||||||||||||||||||||||||||||
|
|
|
|||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
[Text version]
RFC: 814
David D. Clark MIT Laboratory for Computer Science Computer Systems and Communications Group July, 1982 1. Introduction It has been said that the principal function of an operating system- is to define a number of different names for the same object, so that it
- can busy itself keeping track of the relationship between all of the
- different names. Network protocols seem to have somewhat the same
characteristic. In TCP/IP, there are several ways of referring to- things. At the human visible interface, there are character string
- "names" to identify networks, hosts, and services. Host names are
- translated into network "addresses", 32-bit values that identify the
- network to which a host is attached, and the location of the host on
- that net. Service names are translated into a "port identifier", which
in TCP is a 16-bit value. Finally, addresses are translated into- "routes", which are the sequence of steps a packet must take to reach
- the specified addresses. Routes show up explicitly in the form of the
- internet routing options, and also implicitly in the address to route
- translation tables which all hosts and gateways maintain.
This RFC gives suggestions and guidance for the design of the- tables and algorithms necessary to keep track of these various sorts of
identifiers inside a host implementation of TCP/IP. 2 2. The Scope of the Problem One of the first questions one can ask about a naming mechanism is- how many names one can expect to encounter. In order to answer this, it
- is necessary to know something about the expected maximum size of the
- internet. Currently, the internet is fairly small. It contains no more
- than 25 active networks, and no more than a few hundred hosts. This
- makes it possible to install tables which exhaustively list all of these
- elements. However, any implementation undertaken now should be based on
an assumption of a much larger internet. The guidelines currently- recommended are an upper limit of about 1,000 networks. If we imagine
- an average number of 25 hosts per net, this would suggest a maximum
- number of 25,000 hosts. It is quite unclear whether this host estimate
- is high or low, but even if it is off by several factors of two, the
- resulting number is still large enough to suggest that current table
- management strategies are unacceptable. Some fresh techniques will be
- required to deal with the internet of the future.
3. Names As the previous section suggests, the internet will eventually have- a sufficient number of names that a host cannot have a static table
- which provides a translation from every name to its associated address.
- There are several reasons other than sheer size why a host would not
- wish to have such a table. First, with that many names, we can expect
- names to be added and deleted at such a rate that an installer might
- spend all his time just revising the table. Second, most of the names
will refer to addresses of machines with which nothing will ever be 3- exchanged. In fact, there may be whole networks with which a particular
- host will never have any traffic.
To cope with this large and somewhat dynamic environment, the- internet is moving from its current position in which a single name
- table is maintained by the NIC and distributed to all hosts, to a
- distributed approach in which each network (or group of networks) is
- responsible for maintaining its own names and providing a "name server"
- to translate between the names and the addresses in that network. Each
- host is assumed to store not a complete set of name-address
- translations, but only a cache of recently used names. When a name is
- provided by a user for translation to an address, the host will first
- examine its local cache, and if the name is not found there, will
- communicate with an appropriate name server to obtain the information,
- which it may then insert into its cache for future reference.
Unfortunately, the name server mechanism is not totally in place in- the internet yet, so for the moment, it is necessary to continue to use
- the old strategy of maintaining a complete table of all names in every
- host. Implementors, however, should structure this table in such a way
- that it is easy to convert later to a name server approach. In
- particular, a reasonable programming strategy would be to make the name
- table accessible only through a subroutine interface, rather than by
- scattering direct references to the table all through the code. In this
- way, it will be possible, at a later date, to replace the subroutine
- with one capable of making calls on remote name servers.
A problem which occasionally arises in the ARPANET today is that 4- the information in a local host table is out of date, because a host has
- moved, and a revision of the host table has not yet been installed from
- the NIC. In this case, one attempts to connect to a particular host and
- discovers an unexpected machine at the address obtained from the local
table. If a human is directly observing the connection attempt, the error is usually detected immediately. However, for unattended- operations such as the sending of queued mail, this sort of problem can
- lead to a great deal of confusion.
The nameserver scheme will only make this problem worse, if hosts- cache locally the address associated with names that have been looked
- up, because the host has no way of knowing when the address has changed
- and the cache entry should be removed. To solve this problem, plans are
- currently under way to define a simple facility by which a host can
- query a foreign address to determine what name is actually associated
with it. SMTP already defines a verification technique based on this- approach.
4. Addresses The IP layer must know something about addresses. In particular,- when a datagram is being sent out from a host, the IP layer must decide
- where to send it on the immediately connected network, based on the
- internet address. Mechanically, the IP first tests the internet address
- to see whether the network number of the recipient is the same as the
- network number of the sender. If so, the packet can be sent directly to
- the final recipient. If not, the datagram must be sent to a gateway for
further forwarding. In this latter case, a second decision must be 5- made, as there may be more than one gateway available on the immediately
- attached network.
When the internet address format was first specified, 8 bits were reserved to identify the network. Early implementations thus- implemented the above algorithm by means of a table with 256 entries,
- one for each possible net, that specified the gateway of choice for that
- net, with a special case entry for those nets to which the host was
- immediately connected. Such tables were sometimes statically filled in,
- which caused confusion and malfunctions when gateways and networks moved
- (or crashed).
The current definition of the internet address provides three- different options for network numbering, with the goal of allowing a
- very large number of networks to be part of the internet. Thus, it is
- no longer possible to imagine having an exhaustive table to select a
- gateway for any foreign net. Again, current implementations must use a
- strategy based on a local cache of routing information for addresses
- currently being used.
The recommended strategy for address to route translation is as follows. When the IP layer receives an outbound datagram for- transmission, it extracts the network number from the destination
- address, and queries its local table to determine whether it knows a
- suitable gateway to which to send the datagram. If it does, the job is
done. (But see RFC 816 on Fault Isolation and Recovery, for- recommendations on how to deal with the possible failure of the
gateway.) If there is no such entry in the local table, then select any 6- accessible gateway at random, insert that as an entry in the table, and
- use it to send the packet. Either the guess will be right or wrong. If
- it is wrong, the gateway to which the packet was sent will return an
- ICMP redirect message to report that there is a better gateway to reach
- the net in question. The arrival of this redirect should cause an
- update of the local table.
The number of entries in the local table should be determined by- the maximum number of active connections which this particular host can
- support at any one time. For a large time sharing system, one might
- imagine a table with 100 or more entries. For a personal computer being
- used to support a single user telnet connection, only one address to
- gateway association need be maintained at once.
The above strategy actually does not completely solve the problem,- but only pushes it down one level, where the problem then arises of how
- a new host, freshly arriving on the internet, finds all of its
- accessible gateways. Intentionally, this problem is not solved within
- the internetwork architecture. The reason is that different networks
- have drastically different strategies for allowing a host to find out
about other hosts on its immediate network. Some nets permit a- broadcast mechanism. In this case, a host can send out a message and
- expect an answer back from all of the attached gateways. In other
- cases, where a particular network is richly provided with tools to
- support the internet, there may be a special network mechanism which a
- host can invoke to determine where the gateways are. In other cases, it
may be necessary for an installer to manually provide the name of at 7- least one accessible gateway. Once a host has discovered the name of
- one gateway, it can build up a table of all other available gateways, by
- keeping track of every gateway that has been reported back to it in an
- ICMP message.
5. Advanced Topics in Addressing and Routing The preceding discussion describes the mechanism required in a- minimal implementation, an implementation intended only to provide
- operational service access today to the various networks that make up
- the internet. For any host which will participate in future research,
- as contrasted with service, some additional features are required.
- These features will also be helpful for service hosts if they wish to
- obtain access to some of the more exotic networks which will become part
- of the internet over the next few years. All implementors are urged to
- at least provide a structure into which these features could be later
- integrated.
There are several features, either already a part of the- architecture or now under development, which are used to modify or
- expand the relationships between addresses and routes. The IP source
- route options allow a host to explicitly direct a datagram through a
- series of gateways to its foreign host. An alternative form of the ICMP
- redirect packet has been proposed, which would return information
- specific to a particular destination host, not a destination net.
- Finally, additional IP options have been proposed to identify particular
- routes within the internet that are unacceptable. The difficulty with
implementing these new features is that the mechanisms do not lie 8- entirely within the bounds of IP. All the mechanisms above are designed
- to apply to a particular connection, so that their use must be specified
- at the TCP level. Thus, the interface between IP and the layers above
- it must include mechanisms to allow passing this information back and
- forth, and TCP (or any other protocol at this level, such as UDP), must
be prepared to store this information. The passing of information- between IP and TCP is made more complicated by the fact that some of the
- information, in particular ICMP packets, may arrive at any time. The
- normal interface envisioned between TCP and IP is one across which
- packets can be sent or received. The existence of asynchronous ICMP
- messages implies that there must be an additional channel between the
- two, unrelated to the actual sending and receiving of data. (In fact,
- there are many other ICMP messages which arrive asynchronously and which
must be passed from IP up to higher layers. See RFC 816, Fault- Isolation and Recovery.)
Source routes are already in use in the internet, and many implementations will wish to be able to take advantage of them. The- following sorts of usages should be permitted. First, a user, when
- initiating a TCP connection, should be able to hand a source route into
- TCP, which in turn must hand the source route to IP with every outgoing
- datagram. The user might initially obtain the source route by querying
- a different sort of name server, which would return a source route
- instead of an address, or the user may have fabricated the source route
manually. A TCP which is listening for a connection, rather than- attempting to open one, must be prepared to receive a datagram which
- contains a IP return route, in which case it must remember this return
route, and use it as a source route on all returning datagrams. 9 6. Ports and Service Identifiers The IP layer of the architecture contains the address information- which specifies the destination host to which the datagram is being
sent. In fact, datagrams are not intended just for particular hosts,- but for particular agents within a host, processes or other entities
- that are the actual source and sink of the data. IP performs only a
- very simple dispatching once the datagram has arrived at the target
host, it dispatches it to a particular protocol. It is the- responsibility of that protocol handler, for example TCP, to finish
- dispatching the datagram to the particular connection for which it is
destined. This next layer of dispatching is done using "port- identifiers", which are a part of the header of the higher level
- protocol, and not the IP layer.
This two-layer dispatching architecture has caused a problem for certain implementations. In particular, some implementations have- wished to put the IP layer within the kernel of the operating system,
- and the TCP layer as a user domain application program. Strict
- adherence to this partitioning can lead to grave performance problems,
- for the datagram must first be dispatched from the kernel to a TCP
- process, which then dispatches the datagram to its final destination
- process. The overhead of scheduling this dispatch process can severely
- limit the achievable throughput of the implementation.
As is discussed in RFC 817, Modularity and Efficiency in Protocol- Implementations, this particular separation between kernel and user
leads to other performance problems, even ignoring the issue of port 10- level dispatching. However, there is an acceptable shortcut which can
- be taken to move the higher level dispatching function into the IP
- layer, if this makes the implementation substantially easier.
In principle, every higher level protocol could have a different dispatching algorithm. The reason for this is discussed below.- However, for the protocols involved in the service offering being
- implemented today, TCP and UDP, the dispatching algorithm is exactly the
- same, and the port field is located in precisely the same place in the
- header. Therefore, unless one is interested in participating in further
- protocol research, there is only one higher level dispatch algorithm.
- This algorithm takes into account the internet level foreign address,
- the protocol number, and the local port and foreign port from the higher
- level protocol header. This algorithm can be implemented as a sort of
- adjunct to the IP layer implementation, as long as no other higher level
- protocols are to be implemented. (Actually, the above statement is only
- partially true, in that the UDP dispatch function is subset of the TCP
- dispatch function. UDP dispatch depends only protocol number and local
- port. However, there is an occasion within TCP when this exact same
- subset comes into play, when a process wishes to listen for a connection
from any foreign host. Thus, the range of mechanisms necessary to- support TCP dispatch are also sufficient to support precisely the UDP
- requirement.)
The decision to remove port level dispatching from IP to the higher- level protocol has been questioned by some implementors. It has been
argued that if all of the address structure were part of the IP layer, 11- then IP could do all of the packet dispatching function within the host,
which would lead to a simpler modularity. Three problems were- identified with this. First, not all protocol implementors could agree
- on the size of the port identifier. TCP selected a fairly short port
identifier, 16 bits, to reduce header size. Other protocols being- designed, however, wanted a larger port identifier, perhaps 32 bits, so
- that the port identifier, if properly selected, could be considered
probabilistically unique. Thus, constraining the port id to one- particular IP level mechanism would prevent certain fruitful lines of
research. Second, ports serve a special function in addition to- datagram delivery: certain port numbers are reserved to identify
- particular services. Thus, TCP port 23 is the remote login service. If
- ports were implemented at the IP level, then the assignment of well
- known ports could not be done on a protocol basis, but would have to be
- done in a centralized manner for all of the IP architecture. Third, IP
was designed with a very simple layering role: IP contained exactly- those functions that the gateways must understand. If the port idea had
- been made a part of the IP layer, it would have suggested that gateways
- needed to know about ports, which is not the case.
There are, of course, other ways to avoid these problems. In- particular, the "well-known port" problem can be solved by devising a
- second mechanism, distinct from port dispatching, to name well-known
- ports. Several protocols have settled on the idea of including, in the
- packet which sets up a connection to a particular service, a more
- general service descriptor, such as a character string field. These
special packets, which are requesting connection to a particular 12- service, are routed on arrival to a special server, sometimes called a
- "rendezvous server", which examines the service request, selects a
- random port which is to be used for this instance of the service, and
- then passes the packet along to the service itself to commence the
- interaction.
For the internet architecture, this strategy had the serious flaw- that it presumed all protocols would fit into the same service paradigm:
- an initial setup phase, which might contain a certain overhead such as
- indirect routing through a rendezvous server, followed by the packets of
- the interaction itself, which would flow directly to the process
- providing the service. Unfortunately, not all high level protocols in
- internet were expected to fit this model. The best example of this is
- isolated datagram exchange using UDP. The simplest exchange in UDP is
- one process sending a single datagram to another. Especially on a local
- net, where the net related overhead is very low, this kind of simple
- single datagram interchange can be extremely efficient, with very low
- overhead in the hosts. However, since these individual packets would
- not be part of an established connection, if IP supported a strategy
- based on a rendezvous server and service descriptors, every isolated
- datagram would have to be routed indirectly in the receiving host
- through the rendezvous server, which would substantially increase the
- overhead of processing, and every datagram would have to carry the full
- service request field, which would increase the size of the packet
- header.
In general, if a network is intended for "virtual circuit service", 13- or things similar to that, then using a special high overhead mechanism
- for circuit setup makes sense. However, current directions in research
- are leading away from this class of protocol, so once again the
- architecture was designed not to preclude alternative protocol
structures. The only rational position was that the particular- dispatching strategy used should be part of the higher level protocol
- design, not the IP layer.
This same argument about circuit setup mechanisms also applies to- the design of the IP address structure. Many protocols do not transmit
- a full address field as part of every packet, but rather transmit a
- short identifier which is created as part of a circuit setup from source
- to destination. If the full address needs to be carried in only the
- first packet of a long exchange, then the overhead of carrying a very
- long address field can easily be justified. Under these circumstances,
- one can create truly extravagant address fields, which are capable of
extending to address almost any conceivable entity. However, this- strategy is useable only in a virtual circuit net, where the packets
- being transmitted are part of a established sequence, otherwise this
- large extravagant address must be transported on every packet. Since
- Internet explicitly rejected this restriction on the architecture, it
- was necessary to come up with an address field that was compact enough
- to be sent in every datagram, but general enough to correctly route the
- datagram through the catanet without a previous setup phase. The IP
- address of 32 bits is the compromise that results. Clearly it requires
- a substantial amount of shoehorning to address all of the interesting
places in the universe with only 32 bits. On the other hand, had the 14- address field become much bigger, IP would have been susceptible to
- another criticism, which is that the header had grown unworkably large.
- Again, the fundamental design decision was that the protocol be designed
- in such a way that it supported research in new and different sorts of
- protocol architectures.
There are some limited restrictions imposed by the IP design on the port mechanism selected by the higher level process. In particular,- when a packet goes awry somewhere on the internet, the offending packet
- is returned, along with an error indication, as part of an ICMP packet.
- An ICMP packet returns only the IP layer, and the next 64 bits of the
- original datagram. Thus, any higher level protocol which wishes to sort
- out from which port a particular offending datagram came must make sure
- that the port information is contained within the first 64 bits of the
- next level header. This also means, in most cases, that it is possible
- to imagine, as part of the IP layer, a port dispatch mechanism which
- works by masking and matching on the first 64 bits of the incoming
- higher level header.