Understanding TCP/IP


1. Introduction

The Internet has become one of the most important tools in the business world. It is a public, cooperative, and self-sustaining facility accessible to hundreds of millions of people worldwide. You can do much more than you think on the Internet. The Internet, sometimes called simply "the Net," is a worldwide system of computer networks: a network of networks in which users at any one computer can, if they have permission, get information from any other computer (and sometimes talk directly to users at other computers). How does the Internet work? Protocols are the answer. A protocol is the special set of rules for communicating that the end points in a telecommunication connection use when they send signals back and forth. The Internet uses a set of protocols called TCP/IP. Transmission Control Protocol/Internet Protocol (TCP/IP) is the basic communication language or protocol of the Internet. This paper explains what TCP/IP is and how it works.
 

2. Protocols

Before you learn about TCP/IP (Transmission Control Protocol/Internet Protocol), you should first understand what a protocol is. In information technology, a protocol (pronounced PROH-tuh-cahl, from the Greek protocollon, which was a leaf of paper glued to a manuscript volume, describing its contents) is the special set of rules for communicating that the end points in a telecommunication connection use when they send signals back and forth.

A protocol is the standard or set of rules that two computers use to communicate with each other. Also known as a communications protocol or network protocol, this is a set of standards that assures that different network products or programs can work together. Any product that uses a given protocol should work with any other product using the same protocol.

Protocols dictate the "whats" and the "hows" of the various systems on the Internet. The success of the Internet, its very existence, in fact, depends on people voluntarily agreeing to configure their hardware and software to the TCP/IP standard.

How do protocols work? Take FTP (File Transfer Protocol) as an example. When you contact a computer to download a file, the computers communicate with each other in a series of pre-agreed-upon rules. The "conversation" between the computers goes something like this: "I want that file," "here it comes," "didn't get it, please resend," "here it is again," "got it," "goodbye," "goodbye." This is a brief example of protocols.
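
The exchange above can be sketched as a toy simulation in Python. This is not the real FTP protocol; the function name `transfer` and the `lost_attempts` parameter are invented for illustration, and the "network" is simply a set of attempt numbers that we pretend were lost.

```python
def transfer(data, lost_attempts):
    """Simulate a toy download in which some delivery attempts are lost.

    lost_attempts is a hypothetical set of attempt numbers (1, 2, ...)
    that never arrive; it stands in for a real, unreliable network.
    """
    log = ["client: I want that file"]
    attempt = 0
    while True:
        attempt += 1
        if attempt == 1:
            log.append("server: here it comes")
        else:
            log.append("server: here it is again")
        if attempt in lost_attempts:
            # The agreed rule: ask again instead of giving up.
            log.append("client: didn't get it, please resend")
            continue
        log.append("client: got it")
        break
    log.append("client: goodbye")
    log.append("server: goodbye")
    return data, log


received, conversation = transfer(b"file contents", lost_attempts={1})
for line in conversation:
    print(line)
```

Both sides follow the same fixed rules, which is all that "protocol" means here: each message has an agreed meaning and an agreed response.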

The Internet uses the TCP/IP protocols, which consist chiefly of TCP (Transmission Control Protocol) and IP (Internet Protocol).
 

3. The History of TCP/IP

Prior to the 1960s, what little computer communication existed comprised simple text and binary data, carried by the most common telecommunications network technology of the day; namely, circuit switching, the technology of the telephone networks for nearly a hundred years. Because most data traffic is bursty in nature (i.e., most of the transmissions occur during a very short period of time), circuit switching results in highly inefficient use of network resources. In 1962, Paul Baran, of the Rand Corporation, described a robust, efficient, store-and-forward data network in a report for the U.S. Air Force; Donald Davies suggested a similar idea in independent work for the Postal Service in the U.K., and coined the term packet for the data units that would be carried. According to Baran and Davies, packet switching networks could be designed so that all components operated independently, eliminating single point-of-failure problems. In addition, network communication resources appear to be dedicated to individual users but, in fact, statistical multiplexing and an upper limit on the size of a transmitted entity result in fast, economical data networks.

The modern Internet began as a U.S. Department of Defense (DoD) funded experiment to interconnect DoD-funded research sites in the U.S. In December 1968, the Advanced Research Projects Agency (ARPA) awarded a contract to design and deploy a packet switching network to Bolt Beranek and Newman (BBN). In September 1969, the first node of the ARPANET was installed at UCLA. With four nodes by the end of 1969, the ARPANET spanned the continental U.S. by 1971 and had connections to Europe by 1973.

The original ARPANET gave life to a number of protocols that were new to packet switching. One of the most lasting results of the ARPANET was the development of a user-network protocol that has become the standard interface between users and packet switched networks; namely, ITU-T (formerly CCITT) Recommendation X.25. This "standard" interface encouraged BBN to start Telenet, a commercial packet-switched data service, in 1974; after much renaming, Telenet is now a part of Sprint's X.25 service.

The initial host-to-host communications protocol introduced in the ARPANET was called the Network Control Protocol (NCP). Over time, however, NCP proved to be incapable of keeping up with the growing network traffic load. In 1974, a new, more robust suite of communications protocols was proposed and implemented throughout the ARPANET, based upon the Transmission Control Protocol (TCP) and Internet Protocol (IP). TCP and IP were originally envisioned functionally as a single protocol, thus the protocol suite, which actually refers to a large collection of protocols and applications, is usually referred to simply as TCP/IP. The original versions of both TCP and IP that are in common use today were written in September 1981, although both have had several modifications applied to them (in addition, the IP version 6, or IPv6, specification was released in December 1995). In 1983, the DoD mandated that all of their computer systems would use the TCP/IP protocol suite for long-haul communications, further enhancing the scope and importance of the ARPANET.

In 1983, the ARPANET was split into two components. One component, still called ARPANET, was used to interconnect research/development and academic sites; the other, called MILNET, was used to carry military traffic and became part of the Defense Data Network. That year also saw a huge boost in the popularity of TCP/IP with its inclusion in the communications kernel for the University of California's UNIX implementation, 4.2BSD (Berkeley Software Distribution) UNIX.

In 1986, the National Science Foundation (NSF) built a backbone network to interconnect four NSF-funded regional supercomputer centers and the National Center for Atmospheric Research (NCAR). This network, dubbed the NSFNET, was originally intended as a backbone for other networks, not as an interconnection mechanism for individual systems. Furthermore, the "Appropriate Use Policy" defined by the NSF limited traffic to non-commercial use. The NSFNET continued to grow and provide connectivity between both NSF-funded and non-NSF regional networks, eventually becoming the backbone that we know today as the Internet. Although early NSFNET applications were largely multiprotocol in nature, TCP/IP was employed for interconnectivity (with the ultimate goal of migration to Open Systems Interconnection).

The NSFNET originally comprised 56-kbps links and was completely upgraded to T1 (1.544 Mbps) links in 1989. Migration to a "professionally-managed" network was supervised by a consortium comprising Merit (a Michigan state regional network headquartered at the University of Michigan), IBM, and MCI. Advanced Network & Services, Inc. (ANS), a non-profit company formed by IBM and MCI, was responsible for managing the NSFNET and supervising the transition of the NSFNET backbone to T3 (44.736 Mbps) rates by the end of 1991. During this period of time, the NSF also funded a number of regional Internet service providers (ISPs) to provide local connection points for educational institutions and NSF-funded sites.

In 1993, the NSF decided that it did not want to be in the business of running and funding networks, but wanted instead to go back to the funding of research in the areas of supercomputing and high-speed communications. In addition, there was increased pressure to commercialize the Internet; in 1989, a trial gateway connected MCI, CompuServe, and Internet mail services, and commercial users were now finding out about all of the capabilities of the Internet that once belonged exclusively to academic and hard-core users! In 1991, the Commercial Internet Exchange (CIX) was formed by General Atomics, Performance Systems International (PSI), and UUNET Technologies to promote and provide a commercial Internet backbone service. Nevertheless, there remained intense pressure from non-NSF ISPs to open the network to all users.

In 1994, a plan was put in place to reduce the NSF's role in the public Internet. The new structure comprises three parts:

  1. Network Access Points (NAPs), where individual ISPs would interconnect. Although the NSF is only funding four such NAPs (Chicago, New York, San Francisco, and Washington, D.C.), several non-NSF NAPs are also in operation.
  2. The very High Speed Backbone Network Service, a network interconnecting the NAPs and NSF-funded centers, operated by MCI. This network was installed in 1995 and operated at OC-3 (155.52 Mbps); it was completely upgraded to OC-12 (622.08 Mbps) in 1997.
  3. The Routing Arbiter, to ensure adequate routing protocols for the Internet.

In addition, NSF-funded ISPs were given five years of reduced funding to become commercially self-sufficient. This funding ended by 1998.

In 1988, meanwhile, the DoD and most of the U.S. Government chose to adopt OSI protocols. TCP/IP was now viewed as an interim, proprietary solution since it ran only on limited hardware platforms and OSI products were only a couple of years away. The DoD mandated that all computer communications products would have to use OSI protocols by August 1990 and use of TCP/IP would be phased out. Subsequently, the U.S. Government OSI Profile (GOSIP) defined the set of protocols that would have to be supported by products sold to the federal government and TCP/IP was not included. Despite this mandate, development of TCP/IP continued during the late 1980s as the Internet grew. TCP/IP development had always been carried out in an open environment (although the size of this open community was small due to the small number of ARPA/NSF sites), based upon the creed "We reject kings, presidents, and voting. We believe in rough consensus and running code" (Dave Clark, MIT.) OSI products were still a couple of years away while TCP/IP became, in the minds of many, the real open systems interconnection protocol suite.

It is not the purpose of this paper to take a position in the OSI vs. TCP/IP debate. Nevertheless, a number of observations are in order. First, the ISO Development Environment (ISODE) was developed in 1990 to provide an approach for OSI migration for the DoD. ISODE software allows OSI applications to operate over TCP/IP. During this same period, the Internet and OSI communities started to work together to bring about the best of both worlds as many TCP and IP features started to migrate into OSI protocols, particularly the OSI Transport Protocol class 4 (TP4) and the Connectionless Network Layer Protocol (CLNP), respectively. Finally, a report from the National Institute for Standards and Technology (NIST) in 1994 suggested that GOSIP should incorporate TCP/IP and drop the "OSI-only" requirement.
 

4. TCP/IP Overview

TCP/IP (Transmission Control Protocol/Internet Protocol) is the basic communication language or protocol of the Internet. It can also be used as a communications protocol in the private networks called intranets and in extranets. When you are set up with direct access to the Internet, your computer is provided with a copy of the TCP/IP program, just as every other computer that you may send messages to or get information from also has a copy of TCP/IP.

First, some basic definitions. The most accurate name for the set of protocols we are describing is the "Internet protocol suite". TCP and IP are two of the protocols in this suite. Because TCP and IP are the best known of the protocols, it has become common to use the term TCP/IP or IP/TCP to refer to the whole family.

In order to understand what this means, it is useful to look at an example. A typical situation is sending mail. First, there is a protocol for mail. This defines a set of commands which one machine sends to another, e.g. commands to specify who the sender of the message is, who it is being sent to, and then the text of the message. However this protocol assumes that there is a way to communicate reliably between the two computers. Mail, like other application protocols, simply defines a set of commands and messages to be sent. It is designed to be used together with TCP and IP.

TCP is responsible for making sure that the commands get through to the other end. It keeps track of what is sent, and retransmits anything that did not get through. If any message is too large for one datagram, e.g. the text of the mail, TCP will split it up into several datagrams, and make sure that they all arrive correctly. Since these functions are needed for many applications, they are put together into a separate protocol, rather than being part of the specifications for sending mail. You can think of TCP as forming a library of routines that applications can use when they need reliable network communications with another computer. Similarly, TCP calls on the services of IP. Although the services that TCP supplies are needed by many applications, there are still some kinds of applications that don't need them. However there are some services that every application needs. So these services are put together into IP. As with TCP, you can think of IP as a library of routines that TCP calls on, but which is also available to applications that don't use TCP. This strategy of building several levels of protocol is called "layering". We think of the applications programs such as mail, TCP, and IP, as being separate "layers", each of which calls on the services of the layer below it.
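
The layering idea can be sketched in Python as three functions, each calling only the one below it. This is a simplified illustration, not real TCP or IP: the function names, the dictionary "headers", the `FROM`/`TO` mail commands, and the 500-octet segment size are all assumptions made for the example.

```python
def ip_send(payload, src, dst):
    """Lowest layer in this sketch: wrap the payload with addresses."""
    return {"src": src, "dst": dst, "payload": payload}


def tcp_send(data, src, dst, max_size=500):
    """Split data into numbered segments and hand each one to IP."""
    datagrams = []
    for seq, start in enumerate(range(0, len(data), max_size)):
        segment = {"seq": seq, "data": data[start:start + max_size]}
        datagrams.append(ip_send(segment, src, dst))
    return datagrams


def mail_send(sender, recipient, text, src, dst):
    """Application layer: build the commands, then rely on TCP for delivery."""
    message = f"FROM {sender}\nTO {recipient}\n\n{text}".encode("ascii")
    return tcp_send(message, src, dst)


datagrams = mail_send("alice", "bob", "x" * 1200, "128.6.4.194", "161.57.212.66")
print(len(datagrams), "datagrams handed to the network")
```

Note that `mail_send` knows nothing about segment sizes or addresses inside datagrams, and `ip_send` knows nothing about mail. That separation is the point of layering.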

Generally, TCP/IP applications use four layers.

The ISO/OSI model, with seven layers, is the usual reference model. Since TCP/IP was designed before the ISO model was developed, it has four layers; however, the differences between the two are mostly minor. Below is a comparison of the TCP/IP and OSI protocol stacks:
 
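OSI Reference Layer | OSI Protocol Stack                           | TCP/IP stacks
--------------------+----------------------------------------------+---------------------------------
7  Application      | Specifies distributed client/server          | Applications:
                    | applications                                 |   • WWW browsers (http)
                    |                                              |   • telnet
                    |                                              |   • file transfer protocol (FTP)
--------------------+----------------------------------------------+---------------------------------
6  Presentation     | Specifies protocols for translating data     |
                    | format                                       |
--------------------+----------------------------------------------+---------------------------------
5  Session          | Specifies protocols for starting and ending  |
                    | a communications session across a network    |
--------------------+----------------------------------------------+---------------------------------
4  Transport        | Specifies protocols for end to end error     | Transmission Control Protocol
                    | control                                      |
--------------------+----------------------------------------------+---------------------------------
3  Network          | Specifies protocols for routing messages:    | Internet Protocol
                    |   • Addressing                               |
                    |   • Paths for transferring messages on a     |
                    |     network                                  |
--------------------+----------------------------------------------+---------------------------------
2  Data Link        | Specifies protocols for point-to-point       | Data Link
                    | transmission and error control               |
--------------------+----------------------------------------------+---------------------------------
1  Physical         | Specifies protocols for transmission of data | Physical
                    | over physical media                          |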

The major difference between the OSI and TCP/IP models is that TCP/IP is based on the "catenet model". This model assumes that there are a large number of independent networks connected together by gateways. The user should be able to access computers or other resources on any of these networks.

Datagrams will often pass through a dozen different networks before getting to their final destination. The routing needed to accomplish this should be completely invisible to the user. As far as the user is concerned, all he needs to know in order to access another system is an "Internet address". This is an address that looks like 128.6.4.194. It is actually a 32-bit number. However, it is normally written as 4 decimal numbers, each representing 8 bits of the address. (The term "octet" is used by Internet documentation for such 8-bit chunks. The term "byte" is not used, because TCP/IP is supported by some computers that have byte sizes other than 8 bits.) Generally the structure of the address gives you some information about how to get to the system. For example, 161.57 is a network number assigned by a central authority to Ferris State University. Ferris uses the next octet to indicate which of the campus Ethernets is involved. 161.57.212 happens to be an Ethernet used by the Information System and Technology Department. The last octet allows for up to 254 systems on each Ethernet (the values 0 and 255 are reserved). Note that 161.57.212.66 and 161.57.211.66 would be different systems. The structure of an Internet address is described in a bit more detail later.
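
The relationship between the 32-bit number and its four-octet written form can be sketched in a few lines of Python. This is only an illustration; the function names `to_int` and `to_dotted` are made up for this example.

```python
def to_int(dotted):
    """Pack four decimal octets into one 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid Internet address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet   # shift left 8 bits, append next octet
    return value


def to_dotted(value):
    """Unpack a 32-bit integer into the usual four-octet notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


number = to_int("128.6.4.194")
print(number, format(number, "032b"), to_dotted(number))
```

The network software works with the integer form; the four decimal numbers are purely a convenience for people.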

Of course we normally refer to systems by name, rather than by Internet address. When we specify a name, the network software looks it up in a database, and comes up with the corresponding Internet address. Most of the network software deals strictly in terms of the address. (RFC 882 describes the name server technology used to handle this lookup.)

TCP/IP is built on "connectionless" technology. Information is transferred as a sequence of "datagrams". A datagram is a collection of data that is sent as a single message. Each of these datagrams is sent through the network individually. There are provisions to open connections (i.e. to start a conversation that will continue for some time). However at some level, information from those connections is broken up into datagrams, and those datagrams are treated by the network as completely separate. For example, suppose you want to transfer a 15000 octet file. Most networks can't handle a 15000 octet datagram. So the protocols will break this up into something like 30 500-octet datagrams. Each of these datagrams will be sent to the other end. At that point, they will be put back together into the 15000-octet file. However while those datagrams are in transit, the network doesn't know that there is any connection between them. It is perfectly possible that datagram 14 will actually arrive before datagram 13. It is also possible that somewhere in the network, an error will occur, and some datagram won't get through at all. In that case, that datagram has to be sent again.
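
The 15000-octet example can be sketched as follows. This is a simplified model under stated assumptions: each datagram is just a (sequence number, data) pair, the function names are invented for the illustration, and the shuffle stands in for a network that delivers datagrams in arbitrary order.

```python
import random


def split_into_datagrams(data, size=500):
    """Break data into numbered chunks of at most `size` octets."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def missing_sequence_numbers(received, total):
    """List the datagrams that never arrived and must be sent again."""
    seen = {seq for seq, _ in received}
    return [seq for seq in range(total) if seq not in seen]


def reassemble(datagrams):
    """Put the chunks back in order, whatever order they arrived in."""
    return b"".join(chunk for _, chunk in sorted(datagrams))


file_data = bytes(i % 251 for i in range(15000))   # a 15000-octet "file"
datagrams = split_into_datagrams(file_data)
random.shuffle(datagrams)                          # arrival order is not guaranteed
print(len(datagrams), reassemble(datagrams) == file_data)
```

The sequence numbers are what let the receiving end restore the original order and notice a gap; the network itself never looks at them.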

Note by the way that the terms "datagram" and "packet" often seem to be nearly interchangeable. Technically, datagram is the right word to use when describing TCP/IP. A datagram is a unit of data, which is what the protocols deal with. A packet is a physical thing, appearing on an Ethernet or some wire. In most cases a packet simply contains a datagram, so there is very little difference. However they can differ. When TCP/IP is used on top of X.25, the X.25 interface breaks the datagrams up into 128-byte packets. This is invisible to IP, because the packets are put back together into a single datagram at the other end before being processed by TCP/IP. So in this case, one IP datagram would be carried by several packets. However with most media, there are efficiency advantages to sending one datagram per packet, and so the distinction tends to vanish.

Most Internet users are familiar with the even higher layer application protocols that use TCP/IP to get to the Internet. These include the World Wide Web's Hypertext Transfer Protocol (HTTP), the File Transfer Protocol (FTP), Telnet, which lets you log on to remote computers, and the Simple Mail Transfer Protocol (SMTP). These and other protocols are often packaged together with TCP/IP as a "suite."
 

5. Basic Structure

To understand this technology you must first understand the following logical structure:


FIGURE 1. OSI Model for TCP/IP Protocols

Figure 1 shows a comparison between the OSI layers and the TCP/IP protocol suite.

The 4 layers of TCP/IP:

            -----------------------------------------------------  ------
APPLICATION |Telnet|FTP|Gopher|SMTP|HTTP|Finger|POP|DNS|SNMP|RIP|  |Ping|
            |------+---+------+----+----+------+---+-+-+----+---|  |----+-----
  TRANSPORT |                   TCP                  |    UDP   |  |ICMP|OSPF|
            |----------------------------------------+----------+--+----+----+----
   INTERNET |                            IP                                  |ARP|
            |----------+-------+----+------+-------+------+-----+-----+------+---|
    NETWORK | Ethernet | Token |FDDI| X.25 | Frame | SMDS | ISDN| ATM | SLIP |PPP|
  INTERFACE |          | Ring  |    |      | Relay |      |     |     |      |   |
            ----------------------------------------------------------------------
            FIGURE 2. Simplified TCP/IP protocol stack.

Figure 2 shows the TCP/IP protocol architecture; this diagram is by no means exhaustive, but shows the major protocol and application components common to most commercial TCP/IP software packages and their relationship.
 

5.1 The Network Interface Layer

The TCP/IP protocols have been designed to operate over nearly any underlying local or wide area network technology. Although certain accommodations may need to be made, IP messages can be transported over all of the technologies shown in the figure, as well as numerous others.

Two of the underlying interface protocols are particularly relevant to TCP/IP. The Serial Line Internet Protocol (SLIP) and Point-to-Point Protocol (PPP), respectively, may be used to provide data link layer protocol services where no other underlying data link protocol may be in use, such as in leased line or dial-up environments. Most commercial TCP/IP software packages for PC-class systems include these two protocols. With SLIP or PPP, a remote computer can attach directly to a host server and, therefore, connect to the Internet using IP rather than being limited to an asynchronous connection. PPP, in addition, provides support for simultaneous multiple protocols over a single connection, security mechanisms, and dynamic bandwidth allocation (e.g., when running over ISDN.)

The Network layer protocol also provides some way of identifying each function node in a machine-readable form. The addressing scheme in IP is very complex and can accommodate many millions of nodes. Although the addressing scheme in IPX is more limited, it's also more automated, more flexible, and much easier to use, and it delivers fast performance.
 

5.2 The Internet Layer

The Internet Protocol provides services that are roughly equivalent to the OSI Network Layer. IP provides a datagram (connectionless) transport service across the network. This service is sometimes referred to as unreliable because the network does not guarantee delivery nor notify the end host system about packets lost due to errors or network congestion. IP datagrams contain a message, or one fragment of a message, that may be up to 65,535 bytes (octets) in length. IP does not provide a mechanism for flow control.

5.2.1 IP Addresses
One important aspect of IP, even to a typical end-user, is the format and notation used for addressing. IP addresses are always 32 bits in length, as shown in Figure 3. They are typically written as a sequence of four numbers, representing the decimal value of each of the address bytes. Since the values are separated by periods, the notation is referred to as dotted decimal. A sample IP address is 208.162.106.17.

                                    1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
                0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
               --+-------------+------------------------------------------------
     Class A   |0|     NET_ID  |                         HOST_ID               |
               |-+-+-----------+---------------+-------------------------------|
     Class B   |1|0|    NET_ID                 |            HOST_ID            |
               |-+-+-+-------------------------+---------------+---------------|
     Class C   |1|1|0|                     NET_ID              |    HOST_ID    |
               |-+-+-+-+---------------------------------------+---------------|
     Class D   |1|1|1|0|                            MULTICAST_ID               |
               |-+-+-+-+-------------------------------------------------------|
     Class E   |1|1|1|1|                      EXPERIMENTAL_ID                  |
               --+-+-+-+--------------------------------------------------------
            FIGURE 3. IP Address Format

IP addresses are hierarchical for routing purposes and are subdivided into two subfields. The Network Identifier (NET_ID) subfield identifies the TCP/IP subnetwork connected to the Internet. The NET_ID is used for high-level routing between networks, much the same way as the country code, city code, or area code is used in the telephone network. The Host Identifier (HOST_ID) subfield indicates the specific host within a subnetwork.

To accommodate different size networks, IP defines several address classes, as shown in the figure.

A Class A address has a 7-bit NET_ID and 24-bit HOST_ID. Class A addresses are intended for very large networks and can address up to 16,777,216 (2^24) hosts per network. The first number of a Class A address will be between 1 and 126. Relatively few Class A addresses have been assigned; examples include 9.0.0.0 (IBM) and 35.0.0.0 (Merit).

A Class B address has a 14-bit NET_ID and 16-bit HOST_ID. Class B addresses are intended for moderate sized networks and can address up to 65,536 (2^16) hosts per network. The first number of a Class B address will be between 128 and 191. The Class B address space is most in danger of being exhausted of any of the classes and it is very difficult to get a Class B address assigned at this time; examples include 128.138.0.0 (WestNet) and 152.163.0.0 (America Online).

A Class C address has a 21-bit NET_ID and 8-bit HOST_ID. These addresses are intended for small networks and can address only up to 256 (2^8) hosts per network. The first number of a Class C address will be between 192 and 223. Most addresses assigned to networks today are Class C; examples include 208.162.102.0 (Hill Associates) and 192.80.64.0 (St. Michael's College, Colchester, VT).

The remaining two address classes are used for special functions only and are not commonly assigned to individual hosts. Class D addresses may begin with a value between 224 and 239, and are used for IP multicasting (i.e., sending a single datagram to multiple hosts.) Class E addresses begin with a value between 240 and 255 and are reserved for experimental use.
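
The class of an address follows directly from the leading bits of its first octet, as drawn in Figure 3. The sketch below illustrates this in Python; the names `address_class` and `HOST_BITS` are invented for the example. As a simplifying assumption, it classifies purely by bit pattern, so the reserved first-octet values 0 and 127 are reported as Class A.

```python
# Number of HOST_ID bits for the classes that are assigned to networks.
HOST_BITS = {"A": 24, "B": 16, "C": 8}


def address_class(dotted):
    """Classify an address by the leading bits of its first octet."""
    first = int(dotted.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError(f"first octet out of range: {first}")
    if (first & 0b10000000) == 0:
        return "A"   # leading bit  0    (0-127)
    if (first & 0b11000000) == 0b10000000:
        return "B"   # leading bits 10   (128-191)
    if (first & 0b11100000) == 0b11000000:
        return "C"   # leading bits 110  (192-223)
    if (first & 0b11110000) == 0b11100000:
        return "D"   # leading bits 1110 (224-239)
    return "E"       # leading bits 1111 (240-255)


for sample in ("9.0.0.0", "128.138.0.0", "208.162.102.0", "224.0.0.1"):
    cls = address_class(sample)
    hosts = 2 ** HOST_BITS[cls] if cls in HOST_BITS else None
    print(sample, cls, hosts)
```

The host counts printed here are the raw 2^n values quoted in the text; they do not subtract the reserved all-zeros and all-ones HOST_ID values.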

Several address values are reserved and/or have special meaning. A HOST_ID of 0 (as used above) is a dummy value reserved as a placeholder when referring to an entire subnetwork; the address 10.0.0.0, then, refers to the Class A address with a NET_ID of 10 (this was the old ARPANET address). A HOST_ID of all ones (usually written "255" when referring to an all-ones byte, but also denoted as "-1") is a broadcast address and refers to all hosts on a network. A NET_ID value of 127 is used for loopback testing.

An additional addressing tool is the subnet mask. Subnet masks are used to distinguish the portion of the address that identifies the network from the portion that identifies the individual hosts. The subnet mask is written in dotted decimal, and its 1 bits mark the significant NET_ID bits. A Class B address, for example, would typically have a subnet mask of 255.255.0.0 since the first 16 bits are NET_ID.
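
Applying a subnet mask is a bitwise AND. The following Python sketch shows how a mask separates the network portion from the host portion and how the all-ones broadcast address is formed. It uses the standard `ipaddress` module only to convert between dotted decimal and integers; the function name `split_address` and the sample host 128.138.5.7 are assumptions for illustration.

```python
import ipaddress


def split_address(address, mask):
    """Return (network, host number, broadcast) for an address and mask."""
    addr = int(ipaddress.IPv4Address(address))
    m = int(ipaddress.IPv4Address(mask))
    host_bits = ~m & 0xFFFFFFFF          # the 0 bits of the mask
    network = addr & m                   # keep only the NET_ID bits
    host = addr & host_bits              # keep only the HOST_ID bits
    broadcast = network | host_bits      # HOST_ID of all ones
    return (str(ipaddress.IPv4Address(network)),
            host,
            str(ipaddress.IPv4Address(broadcast)))


print(split_address("128.138.5.7", "255.255.0.0"))
```

With the Class B mask 255.255.0.0, the network part is 128.138.0.0 (a HOST_ID of 0) and the broadcast address is 128.138.255.255 (a HOST_ID of all ones).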

Subnet masks can also be used to subdivide a large address space or to combine multiple small address spaces. For example, a network may subdivide its address space to define multiple logical networks by segmenting the HOST_ID subfield into a Subnetwork Identifier (SUBNET_ID) and a (smaller) HOST_ID (e.g., divide the Class B address space 130.20.0.0 into a 16-bit NET_ID, 4-bit SUBNET_ID, and 12-bit HOST_ID; in this case, the subnet mask for individual subnet routing would be 255.255.240.0). Alternatively, a single user might be assigned the four Class C addresses 200.77.128.0, 200.77.129.0, 200.77.130.0, and 200.77.131.0, and use the subnet mask 255.255.252.0 for routing to the domain. This use of subnet masks in routing tables to circumvent the limitations of class-based addresses is called Classless Interdomain Routing (CIDR).
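
Both examples can be checked with Python's standard `ipaddress` module. This is a sketch for verifying the arithmetic, not a description of how any router is configured; the variable names are invented for the example.

```python
import ipaddress

# Subdividing: 16-bit NET_ID + 4-bit SUBNET_ID + 12-bit HOST_ID.
class_b = ipaddress.ip_network("130.20.0.0/16")
subnets = list(class_b.subnets(prefixlen_diff=4))
print(len(subnets), subnets[0].netmask, subnets[0].num_addresses)

# Combining: four adjacent Class C networks routed as one block.
class_c_blocks = [ipaddress.ip_network(f"200.77.{third}.0/24")
                  for third in (128, 129, 130, 131)]
merged = list(ipaddress.collapse_addresses(class_c_blocks))
print(merged[0], merged[0].netmask)
```

The four Class C networks can be merged because their third octets (128 through 131) differ only in the last two bits, which is exactly what the mask 255.255.252.0 ignores.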

As of January 1996, there were 95 Class A addresses, 5892 Class B addresses, and 128,378 Class C addresses assigned. Because CIDR is becoming so widely used, however, these numbers are not a true reflection of the number of networks attached to the public Internet because multiple addresses may be assigned to a single organizational entity.

5.2.2 IP Domains and Host Names
While IP addresses are 32 bits in length, most users do not memorize the numeric addresses of the hosts to which they attach; instead, people are more comfortable with host names. Most IP hosts, then, have both a numeric IP address and a name. While this is convenient for people, the name must be translated back to a numeric address for routing purposes.

Internet hosts use a hierarchical naming structure comprising a top-level domain (TLD), domain and subdomain (optional), and host name.

The IP address space (and all TCP/IP-related numbers) is assigned and maintained by the Internet Assigned Numbers Authority (IANA) (http://www.arin.net).

Domain names are assigned by the TLD naming authority; until April 1998, the Internet Network Information Center (InterNIC) (http://rs.internic.net/) had overall authority of these names, with NICs around the world handling non-U.S. domains. The InterNIC was also responsible for the overall coordination and management of the Domain Name System (DNS), the distributed database that reconciles host names and IP addresses on the Internet.

The InterNIC is an interesting example of changes in the Internet. Since 1993, Network Solutions, Inc. (NSI) has operated the InterNIC on behalf of the NSF. The InterNIC has had exclusive registration authority for the .com, .org, .net, and .edu domains. NSI's contract ran out in April 1998 and was extended several times while everyone tried to determine who should pick up the registration for those domains. In October 1998, it was decided that NSI will remain as the sole administrator for those domains but that users could register names in those domains with other firms.

The domain name structure is best understood if the name is read from right to left. Internet host names end with a top-level domain name. World-wide generic top-level domains include .com, .org, .net, and .edu, among others.

The host name ism.ferris.edu, for example, is assigned to a MS-ISM program (ism) in the Ferris State University domain (ferris), within the educational top-level domain (edu).
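
Reading a name from right to left amounts to splitting it on the dots and reversing the pieces. A minimal Python sketch follows; the function name `read_right_to_left` is made up for the illustration.

```python
def read_right_to_left(host_name):
    """Return the labels of a host name, top-level domain first."""
    labels = host_name.lower().rstrip(".").split(".")
    return list(reversed(labels))


labels = read_right_to_left("ism.ferris.edu")
print(labels)   # top-level domain, then domain, then host
```

For ism.ferris.edu the first label returned is the top-level domain (edu), followed by the domain (ferris) and finally the host (ism).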

Other top-level domain names use the two-letter country codes defined in the ISO standard for country codes (ftp://venera.isi.edu/in-notes/iana/assignments/country-codes). Examples of country codes are ca (Canada), de (Germany), es (Spain), fr (France), ie (Ireland), jp (Japan), mx (Mexico), th (Thailand), and us (United States). The us domain is largely organized on the basis of geography or function. It is important to note that there is not necessarily any correlation between a country code and where a host is actually physically located.

In November 1996, an Internet International Ad Hoc Committee (IAHC) (http://www.iahc.org/) was formed to resolve some of these naming issues and to act as a focal point for the international debate over a proposal to establish additional global naming registries and global Top Level Domains (gTLDs). In February 1997, the IAHC proposed the creation of seven new gTLDs.

The IAHC also proposed that up to 28 new registrars be established to grant second-level domain names under the new gTLDs, all of which will be shared among the new registrars. Furthermore, the three existing gTLDs .com, .net, and .org will also be shared upon conclusion of the NSF contract in the U.S. in 1998.

The DNS is a distributed database containing host name and IP address information for all domains on the Internet. There is a single authoritative name server for every domain; about a dozen root servers have a list of all of these authoritative name servers. When a host makes a request to the DNS, the request goes to a local name server. If the local name server has insufficient information, a request is made to the root to find the authoritative name server, and the information request is forwarded to that name server. Name servers contain the following types of information:

5.2.3 ARP and Address Resolution
ARP (Address Resolution Protocol) is used to translate IP addresses to Ethernet addresses. The translation is done only for outgoing IP packets, because this is when the IP header and the Ethernet header are created.

1) ARP Table for Address Translation

The translation is performed with a table look-up. The table, called the ARP table, is stored in memory and contains a row for each computer. There is a column for IP address and a column for Ethernet address. When translating an IP address to an Ethernet address, the table is searched for a matching IP address. The following is a simplified ARP table:
 

-------------------------------------
|IP address       Ethernet address  |
-------------------------------------
|223.1.2.1        08-00-39-00-2F-C3 |
|223.1.2.3        08-00-5A-21-A7-22 |
|223.1.2.4        08-00-10-99-AC-54 |
-------------------------------------
FIGURE 4. Example ARP Table
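
The table look-up can be sketched in a few lines of Python. This is only an illustration that assumes the table is held as a dictionary keyed by IP address; the names arp_table and lookup_ethernet_address are invented for the example.

```python
# Simplified ARP table from Figure 4: one entry per computer, keyed by IP address.
arp_table = {
    "223.1.2.1": "08-00-39-00-2F-C3",
    "223.1.2.3": "08-00-5A-21-A7-22",
    "223.1.2.4": "08-00-10-99-AC-54",
}


def lookup_ethernet_address(ip_address):
    """Return the Ethernet address for ip_address, or None if there is no entry."""
    return arp_table.get(ip_address)


print(lookup_ethernet_address("223.1.2.3"))  # 08-00-5A-21-A7-22
print(lookup_ethernet_address("223.1.2.2"))  # None (no entry for this address)
```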

The ARP table is necessary because the IP address and Ethernet address are selected independently; you cannot use an algorithm to translate an IP address to an Ethernet address. The IP address is selected by the network manager based on the location of the computer on the internet. When the computer is moved to a different part of an internet, its IP address must be changed. The Ethernet address is selected by the manufacturer based on the Ethernet address space licensed by the manufacturer. When the Ethernet hardware interface board changes, the Ethernet address changes.

2) Typical Translation Scenario

During normal operation a network application, such as TELNET, sends an application message to TCP, then TCP sends the corresponding TCP message to the IP module. The destination IP address is known by the application, the TCP module, and the IP module. At this point the IP packet has been constructed and is ready to be given to the Ethernet driver, but first the destination Ethernet address must be determined.

The ARP table is used to look up the destination Ethernet address.

3) ARP Request/Response Pair

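But how does the ARP table get filled in the first place? The answer is that it is filled automatically by ARP on an "as-needed" basis. Two things happen when the ARP table cannot be used to translate an address:

  1. An ARP request packet with a broadcast Ethernet address is sent out on the network to every computer.
  2. The outgoing IP packet is queued.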

Every computer's Ethernet interface receives the broadcast Ethernet frame. Each Ethernet driver examines the Type field in the Ethernet frame and passes the ARP packet to the ARP module. The ARP request packet says "If your IP address matches this target IP address, then please tell me your Ethernet address". An ARP request packet looks something like this:

---------------------------------------
|Sender IP Address   223.1.2.1        |
|Sender Enet Address 08-00-39-00-2F-C3|
---------------------------------------
|Target IP Address   223.1.2.2        |
|Target Enet Address <blank>          |
---------------------------------------
FIGURE 5. Example ARP Request

Each ARP module examines the IP address and if the Target IP address matches its own IP address, it sends a response directly to the source Ethernet address. The ARP response packet says "Yes, that target IP address is mine, let me give you my Ethernet address". An ARP response packet has the sender/target field contents swapped as compared to the request. It looks something like this:
 

---------------------------------------
|Sender IP Address   223.1.2.2        |
|Sender Enet Address 08-00-28-00-38-A9|
---------------------------------------
|Target IP Address   223.1.2.1        |
|Target Enet Address 08-00-39-00-2F-C3|
---------------------------------------
FIGURE 6. Example ARP Response

The response is received by the original sender computer. The Ethernet driver looks at the Type field in the Ethernet frame then passes the ARP packet to the ARP module. The ARP module examines the ARP packet and adds the sender's IP and Ethernet addresses to its ARP table.

The updated table now looks like this:
 

-------------------------------------
|IP address       Ethernet address  |
-------------------------------------
|223.1.2.1        08-00-39-00-2F-C3 |
|223.1.2.2        08-00-28-00-38-A9 |
|223.1.2.3        08-00-5A-21-A7-22 |
|223.1.2.4        08-00-10-99-AC-54 |
-------------------------------------
FIGURE 7. ARP Table after Response

4) Scenario Continued

The new translation has now been installed automatically in the table, just milliseconds after it was needed. As you remember from step 2 above, the outgoing IP packet was queued. Next, the IP address is translated to an Ethernet address by a look-up in the ARP table, and then the Ethernet frame is transmitted on the Ethernet.

Therefore, with the new steps 3, 4, and 5, the scenario for the sender computer is:

  1. An ARP request packet with a broadcast Ethernet address is sent out on the network to every computer.
  2. The outgoing IP packet is queued.
  3. The ARP response arrives with the IP-to-Ethernet address translation for the ARP table.
  4. For the queued IP packet, the ARP table is used to translate the IP address to the Ethernet address.
  5. The Ethernet frame is transmitted on the Ethernet.

In summary, when the translation is missing from the ARP table, one IP packet is queued. The translation data is quickly filled in with an ARP request/response pair, and the queued IP packet is then transmitted.
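
The five steps can be traced with a small Python simulation. It is a sketch under simplifying assumptions, not a real ARP implementation: the "network" is just a list of Host objects, the broadcast is a loop over that list, and the response arrives synchronously. The class name Host and all method names are hypothetical.

```python
class Host:
    """A toy computer with one Ethernet interface and its own ARP table."""

    def __init__(self, ip, enet, network):
        self.ip = ip
        self.enet = enet
        self.arp_table = {}   # IP address -> Ethernet address
        self.queued = []      # IP packets waiting for a translation
        self.sent = []        # (destination Ethernet address, payload) frames
        self.network = network
        network.append(self)

    def send_ip(self, dest_ip, payload):
        if dest_ip in self.arp_table:
            # Translation already known: transmit immediately.
            self.sent.append((self.arp_table[dest_ip], payload))
            return
        # Steps 1 and 2. The packet is queued before the broadcast here only
        # because the simulated response comes back synchronously.
        self.queued.append((dest_ip, payload))
        request = {"sender_ip": self.ip, "sender_enet": self.enet,
                   "target_ip": dest_ip, "target_enet": None}
        for host in self.network:          # broadcast: every computer sees it
            if host is not self:
                host.receive_arp_request(request)

    def receive_arp_request(self, request):
        if request["target_ip"] != self.ip:
            return                         # not our IP address: stay silent
        response = {"sender_ip": self.ip, "sender_enet": self.enet,
                    "target_ip": request["sender_ip"],
                    "target_enet": request["sender_enet"]}
        for host in self.network:          # reply directly to the requester
            if host.enet == response["target_enet"]:
                host.receive_arp_response(response)

    def receive_arp_response(self, response):
        # Step 3: install the new translation in the ARP table.
        self.arp_table[response["sender_ip"]] = response["sender_enet"]
        # Steps 4 and 5: translate and transmit any packet that was waiting.
        still_waiting = []
        for dest_ip, payload in self.queued:
            if dest_ip in self.arp_table:
                self.sent.append((self.arp_table[dest_ip], payload))
            else:
                still_waiting.append((dest_ip, payload))
        self.queued = still_waiting


network = []
a = Host("223.1.2.1", "08-00-39-00-2F-C3", network)
b = Host("223.1.2.2", "08-00-28-00-38-A9", network)
a.send_ip("223.1.2.2", "hello")
print(a.arp_table)  # {'223.1.2.2': '08-00-28-00-38-A9'}
print(a.sent)       # [('08-00-28-00-38-A9', 'hello')]
```

In this sketch a packet addressed to a computer that does not exist simply stays in the queue, because no response ever arrives.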

Each computer has a separate ARP table for each of its Ethernet interfaces. If the target computer does not exist, there will be no ARP response and no entry in the ARP table. IP will discard outgoing IP packets sent to that address. The upper layer protocols can't tell the difference between a broken Ethernet and the absence of a computer with the target IP address.

Some implementations of IP and ARP don't queue the IP packet while waiting for the ARP response. Instead the IP packet is discarded and the recovery from the IP packet loss is left to the TCP module or the UDP network application. This recovery is performed by time-out and retransmission. The retransmitted message is successfully sent out onto the network because the first copy of the message has already caused the ARP table to be filled.
 

5.3 The Transport Layer

The TCP/IP protocol suite comprises two protocols that correspond roughly to the OSI Transport and Session Layers; these protocols are called the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).

Transmission Control Protocol (TCP) provides a virtual circuit (connection-oriented) communication service across the network. Most of the applications in the TCP/IP suite operate over the reliable transport service provided by TCP. TCP provides a different service than UDP. TCP offers a connection-oriented byte stream, instead of a connectionless datagram delivery service. TCP guarantees delivery, whereas UDP does not.

TCP is used by network applications that require guaranteed delivery and cannot be bothered with doing time-outs and retransmissions. The two most typical network applications that use TCP are File Transfer Protocol (FTP) and TELNET. Other popular TCP network applications include the X-Window System, rcp (remote copy), and the r-series commands. TCP's greater capability is not without cost: it requires more CPU and network bandwidth. The internals of the TCP module are much more complicated than those in a UDP module.

Similar to UDP, network applications connect to TCP ports. Well-defined port numbers are dedicated to specific applications. For instance, the TELNET server uses port number 23. The TELNET client can find the server simply by connecting to port 23 of TCP on the specified computer.

When the application first starts using TCP, the TCP module on the client's computer and the TCP module on the server's computer start communicating with each other. These two end-point TCP modules contain state information that defines a virtual circuit. This circuit consumes resources in both TCP end-points. The virtual circuit is full duplex; data can go in both directions simultaneously. The application writes data to the TCP port; the data traverses the network and is read by the application at the far end.

TCP packetizes the byte stream at will; it does not retain the boundaries between writes. For example, if an application does 5 writes to the TCP port, the application at the far end might do 10 reads to get all the data. Or it might get all the data with a single read. There is no correlation between the number and size of writes at one end to the number and size of reads at the other end.
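
The loss of write boundaries can be observed with a short Python experiment over the loopback interface. This is a sketch that assumes local TCP connections to 127.0.0.1 are permitted; the function name send_in_pieces is invented for the example. The number of reads depends on timing and on the operating system, so it may differ from run to run; only the total byte stream is guaranteed to match.

```python
import socket
import threading


def send_in_pieces(writes):
    """Send each item of `writes` as a separate TCP write over loopback and
    return the list of byte chunks the receiver obtained from its reads."""
    chunks = []

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the operating system pick a free port
    listener.listen(1)

    def receive_all():
        conn, _ = listener.accept()
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:  # empty read: the sender closed the connection
                    break
                chunks.append(data)

    reader = threading.Thread(target=receive_all)
    reader.start()

    with socket.create_connection(listener.getsockname()) as client:
        for piece in writes:
            client.sendall(piece)  # one write per piece

    reader.join()
    listener.close()
    return chunks


writes = [b"one ", b"two ", b"three ", b"four ", b"five"]
chunks = send_in_pieces(writes)
print(len(writes), "writes were received in", len(chunks), "reads")
print(b"".join(chunks))
```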

TCP is a sliding window protocol with time-outs and retransmissions. Outgoing data must be acknowledged by the far-end TCP. Acknowledgements can be piggybacked on data. Both receiving ends can flow control the far end, thus preventing a buffer overrun.

As with all sliding window protocols, the protocol has a window size. The window size determines the amount of data that can be transmitted before an acknowledgement is required. For TCP, this amount is not a number of TCP segments but a number of bytes.
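
The byte-based window can be illustrated with a little arithmetic in Python. This is a deliberately simplified sketch: it ignores sequence number wrap-around and everything else a real TCP module tracks, and the function name usable_window and the 4096-byte window are assumptions chosen for the example.

```python
def usable_window(window_size, last_byte_sent, last_byte_acked):
    """Bytes the sender may still transmit before it must wait for an acknowledgement."""
    in_flight = last_byte_sent - last_byte_acked  # sent but not yet acknowledged
    return max(0, window_size - in_flight)


window = 4096  # window size in bytes, not in segments
print(usable_window(window, last_byte_sent=0, last_byte_acked=0))        # 4096
print(usable_window(window, last_byte_sent=3000, last_byte_acked=0))     # 1096
print(usable_window(window, last_byte_sent=4096, last_byte_acked=0))     # 0
print(usable_window(window, last_byte_sent=4096, last_byte_acked=2048))  # 2048
```

When the usable window reaches zero the sender must stop until an acknowledgement arrives; each acknowledgement slides the window forward and frees room for more bytes.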

User Datagram Protocol (UDP) provides an end-to-end datagram (connectionless) service. Some applications, such as those that involve a simple query and response, are better suited to the datagram service of UDP because there is no time lost to virtual circuit establishment and termination. UDP's primary function is to add a port number to the IP address to provide a socket for the application.
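
A simple query and response over UDP might look like the following Python sketch. It assumes both ends run in the same process on the loopback interface, and the "service" (upper-casing the query) is invented purely for illustration. Note that there is no connection set-up or tear-down: each end only needs the other's IP address and port number.

```python
import socket


def udp_query(message):
    """Send one datagram to a local UDP 'server' and return its one-datagram reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the operating system pick a free port
    server.settimeout(2.0)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    try:
        # The query is addressed to an (IP address, port number) pair.
        client.sendto(message, server.getsockname())
        query, client_address = server.recvfrom(1024)
        # The reply goes back to the address and port the query came from.
        server.sendto(query.upper(), client_address)
        reply, _ = client.recvfrom(1024)
        return reply
    finally:
        client.close()
        server.close()


print(udp_query(b"ping"))  # b'PING'
```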
 

5.4 Applications Layer

Most applications are implemented to use only one of the two transport protocols, TCP or UDP. You, the programmer, choose the protocol that best meets your needs. If you need a reliable stream delivery service, TCP might be best. If you need a datagram service, UDP might be best. If you need efficiency over long-haul circuits, TCP might be best. If you need efficiency over fast networks with short latency, UDP might be best. If your needs do not fall nicely into these categories, then the "best" choice is unclear. However, applications can make up for deficiencies in the choice. For instance, if you choose UDP and you need reliability, then the application must provide reliability. If you choose TCP and you need a record-oriented service, then the application must insert markers in the byte stream to delimit records.
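
One way an application can delimit records in a TCP byte stream is to put a length in front of each record. The Python sketch below shows the idea; the 4-byte big-endian length prefix and the function names frame and unframe are choices made for this example, not part of TCP or of any standard.

```python
import struct


def frame(record):
    """Prefix a record with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(record)) + record


def unframe(stream):
    """Split a byte string of framed records back into the original records."""
    records = []
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        if offset + 4 + length > len(stream):
            break  # incomplete record; a real reader would wait for more bytes
        records.append(stream[offset + 4:offset + 4 + length])
        offset += 4 + length
    return records


# Three records written back to back arrive as one undivided byte stream.
stream = frame(b"first") + frame(b"second record") + frame(b"")
print(unframe(stream))  # [b'first', b'second record', b'']
```

Because the receiver recovers the boundaries from the length prefixes, it does not matter how TCP happened to split the stream into reads.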

The following is a brief description of the applications mentioned:

Telnet: Telnet is the way you can access someone else's computer, assuming they have given you permission. (Such a computer is frequently called a host computer.) Telnet is a user command and an underlying TCP/IP protocol for accessing remote computers. Telnet provides a remote login capability on TCP. The operation and appearance is similar to keyboard dialing through a telephone switch. On the command line the user types "telnet delta" and receives a login prompt from the computer called "delta".

Telnet works well; it is an old application and has widespread interoperability. Implementations of Telnet usually work between different operating systems. For instance, a Telnet client may be on VAX/VMS and the server on UNIX System V.

FTP: The File Transfer Protocol (FTP) allows a user to transfer files between local and remote host computers. FTP also uses TCP and has widespread interoperability. The operation and appearance is as if you Telneted to the remote computer. But instead of typing your usual commands, you have to make do with a short list of commands for directory listings and the like. FTP commands allow you to copy files between computers.

Archie: A utility that allows a user to search all registered anonymous FTP sites for files on a specified topic.

Gopher: A tool that allows users to search through data repositories using a menu-driven, hierarchical interface, with links to other sites (RFC 1436).

SMTP: The Simple Mail Transfer Protocol is the standard protocol for the exchange of electronic mail over the Internet. An important feature of SMTP is its capability to relay mail across transport service environments. (RFC 821).

HTTP: The Hypertext Transfer Protocol is the basis for exchange of information over the World Wide Web (WWW). Various versions of HTTP are in use over the Internet, with HTTP version 1.1 being the latest version. WWW pages are written in the Hypertext Markup Language (HTML), an ASCII-based, platform-independent formatting language (RFC 1866).

Finger: Used to determine the status of other hosts and/or users (RFC 1288).

POP: The Post Office Protocol defines a simple interface between a user's mail reader software and an electronic mail server; the current version is POP3 (RFC 1460).

DNS: The Domain Name System (described in slightly more detail above) defines the structure of Internet names and their association with IP addresses, as well as the association of mail, name, and other servers with domains.

SNMP: The Simple Network Management Protocol defines procedures and management information databases for managing TCP/IP-based network devices. SNMP (RFC 1157) is widely deployed in local and wide area networks. SNMP Version 2 (SNMPv2, RFC 1441) adds security mechanisms that are missing in SNMP, but is also very complex; widespread use of SNMPv2 has yet to be seen.

Ping: A utility that allows a user at one system to determine the status of other hosts and the latency in getting a message to that host. Uses ICMP Echo messages.

Whois/NICNAME: Utilities that search databases for information about Internet domains and domain contacts (RFC 954).

Traceroute: A tool that displays the route that packets will take when traveling to a remote host.
 

6. Summary

TCP/IP is not merely a pair of communication protocols but is a suite of protocols, applications, and utilities. Increasingly, these protocols are referred to as the Internet Protocol Suite.

   ----------------                                     ----------------
   | Application  |<------ end-to-end connection ------>| Application  |
   |--------------|                                     |--------------|
   |     TCP      |<--------- virtual circuit --------->|     TCP      |
   |--------------|          -----------------          |--------------|
   |      IP      |<-- DG -->|      IP       |<-- DG -->|      IP      |
   |--------------|          |-------+-------|          |--------------|
   | Subnetwork 1 |<-------->|Subnet1|Subnet2|<-------->| Subnetwork 2 |
   ----------------          --------+--------          ----------------
         HOST                     GATEWAY                     HOST

FIGURE 8. TCP/IP protocol suite architecture.

Figure 8 shows the relationship between the various protocol layers of TCP/IP. Applications and utilities reside in host, or end-communicating, systems. TCP provides a reliable, virtual circuit connection between the two hosts. (UDP provides an end-to-end datagram connection at this layer.) IP provides a datagram (DG) transport service over any intervening subnetworks, including local and wide area networks. The underlying subnetwork may employ nearly any common local or wide area network technology.

Understanding TCP/IP is essential to understanding the Internet and modern telecommunications. Before designing a network, make sure you understand and follow the design limitations of each media type you use.
 

7. Acronyms and Abbreviations

ARP       Address Resolution Protocol
ARPANET   Advanced Research Projects Agency Network
ASCII     American Standard Code for Information Interchange
ATM       Asynchronous Transfer Mode
BSD       Berkeley Software Distribution
CCITT     International Telegraph and Telephone Consultative Committee
CIX       Commercial Internet Exchange
DARPA     Defense Advanced Research Projects Agency
DNS       Domain Name System
DoD       U.S. Department of Defense
FAQ       Frequently Asked Questions lists
FDDI      Fiber Distributed Data Interface
FTP       File Transfer Protocol
FYI       For Your Information series of RFCs
GOSIP     U.S. Government Open Systems Interconnection Profile
HTML      Hypertext Markup Language
HTTP      Hypertext Transfer Protocol
IAB       Internet Architecture Board (formerly Internet Activities Board)
IANA      Internet Assigned Numbers Authority
ICMP      Internet Control Message Protocol
IESG      Internet Engineering Steering Group
IETF      Internet Engineering Task Force
IP        Internet Protocol
ISO       International Organization for Standardization
ISOC      Internet Society
ITU-T     International Telecommunication Union Telecommunication Standardization Sector
MAC       Medium (or media) access control
Mbps      Megabits (millions of bits) per second
NICNAME   Network Information Center name service
NSF       National Science Foundation
NSFNET    National Science Foundation Network
OSI       Open Systems Interconnection
OSPF      Open Shortest Path First
PPP       Point-to-Point Protocol
RARP      Reverse Address Resolution Protocol
RFC       Request for Comments
RIP       Routing Information Protocol
SLIP      Serial Line IP
SMDS      Switched Multimegabit Data Service
SMTP      Simple Mail Transfer Protocol
SNMP      Simple Network Management Protocol
STD       Internet Standards series of RFCs
TCP       Transmission Control Protocol
TLD       Top-level domain
UDP       User Datagram Protocol
 
