Section notes for CS162 Section #13, April 30, 2002
Barbara Hohlt
--------------------------------------------------------------------------
Ethernet and TCP/IP Networking - from Matt Welsh
--------------------------------------------------------------------------

* Networking layers:

  Each part of a network communicates at a certain level of abstraction,
  depending on its needs. For example, a Web browser uses the HTTP
  protocol to send and receive Web pages, but a packet router manipulates
  lower-level IP packets. In general, higher-level protocols and
  abstractions are built on top of lower-level protocols. At the lowest
  level is the physical network interface (such as Ethernet), which
  provides very few features. Higher-level protocols are built up on top
  of these low-level mechanisms. Generally, the data for higher-level
  protocols is *encapsulated* in messages formatted for the lower-level
  protocols.

  The term "protocol stack" refers to a set of protocols implemented in
  terms of each other. For example:

     Application:  Telnet, FTP, HTTP, etc.
     Transport:    TCP, UDP, etc.
     Network:      IP, ICMP, IGMP, etc.
     Link:         Ethernet, PPP, etc.

  HTTP is encapsulated within TCP, which is in turn encapsulated within
  IP, which is encapsulated over whatever physical network the messages
  happen to be travelling over. Generally, applications use
  application-level protocols, and the kernel (and hardware) implement
  the transport, network, and link-level protocols.

  On most operating systems, apps talk to the network through a SOCKET
  interface. A socket is a "virtual file" from which bytes can be read
  and written; on most systems, different types of sockets are used to
  provide encapsulation over TCP and UDP. As we will see later, TCP is a
  stream-oriented protocol, providing communication of arbitrary byte
  streams between two network hosts. UDP is a datagram-oriented
  protocol, providing communication of complete packets between
  applications.
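  The socket interface described above can be sketched in a few lines of
  Python: a TCP "stream" socket used like a virtual file over the
  loopback interface. The echo behaviour and port choice here are just
  for illustration.

```python
# Minimal sketch of the socket interface: a TCP stream socket pair on
# the loopback interface. The server echoes one message and exits.
import socket
import threading

def echo_once(server):
    conn, _ = server.accept()      # wait for one client connection
    data = conn.recv(1024)         # read a chunk of the byte stream
    conn.sendall(data)             # echo it back
    conn.close()

# Server side: bind to an ephemeral port on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=echo_once, args=(server,))
t.start()

# Client side: connect and read/write bytes, as with a "virtual file".
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()
print(reply)  # b'hello'
```

  A UDP socket would be created with SOCK_DGRAM instead, and would
  exchange whole datagrams rather than a byte stream.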
* Encapsulation example:

  For example, consider an HTTP message which contains the data for a
  Web page ... which is called [PAYLOAD] below. This is encapsulated
  within TCP, which places a header on the front of the message:

     [TCP HEADER][PAYLOAD]

  This in turn is encapsulated within IP:

     [IP HEADER][TCP HEADER][PAYLOAD]

  And finally this is transmitted over a local network, such as Ethernet
  (which places a header and a trailer on the message):

     [ETHERNET HEADER][IP HEADER][TCP HEADER][PAYLOAD][ETHERNET TRAILER]
        (14 bytes)    (20 bytes) (20 bytes)             (4 bytes)

* Fragmentation:

  If the resulting packet is too large to send over a given protocol
  layer, then the packet must be *fragmented* and *reassembled* on the
  other end of the connection. For example, Ethernet limits the payload
  of each packet to 1500 bytes (plus the 14-byte header and 4-byte
  trailer, resulting in a maximum frame length of 1518 bytes). If the IP
  layer gets a packet larger than 1500 bytes, it fragments the packet
  into several smaller packets (labelling each packet so that they may
  be reassembled on the other end). As we will see, this can lead to
  performance problems, especially if higher-level protocols are not
  aware of the fragmentation and reassembly process.

* ETHERNET:

  The original LAN from DEC, Intel, and Xerox (first standard published
  1982). Originally 10 Mbit/sec, then "Fast Ethernet" (100 Mbit/sec),
  and now "Gigabit Ethernet" (1 Gbit/sec). Amazingly, the standard has
  changed very little since the original 10 Mbit/sec Ethernet in 1982!

  Each packet contains:

     6-byte destination address
     6-byte source address
     2-byte type (0x0800 for IP)
     46-1500 byte payload
     4-byte CRC - a checksum on the entire packet

  The source and destination addresses are only known on a local
  Ethernet segment -- in order to talk to machines on other Ethernet
  segments, a hub or router must be used. The addresses look like
  00:00:c0:14:db:bf and are usually hard-wired into the Ethernet
  interface of the host.
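  The encapsulation and fragmentation steps above can be sketched with
  placeholder headers (real headers carry actual field contents, of
  course; the sizes are the ones given in the diagram):

```python
# Sketch of encapsulation: wrap a payload in placeholder TCP, IP, and
# Ethernet headers, then fragment anything larger than the MTU.
ETH_HEADER, ETH_TRAILER = 14, 4
IP_HEADER, TCP_HEADER = 20, 20
MTU = 1500  # Ethernet payload limit

def encapsulate(payload: bytes) -> bytes:
    tcp_segment = b"T" * TCP_HEADER + payload
    ip_packet   = b"I" * IP_HEADER + tcp_segment
    return b"E" * ETH_HEADER + ip_packet + b"C" * ETH_TRAILER

def fragment(ip_packet: bytes, mtu: int = MTU) -> list:
    # Split an oversized IP packet into MTU-sized pieces (the real IP
    # layer also labels each fragment for reassembly).
    return [ip_packet[i:i + mtu] for i in range(0, len(ip_packet), mtu)]

frame = encapsulate(b"GET / HTTP/1.0\r\n\r\n")     # 18-byte payload
overhead = ETH_HEADER + IP_HEADER + TCP_HEADER + ETH_TRAILER
print(len(frame), overhead)        # 76 58
print(len(fragment(bytes(4000))))  # a 4000-byte packet -> 3 fragments
```

  Note the fixed 58 bytes of per-frame overhead: this is why sending
  many tiny payloads is relatively expensive.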
  (You can identify the vendor that makes the Ethernet interface by
  looking at the first few bytes of the Ethernet address.)

  Ethernet was originally very much a *shared medium* network: all nodes
  would broadcast on the wire, and each node would listen for packets
  with its own local address. Ethernet is a "CSMA/CD" network, meaning
  Carrier Sense Multiple Access with Collision Detection. This means
  that nodes can detect when another node is speaking on the network,
  and packet collisions are detected. When packets collide, each node
  retransmits after waiting for a small amount of time. Nodes perform
  EXPONENTIAL BACKOFF: if there are many collisions, they wait for
  longer and longer periods of time before retransmitting.

  Of course, with this setup it is possible for every node on the
  Ethernet to listen to every packet on that segment -- this has been
  the source of real security problems, as it is trivial to write a
  "network sniffer" program to capture packets from a shared Ethernet
  segment.

  These days, not many people use shared-medium Ethernet; rather, the
  Ethernet is *switched*. Each node has a (private) line to a local
  Ethernet switch, which performs the task of sending messages between
  source and destination nodes. In most cases the Ethernet is also
  *full duplex*, meaning that a host can send and receive data at the
  same time; for this reason there are never any collisions on such a
  network. With switched Ethernet, sniffing is no longer a problem
  either.

  Nodes discover the Ethernet address of the destination using a simple
  protocol called ARP (Address Resolution Protocol). In ARP, a node
  broadcasts a small message (on the Ethernet) with an IP address, and
  requests that someone tell it what the corresponding Ethernet address
  is. Hopefully someone on the network knows (of course, the
  destination itself should!) and will respond with an ARP reply.

* THE INTERNET PROTOCOL - IP

  Ethernet is all fine and good, but is really designed for a local
  area network.
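  Exponential backoff can be sketched as follows. (The slot-time unit
  and the cap on the exponent are simplifications; classic Ethernet
  waits a random number of 512-bit slot times, caps the exponent at 10,
  and gives up after 16 attempts.)

```python
# Sketch of binary exponential backoff: after each collision, double
# the range of random wait times.
import random

def backoff_slots(collisions: int) -> int:
    """Random number of slot times to wait after `collisions` collisions."""
    exponent = min(collisions, 10)        # cap the doubling
    return random.randrange(2 ** exponent)  # 0 .. 2^exponent - 1

random.seed(162)
# The possible wait grows from [0,1] after 1 collision to [0,1023]
# after 10 or more collisions.
waits = [backoff_slots(c) for c in (1, 4, 8, 12)]
print(waits)
```

  The doubling range is the key idea: heavier contention spreads
  retransmissions over a wider window, making repeat collisions less
  likely.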
  If you want to send a message to someone not on your local Ethernet,
  it might need to traverse numerous different networks (of different
  kinds) to get there. This is where IP comes into play. The official
  spec is RFC 791 (from 1981), but the basic mechanisms and ideas in IP
  go further back, to the original ARPANET.

  IP provides an *unreliable connectionless datagram service*. What
  this means is that IP may drop packets (due to network errors, lack
  of buffer space in routers, etc.), and it is the higher-level
  protocol's responsibility to detect this and deal with it. By
  "connectionless" we mean that IP does not "remember" any state about
  successive messages: it just sends individual messages with no notion
  of a "connection" between two hosts. Among other things, this means
  that messages can be delivered out of order, since each message is
  handled completely separately. In fact, two back-to-back messages
  between the same hosts might take totally different paths through the
  network!

  Think of IP as the US Postal Service: all they do is deliver letters
  between addresses, with no idea how those different letters might be
  related. They might drop letters at any time (in fact they do), or
  deliver letters out of order. If IP is the US Postal Service, then
  Ethernet is like the mail truck that delivers messages between your
  home and the local post office. (Even better, with Ethernet, messages
  between nodes on the same segment are delivered directly, without
  going to the "post office" first.)

  Each IP packet looks like this:

     [12 bytes of various header fields]
     32-bit source IP address       (e.g., 216.102.32.12)
     32-bit destination IP address  (e.g., 128.32.131.6)
     [options (if any)]
     data

  Most of these fields aren't important here. The source and
  destination addresses are just the IP addresses that we all know and
  love.
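  Given that layout (12 bytes of other fields, then two 32-bit
  addresses), the addresses can be pulled out of a raw header with a
  few lines of Python; the header bytes here are fabricated for
  illustration:

```python
# Sketch of extracting the source and destination addresses from a raw
# 20-byte IP header, using the layout sketched above.
import socket
import struct

def addresses(header: bytes):
    # Skip the first 12 bytes of the header, then unpack two
    # 4-byte (32-bit) addresses in network byte order.
    src, dst = struct.unpack_from("!4s4s", header, 12)
    return socket.inet_ntoa(src), socket.inet_ntoa(dst)

# Fake header: 12 placeholder bytes plus the two example addresses.
fake = (bytes(12)
        + socket.inet_aton("216.102.32.12")
        + socket.inet_aton("128.32.131.6"))
print(addresses(fake))  # ('216.102.32.12', '128.32.131.6')
```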
  (Note that these are numerical addresses only -- if you want to use an
  alphanumeric name like "www.slashdot.org", you have to translate it to
  a numerical address using a separate protocol called DNS, the Domain
  Name System.)

  Routing of IP packets is simple: if the destination is on the same
  local network as the host, then send it directly to the host (i.e.,
  send the message on the appropriate Ethernet segment). Otherwise,
  send the packet to a ROUTER -- a node on the network which will
  forward the packet towards its destination. In order to know which
  router to send messages to, hosts use a ROUTING TABLE, which maps
  destination IP addresses to router IP addresses. Most systems use
  just one DEFAULT ROUTER -- a node which will handle all IP packets
  not destined for the local network.

  Here is a simple example network - hostnames chosen completely at
  random, of course :-)

     Host nelson: 128.32.131.1        Host david: 128.32.131.2
           |                                |
           +-------+------------------------+  Local Ethernet (128.32.131.0)
                   |
     host matt: 128.32.131.254    (router, has one IP address
                   | 169.229.48.1   for each network segment)
                   |
           +-------+------------------------+  Ethernet (169.229.48.0)
           |                                |
     Host mark: 169.229.48.31      Host dennis: 169.229.48.32

  nelson and david have a local network address of 128.32.131.0, so
  they know any packets with that prefix go to the local network. To
  send a message to any node on the local subnet, they get the
  corresponding Ethernet address from ARP and just send it.

  ************************************************************************
  ** FOR AFTERNOON SECTION: Use names gifford, vickie, eric, tina, and  **
  ** amy as the router :-)                                              **
  ************************************************************************

  matt is a router between subnets 128.32.131.0 and 169.229.48.0. He
  has two Ethernet interfaces (one per network) -- in general this
  could be a router between many different networks.
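  The routing decision above is easy to sketch with Python's ipaddress
  module. The subnets and router address follow the example network;
  the /24 prefix length is an assumption about how the 128.32.131.0
  network is masked.

```python
# Sketch of a host's routing decision with one default router.
import ipaddress

LOCAL_NET = ipaddress.ip_network("128.32.131.0/24")        # assumed /24
DEFAULT_ROUTER = ipaddress.ip_address("128.32.131.254")    # matt

def next_hop(dest: str):
    """Return the address to hand the packet to on the local segment."""
    dest_addr = ipaddress.ip_address(dest)
    if dest_addr in LOCAL_NET:
        return dest_addr       # same subnet: ARP for dest, send directly
    return DEFAULT_ROUTER      # otherwise: send to the default router

print(next_hop("128.32.131.2"))   # david -- delivered directly
print(next_hop("169.229.48.31"))  # mark  -- goes via matt
```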
  When nelson wants to send a packet to mark (169.229.48.31), since
  nelson's network address does not match mark's destination IP
  address, nelson has to send the packet to matt. nelson looks up
  matt's Ethernet address (using ARP) and sends the packet to matt.
  matt compares the destination address (mark) with its routing table,
  discovers the packet needs to go on the Ethernet segment for
  169.229.48.0, does an ARP request for mark, and sends it along.

  What happens if IP drops a packet, or if packets get sent out of
  order? Well, it's up to higher-level protocols to discover this and
  deal with it.

  What happens if IP has to fragment packets (e.g., because they are
  larger than the Maximum Transmission Unit (MTU) of the underlying
  network)? IP has a mechanism where it labels packets, allowing them
  to be fragmented and reassembled on the other end.

* TRANSMISSION CONTROL PROTOCOL - TCP

  IP by itself is not very useful, since it can drop packets, deliver
  them out of order, and so forth. Most applications want the illusion
  that there is a private, reliable, byte-oriented data pipe that they
  can communicate over. The TCP protocol provides just this. TCP is
  likewise defined in a 1981 spec, RFC 793.

  TCP provides a connection-oriented service, meaning that two nodes
  must first perform a HANDSHAKE before they can communicate, and that
  the protocol maintains state for the connection. Second, TCP
  provides RELIABILITY through the following:

  - Messages are broken into segments, which are then encapsulated in
    IP.
  - Each segment has a checksum associated with it, so that erroneous
    segments are detected and dropped.
  - When receiving a segment, a node sends an acknowledgement (ACK) to
    indicate to the sender that it was received OK.
  - Nodes perform RETRANSMISSION: if an ACK for a given segment is not
    received after a certain amount of time, the node sends the
    segment again.
  - Since IP can duplicate and reorder packets, the TCP layer must
    deal with this as well.
  - TCP also provides FLOW CONTROL: each end of the connection has a
    finite amount of buffer space, and a node cannot send more packets
    to the network if the receiver's buffer space is full.

  Of course, the details of TCP are very complicated. How often do
  retransmissions occur? How are connections established? How is
  segmentation and reassembly performed? What is the flow control
  mechanism used? In fact, the networking community has not really
  settled on the answers to all of these questions -- every year more
  papers are published arguing for various changes in the way that TCP
  works.

  From a very high level, a TCP segment looks like:

     16-bit source port
     16-bit destination port
     32-bit sequence number
     32-bit ack number
     [various flags]
     16-bit window size -- used for flow control
     [other fields]
     data

  Each end of the connection is identified by a 16-bit port number.
  This allows multiple TCP connections to be taking place at once
  between two hosts. Note that the TCP header does not include any
  host addresses -- those are part of the corresponding IP header in
  which the TCP segment is encapsulated.

  Each TCP packet contains a sequence number, which is used to
  identify the ordering of packets in the stream (remember that IP can
  drop and reorder packets). The sequence number indicates the BYTE
  NUMBER in the stream. If we initialize sequence numbers to 0 when a
  connection is established, then the sequence number indicates the
  ordinal number of the first byte of the packet.

  In addition, each packet contains a 32-bit ACK field. This is used
  to tell the other end of the connection how many bytes have been
  successfully received. (Actually, it indicates the next sequence
  number that the node expects to receive.) Notice that ACKs can be
  "piggybacked" onto data flowing between two nodes -- this saves
  network bandwidth by allowing an ACK to be sent along with some new
  data, rather than having to send ACKs as separate packets.
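  The first four fields of that layout (ports, sequence number, ack
  number) can be unpacked from a raw header as follows; the header
  bytes here are fabricated for illustration:

```python
# Sketch of reading the leading TCP header fields with struct, using
# the layout in the notes: two 16-bit ports, then two 32-bit numbers.
import struct

def tcp_fields(header: bytes):
    # "!" = network (big-endian) byte order; H = 16-bit, I = 32-bit.
    src_port, dst_port, seq, ack = struct.unpack_from("!HHII", header)
    return src_port, dst_port, seq, ack

# Fake header: source port 80, dest port 5150, seq 148, ack 1009,
# padded out to the 20-byte minimum header size.
fake = struct.pack("!HHII", 80, 5150, 148, 1009) + bytes(8)
print(tcp_fields(fake))  # (80, 5150, 148, 1009)
```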
  For example, if we have nodes A and B communicating, their packets
  might look like:

     A -> B   sequence 48, 100 bytes of data
     B -> A   sequence 1006, ack 148, 3 bytes of data
     A -> B   sequence 148, ack 1009, 100 bytes of data
     B -> A   sequence 1009, ack 248, 3 bytes of data

  Of course, A might send multiple packets to B at a time:

     A -> B   sequence 48, 25 bytes of data
     A -> B   sequence 73, 25 bytes of data
     A -> B   sequence 98, 25 bytes of data
     A -> B   sequence 123, 25 bytes of data

  Say that the second of these 4 packets is dropped. Then the only ACK
  that B can send is

     B -> A   sequence 1006, ack 73, 3 bytes of data

  Both sides maintain a retransmission timer. Eventually A will time
  out waiting for the ACKs from B for all 4 packets (it wants to see
  an ACK containing the value 148), and will retransmit the last 3
  packets again. This can be wasteful, since A has no idea how much of
  the data was lost. In the above case, B received 3 out of the 4
  packets okay, but A only knows that the first packet was ACKed.

  What would happen if IP were to fragment a given TCP segment and
  then drop one of the fragments? Since IP does not perform any
  retransmission, eventually A would time out waiting for an ACK for
  that segment, and retransmit the whole segment again (even though
  most of it may have been received OK). For this reason, most TCP
  implementations try to keep segment sizes small enough that they
  don't need to be fragmented by the IP layer.

  There are many other interesting (and complex) issues related to how
  TCP achieves reliability, flow control, and so forth. See any good
  book on TCP/IP networking for all of the hairy details. I
  particularly recommend W. Richard Stevens' classic "TCP/IP
  Illustrated (Volume 1)".

* HTTP - HYPERTEXT TRANSFER PROTOCOL

  As an example of an application-level protocol layered over TCP,
  let's look briefly at HTTP.
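  The cumulative-ACK behaviour in the dropped-packet example can be
  sketched as a small function: the receiver may only acknowledge the
  longest contiguous prefix of bytes it has seen, so a hole freezes
  the ACK number even as later packets arrive.

```python
# Sketch of a cumulative ACK: walk the contiguous run of received
# bytes and return the next sequence number expected.
def cumulative_ack(start_seq: int, received) -> int:
    """received: set of (seq, length) pairs for packets that arrived."""
    arrived = dict(received)      # seq -> length
    ack = start_seq
    while ack in arrived:         # extend over the contiguous prefix
        ack += arrived[ack]
    return ack

# A's four 25-byte packets, with the second one (seq 73) dropped:
packets = {(48, 25), (98, 25), (123, 25)}
print(cumulative_ack(48, packets))  # 73 -- B still expects byte 73
```

  This is exactly why A can only conclude that the first packet made
  it: the ACK of 73 says nothing about packets 98 and 123, which B in
  fact received.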
  As you probably know, this was originally invented by physicists at
  CERN, who built a simple hypertext-based system for scientists at
  the lab to share notes and other data. The protocol is beautifully
  simple, although its wide popularity has led to much scrutiny.

  HTTP is implemented using TCP. In its original form (HTTP/0.9),
  each request for an object (i.e., a Web page, an embedded image,
  etc.) required that the client open up a new TCP connection to the
  web server and send a message of the form

     GET <URL>

  The server would respond with the raw data for the requested object,
  or an error code if the requested URL was invalid, and then close
  the connection.

  This was really easy to implement but led to a number of problems.
  Many objects on the Web are very small (a few hundred bytes or so),
  and the overhead to establish a new TCP connection for each object
  is relatively high. This not only slows down the browser and the web
  server, but can cause problems for network routers (TCP was not
  originally designed for such short-lived connections!).

  In HTTP/1.1, a new mechanism called persistent connections was
  introduced: a single TCP connection can be used to transfer multiple
  HTTP requests and responses. Of course, this leads to a new problem:
  how does a browser know when it is done reading the data for a given
  response? Originally it could simply wait for the server to close
  the connection -- but now we have to expand the HTTP protocol a bit
  to provide some additional information.

  The HTTP/1.1 request looks like:

     GET <URL>
     [Headers]
     \n\n

  Multiple requests can be issued on a single connection - generally a
  client waits for a response before sending the next request.
  Optional header fields are used to provide more information to the
  server (for example, most browsers use this to tell the server the
  type of OS and browser being used, which is nice for statistics
  gathering).
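  Composing such a request by hand takes only a few lines. Note that
  the Host header and keep-alive line below go beyond the simplified
  form shown above (HTTP/1.1 requires a Host header, since one server
  may serve many names); the hostname is just the earlier example.

```python
# Sketch of building a raw HTTP/1.1 request for a persistent
# connection. Header lines are separated by CRLF; a blank line ends
# the header section.
def make_request(host: str, path: str) -> bytes:
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",            # required in HTTP/1.1
        "Connection: keep-alive",   # keep the TCP connection open
        "",                         # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

req = make_request("www.slashdot.org", "/")
print(req.decode().splitlines()[0])  # GET / HTTP/1.1
```

  Several such requests could then be written to one socket in turn,
  reading a response between each.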
  The HTTP response looks like:

     200 OK
     Content-Length: <length>
     Content-Type: <type>
     \n\n
     [Data]

  This allows the browser to know how much data to expect in the
  response; the content type field also tells it what type of object
  it is (e.g., image, HTML page, etc.). This is useful because the
  browser can begin to render the object before it has been completely
  received; for example, it can launch a viewer application while the
  object is downloading.

  While HTTP is simple and all of the protocol keywords are English
  words, it is not necessarily easy to implement. It is difficult to
  efficiently parse ASCII strings on most systems, and since the
  protocol is somewhat flexible (e.g., each header line might contain
  a variable amount of whitespace), implementations have to be
  prepared for different interpretations of the protocol by different
  browsers and servers.
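  A sketch of the parsing a browser must do on a persistent connection
  is below: split off the headers, find Content-Length, and read
  exactly that many body bytes (the raw response is fabricated for
  illustration, with extra trailing bytes standing in for the next
  response on the connection).

```python
# Sketch of parsing an HTTP response to learn how much body data
# belongs to it, tolerating variable whitespace in header lines.
def parse_response(raw: bytes):
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    status = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Normalize case and strip the variable whitespace noted above.
        headers[name.strip().lower()] = value.strip()
    length = int(headers["content-length"])
    return status, headers, rest[:length]   # body ends after `length` bytes

raw = (b"200 OK\r\n"
       b"Content-Length: 5\r\n"
       b"Content-Type:  text/html\r\n"
       b"\r\n"
       b"hello...next-response...")
status, headers, body = parse_response(raw)
print(status, headers["content-type"], body)  # 200 OK text/html b'hello'
```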