CS 261 Homework 3

Instructions

This problem set is due Tuesday, November 20th.

You may work together and discuss the questions on this homework with others, but the writeup you turn in must be your own, and you should list everyone you collaborated with. You may use any source you like (including other papers or textbooks), but if you use any source not discussed in class, you must cite it.

Question 1

An authenticated set is a signed data structure that enables set-membership queries to be answered with a proof that the answer is correct. In more detail, we assume that the creator of the set has a public/private keypair (pk,k), and that everyone knows the public key pk. There are two operations:

create(S,k) accepts a set S and a signing key k, and produces auxiliary data v (used below).
query(x,v) accepts a value x and tests whether x ∈ S. If x ∈ S, query(x,v) outputs (true, w), where w is a witness that anyone with knowledge of the public key pk can use to convince themselves that x is indeed an element of S. If x ∉ S, query(x,v) just outputs false.

The idea is that this represents a set that has been signed in such a way that membership queries can be answered efficiently and each answer proved correct. Notice that there are no insert() or delete() operations; once the set is created, it is frozen. We impose several security and performance requirements:

Publishing the auxiliary data v must not reveal anything about the signing key k or endanger the security of the scheme. This is a nice property, because it means that the set can be signed in advance and the auxiliary data given to an untrusted third party, who can answer membership queries even if no one trusts them.
The auxiliary data v should not be too large. In particular, its size should be at most O(n), where n is the size of S. The witness w should be much smaller: say, of size O(1) or O(log n) or so.
Anyone who does not know the signing key k should be unable to come up with a value x ∉ S and a fake witness w' so that (true, w') would be accepted by a recipient as a valid response to query(x,v).

It is easy to design a simple authenticated set data structure: if S={x₁,..,x_n}, define create(S,k)=v=(v₁,..,v_n), where v_i = sign(k,x_i); then we can define query(x_i,v) = (true, v_i). The recipient of a response (true, w) to query(x,v) can check the validity of this response by using the public key pk to verify that w is a valid signature on x.
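As an illustration only, the simple positive-only scheme just described can be sketched in Python. The signature scheme here is a toy textbook RSA over two small Mersenne primes, chosen purely so that the example is self-contained; it is insecure and is an assumption of this sketch, not part of the assignment. The helper names `sign`, `verify`, and `_digest`, and the constants `P`, `Q`, `N`, `E`, `D`, are likewise hypothetical.

```python
import hashlib

# Toy textbook-RSA parameters (assumption: illustration only, NOT secure).
P, Q = 2**31 - 1, 2**61 - 1          # two known Mersenne primes
N = P * Q                            # public modulus
E = 65537                            # public exponent (plays the role of pk)
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (plays the role of k)


def _digest(x: bytes) -> int:
    """Hash x to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big") % N


def sign(k: int, x: bytes) -> int:
    """Toy signature on x under the private exponent k."""
    return pow(_digest(x), k, N)


def verify(pk: int, x: bytes, w: int) -> bool:
    """Check that w is a valid toy signature on x under public exponent pk."""
    return pow(w, pk, N) == _digest(x)


def create(S, k):
    """Auxiliary data v: one signature per element, so |v| is O(n)."""
    return {x: sign(k, x) for x in S}


def query(x, v):
    """Return (True, w) if x is in S; otherwise an unauthenticated False."""
    if x in v:
        return (True, v[x])
    return False
```

A recipient who receives (True, w) for a value x would accept it only if `verify(E, x, w)` holds. Note that `query` needs only v, never the private exponent, which matches the requirement that v can be handed to an untrusted third party.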

One shortcoming of the above framework is that negative responses to query() are unauthenticated. If an untrusted third party is answering membership queries, then they can freely lie and claim that the value x is not in S. This is a point of vulnerability.

Your job is to devise an authenticated set data structure so that both positive and negative responses to query() are authenticated. In particular, the above definition is modified so that if x ∉ S, then query(x,v) should return (false, w), where w is a witness that anyone with knowledge of the public key pk can use to convince themselves that x is indeed not an element of S. Everything else is as above. Your solution should satisfy all of the above security and performance requirements, as well as the following additional requirement:

Anyone who does not know the signing key k should be unable to come up with a value x ∈ S and a fake witness w' so that (false, w') would be accepted by a recipient as a valid response to query(x,v).

You should provide a definition of the create() and query() algorithms, as well as specify how a recipient of an answer from query() can verify its validity.

Motivation for the curious: This question is inspired by the DNS security scheme sketched in lecture. As I described it in lecture, each domain has a public key and an authenticated set of DNS records for that domain. The administrator creates and signs the set in advance using the create() operation, and then the DNS server responds to queries using the query() operation. Note that the private key can be stored on a computer that is never directly connected to the Internet, and doesn't need to be known to the DNS server, which is a nice property. Also DNS clients can verify the validity of these DNS records, even if they have passed through several other caching DNS servers on their way to the end client. However, the scheme I described in lecture has the shortcoming that it leaves negative responses unauthenticated: if you ask a caching DNS server whether there is any record for foo.bar.com, it can lie to you and claim that there is no such record, when in fact such a record does exist. This is bad, and can open up subtle and nasty security vulnerabilities if the end host has a DNS search path that spans multiple organizations. In Question 1, you'll invent a data structure that could be used to efficiently authenticate both positive and negative responses to DNS queries.

Question 2

This question asks you to explore some of the consequences of active networks, where packets can contain mobile code that is executed by the routers along the path.

For concreteness, we can think of 'adaptive routing' as a sample application: if your TCP connection to France is too slow because of poor bandwidth on the transatlantic link and for some reason you happen to know that there is a much faster route to France via China, you might wish to adaptively update the route your TCP packets take. In this case, you would "push" some mobile code into each router along the way; the mobile code would run at each router before the packet is forwarded and select which interface to send it out over.

We describe below a series of extensions to the IP protocol suite that allow for progressively more sophisticated active-network applications. For each of the four extensions below, list the security threats that might arise for that extension and how they could be addressed. The purpose of this question is to study issues that are inherent in the functionality; you may ignore the risk of implementation bugs such as buffer overruns.

In the simplest variant, we'd extend the IP packet format to allow an optional extra header which contains some mobile code to run at each router. The mobile code is specified in the BPF (Berkeley Packet Filter) bytecode language. Each router which receives such a packet first verifies that the bytecode contains no backwards jumps, and then interprets the bytecode. The only memory locations the bytecodes are allowed to read are (1) the packet itself, and (2) a global list of interfaces available at the router. (Each interface in the list is annotated with a little bit of relevant information that can be read by the handler, such as the IP address of the next hop along that interface. No writes to memory are allowed.) There are no function calls, computed gotos, exceptions, or other forms of indirect control flow. Just before exiting, the bytecode should store the name of the desired outbound interface in a fixed register, and the router will forward the packet out via that interface on towards its destination.
One obvious performance issue with the previous scheme is that it requires an overhead of potentially hundreds of bytes of code in every packet. So we introduce the notion of "flows" to amortize the cost of specifying the mobile code. Each packet is associated with a flow. In TCP, the flow ID might be the (src host, dst host, src port, dst port) tuple. For other protocols, we might simply extend the packet format to allow for a 32-bit flow ID. We add a "set handler" IP option which allows endpoints to specify a single chunk of mobile code which will be run at the router every time a packet is received on the same flow. Thus one endpoint can send a packet with the "set handler" IP option and containing a lengthy chunk of mobile code; that mobile code will then be applied to all subsequent packets on that flow, and does not need to be sent again. This allows us to specify a chunk of mobile code once; then all subsequent packets in the flow will inherit the same code without incurring any bandwidth overhead.
It occurs to us that we might like to allow the mobile code to make routing policy decisions based on the payload of the packets, or even to compress packets for us on the fly when bandwidth is scarce. Since this might require scanning the entire packet and possibly interpreting higher-level protocols, we will need to be able to write loops in bytecode. Therefore, we eliminate the restriction on backwards jumps, and allow arbitrary control flow in the bytecode. To implement compression, the handler will need to be able to modify the contents of the packet. Therefore, we also relax our security policy so that handlers are allowed both read and write access to the packet itself. If the handler modifies the packet during execution, the router will forward the modified packet instead of the original contents. Also, we allow handlers to maintain state across packet reception events. Thus, when a new flow is created, we set aside a chunk of memory for use by that flow's handler; the handler is allowed read and write access only to its own chunk of memory.
An astute reader points out that decompression may increase the size of a packet. If this exceeds the network's MTU, our decompression handler may need to send multiple packets. Therefore, we extend the scheme so that handlers can construct whole IP packets in their own memory space and invoke a special operation to send that packet over the wire.
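For concreteness, the check that the first extension performs before interpreting a handler (rejecting any bytecode that contains a backwards jump) might look like the following Python sketch. The instruction format is hypothetical: each instruction is an (opcode, operand) pair, and for the assumed jump opcodes `jmp` and `jeq` the operand is an absolute target index. This is not the real BPF encoding, only a simplified stand-in.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, operand).
# For jump opcodes, the operand is the absolute index of the jump target.
Instr = Tuple[str, int]

JUMP_OPS = {"jmp", "jeq"}  # assumed jump opcodes for this sketch


def has_only_forward_jumps(program: List[Instr]) -> bool:
    """Return True if every jump targets a strictly later, in-range instruction."""
    for pc, (op, operand) in enumerate(program):
        if op in JUMP_OPS:
            # Reject backward jumps, self-jumps, and out-of-range targets.
            if not (pc < operand < len(program)):
                return False
    return True
```

With no indirect control flow and only forward jumps, the program counter strictly increases, so each instruction runs at most once and the handler's running time is bounded by the length of the bytecode. The third extension gives up exactly this guarantee when it permits arbitrary control flow.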

Clarification (added 11/15): Don't forget: In each part, you should list security threats, and also propose a way that those threats could be addressed (e.g., propose a fix).