Chapter 3: Goals for the Arctic Testing Project

In the remaining parts of this thesis, I will explore the problem of functional testing by describing the functional testing system of the Arctic router chip. Arctic chips will form the fat tree network that will allow the processors in the *T multiprocessor to communicate with each other. In this chapter, I will go over the goals the design team had in mind when designing the testing system for Arctic, but before I begin, it may be helpful to give an overview of Arctic itself.

Figure 3-1: The Arctic Router

Figure 3-1 (above) shows a block diagram of Arctic [5]. Arctic consists of four input ports connected to four output ports by a crossbar, and a maintenance interface section through which Arctic is controlled. Message packets enter an input port and can exit through any output port. Since all ``links'' in an Arctic network are bidirectional, each input port is paired with the output port that is connected to the same device. Packets vary in size and can be buffered upon arrival in each input port. Flow control in an Arctic network is accomplished with a sliding window protocol similar to the one used by TCP/IP. A transmitter (output port) and a receiver (input port) are both initialized with a starting number of buffers, and the receiver notifies the transmitter whenever a buffer is freed, so that the transmitter sends packets only when buffers are available.
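The buffer-counting scheme described above can be illustrated with a small model. This is only a minimal sketch under stated assumptions, not Arctic's actual logic: the class names, the method names, and the buffer count of two are hypothetical and chosen purely for illustration.

```python
from collections import deque


class Receiver:
    """Input port model: holds a fixed number of packet buffers (hypothetical)."""

    def __init__(self, num_buffers):
        self.num_buffers = num_buffers
        self.buffers = deque()

    def accept(self, packet):
        # A correctly behaving transmitter never overruns the receiver.
        assert len(self.buffers) < self.num_buffers, "buffer overrun"
        self.buffers.append(packet)

    def free_one(self):
        """Forward the oldest packet; return True if a buffer was freed."""
        if self.buffers:
            self.buffers.popleft()
            return True
        return False


class Transmitter:
    """Output port model: counts how many receiver buffers it believes are free."""

    def __init__(self, num_buffers):
        self.credits = num_buffers

    def try_send(self, packet, receiver):
        if self.credits == 0:
            return False  # no buffer available, so hold the packet
        self.credits -= 1
        receiver.accept(packet)
        return True

    def buffer_freed(self):
        # Called when the receiver reports that a buffer has been freed.
        self.credits += 1


# Both ends start with the same (assumed) buffer count.
rx = Receiver(num_buffers=2)
tx = Transmitter(num_buffers=2)

# The third packet is refused because both buffers are occupied.
results = [tx.try_send(p, rx) for p in ("a", "b", "c")]

# Once the receiver frees a buffer and says so, the packet can be sent.
if rx.free_one():
    tx.buffer_freed()
retry = tx.try_send("c", rx)
```

In this model the transmitter never needs to inspect the receiver's buffers directly. It relies only on its own count and on the freed-buffer notifications, which is the property the paragraph above describes.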

So that the system can tolerate any clock skew on incoming signals, each input port runs on its own clock, which is transmitted with the data. Data has to be synchronized into a local clock domain before it can be sent out. Also, data on the links is transmitted at 100 MHz, though the chip itself operates at 50 MHz, which adds further complexity. Other sources of complexity are an extensive set of error checking and statistics counting functions, two levels of priority for packets, flow control functions such as block-port and flush-port, a ``mostly-compliant'' JTAG test interface, and manufacturing test rings that are accessible in system (not just during manufacturing tests).

Most of these details about Arctic can be ignored unless the reader wishes to dive into the examples given in Appendix A or the user's manual in Appendix B. The details are mentioned here only to give the reader an impression of Arctic's complexity. Arctic falls into that large category of ASICs that have many complex functions and for which there is no obvious way to design a functional testing system. We chose to begin by implementing a directed testing system, and approached the problem by first drawing up a set of goals as guidelines to help us with our implementation. The sections that follow list each goal we had for our system and explain why we found that goal important.

3.1 General

Because this was to be the only system completely dedicated to testing Arctic, it seemed necessary to require that it be general enough to test any of Arctic's functions. This meant that the testing system needed to be capable of putting Arctic in any state or, stated another way, that it needed to be able to send any input signal to Arctic, since a system that can drive every input is guaranteed to be able to put Arctic in any state. We also decided that every output signal of Arctic should be recorded or monitored, so that no important behavior could be missed.

3.2 Easy To Use

Since the original time frame for the testing project was only three months, and since there were only five engineers and four students working on Arctic during that period, we decided that the testing system needed to be very easy to use, or testing would never get done in time. Three students were to implement the system, and it was to be available to anyone on the design team who had the time or the need to look for bugs. This meant that the intended users were more than just the designers of the testing system, so it would have to be simple and well documented. Also, since the chip was still being developed, we knew that this system might be used as a debugging tool, and as mentioned in Section 1.2, any debugging tool has to be easy to use to be effective. When debugging, users need to be able to create intricate tests even if they lack experience with the system.

3.3 Fast

Like ease of use, speed was a requirement imposed by the lack of time and human resources. Because Arctic was so complex, behavioral simulations were running at only about two cycles per second. We knew that the testing system would have to play some interesting tricks to boost speed or we would not be able to finish testing in time. Our hope was to make the system capable of running a short test in no more than 5 to 10 minutes, so that the user would not have to wait long for the results of a simulation after a new bug fix.
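To see why the simulation rate was a concern, it helps to work out how many cycles fit into the target run time. The short calculation below is only a rough illustration, and it assumes that the quoted rate of about two cycles per second stays constant for the whole run. The function name is hypothetical.

```python
CYCLES_PER_SECOND = 2  # approximate behavioral simulation rate quoted in the text


def cycles_in(minutes, rate=CYCLES_PER_SECOND):
    """Number of simulated cycles completed in the given wall-clock time."""
    return minutes * 60 * rate


# Cycle budget for the 5 and 10 minute targets at the unimproved rate.
low, high = cycles_in(5), cycles_in(10)
print(low, high)  # prints: 600 1200
```

At that rate, a 5 to 10 minute run covers only about 600 to 1200 cycles, which leaves little room for a test of any length unless the simulation is made faster.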

An additional reason to speed up the system was the group's lack of computing resources. The members of the design team were sharing a fairly small number of workstations. We hoped to keep the load on these machines to a minimum by making simulations take as little time as possible.

3.4 Repeatable

Section 1.2 also noted that all tests need to be repeatable. We hoped to be able to save all the tests we generated so that we would be able to run any of them again as regression tests. It was also necessary for any input to be repeatable if the system was to be useful as a debugging tool. This meant that there could be no unpredictable behavior in the system. If any parts were random, the seeds for the random number generators needed to be saved so that the simulation could be run again in exactly the same way.
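The idea of saving seeds can be sketched in a few lines. This is a hypothetical illustration rather than the mechanism used in the Arctic testing system: the function name, the record format, and the packet fields are assumptions, with only the count of four ports taken from the description of Arctic.

```python
import json
import random


def generate_stimulus(seed, num_packets=4):
    """Build a pseudo-random packet list from an explicit seed (hypothetical)."""
    rng = random.Random(seed)  # private generator, so no hidden global state
    return [
        {"port": rng.randrange(4), "length": rng.randint(1, 16)}
        for _ in range(num_packets)
    ]


# First run: choose a seed and store it alongside the test.
seed = 12345
first = generate_stimulus(seed)
record = json.dumps({"test": "example", "seed": seed})

# Later run: reload the saved seed and regenerate the same stimulus.
replayed = generate_stimulus(json.loads(record)["seed"])
```

Because the generator is created from the stored seed and nothing else, the replayed run receives exactly the same inputs as the first, which is what makes a randomized test usable as a regression test.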

3.5 Randomizable

Our hope was to build random test generators into this system, but the immediate need was for a general tester and debugging tool. We knew that the random input generators might not be general enough to test any function of the chip, and we knew that debugging is impossible when all inputs are generated randomly, without any user control. We decided to build a directed testing system with the intent of adding some kind of random testing later on, since randomization seemed to be too complex a task for the first pass.

3.6 Low Level

We also saw in Section 1.2 that it is a good idea for a simulation to work with the lowest-level specification of a chip design. This idea is presented well in a paper by Douglas Clark [2]. In this paper, Clark argues that all serious implementation and simulation work should be done at the gate level, and that designers should not waste time designing and testing high-level models of their chips. His argument is that the latter method requires more design time, since designers need to design and test separate high-level simulations in addition to low-level designs. He also argues that tests on high-level simulations are less accurate, since they lack the subtle interactions between gates.

Arctic was being designed with Verilog and compiled to gates with Synopsys, so, in a sense, all simulations were at a very low level. The Verilog model described every interaction between the sub-modules in full detail, and the gate description was generated automatically. We decided to follow Clark's wisdom to the letter and chose to make our system capable of simulating both the pre-compiled Verilog description and the compiled, gate-level description, which could be represented as Verilog code. This, we felt, would be a more rigorous test, and since we hoped to have a working chip in only three months, such a rigorous test was necessary.

In the next chapters we will see that it is nearly impossible to reach all of these goals simultaneously. The desire to make the system general, for example, is almost diametrically opposed to the desire to make it easy to use, because the addition of functions always complicates a system. After taking a close look at the implementation of Arctic's testing system, we will return to this set of goals and evaluate the system's performance with respect to each of them.