Chapter 6: Evaluation of the Arctic Testing Project

Before we can begin to consider what wisdom Arctic's testing system has taught about the functional testing problem, we need to evaluate how well it has met each of the goals laid out in Chapter 3. This functional testing system has been up and running in some form since the summer of 1993, and in that time I have had many chances to evaluate its performance relative to those stated goals. In this chapter, I will list those goals and comment on how well the system meets them.

6.1 General

One promise this system does indeed deliver is generality. The system was never unable to generate a desired input pattern, nor was it ever unable to detect an erroneous output pattern.

Some prime examples of the value of this generality, as seen in Appendix Section A.3, are the error detection test groups. These test groups needed to create some very unusual input patterns in order to determine if the error detection functions were working correctly. Without the ability to put any block of Verilog code in a test group, these tests could never have been integrated seamlessly with the rest of the system. Many subtle bugs were uncovered by the test groups that focused on errors.

Other good examples of tests that rely on the generality of this system are the JTAG tests, one of which is described in Appendix Section A.4. These tests did not integrate quite as seamlessly into the system as did the errors test groups, since special functions needed to be written for them, but they were relatively easy to write because of the structured framework they fit into. The JTAG tests uncovered several problems related to the stitching of the scan rings.

The random tester appears to be very general as well. Considerable effort has gone into making it capable of generating a wide variety of inputs. Also, it is possible to determine types of bugs that the random tester has no chance of detecting, and focus directed testing on cases that have a good chance of detecting those bugs. This seems to be a very general, comprehensive approach to random testing.

6.2 Fast

Because of our efforts to give many procedures the ability to run in quick mode, this system did end up being fast enough to use as a debugging tool. By speeding up only two functions, configuration and writes to the Arctic control register, the running time of a simple test could be reduced from 45 minutes to about five minutes. We came to rely on this quick mode very heavily, but it was always easy to turn off when it broke because of the SPEED variable mentioned in Section 4.3.

I used this quick mode quite heavily whenever I found myself debugging portions of the chip that I did not design. By placing signal probes and re-simulating I was able to track down several typographical errors without any help from the implementors themselves. The time to re-simulate after changing probes was short enough to make this kind of debugging possible.

6.3 Randomizable

Our intention was to extend the original system to run random tests and search for bugs without our intervention. In actuality, the second system is not connected to the first, but since the basic structure of the first system was borrowed to build the randomized system, I can claim that this goal was met in some way. The true goal, of course, was to build a random testing system that could be relied on to search for subtle bugs. Since the system has not been completed, I cannot claim that the system met that goal. It should certainly be clear from Chapter 5, however, that the system should, in its final form, be able to generate quite a wide variety of tests.

6.4 Repeatable

The original functional testing system was obviously repeatable, and that made it very useful as a debugging tool. The fact that the randomized functional tester was able to restart any generated test group without re-running all previous groups was also very handy. In its half-completed form, the random tester found only one bug, but that bug would have been very troublesome to track down without this ability. The error was caused by a pattern that was generated after several hours of simulating, but because we were able to restart the test group, it took us only about 10 minutes to reproduce the error.
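One way to obtain this kind of restartability is to give every generated test group its own seed, derived from a master seed and the group's index. The Python sketch below illustrates that idea only. The seeding scheme, the function names, and the representation of a test group as a list of 32-bit words are all assumptions made for illustration, not a description of how Arctic's random tester was actually built.

```python
import random


def group_rng(master_seed, group_index):
    """Return a random generator that depends only on (master_seed, group_index).

    Hypothetical scheme: the seed is a string combining both values, so the
    generator for group k never depends on groups 0..k-1 having been run.
    """
    return random.Random(f"{master_seed}:{group_index}")


def generate_group(master_seed, group_index, num_words=8):
    """Generate the stimulus for one test group as a list of 32-bit words."""
    rng = group_rng(master_seed, group_index)
    return [rng.getrandbits(32) for _ in range(num_words)]


def run_all(master_seed, num_groups):
    """Generate every group in order, as a long unattended run would."""
    return [generate_group(master_seed, i) for i in range(num_groups)]
```

With a scheme like this, a failure seen in group 7 of a long run could be reproduced by calling generate_group(master_seed, 7) directly, without regenerating or simulating the earlier groups.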

6.5 Low Level

One of the more obscure requirements we made of this system was that it should be able to simulate Arctic at the lowest possible level. Most of the time, functional testing was performed on the original Verilog model of Arctic. However, the system is capable of running on a Verilog gate-level description of Arctic that is generated by Synopsys. This model can also be back-annotated with timing data, so that the simulation can come very close to the actual behavior of the fabricated chip. Unfortunately, this part of the system is not working either, due to some problems which we believe are related to the crossing of clock boundaries, but this may be corrected soon.

6.6 Easy to Use

At the end of Chapter 3, I stated that some of these goals were almost in direct opposition to each other, but in all of the preceding sections, Arctic's testing system seems to have met its goals well. Unfortunately, many of these goals are met at the expense of this final goal, ease of use. In nearly all places where a tradeoff was made between ease of use and some other quality, ease of use was sacrificed. This is especially true of generality. The confusing file structure and packet management system of the original system both exist in the name of generality.

This does not mean that the system is impossible to use. Great effort was put into providing simple ways to specify common inputs, and generating a specific input pattern and checking for a specific output pattern was generally very easy to do, no matter how unusual the pattern was. For example, it would take about 30 minutes for someone familiar with the system to write a test group that disables output port 2 and sends, 3 cycles after disabling, a packet destined for port 2 with the value 057b3aff in the 14th word of its payload into port 1. This could be very useful if such an input pattern was believed to reveal a bug. Our hope was to make specification of such tests easy, and the system succeeds in this goal.

The main problem with this system is that normal cases are not much easier to specify than unusual cases, despite our efforts to simplify them. If the user wanted a test group that did something as simple as sending 100 different packets through the system, each 10 cycles apart, it could take hours to create. The user has to specify every bit of every packet, define exactly when each packet should be sent into the system, and state exactly which output port each should emerge from. In most test groups, the user needs to send a large number of packets, but the system gives the user such precise control over these packets that it is impossible to send a packet unless it is described in every detail. Some solution to this problem was needed. We created some packet generation programs, but these were rather awkward and often generated an unmanageable number of packets. Perhaps this problem could have been solved by putting random packet generators in each input port's stub. Test groups could specify a few parameters for these random packets and let the system worry about generating them. Tests of this kind were called "bashers" by the team working on the Alewife cache controller, and apparently uncovered many bugs [7].
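To make the idea of a parameterized packet generator concrete, here is a minimal Python sketch of what such a generator might look like. It is not one of the packet generation programs mentioned above, and it does not model the Alewife bashers. The Packet record, the default of four ports, the 16-word payload, and the 32-bit word size are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    send_time: int      # cycle at which the packet enters the chip
    in_port: int        # input port the packet is inserted into
    out_port: int       # output port the packet is expected to leave from
    payload: List[int]  # payload words (assumed to be 32 bits each)


def generate_packets(count, spacing, num_ports=4, payload_words=16, seed=0):
    """Generate `count` random packets, one every `spacing` cycles.

    The caller supplies only a few parameters; every remaining bit of every
    packet is filled in from a seeded generator, so the same arguments
    always reproduce the same packets.
    """
    rng = random.Random(seed)
    packets = []
    for i in range(count):
        packets.append(
            Packet(
                send_time=i * spacing,
                in_port=rng.randrange(num_ports),
                out_port=rng.randrange(num_ports),
                payload=[rng.getrandbits(32) for _ in range(payload_words)],
            )
        )
    return packets
```

Under these assumptions, the "100 different packets, each 10 cycles apart" case from the paragraph above reduces to a single call, generate_packets(100, 10), and the expected output port of each packet is recorded alongside the packet itself.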

Another reason designing test groups was so difficult was that the structure of the files used in these test groups was very unintuitive. Remember that Verilog can only read in files with binary or hexadecimal data. That meant that every file that specified some detail of a test had to be encoded, adding an extra layer of confusion to the testing process. Imagine, for example, that the user wants to send packet 5 into port 1 at time 0. To do this, the user has to create a packet insertion file that looks like this.

00000000000000001
00000000000000052

Since it would take an inhuman memory to recall the exact position of each of these bits, the user is forced to call up the testing system user's manual when this file is created, or whenever it needs to be modified. This slows the user down, and makes life very confusing. If the file instead contained the line "Insert packet 5 into port 1 at time 0," the system might have been much easier to use. It would not have been easy to add this ability to the system, but it is possible it could have been supported as a pre-simulation step that would translate more easily understood descriptions like the one above into hexadecimal numbers before the simulation. A simpler and perhaps just as effective approach would be to make all test files a kind of table that the user would fill out. The pre-simulation step could then remove everything but the numbers, which would be input by the Verilog simulation.
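A pre-simulation translation step of this kind could be quite small. The Python sketch below shows one possible shape for it. The accepted sentence form and, in particular, the field layout of the output word are invented for illustration. They do not reproduce the actual encoding of Arctic's packet insertion files, which is defined in the testing system user's manual.

```python
import re

# Hypothetical layout of one 17-digit hexadecimal record
# (NOT Arctic's real insertion-file format):
#   digits 0-7  : insertion time in cycles
#   digits 8-15 : packet index
#   digit  16   : input port
COMMAND_RE = re.compile(
    r"insert packet (\d+) into port (\d+) at time (\d+)", re.IGNORECASE
)


def translate_line(line):
    """Translate one readable command into one hexadecimal record."""
    match = COMMAND_RE.fullmatch(line.strip().rstrip("."))
    if match is None:
        raise ValueError(f"unrecognized command: {line!r}")
    packet, port, time = (int(group) for group in match.groups())
    if port > 0xF or packet > 0xFFFFFFFF or time > 0xFFFFFFFF:
        raise ValueError(f"field out of range in: {line!r}")
    return f"{time:08x}{packet:08x}{port:01x}"


def translate(text):
    """Translate a whole description file, skipping blank lines."""
    return [translate_line(line) for line in text.splitlines() if line.strip()]
```

The resulting list of records could be written out one per line and read by the simulation as an ordinary hexadecimal data file. Because unrecognized or out-of-range commands raise an error before simulation starts, a mistyped test description would be caught in seconds rather than after a long run.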

It seems, then, that the only big problem with this functional testing system is its complexity. Indeed, many members of the team never learned to use the system at all because it takes so long to learn. Even those who do understand it avoid using it regularly.

Still, this system has been successfully used to carry out extensive testing of the Arctic chip. Countless bugs have been uncovered through directed testing with the original system. Many examples of these tests can be found in Appendix A. The random tester has found only one bug, but has the potential to find many more if it is completed. Also, beyond these practical concerns, this system has been a wonderful test bed for many ideas related to functional testing.