SimCluster Distributed Parallel Simulation

 

We live in an electronic era. Look around and you see electronic devices everywhere: computers, cell phones, HDTVs, DVD players, MP3 players, and more. At the heart of all these devices are integrated circuit (IC) chips. As you can imagine, designing chips with such complex functionality is by no means easy. It involves a whole flow of design stages, and different application domains (e.g. microprocessors, DSPs, ASICs) have different design flows. Whichever flow a designer uses, it must include a stage called functional verification, and the prevailing way to do this is simulation. The biggest challenge for simulation is design size: multi-million-gate designs are now common. Over the last decade, however, standalone simulator companies have not kept up with the rising functional complexity of today's ICs, and the functional verification gap has widened to critical proportions.

 

Avery Design Systems developed SimCluster, a distributed parallel simulator that harnesses parallel computing to deliver scalable simulation performance.

·       SimCluster's parallel simulation runs on computer clusters as well as symmetric multiprocessors, from 2 to tens of processors. Designers can now use low-cost PC clusters to tackle the largest simulation configurations. SimCluster is developed in C/C++, which makes it easy to support popular Verilog and VHDL simulators through their programming language interfaces. It also supports hardware accelerators and emulators.

·       SimCluster delivers scalable simulation performance by simulating the sub-modules (partitions) of the design in parallel. Today's IC systems can often be partitioned at functional subsystem boundaries for distributed simulation. Multiple partition configurations can be defined, and one is selected at runtime based on the number of computing resources available. Benchmarks have shown a 5X speedup using just 6 processors.

Simulation of another real application, the PhotonEx Px-Ultra™ 4T Optical Transport System, shows a 183% speedup with just 3 processors.
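As a rough check on scaling figures like these, parallel efficiency can be computed as speedup divided by processor count. The sketch below is illustrative only: the function name is hypothetical and not part of SimCluster. It uses the 5X-on-6-processors figure; the 183% figure is left out because the source does not say whether it means 1.83X or 2.83X relative to a single processor.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return the fraction of ideal linear speedup actually achieved."""
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return speedup / processors


# Figure quoted in the text: 5X speedup on 6 processors.
eff = parallel_efficiency(5.0, 6)
print(f"efficiency = {eff:.1%}")  # prints "efficiency = 83.3%"
```

An efficiency below 100% means some processor time goes to waiting or to work that cannot be parallelized.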

·       In the results table for this benchmark, the CPU% figures show that only about 2/3 of peak computing power is used. This is because of sequential operation in the system and synchronization overhead among processors. In general, CPU usage can be increased by tuning the design or partitioning it better. SimCluster supports different levels of simulation synchronization accuracy to tune runtime performance, including delta-cycle, clock-cycle and transaction-level synchronization. The higher the level, the lower the synchronization overhead, but also the lower the simulation accuracy. In all cases, simulation is deterministic.
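One common way to reason about the effect of sequential work is Amdahl's law, which bounds the speedup when a fraction of the run cannot be parallelized. This is a textbook model, not something the SimCluster material describes, and it ignores synchronization overhead entirely. The serial fractions below are made-up inputs for illustration, not measurements of any design.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup under Amdahl's law."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical serial fractions on 3 processors.
for s in (0.0, 0.10, 0.25):
    speedup = amdahl_speedup(s, 3)
    utilization = speedup / 3  # share of peak computing power in use
    print(f"serial={s:.2f}  speedup={speedup:.2f}  utilization={utilization:.0%}")
```

Under this model even a modest sequential share noticeably lowers utilization, which is consistent with the qualitative explanation given for the CPU% figures.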

·       Conclusion: Overall, SimCluster succeeds in using inexpensive hardware to boost simulation performance. One challenge remains, however: partitioning the system into sub-systems. Although natural boundaries work to some extent, the workload is rarely balanced across sub-systems. Balancing it requires considerable design experience and a lot of manual tuning, work that should ideally be handled by EDA tools rather than by designers.
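To make the balancing problem concrete, here is a minimal sketch of a greedy "largest cost first" heuristic that assigns sub-modules to processors by estimated simulation cost. It is not how SimCluster partitions designs (the text above describes partitioning as manual). The module names and cost numbers are invented, and the sketch ignores communication between partitions, which a real tool would have to weigh.

```python
import heapq


def balance_partitions(costs: dict[str, float], processors: int) -> list[list[str]]:
    """Greedily assign each module to the currently least-loaded processor."""
    if processors < 1:
        raise ValueError("processors must be >= 1")
    # Min-heap of (current load, processor index).
    heap = [(0.0, i) for i in range(processors)]
    heapq.heapify(heap)
    groups: list[list[str]] = [[] for _ in range(processors)]
    # Place the most expensive modules first.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (load + cost, idx))
    return groups


# Hypothetical per-module cost estimates (arbitrary units).
costs = {"cpu_core": 50, "dsp": 30, "memory_ctrl": 20, "io": 10, "dma": 10}
for i, group in enumerate(balance_partitions(costs, 3)):
    print(i, group, sum(costs[m] for m in group))
```

With these invented numbers the largest module alone sets the heaviest load, showing why natural subsystem boundaries can leave processors unevenly loaded.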

References:

1.     EEDesign: SimCluster runs single simulation on multiple CPUs

2.     SimCluster Data Sheet

3.     Bruce Caldwell, Chris Browy, Using Distributed Simulation to Speedup Verification of a Next Generation Optical Transport System, HDL Conference 2002

4.     SimCluster Demo