SimCluster: Distributed Parallel Simulation
Today we live in an electronic era. Look around and you will see electronic
devices everywhere: computers, cell phones, HDTVs, DVD players, MP3 players...
At the heart of all these devices are integrated circuit (IC) chips. As you can
imagine, designing chips with such complex functionality is by no means an
easy job. It involves a whole flow of design stages, and different application
domains (e.g. microprocessor, DSP, ASIC) have different design flows.
Whichever design flow a designer uses, it must include a stage
called functional verification. The prevailing way to do this is by simulation.
The biggest challenge for simulation is the large design size:
multi-million-gate designs are very common nowadays. However, over the last
decade standalone simulator companies have not kept up with the rising
functional complexity of today's ICs. The functional verification gap has
widened to critical proportions.
Avery Design Systems has developed
SimCluster, a distributed parallel simulator that unleashes the power of
parallel computing and delivers scalable simulation performance.
· SimCluster's parallel simulation runs on computer clusters as well as
symmetric multiprocessors, from 2 to tens of processors. Designers can now
use low-cost PC clusters to tackle the largest simulation
configurations. SimCluster is developed in C/C++, which makes it easy to
support popular Verilog and VHDL simulators through their programming language
interfaces. It also supports hardware accelerators and emulators.

· SimCluster delivers scalable simulation performance through the parallel
simulation of the sub-modules (partitions) of the design. Today's IC systems
can often be partitioned at functional subsystem boundaries for distributed
simulation purposes. Multiple configurations can be designated and selected at
runtime based on the number of computing resources available. Benchmarks have
shown a 5X speedup using just 6 processors.
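
To put the benchmark figure in perspective, the sketch below computes parallel efficiency (speedup divided by processor count) and the Karp-Flatt estimate of the serial fraction. The function names are illustrative and not part of SimCluster; the only input taken from the text is the 5X speedup on 6 processors.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    Lumps together truly sequential work and parallel overhead such as
    synchronization between processors.
    """
    if processors < 2:
        raise ValueError("need at least 2 processors")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    s, p = 5.0, 6  # benchmark quoted in the text
    print(f"efficiency      = {parallel_efficiency(s, p):.1%}")
    print(f"serial fraction = {karp_flatt_serial_fraction(s, p):.1%}")
```

For the quoted benchmark this gives an efficiency of about 83% and a serial fraction of about 4%, that is, the run behaves as if roughly 4% of the work could not be parallelized.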

Simulation of another real application, the PhotonEx Px-Ultra(TM) 4T Optical
Transport System, shows a 183% speedup with just 3 processors.

· In the above table, the CPU% column shows that only about two-thirds of peak
computing power is used. This is because of the sequential operations in the
system and the synchronization overhead among processors. In general, by tuning
the design or partitioning it better, it is possible to increase CPU usage.
SimCluster supports different levels of simulation synchronization accuracy to
tune runtime performance, including delta-cycle, clock-cycle and
transaction-level synchronization. The higher the level, the lower the
synchronization overhead and the lower the simulation accuracy. In all cases,
simulation is deterministic.
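
The trade-off between the three levels can be illustrated with a toy count of synchronization points. This is not SimCluster's algorithm or API: the `SyncLevel` enum, the `sync_points` function and the parameters `deltas_per_cycle` and `cycles_per_transaction` are hypothetical, chosen only to show why coarser synchronization means fewer exchanges between processors.

```python
from enum import Enum


class SyncLevel(Enum):
    DELTA_CYCLE = "delta-cycle"
    CLOCK_CYCLE = "clock cycle"
    TRANSACTION = "transaction"


def sync_points(level, clock_cycles, deltas_per_cycle=4, cycles_per_transaction=16):
    """Number of times partitions must exchange data in this toy model.

    deltas_per_cycle and cycles_per_transaction are assumed averages.
    """
    if level is SyncLevel.DELTA_CYCLE:
        return clock_cycles * deltas_per_cycle
    if level is SyncLevel.CLOCK_CYCLE:
        return clock_cycles
    # Transaction level: one exchange per transaction, rounded up.
    return -(-clock_cycles // cycles_per_transaction)


if __name__ == "__main__":
    for level in SyncLevel:
        print(f"{level.value:12s} {sync_points(level, clock_cycles=1000)}")
```

Under these assumed averages, 1000 clock cycles need 4000 exchanges at delta-cycle level, 1000 at clock-cycle level and 63 at transaction level.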
· Conclusion: Overall, SimCluster is very
successful in using cheap hardware to boost simulation performance. However,
partitioning the system into several sub-systems remains a challenge.
Though the natural boundaries work to some extent, it is rarely the case that
the workload is balanced across sub-systems. Balancing it requires good design
experience and a lot of manual tuning, a task that should ideally be handled by
EDA tools rather than designers.
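
As an illustration of what tool-assisted balancing could look like, here is a sketch of the longest-processing-time-first greedy heuristic. It is a generic load-balancing technique, not something the source attributes to SimCluster. The module names and per-module cost estimates are made up, and the sketch ignores inter-partition communication, which a real partitioner would also need to weigh.

```python
import heapq


def balance_partitions(module_costs, num_partitions):
    """Assign modules to partitions, heaviest first, always choosing the
    currently lightest partition (LPT greedy heuristic)."""
    # Heap entries: (total cost so far, partition index).
    heap = [(0.0, i) for i in range(num_partitions)]
    heapq.heapify(heap)
    partitions = [[] for _ in range(num_partitions)]
    for name, cost in sorted(module_costs.items(), key=lambda kv: -kv[1]):
        load, idx = heapq.heappop(heap)
        partitions[idx].append(name)
        heapq.heappush(heap, (load + cost, idx))
    # Recover the final load of each partition from the heap.
    loads = [0.0] * num_partitions
    for load, idx in heap:
        loads[idx] = load
    return partitions, loads


if __name__ == "__main__":
    # Hypothetical relative simulation costs per subsystem.
    costs = {"cpu": 50, "dsp": 30, "mem_ctrl": 20, "bus": 15, "io": 10, "dma": 5}
    parts, loads = balance_partitions(costs, 3)
    for names, load in zip(parts, loads):
        print(f"{load:5.1f}  {names}")
```

With these made-up costs the three partitions end up with loads of 50, 40 and 40, which shows how a single heavy subsystem limits how evenly the work can be spread.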
References:
1. EEDesign: SimCluster runs single simulation on multiple CPUs
3. Bruce Caldwell, Chris Browy, Using Distributed Simulation to Speedup
Verification of a Next Generation Optical Transport System, HDL Conference 2002