Optical Proximity Correction (OPC)
CS267 HW0, 1/28/04
Frank Gennari
Reasons for Choosing this Topic
I am a PhD student in Prof. Neureuther’s TCAD lithography group. I have never written or used a parallel application, and my research group’s TEMPEST simulator was already discussed by Mike Lam in CS267 HW0 last year. My research involves a pattern matching system that locates groups of polygons in an integrated circuit mask layout resembling bitmap images of the worst-case shapes for particular process effects. I plan to parallelize this software eventually, but I have not done so yet, so I cannot discuss it here. Optical Proximity Correction (OPC) is a parallel application I am familiar with whose structure is similar to that of the pattern matching system, so I will discuss OPC instead.
What is OPC?
OPC is a step in the
manufacturing process that semiconductor manufacturers employ to improve the
quality of high-performance integrated circuit designs such as microprocessors.
The overall lithography process involves projecting a circuit design from a
mask, through a complex lens system that shrinks the image, and onto a wafer
that will later be divided into individual chips. These circuits contain tiny
metal and polysilicon lines on the order of 100nm in width, in some cases
smaller than the wavelength of the light used to print them.
Several problems arise from the
small size of these features and the finite size and inherent limitations of the
imaging system. First, the high spatial frequency components required to
reproduce the sharp edges of polygon features may fall outside the range that
the finite lens aperture can capture. Second, light passing through the mask
opening for one shape may spill into another shape in close proximity, leading
to a complex interaction of the electric fields of adjacent polygons. As a
result, the final shapes will have rounded corners and may bulge towards
adjacent shapes; if the situation is bad enough, neighboring shapes can short
together and render the chip defective.
Optical Proximity Correction is the process of modifying the polygons drawn by the designers to compensate for the non-ideal properties of the lithography process. Given the shapes desired on the wafer, the mask is modified to improve the reproduction of the critical geometry. This is done by dividing polygon edges into small segments and moving the segments, and by adding small polygons at strategic locations in the layout. Adding OPC features to the mask layout allows for tighter design rules and significantly improves process reliability and yield. The following figure demonstrates the use and results of OPC (taken from [1]).

Here is another figure showing the results of applying OPC (opt) to a simple mask layout to reduce corner rounding (taken from [1]).

OPC Algorithms
OPC is often run on the entire
chip at once. There are many different types of OPC algorithms, the two main
classifications being rule-based and model-based. Each involves subdividing
polygons into smaller shapes or edge segments (fragmentation), moving or adding
to the shapes, performing a fast simulation to determine if the new locations
are better, moving them somewhere else, and iteratively repeating this process.
Rule-based OPC is simpler in that different geometries are handled by different
predefined rules. Model-based OPC is more complex and involves simulation of
various process effects, which may be accomplished by computing a weighted sum
of pre-simulated results for simple edges and corners that are stored in a
library. Managing the large geometry database is CPU-intensive, and the
simulations involved in model-based OPC are even more CPU-intensive since there
is no closed-form solution for the optimal layout. Nick Cobb describes a
high-performance OPC algorithm in his 1998 PhD thesis [1].
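To make the move, simulate, and repeat loop concrete, here is a minimal one-dimensional sketch in Python. It is not Cobb's algorithm or any vendor's method: a Gaussian blur stands in for the lithography simulation, a single line with two symmetric edges stands in for the fragmented layout, and the names SIGMA, THRESHOLD, intensity, printed_right_edge, and correct_line, along with all numeric values, are assumptions chosen for illustration.

```python
import math

SIGMA = 40.0      # assumed blur radius of the toy imaging model, in nm
THRESHOLD = 0.5   # assumed resist threshold (fraction of full intensity)

def intensity(x, left, right):
    """Toy aerial image: a 1-D mask line [left, right] blurred by a Gaussian."""
    s = SIGMA * math.sqrt(2.0)
    return 0.5 * (math.erf((x - left) / s) - math.erf((x - right) / s))

def printed_right_edge(left, right):
    """Bisect for the point right of center where the image crosses THRESHOLD."""
    lo = 0.5 * (left + right)
    hi = right + 10 * SIGMA
    if intensity(lo, left, right) < THRESHOLD:
        return lo  # the line does not print at all; report zero half-width
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if intensity(mid, left, right) > THRESHOLD:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def correct_line(target_half_width, iterations=10, gain=0.5):
    """Iteratively move the mask edge until the printed edge lands on the target."""
    mask_half = target_half_width          # start from the drawn shape
    for _ in range(iterations):
        printed = printed_right_edge(-mask_half, mask_half)
        epe = printed - target_half_width  # edge placement error
        mask_half -= gain * epe            # damped move against the error
    return mask_half, printed_right_edge(-mask_half, mask_half)

mask_half, printed = correct_line(30.0)
print("uncorrected printed half-width:", printed_right_edge(-30.0, 30.0))
print("corrected mask half-width:", mask_half, "printed half-width:", printed)
```

With these assumed values, the uncorrected 60 nm line prints noticeably narrower than drawn, and the loop widens the mask line until the printed edge lands on the target. A real model-based tool would do this for a very large number of edge segments at once, with a far more expensive simulation in place of the blur.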
The following figure gives a general overview of an OPC algorithm (taken from [1]).
Problem Size
The OPC problem is unusual in how its
complexity grows. Most computations become easier to perform each year as CPU
speeds increase. However, OPC involves using today’s processors to
design tomorrow’s processors, which means that the problem size scales with
the speed of today’s processors. In fact, the problem complexity may grow
more quickly than CPU speeds are increasing because of additional
factors such as an increasing number of mask (metal) layers.
Today’s integrated circuits
typically contain eight metal layers and two poly layers, which translates into
several dozen mask layers. A modern design with 50 million transistors can
contain more than a billion shapes. GDSII layout files can reach sizes of
50-100GB after OPC features are added and the geometry is flattened. Since the
OPC problem involves storing a dynamic polygon database, tens of GBs of memory
are likely needed for good performance. These figures roughly double with each
technology generation.
Parallel Hardware
Exact runtimes are difficult to
obtain from industry, but one company in the semiconductor industry claimed that
a typical ten-iteration OPC run takes on the order of 30 hours on a
64-processor, 500 MHz Alpha system [2]. Assuming good parallel efficiency and
processor utilization, this translates into 1920 CPU hours, or over two and a
half months of CPU time. I assume this was a shared memory system since it was
referred to as a “64 processor machine,” but I am unsure of the specifics. This
machine is probably not fast enough for the TOP500 list.
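The CPU-time figure above follows from simple arithmetic, sketched here in Python. The helper wall_clock_hours and its efficiency parameter are hypothetical, introduced only to show how the quoted numbers relate; the only data taken from the text are 30 hours and 64 processors.

```python
def wall_clock_hours(cpu_hours, n_procs, efficiency=1.0):
    """Estimated elapsed time when cpu_hours of work runs on n_procs processors.

    efficiency is the assumed fraction of ideal speedup actually achieved.
    """
    return cpu_hours / (n_procs * efficiency)

# Figures quoted in the text: 30 hours on 64 processors.
cpu_hours = 30 * 64            # total CPU time, assuming ideal efficiency
serial_days = cpu_hours / 24   # the same work on a single processor

print(cpu_hours, "CPU hours =", serial_days, "days on one processor")
print(wall_clock_hours(cpu_hours, 64), "hours on 64 processors at ideal efficiency")
```

If the real run was less than perfectly efficient, the true amount of useful work is somewhat below 1920 CPU hours, so the single-processor estimate is best read as an upper bound.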
How parallelizable is the OPC algorithm? Since there are several variations of OPC algorithms, this is difficult to say. One obvious way to run OPC on N processors is to subdivide the layout into N regions and process one region on each processor. However, the non-uniform distribution of geometry over the layout area (high density for cache and custom logic, low density for pins and large signal drivers, etc.) and the iterative nature of OPC will likely lead to load-balancing problems. Perhaps a better strategy is to divide the layout into many more than N regions and initially assign the first N regions to the N processors. Whenever a processor finishes a region, it is assigned a new one, until all regions have been processed. The actual method of parallelization is likely confidential information held by the EDA tool vendors.
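The difference between the two partitioning strategies can be illustrated with a small scheduling simulation in Python. This is only a sketch under stated assumptions: the layout is modeled as a list of tile costs, the costs are invented, each tile is treated as independent (real OPC would need geometry from neighboring tiles near the boundaries), and the function names static_makespan and dynamic_makespan are hypothetical.

```python
import heapq

def static_makespan(costs, n_procs):
    """N large regions, each a contiguous block of tiles, one per processor."""
    block = -(-len(costs) // n_procs)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))

def dynamic_makespan(costs, n_procs):
    """Many small regions; each processor takes the next one as soon as it is free."""
    free_at = [0.0] * n_procs          # min-heap of times at which processors go idle
    for cost in costs:
        start = heapq.heappop(free_at)  # earliest available processor
        heapq.heappush(free_at, start + cost)
    return max(free_at)

N_PROCS = 4
# Hypothetical layout: 64 equal-area tiles, the first 16 dense (cost 10 each),
# the remaining 48 sparse (cost 1 each).
tile_costs = [10] * 16 + [1] * 48

static = static_makespan(tile_costs, N_PROCS)
dynamic = dynamic_makespan(tile_costs, N_PROCS)
print("static split:", static, "time units")
print("work queue:  ", dynamic, "time units")
```

In this made-up case the static split takes 160 time units because one processor receives all of the dense tiles, while the work queue finishes in 52, which equals the ideal of 208 total units divided over 4 processors. The sketch ignores scheduling overhead and the fact that tile costs can change between OPC iterations.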
The Need for Parallel Algorithms
Time to market is important to
the success of semiconductor manufacturers. As noted in the previous section,
running OPC on a single-processor system could take more than two months
per design iteration. This is clearly not feasible for a company with a tight
design and production schedule, as the OPC time would dominate the later stages
of design and the early stages of production. Therefore, parallel processing
is mandatory for cutting-edge semiconductor makers.
References
[1] Nick Cobb, “Fast Optical and Process Proximity Correction Algorithms
for Integrated Circuit Manufacturing,” PhD Thesis, University of California,
Berkeley, 1998.
[2] Private conversation with an engineer from the semiconductor
industry.
Links
http://www.sematech.org/resources/litho/meetings/ngl/20010806/Poster30%20Sigma-C.pdf
http://portal.acm.org/citation.cfm?id=378332&jmp=indexterms&dl=portal&dl=ACM
Note: After a while of Google searching I found a
great PPT presentation that included typical OPC runtimes in days, but IE
crashed when I pressed the back button, before I could bookmark the page, and I
can’t remember the search terms I used to find it.