bio

Optical Proximity Correction (OPC)

CS267 HW0, 1/28/04

Frank Gennari

 

Reasons for Choosing this Topic

I am a PhD student in Prof. Neureuther’s TCAD lithography group. I have never written or used a parallel application, and my research group’s TEMPEST simulator was already discussed by Mike Lam in CS267 HW0 last year. My research involves a pattern matching system that locates groups of polygons in an integrated circuit mask layout resembling bitmap images of the worst-case shapes for particular process effects. I plan to parallelize this software eventually, but I have not done so yet, so I cannot discuss it here. Optical Proximity Correction (OPC) is a parallel application that I am familiar with and that has a structure similar to the pattern matching system, so I will discuss OPC instead.

What is OPC?

OPC is a step in the manufacturing process that semiconductor manufacturers employ to improve the quality of high-performance integrated circuit designs such as microprocessors. The overall lithography process involves projecting a circuit design from a mask, through a complex lens system that shrinks the image, and onto a wafer that will later be divided into individual chips. These circuits contain tiny metal and polysilicon lines on the order of 100 nm in width, in some cases smaller than the wavelength of the light used to print them.

Several problems arise from the small size of these features and the finite size and inherent limitations of the imaging system. First, the high spatial frequency components required to reproduce the sharp edges of polygon features may fall outside the lens aperture and never reach the wafer. Second, stray light entering the opening of one shape may find its way into another shape in close proximity, leading to a complex interaction of the electric fields of adjacent polygons. As a result, the final shapes have rounded corners and may bulge towards adjacent shapes, possibly shorting together and rendering the chip defective if the effect is severe enough.

Optical Proximity Correction is the process of modifying the polygons drawn by the designers to compensate for the non-ideal properties of the lithography process. Given the shapes desired on the wafer, the mask is modified to improve the reproduction of the critical geometry. This is done by dividing polygon edges into small segments and moving the segments around, and by adding small polygons at strategic locations in the layout. The addition of OPC features to the mask layout allows for tighter design rules and significantly improves process reliability and yield. The following figure demonstrates the use and results of OPC (taken from [1]).

Here is another figure showing the results of applying OPC (opt) to a simple mask layout to reduce corner rounding (taken from [1]).

 

 

OPC Algorithms

OPC is often run on the entire chip at once. There are many types of OPC algorithms, and the two main classes are rule-based and model-based. Each involves subdividing polygons into smaller shapes or edge segments (fragmentation), moving or adding to the shapes, performing a fast simulation to determine whether the new locations are better, and repeating the process iteratively. Rule-based OPC is simpler in that each kind of geometry is handled by its own rule. Model-based OPC is more complex and involves simulating various process effects, which may be accomplished by computing a weighted sum of pre-simulated results for simple edges and corners stored in a library. Managing the large geometry database is CPU-intensive, and the simulations involved in model-based OPC are even more so, since there is no closed-form solution for the optimal layout. Nick Cobb describes a high-performance OPC algorithm in his 1998 PhD thesis [1].
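The simulate-and-move loop described above can be illustrated with a toy one-dimensional sketch. This is not Cobb's algorithm or any vendor's method. It assumes a hypothetical imaging model (a Gaussian blur of width SIGMA), a hypothetical resist threshold, and a simple damped feedback rule, and all names and numbers are chosen only for illustration. Each mask opening has two edges, and each edge is nudged until the simulated intensity at the desired edge location sits at the threshold.

```python
import math

SIGMA = 40.0      # hypothetical blur width of the imaging system, in nm
THRESHOLD = 0.5   # hypothetical resist threshold on normalized intensity


def intensity(mask, x):
    """Image intensity at x for a 1D mask given as (left, right) openings.

    Each opening is a box blurred by a Gaussian, which has an erf form.
    """
    k = SIGMA * math.sqrt(2.0)
    return sum(0.5 * (math.erf((x - a) / k) - math.erf((x - b) / k))
               for a, b in mask)


def edge_errors(mask, target):
    """Intensity minus threshold, sampled at every target edge location."""
    return [(intensity(mask, a) - THRESHOLD, intensity(mask, b) - THRESHOLD)
            for a, b in target]


def worst_error(mask, target):
    """Largest absolute intensity error over all target edges."""
    return max(abs(e) for pair in edge_errors(mask, target) for e in pair)


def correct(target, iterations=50, tol=1e-4):
    """Move mask edges until the image crosses the threshold at the
    target edges (damped feedback, an assumption made for this sketch)."""
    mask = [list(opening) for opening in target]
    for _ in range(iterations):
        errors = edge_errors(mask, target)
        if max(abs(e) for pair in errors for e in pair) < tol:
            break
        for opening, (err_left, err_right) in zip(mask, errors):
            opening[0] += SIGMA * err_left    # too bright: move left edge inward
            opening[1] -= SIGMA * err_right   # too bright: move right edge inward
    return [tuple(opening) for opening in mask]


target = [(0.0, 100.0), (160.0, 260.0)]   # two 100 nm lines, 60 nm apart
corrected = correct(target)
print("worst edge error before:", worst_error(target, target))
print("worst edge error after: ", worst_error(corrected, target))
print("corrected mask:", corrected)
```

In this sketch the uncorrected lines print too bright in the narrow gap between them, so the loop pulls the two facing edges apart on the mask. A real model-based tool works on two-dimensional fragments and a far more detailed process model.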

The following figure gives a general overview of an OPC algorithm (taken from [1]).

 

Problem Size

The OPC problem is unique in its complexity. Most algorithms become easier to run each year as CPU speed increases. However, OPC involves using today’s processors to design tomorrow’s processors, which means that the problem size scales with the speed of today’s processors. In fact, the problem complexity may grow faster than CPU speeds are increasing because of additional factors such as an increasing number of mask (metal) layers.

Today’s integrated circuits typically contain eight metal layers and two poly layers, which translates into several dozen mask layers. A modern design with 50 million transistors can contain more than a billion shapes. GDSII layout files can reach sizes of 50-100 GB after OPC features are added and the geometry is flattened. Since the OPC problem involves storing a dynamic polygon database, tens of GB of memory are likely needed for good performance. These figures roughly double with each technology generation.

 

Parallel Hardware

Exact runtimes are difficult to obtain from industry, but one company in the semiconductor industry claimed that a typical ten-iteration OPC run takes on the order of 30 hours on a 64-processor, 500 MHz Alpha system [2]. Assuming good efficiency and processor utilization, this translates into 1920 CPU hours, or over two and a half months of CPU usage. I assume this was a shared memory system since it was referred to as a “64 processor machine”, but I am unsure of the specifics. This machine is probably not fast enough for the TOP500 list.
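The CPU-hour estimate can be checked with a few lines of arithmetic. The efficiency parameter below is a hypothetical addition that is not part of the reported figures. A value of 1.0 corresponds to the assumption of perfect processor utilization.

```python
def single_cpu_hours(wall_clock_hours, processors, efficiency=1.0):
    """Estimated runtime on one processor, given a parallel run.

    efficiency is a hypothetical parallel efficiency in (0, 1];
    1.0 means every processor was busy for the whole run.
    """
    return wall_clock_hours * processors * efficiency


cpu_hours = single_cpu_hours(30, 64)   # 30 hours on 64 processors
cpu_days = cpu_hours / 24
print(f"{cpu_hours:.0f} CPU hours = {cpu_days:.0f} days")
```

If the parallel run was less than perfectly efficient, the equivalent single-processor time would be proportionally shorter than this estimate.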

How parallelizable is the OPC algorithm? Since there are several variations of OPC algorithms, this is difficult to say. One obvious way to run OPC on N processors is to subdivide the layout into N regions and process one region on each processor. However, the non-uniform distribution of geometry over the layout area (high density for cache and custom logic, low density for pins and large signal drivers, etc.) and the iterative nature of OPC will likely lead to load balancing problems. Perhaps a better strategy is to divide the layout into many more than N regions and initially assign the first N regions to the N processors. When a processor finishes one region, it is assigned a new one, until all regions have been processed. The actual method of parallelization is likely confidential information held by the EDA tool vendors.
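The difference between the two strategies can be illustrated with a small scheduling simulation. It does not run anything in parallel and does not reflect how any EDA tool actually works. The tile costs are hypothetical, and any interaction between neighboring regions is ignored.

```python
import heapq


def static_makespan(tile_costs, n_procs):
    """One contiguous block of tiles per processor.

    The run finishes when the slowest block finishes. Assumes the
    number of tiles is a multiple of n_procs.
    """
    per_block = len(tile_costs) // n_procs
    return max(sum(tile_costs[i * per_block:(i + 1) * per_block])
               for i in range(n_procs))


def dynamic_makespan(tile_costs, n_procs):
    """Each processor takes the next unprocessed tile as soon as it is free."""
    free_at = [0] * n_procs          # time at which each processor becomes idle
    heapq.heapify(free_at)
    for cost in tile_costs:
        start = heapq.heappop(free_at)        # earliest available processor
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# Hypothetical layout: 4 dense tiles (cache-like) followed by 12 sparse tiles.
tiles = [10, 10, 10, 10] + [1] * 12
print("static, 4 regions:  ", static_makespan(tiles, 4))
print("dynamic, 16 regions:", dynamic_makespan(tiles, 4))
```

With the static split, one processor receives all of the dense tiles while the others sit idle for most of the run. With the work queue, the dense tiles are spread across all four processors.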

 

The Need for Parallel Algorithms

Time to market is important to the success of semiconductor manufacturers. As noted in the section above, running OPC on a single-processor system could take more than two months per design iteration. This is clearly not feasible for a company with a tight design and production schedule, as the OPC time would dominate the later stages of design and the early stages of production. Therefore, the use of parallel processing is mandatory for cutting-edge semiconductor makers.

 

References

[1] Nick Cobb, “Fast Optical and Process Proximity Correction Algorithms for Integrated Circuit Manufacturing,” PhD Thesis, University of California, Berkeley, 1998. http://www-video.eecs.berkeley.edu/papers/ncobb/cobb_phd_thesis.pdf

[2] Private conversation with an engineer from the semiconductor industry.

 

Links

http://www.sematech.org/resources/litho/meetings/ngl/20010806/Poster30%20Sigma-C.pdf

http://portal.acm.org/citation.cfm?id=378332&jmp=indexterms&dl=portal&dl=ACM

Note: After a while of Google searching I found a great PPT presentation that included typical OPC runtimes in days, but IE crashed when I pressed the back button, before I could bookmark the page, and I can’t remember the search terms I used to find it.