The main cost of an ASIC (Application-Specific IC) is related to its complexity. Basically, the silicon we start with is a clean slate. Imagine a circular surface, called a wafer, with billions of transistors. With a special technique, we interconnect the transistors via metal layers on top of the silicon. The wafer is then cut into squares, called dies; these are the pieces of silicon that implement the functionality. With bond wires, the pins of a package (the black thingie we call a chip) are connected to small squares, or pads, on the die. So the die from the wafer is placed and wired up inside a package that protects the tiny wiring from dust and other impurities that could destroy or short the nanometer-scale wiring on top of the silicon. The road from specification to a packaged chip that is validated and ready to ship to a customer is a complicated process.

Compare that clean slate with an off-the-shelf chip, the Field Programmable Gate Array (FPGA); the business case will tell you whether to go ASIC or FPGA. By the way, an FPGA is itself an ASIC, but one with predefined logic blocks and an abundance of configurable wiring. Kind of a LEGO approach to digital design. Cells contain a few gates and a few flip-flops, and the circuit is formed by connecting those cells according to your hardware description (SystemVerilog, VHDL, ...). Whenever you change your description, you can put the new design on the FPGA. It is reusable and reconfigurable for other designs, as long as they fit. In contrast, an ASIC has a fixed interconnect and a fixed design. The increased flexibility of the FPGA comes at an extra cost in die size (fixed cost), power (dynamic cost) and maximum frequency (performance cost). So, let's look at the costs involved.
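The ASIC-versus-FPGA business case often boils down to a break-even volume: the ASIC carries a large one-time NRE (non-recurring engineering) cost but a low unit cost, while the FPGA has essentially no NRE but a much higher unit price. A toy sketch, where every number is a made-up assumption for illustration, not a real quote:

```python
import math

# Illustrative assumptions only, not real pricing.
ASIC_NRE = 5_000_000.0   # one-time cost: masks, EDA licenses, IP, engineering
ASIC_UNIT = 8.0          # cost per packaged, tested ASIC
FPGA_UNIT = 120.0        # cost per off-the-shelf FPGA

def total_cost_asic(volume: int) -> float:
    # NRE paid once, then a low per-unit cost.
    return ASIC_NRE + volume * ASIC_UNIT

def total_cost_fpga(volume: int) -> float:
    # No NRE, but a high per-unit cost.
    return volume * FPGA_UNIT

def break_even_volume() -> int:
    # Volume at which the unit-cost saving has paid back the NRE.
    return math.ceil(ASIC_NRE / (FPGA_UNIT - ASIC_UNIT))

print(break_even_volume())  # -> 44643 units, with these assumptions
```

Below the break-even volume the FPGA wins on pure cost; above it, the ASIC does. Power, performance and time-to-market then shift that line in either direction.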
Back in the nineties, the chip layout was put on a tape (a tape streamer, who remembers them?) and physically sent to the foundry. The foundry is the company that manufactures chips (wafer in, die out). So the act of completing the layout was called "tape-out". To this day, this milestone refers to the final database delivery to the manufacturer of the chip. We no longer use tapes; we upload the database via FTP now.
Anyway, the ASIC project starts with a Marketing Request Specification (MRS). Note that some companies use different terminology for the same thing. It is a list of features marketing needs, based on input from potential customers. This MRS is the basis for the ASIC specification. The specification, or spec, is the technical reference for the chip. It defines and describes the functionality of the chip. Above all, it should contain an unambiguous description of the complete design.

It seems like I am almost the only one who has seen state-of-the-art specifications. Admittedly, in the nineties, designs weren't that big. But one thing we realized back then is that the specification was the reference for everyone in the project. Subcontractors quoted based on the spec; whenever a change was made, the subcontractor could charge more. Hence, the spec was the most important document of the project. The company made sure it was airtight, or as tight as it could be. Today, little time is spent on the specification. It is written and clarified while the project is already in the design phase. And the second most important document, the verification plan, relies on the functional spec. So, good luck finding a detailed verification plan in 2020. If it exists, it will be crap.
Before tape-out we have the front-end and then the back-end design cycle. Let's look at both for a complex System-On-Chip (SoC) in today's newest tech nodes.
Front-end VLSI tasks and costs:
- Selecting and purchasing IP (royalties?).
- EDA tools (licenses).
- Verification (pareto principle: 20% time spent in design, 80% verification).
- Power simulation.
- Static Timing Analysis (STA).
- Design For Test (DFT): scan, patterns, BIST, memory BIST, …
- Debug features (licenses?).
- FPGA prototyping or emulation (the former is more likely).
- Gate-level simulation pre-layout (before the back-end).
- Gate-level simulation post-layout (with real timing back-annotated).
- Formal verification.
- Spec, verification plan and documentation: usually treated as buffer time and absorbed into the slip of the project.
Back-end tasks and costs:
- Clock tree synthesis.
- Floorplan and placement.
- Routing and congestion.
- EDA tools for the above.
There is overlap between the front-end and back-end tasks, and different milestones are associated with percentages of completeness of the design. Once we have a top level of the chip and have gotten rid of the DFT, STA and synthesis issues, we make a first netlist (cells and wires). This netlist pipecleans the back-end flow. Once 95% of the design is ready and verified, the layout is nearly finished: the floorplan is final, all IOs have their final locations and congestion is almost entirely fixed. And we know the buffering needed for the clock trees. To make sure a clock edge arrives at the same moment at all flip-flops in the same clock domain, the skew is compensated with clock buffers. Those buffers consume power, and obviously area, as well. Again, one of those "experience"-related decisions we make: the clock domain should be big enough to avoid synchronizing to another clock domain, but it cannot become too big either, because the clock buffering can become too big (area and power impact).
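Skew matters because STA checks, for every path, that data launched on one clock edge arrives at the capture flip-flop with margin to spare before the next edge. A minimal sketch of that per-path setup check, with made-up delay numbers in nanoseconds:

```python
# Simplified per-path setup-time check, as STA performs it.
# All delay numbers used below are illustrative assumptions.

def setup_slack(t_period: float, t_clk2q: float, t_logic: float,
                t_setup: float, skew: float) -> float:
    """Positive slack means the path meets setup timing.

    skew = capture clock arrival - launch clock arrival.
    Positive skew (capture clock arrives later) helps setup,
    negative skew eats into the margin.
    """
    return (t_period + skew) - (t_clk2q + t_logic + t_setup)

# 500 MHz clock -> 2.0 ns period (assumed)
slack = setup_slack(t_period=2.0, t_clk2q=0.15, t_logic=1.6,
                    t_setup=0.1, skew=-0.05)
print(round(slack, 3))  # -> 0.1 ns of margin left
```

The clock-tree buffers exist precisely to keep that skew term small and predictable across thousands of flip-flops; the price is the extra area and power mentioned above.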
To summarize, the front-end delivers several netlists to the back-end team, and the back-end team interacts with the front-end to solve issues that their tools cannot solve. To fix those back-end issues, designers change the design, for example by pipelining long paths (long wires). Whenever the back-end has a usable post-layout netlist, it delivers that netlist together with the real timing information (real wiring delays) to the front-end team. They run post-layout back-annotated (timing) simulations to make sure the design is still functional. Gate-level simulations reuse a subset of the RTL test scenarios, but run them on the netlist (real cells) instead of the RTL. With the netlist and detailed cell timing, the same test scenario takes roughly ten times longer to run.
In 2020, we have several foundries like GlobalFoundries (GloFo), TSMC, UMC and Samsung, to name just a few. GloFo is at 14nm and gave up on 7nm. TSMC and Samsung have 7nm in production, and they are running 5nm in risk production at the moment. The yield (the percentage of non-defective dies on a wafer) is rather low for 7nm. More mature tech nodes like 14nm have much higher yield, since they went through several upgrades over the years that fine-tuned the yield and performance. Furthermore, wafers for 7nm are far more expensive than those for 14nm. So, why go to a smaller-dimension tech node? Because the dynamic power consumption is lower, cells are smaller in area, and potentially much higher maximum frequencies are possible. The latter is initially a challenge, but the first two are major factors. The same design yields more devices on a 7nm wafer than on a 14nm wafer. Hence, the business case is always a combination of many factors. As an example, AMD has the EPYC Rome 64-core beast that more or less matches Intel's Xeon in performance. And AMD is in TSMC's 7nm node while Intel remains in its own 14nm node for its CPUs. The TCO (total cost of ownership) benefits from the lower power consumption of the EPYC processors for the same or better performance. On top of that, AMD prices those chips lower than Intel's. It seems that AMD has a strategic advantage by using the 7nm TSMC process for their CPUs.
- Wafer cost.
- Producing a die (the wafer-to-die manufacturing process); cost also depends on the number of metal layers you want.
- Test (reducing the risk of a defective device in the field), probe card, test program, binning.
- Packaging, test after packaging and shipping.
- Development board and silicon validation.
- Cell and special libraries (RAM, ROM, single port, dual port) licenses and tools.
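Those production items drive the per-unit cost, and yield ties them together. A simplified model using the classic dies-per-wafer approximation; all prices, die sizes and yields below are made-up numbers for illustration, not actual foundry pricing:

```python
import math

# Simplified cost-per-good-die model. Every number here is an
# illustrative assumption, not real foundry data.

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Classic approximation: wafer area / die area, minus an
    edge-loss term for partial dies along the wafer's circumference."""
    r = wafer_diameter_mm / 2
    return int(math.pi * r * r / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def cost_per_good_die(wafer_cost: float, wafer_diameter_mm: float,
                      die_area_mm2: float, yield_fraction: float) -> float:
    # Only the non-defective dies carry the wafer cost.
    good = dies_per_wafer(wafer_diameter_mm, die_area_mm2) * yield_fraction
    return wafer_cost / good

# 300 mm wafer, 100 mm^2 die on the mature node (assumed numbers):
mature = cost_per_good_die(4000, 300, 100, 0.90)

# Leading-edge node: pricier wafer, lower yield, but the same design
# shrinks (here assumed to half the area):
leading = cost_per_good_die(10000, 300, 100 * 0.5, 0.60)
```

With these particular numbers the mature node still wins on cost per good die, which is exactly why low early yield and high wafer prices can offset the shrink; the power and frequency gains have to carry the rest of the business case.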
My good friends from the software world are involved in the project as well. For the embedded processor(s), there are firmware engineers. For the application and drivers on the host, software engineers. Of course, no software people are needed when there is no design to work on, so they come on board a few months into the hardware design project. For the embedded software (firmware) people, an ASIC prototype on an FPGA is a way to start working. The first goal for the FPGA prototype is to get a part of the design with access to the embedded processor up and running.

In companies where hardware and software are split into different groups, this is always a potential issue. The hardware project leader needs a prototype of the chip, but can wait until the design is advanced enough. The firmware people need it as soon as possible. For example, the hardware team would like to have it six months or later after the project started, while the firmware people are scheduled to start three months after the hardware team. If the incentives, the Key Performance Indicators (KPIs), are set for hardware and software independently, the hardware team usually has different goals than the software team. At the start of the project, both teams look at the specification. But the software team isn't involved yet and their resources are fully engaged on other projects, so the chances that they thoroughly verify the spec are almost zero. Then, once they effectively start on the project, they suddenly realize this and that is not what they need. Suddenly, hardware parts that are already designed (even in part) need to change. Because the hardware team leader must ensure the spec is signed off and final when the project starts. If the specification is still changing on a daily basis, or software hasn't signed off, you have to absorb the changes into the original timetable and with the original budget, even for parts that were already verified and finished.
In contrast, if you freeze the spec and make all teams in the project sign the spec, all changes to the spec become change requests. And change requests have an importance level, priority and a cost impact. Make all involved accountable. Especially at the start of the project.
ASIC and FPGA engineers should be able to write basic firmware, since during verification simple code is executed on the embedded processors. Unfortunately, the labor market for digital design has changed over time. Twenty years ago, IC designers were able to write basic firmware (assembler or C/C++) and were used to doing all or most of the front-end design tasks. Projects were small, and resources who could handle all tasks were more common. Today, the partitioning of engineers into "design only", "verification only" and "DFT only" makes them one-trick ponies. It is a strategy used to reduce cost per resource. You don't need an experienced IC engineer who can do all front-end tasks if you only need that resource for verification for the next six months. Why pay extra for people with more experience than just verification? The only exception is a small company or a startup. There you need to be flexible and able to cover many positions. They don't have the budget of the big companies, so you will not be rewarded for your talent stack either. But it is a great opportunity to learn a lot about the different tasks and how all the front-end tasks link together.

If anything, the missing understanding of the links between all tasks in the project creates stress and makes the budget explode. A designer who knows how to write RTL that passes timing constraints, DFT and synthesis delivers a design that goes through verification, synthesis, STA and then DFT with little or no issues. The other designer delivers a design that passes verification (after RTL fixes) and goes to synthesis, where latches are found, so it returns to the designer, goes back through verification and passes synthesis. Then it goes to STA, where timing issues are found: long paths with huge violations. So the designer changes the RTL, and that goes to verification again. There might be issues with the verification caused by the RTL changes, or because the verification itself needs to adapt.
Then it goes to synthesis, where one latch is found due to the RTL changes, so it goes back to design. These changes can break verification, synthesis, STA and DFT, and a fix for one front-end task can break another. Unless you have a lean and simple methodology that avoids all that. I have kept my eyes open for more than two decades. I have observed issues and problems and I have written them down. Then I searched for solutions to avoid them. This leads to a methodology of a few rules that prevent those costly issues from arising in the first place. That is exactly what an engineer is supposed to do. It still puzzles me how people who were trained as engineers, in meticulous observation and problem solving, turn into mindless zombies who accept inefficiency this bad and costly. They stopped caring.

Finally, has anyone ever wondered how startups in the hardware space, like hardware acceleration for Artificial Intelligence (AI), beat the big giants with 10% of the budget and considerably fewer people, and are still faster and deliver mind-boggling performance? Habana Labs started with "nothing" in 2016 and was bought by Intel for $2 billion in early 2020. Because the top people were silicon veterans. The world worships the people working for semiconductor giants, yet they get beaten by small startups with innovative products regularly. If the big companies employ the best of the best, how come they need to buy innovators? And what happens to those "innovators" once they are absorbed? Is there a pattern? Maybe it explains why AMD, virtually dead five years ago with not even mediocre processors and graphics chips (GPUs), is now competing for double-digit percentages of the server market. They land exascale computer contracts!
As usual, I wrote a lot more in this post than I originally intended. The important part about cost reduction lies in the choice of the people you employ. How that influences the most important costs is going to be another post, I'm afraid. Keep an eye out for the next one!