One of the major differences between software and hardware is the concept of a clock. It defines the rhythm at which the sequential elements inside the chip update. Note that in complex Systems-on-Chip (SoCs), many clocks with different frequencies are distributed to different parts of the silicon. In this post, we take a basic look at clocks and talk about the implications for the FPGA prototyping team. At the bottom of this post, you will find links to the previous posts. The “ASIC FUNDAMENTALS” series of blog posts starts with “synchronous design” and the rest follows chronologically after that first post.
I am old enough to remember the emulator in the basement: a very expensive tower that emulated the ASIC before tape-out. We used it to verify functionality, although it ran a few times slower than real time. Remember that in functional verification with simulators, we usually run test cases that cover up to a few hundred milliseconds of real time. For RTL code, that took a few hours to run on a server, maybe one day and night tops. If we wanted to simulate a gate-level netlist with timing, that would multiply the RTL server runtime by a factor of 10. An RTL simulation of the slow startup of a chip that takes 24 hours would take 10 days and nights to run at gate level. Hence the emulator was something that could run for hours and let us see the design behave for far longer than the few hundred milliseconds covered in simulation. Plus, an emulator was a way to let the software people write their firmware and applications in parallel with the hardware design. The hardware team had to deliver a functional design first. Still, it was way faster than waiting until the first silicon samples arrived.
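The runtime estimate above is simple arithmetic. As a rough sketch, using only the figures quoted in this post (the factor of 10 is a rule of thumb, not a measured value, and the function name is made up for illustration):

```python
# Back-of-the-envelope estimate with the numbers from the text.
RTL_RUNTIME_HOURS = 24     # RTL simulation of a slow chip startup
GATE_LEVEL_SLOWDOWN = 10   # rule-of-thumb factor for a netlist with timing


def gate_level_runtime_days(rtl_hours, slowdown=GATE_LEVEL_SLOWDOWN):
    """Return the estimated gate-level simulation runtime in days."""
    return rtl_hours * slowdown / 24


print(gate_level_runtime_days(RTL_RUNTIME_HOURS))  # 10.0
```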
Then came the FPGA, an off-the-shelf component. Soon hardware teams realized it was cheaper to buy FPGAs than emulators. For larger Systems-on-Chip (SoCs), the RTL code needs to be partitioned, because one FPGA can fit only part of the chip. Still, it was more cost-effective than an emulator. Today, some companies still buy and use emulators, but you need deep pockets. And, personally, I haven’t worked with emulation for over 15 years. All teams I worked with used an FPGA-based prototyping solution. So, we need to understand how clocks in an ASIC differ from clocks in an FPGA.
We talked about clock trees and clock gating in a previous post. The back-end engineers balance the clock skew, and clock gates prevent a clock from toggling to reduce the dynamic power consumption of the digital logic. The difference is that on an FPGA the clock routing network is already available. It is separate from the configurable wiring that connects the logic blocks, and the clock lines in an FPGA take care of the skew. For static clock gating (to save power), every flip-flop on the FPGA has a clock-enable pin. In effect, each sequential element has its own clock gate. In contrast, an ASIC has one clock gate followed by a clock tree. Now, the delightful news is that the tools of the big vendors identify those clock gates automatically and use the local flip-flop clock-enable pins instead. The caveat is that if the ASIC design doesn’t fit in one FPGA, the clock block needs special consideration. If the clock generator or control block is in one partition and not in the others, we need to improvise.
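To make the equivalence between an ASIC clock gate and an FPGA clock-enable pin concrete, here is a minimal behavioral sketch in Python. It is not vendor code and not what any tool generates. The function names, the latch-plus-AND model of the clock gate, and the reset value are all assumptions chosen for illustration. Both models are fed the same data and enable values, one pair per clock cycle.

```python
def asic_gated_clock(d_values, enables, reset_value=0):
    """ASIC style: a latch-based clock gate in front of a plain D flip-flop."""
    q, latched_en, prev_gclk = reset_value, 0, 0
    trace = []
    for d, en in zip(d_values, enables):
        for clk in (0, 1):               # low phase, then high phase of one cycle
            if clk == 0:
                latched_en = en          # the latch is transparent while clk is low
            gclk = clk & latched_en      # the AND gate produces the gated clock
            if gclk == 1 and prev_gclk == 0:
                q = d                    # the flop captures D on a rising gated edge
            prev_gclk = gclk
        trace.append(q)
    return trace


def fpga_clock_enable(d_values, enables, reset_value=0):
    """FPGA style: free-running clock, the CE pin decides between load and hold."""
    q = reset_value
    trace = []
    for d, en in zip(d_values, enables):
        if en:                           # CE high: load D on this clock edge
            q = d
        trace.append(q)                  # CE low: Q simply holds its value
    return trace


data = [1, 0, 0, 1, 0, 1]
enable = [1, 0, 1, 0, 1, 1]
print(asic_gated_clock(data, enable) == fpga_clock_enable(data, enable))  # True
```

The two functions give the same register contents cycle by cycle, which is why a tool can swap one for the other. What the model leaves out is exactly what differs in practice: power consumption and the skew introduced by the gate.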
Paranoia is one of the most important factors in ASIC design. It is best to deviate from the ASIC code as little as possible. With partitioning the design, Pandora’s box opens. There are two ways to approach partitioning: using one FPGA or using multiple FPGAs. One could use development boards with one FPGA that contains a part of the design. The engineer configures the FPGA with the part he or she needs. But then special FPGA-specific code has to replace the part of the design that isn’t in the FPGA. You can’t leave those signals dangling. Plus, FPGA-specific code for the clocks is also a necessity. The farther we drift from the original, the higher the risk that we are prototyping something that behaves a little differently from the original. Even if the embedded software (firmware) runs on the FPGA, there is always a chance it won’t work on actual silicon.
The second possibility is to have multiple FPGAs and a partitioned design. Again, the communication between the FPGAs differs from the ASIC internals, because the signals need to go off-chip through output buffers and a PCB track to the input buffers of another FPGA. Again, the clocks will need special treatment.
Another matter that needs attention is that the ASIC architect can define two clocks as synchronous, and the back-end team then solves the skew between their edges. But clocks in an FPGA that use different clock networks are effectively asynchronous. For example, suppose we have two clocks, a 20 MHz and a 40 MHz clock, coming from the same clock source in the ASIC design. In the FPGA, all the logic needs to run on the 40 MHz clock, and the 20 MHz clock domain gets a clock-enable that is active on every other 40 MHz cycle (a 50% duty cycle). This we call dynamic clock gating. The same clock-enable pins on the flip-flops of an FPGA can emulate the clock gating whenever the two clocks have the same source clock.
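The 20 MHz / 40 MHz example can be sketched as a small cycle-based model in Python. This is an illustration under stated assumptions, not prototype code. The two counters stand in for arbitrary registers in each domain, and the names `simulate` and `ce_20mhz` are made up here. Every register is clocked by the single 40 MHz clock, and the registers of the former 20 MHz domain only update when the enable is high.

```python
def simulate(cycles_40mhz):
    """Run a number of 40 MHz clock cycles and return both counter values."""
    fast_count = 0   # register in the 40 MHz domain
    slow_count = 0   # register that sits in the 20 MHz domain on the ASIC
    ce_20mhz = 0     # clock-enable, high on every other 40 MHz cycle
    for _ in range(cycles_40mhz):
        # Compute all next-state values first, then update them together,
        # like flip-flops that sample on the same rising clock edge.
        next_fast = fast_count + 1
        next_slow = slow_count + 1 if ce_20mhz else slow_count
        next_ce = 1 - ce_20mhz
        fast_count, slow_count, ce_20mhz = next_fast, next_slow, next_ce
    return fast_count, slow_count


print(simulate(8))  # (8, 4)
```

After eight cycles of the 40 MHz clock, the enabled counter has advanced four times, which is the update rate of a 20 MHz clock, and both counters change on edges of the same clock network.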
What did we learn in this post?
Clocks are special, and the experienced architect or team manager takes into consideration the impact of clock definitions on both the ASIC and the FPGA prototype. This more or less concludes the series. If you have questions or suggestions, comment below.
Prerequisites to this post: