Clock Domain Crossing Verification — Completing sign-off by linking static & dynamic verification
Case Study By Krithivas Krishnaswami of NVIDIA
Case Study Overview
Krithivas Krishnaswami discusses NVIDIA’s successful evaluation of a methodology for completing clock domain crossing verification by linking CDC static sign-off and simulation. Real Intent’s Meridian CDC and the Meridian CDC Simportal feature were deployed.
Edited transcript and graphics.
I. Completing Clock Domain Crossing Verification
I will discuss Nvidia’s methodology for completing clock domain crossing verification by utilizing both CDC static sign-off and simulation.
We use Real Intent Meridian CDC Simportal to link the CDC static sign-off world and the dynamic functional verification world. During simulation, we verify our protocols and the design constraint assumptions from CDC static sign-off, and we mitigate metastability issues.
II. Nvidia’s Clock Domain Crossing Verification Workflow
Our clock domain crossing verification workflow can be divided into two parts.
First, we run Meridian CDC static sign-off based on our RTL design and design constraints. We resolve all the identified errors and set waivers for unintended behavior.
Second, the Meridian CDC Simportal feature automatically creates an assertion checker file; no additional setup is required. We plug the Simportal assertion checker file into our dynamic verification (simulation) environment, along with our RTL and our SystemVerilog test bench.
When we perform CDC static sign-off, our constraints assume the design will behave in a specific way under certain conditions. Simportal’s assertion checkers let us catch functional issues by verifying that the protocols and constraint assumptions from CDC static sign-off are reflected in our simulation testbench.
III. Protocol Checks
The assertion files contain two types of checks: protocol checks and intent checks.
The protocol checks assess if our design protocols were met – ensuring the design will operate in certain conditions. They cover two CDC analysis path types.
DATA CROSSINGS. The file has:
a. Data stability checks. These confirm whether the data we’re sending from the transmit side is held stable long enough to be captured on the receive side. In this way, we eliminate data transmissions that are too fast for the receive flop to capture.
b. Metastability injection. For cases where an engineer may not be able to deterministically emulate metastability, Simportal offers a metastability injection library to artificially introduce metastability to see if the data path is protected.
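The data stability check in item (a) can be pictured with a small Python model. This is a minimal sketch and not Simportal output: the function name, the (time, value) event format, and the default requirement of one receive clock edge per value are all assumptions made for illustration.

```python
import math

def unstable_values(changes, rx_period, min_edges=1, rx_phase=0):
    """Return (time, value) for each transmitted value that was replaced
    before `min_edges` rising edges of the receive clock could sample it.

    changes   : list of (time, value) pairs sorted by time; each pair marks
                the moment the transmit side drives a new value
    rx_period : receive clock period, in the same time unit as `changes`
    rx_phase  : time of the first receive clock rising edge
    The last value has no end time, so it is not checked.
    """
    failures = []
    for (start, value), (end, _next_value) in zip(changes, changes[1:]):
        # Rising edges occur at rx_phase + k * rx_period for k = 0, 1, 2, ...
        # Count the edges that fall inside the half-open window [start, end).
        first_edge = max(0, math.ceil((start - rx_phase) / rx_period))
        end_edge = max(0, math.ceil((end - rx_phase) / rx_period))
        if end_edge - first_edge < min_edges:
            failures.append((start, value))
    return failures

# Hypothetical trace: value "B" is driven at t=25 and overwritten at t=30,
# between the receive clock edges at t=20 and t=30, so it is never captured.
trace = [(0, "A"), (25, "B"), (30, "C"), (60, "D")]
print(unstable_values(trace, rx_period=10))
```

Raising `min_edges` tightens the requirement, for example when the receive side needs to see a value on several consecutive edges.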
CONTROL CROSSINGS. The file has:
Metastability injection
Metastability injection confirms whether the control path is protected; for example, the engineer may not have used the right type of synchronizers. It injects delays into signals so the engineer can analyze whether the output on the receive side is protected against metastability.
Pulse width checks
It’s important to check the width of each control pulse against the receiving clock to ensure the pulse can be captured on the receive side without issues. An example would be to confirm that the pulse is held for at least 1.5 times the period of the receive clock.
Gray code checks for FIFO Synchronizers
These checks verify that our FIFO controls are Gray coded. Gray codes allow only one bit of the control to change at each clock transition, which avoids race conditions or metastability when using the FIFO to move data between two different clock domains.
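The one-bit-per-transition property behind the Gray code check can be sketched in Python. This is an illustrative model only; the function names are assumptions and the generated checkers themselves are not shown in the talk.

```python
def to_gray(n):
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)

def gray_violations(pointer_values):
    """Return the indices where a pointer value differs from the previous
    value in more than one bit, which a Gray coded pointer never does."""
    bad = []
    for i in range(1, len(pointer_values)):
        changed_bits = pointer_values[i - 1] ^ pointer_values[i]
        if bin(changed_bits).count("1") > 1:
            bad.append(i)
    return bad

# A 3-bit Gray coded pointer, including the wrap from 7 back to 0, is clean.
gray_pointer = [to_gray(i % 8) for i in range(9)]
print(gray_violations(gray_pointer))      # []

# A plain binary pointer changes several bits at once on some increments.
binary_pointer = list(range(8))
print(gray_violations(binary_pointer))    # [2, 4, 6]
```

The binary pointer fails at the 1 to 2, 3 to 4, and 5 to 6 increments, where two or three bits flip together and a receive flop could capture a mixture of old and new bits.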
IV. Intent Checks
We use the “intent checks” that Simportal generates to cross-verify that our CDC static sign-off constraint assumptions hold true in our dynamic CDC verification world.
V. Hierarchy & Coverage
Addressing Hierarchy Differences
Our designs comprise complex IP as large as a million gates, made up of smaller blocks. Depending on the situation, engineers may either first run CDC static sign-off on our smaller blocks and then sign off at the top level, or first run flat CDC sign-off at the higher level.
When we connect our CDC static sign-off and simulation worlds, we may have situations where different teams do sign-off at different levels of hierarchy. For example, engineers may first run CDC static sign-off at a higher level and generate assertion files, then when they plug the assertion files into the simulation environment, the other team might run simulation at a module level.
Nvidia’s solution to address this handoff process between clock domain crossing verification domains while maintaining translation between the signal hierarchies (e.g., signal names) is to create glue logic to stitch the design together.
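The name translation part of this handoff can be pictured as a prefix remap between the two hierarchies. The talk does not describe how the glue logic is implemented, so the Python sketch below is only an assumption, and the instance names in it (top.u_block_a, tb.dut) are hypothetical.

```python
def remap_path(signal_path, signoff_prefix, sim_prefix):
    """Translate a hierarchical signal name from the sign-off hierarchy to
    the simulation hierarchy.

    Returns None when the signal lies outside the block being simulated,
    meaning its checker cannot be bound in that simulation.
    """
    if signal_path == signoff_prefix:
        return sim_prefix
    # Require a "." after the prefix so that a sibling instance such as
    # top.u_block_ab is not mistaken for top.u_block_a.
    if signal_path.startswith(signoff_prefix + "."):
        return sim_prefix + signal_path[len(signoff_prefix):]
    return None

# Sign-off ran at the top level; simulation instantiates only one block.
print(remap_path("top.u_block_a.u_sync.q", "top.u_block_a", "tb.dut"))
print(remap_path("top.u_block_b.u_sync.q", "top.u_block_a", "tb.dut"))
```

Signals that map to None are the ones whose assertions need attention when the two teams work at different levels of hierarchy.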
Ensuring Coverage During Simulation
We get full coverage during our CDC static sign-off step. To ensure coverage during dynamic CDC verification, the simulation testbench must have the correct test vectors to simulate/trigger the assertions during functional verification.
Certain assertions are intended to be triggered for specific scenarios. The onus is on the engineers doing the simulation to determine whether their existing test patterns are sufficient to cover those scenarios or if they must add new test patterns. They can determine the coverage in advance because they already know the design and their existing tests.
Engineering collaboration between the two verification teams comes into play if the person running the simulation is not familiar enough with the CDC tool and/or the generated assertion checkers to fully understand what is needed for full coverage.
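The coverage question can be framed as a set difference between the assertions that were generated and the assertions that at least one test actually exercised. The Python sketch below assumes that per-test hit lists are available in some form; the data layout and all assertion and test names are hypothetical.

```python
def uncovered_assertions(generated, hits_per_test):
    """Return the generated assertions that no test exercised.

    generated     : iterable of assertion names written out for simulation
    hits_per_test : dict mapping a test name to the assertion names whose
                    triggering scenario occurred during that test
    """
    exercised = set()
    for hits in hits_per_test.values():
        exercised |= set(hits)
    return sorted(set(generated) - exercised)

generated = ["pw_req", "pw_ack", "gray_wr_ptr", "stable_data_bus"]
hits = {
    "test_basic_write": ["pw_req", "gray_wr_ptr"],
    "test_back_to_back": ["pw_req", "stable_data_bus"],
}
print(uncovered_assertions(generated, hits))   # ['pw_ack']
```

Any name in the result points to a scenario that needs either a new test pattern or a review of whether the scenario can occur at all.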
VI. Case Study: Errors Uncovered by Meridian CDC Simportal
Below are errors we were able to catch during our CDC verification case study after we plugged Simportal’s generated assertion checker files into our simulator environment and ran our existing testbench.
Pulse width failure
The waveform shows a scenario where our protocol dictated that the pulse should be held stable for at least 1.5 times the period of the receive clock. However, the width of the data coming into the flop’s D pin input was less than one clock cycle.
The result would be that this particular pulse at the output of the flop would not have been registered. This meant we had an error, either in our test vectors or in the design itself.
Error: Pulse-width assertion failed (missed pulse) on at time 706.00ns
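The rule behind this failure can be written as a short Python check. This is a sketch of the 1.5x protocol described above and not the generated assertion; the function name, the pulse list format, and the example times are assumptions.

```python
def short_pulses(pulses, rx_period, factor=1.5):
    """Return the pulses that are held for less than factor * rx_period.

    pulses    : list of (rise_time, fall_time) pairs for the crossing signal
    rx_period : period of the receive clock, same time unit as the pulses
    factor    : required pulse width as a multiple of the receive period
    """
    min_width = factor * rx_period
    return [(rise, fall) for rise, fall in pulses if fall - rise < min_width]

# With a receive period of 10 time units the minimum legal width is 15.
pulses = [(0, 20), (100, 108), (200, 215)]
print(short_pulses(pulses, rx_period=10))   # [(100, 108)]
```

The pulse from 100 to 108 is shorter than one receive clock cycle, so the receive flop may never register it, which is the kind of miss reported above.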
Constant Failure
The signal was set to the constant value 0 in static CDC sign-off but had a constant value 1 in simulation at the given timestamp.
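An intent check of this kind compares the constant values assumed in the static constraints with the values observed in simulation. The Python sketch below assumes a simple sample list and uses a hypothetical signal name; it does not reflect the format of the generated checkers.

```python
def constant_violations(constants, samples):
    """Report the first time each constrained signal departs from the
    constant value assumed during static sign-off.

    constants : dict mapping signal name to its assumed constant value
    samples   : iterable of (time, signal, value) observed in simulation
    Returns a sorted list of (time, signal, expected, actual) tuples.
    """
    first_failure = {}
    for time, signal, value in samples:
        if signal not in constants or signal in first_failure:
            continue  # unconstrained signal, or already reported
        expected = constants[signal]
        if value != expected:
            first_failure[signal] = (time, signal, expected, value)
    return sorted(first_failure.values())

constants = {"cfg_mode": 0}          # hypothetical constraint: held at 0
samples = [
    (10, "cfg_mode", 0),
    (20, "cfg_mode", 1),             # contradicts the static assumption
    (30, "cfg_mode", 1),
    (30, "data_valid", 1),           # not constrained, ignored
]
print(constant_violations(constants, samples))   # [(20, 'cfg_mode', 0, 1)]
```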
Simulation Environment Error
Simportal’s metastability injection library artificially introduced metastability and showed the signal was not protected; the delay was increased and exceeded tolerance.
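The idea of injecting metastability can be illustrated with a simple cycle-based model. This is a common textbook abstraction and not Simportal's injection library: whenever the asynchronous input changes between two receive clock edges, the first synchronizer flop captures either the new or the old value. The function name and the `late` callback are assumptions made so that the outcome can be forced in a test.

```python
import random

def synchronize(samples, late=None, stages=2):
    """Pass per-cycle samples of an asynchronous signal through a
    synchronizer chain with metastability injection.

    samples : value of the input at each receive clock edge
    late    : callable returning True when a changing input should resolve
              to its old value (one cycle of added delay); random if None
    stages  : number of flops in the synchronizer chain
    """
    if late is None:
        late = lambda: random.random() < 0.5
    flops = [samples[0]] * stages
    prev = samples[0]
    out = []
    for cur in samples:
        captured = cur
        if cur != prev and late():
            captured = prev          # transition resolves one cycle late
        flops = [captured] + flops[:-1]
        out.append(flops[-1])
        prev = cur
    return out

# A one-cycle pulse survives when no delay is injected...
print(synchronize([0, 1, 0, 0, 0], late=lambda: False))      # [0, 0, 1, 0, 0]

# ...but is lost when the rising edge resolves late and the falling edge
# does not, which shows that the pulse is too narrow to be protected.
decisions = iter([True, False])
print(synchronize([0, 1, 0, 0, 0], late=lambda: next(decisions)))  # [0, 0, 0, 0, 0]
```

A pulse held for two receive cycles reaches the output under every choice of `late`, which ties this model back to the pulse width protocol.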
VII. Reducing Noise in Dynamic CDC Verification Violation Reports
Simportal writes out assertion checkers for every constraint and every design scenario. During simulation, the many thousands of test vectors may trigger conditions that cannot occur in the real functionality of the design. Those can produce noise (false violations).
The noise can be caused by environmental differences between clock domain crossing static sign-off and simulation, as well as scenarios that the simulator may not be able to correctly interpret.
Nvidia collaborated with Real Intent on how to reduce this noise. The outcome is that Meridian CDC Simportal now also includes a mechanism to specify waivers that will suppress writing out assertions for simulation on a certain path or for certain types of check we know are safe.
From a CDC verification methodology standpoint, we first run our analysis without any waivers. Then, the CDC static sign-off owner and our dynamic functional verification engineers review the violations and collaborate on waivers for our subsequent runs. These waivers are a way of helping the tool understand the designer’s intent and how to deal with the different scenarios.
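The effect of a waiver can be sketched as a filter applied to the list of checks before they are written out for simulation. The actual Simportal waiver syntax is not shown in the talk, so the record layout, the glob-style patterns, and the path names below are assumptions for illustration only.

```python
from fnmatch import fnmatchcase

def apply_waivers(checks, waivers):
    """Return the checks that are not suppressed by any waiver.

    checks  : list of dicts with a 'check' type and a signal 'path'
    waivers : list of dicts with optional 'check' and 'path' glob patterns;
              a missing key matches everything
    """
    def is_waived(item):
        return any(
            fnmatchcase(item["check"], w.get("check", "*"))
            and fnmatchcase(item["path"], w.get("path", "*"))
            for w in waivers
        )
    return [item for item in checks if not is_waived(item)]

checks = [
    {"check": "pulse_width", "path": "top.a.req"},
    {"check": "mutex", "path": "top.b.sel"},
    {"check": "pulse_width", "path": "top.b.ack"},
]
waivers = [
    {"check": "mutex"},                            # waive a whole check type
    {"check": "pulse_width", "path": "top.a.*"},   # waive one path subtree
]
print(apply_waivers(checks, waivers))
# [{'check': 'pulse_width', 'path': 'top.b.ack'}]
```

Running first with an empty waiver list and then with the reviewed list mirrors the two-pass methodology described above.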
We analyzed three different designs before and after applying waivers. These two charts illustrate the sharp drop in violation count for the mutex and pulse width checks when we added waivers.
For design one, adding the waivers dramatically reduced the violations count from thousands to none, and from one or two hundred to only a few.
With this massive reduction in the violation count, our designers were able to focus their attention on actual errors: cleaning up design issues or mismatches in assumptions between CDC static sign-off and CDC dynamic verification.
When they find assumption mismatches, they must then assess whether the error is in the design, the CDC constraints, or the test patterns.
VIII. Conclusion
Nvidia’s clock domain crossing verification methodology includes data sharing and collaboration between different domains, which is key to a complete clock domain crossing sign-off process. In this way, we avoid a single point of failure that results from either a design error or different assumptions between the teams.
We verify that our design is bug free, that it is being tested under the correct conditions, and that our constraint assumptions from one domain hold in the other.
What is clock domain crossing verification?
Clock domain crossing verification is the process of analysis, debug, and design modification to ensure that data is properly transferred across clock domains without introducing design issues. For example, metastability can occur if a signal crossing between clock domains arrives too close to the receiving clock edge, potentially causing illegal signal values to propagate through the design.