One category of difficult SoC tests to create is stress tests, which validate the limits of SoC functionality and the SoC's behavior at those limits. Overloading a single data input port is usually not what's challenging; that can be accomplished with singular focus. The elusive stress tests are those that create conditions wherein a system resource becomes overwhelmed by requests from the myriad initiators in a complex SoC. For example: initiating enough ARM CPU sub-system L3 cache commands (snoops and other cache maintenance commands) to hit or exceed the L3 command queue watermark. Perspec System Verifier is uniquely suited to achieving this result quickly. Using the library for ARM CPU sub-systems, we recently designed and brought up such a stress test in just two days for a customer's SoC, much to the pleasure of the customer, who now has another very specific proof point of the value of Perspec for generating complex tests.

Project – Overwhelm the L3 Command Queue

In this particular application, the customer requested creation of an L3 traffic stress scenario. The objective was to create cache/memory traffic that generates enough snoops and other L3 commands to cause the L3 command queue to hit and/or exceed its watermark. The L3 cache uses a proprietary cache partitioning design.

The Library for ARM CPU Sub-systems

Perspec is based on the separation of the model describing the SoC from the scenarios that describe the tests. The Library for ARM CPU sub-systems comes with pre-defined elements for CPUs, caches, and other ARM architecture artifacts. It also comes with scenario specifications for a vast array of CPU sub-system integration tests, which are valuable when testing the integration of the sub-system in the SoC and with other specific SoC resources. The Library enables connection of scenarios to the design's particular ARM components, regardless of the number of CPUs, sub-systems, caches, etc. This is accomplished through simple configuration tables that specify the CPU configuration, memory configuration, L1/L2/L3 cache configuration, MMU page table, and so on.

Achieving the Objective

After setting up the Perspec environment and applying the Library for ARM CPU sub-systems, the team generated 400+ tests to cover all the use cases specified in the engagement plan. Coverage results of these tests are shown in Figure 1. It took us into the second day to get the basic memory tests running; once these basic memory tests were working, the rest went smoothly.

Figure 1: Coverage of 400+ tests as shown in the vPlan for the ARM CPU sub-system

The next two days were spent running a subset of the false-sharing and true-sharing tests, reviewing the test results and the waveforms to get a better idea of how these tests affected the L3 cache, and fine-tuning some of these tests to target the L3 command queue. We selected two stress tests to run and were able to hit the stress target with one of the two. These are the detailed steps we followed once the basic memory tests passed.

First, we ran simple false-sharing and true-sharing coherency tests: two cores within one cluster, two cores across clusters, all four cores in one cluster, and multiple cores in multiple clusters. These are very short tests; the intention was to verify that the basic snooping operations were working correctly, and to get a feel for how long and how complex a test we could generate, simulate, and get results from in a reasonable time. The sketch below illustrates the two sharing patterns these tests exercise.
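For readers unfamiliar with the distinction, here is a minimal C sketch of the two access patterns. This is not Perspec library code; the thread count, the 64-byte line size, and the iteration count are assumptions for illustration. In false sharing, each core writes its own word, but the words occupy one cache line, so every write forces the line to bounce between cores and generates snoop traffic; in true sharing, the cores genuinely access the same word.

```c
/* Minimal sketch (not Perspec code) contrasting false and true sharing.
 * Thread count, line size, and iteration count are assumed values. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE   64       /* assumed cache-line size in bytes */
#define NUM_THREADS 4        /* stand-ins for cores in a cluster */
#define ITERATIONS  100000

/* False sharing: each thread owns a distinct word, but all four words
 * (32 bytes) sit in ONE aligned cache line, so every write forces
 * coherency traffic as the line bounces between cores. */
struct { volatile uint64_t word[NUM_THREADS]; }
    __attribute__((aligned(LINE_SIZE))) shared_line;

/* True sharing: all threads really do touch the same word. */
volatile uint64_t shared_word __attribute__((aligned(LINE_SIZE)));

static void *false_sharing_worker(void *arg)
{
    long id = (long)arg;
    for (long i = 0; i < ITERATIONS; i++)
        shared_line.word[id]++;        /* private word, shared line */
    return NULL;
}

static void *true_sharing_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++)
        __sync_fetch_and_add(&shared_word, 1);  /* same word for all */
    return NULL;
}

int main(void)
{
    pthread_t t[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, false_sharing_worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    printf("false-sharing pass done\n");

    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, true_sharing_worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    printf("true-sharing pass done, count=%llu\n",
           (unsigned long long)shared_word);
    return 0;
}
```

The Perspec-generated tests create analogous traffic directly on the SoC's cores, at a scale and degree of concurrency chosen by the scenario rather than hard-coded as above.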
The next step was to run short stress tests: all cores in four clusters doing false sharing. We chose false sharing instead of true sharing because false sharing is known to cause more stress on the memory/cache logic; our true-sharing tests can also be configured to achieve such stress conditions. Once we understood how our short stress tests affected the L3 cache logic, we fine-tuned our stress tests to generate more traffic and created much longer tests. Tests in this step can be very long, and in general would be targeted for execution on Palladium. Optionally, we can easily create shorter versions of the tests to execute on Xcelium, and in this case we chose to do so for simplicity. We decided to run two of these tests on Xcelium: a long test and a medium-length test.

Medium test: two clusters doing false sharing with 10x100 multi_rw_cache actions. We expected this test to complete overnight but probably not hit the target; it ended up running for 13 hours and also hit the target. See Figure 2.

Figure 2: Visualizing the Medium test scenario

Very long test: three clusters doing false sharing with thousands of multi_rw_cache actions and true sharing with thousands of copy_data actions, with the entire test repeated a thousand times (using a run_time loop). We killed this test, since it would have taken a few days to a week to complete; and since we had already hit the target with the medium test, it wasn't important to complete this one. See Figure 3; a sketch of the test's overall shape appears below.

Figure 3: Visualizing the Long test scenario
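To make the scale concrete, here is a conceptual C sketch of the long test's overall shape. The multi_rw_cache and copy_data names come from the Perspec library actions mentioned above, but the C structure, the stand-in routine signatures, and the exact counts are our illustration, not actual Perspec output; the real generated test also runs the actions concurrently across cores rather than sequentially.

```c
/* Conceptual sketch of the Long test's shape; not actual Perspec output.
 * multi_rw_cache and copy_data are Perspec library action names; these
 * C stand-ins and the exact counts are illustrative assumptions. */
#include <stddef.h>

#define OUTER_REPEATS 1000  /* the run_time loop: repeat the whole test */
#define NUM_CLUSTERS  3
#define RW_ACTIONS    2000  /* "thousands" of multi_rw_cache actions */
#define COPY_ACTIONS  2000  /* "thousands" of copy_data actions */

/* Hypothetical stand-ins for the generated per-action routines. */
static void multi_rw_cache(unsigned cluster) { (void)cluster; /* ... */ }
static void copy_data(unsigned cluster)      { (void)cluster; /* ... */ }

void long_test(void)
{
    for (int rep = 0; rep < OUTER_REPEATS; rep++) {
        /* False-sharing phase across the three clusters. */
        for (unsigned c = 0; c < NUM_CLUSTERS; c++)
            for (size_t a = 0; a < RW_ACTIONS; a++)
                multi_rw_cache(c);
        /* True-sharing phase. */
        for (unsigned c = 0; c < NUM_CLUSTERS; c++)
            for (size_t a = 0; a < COPY_ACTIONS; a++)
                copy_data(c);
    }
    /* On the order of 1000 * 3 * (2000 + 2000) = 12 million actions:
     * days of software simulation, which is why tests of this length
     * are better suited to Palladium. */
}
```

By contrast, the medium test's 10x100 actions across two clusters amount to orders of magnitude fewer operations, which is consistent with its 13-hour Xcelium run.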
Results

The customer was very pleased that we were able to achieve this result so quickly. They had struggled to create these kinds of tests, and competing tools had not been able to achieve this result at all. For more information, visit the Perspec website.