How to debug job distribution issues in ADE?

A long, long time ago, there was a tool named Analog Artist. If you have been around long enough to have used that tool, then welcome to our club, old timer! For those of you who are the current generation’s app-savvy experts, Analog Artist is the predecessor to what is now known as Virtuoso® Analog Design Environment, or ADE. Over time, Artist morphed into ADE in our IC 5.1 release (I am sure some of our readers will not recognize that either; heck, I am really getting old). Then, about 12 years ago, when we released the IC 6.1 series, ADE became Virtuoso® Analog Design Environment L. Alongside ADE L, we launched another tool called Virtuoso® Analog Design Environment XL. ADE L became known as the single-testbench tool, and ADE XL was introduced as the multi-testbench tool for corners and Monte Carlo.

At this point, you must be wondering why I’m giving you a history lesson in ADE. Please bear with me for just a little longer. Artist or ADE or ADE L (whatever name we call it) was simple, ADE L was cute, and ADE L started running single simulations very quickly. But this cutie pie ADE L did everything in the main Virtuoso thread (and blocked it). When I say everything, I am referring to:

- Netlist creation and callback evaluation
- Simulation distribution and monitoring
- Expression evaluation

ADE L is fast for a short simulation, but it does not scale to a significant number of corners or Monte Carlo points. Even though ADE L allowed parallel distribution for Spectre® Classic Simulator simulations, around 10 corners would choke not just ADE L, but the whole Virtuoso session, and make it unusable.

Then, along came ADE XL. From the very beginning, ADE XL’s goal was to offer the end user a robust simulation environment. The new ADE tools, Virtuoso® ADE Explorer and Virtuoso® ADE Assembler, continue to support that goal.
The job distribution flow is pretty much the same in ADE XL, ADE Assembler, and ADE Explorer, so for the remainder of this discussion, we’ll refer to these three tools simply as ADE.

ADE uses an ICRP (IC Remote Process) for parallel job distribution. An ICRP is a “Virtuoso -nograph” process. The GUI (i.e., ADE) can fire up one or more ICRP sessions using a DRMS (Distributed Resource Management Software) such as LSF. Each ICRP session takes care of netlisting the design, running Spectre simulations, monitoring the simulation progress, and finally, evaluating the results. Having separate (ICRP) processes means that these tasks do not block the ADE UI, so the user can continue working as the run progresses.

Over the years, we have seen our customers’ usage of ICRPs increase dramatically. It is now quite common to run 1000s of simulation points using 100s of ICRPs. As this usage has increased, one of the challenges that has emerged is how to debug issues with ICRPs and job distribution, which is the topic of my write-up.

So, what is it that I’m trying to address? Does the following sound familiar? The user hits the green button in ADE and expects the simulations to run and the results to appear. However:

- No simulation is starting. Why?
- One or more simulations fail, or one or more measurements fail, and the user wants to find out what happened.
- Things are running, but seem to take an excessive amount of time (usually not the actual simulation time, but rather setup time or time to obtain results).

How can we debug what’s wrong? We have recently published a set of updated debugging slides, Troubleshooting ADE Explorer and ADE Assembler, which can be downloaded from the Cadence Online Support portal. The objective of these slides is to equip everyone with a baseline set of tools and information to use when investigating issues in ADE. They make very good reference material when you are faced with job distribution issues in ADE.
Through this and my future blogs on debugging, I’ll attempt to take the tools and recipes from the slides and apply them to real-life situations. But before diving too deep into the issues and their symptoms, let’s first discuss some of the basic concepts. Even though some of you may already be familiar with these, it’ll be good to be on the same page.

If you’re an ADE pro, then you already know this. But if you’re new to ADE, and you’re wondering where you see this mysterious ICRP kicking into action, then worry no more. When you run simulations (i.e., hit the green Run button), you will see one or more little computer icons appearing in the assistant named Run Summary (see picture below). Each one of these computer icons is one ICRP session. In case you don’t see this assistant, you can invoke it using Window – Assistant – Run Summary in ADE.

How many ICRPs should you expect to run at one time? That depends on your job policy setup. There is a field on the Job Policy setup form (accessed through Options – Job Setup) called Max Jobs. This determines the maximum number of ICRPs ADE will run at any given time. For example, if my Max Jobs is set to 20, and I am running 10 corners, then I will see 10 ICRPs running in the Run Summary assistant, each running one corner. On the other hand, if I am running 100 corners with Max Jobs set to 20, then I will see 20 ICRPs, and each ICRP will run about 5 corners, one after another.

How does the ADE GUI communicate with the ICRPs? In order to explain this, I will use the section titled “ADE Simulation Flow” from the debugging slides. The picture below demonstrates how the ADE GUI communicates with the ICRPs. Basically, what happens is that the ADE GUI gives each ICRP a simulation (e.g., a corner) to run.
The ICRP creates the netlist, evaluates the callbacks, runs Spectre, evaluates the results, and sends those results back to the ADE GUI. Then it goes back to the ADE GUI and says, “Hey, I’m done, what do I do next?” The ADE GUI finds the next corner in the list of remaining corners and tells the ICRP to run that one. And so the process continues.

Note that in recent versions of ADE Explorer and ADE Assembler, there are some performance optimizations (group run, Spectre interactive plugin, /tmp RDB) that change the above picture slightly. We’ll discuss those in more detail in future blogs, and you can read about them in the “ADE Simulation Flow” section of the debugging slides.

Also note that the above picture points out some of the potential causes of job distribution problems. For example, the ADE GUI can become very busy with all the inter-process communication and slow down. Also, since the ICRP is one monolithic process that manages everything related to simulation, the overhead of netlisting, licensing, and simulation startup can become a bottleneck.

The picture below breaks down the communication between ADE and the ICRP in more detail. The arrows in red are part of the start, configure, and prepare stages. This is where ADE asks the DRMS or DP (Distributed Processing) software, e.g., LSF, to start an ICRP session. After an ICRP has started on a remote machine and its configuration is complete, the ICRP starts Spectre on that same remote machine and continues to monitor it. After Spectre is done with the simulation, the ICRP evaluates the measurements and sends the results back to the ADE GUI. These tasks are identified by the green arrows. After that, the ADE GUI checks whether there are more pending points for the test, or for other tests, and if so, it gives a new point to the ICRP. These steps are shown by the blue arrows.
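The Max Jobs arithmetic and the "I'm done, what do I do next?" handshake described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ADE's actual implementation: the names `icrp_count` and `dispatch` are hypothetical, the per-point durations are made-up numbers, and the model ignores startup time, licensing, and the newer optimizations (group run and so on).

```python
import heapq
from collections import Counter


def icrp_count(num_points, max_jobs):
    """ICRPs started in this toy model: one per point, capped by Max Jobs."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    return min(num_points, max_jobs)


def dispatch(durations, max_jobs):
    """Simulate pull-style dispatch of simulation points to ICRPs.

    durations: hypothetical run time of each point, in arbitrary units.
    Whichever ICRP becomes free first is handed the next pending point.
    Returns (assignment, makespan), where assignment maps
    point index -> ICRP id and makespan is the time the last point ends.
    """
    n = icrp_count(len(durations), max_jobs)
    # Min-heap of (time_at_which_icrp_is_free, icrp_id).
    free_at = [(0.0, icrp_id) for icrp_id in range(n)]
    heapq.heapify(free_at)

    assignment = {}
    makespan = 0.0
    for point, duration in enumerate(durations):
        now, icrp_id = heapq.heappop(free_at)   # next ICRP to say "I'm done"
        assignment[point] = icrp_id             # GUI hands it the next point
        finish = now + duration
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, icrp_id))
    return assignment, makespan


if __name__ == "__main__":
    # 10 corners with Max Jobs = 20, then 100 corners with Max Jobs = 20.
    print(icrp_count(10, 20), icrp_count(100, 20))

    # 100 equally long corners: how many corners does each ICRP end up running?
    assignment, makespan = dispatch([1.0] * 100, max_jobs=20)
    print(Counter(assignment.values()).most_common(3), makespan)
```

With equally long corners, the 100 points spread evenly over the 20 ICRPs, which matches the "5 corners per ICRP" example. If some corners take longer than others, the same loop naturally gives fewer points to the ICRPs that are stuck on slow ones, which is why the per-ICRP count is only an average in practice.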
By the way, if you have ever wondered what happens between the time you press the green Run button and the time the Results table shows running (in other words, “Grr, why haven’t my simulations started yet?”), the answer is that the ICRP is in one of the states in the first group. During this time, the ICRP is starting up and getting configured. These stages can take some time. Remember that the ICRP session is a “Virtuoso -nograph” process, i.e., it is a full-blown Virtuoso process, except that it runs in no-GUI mode. So if Virtuoso takes some time to start up in your environment (depending on your customization, initialization, and PDK load time), then the ICRP will be subject to that as well.

Why does the ICRP need to load the full Virtuoso environment? Because in this world nothing is free! And because the netlisting has to be done somewhere. In order to create the netlist, the PDK has to be loaded, because Pcells may need to be evaluated and any SKILL customization code has to be loaded. Remember, ADE L used to create the netlist too, but it blocked the Virtuoso process during that time. So if you were running a big design, netlist generation could take a long time with ADE L. Because ADE now uses the ICRP to create the netlist, your Virtuoso session is freed up, but that does not mean that netlist creation can be avoided. Sadly, we cannot have our cake and eat it too! The ICRP does the heavy lifting now, and hence you see a delay between the time you press the green Run button and the time the Results table shows running. Now you know what’s happening behind the scenes!

In my next blog, we will discuss how the job policies work. Stay tuned.

Related Resources

Application Note
- Troubleshooting ADE Explorer and ADE Assembler

User Guides
- Virtuoso ADE Explorer User Guide
- Virtuoso ADE Assembler User Guide

For more information on Cadence circuit design products and services, visit www.cadence.com.
About Virtuosity

Virtuosity has long been our most viewed and admired blog series. It has brought to the fore some lesser-known, yet very useful, software and documentation improvements, and has also shed light on some exciting new offerings in Virtuoso. We are now expanding the scope of this series by broadcasting the voices of different bloggers and experts, who will continue to preserve the legacy of Virtuosity and try to give it new dimensions by covering topics across the length and breadth of Virtuoso, and a lot more…

Happy Reading!

Kabir