Arma virumque cano—of arms and the man I sing. This is the famous opening line of Virgil's Aeneid, the curse of many a schoolboy's Latin-learning days, mine included. Anyway, the reason for all this talk about arms (Arma) is that today is the first day of Arm TechCon, and many journalists will be doing much singing about Arm. I'm sure I will be too, in the days to come. More interestingly, this morning Cadence announced that its full verification suite is available for Arm-based server datacenters, specifically datacenters using HP Enterprise (HPE) Apollo 70 systems.

Actually, I didn't have to study the opening line of the Aeneid; I had to study the end of the last book (Book XII, lines 383 to the end; funny the things you remember 50 years later). I have heard it said that studying Latin (or Greek) is great because it teaches you to think logically, which is useful for things like computer programming. As someone who has been a computer programmer for much of my life, even while I was studying Latin in fact, I have to say that proposition seems dubious. On the other hand, my PhD supervisor, one of the best programmers I've ever come across, had a degree in Classics from Oxford. On the other other hand, a degree in Classics from Oxford means really, really smart.

I have written about Arm servers before. See my previous posts:
Xcelium Simulation on Arm Servers
How ARM Servers Can Take Over the World

HPE Apollo Systems

The Apollo name was recently resurrected from Apollo Computer, maker of the first workstations I used in the early 1980s. At VLSI Technology, soon after we got there, we bought two (count 'em) DN100 black-and-white workstations. I tried to use Google to find out how much they cost, and it said $25 billion...no, wait, that was the Apollo moon program. If I recall correctly, they were somewhere in the $50-100K range. Today you can buy a (more powerful) laptop for $150. Thank you, Moore's Law. Apollo Computer was purchased by HP in 1989, and I won't even try to explain how HP also acquired Compaq and then was gradually split up. But one part of that is HPE.

Under the hood of an HPE Apollo 70 Arm-based server is a Marvell ThunderX2 processor, developed by Cavium before Marvell acquired them last year. HPE says that this is the first "purpose-built Arm HPC systems by #1 HPC vendor." Since HPE is the #1 HPC vendor (by quite a wide margin), this is a bit of a tautology; I think they really mean by any HPC vendor. Although the end product comes from HPE, HPE is driving a multi-vendor effort to accelerate Arm adoption for HPC, involving not just Marvell and Arm, but also Red Hat, Rogue Wave, Mellanox, SUSE...and in our announcement today, Cadence.

Datacenter Challenges

There are many constraints on building datacenters and updating the equipment in them. Obviously, one big factor is money, but another is how much electrical power can be brought into the building. For example, Arm in Cambridge has maxed out the local grid. All the power that goes in has to come out, so HVAC is another big constraint. The basic square footage can't (easily) be increased. And that's before you even get to technical questions like network architectures, choice of servers, and so on.

A major challenge is to increase capacity in existing datacenter space. This drives the need for a density-optimized HPC platform with plug-and-play manageability and scalable performance. With the Apollo 70 solution, you can get 5,000 Arm cores (providing 20,000 threads) in a single rack.
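The arithmetic behind that density figure is straightforward. Here is a minimal sketch; the ThunderX2 core counts and 4-way SMT are real, but the nodes-per-rack count is my own assumption, chosen to land near the quoted figure, not an HPE spec.

```python
# Rough arithmetic behind the "5,000 cores / 20,000 threads per rack" figure.
# NOTE: nodes_per_rack is an illustrative assumption, not an HPE spec.
cores_per_socket = 32    # Marvell ThunderX2
sockets_per_node = 2     # dual-socket Apollo 70 nodes
threads_per_core = 4     # ThunderX2 supports 4-way SMT
nodes_per_rack = 78      # assumed, to land near the quoted figure

cores = nodes_per_rack * sockets_per_node * cores_per_socket
threads = cores * threads_per_core
print(f"{cores} cores, {threads} threads per rack")  # 4992 cores, 19968 threads
```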
Verification

Every marketing presentation about verification starts with a graph showing exponential growth: chips get bigger, and verification gets bigger even faster. The cost of escapes goes up and up; even a respin of a chip means millions of dollars and months of delay. I'm just going to take that as given. Verification is important.

But it's more than that. We are at an inflection point in how verification users operate in the datacenter, with high-volume compute available to them. Yes, cloud provides a broad set of available compute, but because it is distributed among distinct servers, the achievable throughput is attenuated by the communication infrastructure. Arm servers provide more cores on the server and more cores in the datacenter. This is going to change the way users think about verification, shifting us from a per-job to a more appropriate per-regression focus.

Verification workloads are very mixed. There are thousands, or hundreds of thousands, of small verification tasks that take just a few seconds to run; library characterization is the easiest example to point to. On the other hand, there can be jobs that run for literally weeks. The mix changes as the project progresses. Further, most companies have multiple projects in flight, so any given engineer is not just in a sort of competition with the rest of the team, but also with the other teams, and perhaps even with non-EDA activities (finance or mechanical design, say).

The Cadence Verification Suite consists of:

EDA Tool | Purpose
Xcelium Parallel Logic Simulator | Single- and multi-core simulations
JasperGold Formal | Formal verification analysis
Specman/e Engine | Testbench generation and reuse
Select Verification IP (VIP) | Modeling tools for SoC protocol and memory verification
vManager | Verification and regression management
Indago Debug Analyzer | Resolving verification bugs
Encryption Support | Safe interchange of IP
Red Hat Enterprise Linux (RHEL) 7.4 | Linux operating system support
Flexera License Support | License management

I don't want to downplay all the other tools (simulation and formal can never be fast enough), but probably the most important item on the list in this context is vManager. In a big datacenter, on an SoC project with multiple teams and a huge number of simulations of varying length, it is of paramount importance to keep track of which jobs have been run, which succeeded, which failed, and so on. From the local view, individual designers don't want to wait longer than necessary for their jobs to complete. From the global point of view, the servers should be running close to capacity, and jobs should be selected to keep all the available licenses busy. This makes prioritization really important, since there is a mix of small and large jobs, licensed (needing a tool license) and unlicensed (typically post-processing log files), and the mix evolves over time.

Adding Arm-based servers to the datacenter can take a 1,000-job sequence that requires 3 days with existing capacity down to just 1 day, all within the existing datacenter footprint.
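To make the prioritization point concrete, here is a toy shortest-job-first scheduler. To be clear, this is an illustrative sketch, not vManager's actual algorithm; the job names, runtimes, and resource counts are all invented.

```python
import heapq

def run_regression(jobs, server_slots, licenses):
    """Toy regression scheduler: shortest-job-first, with licensed jobs
    throttled by the license pool and unlicensed post-processing jobs
    filling whatever server slots remain. Returns total elapsed time."""
    queue = sorted(jobs)                 # (est_runtime, name, needs_license)
    running = []                         # heap of (finish_time, lic_used, name)
    clock, free_slots, free_lic = 0.0, server_slots, licenses

    while queue or running:
        # Start every queued job that fits: a free slot, plus a license if needed.
        i = 0
        while i < len(queue):
            runtime, name, needs_lic = queue[i]
            if free_slots > 0 and (not needs_lic or free_lic > 0):
                free_slots -= 1
                free_lic -= int(needs_lic)
                heapq.heappush(running, (clock + runtime, int(needs_lic), name))
                queue.pop(i)
            else:
                i += 1
        # Advance time to the next completion and free its resources.
        clock, lic_used, name = heapq.heappop(running)
        free_slots += 1
        free_lic += lic_used
    return clock

# Invented mix: 1,000 short characterization sims, one multi-day SoC run,
# and an unlicensed log post-processing job.
jobs = [(10.0, f"char_{i:04d}", True) for i in range(1000)]
jobs += [(72 * 3600.0, "soc_full_chip", True), (60.0, "log_scan", False)]
print(f"regression makespan: {run_regression(jobs, 40, 8) / 3600:.1f} hours")
```

Even this toy version shows the key behavior: short licensed jobs drain quickly through the license pool, unlicensed jobs soak up spare server slots, and the regression as a whole, not any single job, is what gets optimized.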
Not Just Datacenters, But Also Supercomputers

In April, HPE, Arm, and SUSE announced Catalyst UK, which will build one of the largest Arm-based supercomputer deployments in the world, to be available to both industry and academia. The 12,000-core machine will be distributed across clusters located at the University of Edinburgh, the University of Bristol, and the University of Leicester. The supercomputer, built by HPE, is due to be completed in summer 2018. It will comprise three 30kW systems, each consisting of 64 HPE Apollo 70 servers equipped with two 32-core Cavium ThunderX2 processors, 128GB of memory, and Mellanox InfiniBand interconnects. I think the picture to the right is just an HPE Arm-based datacenter, not one of the supercomputers, although I doubt they will look much different.

Then in June, the US Department of Energy's National Nuclear Security Administration announced Astra, an Arm-based supercomputer expected to be deployed late this summer at Sandia National Laboratories. As it says in their announcement:

Astra will be based on the recently announced Cavium Inc. ThunderX2 64-bit Arm-v8 microprocessor. The platform will consist of 2,592 compute nodes, of which each is 28-core, dual-socket, and will be at a theoretical peak of more than 2.3 petaflops, equivalent to 2.3 quadrillion floating-point operations (FLOPS), or calculations, per second.
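That quoted peak is easy to sanity-check. In the sketch below, the node, socket, and core counts come from the announcement, but the clock speed and FLOPs-per-cycle figures are my assumptions (2.0 GHz and 8 double-precision FLOPs per cycle per core, consistent with ThunderX2's two 128-bit NEON FMA units), not announced numbers.

```python
# Sanity check of Astra's quoted >2.3 petaflops theoretical peak.
# NOTE: clock_ghz and flops_per_cycle are assumptions, not announced figures.
nodes = 2592
sockets_per_node = 2
cores_per_socket = 28
clock_ghz = 2.0          # assumed ThunderX2 clock
flops_per_cycle = 8      # assumed: two 128-bit NEON FMA units, double precision

cores = nodes * sockets_per_node * cores_per_socket
peak_pflops = cores * clock_ghz * flops_per_cycle / 1e6    # GFLOPS -> PFLOPS
print(f"{cores:,} cores, ~{peak_pflops:.2f} PFLOPS peak")  # 145,152 cores, ~2.32
```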
Exeat

For more details, see the product page. Virgil's Aeneid is famous for ending "astoundingly abruptly." Not soon enough, was more like my opinion in Latin class. But it gives me an excuse to end this blog post abruptly.

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.