Systems are becoming more attuned to their operational environment, but this adds many questions and issues that need to be resolved.
Historically, the performance and power consumption of a system were controlled by what could be done at design time, but chips today are becoming far more adaptive. This has become a necessity at cutting-edge nodes, and it also provides additional benefits, at the expense of greater complexity and verification challenges.
Design margins are a tradeoff between performance and yield. Cut margins too thin and failure rates go up. Increase margins and nominal performance goes down. Some companies have used binning to separate devices that were capable of performing faster than average, but as issues such as thermal effects and aging become more prevalent, these techniques are no longer sufficient. Chips must become adaptable to the environment they are operating in, the workloads they are subjected to, and the physical degradation they incur.
“Traditionally, you would establish the correlation between the voltage and frequency upfront,” says Tomer Morad, group director for R&D in Synopsys’ Embedded Software Development Group. “You would take many chips and you would test them, find the point that they fail, and then find the Vmin per frequency and temperature that a defined percentage of them pass.”
Establishing those criteria for a defined yield can be problematic. “Knowing the right voltage to run at can be a challenge, especially as voltage drops into the near-threshold or even sub-threshold region,” says Scott Hanson, CTO and founder of Ambiq. “In these low-voltage regions, even a small change in voltage can mean a big shift in performance.”
But these are not the only performance goals that are linked. “Performance and energy efficiency go hand-in-hand,” says Amir Attarha, product manager for Veloce Power at Siemens EDA. “You have two variables, voltage and frequency. When voltage is decreased, performance decreases, but energy consumption drops by a greater amount. This is a fundamental relationship that allows you to do voltage scaling for energy efficiency. If I don’t need high performance, I can drop my voltage and still have enough performance to complete it.”
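The energy win Attarha describes comes from the quadratic dependence of dynamic power on voltage. A minimal sketch of that relationship follows; the capacitance, frequency, and activity values are illustrative, not taken from any real device.

```python
def dynamic_power(c_eff, v, f, activity=1.0):
    """Dynamic switching power: P = activity * C_eff * V^2 * f."""
    return activity * c_eff * v * v * f

# Nominal point vs. the same chip at 80% voltage and 80% frequency.
p_nom = dynamic_power(1e-9, 1.0, 1e9)       # illustrative: 1 nF switched at 1 GHz
p_scaled = dynamic_power(1e-9, 0.8, 0.8e9)

# Performance drops ~20%, but power drops by roughly half (0.8^2 * 0.8 = 0.512).
ratio = p_scaled / p_nom
```

Energy per operation scales with V² alone (power times time, with time proportional to 1/f), so if the deadline tolerates the lower frequency, the same work completes at roughly two-thirds of the energy.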
That means you don’t always have to run at the fastest possible rate, and this requires knowledge about when results are required.
Today, an increasing array of sensors is being placed into chips, and these open up a host of adaptive control possibilities.
“One type of sensor is the path margin monitor,” says Synopsys’ Morad. “You put it on the critical paths and the path margin monitor will tell you how much slack you have until the clock edge. With the ability to measure slack you can be more aggressive in terms of lowering your Vmin or increasing your frequency. All of a sudden, each individual chip can have its own table of voltage, temperature, and frequency.”
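A per-chip table like the one Morad describes can be as simple as a lookup keyed on frequency and a temperature bucket, populated at test or calibration time from the monitors' slack readings. The values and bucket boundary below are hypothetical.

```python
# Hypothetical per-chip Vmin table, filled in during calibration from
# path-margin-monitor slack readings. Keys: (frequency in MHz, temp bucket in C).
VF_TABLE = {
    (800, 25): 0.62, (800, 85): 0.66,
    (1200, 25): 0.74, (1200, 85): 0.80,
}

def vmin_for(freq_mhz, temp_c):
    """Return the lowest characterized safe voltage for this chip."""
    bucket = 25 if temp_c < 55 else 85   # assumed bucket boundary
    return VF_TABLE[(freq_mhz, bucket)]
```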
But a chip running at a higher Fmax will consume more power and that changes the thermal profile for the device. “You can include synthetic paths, such as ring oscillators, to measure process corners,” says Firooz Massoudi, solutions architect for Synopsys. “Max frequency is affected by speed, process corner and voltage. But temperature is also indirectly affecting your margin. Based on that, you can decide how fast you can run or if you can reduce the voltage to keep the same speed.”
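A synthetic-path reading can feed a decision like the one Massoudi describes. The sketch below bins a raw ring-oscillator count into a coarse corner and maps it to an action; the thresholds are invented for illustration.

```python
def classify_corner(ro_count, slow=980, fast=1060):
    """Bin a ring-oscillator count (cycles per reference window) into a
    coarse process/temperature corner. Thresholds are illustrative."""
    if ro_count < slow:
        return "slow"      # raise voltage to hold the target frequency
    if ro_count > fast:
        return "fast"      # headroom: lower Vdd at the same speed
    return "typical"
```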
As more factors are included, control strategies become more complex. What granularity provides the best control with minimum complexity and verification cost? Should control logic be implemented in hardware, or are some aspects better controlled by software? Even that is not the full extent of the possibilities. When software is utilized, is the processing done locally, or in the cloud?
There are at least two dimensions to granularity. One is how big the blocks are that can be separately controlled. The second is, within each block, how fine the voltage and frequency changes are that can be made. “You have to look at it from both the temporal as well as spatial angles,” says Siemens’ Attarha. “How quickly does a system need to respond to changes in operating conditions? How soon can a voltage or frequency be adjusted? This is the temporal aspect. The spatial consideration is how many domains do I have?”
You also need to define the precision of the control. “There are designs that have 10 voltage levels,” says Piyush Sancheti, vice president for system architects at Synopsys. “You have a nominal voltage for the processor, but they have defined domains operating at step voltages that are fractions higher or lower than nominal. Depending on the state a particular unit is operated in, they are able to step up or down the voltage based on the severity, or the load, running on that module.”
Granularity indirectly impacts the speed with which blocks can be turned on or off, or the voltage and frequency adjusted. The larger the block, the slower they are likely to respond. “If you need to turn things on and off quickly, you’re probably going to have to break things up into smaller islands,” says Gabriel Chidolue, senior principal product engineering manager for Siemens EDA. “Coupled with this, you would probably want to do this closer to the hardware, without sending too much data to other parts of the chip, including sending to software. You probably need a distributed power controller that can take this information and make these types of decisions.”
Most of the designs utilizing these techniques today tend to be large processor-based systems. “The granularity is at a coarse level, meaning that if you have a system with multiple cores, then you can set the operating condition for each of the cores independently,” says Attarha. “But change takes time. If your scheduling is happening through the operating system, you cannot make decisions faster than the OS quantum. But you also have to be careful when changing voltage. There are two ways to do this. The first is to put the device into an idle mode, then you bring the frequency and voltage to where you want it to be, wait for it to stabilize, and then you can start using it. The other mode is to ramp up voltage and frequency as you go. Each of them has tradeoffs.”
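Attarha's two transition modes can be sketched against a mock device interface. Everything here, including the device API names and the Vmin curve, is an assumption made for illustration, not a real driver.

```python
class MockDevice:
    """Illustrative stand-in for a PMIC/clock-control interface."""
    def __init__(self, voltage=0.7, freq_mhz=800):
        self.voltage, self.freq_mhz, self.idle = voltage, freq_mhz, False
    def enter_idle(self):  self.idle = True
    def exit_idle(self):   self.idle = False
    def set_voltage(self, v):   self.voltage = v
    def set_frequency(self, f): self.freq_mhz = f
    def wait_stable(self):
        pass  # real firmware would poll a regulator power-good flag here
    def max_safe_freq(self):
        # Toy Vmin curve: ~1600 MHz per volt above a 0.3 V threshold.
        return int(1600 * max(self.voltage - 0.3, 0))

def switch_idle_then_go(dev, v_target, f_target):
    """Mode 1: quiesce, move the rail, wait for it to settle, then resume."""
    dev.enter_idle()
    dev.set_voltage(v_target)
    dev.wait_stable()
    dev.set_frequency(f_target)
    dev.exit_idle()

def switch_ramp(dev, v_target, f_target, v_step=0.05):
    """Mode 2: step the voltage up while running, never clocking faster
    than the current voltage safely allows."""
    while dev.voltage < v_target:
        dev.set_voltage(min(dev.voltage + v_step, v_target))
        dev.set_frequency(min(dev.max_safe_freq(), f_target))
```

Mode 1 costs idle time but is simple to validate; mode 2 keeps the block running but must track the voltage-frequency curve at every step. Ramping down would reverse the order: lower the frequency first, then the voltage.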
There are multiple routes to achieve the same results. “Fine-grained control without compromising on performance is best left up to hardware,” says Madhu Rangarajan, vice president for products at Ampere Computing. “This is the area where there will be a lot of work done in the future, especially with heterogeneous platforms. Coarse-grained power management can be something that software can manage, especially in a heterogeneous platform, whether it’s on-chip integration or a platform with CPUs, DPUs, and XPUs, but this won’t solve the latency problem that results in many coarse-grained power management techniques generally being disabled in servers.”
While these techniques may be led by one sector, interest is spreading. “The hyperscalers are motivated by one thing — power,” says Rob Knoth, product management director in Cadence’s Digital & Signoff Group. “The energy footprint of a data center is not something you can just hide. It is a very clear cost — the thermal impact, and carbon footprint. There’s a strong motivation. But if you look at something like an embedded system, especially with edge-based intelligence, there is a very different cost in terms of battery life. This trickle-down of technology, tools, and methodology means they are able to leverage those exact same things that the hyperscale customers are using, but they’re doing it in a much smaller footprint and much more fine-grain method and achieving their own benefits.”
Fine-grained systems have additional system concerns. “When you have small blocks that can each operate at multiple frequencies, you have to think about communications,” says Attarha. “These systems have to be locally synchronous and at least partially globally asynchronous. There needs to be some handshake scheme for the FIFOs used for communication between the blocks of the system that can handle different frequencies. It depends on the granularity of your clock domains and voltage domains. Then you have to consider how much the whole system can manage.”
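The handshake Attarha describes can be modeled at a behavioral level with a bounded FIFO: the producer pushes only when the FIFO signals ready, and the consumer pops only when it signals valid. The simulation below is a software toy, not RTL; in silicon the read/write pointers would cross domains through synchronizers.

```python
from collections import deque

class HandshakeFifo:
    """Behavioral toy of a bounded FIFO with ready/valid handshaking
    between two blocks on different clocks (names are illustrative)."""
    def __init__(self, depth=2):
        self.buf, self.depth = deque(), depth
    def ready(self):   # producer may push only when not full
        return len(self.buf) < self.depth
    def valid(self):   # consumer may pop only when not empty
        return len(self.buf) > 0
    def push(self, word):
        self.buf.append(word)
    def pop(self):
        return self.buf.popleft()

# Producer runs every cycle; consumer runs at one-third the rate.
# Back-pressure (ready deasserted) throttles the faster side.
fifo = HandshakeFifo(depth=2)
sent, received = 0, []
for cycle in range(12):
    if fifo.ready():
        fifo.push(sent)
        sent += 1
    if cycle % 3 == 0 and fifo.valid():
        received.append(fifo.pop())
```

Despite the 3:1 rate mismatch, the data arrives in order, and the producer is held to only as many pushes as the consumer can drain plus the FIFO depth.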
That may depend on numerous factors. “The real innovation will come when software becomes co-designed and co-developed with the hardware, leveraging all these mechanisms,” says Sancheti. “It’s a tradeoff between complexity and energy efficiency — complexity both in hardware design, as well as software. Nothing is for free.”
There are many control loops to consider. The tightest of them are hardware control systems, followed by software that is executed locally. Increasingly, data is being sent into the cloud for analysis, which can result in software updates to the control strategies. But the largest control loop is over product development cycles, using data analysis to help refine future products.
“The hard work done by the sensors must happen as much as possible behind the scenes,” says Ambiq’s Hanson. “If they are hardware sensors, they need to be closed-loop, without involving the software. Sensors used for tracking module usage, memory usage, processor usage, etc., should be wrapped in hardened code that works right out of the box without much configuration.”
The best path forward is sometimes the one with least resistance. “Pure hardware and firmware-based power management techniques are the easiest for the ecosystem to consume,” says Ampere’s Rangarajan. “You can see how the industry has transitioned over the years, even in the legacy space, from software managed power-states to hardware managed power-states, with much better state transition latencies and power savings. We expect hardware architecture to continue to deliver major innovations in power management. With semiconductor process-driven power scaling slowing down, you can expect to see architectural enhancements as the primary vector of innovation here.”
Others are thinking much bigger. “The biggest trends I saw at the last International Test Conference were system-level test and heterogeneous integration with chiplets,” says Cadence’s Knoth. “Underlying both of those is how to monitor what’s going on inside the silicon, and what pipe is used to bring that data out. Then it can be used by software running on the product, or sent to a cloud data analytics platform. Being able to accurately understand the thermal effects that are happening inside that system — whether that’s on silicon, or in the package, on the board — and getting that into software is very active.”
There is a limit to what can be thought about during the design phase. “Embedded sensors can track in-field operation and regulate the chip based on the real-world workload,” says Sancheti. “Today, those sensors operate purely in the hardware domain, but there is an aspect that you can bring that knowledge, or know-how, into software. If you’re regulating the chip through your firmware or your OS, these sensors could provide actionable feedback into your software. That is yet another loop back from hardware into software. But this time it’s not just from a design perspective, it’s from an in-field operation perspective.”
Software enables decisions to be deferred. “Increasingly, designers are looking at this as a hardware-software co-optimization problem,” adds Sancheti. “That starts with partitioning – what functions can go into the hardware versus the software. Hardware comes with low latency and more security. But the flip side is that with software you get a lot more configurability, more programmability. You can even tune the power management to aspects of the design as a software refresh.”
Co-development is becoming increasingly important. “With many modern design issues, power control is best addressed from an effective hardware/software co-design approach,” says Bill Neifert, senior vice president of partnerships for Corellium. “There are fundamental tradeoffs that can be made on either side that will impact the overall system. The best way to understand those tradeoffs is using a design approach where hardware and software are designed in tandem. Turning off hardware blocks may make sense from a power perspective, but how will this affect performance? Do we turn the block back on every time it could be used by software or are there use cases where it may make sense to just handle the job entirely in software?”
There is a danger that software will not use all of the capabilities provided by hardware if the teams are separated. “Some dynamic mechanisms are only possible with the context available to higher-level operating system software, and some are possible in pure hardware,” says Chris Redpath, technology lead for power software in Arm’s Central Engineering. “We expect to increase the amount of hardware-based power controls (for example clock gating, clock rate control, micro-architectural pipeline gating, etc.) and provide the policies and context needed to influence these controls in higher-level software. Something we see in task scheduling is that knowledge of what is about to run allows you to select the correct operating point without waiting for demand to become visible in the system. We expect to continue using high level knowledge of system demands to influence power control decisions.”
Not everything can be done in hardware. “Powering off significant parts of a system often requires much more information on the current use-case,” adds Morten Rasmussen, principal software engineer for Arm’s Central Engineering. “At the hardware level, we have little overall context awareness, but do have the ability to make decisions quickly for fine-grained controls. In the OS, or even application level, the context awareness is high but time scales are different. Ideally, the context information available at the application and OS level is abstracted and passed down through the software stack toward the firmware layers controlling the hardware. Each layer has some degree of autonomy guided by policies at the higher levels. At the same time, it is also desirable to abstract platform-specific dependencies that might limit which parts can be DVFS-controlled independently or powered off.”
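The idea of passing scheduler context down to the power controller can be sketched as a hint-to-operating-point mapping: rather than waiting for demand to show up in utilization counters, the higher layer tells the governor what is about to run. The hint names and operating points below are invented for illustration.

```python
# Assumed (voltage, frequency-in-MHz) operating performance points.
OPPS = [(0.6, 600), (0.75, 1000), (0.9, 1400)]

def pick_opp(hint):
    """Map a scheduler-provided hint to an operating point, instead of
    waiting for demand to appear in utilization counters."""
    if hint == "latency_critical":
        return OPPS[-1]   # jump straight to the fastest point
    if hint == "background":
        return OPPS[0]    # lowest-energy point is enough
    return OPPS[1]        # default/balanced point
```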
Should the data be sent to the cloud? “Does it make sense to send all this data to a data center and do the processing there, sending the raw data, paying for the communication, paying for all the processing?” asks Attarha. “Or does it make sense that we do some processing locally and understand what is happening, and then send the results to the data centers? This really is a multi-layer problem that chip designers and system designers are dealing with.”
That outer control loop requires additional capabilities and creates additional concerns once products reach the field. How often do you need to adjust? What kinds of adjustments need to be made? How big is the firmware? What is the impact on the system? These are the types of questions that should be asked during system design. In addition, questions have to be asked about the software. Should the heavy lifting of analyzing the data and finding trends be done with a cloud-based solution, and how do you deploy software down to the edge?
If the data is available in the cloud, more can be done with it. “If you think about the semiconductor industry as a whole, over every generation of products, we’re getting smarter about leveraging the existing data that’s on the table, using it to create products that are better in the next generation,” adds Knoth. “Today, we have a more intelligent systems-based approach overall to how we design, verify, and analyze these systems.”
But there is a danger in believing that one technique can solve everything. “Sensors and control systems can regulate temperature, but that doesn’t solve the problem,” says Marc Swinnen, director of product marketing at Ansys. “It deals with it, but you get a slower chip. A more elegant way is to better predict the temperature so that you can design with less throttling and work at constant performance. To do this you need much better analysis tools. Most thermal tools are at the system level, and they are just being introduced at the IC level. We know how to do it algorithmically, but the tools and methodologies are not in place.”
Fig. 1: Understanding and tracking thermal flows in a system. Source: Ansys
Adaptive control systems clearly are going to become more important in the future. Sensors are now becoming ubiquitous, and they are producing huge amounts of data that can be used for a variety of purposes. Today, only a small fraction of that information is escaping from the confines of the package, but the industry has recognized the hidden potential of that data and is looking to use it in a much broader context.
Editor’s note: These types of systems create additional design and verification challenges, which will be examined in a future story.