Understanding Frequency Scaling and Power Saving in Intel Processors

Primer

Recently, I initiated experiments with a four-channel power meter in the lab to monitor the power consumption of our testbed, specifically the Distributed Unit (DU) and the Centralized Unit (CU). However, the initial findings were puzzling: the DU and CU displayed markedly different power consumption profiles (see plot below). The experimental procedure was straightforward: assessing the power consumption of the nodes (CU and DU) when the Radio Access Network (RAN) software was inactive to establish a baseline power consumption. Given that both nodes operated under identical loads and utilized the same hardware components (RAM, processor, GPU, etc.), I naturally anticipated similar power consumption profiles.

Power consumption of the CU and DU computers, measured over a 300 second interval using a power-meter.

This anomaly sparked my curiosity and, to my surprise, led me to a highly enlightening troubleshooting session. That’s why I want to share what I learned and walk you through a methodology you can use should you encounter a similar challenge. In this post we’ll cover how to monitor the power consumption of Intel CPUs, the basics of CPU frequency scaling, Linux support for performance/frequency scaling, and a cheat sheet of frequency scaling commands.

If any of this sounds interesting to you, then please read on!

How to monitor the power consumption of Intel CPUs

As I delved deeper into the troubleshooting process, it became evident that isolating power consumption per component (processor, RAM, and other key components, with the processor as my primary suspect) was crucial for understanding the significant disparity between the two computers. However, I soon realized that monitoring energy consumption at such a granular level is challenging and normally requires specialized hardware and a meticulous approach. Thankfully, Intel CPUs have incorporated a feature known as Running Average Power Limit (RAPL) since the Sandy Bridge architecture. This feature allows us to monitor power consumption across various domains of the CPU chip, including attached DRAM and the on-chip GPU. Note that the specific domains available vary depending on the processor.
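On Linux, the RAPL counters are also exposed through the powercap sysfs interface, which gives a quick way to see which domains a given processor offers. The sketch below is illustrative rather than the method used in this post. It assumes the zones appear under /sys/class/powercap as directories named intel-rapl:*, each holding a name file and a cumulative energy_uj counter in microjoules. The helper names rapl_list_domains and rapl_avg_watts are made up for this example.

```shell
#!/bin/sh
# List the RAPL zones (domains) found under a powercap root directory.
# The default root is an assumption; pass another path as $1 if yours differs.
rapl_list_domains() {
    root="${1:-/sys/class/powercap}"
    for zone in "$root"/intel-rapl:*; do
        [ -r "$zone/name" ] || continue
        printf '%s %s\n' "$(basename "$zone")" "$(cat "$zone/name")"
    done
}

# Average power in watts from two energy_uj readings taken $3 seconds apart.
# This sketch does not handle counter wraparound.
rapl_avg_watts() {
    awk -v s="$1" -v e="$2" -v t="$3" 'BEGIN { printf "%.2f\n", (e - s) / 1e6 / t }'
}
```

To estimate package power, one would read energy_uj for the package zone twice, a known number of seconds apart, and pass both readings to rapl_avg_watts. On recent kernels, reading energy_uj usually requires root.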

For the purpose of this troubleshooting process, there was no need to increase the granularity of power measurement, so I decided to stick with the package power consumption. Using the user-space tool turbostat, I went ahead and collected the power samples when no RAN process was running to establish a power consumption baseline at the processor level:

sudo turbostat --out [FILE] --cpu package --quiet --show PkgWatt --debug -i 1 --num_iterations 300

In the previous command, notice the -i option, which is set to 1. This option sets the sampling period and thus should not be ignored. According to the Linux man pages, “extremely short measurement intervals (much less than 1 second), or system activity that prevents turbostat from being able to run on all CPUS to quickly collect data, will result in inconsistent results.” Hence, I maintained a 1-second period consistently across all my experiments.

Upon parsing the turbostat output, it became evident that the processors were contributing to the disparate power consumption profiles of the CU and the DU. While I had managed to isolate the issue to a certain extent (it’s worth noting that this finding didn’t rule out the possibility of other components exhibiting differences in power consumption), the reason behind the varying power consumption levels of the processors remained unclear. The next step in this process involved examining the frequency at which each processor was operating. Once again, I parsed the turbostat output to retrieve this information. I then plotted the ‘Bzy_MHz’ metric given in the turbostat output, which represents the average clock rate while the CPU was actively executing instructions.
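The parsing step can be sketched with a small awk helper. This is not the script I used for the plots, only a minimal illustration. It assumes the turbostat output file is whitespace-separated, that the header row naming the columns may repeat between samples, and that there is one sample row per interval (if your output holds both a summary row and per-package rows, filter it first). The function name turbostat_col_mean is hypothetical.

```shell
#!/bin/sh
# Print the mean of one named column (e.g. PkgWatt or Bzy_MHz) from a
# turbostat output file. $1 = file, $2 = column name.
turbostat_col_mean() {
    awk -v col="$2" '
        # (Re)locate the column whenever a header row is seen.
        { hdr = 0; for (i = 1; i <= NF; i++) if ($i == col) { idx = i; hdr = 1 } }
        hdr { next }
        # Accumulate numeric samples from the selected column.
        idx && $idx ~ /^[0-9]+(\.[0-9]+)?$/ { sum += $idx; n++ }
        END { if (n) printf "%.2f\n", sum / n }
    ' "$1"
}
```

Running it once per node on the same column gives a single number to compare before plotting the full time series.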

The new results revealed that the two processors were operating at notably different frequencies, even when running under comparable workloads. At this point, it became clear that the two processors couldn’t be using the same frequency scaling policies, and thus a closer inspection of the implemented frequency scaling configurations was needed.

CPU Frequency Scaling Basics

This section introduces concepts that are relevant to understanding how frequency scaling works. Note that some of these concepts are specific to Intel CPUs.

Performance Cores and Efficient Cores

One of the first things you’ll want to do when configuring frequency scaling is to refer to the specifications of your CPU model. You can do this in various ways. For example, in Linux you can run lscpu in the terminal to obtain this information. Alternatively, you may refer to your CPU manufacturer’s webpage. In my lab, the CU and DU computers run on 13th Gen Intel(R) Core(TM) i9-13900K CPUs. By looking into the specifications of my CPU (https://www.intel.com/content/www/us/en/products/sku/230496/intel-core-i913900k-processor-36m-cache-up-to-5-80-ghz/specifications.html) I was able to retrieve the following information:

  • CPU Model: 13th Gen Intel(R) Core(TM) i9-13900K
  • Max. Freq: 5.8 GHz
  • 1 socket
    • 24 cores per socket
      • 8 performance cores
        • Each can execute 2 threads using multithreading technology
        • Performance-core Max Turbo Frequency: 5.40 GHz
        • Performance-core Base Frequency: 3.00 GHz
      • 16 efficient cores
        • Efficient-core Max Turbo Frequency: 4.30 GHz
        • Efficient-core Base Frequency: 2.20 GHz
  • 32 total threads (16 threads max. for performance cores and other 16 threads for efficient cores)
  • Processor base power: 125 W
  • Maximum turbo power: 253 W

As you can see, the CPU has twenty-four cores in a single socket, but they are not all the same. Eight of those cores are performance cores (or P-cores) and the other sixteen are efficient cores (or E-cores). What’s the difference between them? The main difference is that P-cores are capable of hyper-threading (i.e. running two software threads at once), while E-cores can only run a single software thread. There are also considerable differences in the frequencies these cores support. P-cores are tuned for high turbo frequencies and high IPC (instructions per cycle), while E-cores are designed to maximize CPU efficiency, measured as performance-per-watt. This leads to lower base and turbo frequencies for E-cores, and thus lower power consumption.

We can also see in the CPU’s specifications that both P-cores and E-cores support two different frequencies: Base Frequency and Turbo Frequency. The CPU frequency defines the number of cycles (also known as clock ticks) a processor/core can execute per second. With higher frequency, higher performance can be achieved, but higher power is consumed. Processor core clocks do not run at one fixed frequency: they can transition between different frequencies through the mechanisms covered in the rest of this section, namely C-states and P-states, Turbo Boost, and manual overclocking.

C-States and P-States

Intel CPUs can dynamically adjust the frequency of the cores according to the current load, which can save energy or improve performance depending on the use-case. For this reason, Intel CPUs implement so-called C-states and P-states, which are correlated with different idle levels and thus different levels of power consumption (the number of C-states varies between different CPU models, but generally one can expect at least states 0 through 3). By tuning the state of the cores, the processor can effectively turn off unused components to save energy.

Mode  Definition
C0    Operational state. CPU fully turned on.
C1    First idle state. Stops CPU main internal clocks via software. Bus interface unit and APIC are kept running at full speed.
C2    Stops CPU main internal clocks via hardware. State in which the processor maintains all software-visible states, but may take longer to wake up through interrupts.
C3    Stops all CPU internal clocks. The processor does not need to keep its cache coherent, but maintains other states. Some processors have variations of the C3 state that differ in how long it takes to wake the processor through interrupts.
Available C-states.

C0 corresponds to an operational state, i.e. the core is executing instructions. The higher the C number, the deeper the sleep state, and thus the higher the energy savings. The downside is that the deeper the sleep state, the higher the latency introduced to put the core back to C0. It is possible to configure the deepest C-state the processor may use: allowing deeper states favors energy savings, while capping them favors wake-up latency.
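To see which idle states Linux actually exposes for a core, and how expensive each one is to leave, the cpuidle sysfs directory can be inspected. The sketch below assumes the usual layout of /sys/devices/system/cpu/cpu0/cpuidle/stateN/ with name and latency files (exit latency in microseconds). The function name cpuidle_states is illustrative. Note that the state names reported by the idle driver (for example POLL, C1, C6) do not necessarily map one-to-one onto the generic C0 to C3 table above.

```shell
#!/bin/sh
# Print the name and exit latency of every idle state for one CPU.
# $1 = cpuidle directory (defaults to cpu0; the default path is an assumption).
cpuidle_states() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for st in "$dir"/state[0-9]*; do
        [ -r "$st/name" ] || continue
        printf '%s latency_us=%s\n' "$(cat "$st/name")" "$(cat "$st/latency")"
    done
}
```

Calling cpuidle_states with no argument reports the states for cpu0. Deeper states should show larger exit latencies, which is the trade-off described above.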

When a core is in C0 state, it can be in one of several performance states (P-states). Thus, unlike C-states, P-states are exclusively operational states that correspond to specific frequency and voltage values (the number of P-states varies across processor models). As a rule, the higher the clock frequency and the higher the voltage, the more instructions can be retired by the CPU over a unit of time, but also the more energy is consumed over a unit of time (i.e. the more power is drawn) by the CPU in the given P-state. What is important to retain is that the higher the P-state number, the lower the frequency and voltage at which the core is running. Thus, P0 is always the highest-performance state (if we exclude Turbo Boost).

Turbo Boost

Some processors support raising their frequency above the normal maximum for a short burst of time, under appropriate thermal conditions. On Intel processors, this is called Turbo Boost. Turbo Boost applies to both performance and efficient cores and allows active CPU cores to be dynamically overclocked (i.e. their clock speed raised up to the Max Turbo Frequency) while other cores are in deep sleep states, depending on the workload. For this reason, it is sometimes called algorithmic overclocking.

Neither Intel® Turbo Boost Technology 2.0 nor Intel® Turbo Boost Max Technology 3.0 requires installing any drivers or software (the differences between the two versions are documented by Intel). Besides, turbo mode is enabled by default. Still, you should check the BIOS settings to confirm that Turbo Boost is enabled.

Manual CPU overclocking

As discussed previously, Turbo Boost can dynamically increase the frequency of some cores above the normal maximum, which is why it is sometimes called algorithmic overclocking. This differs from manual CPU overclocking, where the user can fine-tune the overclocking settings to meet some performance requirement (e.g. for gaming purposes). If you are interested in overclocking your CPU, there’s good news: various methods are available, one of them being overclocking from the BIOS. For instance, your BIOS may display a CPU Core Ratio option (the actual name of the setting may vary since it is defined by the motherboard manufacturer), also known as a multiplier, which determines the speed of the CPU. The overall speed of your processor is calculated by multiplying the base clock speed (BCLK) by this ratio. For example, a BCLK of 100MHz multiplied by a CPU core ratio of 45 would result in a CPU speed of 4,500MHz, or 4.5GHz. This setting can usually be changed per core or across all cores.
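The multiplier arithmetic is simple enough to check from the terminal. The helper below is just a worked version of the BCLK × ratio formula. The name core_freq_mhz is made up for this example, and the values passed to it are whatever your BIOS reports.

```shell
#!/bin/sh
# Core frequency in MHz = base clock (MHz) x CPU core ratio (multiplier).
# $1 = BCLK in MHz, $2 = core ratio.
core_freq_mhz() {
    awk -v bclk="$1" -v ratio="$2" 'BEGIN { printf "%.0f\n", bclk * ratio }'
}
```

For instance, core_freq_mhz 100 45 prints 4500, matching the example above, while a ratio of 58 at the same BCLK gives 5800, i.e. the 5.8 GHz maximum frequency listed for the i9-13900K.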

Linux support for Performance/Frequency Scaling

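The Linux kernel supports CPU performance/frequency scaling by means of the CPUFreq (CPU Frequency scaling) subsystem, which consists of three different levels of abstraction: the core, scaling governors and scaling drivers. The core provides the common infrastructure and the user-space interfaces, scaling governors implement the algorithms that estimate the required CPU capacity, and scaling drivers talk to the hardware.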

User-space tools such as cpupower allow us to select the scaling driver and governor. In principle, all available scaling governors can be used with every scaling driver. However, it is important to note that CPUFreq allows scaling drivers to bypass the governor layer and implement their own performance scaling algorithms, which is what the intel_pstate scaling driver can do. More information to follow.

Scaling Drivers

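In modern Intel processors, there are at least two drivers available: intel_pstate and acpi_cpufreq. In this post, we focus on the intel_pstate driver since this is the one we have been using. It implements a scaling driver with an internal governor for Intel Core (Sandy Bridge and newer) processors. intel_pstate can operate in one of two modes: active, in which it bypasses the generic governor layer and uses its own P-state selection algorithms, and passive, in which it acts as a regular scaling driver (reported as intel_cpufreq) driven by the generic scaling governors.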

The “governors” used in active mode are not generic scaling governors, but their names are the same as the names of some generic governors. You should pay special attention to this because they generally do not work in the same way as the generic governors they share the names with. For example:

The powersave P-state selection algorithm provided by intel_pstate is not a counterpart of the generic powersave governor (roughly, it corresponds to the schedutil and ondemand governors).

There are two P-state selection algorithms provided by intel_pstate in the active mode: powersave and performance. The way they both operate depends on whether or not the hardware-managed P-states (HWP) feature has been enabled in the processor and possibly on the processor model.

Generic Scaling Governors

ArchLinux’s Wiki provides a comprehensive overview of the generic CPU governors, which I have referenced below for easier access.

Governor      Description
performance   Run the CPU at the maximum frequency, obtained from /sys/devices/system/cpu/cpuX/cpufreq/scaling_max_freq
powersave     Run the CPU at the minimum frequency, obtained from /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq
userspace     Run the CPU at user specified frequencies, configurable via /sys/devices/system/cpu/cpuX/cpufreq/scaling_setspeed
ondemand      Scales the frequency dynamically according to current load. Jumps to the highest frequency and then possibly backs off as the idle time increases
conservative  Scales the frequency dynamically according to current load. Scales the frequency more gradually than ondemand
schedutil     Scheduler-driven CPU frequency selection
Generic scaling governors. From the ArchLinux Wiki.
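Since the governor is set per CPU, it is worth checking that every core is really using the one you expect. The sketch below counts how many CPUs use each governor by reading the scaling_governor files in sysfs. It assumes the standard /sys/devices/system/cpu/cpuN/cpufreq/ layout, and the function name governor_summary is illustrative.

```shell
#!/bin/sh
# Count how many CPUs use each scaling governor.
# $1 = sysfs CPU root (the default is the usual location, stated as an assumption).
governor_summary() {
    root="${1:-/sys/devices/system/cpu}"
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

On a machine where all cores share the same policy, this prints a single line, for example the governor name followed by the core count. More than one line is a hint that the cores are not configured consistently.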

CPU Frequency Scaling Cheat Sheet

Default to the intel_pstate driver

  1. Head over to /etc/default/grub and set GRUB_CMDLINE_LINUX='intel_pstate=enable' (no spaces around the = sign, since the file is read as shell variable assignments). Note that if you set this option to disable then acpi_cpufreq is used instead.
  2. Execute sudo update-grub on the terminal.
  3. Reboot the computer with sudo reboot.

Set intel_pstate driver to active/passive mode

  1. Head over to /etc/default/grub and set GRUB_CMDLINE_LINUX='intel_pstate=X' (no spaces around the = sign), where X can be either active or passive.
  2. Execute sudo update-grub on the terminal.
  3. Reboot the computer with sudo reboot.
  4. Check the status of the driver by executing cat /sys/devices/system/cpu/intel_pstate/status. Alternatively, you can execute sudo cpupower frequency-info: if the driver was set to passive mode, then you should see a line “driver: intel_cpufreq”, otherwise you should see “driver: intel_pstate”.
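After rebooting, it can also help to confirm that the kernel actually received the parameter you put in the GRUB configuration. The following sketch extracts the value of a key=value parameter from the kernel command line. The function name cmdline_value is hypothetical, and it only handles simple space-separated parameters without quoting.

```shell
#!/bin/sh
# Print the value of a key=value kernel command-line parameter.
# $1 = parameter name, $2 = cmdline file (defaults to /proc/cmdline).
cmdline_value() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" |
        awk -F= -v key="$1" '$1 == key { print $2 }'
}
```

Running cmdline_value intel_pstate should print the mode you configured. An empty result means the parameter never reached the kernel, for example because update-grub was not run.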

Check which governor is being used/Check CPU frequency scaling policies at the core level

cpupower -c all frequency-info

Set a generic CPU governor

  1. Head over to /etc/default/cpufrequtils and add the line GOVERNOR="X" (with plain double quotes), where X can be one of the generic governors listed above (e.g. ondemand, performance, etc).
  2. Execute sudo systemctl restart cpufrequtils.
  3. Execute cpupower -c all frequency-info to check which performance governor is being used.

Check which governors (generic or not) are available

Execute cpupower frequency-info and look for “available cpufreq governors”.

Enable/disable turbo mode

Write to the file ‘/sys/devices/system/cpu/intel_pstate/no_turbo’ or ‘/sys/devices/system/cpu/cpufreq/intel_pstate/no_turbo’, depending on which one is available. The file holds a single number: if it is 1 (see command below), then the driver is not allowed to set any turbo P-states; if it is 0 (default), then turbo P-states can be set by the driver.

echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
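Because the file location may differ between systems, a small read-only helper can report the current turbo state from whichever candidate path exists. This is a sketch: the function name turbo_status is made up, and the candidate paths are passed in by the caller rather than hard-coded.

```shell
#!/bin/sh
# Report the turbo state from the first readable no_turbo file among the arguments.
turbo_status() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        case "$(cat "$f")" in
            0) echo "turbo enabled" ;;
            1) echo "turbo disabled" ;;
            *) echo "unexpected value in $f" ;;
        esac
        return 0
    done
    echo "no_turbo file not found" >&2
    return 1
}
```

It can be called with both paths mentioned above as arguments. Writing 0 to the same file with tee re-enables turbo mode. Values written to sysfs do not survive a reboot.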

Check frequency of each core in real-time

Execute watch cat /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq.
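The raw values in scaling_cur_freq are expressed in kHz and come without any core label, which makes them hard to read on a 32-thread machine. The sketch below prints one labeled line per CPU in MHz. It assumes the standard sysfs layout, and the function name core_freqs_mhz is illustrative.

```shell
#!/bin/sh
# Print "cpuN <frequency> MHz" for every CPU with a cpufreq policy.
# $1 = sysfs CPU root (defaults to the usual location).
core_freqs_mhz() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue
        cpu=${f%/cpufreq/scaling_cur_freq}   # strip the file suffix
        cpu=${cpu##*/}                       # keep only "cpuN"
        awk -v c="$cpu" '{ printf "%s %.0f MHz\n", c, $1 / 1000 }' "$f"
    done
}
```

Saved as a script that calls core_freqs_mhz at the end, it can be run under watch just like the cat command above.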

Closing the troubleshooting process

As it turned out, the large differences in power consumption between the CU and DU computers could be explained by differences in the BIOS configuration of the CPU Core Ratio. In particular, the DU was overclocked, while the CU was not. Once we re-configured the two computers to use the same settings (overclocking, state of the intel_pstate driver, and governor), we repeated our experiments and were pleased to find that the two computers were finally exhibiting comparable power consumption profiles.