Intel® 64 and IA-32 Architectures
Software Developer’s Manual
Volume 3B:
System Programming Guide, Part 2

NOTE: The Intel® 64 and IA-32 Architectures Software Developer's Manual consists of ten volumes:
Basic Architecture, Order Number 253665; Instruction Set Reference A-L, Order Number 253666;
Instruction Set Reference M-U, Order Number 253667; Instruction Set Reference V-Z, Order Number
326018; Instruction Set Reference, Order Number 334569; System Programming Guide, Part 1, Order
Number 253668; System Programming Guide, Part 2, Order Number 253669; System Programming
Guide, Part 3, Order Number 326019; System Programming Guide, Part 4, Order Number 332831;
Model-Specific Registers, Order Number 335592. Refer to all ten volumes when evaluating your design
needs.

Order Number: 253669-068US
November 2018

Intel technologies features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn
more at intel.com, or from the OEM or retailer.
No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting
from such losses.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products
described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject
matter disclosed herein.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
This document contains information on products, services and/or processes in development. All information provided here is subject to change
without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting http://www.intel.com/design/literature.htm.
Intel, the Intel logo, Intel Atom, Intel Core, Intel SpeedStep, MMX, Pentium, VTune, and Xeon are trademarks of Intel Corporation in the U.S.
and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 1997-2018, Intel Corporation. All Rights Reserved.

CHAPTER 14
POWER AND THERMAL MANAGEMENT
This chapter describes facilities of Intel 64 and IA-32 architecture used for power management and thermal monitoring.

14.1 ENHANCED INTEL SPEEDSTEP® TECHNOLOGY

Enhanced Intel SpeedStep® Technology was introduced in the Pentium M processor. The technology enables the
management of processor power consumption via performance state transitions. These states are defined as
discrete operating points associated with different voltages and frequencies.
Enhanced Intel SpeedStep Technology differs from previous generations of Intel SpeedStep® Technology in two
ways:

• Centralization of the control mechanism and software interface in the processor by using model-specific
registers.
• Reduced hardware overhead; this permits more frequent performance state transitions.

Previous generations of the Intel SpeedStep Technology require processors to be in a deep sleep state, holding off bus
master transfers for the duration of a performance state transition. Performance state transitions under the
Enhanced Intel SpeedStep Technology are discrete transitions to a new target frequency.
Support is indicated by CPUID.01H:ECX[bit 7]. Enhanced Intel SpeedStep Technology is enabled by
setting bit 16 of the IA32_MISC_ENABLE MSR. On reset, bit 16 of the IA32_MISC_ENABLE MSR is cleared.
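The detection and enable steps above can be sketched in C as follows. This is a minimal sketch rather than reference code: the helper names are hypothetical, the functions operate on register values that the caller has already obtained (executing CPUID and RDMSR/WRMSR needs a platform-specific, privileged access path that is not shown), and reading the ECX feature bit from CPUID leaf 01H is stated here as an assumption about which leaf the text refers to.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions taken from the text above. */
#define CPUID_01H_ECX_EIST_BIT 7   /* ECX feature bit 7: EIST supported (leaf 01H assumed) */
#define MISC_ENABLE_EIST_BIT   16  /* IA32_MISC_ENABLE[16]: EIST enable */

/* Hypothetical helper: true if the ECX value returned by CPUID reports EIST. */
static bool eist_supported(uint32_t cpuid_ecx)
{
    return ((cpuid_ecx >> CPUID_01H_ECX_EIST_BIT) & 1u) != 0;
}

/* Hypothetical helper: returns the IA32_MISC_ENABLE value to write back,
 * with the EIST enable bit set and every other bit preserved. */
static uint64_t misc_enable_with_eist(uint64_t misc_enable)
{
    return misc_enable | (1ull << MISC_ENABLE_EIST_BIT);
}
```

Keeping the helpers free of MSR access makes the bit manipulation easy to check in isolation; the caller supplies the value read from IA32_MISC_ENABLE and writes the result back.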

14.1.1 Software Interface For Initiating Performance State Transitions

State transitions are initiated by writing a 16-bit value to the IA32_PERF_CTL register (see Figure 14-2). If a transition is already in progress, the transition to the new value takes effect subsequently.
Reads of IA32_PERF_CTL determine the last targeted operating point. The current operating point can be read from
IA32_PERF_STATUS. IA32_PERF_STATUS is updated dynamically.
The 16-bit encoding that defines valid operating points is model-specific. Applications and performance tools are
not expected to use either IA32_PERF_CTL or IA32_PERF_STATUS and should treat both as reserved. Performance
monitoring tools can access model-specific events and report the occurrences of state transitions.

14.2 P-STATE HARDWARE COORDINATION

The Advanced Configuration and Power Interface (ACPI) defines performance states (P-states) that are used to
facilitate system software’s ability to manage processor power consumption. Different P-states correspond to
different performance levels that are applied while the processor is actively executing instructions. Enhanced Intel
SpeedStep Technology supports P-states by providing software interfaces that control the operating frequency and
voltage of a processor.
With multiple processor cores residing in the same physical package, hardware dependencies may exist for a
subset of logical processors on a platform. These dependencies may impose requirements that impact the coordination of P-state transitions. As a result, multi-core processors may require an OS to provide additional software
support for coordinating P-state transitions for those subsets of logical processors.
ACPI firmware can choose to expose P-states as dependent and hardware-coordinated to OS power management
(OSPM) policy. To support OSPMs, multi-core processors must have additional built-in support for P-state hardware
coordination and feedback.
Intel 64 and IA-32 processors with dependent P-states amongst a subset of logical processors permit hardware
coordination of P-states and provide a hardware-coordination feedback mechanism using IA32_MPERF MSR and
IA32_APERF MSR. See Figure 14-1 for an overview of the two 64-bit MSRs and the bullets below for a detailed
description.

IA32_MPERF (Addr: E7H)

IA32_APERF (Addr: E8H)

Figure 14-1. IA32_MPERF MSR and IA32_APERF MSR for P-state Coordination

• Use CPUID to check the P-State hardware coordination feedback capability bit. CPUID.06H:ECX[bit 0] = 1
indicates IA32_MPERF MSR and IA32_APERF MSR are present.
• IA32_MPERF MSR (E7H) increments in proportion to a fixed frequency, which is configured when the processor
is booted.
• IA32_APERF MSR (E8H) increments in proportion to actual performance, while accounting for hardware coordination of P-state and TM1/TM2; or software initiated throttling.
• The MSRs are per logical processor; they measure performance only when the targeted processor is in the C0
state.
• Only the IA32_APERF/IA32_MPERF ratio is architecturally defined; software should not attach meaning to the
content of the individual IA32_APERF or IA32_MPERF MSRs.
• When either MSR overflows, both MSRs are reset to zero and continue to increment.
• Both MSRs are full 64-bit counters. Each MSR can be written to independently. However, software should
follow the guidelines illustrated in Example 14-1.

If P-states are exposed by the BIOS as hardware coordinated, software is expected to confirm processor support
for P-state hardware coordination feedback and use the feedback mechanism to make P-state decisions. The OSPM
is expected to either save away the current MSR values (for determination of the delta of the counter ratio at a later
time) or reset both MSRs (execute WRMSR with 0 to these MSRs individually) at the start of the time window used
for making the P-state decision. When not resetting the values, overflow of the MSRs can be detected by checking
whether the new values read are less than the previously saved values.
Example 14-1 demonstrates steps for using the hardware feedback mechanism provided by IA32_APERF MSR and
IA32_MPERF MSR to determine a target P-state.
Example 14-1. Determine Target P-state From Hardware Coordinated Feedback
DWORD PercentBusy; // Percentage of processor time not idle.
// Measure "PercentBusy" during previous sampling window.
// Typically, "PercentBusy" is measured over a time scale suitable for
// power management decisions
//
// RDMSR of MCNT and ACNT should be performed without delay.
// Software needs to exercise care to avoid delays between
// the two RDMSRs (for example, interrupts).
MCNT = RDMSR(IA32_MPERF);
ACNT = RDMSR(IA32_APERF);
// PercentPerformance indicates the percentage of the processor
// that is in use. The calculation is based on the PercentBusy,
// that is the percentage of processor time not idle and the P-state
// hardware coordinated feedback using the ACNT/MCNT ratio.
// Note that both values need to be calculated over the same
// time window.
PercentPerformance = PercentBusy * (ACNT/MCNT);

// This example does not cover the additional logic or algorithms
// necessary to coordinate multiple logical processors to a target P-state.
TargetPstate = FindPstate(PercentPerformance);
if (TargetPstate != currentPstate) {
    SetPState(TargetPstate);
}
// WRMSR of MCNT and ACNT should be performed without delay.
// Software needs to exercise care to avoid delays between
// the two WRMSRs (for example, interrupts).
WRMSR(IA32_MPERF, 0);
WRMSR(IA32_APERF, 0);
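
The alternative described before Example 14-1, saving the counter values instead of resetting them, can be sketched in C as follows. This is a minimal sketch under stated assumptions: `struct perf_sample` and `percent_performance` are hypothetical names, the two counter values are assumed to have been read back-to-back by the caller, and the product of PercentBusy and the APERF delta is assumed to fit in 64 bits for any realistic sampling window.

```c
#include <stdint.h>
#include <stdbool.h>

/* One snapshot of the two feedback counters, as read by RDMSR. */
struct perf_sample {
    uint64_t mperf; /* IA32_MPERF (E7H) */
    uint64_t aperf; /* IA32_APERF (E8H) */
};

/* Hypothetical helper: computes PercentBusy * (delta ACNT / delta MCNT).
 * Returns false when the window is unusable: a counter reads lower than the
 * saved value (overflow reset both MSRs) or MPERF did not advance. */
static bool percent_performance(struct perf_sample prev,
                                struct perf_sample now,
                                uint32_t percent_busy,
                                uint32_t *out)
{
    if (now.mperf <= prev.mperf || now.aperf < prev.aperf)
        return false;

    uint64_t d_mperf = now.mperf - prev.mperf;
    uint64_t d_aperf = now.aperf - prev.aperf;

    /* Multiply before dividing so the integer ratio is not truncated. */
    *out = (uint32_t)((percent_busy * d_aperf) / d_mperf);
    return true;
}
```

When the function returns false, one reasonable policy is to discard the window and use the new sample as the baseline for the next one.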

14.3 SYSTEM SOFTWARE CONSIDERATIONS AND OPPORTUNISTIC PROCESSOR PERFORMANCE OPERATION

An Intel 64 processor may support a form of processor operation that takes advantage of design headroom to
opportunistically increase performance. The Intel® Turbo Boost Technology can convert thermal headroom into
higher performance across multi-threaded and single-threaded workloads. The Intel® Dynamic Acceleration Technology feature can convert thermal headroom into higher performance if only one thread is active.

14.3.1 Intel® Dynamic Acceleration Technology

The Intel Core 2 Duo processor T7700 introduces Intel Dynamic Acceleration Technology. Intel Dynamic Acceleration Technology takes advantage of thermal design headroom and opportunistically allows a single core to operate
at a higher performance level when the operating system requests increased performance.

14.3.2 System Software Interfaces for Opportunistic Processor Performance Operation

Opportunistic processor performance operation, applicable to Intel Dynamic Acceleration Technology and Intel®
Turbo Boost Technology, has the following characteristics:

• A transition from a normal state of operation (e.g. Intel Dynamic Acceleration Technology/Turbo mode
disengaged) to a target state is not guaranteed, but may occur opportunistically after the corresponding enable
mechanism is activated, the headroom is available and certain criteria are met.
• The opportunistic processor performance operation is generally transparent to most application software.
• When opportunistic processor performance operation is engaged, the OS should use hardware coordination
feedback mechanisms to prevent un-intended policy effects if it is activated during inappropriate situations.

System software (BIOS and Operating system) must be aware of hardware support for opportunistic processor
performance operation and may need to temporarily disengage opportunistic processor performance operation
when it requires more predictable processor operation.

14.3.2.1 Discover Hardware Support and Enabling of Opportunistic Processor Performance Operation

If an Intel 64 processor has hardware support for opportunistic processor performance operation, the power-on
default state of IA32_MISC_ENABLE[38] indicates the presence of such hardware support. For Intel 64 processors
that support opportunistic processor performance operation, the default value is 1, indicating its presence. For
processors that do not support opportunistic processor performance operation, the default value is 0. The power-on
default value of IA32_MISC_ENABLE[38] allows BIOS to detect the presence of hardware support of opportunistic processor performance operation.
IA32_MISC_ENABLE[38] is shared across all logical processors in a physical package. It is written by BIOS during
platform initialization to enable/disable opportunistic processor performance operation in conjunction with OS power
management capabilities (see Section 14.3.2.2). BIOS can set IA32_MISC_ENABLE[38] to 1 to disable opportunistic processor performance operation; it must clear the default value of IA32_MISC_ENABLE[38] to 0 to enable
opportunistic processor performance operation. The OS and applications must use CPUID leaf 06H if they need to detect
processors that have opportunistic processor performance operation enabled.
When CPUID is executed with EAX = 06H on input, Bit 1 of EAX in Leaf 06H (i.e. CPUID.06H:EAX[1]) indicates
opportunistic processor performance operation, such as Intel Dynamic Acceleration Technology, has been enabled
by BIOS.
Opportunistic processor performance operation can be disabled by setting bit 38 of IA32_MISC_ENABLE. This
mechanism is intended for BIOS only. If IA32_MISC_ENABLE[38] is set, CPUID.06H:EAX[1] will return 0.
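
The two checks described in this section can be sketched in C as follows. This is a minimal sketch: the function names are hypothetical, and both functions take register values that the caller has already read, since executing CPUID and RDMSR requires a platform-specific access path that is not shown.

```c
#include <stdint.h>
#include <stdbool.h>

/* CPUID.06H:EAX[1] = 1: opportunistic operation has been enabled by BIOS.
 * This is the check the OS and applications are expected to use. */
static bool opportunistic_enabled(uint32_t cpuid_06h_eax)
{
    return ((cpuid_06h_eax >> 1) & 1u) != 0;
}

/* IA32_MISC_ENABLE[38] = 1: opportunistic operation is disabled.
 * This bit is intended for BIOS use only. */
static bool opportunistic_disabled_in_misc_enable(uint64_t misc_enable)
{
    return ((misc_enable >> 38) & 1u) != 0;
}
```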

14.3.2.2 OS Control of Opportunistic Processor Performance Operation

There may be phases of software execution in which system software cannot tolerate the non-deterministic aspects
of opportunistic processor performance operation. For example, when calibrating a real-time workload to make a
CPU reservation request to the OS, it may be undesirable to allow the possibility of the processor delivering
increased performance that cannot be sustained after the calibration phase.
System software can temporarily disengage opportunistic processor performance operation by setting bit 32 of the
IA32_PERF_CTL MSR (0199H), using a read-modify-write sequence on the MSR. The opportunistic processor
performance operation can be re-engaged by clearing bit 32 in IA32_PERF_CTL MSR, using a read-modify-write
sequence. The DISENGAGE bit in IA32_PERF_CTL is not reflected in bit 32 of the IA32_PERF_STATUS MSR (0198H),
and it is not shared between logical processors in a physical package. In order for the OS to engage Intel Dynamic
Acceleration Technology/Turbo mode, the BIOS must:

• Enable opportunistic processor performance operation, as described in Section 14.3.2.1.
• Expose the operating points associated with Intel Dynamic Acceleration Technology/Turbo mode to the OS.

Bits 63:33  Reserved
Bit 32      Intel® Dynamic Acceleration Technology / Turbo DISENGAGE
Bits 31:16  Reserved
Bits 15:0   Enhanced Intel SpeedStep® Technology Transition Target

Figure 14-2. IA32_PERF_CTL Register
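
The read-modify-write sequences on IA32_PERF_CTL described above can be sketched in C as follows. This is a minimal sketch: the macro and function names are hypothetical, and each function takes the value the caller has read from the MSR and returns the value to write back, so the privileged RDMSR/WRMSR step is left to the caller.

```c
#include <stdint.h>

#define PERF_CTL_DISENGAGE   (1ull << 32)  /* bit 32: IDA / Turbo DISENGAGE */
#define PERF_CTL_TARGET_MASK 0xFFFFull     /* bits 15:0: transition target */

/* Temporarily disengage opportunistic operation; other bits are preserved. */
static uint64_t perf_ctl_disengage(uint64_t perf_ctl)
{
    return perf_ctl | PERF_CTL_DISENGAGE;
}

/* Re-engage opportunistic operation; other bits are preserved. */
static uint64_t perf_ctl_reengage(uint64_t perf_ctl)
{
    return perf_ctl & ~PERF_CTL_DISENGAGE;
}

/* Replace the 16-bit transition target. The encoding of the target is
 * model-specific, so the caller must supply a valid operating point. */
static uint64_t perf_ctl_set_target(uint64_t perf_ctl, uint16_t target)
{
    return (perf_ctl & ~PERF_CTL_TARGET_MASK) | target;
}
```

Because the DISENGAGE bit is not shared between logical processors, a caller would apply the sequence on each logical processor it wants to affect.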

14.3.2.3 Required Changes to OS Power Management P-State Policy

Intel Dynamic Acceleration Technology and Intel Turbo Boost Technology can provide opportunistic performance
greater than the performance level corresponding to the Processor Base frequency of the processor (see CPUID’s
processor frequency information). System software can use a pair of MSRs to observe performance feedback. Software must query for the presence of IA32_APERF and IA32_MPERF (see Section 14.2). The ratio between
IA32_APERF and IA32_MPERF is architecturally defined, and a value greater than unity indicates that a performance
increase occurred during the observation period due to Intel Dynamic Acceleration Technology. Without incorporating such performance feedback, the target P-state evaluation algorithm can result in a non-optimal P-state
target.

There are other scenarios under which OS power management may want to disable Intel Dynamic Acceleration
Technology; some of these are listed below:

• When engaging ACPI defined passive thermal management, it may be more effective to disable Intel Dynamic
Acceleration Technology for the duration of passive thermal management.
• When the user has indicated a policy preference of power savings over performance, OS power management
may want to disable Intel Dynamic Acceleration Technology while that policy is in effect.

14.3.3 Intel® Turbo Boost Technology

Intel Turbo Boost Technology is supported in Intel Core i7 processors and Intel Xeon processors based on Intel®
microarchitecture code name Nehalem. It uses the same principle of leveraging thermal headroom to dynamically
increase processor performance for single-threaded and multi-threaded/multi-tasking environments. The programming interface described in Section 14.3.2 also applies to Intel Turbo Boost Technology.

14.3.4 Performance and Energy Bias Hint support

Intel 64 processors may support an additional software hint to guide the hardware heuristic of power management
features to favor increasing dynamic performance or conserving energy.
Software can detect the processor's capability to support the performance-energy bias preference hint by examining bit 3 of ECX in CPUID leaf 6. The processor supports this capability if CPUID.06H:ECX.SETBH[bit 3] is set and
it also implies the presence of a new architectural MSR called IA32_ENERGY_PERF_BIAS (1B0H).
Software can program the lowest four bits of IA32_ENERGY_PERF_BIAS MSR with a value from 0 - 15. The values
represent a sliding scale, where a value of 0 (the default reset value) corresponds to a hint preference for highest
performance and a value of 15 corresponds to the maximum energy savings. A value of 7 roughly translates into a
hint to balance performance with energy consumption.

Bits 63:4  Reserved
Bits 3:0   Energy Policy Preference Hint

Figure 14-3. IA32_ENERGY_PERF_BIAS Register
The layout of IA32_ENERGY_PERF_BIAS is shown in Figure 14-3. The scope of IA32_ENERGY_PERF_BIAS is per
logical processor, which means that each of the logical processors in the package can be programmed with a
different value. This may be especially important in virtualization scenarios, where the performance / energy
requirements of one logical processor may differ from the other. Conflicting “hints” from various logical processors
at higher hierarchy level will be resolved in favor of performance over energy savings.
Software can use whatever criteria it sees fit to program the MSR with an appropriate value. However, the value
only serves as a hint to the hardware and the actual impact on performance and energy savings is model specific.
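
Programming the hint field can be sketched in C as follows. This is a minimal sketch: the function name is hypothetical, the function works on a value the caller has read from IA32_ENERGY_PERF_BIAS (1B0H) and returns the value to write back, and clamping out-of-range hints to 15 is a design choice made for this illustration, not behavior described by the text.

```c
#include <stdint.h>

#define ENERGY_PERF_BIAS_MASK 0xFull  /* bits 3:0: energy policy preference hint */

/* Returns the MSR value with the hint field replaced.
 * 0 = highest performance, 7 = roughly balanced, 15 = maximum energy savings.
 * Hints above 15 are clamped to 15 (an assumption of this sketch). */
static uint64_t energy_perf_bias_set(uint64_t msr_value, unsigned hint)
{
    if (hint > 15)
        hint = 15;
    return (msr_value & ~ENERGY_PERF_BIAS_MASK) | hint;
}
```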

14.4 HARDWARE-CONTROLLED PERFORMANCE STATES (HWP)

Intel processors may contain support for Hardware-Controlled Performance States (HWP), which autonomously
selects performance states while utilizing OS supplied performance guidance hints. The Enhanced Intel SpeedStep® Technology provides a means for the OS to control and monitor discrete frequency-based operating points
via the IA32_PERF_CTL and IA32_PERF_STATUS MSRs.

In contrast, HWP is an implementation of the ACPI-defined Collaborative Processor Performance Control (CPPC),
which specifies that the platform enumerates a continuous, abstract unit-less, performance value scale that is not
tied to a specific performance state / frequency by definition. While the enumerated scale is roughly linear in terms
of a delivered integer workload performance result, the OS is required to characterize the performance value range
to comprehend the delivered performance for an applied workload.
When HWP is enabled, the processor autonomously selects performance states as deemed appropriate for the
applied workload and with consideration of constraining hints that are programmed by the OS. These OS-provided
hints include minimum and maximum performance limits, preference towards energy efficiency or performance,
and the specification of a relevant workload history observation time window. The means for the OS to override
HWP's autonomous selection of performance state with a specific desired performance target is also provided;
however, the effective frequency delivered is subject to the result of energy efficiency and performance optimizations.

14.4.1 HWP Programming Interfaces

The programming interfaces provided by HWP include the following:

• The CPUID instruction allows software to discover the presence of HWP support in an Intel processor. Specifically, executing the CPUID instruction with EAX=06H as input returns 5 bit flags covering the following aspects in
bits 7 through 11 of CPUID.06H:EAX:
— Availability of HWP baseline resource and capability, CPUID.06H:EAX[bit 7]: If this bit is set, HWP provides
several new architectural MSRs: IA32_PM_ENABLE, IA32_HWP_CAPABILITIES, IA32_HWP_REQUEST,
IA32_HWP_STATUS.
— Availability of HWP Notification upon dynamic Guaranteed Performance change, CPUID.06H:EAX[bit 8]: If
this bit is set, HWP provides IA32_HWP_INTERRUPT MSR to enable interrupt generation due to dynamic
Performance changes and excursions.
— Availability of HWP Activity window control, CPUID.06H:EAX[bit 9]: If this bit is set, HWP allows software to
program activity window in the IA32_HWP_REQUEST MSR.
— Availability of HWP energy/performance preference control, CPUID.06H:EAX[bit 10]: If this bit is set, HWP
allows software to set an energy/performance preference hint in the IA32_HWP_REQUEST MSR.
— Availability of HWP package level control, CPUID.06H:EAX[bit 11]:If this bit is set, HWP provides the
IA32_HWP_REQUEST_PKG MSR to convey OS Power Management’s control hints for all logical processors
in the physical package.

Table 14-1. Architectural and Non-Architectural MSRs Related to HWP

Address  Register Name                  Description
-------  -----------------------------  -----------
770H     IA32_PM_ENABLE                 Enable/Disable HWP.
771H     IA32_HWP_CAPABILITIES          Enumerates the HWP performance range (static and dynamic).
772H     IA32_HWP_REQUEST_PKG           Conveys OSPM's control hints (Min, Max, Activity Window, Energy Performance Preference, Desired) for all logical processors in the physical package.
773H     IA32_HWP_INTERRUPT             Controls HWP native interrupt generation (Guaranteed Performance changes, excursions).
774H     IA32_HWP_REQUEST               Conveys OSPM's control hints (Min, Max, Activity Window, Energy Performance Preference, Desired) for a single logical processor.
775H     IA32_HWP_PECI_REQUEST_INFO     Conveys embedded system controller requests to override some of the OS HWP Request settings via the PECI mechanism.
777H     IA32_HWP_STATUS                Status bits indicating changes to Guaranteed Performance and excursions to Minimum Performance.
19CH     IA32_THERM_STATUS[bits 15:12]  Conveys reasons for performance excursions.
64EH     MSR_PPERF                      Productive Performance Count.

• Additionally, HWP may provide a non-architectural MSR, MSR_PPERF, which provides a quantitative metric to
software of hardware’s view of workload scalability. This hardware’s view of workload scalability is implementation specific.
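
The five CPUID.06H:EAX feature flags listed in this section can be decoded as in the following C sketch. The struct and function names are hypothetical, and the function takes the EAX value that the caller obtained by executing CPUID with EAX=06H; how CPUID is invoked is compiler-specific and not shown.

```c
#include <stdint.h>
#include <stdbool.h>

/* HWP capability flags from CPUID.06H:EAX, bits 7 through 11. */
struct hwp_features {
    bool base;             /* bit 7:  HWP baseline MSRs available */
    bool notification;     /* bit 8:  IA32_HWP_INTERRUPT available */
    bool activity_window;  /* bit 9:  Activity_Window control */
    bool epp;              /* bit 10: Energy_Performance_Preference control */
    bool package_control;  /* bit 11: IA32_HWP_REQUEST_PKG available */
};

static struct hwp_features hwp_decode_cpuid_06h_eax(uint32_t eax)
{
    struct hwp_features f;
    f.base            = ((eax >> 7)  & 1u) != 0;
    f.notification    = ((eax >> 8)  & 1u) != 0;
    f.activity_window = ((eax >> 9)  & 1u) != 0;
    f.epp             = ((eax >> 10) & 1u) != 0;
    f.package_control = ((eax >> 11) & 1u) != 0;
    return f;
}
```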

14.4.2 Enabling HWP

The layout of the IA32_PM_ENABLE MSR is shown in Figure 14-4. The bit fields are described below:

Bits 63:1  Reserved
Bit 0      HWP_ENABLE

Figure 14-4. IA32_PM_ENABLE MSR

• HWP_ENABLE (bit 0, R/W1Once) — Software sets this bit to enable HWP with autonomous selection of
processor P-States. When set, the processor will disregard input from the legacy performance control interface
(IA32_PERF_CTL). Note this bit can only be enabled once from the default value. Once set, writes to the
HWP_ENABLE bit are ignored. Only RESET will clear this bit. Default = zero (0).
• Bits 63:1 are reserved and must be zero.

After software queries CPUID and verifies the processor’s support of HWP, system software can write 1 to
IA32_PM_ENABLE.HWP_ENABLE (bit 0) to enable hardware controlled performance states. The default value of
IA32_PM_ENABLE MSR at power-on is 0, i.e. HWP is disabled.
Additional MSRs associated with HWP may only be accessed after HWP is enabled, with the exception of
IA32_HWP_INTERRUPT and MSR_PPERF. Accessing the IA32_HWP_INTERRUPT MSR requires only that HWP be present
as enumerated by CPUID; it does not require enabling HWP.
IA32_PM_ENABLE is a package level MSR, i.e., writing to it from any logical processor within a package affects all
logical processors within that package.
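
The enable sequence described in this section can be sketched in C as follows. This is a minimal sketch: the names are hypothetical, and the function only decides which value, if any, should be written to IA32_PM_ENABLE (770H); the CPUID and MSR accesses themselves are left to the caller.

```c
#include <stdint.h>
#include <stdbool.h>

#define CPUID_06H_EAX_HWP    (1u << 7)  /* CPUID.06H:EAX[7]: HWP baseline support */
#define PM_ENABLE_HWP_ENABLE 1ull       /* IA32_PM_ENABLE bit 0 */

/* Returns true and stores the value to write when a WRMSR to
 * IA32_PM_ENABLE is needed; returns false otherwise. */
static bool hwp_enable_value(uint32_t cpuid_06h_eax,
                             uint64_t pm_enable,
                             uint64_t *to_write)
{
    if (!(cpuid_06h_eax & CPUID_06H_EAX_HWP))
        return false;  /* HWP is not enumerated by CPUID */
    if (pm_enable & PM_ENABLE_HWP_ENABLE)
        return false;  /* already enabled; later writes to the bit are ignored */

    *to_write = PM_ENABLE_HWP_ENABLE;  /* bits 63:1 are reserved and stay zero */
    return true;
}
```

Since IA32_PM_ENABLE has package scope, one write from any logical processor in the package is sufficient.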

14.4.3 HWP Performance Range and Dynamic Capabilities

The OS reads the IA32_HWP_CAPABILITIES MSR to comprehend the limits of the HWP-managed performance
range as well as the dynamic capability, which may change during processor operation. The enumerated performance range values reported by IA32_HWP_CAPABILITIES directly map to initial frequency targets (prior to workload-specific frequency optimizations of HWP). However, the mapping is processor family specific.
The layout of the IA32_HWP_CAPABILITIES MSR is shown in Figure 14-5. The bit fields are described below:

Bits 63:32  Reserved
Bits 31:24  Lowest_Performance
Bits 23:16  Most_Efficient_Performance
Bits 15:8   Guaranteed_Performance
Bits 7:0    Highest_Performance

Figure 14-5. IA32_HWP_CAPABILITIES Register

• Highest_Performance (bits 7:0, RO) — Value for the maximum non-guaranteed performance level.
• Guaranteed_Performance (bits 15:8, RO) — Current value for the guaranteed performance level. This
value can change dynamically as a result of internal or external constraints, e.g. thermal or power limits.
• Most_Efficient_Performance (bits 23:16, RO) — Current value of the most efficient performance level.
This value can change dynamically as a result of workload characteristics.
• Lowest_Performance (bits 31:24, RO) — Value for the lowest performance level that software can program
to IA32_HWP_REQUEST.
• Bits 63:32 are reserved and must be zero.

The value returned in the Guaranteed_Performance field is hardware's best-effort approximation of the available performance given current operating constraints. Changes to the Guaranteed_Performance value will
primarily occur due to a shift in operational mode. This includes a power or other limit applied by an external agent,
e.g. RAPL (see Section 14.9.1), or the setting of a Configurable TDP level (see model-specific controls related to
Programmable TDP Limit in Chapter 2, “Model-Specific Registers (MSRs)” in the Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 4). Notification of a change to the Guaranteed_Performance occurs via
interrupt (if configured) and the IA32_HWP_STATUS MSR. Changes to Guaranteed_Performance are indicated when
a macroscopically meaningful change in performance occurs, i.e. one sustained for greater than one second. Consequently, notification of a change in Guaranteed_Performance will typically occur no more frequently than once per
second. Rapid changes in platform configuration, e.g. docking / undocking, with corresponding changes to a
Configurable TDP level could potentially cause more frequent notifications.
The value returned by the Most_Efficient_Performance field provides the OS with an indication of the practical
lower limit for the IA32_HWP_REQUEST. The processor may not honor IA32_HWP_REQUEST.Maximum_Performance settings below this value.
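
The four fields of IA32_HWP_CAPABILITIES can be extracted as in the following C sketch. The struct and function names are hypothetical, and the function operates on a value the caller has already read from the MSR (771H). Because Guaranteed_Performance and Most_Efficient_Performance can change dynamically, a caller would typically re-read the MSR rather than cache the decoded result indefinitely.

```c
#include <stdint.h>

/* Decoded IA32_HWP_CAPABILITIES fields (abstract performance levels). */
struct hwp_caps {
    uint8_t highest;         /* bits 7:0   */
    uint8_t guaranteed;      /* bits 15:8  */
    uint8_t most_efficient;  /* bits 23:16 */
    uint8_t lowest;          /* bits 31:24 */
};

static struct hwp_caps hwp_caps_decode(uint64_t msr_value)
{
    struct hwp_caps c;
    c.highest        = (uint8_t)(msr_value & 0xFF);
    c.guaranteed     = (uint8_t)((msr_value >> 8) & 0xFF);
    c.most_efficient = (uint8_t)((msr_value >> 16) & 0xFF);
    c.lowest         = (uint8_t)((msr_value >> 24) & 0xFF);
    return c;
}
```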

14.4.4 Managing HWP

14.4.4.1 IA32_HWP_REQUEST MSR (Address: 774H, Logical Processor Scope)

Typically, the operating system controls HWP operation for each logical processor via the writing of control hints /
constraints to the IA32_HWP_REQUEST MSR. The layout of the IA32_HWP_REQUEST MSR is shown in Figure 14-6.
The bit fields are described below Figure 14-6.
Operating systems can control HWP by writing both IA32_HWP_REQUEST and IA32_HWP_REQUEST_PKG MSRs
(see Section 14.4.4.2). Five valid bits within the IA32_HWP_REQUEST MSR let the operating system flexibly select
which of its five hint / constraint fields should be derived by the processor from the IA32_HWP_REQUEST MSR and
which should be derived from the IA32_HWP_REQUEST_PKG MSR. These five valid bits are supported if
CPUID.06H:EAX[17] is set.
When the IA32_HWP_REQUEST MSR Package Control bit is set, any valid bit that is NOT set indicates to the
processor to use the respective field value from the IA32_HWP_REQUEST_PKG MSR. Otherwise, the values are
derived from the IA32_HWP_REQUEST MSR. The valid bits are ignored when the IA32_HWP_REQUEST MSR
Package Control bit is zero.
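
The selection rule above can be sketched in C as follows, using the bit positions given in Figure 14-6 and the field descriptions that follow it. This is a minimal sketch: the names are hypothetical, only four of the five fields are covered to keep it short, and it assumes that IA32_HWP_REQUEST_PKG places these fields at the same bit positions as IA32_HWP_REQUEST, which this section does not state.

```c
#include <stdint.h>
#include <stdbool.h>

#define HWP_REQ_PACKAGE_CONTROL (1ull << 42)

enum hwp_field { HWP_MIN, HWP_MAX, HWP_DESIRED, HWP_EPP };

/* Returns the 8-bit field value the processor would use for one logical
 * processor, given its IA32_HWP_REQUEST and the package-level request. */
static uint8_t hwp_effective_field(uint64_t req, uint64_t req_pkg,
                                   enum hwp_field f)
{
    static const unsigned shift[]     = { 0, 8, 16, 24 };    /* field position */
    static const unsigned valid_bit[] = { 63, 62, 61, 60 };  /* matching valid bit */

    bool pkg_ctl = (req & HWP_REQ_PACKAGE_CONTROL) != 0;
    bool valid   = ((req >> valid_bit[f]) & 1u) != 0;

    /* The package value is used only when Package_Control is set and the
     * field's valid bit is clear; valid bits are ignored otherwise. */
    uint64_t src = (pkg_ctl && !valid) ? req_pkg : req;
    return (uint8_t)((src >> shift[f]) & 0xFF);
}
```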

Bit 63      Minimum Valid
Bit 62      Maximum Valid
Bit 61      Desired Valid
Bit 60      EPP Valid
Bit 59      Activity_Window Valid
Bits 58:43  Reserved
Bit 42      Package_Control
Bits 41:32  Activity_Window
Bits 31:24  Energy_Performance_Preference
Bits 23:16  Desired_Performance
Bits 15:8   Maximum_Performance
Bits 7:0    Minimum_Performance

Figure 14-6. IA32_HWP_REQUEST Register

• Minimum_Performance (bits 7:0, RW) — Conveys a hint to the HWP hardware. The OS programs the
minimum performance hint to achieve the required quality of service (QOS) or to meet a service level
agreement (SLA) as needed. Note that an excursion below the level specified is possible due to hardware
constraints. The default value of this field is IA32_HWP_CAPABILITIES.Lowest_Performance.
• Maximum_Performance (bits 15:8, RW) — Conveys a hint to the HWP hardware. The OS programs this
field to limit the maximum performance that is expected to be supplied by the HWP hardware. Excursions
above the limit requested by OS are possible due to hardware coordination between the processor cores and
other components in the package. The default value of this field is
IA32_HWP_CAPABILITIES.Highest_Performance.
• Desired_Performance (bits 23:16, RW) — Conveys a hint to the HWP hardware. When set to zero,
hardware autonomous selection determines the performance target. When set to a non-zero value (in the range
between Lowest_Performance and Highest_Performance of IA32_HWP_CAPABILITIES), it conveys an explicit
performance request hint to the hardware, effectively disabling HW Autonomous selection. The
Desired_Performance input is non-constraining in terms of Performance and Energy Efficiency optimizations,
which are independently controlled. The default value of this field is 0.
• Energy_Performance_Preference (bits 31:24, RW) — Conveys a hint to the HWP hardware. The OS may
write a range of values from 0 (performance preference) to 0FFH (energy efficiency preference) to influence
the rate of performance increase / decrease and the result of the hardware's energy efficiency and performance
optimizations. The default value of this field is 80H. Note: If CPUID.06H:EAX[bit 10] indicates that this field is
not supported, HWP uses the value of the IA32_ENERGY_PERF_BIAS MSR to determine the energy efficiency /
performance preference.
• Activity_Window (bits 41:32, RW) — Conveys a hint to the HWP hardware specifying a moving workload
history observation window for performance/frequency optimizations. If 0, the hardware will determine the
appropriate window size. A non-zero value is encoded with bits 38:32 as a 7-bit mantissa and bits 41:39 as a
3-bit exponent in powers of 10. The resultant value is in
microseconds. Thus, the minimum/maximum activity window size is 1 microsecond/1270 seconds. Combined
with the Energy_Performance_Preference input, Activity_Window influences the rate of performance increase
/ decrease. This non-zero hint only has meaning when Desired_Performance = 0. The default value of this field
is 0.

•

Package_Control (bit 42, RW) — When set, causes this logical processor's IA32_HWP_REQUEST control
inputs to be derived from the IA32_HWP_REQUEST_PKG MSR.

•

Bits 58:43 are reserved and must be zero.

•

Activity_Window Valid (bit 59, RW) — When set, indicates to the processor to derive the Activity Window
field value from the IA32_HWP_REQUEST MSR even if the package control bit is set. Otherwise, derive it from
the IA32_HWP_REQUEST_PKG MSR. The default value of this field is 0.

•

EPP Valid (bit 60, RW) — When set, indicates to the processor to derive the EPP field value from the
IA32_HWP_REQUEST MSR even if the package control bit is set. Otherwise, derive it from the
IA32_HWP_REQUEST_PKG MSR. The default value of this field is 0.

•

Desired Valid (bit 61, RW) — When set, indicates to the processor to derive the Desired Performance field
value from the IA32_HWP_REQUEST MSR even if the package control bit is set. Otherwise, derive it from the
IA32_HWP_REQUEST_PKG MSR. The default value of this field is 0.

•

Maximum Valid (bit 62, RW) — When set, indicates to the processor to derive the Maximum Performance
field value from the IA32_HWP_REQUEST MSR even if the package control bit is set. Otherwise, derive it from
the IA32_HWP_REQUEST_PKG MSR. The default value of this field is 0.

•

Minimum Valid (bit 63, RW) — When set, indicates to the processor to derive the Minimum Performance field
value from the IA32_HWP_REQUEST MSR even if the package control bit is set. Otherwise, derive it from the
IA32_HWP_REQUEST_PKG MSR. The default value of this field is 0.
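
The Activity_Window encoding described in the field list above (7-bit mantissa, 3-bit power-of-10 exponent, result in microseconds) can be illustrated with a minimal sketch. The function names are hypothetical, and the choice to truncate to the smallest exponent that fits is an assumption of this example, not a documented requirement.

```python
def decode_activity_window(field: int) -> int:
    """Return the window length in microseconds for a 10-bit Activity_Window field."""
    if not 0 <= field <= 0x3FF:
        raise ValueError("Activity_Window is a 10-bit field")
    mantissa = field & 0x7F         # MSR bits 38:32 -> field bits 6:0
    exponent = (field >> 7) & 0x7   # MSR bits 41:39 -> field bits 9:7
    return mantissa * 10 ** exponent


def encode_activity_window(microseconds: int) -> int:
    """Encode a window length, truncating to what the mantissa can represent.

    Assumption of this sketch: pick the smallest exponent whose mantissa
    fits in 7 bits. A value of 0 leaves window selection to the hardware.
    """
    if microseconds < 0:
        raise ValueError("window length must be non-negative")
    if microseconds == 0:
        return 0
    for exponent in range(8):
        mantissa = microseconds // 10 ** exponent
        if mantissa <= 0x7F:
            return (exponent << 7) | mantissa
    raise ValueError("window exceeds the 1270 second maximum")
```

For instance, a 10 ms window (10,000 microseconds) is encoded with mantissa 100 and exponent 2, and the largest field value, 3FFH, decodes to 1270 seconds.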

The HWP hardware clips and resolves the field values as necessary to the valid range. Reads return the last value
written, not the clipped values.
Processors may support a subset of IA32_HWP_REQUEST fields as indicated by CPUID. Reads of non-supported
fields will return 0. Writes to non-supported fields are ignored.
The OS may override HWP's autonomous selection of performance state with a specific performance target by
setting the Desired_Performance field to a non-zero value; however, the effective frequency delivered is subject to
the result of energy efficiency and performance optimizations, which are influenced by the Energy Performance
Preference field.
Software may disable all hardware optimizations by setting Minimum_Performance = Maximum_Performance
(subject to package coordination).
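
As an illustration of how the IA32_HWP_REQUEST fields combine into one 64-bit value, the following is a minimal sketch. The table and function names are hypothetical. The positions of Minimum_Performance, Maximum_Performance and Desired_Performance (bits 7:0, 15:8 and 23:16) are assumed to match the layouts given for IA32_HWP_REQUEST_PKG and IA32_HWP_PECI_REQUEST_INFO; the remaining positions follow the field list above.

```python
# name: (lowest bit, width in bits)
HWP_REQUEST_FIELDS = {
    "minimum_performance": (0, 8),      # assumed position, see lead-in
    "maximum_performance": (8, 8),      # assumed position, see lead-in
    "desired_performance": (16, 8),     # assumed position, see lead-in
    "energy_performance_preference": (24, 8),
    "activity_window": (32, 10),
    "package_control": (42, 1),
    "activity_window_valid": (59, 1),
    "epp_valid": (60, 1),
    "desired_valid": (61, 1),
    "maximum_valid": (62, 1),
    "minimum_valid": (63, 1),
}


def pack_hwp_request(**fields: int) -> int:
    """Build a 64-bit value; unspecified fields and reserved bits stay 0."""
    value = 0
    for name, field_value in fields.items():
        low, width = HWP_REQUEST_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value |= field_value << low
    return value


def unpack_hwp_request(value: int) -> dict:
    """Split a 64-bit value into its named fields."""
    return {
        name: (value >> low) & ((1 << width) - 1)
        for name, (low, width) in HWP_REQUEST_FIELDS.items()
    }
```

Setting minimum_performance and maximum_performance to the same value in this sketch corresponds to the case described above in which software disables the hardware optimizations.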
Note: The processor may run below the Minimum_Performance level due to hardware constraints, including power,
thermal, and package coordination constraints. The processor may also run below the Minimum_Performance level
for short durations (a few milliseconds) following C-state exit, and when Hardware Duty Cycling (see Section 14.5) is
enabled.
When the IA32_HWP_REQUEST MSR is set to fast access mode, writes of this MSR are posted, i.e., the WRMSR
instruction retires before the data reaches its destination within the processor. It may retire even before all
preceding IA stores are globally visible, i.e., it is not an architecturally serializing instruction anymore (no store
fence). A new CPUID bit indicates this new characteristic of the IA32_HWP_REQUEST MSR (see Section 14.4.8 for
additional details).

14.4.4.2 IA32_HWP_REQUEST_PKG MSR (Address: 0x772 Package Scope)

[Register layout: bits 63:42 Reserved; 41:32 Activity_Window; 31:24 Energy_Performance_Preference; 23:16 Desired_Performance; 15:8 Maximum_Performance; 7:0 Minimum_Performance]
Figure 14-7. IA32_HWP_REQUEST_PKG Register
The structure of the IA32_HWP_REQUEST_PKG MSR (package-level) is identical to the IA32_HWP_REQUEST MSR
with the exception of the Package Control bit field and the five valid bit fields, which do not exist in the
IA32_HWP_REQUEST_PKG MSR. Field values written to this MSR apply to all logical processors within the physical
package with the exception of logical processors whose IA32_HWP_REQUEST.Package Control field is clear (zero).
Single P-state Control mode is only supported when IA32_HWP_REQUEST_PKG is not supported.

14.4.4.3 IA32_HWP_PECI_REQUEST_INFO MSR (Address 0x775 Package Scope)

When an embedded system controller is integrated in the platform, it can override some of the OS HWP Request
settings via the PECI mechanism. PECI initiated settings take precedence over the relevant fields in the
IA32_HWP_REQUEST MSR and in the IA32_HWP_REQUEST_PKG MSR, irrespective of the Package Control bit or
the Valid Bit values described above. PECI can independently control each of: Minimum Performance, Maximum
Performance and EPP fields. This MSR contains both the PECI induced values and the control bits that indicate
whether the embedded controller actually set the processor to use the respective value.
PECI override is supported if CPUID[6].EAX[16] is set.

[Register layout: bit 63 Min PECI Override; bit 62 Max PECI Override; bit 61 Reserved; bit 60 EPP PECI Override; bits 59:32 Reserved; 31:24 Energy_Performance_Preference; 23:16 Reserved; 15:8 Maximum_Performance; 7:0 Minimum_Performance]
Figure 14-8. IA32_HWP_PECI_REQUEST_INFO MSR

The layout of the IA32_HWP_PECI_REQUEST_INFO MSR is shown in Figure 14-8. This MSR is writable by the
embedded controller but is read-only by software executing on the CPU. This MSR has Package scope. The bit fields
are described below:

•

Minimum_Performance (bits 7:0, RO) — Used by the OS to read the latest value of PECI minimum
performance input.

•

Maximum_Performance (bits 15:8, RO) — Used by the OS to read the latest value of PECI maximum
performance input.

•

Bits 23:16 are reserved and must be zero.

•

Energy_Performance_Preference (bits 31:24, RO) — Used by the OS to read the latest value of PECI
energy performance preference input.

•

Bits 59:32 are reserved and must be zero.

•

EPP_PECI_Override (bit 60, RO) — Indicates whether PECI is currently overriding the Energy Performance
Preference input. If set (1), PECI is overriding the Energy Performance Preference input. If clear (0), the OS has
control over the Energy Performance Preference input.

•

Bit 61 is reserved and must be zero.

•

Max_PECI_Override (bit 62, RO) — Indicates whether PECI is currently overriding the Maximum
Performance input. If set (1), PECI is overriding the Maximum Performance input. If clear (0), the OS has control
over the Maximum Performance input.

•

Min_PECI_Override (bit 63, RO) — Indicates whether PECI is currently overriding the Minimum Performance
input. If set (1), PECI is overriding the Minimum Performance input. If clear (0), the OS has control over the
Minimum Performance input.

HWP Request Field Hierarchical Resolution
HWP Request field resolution is fed by three MSRs: IA32_HWP_REQUEST, IA32_HWP_REQUEST_PKG and
IA32_HWP_PECI_REQUEST_INFO. The flow that the processor goes through to resolve which field value is chosen
is shown below.
For each of the two HWP Request fields, Desired and Activity Window:
    If IA32_HWP_REQUEST.PACKAGE_CONTROL = 1 and IA32_HWP_REQUEST.<field> valid bit = 0
        Resolved Field Value = IA32_HWP_REQUEST_PKG.<field>
    Else
        Resolved Field Value = IA32_HWP_REQUEST.<field>
For each of the three HWP Request fields, Min, Max and EPP:
    If IA32_HWP_PECI_REQUEST_INFO.<field> PECI Override bit = 1
        Resolved Field Value = IA32_HWP_PECI_REQUEST_INFO.<field>
    Else if IA32_HWP_REQUEST.PACKAGE_CONTROL = 1 and IA32_HWP_REQUEST.<field> valid bit = 0
        Resolved Field Value = IA32_HWP_REQUEST_PKG.<field>
    Else
        Resolved Field Value = IA32_HWP_REQUEST.<field>
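
The resolution flow above can be modelled in software as a small function. This is only a sketch of the precedence order: the dictionary shapes and the short field names ("minimum", "maximum", "epp", "desired", "activity_window") are hypothetical conventions of this example, not part of the MSR definitions.

```python
PECI_FIELDS = ("minimum", "maximum", "epp")
ALL_FIELDS = PECI_FIELDS + ("desired", "activity_window")


def resolve_field(name, request, request_pkg, peci):
    """Return the value the processor would use for one HWP Request field.

    request:     {"package_control": bool, "valid": {field: bool}, "value": {field: int}}
    request_pkg: {field: int}
    peci:        {"override": {field: bool}, "value": {field: int}}
    """
    if name not in ALL_FIELDS:
        raise ValueError(f"unknown HWP Request field: {name}")
    # PECI takes precedence, but only for Min, Max and EPP.
    if name in PECI_FIELDS and peci["override"][name]:
        return peci["value"][name]
    # Package value applies when Package_Control is set and the
    # per-field valid bit is clear.
    if request["package_control"] and not request["valid"][name]:
        return request_pkg[name]
    return request["value"][name]


example_request = {
    "package_control": True,
    "valid": {"minimum": False, "maximum": False, "epp": True,
              "desired": False, "activity_window": False},
    "value": {"minimum": 8, "maximum": 40, "epp": 0x40,
              "desired": 0, "activity_window": 0},
}
example_pkg = {"minimum": 10, "maximum": 30, "epp": 0x80,
               "desired": 0, "activity_window": 0}
example_peci = {
    "override": {"minimum": False, "maximum": True, "epp": False},
    "value": {"minimum": 0, "maximum": 20, "epp": 0},
}
```

With the example inputs, Maximum resolves to the PECI value, EPP resolves to the logical processor value because its valid bit is set, and Minimum resolves to the package value.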

14.4.5 HWP Feedback
The processor provides several types of feedback to the OS during HWP operation.
The IA32_MPERF MSR and IA32_APERF MSR mechanism (see Section 14.2) allows the OS to calculate the resultant
effective frequency delivered over a time period. Energy efficiency and performance optimizations directly impact
the resultant effective frequency delivered.
The layout of the IA32_HWP_STATUS MSR is shown in Figure 14-9. It provides feedback regarding changes to
IA32_HWP_CAPABILITIES.Guaranteed_Performance, IA32_HWP_CAPABILITIES.Highest_Performance, excursions
to IA32_HWP_CAPABILITIES.Minimum_Performance, and PECI_Override entry/exit events. The bit fields are
described below:

•

Guaranteed_Performance_Change (bit 0, RWC0) — If set (1), a change to Guaranteed_Performance has
occurred. Software should query IA32_HWP_CAPABILITIES.Guaranteed_Performance value to ascertain the
new Guaranteed Performance value and to assess whether to re-adjust HWP hints via IA32_HWP_REQUEST.
Software must clear this bit by writing a zero (0).

•

Bit 1 is reserved and must be zero.

•

Excursion_To_Minimum (bit 2, RWC0) — If set (1), an excursion to Minimum_Performance of
IA32_HWP_REQUEST has occurred. Software must clear this bit by writing a zero (0).

•

Highest_Change (bit 3, RWC0) — If set (1), a change to Highest Performance has occurred. Software
should query IA32_HWP_CAPABILITIES to ascertain the new Highest Performance value. Software must clear
this bit by writing a zero (0). Interrupts upon Highest Performance change are supported if CPUID[6].EAX[15]
is set.

•

PECI_Override_Entry (bit 4, RWC0) — If set (1), an embedded/management controller has started a PECI
override of one or more OS control hints (Min, Max, EPP) specified in IA32_HWP_REQUEST or
IA32_HWP_REQUEST_PKG. Software may query IA32_HWP_PECI_REQUEST_INFO MSR to ascertain which
fields are now overridden via the PECI mechanism and what their values are (see Section 14.4.4.3 for
additional details). Software must clear this bit by writing a zero (0). Interrupts upon PECI override entry are
supported if CPUID[6].EAX[16] is set.

•

PECI_Override_Exit (bit 5, RWC0) — If set (1), an embedded/management controller has stopped
overriding one or more OS control hints (Min, Max, EPP) specified in IA32_HWP_REQUEST or
IA32_HWP_REQUEST_PKG. Software may query IA32_HWP_PECI_REQUEST_INFO MSR to ascertain which
fields are still overridden via the PECI mechanism and which fields are now back under software control (see
Section 14.4.4.3 for additional details). Software must clear this bit by writing a zero (0). Interrupts upon PECI
override exit are supported if CPUID[6].EAX[16] is set.

•

Bits 63:6 are reserved and must be zero.

[Register layout: bits 63:6 Reserved; bit 5 PECI_Override_Exit; bit 4 PECI_Override_Entry; bit 3 Highest_Change; bit 2 Excursion_To_Minimum; bit 1 Reserved; bit 0 Guaranteed_Performance_Change]
Figure 14-9. IA32_HWP_STATUS MSR
The status bits of IA32_HWP_STATUS must be cleared (0) by software so that a new status condition change will
cause the hardware to set the bit again and issue the notification. Status bits are not set for “normal” excursions,
e.g., running below Minimum Performance for short durations during C-state exit. Changes to
Guaranteed_Performance, Highest_Performance, excursions to Minimum_Performance, or PECI_Override
entry/exit will occur no more than once per second.
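
A handler that services these status bits might decode the register and compute the value to write back, as in the following sketch. The helper names are hypothetical. The write-back value is built by a read-modify-write that zeroes only the bits being acknowledged and keeps reserved bits at zero, which is a conservative choice of this example.

```python
HWP_STATUS_BITS = {
    "guaranteed_performance_change": 0,
    "excursion_to_minimum": 2,
    "highest_change": 3,
    "peci_override_entry": 4,
    "peci_override_exit": 5,
}
DEFINED_MASK = sum(1 << bit for bit in HWP_STATUS_BITS.values())


def pending_events(status: int) -> list:
    """Return the names of the status bits that are set, lowest bit first."""
    return [name for name, bit in HWP_STATUS_BITS.items() if (status >> bit) & 1]


def value_to_clear(status: int, events) -> int:
    """Return the value to write so that only the named events are cleared."""
    clear_mask = 0
    for name in events:
        clear_mask |= 1 << HWP_STATUS_BITS[name]
    return status & DEFINED_MASK & ~clear_mask
```

For a status value of 5 (bits 0 and 2 set), acknowledging only the excursion event yields a write-back value of 1.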
The OS can determine the specific reasons for a Guaranteed_Performance change or an excursion to
Minimum_Performance in IA32_HWP_REQUEST by examining the associated status and log bits reported in the
IA32_THERM_STATUS MSR. The layout of the IA32_THERM_STATUS MSR that HWP uses to support software query
of HWP feedback is shown in Figure 14-10. The bit fields of IA32_THERM_STATUS associated with HWP feedback
are described below (bit fields of IA32_THERM_STATUS unrelated to HWP can be found in Section 14.7.5.2).

[Register layout, from the most significant field down: Reserved; Reading Valid; Resolution in Deg. Celsius; Digital Readout; Cross-domain Limit Log (bit 15); Cross-domain Limit Status (bit 14); Current Limit Log (bit 13); Current Limit Status (bit 12); Power Limit Notification Log; Power Limit Notification Status; Thermal Threshold #2 Log; Thermal Threshold #2 Status; Thermal Threshold #1 Log; Thermal Threshold #1 Status; Critical Temperature Log; Critical Temperature Status; PROCHOT# or FORCEPR# Log; PROCHOT# or FORCEPR# Event; Thermal Status Log; Thermal Status (bit 0)]
Figure 14-10. IA32_THERM_STATUS Register With HWP Feedback

•

Bits 11:0, See Section 14.7.5.2.

•

Current Limit Status (bit 12, RO) — If set (1), indicates an electrical current limit (e.g. Electrical Design
Point/IccMax) is being exceeded and is adversely impacting energy efficiency optimizations.

•

Current Limit Log (bit 13, RWC0) — If set (1), an electrical current limit has been exceeded that has
adversely impacted energy efficiency optimizations since the last clearing of this bit or a reset. This bit is sticky;
software may clear this bit by writing a zero (0).

•

Cross-domain Limit Status (bit 14, RO) — If set (1), indicates another hardware domain (e.g. processor
graphics) is currently limiting energy efficiency optimizations in the processor core domain.

•

Cross-domain Limit Log (bit 15, RWC0) — If set (1), indicates another hardware domain (e.g. processor
graphics) has limited energy efficiency optimizations in the processor core domain since the last clearing of this
bit or a reset. This bit is sticky; software may clear this bit by writing a zero (0).

•

Bits 63:16, See Section 14.7.5.2.

14.4.5.1 Non-Architectural HWP Feedback

The Productive Performance (MSR_PPERF) MSR (non-architectural) provides hardware's view of workload scalability, which is a rough assessment of the relationship between frequency and workload performance, to software.
The layout of the MSR_PPERF is shown in Figure 14-11.
[Register layout: bits 63:0 PCNT (Productive Performance Count)]
Figure 14-11. MSR_PPERF MSR

•

PCNT (bits 63:0, RO) — Similar to IA32_APERF but only counts cycles perceived by hardware as contributing
to instruction execution (e.g. unhalted and unstalled cycles). This counter increments at the same rate as
IA32_APERF, where the ratio of (ΔPCNT/ΔACNT) is an indicator of workload scalability (0% to 100%). Note that
values in this register are valid even when HWP is not enabled.
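
The ΔPCNT/ΔACNT ratio can be computed from two samples of each counter, as in this minimal sketch. The function names are hypothetical, and the samples are assumed to have been read from MSR_PPERF and IA32_APERF on the same logical processor at the start and end of the interval. Modulo-2^64 subtraction is used so that a single counter wrap between samples does not corrupt the delta.

```python
MASK64 = (1 << 64) - 1


def counter_delta(start: int, end: int) -> int:
    """Difference between two samples of a 64-bit free-running counter."""
    return (end - start) & MASK64


def workload_scalability(pcnt_start, pcnt_end, acnt_start, acnt_end):
    """Return delta(PCNT) / delta(ACNT), or None if ACNT did not advance."""
    delta_acnt = counter_delta(acnt_start, acnt_end)
    if delta_acnt == 0:
        return None
    return counter_delta(pcnt_start, pcnt_end) / delta_acnt
```

A result near 1.0 suggests that most unhalted cycles were productive, so the workload is likely to scale with frequency; a low result suggests stalls dominate.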

14.4.6 HWP Notifications

Processors may support interrupt-based notification of changes to HWP status as indicated by CPUID. If supported,
the IA32_HWP_INTERRUPT MSR is used to enable interrupt-based notifications. Notification events, when enabled,
are delivered using the existing thermal LVT entry. The layout of the IA32_HWP_INTERRUPT is shown in
Figure 14-12. The bit fields are described below:

[Register layout: bits 63:4 Reserved; bit 3 EN_PECI_OVERRIDE; bit 2 EN_Highest_Change; bit 1 EN_Excursion_Minimum; bit 0 EN_Guaranteed_Performance_Change]
Figure 14-12. IA32_HWP_INTERRUPT MSR

•

EN_Guaranteed_Performance_Change (bit 0, RW) — When set (1), an HWP Interrupt will be generated
whenever a change to the IA32_HWP_CAPABILITIES.Guaranteed_Performance occurs. The default value is 0
(Interrupt generation is disabled).

•

EN_Excursion_Minimum (bit 1, RW) — When set (1), an HWP Interrupt will be generated whenever the
HWP hardware is unable to meet the IA32_HWP_REQUEST.Minimum_Performance setting. The default value is
0 (Interrupt generation is disabled).

•

EN_Highest_Change (bit 2, RW) — When set (1), an HWP Interrupt will be generated whenever a change
to the IA32_HWP_CAPABILITIES.Highest_Performance occurs. The default value is 0 (interrupt generation is
disabled). Interrupts upon Highest Performance change are supported if CPUID[6].EAX[15] is set.

•

EN_PECI_OVERRIDE (bit 3, RW) — When set (1), an HWP Interrupt will be generated whenever PECI starts
or stops overriding any of the three HWP fields described in Section 14.4.4.3. The default value is 0 (interrupt
generation is disabled). See Section 14.4.5 and Section 14.4.4.3 for details on how the OS learns which HWP
fields are currently overridden by PECI. Interrupts upon PECI override change are supported if
CPUID[6].EAX[16] is set.

•

Bits 63:4 are reserved and must be zero.
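
The following sketch shows one way software might derive an IA32_HWP_INTERRUPT value from the CPUID enumeration. The function name and keyword arguments are hypothetical. The use of CPUID.06H:EAX[8] as the general HWP notification enumeration bit is taken from the CPUID leaf 06H definition rather than from this section, and dropping unsupported requests silently is a design choice of this example.

```python
def hwp_interrupt_enable_value(cpuid_06_eax: int, *, guaranteed=False,
                               excursion=False, highest=False, peci=False) -> int:
    """Return the value to write to IA32_HWP_INTERRUPT for the requested events."""
    def has(bit: int) -> bool:
        return bool((cpuid_06_eax >> bit) & 1)

    if not has(8):                 # HWP notifications not enumerated (see lead-in)
        return 0
    value = 0
    if guaranteed:
        value |= 1 << 0            # EN_Guaranteed_Performance_Change
    if excursion:
        value |= 1 << 1            # EN_Excursion_Minimum
    if highest and has(15):
        value |= 1 << 2            # EN_Highest_Change
    if peci and has(16):
        value |= 1 << 3            # EN_PECI_OVERRIDE
    return value
```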

14.4.7 Idle Logical Processor Impact on Core Frequency

Intel processors use one of two schemes for setting core frequency:
1. All cores share the same frequency.
2. Each physical core is set to a frequency of its own.
In both cases the two logical processors that share a single physical core are set to the same frequency, so the
processor accounts for the IA32_HWP_REQUEST MSR fields of both logical processors when defining the core
frequency or the whole package frequency.
When CPUID[6].EAX[20] is set and only one logical processor of the two is active, while the other is idle (in any
C1 sub-state or in a deeper sleep state), only the active logical processor's IA32_HWP_REQUEST MSR fields
are considered, i.e., the HWP Request fields of a logical processor in the C1E sub-state or in a deeper sleep state
are ignored.
Note: when a logical processor is in C1 state its HWP Request fields are accounted for.

14.4.8 Fast Write of Uncore MSR (Model Specific Feature)

There are a few logical processor scope MSRs whose values need to be observed outside the logical processor. The
WRMSR instruction takes over 1000 cycles to complete (retire) for those MSRs. This overhead forces operating
systems to avoid writing them too often whereas in many cases it is preferable that the OS writes them quite
frequently for optimal power/performance operation of the processor.
The model specific “Fast Write MSR” feature reduces this overhead by an order of magnitude to a level of 100 cycles
for a selected subset of MSRs.
Note: Writes to Fast Write MSRs are posted, i.e., when the WRMSR instruction completes, the data may still be “in
transit” within the processor. Software can check the status by querying the processor to ensure data is already
visible outside the logical processor (see Section 14.4.8.3 for additional details). Once the data is visible outside the
logical processor, software is assured that later writes by the same logical processor to the same MSR will become
visible later (they will not bypass the earlier writes).
MSRs that are selected for Fast Write are specified in a special capability MSR (see Section 14.4.8.1). Architectural
MSRs that existed prior to the introduction of this feature and are selected for Fast Write, thus turning from slow to
fast write MSRs, will be noted as such via a new CPUID bit. New MSRs that are fast upon introduction will be documented as such without an additional CPUID bit.
Three model specific MSRs are associated with the feature itself. They enable enumerating, controlling and monitoring it. All three are logical processor scope.

14.4.8.1 FAST_UNCORE_MSRS_CAPABILITY (Address: 0x65F, Logical Processor Scope)

Operating systems or BIOS can read the FAST_UNCORE_MSRS_CAPABILITY MSR to enumerate those MSRs that
are Fast Write MSRs.

[Register layout: bits 63:1 Reserved; bit 0 FAST_IA32_HWP_REQUEST MSR]
Figure 14-13. FAST_UNCORE_MSRS_CAPABILITY MSR

•

FAST_IA32_HWP_REQUEST MSR (bit 0, RO) — When set (1), indicates that the IA32_HWP_REQUEST MSR
is supported as a Fast Write MSR. A value of 0 indicates the IA32_HWP_REQUEST MSR is not supported as a
Fast Write MSR.

•

Bits 63:1 are reserved and must be zero.

14.4.8.2 FAST_UNCORE_MSRS_CTL (Address: 0x657, Logical Processor Scope)

Operating Systems or BIOS can use the FAST_UNCORE_MSRS_CTL MSR to opt-in or opt-out for fast write of
specific MSRs that are enabled for Fast Write by the processor.
Note: Not all MSRs that are selected for this feature will necessarily have this opt-in/opt-out option. They may be
supported in fast write mode only.

[Register layout: bits 63:1 Reserved; bit 0 FAST_IA32_HWP_REQUEST_MSR_ENABLE]
Figure 14-14. FAST_UNCORE_MSRS_CTL MSR

•

FAST_IA32_HWP_REQUEST_MSR_ENABLE (bit 0, RW) — When set (1), enables fast access mode for the
IA32_HWP_REQUEST MSR and sets the CPUID[6].EAX[18] bit, which enumerates the low-latency, posted IA32_HWP_REQUEST MSR.
The default value is 0. Note that this bit can only be enabled once from the default value. Once set, writes to
this bit are ignored. Only RESET will clear this bit.

•

Bits 63:1 are reserved and must be zero.

14.4.8.3 FAST_UNCORE_MSRS_STATUS (Address: 0x65E, Logical Processor Scope)

Software that executes the WRMSR instruction of a Fast Write MSR can check whether the data is already visible
outside the logical processor by reading the FAST_UNCORE_MSRS_STATUS MSR. For each Fast Write MSR there is
a status bit that indicates whether the data is already visible outside the logical processor or is still in “transit”.

[Register layout: bits 63:1 Reserved; bit 0 FAST_IA32_HWP_REQUEST_WRITE_STATUS]
Figure 14-15. FAST_UNCORE_MSRS_STATUS MSR

•

FAST_IA32_HWP_REQUEST_WRITE_STATUS (bit 0, RO) — Indicates whether the CPU is still in the
middle of writing IA32_HWP_REQUEST MSR, even after the WRMSR instruction has retired. A value of 1
indicates the last write of IA32_HWP_REQUEST is still ongoing. A value of 0 indicates the last write of
IA32_HWP_REQUEST is visible outside the logical processor.

•

Bits 63:1 are reserved and must be zero.
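
The enumerate, opt in, write and poll sequence described in Sections 14.4.8.1 through 14.4.8.3 can be sketched as follows. FakeMsrBank is a hypothetical in-memory stand-in for RDMSR/WRMSR on one logical processor, written only so the example can run anywhere; its timing behaviour (a write stays "in transit" for a fixed number of status reads) is invented for illustration. The IA32_HWP_REQUEST address 774H is taken from the MSR listing rather than from this section.

```python
IA32_HWP_REQUEST = 0x774
FAST_UNCORE_MSRS_CTL = 0x657
FAST_UNCORE_MSRS_STATUS = 0x65E
FAST_UNCORE_MSRS_CAPABILITY = 0x65F


class FakeMsrBank:
    """Hypothetical simulation of the MSRs involved; not a real MSR driver."""

    def __init__(self, fast_capable=True, in_flight_reads=2):
        self.regs = {FAST_UNCORE_MSRS_CAPABILITY: int(fast_capable),
                     FAST_UNCORE_MSRS_CTL: 0, IA32_HWP_REQUEST: 0}
        self.in_flight_reads = in_flight_reads
        self.pending = 0

    def read(self, addr):
        if addr == FAST_UNCORE_MSRS_STATUS:
            busy = self.pending > 0
            self.pending = max(0, self.pending - 1)
            return int(busy)
        return self.regs[addr]

    def write(self, addr, value):
        if addr == FAST_UNCORE_MSRS_CTL:
            self.regs[addr] |= value & 1      # once set, stays set until RESET
            return
        if addr == IA32_HWP_REQUEST and self.regs[FAST_UNCORE_MSRS_CTL] & 1:
            self.pending = self.in_flight_reads   # posted write still in transit
        self.regs[addr] = value


def enable_fast_hwp_request(msr) -> bool:
    """Opt in to fast writes if the capability MSR enumerates them."""
    if not msr.read(FAST_UNCORE_MSRS_CAPABILITY) & 1:
        return False
    msr.write(FAST_UNCORE_MSRS_CTL, msr.read(FAST_UNCORE_MSRS_CTL) | 1)
    return True


def write_hwp_request_and_wait(msr, value, max_polls=1000) -> int:
    """Write IA32_HWP_REQUEST, then poll until the write is visible.

    Returns the number of polls that observed the write still in transit.
    """
    msr.write(IA32_HWP_REQUEST, value)
    for polls in range(max_polls):
        if not msr.read(FAST_UNCORE_MSRS_STATUS) & 1:
            return polls
    raise TimeoutError("IA32_HWP_REQUEST write still in transit")
```

Polling is only needed when software must know that the new request is visible outside the logical processor; ordinary request updates can rely on the ordering guarantee for later writes to the same MSR.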

14.4.9 Fast_IA32_HWP_REQUEST CPUID

IA32_HWP_REQUEST is an architectural MSR that exists in processors whose CPUID[6].EAX[7] is set (HWP BASE
is enabled). This MSR has logical processor scope, but after its contents are written the contents become visible
outside the logical processor. When the FAST_IA32_HWP_REQUEST CPUID[6].EAX[18] bit is set, writes to the
IA32_HWP_REQUEST MSR are visible outside the logical processor via the “Fast Write” feature described in Section
14.4.8.

14.4.10 Recommendations for OS use of HWP Controls
Common Cases of Using HWP
The default HWP control field values are expected to be suitable for many applications. The OS can enable autonomous HWP for these common cases by:

•

Setting IA32_HWP_REQUEST.Desired_Performance = 0 (hardware autonomous selection determines the
performance target) and setting IA32_HWP_REQUEST.Activity_Window = 0 (enable HW dynamic selection of window
size).

To maximize HWP benefit for the common cases, the OS should set:

•

IA32_HWP_REQUEST.Minimum_Performance = IA32_HWP_CAPABILITIES.Lowest_Performance and

•

IA32_HWP_REQUEST.Maximum_Performance = IA32_HWP_CAPABILITIES.Highest_Performance.

Setting IA32_HWP_REQUEST.Minimum_Performance = IA32_HWP_REQUEST.Maximum_Performance is functionally equivalent to using the IA32_PERF_CTL interface and is therefore not recommended (bypassing HWP).

Calibrating HWP for Application-Specific HWP Optimization
In some applications, the OS may have Quality of Service requirements that may not be met by the default values.
The OS can characterize HWP by:

•

keeping IA32_HWP_REQUEST.Minimum_Performance = IA32_HWP_REQUEST.Maximum_Performance to
prevent non-linearity in the characterization process,

•

utilizing the range values enumerated from the IA32_HWP_CAPABILITIES MSR to program
IA32_HWP_REQUEST while executing workloads of interest and observing the power and performance result.

The power and performance result of characterization is also influenced by the IA32_HWP_REQUEST.Energy
Performance Preference field, which must also be characterized.
Characterization can be used to set IA32_HWP_REQUEST.Minimum_Performance to achieve the required QOS in
terms of performance. If IA32_HWP_REQUEST.Minimum_Performance is set higher than
IA32_HWP_CAPABILITIES.Guaranteed Performance then notification of excursions to Minimum Performance may
be continuous.
If autonomous selection does not deliver the required workload performance, the OS should assess the current
delivered effective frequency and for the duration of the specific performance requirement set
IA32_HWP_REQUEST.Desired_Performance ≠ 0 and adjust IA32_HWP_REQUEST.Energy_Performance_Preference
as necessary to achieve the required workload performance. The MSR_PPERF.PCNT value can be used to better
comprehend the potential performance result from adjustments to IA32_HWP_REQUEST.Desired_Performance.
The OS should set IA32_HWP_REQUEST.Desired_Performance = 0 to re-enable autonomous selection.

Tuning for Maximum Performance or Lowest Power Consumption
Maximum performance will be delivered by setting IA32_HWP_REQUEST.Minimum_Performance =
IA32_HWP_REQUEST.Maximum_Performance = IA32_HWP_CAPABILITIES.Highest_Performance and setting
IA32_HWP_REQUEST.Energy_Performance_Preference = 0 (performance preference).
Lowest power will be achieved by setting IA32_HWP_REQUEST.Minimum_Performance =
IA32_HWP_REQUEST.Maximum_Performance = IA32_HWP_CAPABILITIES.Lowest_Performance and setting
IA32_HWP_REQUEST.Energy_Performance_Preference = 0FFH (energy efficiency preference).
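
The common-case, maximum-performance and lowest-power settings recommended in this section can be derived from IA32_HWP_CAPABILITIES as sketched below. All function names are hypothetical. The IA32_HWP_CAPABILITIES layout (Highest_Performance in bits 7:0, Guaranteed_Performance in 15:8, Most_Efficient_Performance in 23:16, Lowest_Performance in 31:24) and the positions of the Minimum, Maximum and Desired fields in IA32_HWP_REQUEST are taken from the register definitions elsewhere in this chapter and are assumptions as far as this passage is concerned.

```python
def decode_capabilities(caps: int) -> dict:
    """Split IA32_HWP_CAPABILITIES into its four performance levels."""
    return {
        "highest": caps & 0xFF,
        "guaranteed": (caps >> 8) & 0xFF,
        "most_efficient": (caps >> 16) & 0xFF,
        "lowest": (caps >> 24) & 0xFF,
    }


def build_request(minimum, maximum, desired=0, epp=0x80, activity_window=0) -> int:
    """Compose an IA32_HWP_REQUEST value; Desired = 0 keeps autonomous selection."""
    return (minimum | (maximum << 8) | (desired << 16)
            | (epp << 24) | (activity_window << 32))


def autonomous_request(caps: int) -> int:
    """Common case: full range, default EPP, hardware picks target and window."""
    levels = decode_capabilities(caps)
    return build_request(levels["lowest"], levels["highest"])


def max_performance_request(caps: int) -> int:
    """Minimum = Maximum = Highest_Performance, EPP = 0."""
    highest = decode_capabilities(caps)["highest"]
    return build_request(highest, highest, epp=0x00)


def lowest_power_request(caps: int) -> int:
    """Minimum = Maximum = Lowest_Performance, EPP = 0FFH."""
    lowest = decode_capabilities(caps)["lowest"]
    return build_request(lowest, lowest, epp=0xFF)
```

For a capabilities value of 01081C2AH (Highest = 2AH, Lowest = 01H), the three helpers return 80002A01H, 00002A2AH and FF000101H respectively.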

Mixing Logical Processor and Package Level HWP Field Settings
Using the IA32_HWP_REQUEST Package_Control bit and the five valid bits in that MSR, the OS can mix and match
between selecting the Logical Processor scope fields and the Package level fields. For example, the OS can set the
IA32_HWP_REQUEST.Package_Control bit of all logical processors to ‘1’. Then, for any logical processor that should
use a different EPP value than the one set in the IA32_HWP_REQUEST_PKG MSR, the OS can set the desired EPP
value and the EPP valid bit in that logical processor's IA32_HWP_REQUEST MSR. This overrides the package EPP
value for only a subset of the logical processors in the package.

Additional Guidelines
Set IA32_HWP_REQUEST.Energy_Performance_Preference as appropriate for the platform's current mode of operation. For example, a mobile platform's setting may be towards performance preference when on AC power and
more towards energy efficiency when on DC power.
The use of the Running Average Power Limit (RAPL) processor capability (see section 14.7.1) is highly recommended when HWP is enabled. Use of IA32_HWP_REQUEST.Maximum_Performance for thermal control is subject to
limitations and can adversely impact the performance of other processor components, e.g., Graphics.
If default values deliver undesirable performance latency in response to events, the OS should set
IA32_HWP_REQUEST.Activity_Window to a low (non-zero) value and
IA32_HWP_REQUEST.Energy_Performance_Preference towards performance (0) for the event duration.
Similarly, for “real-time” threads, set IA32_HWP_REQUEST.Energy_Performance_Preference towards performance
(0) and IA32_HWP_REQUEST.Activity_Window to a low value, e.g. 01H, for the duration of their execution.
When executing low priority work that may otherwise cause the hardware to deliver high performance, set
IA32_HWP_REQUEST.Activity_Window to a longer value and reduce the
IA32_HWP_REQUEST.Maximum_Performance value as appropriate to control energy efficiency. Adjustments to
IA32_HWP_REQUEST.Energy_Performance_Preference may also be necessary.

14.5 HARDWARE DUTY CYCLING (HDC)

Intel processors may contain support for Hardware Duty Cycling (HDC), which enables the processor to autonomously force its components inside the physical package into idle state. For example, the processor may selectively
force only the processor cores into an idle state.
HDC is disabled by default on processors that support it. System software can dynamically enable or disable HDC
to force one or more components into an idle state or wake up those components previously forced into an idle
state. Forced idling (and waking up) of multiple components in a physical package can be done with one WRMSR
to a package-scope MSR from any logical processor within the same package.
HDC does not delay events such as timer expiration, but it may affect the latency of short (less than 1 msec) software threads, e.g., if a thread is forced to idle state just before completion and entering a “natural idle”.
HDC forced idle operation can be thought of as operating at a lower effective frequency. The effective average
frequency computed by software will include the impact of HDC forced idle.
The primary use of HDC is to enable system software to manage low active workloads to increase the package level
C6 residency. Additionally, HDC can lower the effective average frequency in case of power or thermal limitation.
When HDC forces a logical processor, a processor core or a physical package to enter an idle state, its C-State is set
to C3 or deeper. The deep “C-states” referred to in this section are processor-specific C-states.

14.5.1 Hardware Duty Cycling Programming Interfaces

The programming interfaces provided by HDC include the following:

•

The CPUID instruction allows software to discover the presence of HDC support in an Intel processor. Specifically, execute the CPUID instruction with EAX=06H as input; bit 13 of EAX indicates the processor’s support of the
following aspects of HDC.
— Availability of HDC baseline resource, CPUID.06H:EAX[bit 13]: If this bit is set, HDC provides the following
architectural MSRs: IA32_PKG_HDC_CTL, IA32_PM_CTL1, and the IA32_THREAD_STALL MSRs.

•

Additionally, HDC may provide several non-architectural MSRs.

Table 14-2. Architectural and non-Architecture MSRs Related to HDC
Address | Architectural | Register Name | Description
DB0H | Y | IA32_PKG_HDC_CTL | Package Enable/Disable HDC.
DB1H | Y | IA32_PM_CTL1 | Per-logical-processor select control to allow/block HDC forced idling.
DB2H | Y | IA32_THREAD_STALL | Accumulate stalled cycles on this logical processor due to HDC forced idling.
653H | N | MSR_CORE_HDC_RESIDENCY | Core level stalled cycle counter due to HDC forced idling on one or more logical processors.
655H | N | MSR_PKG_HDC_SHALLOW_RESIDENCY | Accumulate the cycles the package was in C2 state (see Note 1) and at least one logical processor was in forced idle.
656H | N | MSR_PKG_HDC_DEEP_RESIDENCY | Accumulate the cycles the package was in the software specified Cx state (see Note 1) and at least one logical processor was in forced idle. Cx is specified in MSR_PKG_HDC_CONFIG_CTL.
652H | N | MSR_PKG_HDC_CONFIG_CTL | HDC configuration controls.

NOTES:
1. The package “C-states” referred to in this section are processor-specific C-states.

14.5.2 Package level Enabling HDC

The layout of the IA32_PKG_HDC_CTL MSR is shown in Figure 14-16. IA32_PKG_HDC_CTL is a writable MSR from
any logical processor in a package. The bit fields are described below:
[Register layout: bits 63:1 Reserved; bit 0 HDC_PKG_Enable]
Figure 14-16. IA32_PKG_HDC_CTL MSR

•

HDC_PKG_Enable (bit 0, R/W) — Software sets this bit to enable HDC operation by allowing the processor
to force to idle all “HDC-allowed” (see Section 14.5.3) logical processors in the package. Clearing this bit
disables HDC operation in the package by waking up all the processor cores that were forced into idle by a
previous ‘0’-to-’1’ transition in IA32_PKG_HDC_CTL.HDC_PKG_Enable. This bit is writable only if
CPUID.06H:EAX[bit 13] = 1. Default = zero (0).

•

Bits 63:1 are reserved and must be zero.

After processor support is determined via CPUID, system software can enable HDC operation by setting
IA32_PKG_HDC_CTL.HDC_PKG_Enable to 1. At reset, IA32_PKG_HDC_CTL.HDC_PKG_Enable is cleared to 0. A
'0'-to-'1' transition in HDC_PKG_Enable allows the processor to force to idle all HDC-allowed (indicated by the nonzero state of IA32_PM_CTL1[bit 0]) logical processors in the package. A ‘1’-to-’0’ transition wakes up those HDC
force-idled logical processors.
Software can enable or disable HDC using this package level control multiple times from any logical processor in the
package. Note that the latency of writing a value to the package-visible IA32_PKG_HDC_CTL.HDC_PKG_Enable is
longer than the latency of a WRMSR operation to a logical processor MSR (as opposed to a package level MSR) such
as IA32_PM_CTL1 (described in Section 14.5.3). Propagating the change in
IA32_PKG_HDC_CTL.HDC_PKG_Enable and waking all HDC-idled logical processors may take on
the order of core C6 exit latency.

14.5.3 Logical-Processor Level HDC Control

The layout of the IA32_PM_CTL1 MSR is shown in Figure 14-17. Each logical processor in a package has its own
IA32_PM_CTL1 MSR. The bit fields are described below:

[Register layout: bits 63:1 Reserved; bit 0 HDC_Allow_Block]
Figure 14-17. IA32_PM_CTL1 MSR

•

HDC_Allow_Block (bit 0, R/W) — Software sets this bit to allow this logical processor to honor the
package-level IA32_PKG_HDC_CTL.HDC_PKG_Enable control. Clearing this bit prevents this logical processor
from using the HDC. This bit is writable only if CPUID.06H:EAX[bit 13] = 1. Default = one (1).

•

Bits 63:1 are reserved and must be zero.

Fine-grain OS control of HDC operation at the granularity of per-logical-processor is provided by IA32_PM_CTL1.
At RESET, all logical processors are allowed to participate in HDC operation such that OS can manage HDC using
the package-level IA32_PKG_HDC_CTL.
Writes to IA32_PM_CTL1 complete with the latency that is typical to WRMSR to a Logical Processor level MSR.
When the OS chooses to manage HDC operation at per-logical-processor granularity, it can write to IA32_PM_CTL1
on one or more logical processors as desired. Each write to IA32_PM_CTL1 must be done by code that executes on
the logical processor targeted to be allowed into or blocked from HDC operation.
Blocking one logical processor from HDC operation may have package-level impact. For example, the processor may
decide to stop duty cycling of all other logical processors as well.
The propagation of IA32_PKG_HDC_CTL.HDC_PKG_Enable in a package takes longer than a WRMSR to
IA32_PM_CTL1. The last completed write to IA32_PM_CTL1 on a logical processor will be honored when a '0'-to-'1'
transition of IA32_PKG_HDC_CTL.HDC_PKG_Enable arrives at that logical processor.
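
The interaction of the two controls can be illustrated with a minimal sketch that operates on raw register values only. The function names are hypothetical, and the actual RDMSR/WRMSR access (which must run on the targeted logical processor) is assumed to be done elsewhere.

```python
HDC_ALLOW_BLOCK = 1 << 0                      # IA32_PM_CTL1 bit 0
PM_CTL1_RESERVED = 0xFFFF_FFFF_FFFF_FFFE      # IA32_PM_CTL1 bits 63:1


def pm_ctl1_value(current: int, allow_hdc: bool) -> int:
    """Return the value to write to IA32_PM_CTL1 to allow or block HDC."""
    if current & PM_CTL1_RESERVED:
        raise ValueError("bits 63:1 are reserved and must be zero")
    if allow_hdc:
        return current | HDC_ALLOW_BLOCK
    return current & ~HDC_ALLOW_BLOCK


def hdc_can_force_idle(pkg_enable: bool, pm_ctl1: int) -> bool:
    """A logical processor may be force-idled only if the package-level
    enable is set and its own HDC_Allow_Block bit is set."""
    return pkg_enable and bool(pm_ctl1 & HDC_ALLOW_BLOCK)
```

For instance, `pm_ctl1_value(1, allow_hdc=False)` yields 0, which blocks the calling logical processor from HDC even while the package-level enable remains set.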

14.5.4

HDC Residency Counters

A collection of counters is available for software to track various residency metrics related to HDC operation.
In general, HDC residency time is defined as the time spent in the HDC forced-idle state at per-logical-processor, per-core, or package granularity. For per-core or package-level HDC residency, at least one of the
logical processors in the core or package must be in the HDC forced-idle state.

14.5.4.1

IA32_THREAD_STALL

Software can track per-logical-processor HDC residency using the architectural MSR IA32_THREAD_STALL. The
layout of the IA32_THREAD_STALL MSR is shown in Figure 14-18. Each logical processor in a package has its own
IA32_THREAD_STALL MSR. The bit fields are described below:

Figure 14-18. IA32_THREAD_STALL MSR (bits 63:0: Stall_Cycle_Cnt)

• Stall_Cycle_Cnt (bits 63:0, R/O) — Stores accumulated HDC forced-idle cycle count of this processor core
since last RESET. This counter increments at the same rate as the TSC. The count is updated only after the
logical processor exits from the forced-idle C-state. At each update, the number of cycles that the logical
processor was stalled due to forced idle is added to the counter. This counter is available only if
CPUID.06H:EAX[bit 13] = 1. Default = zero (0).

A value of zero in IA32_THREAD_STALL indicates either HDC is not supported or the logical processor never
serviced any forced HDC idle. A non-zero value in IA32_THREAD_STALL indicates the HDC forced-idle residency
times of the logical processor. It also indicates the forced-idle cycles due to HDC that could appear as C0 time to
traditional OS accounting mechanisms (e.g. time-stamping OS idle/exit events).
Software can read IA32_THREAD_STALL irrespective of the state of IA32_PKG_HDC_CTL and IA32_PM_CTL1, as
long as CPUID.06H:EAX[bit 13] = 1.
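
Because IA32_THREAD_STALL increments at the TSC rate, the fraction of an interval spent in HDC forced idle can be estimated from two samples of the counter and of the TSC. The following is a minimal sketch; the function name and the sampling scheme are assumptions for illustration, and the sample values are presumed to have been read on the same logical processor.

```python
MASK64 = (1 << 64) - 1


def hdc_stall_fraction(stall_start: int, stall_end: int,
                       tsc_start: int, tsc_end: int) -> float:
    """Fraction of the sampled interval spent in HDC forced idle.

    Differences are taken modulo 2**64 so that a single counter
    wraparound between the two samples is tolerated.
    """
    stalled = (stall_end - stall_start) & MASK64
    elapsed = (tsc_end - tsc_start) & MASK64
    if elapsed == 0:
        raise ValueError("TSC samples must differ")
    return stalled / elapsed
```

Since the count is updated only after the logical processor exits the forced-idle state, short sampling intervals may attribute a forced-idle period to the interval in which it ended.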

14.5.4.2

Non-Architectural HDC Residency Counters

Processors that support HDC operation may provide the following model-specific HDC residency counters.

MSR_CORE_HDC_RESIDENCY
Software can track per-core HDC residency using the counter MSR_CORE_HDC_RESIDENCY. This counter increments when the core is in C3 state or deeper (all logical processors in this core are idle due to either HDC or other
mechanisms) and at least one of the logical processors is in HDC forced idle state. The layout of the
MSR_CORE_HDC_RESIDENCY is shown in Figure 14-19. Each processor core in a package has its own
MSR_CORE_HDC_RESIDENCY MSR. The bit fields are described below:

Figure 14-19. MSR_CORE_HDC_RESIDENCY MSR (bits 63:0: Core_Cx_Duty_Cycle_Cnt)

• Core_Cx_Duty_Cycle_Cnt (bits 63:0, R/O) — Stores accumulated HDC forced-idle cycle count of this
processor core since last RESET. This counter increments at the same rate as the TSC. The count is updated only
after the core exits from a forced-idle C-state. At each update, the increment counts cycles when the core
is in a Cx state (all its logical processors are idle) and at least one logical processor in this core was forced into
idle state due to HDC. If CPUID.06H:EAX[bit 13] = 0, an attempt to access this MSR will cause a #GP fault. Default
= zero (0).

A value of zero in MSR_CORE_HDC_RESIDENCY indicates either HDC is not supported or this processor core never
serviced any forced HDC idle.

MSR_PKG_HDC_SHALLOW_RESIDENCY
The counter MSR_PKG_HDC_SHALLOW_RESIDENCY allows software to track HDC residency time when the
package is in C2 state, all processor cores in the package are not active and at least one logical processor was
forced into idle state due to HDC. The layout of the MSR_PKG_HDC_SHALLOW_RESIDENCY is shown in
Figure 14-20. There is one MSR_PKG_HDC_SHALLOW_RESIDENCY per package. The bit fields are described
below:

Figure 14-20. MSR_PKG_HDC_SHALLOW_RESIDENCY MSR (bits 63:0: Pkg_Duty_Cycle_Cnt)

• Pkg_Duty_Cycle_Cnt (bits 63:0, R/O) — Stores accumulated HDC forced-idle cycle count of this processor
core since last RESET. This counter increments at the same rate as the TSC. Package shallow residency may be
implementation specific. In the initial implementation, the threshold is package C2-state. The count is
updated only after package C2-state exit from a forced-idle C-state. At each update, the increment counts
cycles when the package is in C2 state and at least one processor core in this package was forced into idle state
due to HDC. If CPUID.06H:EAX[bit 13] = 0, an attempt to access this MSR may cause a #GP fault. Default = zero
(0).
A value of zero in MSR_PKG_HDC_SHALLOW_RESIDENCY indicates either HDC is not supported or this processor
package never serviced any forced HDC idle.

MSR_PKG_HDC_DEEP_RESIDENCY
The counter MSR_PKG_HDC_DEEP_RESIDENCY allows software to track HDC residency time when the package is
in a software-specified package Cx state, all processor cores in the package are not active and at least one logical
processor was forced into idle state due to HDC. Selection of a specific package Cx state can be configured using
MSR_PKG_HDC_CONFIG. The layout of the MSR_PKG_HDC_DEEP_RESIDENCY is shown in Figure 14-21. There is
one MSR_PKG_HDC_DEEP_RESIDENCY per package. The bit fields are described below:

Figure 14-21. MSR_PKG_HDC_DEEP_RESIDENCY MSR (bits 63:0: Pkg_Cx_Duty_Cycle_Cnt)

• Pkg_Cx_Duty_Cycle_Cnt (bits 63:0, R/O) — Stores accumulated HDC forced-idle cycle count of this
processor core since last RESET. This counter increments at the same rate as the TSC. The count is updated
only after package C-state exit from a forced-idle state. At each update, the increment counts cycles when the
package is in the software-configured Cx state and at least one processor core in this package was forced into
idle state due to HDC. If CPUID.06H:EAX[bit 13] = 0, an attempt to access this MSR may cause a #GP fault.
Default = zero (0).

A value of zero in MSR_PKG_HDC_DEEP_RESIDENCY indicates either HDC is not supported or this processor
package never serviced any forced HDC idle.

MSR_PKG_HDC_CONFIG
MSR_PKG_HDC_CONFIG allows software to configure the package Cx state that the counter
MSR_PKG_HDC_DEEP_RESIDENCY monitors. The layout of the MSR_PKG_HDC_CONFIG is shown in Figure 14-22.
There is one MSR_PKG_HDC_CONFIG per package. The bit fields are described below:

Figure 14-22. MSR_PKG_HDC_CONFIG MSR (bits 2:0: HDC_Cx_Monitor; bits 63:3: Reserved)

• Pkg_Cx_Monitor (bits 2:0, R/W) — Selects which package C-state the MSR_PKG_HDC_DEEP_RESIDENCY
counter will monitor. The encodings of this field (labeled HDC_Cx_Monitor in Figure 14-22) are: 0: no counting; 1: count package C2 only;
2: count package C3 and deeper; 3: count package C6 and deeper; 4: count package C7 and deeper; other
encodings are reserved. If CPUID.06H:EAX[bit 13] = 0, an attempt to access this MSR may cause a #GP fault.
Default = zero (0).

• Bits 63:3 are reserved and must be zero.
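
The field encoding above can be captured in a small lookup, sketched below. The dictionary and function names are hypothetical; only the encodings themselves come from the field description.

```python
# Encodings of MSR_PKG_HDC_CONFIG bits 2:0, as listed in the field description.
PKG_CX_MONITOR = {
    0: "no counting",
    1: "package C2 only",
    2: "package C3 and deeper",
    3: "package C6 and deeper",
    4: "package C7 and deeper",
}


def decode_pkg_hdc_config(value: int) -> str:
    """Describe what MSR_PKG_HDC_DEEP_RESIDENCY counts for a raw MSR value."""
    if value >> 3:
        raise ValueError("bits 63:3 are reserved and must be zero")
    field = value & 0x7
    if field not in PKG_CX_MONITOR:
        raise ValueError(f"encoding {field} is reserved")
    return PKG_CX_MONITOR[field]
```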

14.5.5

MPERF and APERF Counters Under HDC

HDC operation can be thought of as a drop in average effective frequency that results when all or some of the logical processors enter forced-idle periods.

Figure 14-23. Example of Effective Frequency Reduction and Forced Idle Period of HDC (1600 MHz at 25% utilization and 75% forced idle corresponds to an effective frequency of 400 MHz at 100% utilization)

By default, the IA32_MPERF counter counts during forced idle periods as if the logical processor were active. The
IA32_APERF counter does not count during forced idle state. This counting convention allows the OS to compute
the average effective frequency of the logical processor between the last MWAIT exit and the next MWAIT entry
(OS visible C0) as (ΔACNT / ΔMCNT) * TSC Frequency, where ΔACNT and ΔMCNT are the changes in IA32_APERF and IA32_MPERF over that interval.
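
The computation can be sketched as follows, using the numbers from Figure 14-23 as a check. The function name and the choice of MHz as the unit are assumptions for illustration; the counter samples are presumed to come from the same logical processor.

```python
MASK64 = (1 << 64) - 1


def effective_frequency_mhz(aperf_start: int, aperf_end: int,
                            mperf_start: int, mperf_end: int,
                            tsc_mhz: float) -> float:
    """Average effective frequency: (delta ACNT / delta MCNT) * TSC frequency."""
    d_acnt = (aperf_end - aperf_start) & MASK64   # IA32_APERF delta
    d_mcnt = (mperf_end - mperf_start) & MASK64   # IA32_MPERF delta
    if d_mcnt == 0:
        raise ValueError("IA32_MPERF did not advance")
    return d_acnt / d_mcnt * tsc_mhz
```

With 75% forced idle, IA32_APERF advances only a quarter as far as IA32_MPERF, so a 1600 MHz TSC frequency gives an effective frequency of 400 MHz.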

14.6

MWAIT EXTENSIONS FOR ADVANCED POWER MANAGEMENT

IA-32 processors may support a number of C-states¹ that reduce power consumption for inactive states. Intel Core
Solo and Intel Core Duo processors support both deeper C-states and MWAIT extensions that can be used by the OS to
implement power management policy.
Software should use CPUID to discover if a target processor supports the enumeration of MWAIT extensions. If
CPUID.05H.ECX[Bit 0] = 1, the target processor supports MWAIT extensions and their enumeration (see Chapter
4, “Instruction Set Reference, M-U,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume
2B).
If CPUID.05H.ECX[Bit 1] = 1, the target processor supports using interrupts as break-events for MWAIT, even
when interrupts are disabled. Use this feature to measure C-state residency as follows:

• Software can write to bit 0 in the MWAIT Extensions register (ECX) when issuing an MWAIT to enter into a
processor-specific C-state or sub C-state.

• When a processor comes out of an inactive C-state or sub C-state, software can read a timestamp before an
interrupt service routine (ISR) is potentially executed.

CPUID.05H.EDX allows software to enumerate processor-specific C-states and sub C-states available for use with
MWAIT extensions. IA-32 processors may support more than one C-state of a given C-state type. These are called
sub C-states. Numerically higher C-states have higher power savings and latency (upon entering and exiting) than
lower-numbered C-states.
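
As an illustration of the enumeration, the sketch below splits a CPUID.05H.EDX value into per-C-state sub C-state counts. The field layout used here (one 4-bit field per C-state type, C0 in bits 3:0 through C7 in bits 31:28) is not stated in this section; it is an assumption taken from the CPUID instruction description and should be checked against that reference. The function name is hypothetical.

```python
def mwait_sub_cstates(edx: int) -> dict:
    """Split CPUID.05H.EDX into the number of sub C-states per C-state type.

    Assumed layout: bits 3:0 describe C0, bits 7:4 describe C1, and so on
    up to bits 31:28 for C7.
    """
    return {f"C{n}": (edx >> (4 * n)) & 0xF for n in range(8)}
```

For example, an EDX value of 0x00001120 would decode to two C1 sub C-states, one C2 sub C-state, one C3 sub C-state, and none for the other types.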
At CPL = 0, system software can specify desired C-state and sub C-state by using the MWAIT hints register (EAX).
Processors will not go to C-state and sub C-state deeper than what is specified by the hint register. If CPL > 0 and
if MONITOR/MWAIT is supported at CPL > 0, the processor will only enter C1-state (regardless of the C-state
request in the hints register).
Executing MWAIT generates an exception on processors operating at a privilege level where MONITOR/MWAIT are
not supported.

1. The processor-specific C-states defined in MWAIT extensions can map to ACPI defined C-state types (C0, C1, C2, C3). The mapping
relationship depends on the definition of a C-state by processor implementation and is exposed to OSPM by the BIOS using the ACPI
defined _CST table.

NOTE
If MWAIT is used to enter a C-state (including sub C-state) that is numerically higher than C1, a
store to the address range armed by the MONITOR instruction will cause the processor to exit MWAIT if
the store originated from another processor agent. A store from a non-processor agent may not
cause the processor to exit MWAIT.

14.7

THERMAL MONITORING AND PROTECTION

The IA-32 architecture provides the following mechanisms for monitoring temperature and controlling thermal
power:
1. The catastrophic shutdown detector forces processor execution to stop if the processor’s core temperature
rises above a preset limit.
2. Automatic and adaptive thermal monitoring mechanisms force the processor to reduce its power
consumption in order to operate within predetermined temperature limits.
3. The software controlled clock modulation mechanism permits operating systems to implement power
management policies that reduce power consumption; this is in addition to the reduction offered by automatic
thermal monitoring mechanisms.
4. On-die digital thermal sensor and interrupt mechanisms permit the OS to manage thermal conditions
natively without relying on BIOS or other system board components.
The first mechanism is not visible to software. The other three mechanisms are visible to software using processor
feature information returned by executing CPUID with EAX = 1.
The second mechanism includes:

• Automatic thermal monitoring provides two modes of operation. One mode modulates the clock duty cycle;
the second mode changes the processor’s frequency. Both modes are used to control the core temperature of
the processor.

• Adaptive thermal monitoring can provide flexible thermal management on processors made of multiple
cores.

The third mechanism modulates the clock duty cycle of the processor. As shown in Figure 14-24, the phrase ‘duty
cycle’ does not refer to the actual duty cycle of the clock signal. Instead it refers to the time period during which
the clock signal is allowed to drive the processor chip. By using the stop clock mechanism to control how often the
processor is clocked, processor power consumption can be modulated.

Figure 14-24. Processor Modulation Through Stop-Clock Mechanism (the clock applied to the processor is gated by the stop-clock duty cycle; a 25% duty cycle is shown as an example only)

With the earlier automatic thermal monitoring mechanisms, software controlled mechanisms that changed processor
operating parameters in order to affect thermal conditions. Software did not have native access to the
thermal condition of the processor, nor could software alter the trigger condition that initiated software program
control.
The fourth mechanism (listed above) provides access to an on-die digital thermal sensor using a model-specific
register and uses an interrupt mechanism to alert software to initiate digital thermal monitoring.

14.7.1

Catastrophic Shutdown Detector

P6 family processors introduced a thermal sensor that acts as a catastrophic shutdown detector. This catastrophic
shutdown detector was also implemented in Pentium 4, Intel Xeon and Pentium M processors. It is always enabled.
When processor core temperature reaches a factory preset level, the sensor trips and processor execution is halted
until after the next reset cycle.

14.7.2

Thermal Monitor

Pentium 4, Intel Xeon and Pentium M processors introduced a second temperature sensor that is factory-calibrated
to trip when the processor’s core temperature crosses a level corresponding to the recommended thermal design
envelope. The trip temperature of the second sensor is calibrated below the temperature assigned to the catastrophic shutdown detector.

14.7.2.1

Thermal Monitor 1

The Pentium 4 processor uses the second temperature sensor in conjunction with a mechanism called Thermal
Monitor 1 (TM1) to control the core temperature of the processor. TM1 controls the processor’s temperature by
modulating the duty cycle of the processor clock. Modulation of duty cycles is processor model specific. Note that
the processor’s STPCLK# pin is not used here; the stop-clock circuitry is controlled internally.
Support for TM1 is indicated by CPUID.1:EDX.TM[bit 29] = 1.
TM1 is enabled by setting the thermal-monitor enable flag (bit 3) in IA32_MISC_ENABLE [see Chapter 2, “Model-Specific Registers (MSRs)” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 4].
Following a power-up or reset, the flag is cleared, disabling TM1. BIOS is required to enable only one automatic
thermal monitoring mode. Operating systems and applications must not disable the operation of these mechanisms.

14.7.2.2

Thermal Monitor 2

An additional automatic thermal protection mechanism, called Thermal Monitor 2 (TM2), was introduced in the
Intel Pentium M processor and also incorporated in newer models of the Pentium 4 processor family. Intel Core Duo
and Solo processors, and Intel Core 2 Duo processor family all support TM1 and TM2. TM2 controls the core
temperature of the processor by reducing the operating frequency and voltage of the processor and offers a higher
performance level for a given level of power reduction than TM1.
TM2 is triggered by the same temperature sensor as TM1. The mechanism to enable TM2 may be implemented
differently across various IA-32 processor families with different CPUID signatures in the family encoding value, but
will be uniform within an IA-32 processor family.
Support for TM2 is indicated by CPUID.1:ECX.TM2[bit 8] = 1.

14.7.2.3

Two Methods for Enabling TM2

On processors with CPUID family/model/stepping signature encoded as 0x69n or 0x6Dn (early Pentium M processors), TM2 is enabled if the TM_SELECT flag (bit 16) of the MSR_THERM2_CTL register is set to 1 (Figure 14-25)
and bit 3 of the IA32_MISC_ENABLE register is set to 1.
Following a power-up or reset, the TM_SELECT flag may be cleared. BIOS is required to enable either TM1 or TM2.
Operating systems and applications must not disable mechanisms that enable TM1 or TM2. If bit 3 of the
IA32_MISC_ENABLE register is set and TM_SELECT flag of the MSR_THERM2_CTL register is cleared, TM1 is
enabled.

Figure 14-25. MSR_THERM2_CTL Register On Processors with CPUID Family/Model/Stepping Signature Encoded
as 0x69n or 0x6Dn (bit 16: TM_SELECT; other bits: Reserved)

On processors introduced after the Pentium 4 processor (this includes most Pentium M processors), the method
used to enable TM2 is different. TM2 is enabled by setting bit 13 of the IA32_MISC_ENABLE register to 1. This applies to
the Intel Core Duo, Intel Core Solo, and Intel Core 2 processor families.
The target operating frequency and voltage for the TM2 transition after TM2 is triggered is specified by the value
written to MSR_THERM2_CTL, bits 15:0 (Figure 14-26). Following a power-up or reset, BIOS is required to enable
at least one of these two thermal monitoring mechanisms. If both TM1 and TM2 are supported, BIOS may choose
to enable TM2 instead of TM1. Operating systems and applications must not disable the mechanisms that enable
TM1 or TM2; and they must not alter the value in bits 15:0 of the MSR_THERM2_CTL register.

Figure 14-26. MSR_THERM2_CTL Register for Supporting TM2 (bits 15:0: TM2 Transition Target; other bits: Reserved)
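
The two enabling methods can be summarized in a read-only sketch that reports which thermal monitor the register values select. The function name and return format are hypothetical. On processors other than 0x69n/0x6Dn, the sketch simply reports the two IA32_MISC_ENABLE bits independently, which is an assumption; this section does not say how the hardware behaves if both are set. Consistent with the rules above, the sketch only inspects values and never modifies them.

```python
TM1_ENABLE = 1 << 3     # IA32_MISC_ENABLE bit 3
TM2_ENABLE = 1 << 13    # IA32_MISC_ENABLE bit 13 (later processors)
TM_SELECT = 1 << 16     # MSR_THERM2_CTL bit 16 (signatures 0x69n / 0x6Dn)


def enabled_thermal_monitors(cpuid_signature: int, misc_enable: int,
                             therm2_ctl: int) -> dict:
    """Report which automatic thermal monitor the register values enable.

    cpuid_signature is the family/model/stepping value (CPUID.1:EAX).
    """
    bit3 = bool(misc_enable & TM1_ENABLE)
    if (cpuid_signature >> 4) in (0x69, 0x6D):
        # Early Pentium M: bit 3 enables monitoring, TM_SELECT picks TM1 or TM2.
        tm_select = bool(therm2_ctl & TM_SELECT)
        return {"TM1": bit3 and not tm_select, "TM2": bit3 and tm_select}
    # Later processors: TM2 has its own enable bit in IA32_MISC_ENABLE.
    return {"TM1": bit3, "TM2": bool(misc_enable & TM2_ENABLE)}
```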

14.7.2.4

Performance State Transitions and Thermal Monitoring

If the thermal control circuitry (TCC) for thermal monitor (TM1/TM2) is active, writes to the IA32_PERF_CTL will
effect a new target operating point as follows:

• If TM1 is enabled and the TCC is engaged, the performance state transition can commence before the TCC is
disengaged.

• If TM2 is enabled and the TCC is engaged, the performance state transition specified by a write to the
IA32_PERF_CTL will commence after the TCC has disengaged.

14.7.2.5

Thermal Status Information

The status of the temperature sensor that triggers the thermal monitor (TM1/TM2) is indicated through the thermal
status flag and thermal status log flag in the IA32_THERM_STATUS MSR (see Figure 14-27).
The functions of these flags are:

• Thermal Status flag, bit 0 — When set, indicates that the processor core temperature is currently at the trip
temperature of the thermal monitor and that the processor power consumption is being reduced via either TM1
or TM2, depending on which is enabled. When clear, the flag indicates that the core temperature is below the
thermal monitor trip temperature. This flag is read only.

• Thermal Status Log flag, bit 1 — When set, indicates that the thermal sensor has tripped since the last
power-up or reset or since the last time that software cleared this flag. This flag is a sticky bit; once set it
remains set until cleared by software or until a power-up or reset of the processor. The default state is clear.

Figure 14-27. IA32_THERM_STATUS MSR (bit 0: Thermal Status; bit 1: Thermal Status Log; remaining bits: Reserved)

After the second temperature sensor has been tripped, the thermal monitor (TM1/TM2) will remain engaged for a
minimum time period (on the order of 1 ms). The thermal monitor will remain engaged until the processor core
temperature drops below the preset trip temperature of the temperature sensor, taking hysteresis into account.
While the processor is in a stop-clock state, interrupts will be blocked from interrupting the processor. This holding
off of interrupts increases the interrupt latency, but does not cause interrupts to be lost. Outstanding interrupts
remain pending until clock modulation is complete.
The thermal monitor can be programmed to generate an interrupt to the processor when the thermal sensor is
tripped. The delivery mode, mask and vector for this interrupt can be programmed through the thermal entry in the
local APIC’s LVT (see Section 10.5.1, “Local Vector Table”). The low-temperature interrupt enable and high-temperature interrupt enable flags in the IA32_THERM_INTERRUPT MSR (see Figure 14-28) control when the
interrupt is generated; that is, on a transition from a temperature below the trip point to above and/or vice-versa.

Figure 14-28. IA32_THERM_INTERRUPT MSR (bit 0: High-Temperature Interrupt Enable; bit 1: Low-Temperature Interrupt Enable; bits 63:2: Reserved)

• High-Temperature Interrupt Enable flag, bit 0 — Enables an interrupt to be generated on the transition
from a low-temperature to a high-temperature when set; disables the interrupt when clear (R/W).

• Low-Temperature Interrupt Enable flag, bit 1 — Enables an interrupt to be generated on the transition
from a high-temperature to a low-temperature when set; disables the interrupt when clear.

The thermal monitor interrupt can be masked by the thermal LVT entry. After a power-up or reset, the low-temperature interrupt enable and high-temperature interrupt enable flags in the IA32_THERM_INTERRUPT MSR are
cleared (interrupts are disabled) and the thermal LVT entry is set to mask interrupts. This interrupt should be
handled either by the operating system or system management mode (SMM) code.
Note that the operation of the thermal monitoring mechanism has no effect upon the clock rate of the processor's
internal high-resolution timer (time stamp counter).

14.7.2.6

Adaptive Thermal Monitor

The Intel Core 2 Duo processor family supports an enhanced thermal management mechanism, referred to as Adaptive Thermal Monitor (Adaptive TM).
Unlike TM2, Adaptive TM is not limited to one TM2 transition target. During a thermal trip event, Adaptive TM (if
enabled) selects an optimal target operating point based on whether or not the current operating point has effectively cooled the processor.
Similar to TM2, Adaptive TM is enabled by BIOS. The BIOS is required to test the TM1 and TM2 feature flags and
enable all available thermal control mechanisms (including Adaptive TM) at platform initialization.
Adaptive TM is available only to a subset of processors that support TM2.

In each chip-multiprocessing (CMP) silicon die, each core has a unique thermal sensor that triggers independently.
These thermal sensors can trigger TM1 or TM2 transitions in the same manner as described in Section 14.7.2.1 and
Section 14.7.2.2. The trip point of the thermal sensor is not programmable by software since it is set during the
fabrication of the processor.
Each thermal sensor in a processor core may be triggered independently to engage thermal management features.
In Adaptive TM, both cores will transition to a lower frequency and/or lower voltage level if one sensor is triggered.
Triggering of this sensor is visible to software via the thermal interrupt LVT entry in the local APIC of a given core.

14.7.3

Software Controlled Clock Modulation

Pentium 4, Intel Xeon and Pentium M processors also support software-controlled clock modulation. This provides
a means for operating systems to implement a power management policy to reduce the power consumption of the
processor. Here, the stop-clock duty cycle is controlled by software through the IA32_CLOCK_MODULATION MSR
(see Figure 14-29).

Figure 14-29. IA32_CLOCK_MODULATION MSR (bits 63:5: Reserved; bit 4: On-Demand Clock Modulation Enable; bits 3:1: On-Demand Clock Modulation Duty Cycle; bit 0: Reserved)

The IA32_CLOCK_MODULATION MSR contains the following flag and field used to enable software-controlled clock
modulation and to select the clock modulation duty cycle:

• On-Demand Clock Modulation Enable, bit 4 — Enables on-demand software controlled clock modulation
when set; disables software-controlled clock modulation when clear.

• On-Demand Clock Modulation Duty Cycle, bits 1 through 3 — Selects the on-demand clock modulation
duty cycle (see Table 14-3). This field is only active when the on-demand clock modulation enable flag is set.

Note that the on-demand clock modulation mechanism (like the thermal monitor) controls the processor’s stop-clock circuitry internally to modulate the clock signal. The STPCLK# pin is not used in this mechanism.

Table 14-3. On-Demand Clock Modulation Duty Cycle Field Encoding

Duty Cycle Field Encoding    Duty Cycle
000B                         Reserved
001B                         12.5% (Default)
010B                         25.0%
011B                         37.5%
100B                         50.0%
101B                         62.5%
110B                         75.0%
111B                         87.5%

The on-demand clock modulation mechanism can be used to control processor power consumption. Power
management software can write to the IA32_CLOCK_MODULATION MSR to enable clock modulation and to select
a modulation duty cycle. If on-demand clock modulation and TM1 are both enabled and the thermal status of the
processor is hot (bit 0 of the IA32_THERM_STATUS MSR is set), clock modulation at the duty cycle specified by TM1
takes precedence, regardless of the setting of the on-demand clock modulation duty cycle.

For Hyper-Threading Technology enabled processors, the IA32_CLOCK_MODULATION register is duplicated for
each logical processor. In order for the On-demand clock modulation feature to work properly, the feature must be
enabled on all the logical processors within a physical processor. If the programmed duty cycle is not identical for
all the logical processors, the processor core clock will modulate to the highest duty cycle programmed for processors with any of the following CPUID DisplayFamily_DisplayModel signatures (see CPUID instruction in Chapter 3,
“Instruction Set Reference, A-L” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume
2A): 06_1A, 06_1C, 06_1E, 06_1F, 06_25, 06_26, 06_27, 06_2C, 06_2E, 06_2F, 06_35, 06_36, and 0F_xx. For all
other processors, if the programmed duty cycle is not identical for all logical processors in the same core, the
processor core will modulate at the lowest programmed duty cycle.
For multiple processor cores in a physical package, each processor core can modulate to a programmed duty cycle
independently.
For the P6 family processors, on-demand clock modulation was implemented through the chipset, which controlled
clock modulation through the processor’s STPCLK# pin.
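
The encoding of Table 14-3 and the rule for mismatched duty cycles can be sketched as below. All names are hypothetical, signatures are passed as DisplayFamily_DisplayModel strings such as "06_1A", and the sketch assumes that on-demand clock modulation is already enabled on every logical processor of the core.

```python
# Signatures for which the core uses the highest programmed duty cycle
# (all 0F_xx signatures are handled separately below).
HIGHEST_DUTY_CYCLE_MODELS = {
    "06_1A", "06_1C", "06_1E", "06_1F", "06_25", "06_26",
    "06_27", "06_2C", "06_2E", "06_2F", "06_35", "06_36",
}


def clock_modulation_value(duty_eighths: int, enable: bool = True) -> int:
    """Build an IA32_CLOCK_MODULATION value; duty cycle = duty_eighths * 12.5%."""
    if not 1 <= duty_eighths <= 7:
        raise ValueError("duty cycle field must be 001B..111B (000B is reserved)")
    return (int(enable) << 4) | (duty_eighths << 1)


def duty_cycle_percent(msr_value: int) -> float:
    """Decode bits 3:1 of IA32_CLOCK_MODULATION according to Table 14-3."""
    field = (msr_value >> 1) & 0x7
    if field == 0:
        raise ValueError("encoding 000B is reserved")
    return field * 12.5


def core_duty_cycle_percent(signature: str, per_thread_values: list) -> float:
    """Duty cycle the core modulates at when its logical processors differ."""
    cycles = [duty_cycle_percent(v) for v in per_thread_values]
    use_highest = signature in HIGHEST_DUTY_CYCLE_MODELS or signature.startswith("0F_")
    return max(cycles) if use_highest else min(cycles)
```

For two logical processors programmed to 25% and 62.5%, the core would modulate at 62.5% on a 06_1A processor and at 25% on a processor that is not in the list.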

14.7.3.1

Extension of Software Controlled Clock Modulation

Extension of the software controlled clock modulation facility supports on-demand clock modulation duty cycle with
4-bit dynamic range (increased from 3-bit range). Granularity of clock modulation duty cycle is increased to 6.25%
(compared to 12.5%).
Four bit dynamic range control is provided by using bit 0 in conjunction with bits 3:1 of the
IA32_CLOCK_MODULATION MSR (see Figure 14-30).

Figure 14-30. IA32_CLOCK_MODULATION MSR with Clock Modulation Extension (bits 63:5: Reserved; bit 4: On-Demand Clock Modulation Enable; bits 3:0: Extended On-Demand Clock Modulation Duty Cycle)

Extension to software controlled clock modulation is supported only if CPUID.06H:EAX[Bit 5] = 1. If
CPUID.06H:EAX[Bit 5] = 0, then bit 0 of IA32_CLOCK_MODULATION is reserved.
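
A minimal sketch of the extended encoding follows. It assumes that the 4-bit field in bits 3:0 scales linearly in 6.25% steps (consistent with Table 14-3, where 0010B corresponds to 12.5%) and that an all-zero field remains invalid; neither point is spelled out in this section. The function names are hypothetical.

```python
ECMD_SUPPORTED = 1 << 5     # CPUID.06H:EAX bit 5


def extended_clock_modulation_value(duty_sixteenths: int, cpuid_06_eax: int,
                                    enable: bool = True) -> int:
    """Build IA32_CLOCK_MODULATION using the 4-bit extended duty cycle field."""
    if not cpuid_06_eax & ECMD_SUPPORTED:
        raise ValueError("extension not supported; bit 0 is reserved")
    if not 1 <= duty_sixteenths <= 15:
        raise ValueError("duty cycle field must be in 1..15 (assumption: 0 is invalid)")
    return (int(enable) << 4) | duty_sixteenths


def extended_duty_cycle_percent(msr_value: int) -> float:
    """Decode bits 3:0, assuming 6.25% per step."""
    return (msr_value & 0xF) * 6.25
```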

14.7.4

Detection of Thermal Monitor and Software Controlled
Clock Modulation Facilities

The ACPI flag (bit 22) of the CPUID feature flags indicates the presence of the IA32_THERM_STATUS,
IA32_THERM_INTERRUPT, IA32_CLOCK_MODULATION MSRs, and the xAPIC thermal LVT entry.
The TM1 flag (bit 29) of the CPUID feature flags indicates the presence of the automatic thermal monitoring facilities that modulate clock duty cycles.

14.7.4.1

Detection of Software Controlled Clock Modulation Extension

Processor support for the software controlled clock modulation extension is indicated by CPUID.06H:EAX[Bit 5] = 1.
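
The detection flags named in this chapter can be gathered in one place, as in the sketch below. It takes register values already returned by CPUID and does not execute the instruction itself. The function and key names are hypothetical, and the sketch assumes that the ACPI and TM1 feature flags are reported in CPUID.1:EDX.

```python
def thermal_features(cpuid1_ecx: int, cpuid1_edx: int, cpuid6_eax: int) -> dict:
    """Summarize thermal-management feature flags from CPUID register values."""
    def bit(value: int, n: int) -> bool:
        return bool((value >> n) & 1)

    return {
        # IA32_THERM_STATUS, IA32_THERM_INTERRUPT, IA32_CLOCK_MODULATION, thermal LVT
        "acpi_thermal_msrs": bit(cpuid1_edx, 22),
        "tm1": bit(cpuid1_edx, 29),
        "tm2": bit(cpuid1_ecx, 8),
        "digital_thermal_sensor": bit(cpuid6_eax, 0),
        "clock_modulation_extension": bit(cpuid6_eax, 5),
    }
```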

14.7.5

On Die Digital Thermal Sensors

The on-die digital thermal sensor can be read using an MSR (no I/O interface). In Intel Core Duo processors, each core
has a unique digital sensor whose temperature is accessible using an MSR. The digital thermal sensor is the
preferred method for reading the die temperature because (a) it is located closer to the hottest portions of the die, and
(b) it enables software to accurately track the die temperature and the potential activation of thermal throttling.

14.7.5.1

Digital Thermal Sensor Enumeration

The processor supports a digital thermal sensor if CPUID.06H.EAX[0] = 1. If the processor supports a digital thermal
sensor, CPUID.06H.EBX[bits 3:0] indicates the number of thermal thresholds that are available for use.
Software sets thermal thresholds by using the IA32_THERM_INTERRUPT MSR. Software reads output of the digital
thermal sensor using the IA32_THERM_STATUS MSR.

14.7.5.2

Reading the Digital Sensor

Unlike traditional analog thermal devices, the output of the digital thermal sensor is a temperature relative to the
maximum supported operating temperature of the processor.
Temperature measurements returned by digital thermal sensors are always at or below TCC activation temperature. Critical temperature conditions are detected using the “Critical Temperature Status” bit. When this bit is set,
the processor is operating at a critical temperature and immediate shutdown of the system should occur. Once the
“Critical Temperature Status” bit is set, reliable operation is not guaranteed.
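
Decoding a raw IA32_THERM_STATUS value can be sketched as follows, using the bit positions given in the field descriptions of this section. The names are hypothetical, and the TCC activation temperature is passed in as a parameter because it must be obtained separately (for example, from the processor's data sheet).

```python
# Status and log flags of IA32_THERM_STATUS, keyed by bit position.
THERM_STATUS_FLAGS = {
    0: "thermal_status",
    1: "thermal_status_log",
    2: "prochot_or_forcepr_event",
    3: "prochot_or_forcepr_log",
    4: "critical_temperature_status",
    5: "critical_temperature_log",
    6: "threshold1_status",
    7: "threshold1_log",
    8: "threshold2_status",
    9: "threshold2_log",
    10: "power_limit_status",
    11: "power_limit_log",
}


def decode_therm_status(value: int, tcc_activation_c: int):
    """Return (flags, temperature in degrees Celsius) for a raw MSR value.

    The Digital Readout (bits 22:16) is an offset below the TCC activation
    temperature, so a readout of 0 means the core is at TCC activation.
    """
    flags = {name: bool((value >> bit) & 1)
             for bit, name in THERM_STATUS_FLAGS.items()}
    readout = (value >> 16) & 0x7F
    return flags, tcc_activation_c - readout
```

With an assumed TCC activation temperature of 100 degrees Celsius, a readout of 30 corresponds to 70 degrees Celsius.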
See Figure 14-31 for the layout of IA32_THERM_STATUS MSR. Bit fields include:

• Thermal Status (bit 0, RO) — This bit indicates whether the digital thermal sensor high-temperature output
signal (PROCHOT#) is currently active. Bit 0 = 1 indicates the feature is active. This bit may not be written by
software; it reflects the state of the digital thermal sensor.

• Thermal Status Log (bit 1, R/WC0) — This is a sticky bit that indicates the history of the thermal sensor
high temperature output signal (PROCHOT#). Bit 1 = 1 if PROCHOT# has been asserted since a previous
RESET or the last time software cleared the bit. Software may clear this bit by writing a zero.

• PROCHOT# or FORCEPR# Event (bit 2, RO) — Indicates whether PROCHOT# or FORCEPR# is being
asserted by another agent on the platform.

Figure 14-31. IA32_THERM_STATUS Register (fields, from most significant to least significant: Reserved; Reading Valid; Resolution in Deg. Celsius; Digital Readout, bits 22:16; Power Limit Notification Log, bit 11; Power Limit Notification Status, bit 10; Thermal Threshold #2 Log, bit 9; Thermal Threshold #2 Status, bit 8; Thermal Threshold #1 Log, bit 7; Thermal Threshold #1 Status, bit 6; Critical Temperature Log, bit 5; Critical Temperature Status, bit 4; PROCHOT# or FORCEPR# Log, bit 3; PROCHOT# or FORCEPR# Event, bit 2; Thermal Status Log, bit 1; Thermal Status, bit 0)

• PROCHOT# or FORCEPR# Log (bit 3, R/WC0) — Sticky bit that indicates whether PROCHOT# or
FORCEPR# has been asserted by another agent on the platform since the last clearing of this bit or a reset. If
bit 3 = 1, PROCHOT# or FORCEPR# has been externally asserted. Software may clear this bit by writing a zero.
External PROCHOT# assertions are only acknowledged if the Bidirectional Prochot feature is enabled.

• Critical Temperature Status (bit 4, RO) — Indicates whether the critical temperature detector output signal
is currently active. If bit 4 = 1, the critical temperature detector output signal is currently active.

•

Critical Temperature Log (bit 5, R/WC0) — Sticky bit that indicates whether the critical temperature
detector output signal has been asserted since the last clearing of this bit or reset. If bit 5 = 1, the output
signal has been asserted. Software may clear this bit by writing a zero.

•

Thermal Threshold #1 Status (bit 6, RO) — Indicates whether the actual temperature is currently higher
than or equal to the value set in Thermal Threshold #1. If bit 6 = 0, the actual temperature is lower. If
bit 6 = 1, the actual temperature is greater than or equal to TT#1. Quantitative information of actual
temperature can be inferred from Digital Readout, bits 22:16.

•

Thermal Threshold #1 Log (bit 7, R/WC0) — Sticky bit that indicates whether the Thermal Threshold #1
has been reached since the last clearing of this bit or a reset. If bit 7 = 1, the Threshold #1 has been reached.
Software may clear this bit by writing a zero.

•

Thermal Threshold #2 Status (bit 8, RO) — Indicates whether actual temperature is currently higher than
or equal to the value set in Thermal Threshold #2. If bit 8 = 0, the actual temperature is lower. If bit 8 = 1, the
actual temperature is greater than or equal to TT#2. Quantitative information of actual temperature can be
inferred from Digital Readout, bits 22:16.

•

Thermal Threshold #2 Log (bit 9, R/WC0) — Sticky bit that indicates whether the Thermal Threshold #2
has been reached since the last clearing of this bit or a reset. If bit 9 = 1, the Thermal Threshold #2 has been
reached. Software may clear this bit by writing a zero.

•

Power Limitation Status (bit 10, RO) — Indicates whether the processor is currently operating below the OS-requested P-state (specified in IA32_PERF_CTL) or OS-requested clock modulation duty cycle (specified in
IA32_CLOCK_MODULATION). This field is supported only if CPUID.06H:EAX[bit 4] = 1. Package level power
limit notification can be delivered independently to IA32_PACKAGE_THERM_STATUS MSR.

•

Power Notification Log (bit 11, R/WC0) — Sticky bit that indicates the processor went below OS-requested
P-state or OS-requested clock modulation duty cycle since the last clearing of this bit or RESET. This field is
supported only if CPUID.06H:EAX[bit 4] = 1. Package level power limit notification is indicated independently
in IA32_PACKAGE_THERM_STATUS MSR.

•

Digital Readout (bits 22:16, RO) — Digital temperature reading in 1 degree Celsius relative to the TCC
activation temperature.
0: TCC Activation temperature,
1: (TCC Activation - 1) , etc. See the processor’s data sheet for details regarding TCC activation.
A lower reading in the Digital Readout field (bits 22:16) indicates a higher actual temperature.

•

Resolution in Degrees Celsius (bits 30:27, RO) — Specifies the resolution (or tolerance) of the digital
thermal sensor. The value is in degrees Celsius. It is recommended that new threshold values be offset from the
current temperature by at least the resolution + 1 in order to avoid hysteresis of interrupt generation.

•

Reading Valid (bit 31, RO) — Indicates if the digital readout in bits 22:16 is valid. The readout is valid if
bit 31 = 1.
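
The field layout above can be exercised with a short decoding sketch. The Python fragment below is illustrative only: the class and function names are hypothetical, the raw register value is assumed to have been read by some privileged mechanism not shown here, and the TCC activation temperature is an assumed caller-supplied parameter taken from the processor's data sheet.

```python
from typing import NamedTuple, Optional


class ThermStatus(NamedTuple):
    """Selected IA32_THERM_STATUS fields (hypothetical helper type)."""
    reading_valid: bool
    resolution_c: int
    readout_below_tcc: int
    temperature_c: Optional[int]
    thermal_status: bool
    thermal_status_log: bool


def decode_therm_status(value: int,
                        tcc_activation_c: Optional[int] = None) -> ThermStatus:
    """Decode a raw IA32_THERM_STATUS value.

    tcc_activation_c is an assumption supplied by the caller; the register
    itself only reports degrees below TCC activation.
    """
    valid = bool((value >> 31) & 1)       # Reading Valid, bit 31
    resolution = (value >> 27) & 0xF      # Resolution, bits 30:27
    readout = (value >> 16) & 0x7F        # Digital Readout, bits 22:16
    temperature = None
    if valid and tcc_activation_c is not None:
        # A lower readout means a higher actual temperature.
        temperature = tcc_activation_c - readout
    return ThermStatus(
        reading_valid=valid,
        resolution_c=resolution,
        readout_below_tcc=readout,
        temperature_c=temperature,
        thermal_status=bool(value & 1),             # bit 0
        thermal_status_log=bool((value >> 1) & 1),  # bit 1
    )
```

For a raw value of 0x881E0002 and an assumed TCC activation temperature of 100 degrees Celsius, the sketch reports a valid reading 30 degrees below TCC activation (70 degrees Celsius), a resolution of 1 degree, and a set Thermal Status Log bit.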

Changes to temperature can be detected using two thresholds (see Figure 14-32); one is set above and the other
below the current temperature. These thresholds have the capability of generating interrupts using the core's local
APIC which software must then service. Note that the local APIC entries used by these thresholds are also used by
the Intel® Thermal Monitor; it is up to software to determine the source of a specific interrupt.

[Register layout, high to low bits: Reserved; Power Limit Notification Enable; Threshold #2 Interrupt Enable; Threshold #2 Value; Threshold #1 Interrupt Enable; Threshold #1 Value; Overheat Interrupt Enable; FORCEPR# Interrupt Enable; PROCHOT# Interrupt Enable; Low Temp. Interrupt Enable; High Temp. Interrupt Enable]

Figure 14-32. IA32_THERM_INTERRUPT Register

See Figure 14-32 for the layout of IA32_THERM_INTERRUPT MSR. Bit fields include:

•

High-Temperature Interrupt Enable (bit 0, R/W) — This bit allows the BIOS to enable the generation of
an interrupt on the transition from low-temperature to a high-temperature threshold. Bit 0 = 0 (default)
disables interrupts; bit 0 = 1 enables interrupts.

•

Low-Temperature Interrupt Enable (bit 1, R/W) — This bit allows the BIOS to enable the generation of an
interrupt on the transition from high-temperature to a low-temperature (TCC de-activation). Bit 1 = 0 (default)
disables interrupts; bit 1 = 1 enables interrupts.

•

PROCHOT# Interrupt Enable (bit 2, R/W) — This bit allows the BIOS or OS to enable the generation of an
interrupt when PROCHOT# has been asserted by another agent on the platform and the Bidirectional Prochot
feature is enabled. Bit 2 = 0 disables the interrupt; bit 2 = 1 enables the interrupt.

•

FORCEPR# Interrupt Enable (bit 3, R/W) — This bit allows the BIOS or OS to enable the generation of an
interrupt when FORCEPR# has been asserted by another agent on the platform. Bit 3 = 0 disables the
interrupt; bit 3 = 1 enables the interrupt.

•

Critical Temperature Interrupt Enable (bit 4, R/W) — Enables the generation of an interrupt when the
Critical Temperature Detector has detected a critical thermal condition. The recommended response to this
condition is a system shutdown. Bit 4 = 0 disables the interrupt; bit 4 = 1 enables the interrupt.

•

Threshold #1 Value (bits 14:8, R/W) — A temperature threshold, encoded relative to the TCC Activation
temperature (using the same format as the Digital Readout). This threshold is compared against the Digital
Readout and is used to generate the Thermal Threshold #1 Status and Log bits as well as the Threshold #1
thermal interrupt delivery.

•

Threshold #1 Interrupt Enable (bit 15, R/W) — Enables the generation of an interrupt when the actual
temperature crosses the Threshold #1 setting in any direction. Bit 15 = 1 enables the interrupt; bit 15 = 0
disables the interrupt.

•

Threshold #2 Value (bits 22:16, R/W) — A temperature threshold, encoded relative to the TCC Activation
temperature (using the same format as the Digital Readout). This threshold is compared against the Digital
Readout and is used to generate the Thermal Threshold #2 Status and Log bits as well as the Threshold #2
thermal interrupt delivery.

•

Threshold #2 Interrupt Enable (bit 23, R/W) — Enables the generation of an interrupt when the actual
temperature crosses the Threshold #2 setting in any direction. Bit 23 = 1 enables the interrupt; bit 23 = 0
disables the interrupt.

•

Power Limit Notification Enable (bit 24, R/W) — Enables the generation of power notification events when
the processor operates below the OS-requested P-state or OS-requested clock modulation duty cycle. This field is
supported only if CPUID.06H:EAX[bit 4] = 1. Package level power limit notification can be enabled independently by IA32_PACKAGE_THERM_INTERRUPT MSR.
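
The two-threshold scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, and the margin of "resolution + 1" follows the recommendation given for the Resolution field of IA32_THERM_STATUS. Because thresholds use the same encoding as the Digital Readout, a smaller encoded value corresponds to a higher temperature.

```python
from typing import Tuple


def bracket_thresholds(readout: int, resolution: int) -> Tuple[int, int]:
    """Return (hotter, cooler) threshold encodings around the current readout.

    Both values are offset from the readout by resolution + 1 and clamped
    to the 7-bit field range.
    """
    margin = resolution + 1
    hotter = max(readout - margin, 0)      # smaller encoding = higher temperature
    cooler = min(readout + margin, 0x7F)   # larger encoding = lower temperature
    return hotter, cooler


def build_therm_interrupt(threshold1: int, threshold2: int,
                          enable1: bool = True, enable2: bool = True) -> int:
    """Compose the threshold fields of an IA32_THERM_INTERRUPT value."""
    for t in (threshold1, threshold2):
        if not 0 <= t <= 0x7F:
            raise ValueError("threshold must fit in 7 bits")
    value = threshold1 << 8          # Threshold #1 Value, bits 14:8
    value |= threshold2 << 16        # Threshold #2 Value, bits 22:16
    if enable1:
        value |= 1 << 15             # Threshold #1 Interrupt Enable
    if enable2:
        value |= 1 << 23             # Threshold #2 Interrupt Enable
    return value
```

With a readout of 30 and a resolution of 1, the sketch selects encodings 28 and 32 and produces the value 0xA09C00. Real software would normally merge these fields into the current register contents (read-modify-write) so that the other enable bits are preserved.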

14.7.6 Power Limit Notification

Platform firmware may be capable of specifying a power limit to restrict power delivered to a platform component,
such as a physical processor package. This constraint imposed by platform firmware may occasionally cause the
processor to operate below OS-requested P or T-state. A power limit notification event can be delivered using the
existing thermal LVT entry in the local APIC.
Software can enumerate the presence of the processor’s support for power limit notification by verifying
CPUID.06H:EAX[bit 4] = 1.
If CPUID.06H:EAX[bit 4] = 1, then IA32_THERM_INTERRUPT and IA32_THERM_STATUS provide the following
facilities to manage power limit notification:

•

Bits 10 and 11 in IA32_THERM_STATUS inform software when the processor operates below the OS-requested P-state or clock modulation duty cycle setting (see Figure 14-31).

•

Bit 24 in IA32_THERM_INTERRUPT enables the local APIC to deliver a thermal event when the processor goes
below the OS-requested P-state or clock modulation duty cycle setting (see Figure 14-32).
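
A handler for this notification might proceed as in the sketch below. The function names are hypothetical, and the CPUID and MSR values are assumed to be obtained elsewhere. The write value used to clear the log bit additionally assumes that writing 1 to an R/WC0 bit leaves it unchanged, so that only bit 11 is cleared; that behavior is an assumption of this example and should be confirmed for the target processor.

```python
from typing import Tuple

PLN_STATUS_BIT = 10   # Power Limitation Status (RO)
PLN_LOG_BIT = 11      # Power Notification Log (R/WC0)

# Sticky log bits described for IA32_THERM_STATUS: bits 1, 3, 5, 7, 9, 11.
LOG_BITS_MASK = sum(1 << b for b in (1, 3, 5, 7, 9, 11))


def supports_power_limit_notification(cpuid_06h_eax: int) -> bool:
    """Check CPUID.06H:EAX[bit 4]."""
    return bool((cpuid_06h_eax >> 4) & 1)


def power_limit_state(therm_status: int) -> Tuple[bool, bool]:
    """Return (currently_limited, limited_since_last_clear)."""
    current = bool((therm_status >> PLN_STATUS_BIT) & 1)
    logged = bool((therm_status >> PLN_LOG_BIT) & 1)
    return current, logged


def value_to_clear_pln_log() -> int:
    """Value to write so that bit 11 receives 0 and other log bits receive 1.

    Assumes a 1 written to an R/WC0 bit has no effect on that bit.
    """
    return LOG_BITS_MASK & ~(1 << PLN_LOG_BIT)
```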

14.8 PACKAGE LEVEL THERMAL MANAGEMENT

Thermal management facilities such as IA32_THERM_INTERRUPT and IA32_THERM_STATUS are often implemented with processor core granularity. To let software manage thermal events at package level granularity, two architectural MSRs are provided for package level thermal management. The
IA32_PACKAGE_THERM_STATUS and IA32_PACKAGE_THERM_INTERRUPT MSRs use interfaces similar to
IA32_THERM_STATUS and IA32_THERM_INTERRUPT, but are shared within each physical processor package.
Software can enumerate the presence of the processor’s support for the package level thermal management facility
(IA32_PACKAGE_THERM_STATUS and IA32_PACKAGE_THERM_INTERRUPT) by verifying CPUID.06H:EAX[bit 6] = 1.
The layout of IA32_PACKAGE_THERM_STATUS MSR is shown in Figure 14-33.

[Register layout, high to low bits: Reserved; PKG Digital Readout; PKG Power Limit Notification Log; PKG Power Limit Notification Status; PKG Thermal Threshold #2 Log; PKG Thermal Threshold #2 Status; PKG Thermal Threshold #1 Log; PKG Thermal Threshold #1 Status; PKG Critical Temperature Log; PKG Critical Temperature Status; PKG PROCHOT# or FORCEPR# Log; PKG PROCHOT# or FORCEPR# Event; PKG Thermal Status Log; PKG Thermal Status]

Figure 14-33. IA32_PACKAGE_THERM_STATUS Register

•

Package Thermal Status (bit 0, RO) — This bit indicates whether the digital thermal sensor high-temperature output signal (PROCHOT#) for the package is currently active. Bit 0 = 1 indicates the feature is
active. This bit may not be written by software; it reflects the state of the digital thermal sensor.

•

Package Thermal Status Log (bit 1, R/WC0) — This is a sticky bit that indicates the history of the thermal
sensor high temperature output signal (PROCHOT#) of the package. Bit 1 = 1 if package PROCHOT# has been
asserted since a previous RESET or the last time software cleared the bit. Software may clear this bit by writing
a zero.

•

Package PROCHOT# Event (bit 2, RO) — Indicates whether package PROCHOT# is being asserted by
another agent on the platform.

•

Package PROCHOT# Log (bit 3, R/WC0) — Sticky bit that indicates whether package PROCHOT# has been
asserted by another agent on the platform since the last clearing of this bit or a reset. If bit 3 = 1, package
PROCHOT# has been externally asserted. Software may clear this bit by writing a zero.

•

Package Critical Temperature Status (bit 4, RO) — Indicates whether the package critical temperature
detector output signal is currently active. If bit 4 = 1, the package critical temperature detector output signal
is currently active.

•

Package Critical Temperature Log (bit 5, R/WC0) — Sticky bit that indicates whether the package critical
temperature detector output signal has been asserted since the last clearing of this bit or reset. If bit 5 = 1, the
output signal has been asserted. Software may clear this bit by writing a zero.

•

Package Thermal Threshold #1 Status (bit 6, RO) — Indicates whether the actual package temperature is
currently higher than or equal to the value set in Package Thermal Threshold #1. If bit 6 = 0, the actual
temperature is lower. If bit 6 = 1, the actual temperature is greater than or equal to PTT#1. Quantitative
information of actual package temperature can be inferred from Package Digital Readout, bits 22:16.

•

Package Thermal Threshold #1 Log (bit 7, R/WC0) — Sticky bit that indicates whether the Package
Thermal Threshold #1 has been reached since the last clearing of this bit or a reset. If bit 7 = 1, the Package
Threshold #1 has been reached. Software may clear this bit by writing a zero.

•

Package Thermal Threshold #2 Status (bit 8, RO) — Indicates whether actual package temperature is
currently higher than or equal to the value set in Package Thermal Threshold #2. If bit 8 = 0, the actual
temperature is lower. If bit 8 = 1, the actual temperature is greater than or equal to PTT#2. Quantitative
information of actual temperature can be inferred from Package Digital Readout, bits 22:16.

•

Package Thermal Threshold #2 Log (bit 9, R/WC0) — Sticky bit that indicates whether the Package
Thermal Threshold #2 has been reached since the last clearing of this bit or a reset. If bit 9 = 1, the Package
Thermal Threshold #2 has been reached. Software may clear this bit by writing a zero.

•

Package Power Limitation Status (bit 10, RO) — Indicates package power limit is forcing one or more
processors to operate below OS-requested P-state. Note that package power limit violation may be caused by
processor cores or by devices residing in the uncore. Software can examine IA32_THERM_STATUS to
determine if the cause originates from a processor core (see Figure 14-31).

•

Package Power Notification Log (bit 11, R/WC0) — Sticky bit that indicates any processor in the package
went below OS-requested P-state or OS-requested clock modulation duty cycle since the last clearing of this bit or
RESET.

•

Package Digital Readout (bits 22:16, RO) — Package digital temperature reading in 1 degree Celsius
relative to the package TCC activation temperature.
0: Package TCC Activation temperature,
1: (PTCC Activation - 1) , etc. See the processor’s data sheet for details regarding PTCC activation.
A lower reading in the Package Digital Readout field (bits 22:16) indicates a higher actual temperature.

The layout of IA32_PACKAGE_THERM_INTERRUPT MSR is shown in Figure 14-34.

[Register layout, high to low bits: Reserved; Pkg Power Limit Notification Enable; Pkg Threshold #2 Interrupt Enable; Pkg Threshold #2 Value; Pkg Threshold #1 Interrupt Enable; Pkg Threshold #1 Value; Pkg Overheat Interrupt Enable; Pkg PROCHOT# Interrupt Enable; Pkg Low Temp. Interrupt Enable; Pkg High Temp. Interrupt Enable]

Figure 14-34. IA32_PACKAGE_THERM_INTERRUPT Register

•

Package High-Temperature Interrupt Enable (bit 0, R/W) — This bit allows the BIOS to enable the
generation of an interrupt on the transition from low-temperature to a package high-temperature threshold.
Bit 0 = 0 (default) disables interrupts; bit 0 = 1 enables interrupts.

•

Package Low-Temperature Interrupt Enable (bit 1, R/W) — This bit allows the BIOS to enable the
generation of an interrupt on the transition from high-temperature to a low-temperature (TCC de-activation).
Bit 1 = 0 (default) disables interrupts; bit 1 = 1 enables interrupts.

•

Package PROCHOT# Interrupt Enable (bit 2, R/W) — This bit allows the BIOS or OS to enable the
generation of an interrupt when Package PROCHOT# has been asserted by another agent on the platform and
the Bidirectional Prochot feature is enabled. Bit 2 = 0 disables the interrupt; bit 2 = 1 enables the interrupt.

•

Package Critical Temperature Interrupt Enable (bit 4, R/W) — Enables the generation of an interrupt
when the Package Critical Temperature Detector has detected a critical thermal condition. The recommended
response to this condition is a system shutdown. Bit 4 = 0 disables the interrupt; bit 4 = 1 enables the
interrupt.

•

Package Threshold #1 Value (bits 14:8, R/W) — A temperature threshold, encoded relative to the
Package TCC Activation temperature (using the same format as the Digital Readout). This threshold is
compared against the Package Digital Readout and is used to generate the Package Thermal Threshold #1
Status and Log bits as well as the Package Threshold #1 thermal interrupt delivery.

•

Package Threshold #1 Interrupt Enable (bit 15, R/W) — Enables the generation of an interrupt when the
actual temperature crosses the Package Threshold #1 setting in any direction. Bit 15 = 1 enables the interrupt;
bit 15 = 0 disables the interrupt.

•

Package Threshold #2 Value (bits 22:16, R/W) — A temperature threshold, encoded relative to the PTCC
Activation temperature (using the same format as the Package Digital Readout). This threshold is compared
against the Package Digital Readout and is used to generate the Package Thermal Threshold #2 Status and Log
bits as well as the Package Threshold #2 thermal interrupt delivery.

•

Package Threshold #2 Interrupt Enable (bit 23, R/W) — Enables the generation of an interrupt when the
actual temperature crosses the Package Threshold #2 setting in any direction. Bit 23 = 1 enables the interrupt;
bit 23 = 0 disables the interrupt.

•

Package Power Limit Notification Enable (bit 24, R/W) — Enables the generation of package power
notification events.

14.8.1 Support for Passive and Active cooling

Passive and active cooling may be controlled by the OS power management agent through ACPI control methods.
On platforms providing package level thermal management facility described in the previous section, it is recommended that active cooling (FAN control) should be driven by measuring the package temperature using the
IA32_PACKAGE_THERM_INTERRUPT MSR.

Passive cooling (frequency throttling) should be driven by measuring (a) the core and package temperatures, or
(b) only the package temperature. If the measured package temperature leads the power management agent to choose
which core executes passive cooling, then all cores need to execute passive cooling. Core temperature is
measured using the IA32_THERM_STATUS and IA32_THERM_INTERRUPT MSRs. The exact implementation
details depend on the platform firmware; possible solutions include defining two different thermal zones (one
for core temperature and passive cooling and the other for package temperature and active cooling).

14.9 PLATFORM SPECIFIC POWER MANAGEMENT SUPPORT

This section covers power management interfaces that are not architectural but address the power management
needs of several platform-specific components. Specifically, RAPL (Running Average Power Limit) interfaces provide
mechanisms to enforce power consumption limits. Power limiting has specific usages in client and server
platforms.
For client platform power limit control and for server platforms used in a data center, the following power and
thermal related usages are desirable:

•

Platform Thermal Management: Robust mechanisms to manage component, platform, and group-level
thermals, either proactively or reactively (e.g., in response to a platform-level thermal trip point).

•

Platform Power Limiting: More deterministic control over the system's power consumption, for example to
meet battery life targets or rack-level or container-level power consumption goals within a datacenter.

•

Power/Performance Budgeting: Efficient means to control the power consumed (and therefore the sustained
performance delivered) within and across platforms.

The server and client usage models are addressed by RAPL interfaces, which expose multiple domains of power
rationing within each processor socket. Generally, these RAPL domains may be viewed to include hierarchically:

•

Package domain is the processor die.

•

Memory domain includes the directly-attached DRAM; an additional power plane may constitute a separate
domain.

In order to manage the power consumed across multiple sockets via RAPL, individual limits must be programmed
for each processor complex. Programming specific RAPL domain across multiple sockets is not supported.

14.9.1 RAPL Interfaces

RAPL interfaces consist of non-architectural MSRs. Each RAPL domain supports the following set of capabilities,
some of which are optional as stated below.

•

Power limit - MSR interfaces to specify power limit, time window, lock bit, clamp bit, etc.

•

Energy Status - Power metering interface providing energy consumption information.

•

Perf Status (Optional) - Interface providing information on the performance effects (regression) due to power
limits. It is defined as a duration metric that measures the power limit effect in the respective domain. The
meaning of duration is domain specific.

•

Power Info (Optional) - Interface providing information on the range of parameters for a given domain,
minimum power, maximum power, etc.

•

Policy (Optional) - 4-bit priority information that is a hint to hardware for dividing budget between sub-domains
in a parent domain.

Each of the above capabilities requires specific units in order to describe them. Power is expressed in Watts, Time
is expressed in Seconds, and Energy is expressed in Joules. Scaling factors are supplied to each unit to make the
information presented meaningful in a finite number of bits. Units for power, energy, and time are exposed in the
read-only MSR_RAPL_POWER_UNIT MSR.

[Register layout, high to low bits: Reserved; Time units (bits 19:16); Reserved; Energy status units (bits 12:8); Reserved; Power units (bits 3:0)]

Figure 14-35. MSR_RAPL_POWER_UNIT Register

MSR_RAPL_POWER_UNIT (Figure 14-35) provides the following information across all RAPL domains:

•

Power Units (bits 3:0): Power related information (in Watts) is based on the multiplier, 1/ 2^PU; where PU is
an unsigned integer represented by bits 3:0. Default value is 0011b, indicating power unit is in 1/8 Watts
increment.

•

Energy Status Units (bits 12:8): Energy related information (in Joules) is based on the multiplier, 1/2^ESU;
where ESU is an unsigned integer represented by bits 12:8. Default value is 10000b, indicating energy status
unit is in 15.3 micro-Joules increment.

•

Time Units (bits 19:16): Time related information (in Seconds) is based on the multiplier, 1/ 2^TU; where TU
is an unsigned integer represented by bits 19:16. Default value is 1010b, indicating time unit is in 976 microseconds increment.
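
The three multipliers can be derived from a raw register value as in the following sketch. The function name and dictionary keys are hypothetical; the raw value is assumed to have been read from MSR_RAPL_POWER_UNIT by a mechanism not shown.

```python
from typing import Dict


def decode_rapl_power_unit(value: int) -> Dict[str, float]:
    """Convert MSR_RAPL_POWER_UNIT fields to multipliers in W, J, and s."""
    pu = value & 0xF             # Power Units, bits 3:0
    esu = (value >> 8) & 0x1F    # Energy Status Units, bits 12:8
    tu = (value >> 16) & 0xF     # Time Units, bits 19:16
    return {
        "power_w": 1.0 / (1 << pu),
        "energy_j": 1.0 / (1 << esu),
        "time_s": 1.0 / (1 << tu),
    }
```

Applied to 0xA1003, which encodes the default field values listed above (PU = 0011b, ESU = 10000b, TU = 1010b), the sketch yields 0.125 W, about 15.3 micro-Joules, and about 976 microseconds.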

14.9.2 RAPL Domains and Platform Specificity

The specific RAPL domains available in a platform vary across product segments. Platforms targeting the client
segment support the following RAPL domain hierarchy:

•

Package

•

Two power planes: PP0 and PP1 (PP1 may refer to uncore devices)

Platforms targeting the server segment support the following RAPL domain hierarchy:

•

Package

•

Power plane: PP0

•

DRAM

Each level of the RAPL hierarchy provides a respective set of RAPL interface MSRs. Table 14-4 lists the RAPL MSR
interfaces available for each RAPL domain. The power limit MSR of each RAPL domain is located at offset 0 relative
to an MSR base address which is non-architectural (see Chapter 2, “Model-Specific Registers (MSRs)” in the Intel®
64 and IA-32 Architectures Software Developer’s Manual, Volume 4). The energy status MSR of each domain is
located at offset 1 relative to the MSR base address of respective domain.

Table 14-4. RAPL MSR Interfaces and RAPL Domains

Domain  Power Limit           Energy Status           Policy          Perf Status           Power Info
        (Offset 0)            (Offset 1)              (Offset 2)      (Offset 3)            (Offset 4)
PKG     MSR_PKG_POWER_LIMIT   MSR_PKG_ENERGY_STATUS   RESERVED        MSR_PKG_PERF_STATUS   MSR_PKG_POWER_INFO
DRAM    MSR_DRAM_POWER_LIMIT  MSR_DRAM_ENERGY_STATUS  RESERVED        MSR_DRAM_PERF_STATUS  MSR_DRAM_POWER_INFO
PP0     MSR_PP0_POWER_LIMIT   MSR_PP0_ENERGY_STATUS   MSR_PP0_POLICY  MSR_PP0_PERF_STATUS   RESERVED
PP1     MSR_PP1_POWER_LIMIT   MSR_PP1_ENERGY_STATUS   MSR_PP1_POLICY  RESERVED              RESERVED

The presence of the optional MSR interfaces (the three right-most columns of Table 14-4) may be model-specific.
See Chapter 2, “Model-Specific Registers (MSRs)” in the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 4 for details.
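
The base-plus-offset arrangement of Table 14-4 can be expressed as a small lookup, sketched below. The names are hypothetical, and the base address is a caller-supplied assumption: actual MSR addresses are non-architectural and must be taken from Volume 4 for the processor in question. The sketch treats both the Perf Status and Power Info entries for PP1 as reserved.

```python
from typing import Optional

# Column offsets from Table 14-4.
OFFSETS = {
    "power_limit": 0,
    "energy_status": 1,
    "policy": 2,
    "perf_status": 3,
    "power_info": 4,
}

# Entries marked RESERVED in Table 14-4, per domain.
RESERVED = {
    "PKG": {"policy"},
    "DRAM": {"policy"},
    "PP0": {"power_info"},
    "PP1": {"perf_status", "power_info"},
}


def rapl_msr_address(domain: str, interface: str, base: int) -> Optional[int]:
    """Return base + offset, or None if the table marks the entry RESERVED.

    `base` is the model-specific address of the domain's power limit MSR.
    """
    if interface in RESERVED[domain]:
        return None
    return base + OFFSETS[interface]
```

For example, with an illustrative base of 0x610 for the package domain, the energy status interface would be at 0x611, and the policy interface is reported as unavailable. Presence of the optional interfaces still has to be confirmed per model.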

14.9.3 Package RAPL Domain

The MSR interfaces defined for the package RAPL domain are:

•

MSR_PKG_POWER_LIMIT allows software to set power limits for the package and measurement attributes
associated with each limit,

•

MSR_PKG_ENERGY_STATUS reports measured actual energy usage,

•

MSR_PKG_POWER_INFO reports the package power range information for RAPL usage.

MSR_PKG_PERF_STATUS can report the performance impact of power limiting, but its availability may be model-specific.

[Register layout, high to low bits: LOCK (bit 63); Time window Power Limit #2 (bits 55:49); Pkg clamping limit #2 (bit 48); Enable limit #2 (bit 47); Pkg Power Limit #2 (bits 46:32); Time window Power Limit #1 (bits 23:17); Pkg clamping limit #1 (bit 16); Enable limit #1 (bit 15); Pkg Power Limit #1 (bits 14:0)]

Figure 14-36. MSR_PKG_POWER_LIMIT Register

MSR_PKG_POWER_LIMIT allows a software agent to define power limitation for the package domain. Power limitation is defined in terms of average power usage (Watts) over a time window specified in MSR_PKG_POWER_LIMIT.
Two power limits can be specified, corresponding to time windows of different sizes. Each power limit provides
independent clamping control that permits the processor cores to go below the OS-requested state to meet the
power limits. A lock mechanism allows the software agent to enforce power limit settings. Once the lock bit is set,
the power limit settings are static and un-modifiable until next RESET.
The bit fields of MSR_PKG_POWER_LIMIT (Figure 14-36) are:

•

Package Power Limit #1 (bits 14:0): Sets the average power usage limit of the package domain corresponding to time window #1. The unit of this field is specified by the “Power Units” field of
MSR_RAPL_POWER_UNIT.

•

Enable Power Limit #1 (bit 15): 0 = disabled; 1 = enabled.

•

Package Clamping Limitation #1 (bit 16): Allow going below OS-requested P/T state setting during time
window specified by bits 23:17.

•

Time Window for Power Limit #1 (bits 23:17): Indicates the time window for power limit #1.
Time limit = 2^Y * (1.0 + Z/4.0) * Time_Unit
Here “Y” is the unsigned integer value represented by bits 21:17, “Z” is an unsigned integer represented by
bits 23:22. “Time_Unit” is specified by the “Time Units” field of MSR_RAPL_POWER_UNIT.

•

Package Power Limit #2 (bits 46:32): Sets the average power usage limit of the package domain corresponding to time window #2. The unit of this field is specified by the “Power Units” field of
MSR_RAPL_POWER_UNIT.

•

Enable Power Limit #2 (bit 47): 0 = disabled; 1 = enabled.

•

Package Clamping Limitation #2 (bit 48): Allow going below OS-requested P/T state setting during time
window specified by bits 55:49.

•

Time Window for Power Limit #2 (bits 55:49): Indicates the time window for power limit #2.
Time limit = 2^Y * (1.0 + Z/4.0) * Time_Unit
Here “Y” is the unsigned integer value represented by bits 53:49, “Z” is an unsigned integer represented by
bits 55:54. “Time_Unit” is specified by the “Time Units” field of MSR_RAPL_POWER_UNIT. This field may have
a hard-coded value in hardware and ignore values written by software.

•

Lock (bit 63): If set, all write attempts to this MSR are ignored until next RESET.
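
The bit fields above, including the time window formula, can be decoded as in the following sketch. The function names and dictionary keys are hypothetical; the power and time multipliers are assumed to come from MSR_RAPL_POWER_UNIT.

```python
from typing import Any, Dict


def decode_time_window(field: int, time_unit_s: float) -> float:
    """Apply Time limit = 2^Y * (1.0 + Z/4.0) * Time_Unit to a 7-bit field."""
    y = field & 0x1F          # low five bits of the field (e.g. bits 21:17)
    z = (field >> 5) & 0x3    # high two bits of the field (e.g. bits 23:22)
    return (2 ** y) * (1.0 + z / 4.0) * time_unit_s


def decode_pkg_power_limit(value: int, power_unit_w: float,
                           time_unit_s: float) -> Dict[str, Any]:
    """Decode both limits and the lock bit of MSR_PKG_POWER_LIMIT."""
    def limit(shift: int) -> Dict[str, Any]:
        raw = value >> shift
        return {
            "power_w": (raw & 0x7FFF) * power_unit_w,    # 15-bit power limit
            "enabled": bool((raw >> 15) & 1),
            "clamp": bool((raw >> 16) & 1),
            "window_s": decode_time_window((raw >> 17) & 0x7F, time_unit_s),
        }
    return {
        "limit1": limit(0),     # bits 23:0
        "limit2": limit(32),    # bits 55:32
        "locked": bool((value >> 63) & 1),
    }
```

With a power unit of 1/8 W and a time unit of 1/1024 s, a limit #1 field of 760 with Y = 10 and Z = 0 decodes to 95 W over a 1-second window.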

MSR_PKG_ENERGY_STATUS is a read-only MSR. It reports the actual energy use for the package domain. This MSR
is updated every ~1msec. It has a wraparound time of around 60 secs when power consumption is high, and may
be longer otherwise.
[Register layout, high to low bits: Reserved; Total Energy Consumed (bits 31:0)]

Figure 14-37. MSR_PKG_ENERGY_STATUS MSR

•

Total Energy Consumed (bits 31:0): The unsigned integer value represents the total amount of energy
consumed since the last time this register was cleared. The unit of this field is specified by the “Energy Status
Units” field of MSR_RAPL_POWER_UNIT.
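
Because the counter is 32 bits wide and wraps around, average power is usually computed from the difference of two samples. The sketch below is one way to do this; the function names are hypothetical, and it assumes that the counter wraps at most once between the two reads, which requires sampling more often than the wraparound time.

```python
def energy_delta_joules(prev_raw: int, curr_raw: int,
                        energy_unit_j: float) -> float:
    """Energy consumed between two reads of a 32-bit energy status counter.

    Masking the difference to 32 bits handles a single wraparound.
    """
    delta = (curr_raw - prev_raw) & 0xFFFFFFFF
    return delta * energy_unit_j


def average_power_watts(prev_raw: int, curr_raw: int, interval_s: float,
                        energy_unit_j: float) -> float:
    """Average power over the sampling interval, in Watts."""
    return energy_delta_joules(prev_raw, curr_raw, energy_unit_j) / interval_s
```

With the default energy unit of 2^-16 Joules, the full counter range corresponds to 65,536 Joules, which is consistent with a wraparound of roughly a minute at a sustained draw on the order of 1 kW.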

MSR_PKG_POWER_INFO is a read-only MSR. It reports the package power range information for RAPL usage. This
MSR provides maximum/minimum values (derived from electrical specification), thermal specification power of the
package domain. It also provides the largest possible time window for software to program the RAPL interface.

[Register layout, high to low bits: Reserved; Maximum Time window (bits 53:48); Reserved; Maximum Power (bits 46:32); Reserved; Minimum Power (bits 30:16); Reserved; Thermal Spec Power (bits 14:0)]

Figure 14-38. MSR_PKG_POWER_INFO Register

•

Thermal Spec Power (bits 14:0): The unsigned integer value is the equivalent of thermal specification power
of the package domain. The unit of this field is specified by the “Power Units” field of MSR_RAPL_POWER_UNIT.

•

Minimum Power (bits 30:16): The unsigned integer value is the equivalent of minimum power derived from
electrical spec of the package domain. The unit of this field is specified by the “Power Units” field of
MSR_RAPL_POWER_UNIT.

•

Maximum Power (bits 46:32): The unsigned integer value is the equivalent of maximum power derived from
the electrical spec of the package domain. The unit of this field is specified by the “Power Units” field of
MSR_RAPL_POWER_UNIT.

•

Maximum Time Window (bits 53:48): The unsigned integer value is the equivalent of largest acceptable
value to program the time window of MSR_PKG_POWER_LIMIT. The unit of this field is specified by the “Time
Units” field of MSR_RAPL_POWER_UNIT.
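
A decoding sketch for these four fields follows. The names are hypothetical, and the unit multipliers are assumed to come from MSR_RAPL_POWER_UNIT. The Maximum Time Window is scaled here as a plain multiple of the time unit, which is a literal reading of the description above and an assumption of this example rather than a confirmed encoding.

```python
from typing import Dict


def decode_pkg_power_info(value: int, power_unit_w: float,
                          time_unit_s: float) -> Dict[str, float]:
    """Decode MSR_PKG_POWER_INFO into Watts and seconds."""
    return {
        "thermal_spec_power_w": (value & 0x7FFF) * power_unit_w,   # bits 14:0
        "min_power_w": ((value >> 16) & 0x7FFF) * power_unit_w,    # bits 30:16
        "max_power_w": ((value >> 32) & 0x7FFF) * power_unit_w,    # bits 46:32
        # Assumption: the field is a linear count of time units.
        "max_time_window_s": ((value >> 48) & 0x3F) * time_unit_s, # bits 53:48
    }


def limit_in_range(limit_w: float, info: Dict[str, float]) -> bool:
    """Check a candidate power limit against the reported min/max power."""
    return info["min_power_w"] <= limit_w <= info["max_power_w"]
```

A software agent could use such a range check before programming MSR_PKG_POWER_LIMIT, for example rejecting a requested limit below the reported minimum power.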

MSR_PKG_PERF_STATUS is a read-only MSR. It reports the total time for which the package was throttled due to
the RAPL power limits. Throttling in this context is defined as going below the OS-requested P-state or T-state. It
has a wrap-around time of many hours. The availability of this MSR is platform specific (see Chapter 2, “Model-Specific Registers (MSRs)” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 4).
[Register layout, high to low bits: Reserved; Accumulated pkg throttled time (bits 31:0)]

Figure 14-39. MSR_PKG_PERF_STATUS MSR

•

Accumulated Package Throttled Time (bits 31:0): The unsigned integer value represents the cumulative
time (since the last time this register was cleared) that the package has been throttled. The unit of this field is specified
by the “Time Units” field of MSR_RAPL_POWER_UNIT.

14.9.4 PP0/PP1 RAPL Domains

The MSR interfaces defined for the PP0 and PP1 domains are identical in layout. Generally, PP0 refers to the
processor cores. The availability of the PP1 RAPL domain interface is platform-specific. For a client platform, the PP1
domain refers to the power plane of a specific device in the uncore. For server platforms, the PP1 domain is not
supported, but the PP0 domain supports the MSR_PP0_PERF_STATUS interface.

•

MSR_PP0_POWER_LIMIT/MSR_PP1_POWER_LIMIT allow software to set power limits for the respective power
plane domain.

•

MSR_PP0_ENERGY_STATUS/MSR_PP1_ENERGY_STATUS report actual energy usage on a power plane.

•

MSR_PP0_POLICY/MSR_PP1_POLICY allow software to adjust balance for respective power plane.

MSR_PP0_PERF_STATUS can report the performance impact of power limiting, but it is not available in client platforms.

[Register layout, high to low bits: Reserved; LOCK (bit 31); Reserved; Time window Power Limit (bits 23:17); Clamping limit (bit 16); Enable limit (bit 15); Power Limit (bits 14:0)]

Figure 14-40. MSR_PP0_POWER_LIMIT/MSR_PP1_POWER_LIMIT Register

MSR_PP0_POWER_LIMIT/MSR_PP1_POWER_LIMIT allow a software agent to define power limitation for the
respective power plane domain. A lock mechanism in each power plane domain allows the software agent to
enforce power limit settings independently. Once a lock bit is set, the power limit settings in that power plane are
static and un-modifiable until next RESET.
The bit fields of MSR_PP0_POWER_LIMIT/MSR_PP1_POWER_LIMIT (Figure 14-40) are:

•

Power Limit (bits 14:0): Sets the average power usage limit of the respective power plane domain. The unit
of this field is specified by the “Power Units” field of MSR_RAPL_POWER_UNIT.

•

Enable Power Limit (bit 15): 0 = disabled; 1 = enabled.

•

Clamping Limitation (bit 16): Allow going below OS-requested P/T state setting during time window specified
by bits 23:17.

•

Time Window for Power Limit (bits 23:17): Indicates the length of time window over which the power limit
will be used by the processor. The numeric value encoded by bits 23:17 is represented by the product of
2^Y * F; where F is a single-digit decimal floating-point value between 1.0 and 1.3 with the fraction digit
represented by bits 23:22, Y is an unsigned integer represented by bits 21:17. The unit of this field is specified
by the “Time Units” field of MSR_RAPL_POWER_UNIT.

•

Lock (bit 31): If set, all write attempts to the MSR and corresponding policy
MSR_PP0_POLICY/MSR_PP1_POLICY are ignored until next RESET.
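
Composing a value for these registers can be sketched as follows. The function names are hypothetical. The sketch reads the description of F literally, as F = 1.0 + Z/10 with Z taken from bits 23:22; this is an interpretation made for the example, and a given processor may instead follow the (1.0 + Z/4.0) form given for the package domain. The encoder simply picks the (Y, Z) pair whose window is closest to the requested duration.

```python
def encode_pp_time_window(target_s: float, time_unit_s: float) -> int:
    """Return the 7-bit field whose window 2^Y * F * unit is nearest target_s.

    Assumes F = 1.0 + Z/10 (a literal reading of the field description).
    """
    def window(y: int, z: int) -> float:
        return (2 ** y) * (10 + z) / 10 * time_unit_s

    y, z = min(((y, z) for y in range(32) for z in range(4)),
               key=lambda yz: abs(window(yz[0], yz[1]) - target_s))
    return (z << 5) | y


def build_pp_power_limit(power_w: float, window_s: float,
                         power_unit_w: float, time_unit_s: float,
                         enable: bool = True, clamp: bool = False,
                         lock: bool = False) -> int:
    """Compose an MSR_PP0_POWER_LIMIT/MSR_PP1_POWER_LIMIT value."""
    raw_power = round(power_w / power_unit_w)
    if not 0 <= raw_power <= 0x7FFF:
        raise ValueError("power limit does not fit in bits 14:0")
    value = raw_power                                              # bits 14:0
    value |= int(enable) << 15                                     # bit 15
    value |= int(clamp) << 16                                      # bit 16
    value |= encode_pp_time_window(window_s, time_unit_s) << 17    # bits 23:17
    value |= int(lock) << 31                                       # bit 31
    return value
```

With a power unit of 1/8 W and a time unit of 1/1024 s, a 20 W limit over a 1-second window encodes to 0x1480A0. Since setting the lock bit makes the settings un-modifiable until the next RESET, the sketch leaves it clear by default.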

MSR_PP0_ENERGY_STATUS/MSR_PP1_ENERGY_STATUS are read-only MSRs. They report the actual energy use
for the respective power plane domains. These MSRs are updated every ~1msec.
[Register layout, high to low bits: Reserved; Total Energy Consumed (bits 31:0)]

Figure 14-41. MSR_PP0_ENERGY_STATUS/MSR_PP1_ENERGY_STATUS MSR

•

Total Energy Consumed (bits 31:0): The unsigned integer value represents the total amount of energy
consumed since the last time this register was cleared. The unit of this field is specified by the “Energy Status
Units” field of MSR_RAPL_POWER_UNIT.
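Because the counter is a 32-bit unsigned value, software that samples it periodically usually works with differences between two readings. The sketch below is a hedged illustration: the function names are hypothetical, `energy_unit_joules` stands for the value software derives from the “Energy Status Units” field of MSR_RAPL_POWER_UNIT (not computed here), and the modulo-2^32 handling assumes the counter wraps at most once between samples.

```python
def energy_delta_joules(prev_raw: int, curr_raw: int, energy_unit_joules: float) -> float:
    """Energy consumed between two raw samples of the 32-bit counter.

    Assumes at most one wrap-around between the samples.
    """
    delta = (curr_raw - prev_raw) & 0xFFFFFFFF  # modulo 2^32 difference
    return delta * energy_unit_joules


def average_power_watts(prev_raw: int, curr_raw: int,
                        energy_unit_joules: float, interval_seconds: float) -> float:
    """Average power over the sampling interval (joules per second)."""
    return energy_delta_joules(prev_raw, curr_raw, energy_unit_joules) / interval_seconds
```

Sampling much more often than the counter's wrap-around period keeps the single-wrap assumption valid.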

MSR_PP0_POLICY/MSR_PP1_POLICY provide power-balancing policy control for each power plane by providing
inputs to the power budgeting management algorithm. On platforms that support PP0 (IA cores) and PP1 (uncore
graphics device), the default values give priority to the non-IA power plane. These MSRs enable the PCU to balance
power consumption between the IA cores and the uncore graphics device.

[Figure: bits 4:0 hold Priority Level.]

Figure 14-42. MSR_PP0_POLICY/MSR_PP1_POLICY Register

•

Priority Level (bits 4:0): Priority level input to the PCU for respective power plane. PP0 covers the IA
processor cores, PP1 covers the uncore graphic device. The value 31 is considered highest priority.

MSR_PP0_PERF_STATUS is a read-only MSR. It reports the total time for which the PP0 domain was throttled due
to the power limits. This MSR is supported only on server platforms. Throttling in this context is defined as going
below the OS-requested P-state or T-state.

[Figure: bits 63:32 are Reserved; bits 31:0 hold Accumulated PP0 Throttled Time.]

Figure 14-43. MSR_PP0_PERF_STATUS MSR

•

Accumulated PP0 Throttled Time (bits 31:0): The unsigned integer value represents the cumulative time
(since the last time this register was cleared) that the PP0 domain has been throttled. The unit of this field is specified
by the “Time Units” field of MSR_RAPL_POWER_UNIT.

14.9.5 DRAM RAPL Domain

The MSR interfaces defined for the DRAM domain are supported only on server platforms. The MSR interfaces
are:

• MSR_DRAM_POWER_LIMIT allows software to set power limits for the DRAM domain and measurement
attributes associated with each limit.
• MSR_DRAM_ENERGY_STATUS reports measured actual energy usage.
• MSR_DRAM_POWER_INFO reports the DRAM domain power range information for RAPL usage.
• MSR_DRAM_PERF_STATUS can report the performance impact of power limiting.

[Figure: bit 31 LOCK; bits 23:17 Time window Power Limit; bit 16 Clamping limit; bit 15 Enable limit; bits 14:0 Power Limit.]

Figure 14-44. MSR_DRAM_POWER_LIMIT Register

MSR_DRAM_POWER_LIMIT allows a software agent to define power limitation for the DRAM domain. Power limitation is defined in terms of average power usage (Watts) over a time window specified in
MSR_DRAM_POWER_LIMIT. A power limit can be specified along with a time window. A lock mechanism allows the
software agent to enforce power limit settings. Once the lock bit is set, the power limit settings are static and unmodifiable until the next RESET.
The bit fields of MSR_DRAM_POWER_LIMIT (Figure 14-44) are:

• DRAM Power Limit #1 (bits 14:0): Sets the average power usage limit of the DRAM domain corresponding to
time window #1. The unit of this field is specified by the “Power Units” field of MSR_RAPL_POWER_UNIT.

• Enable Power Limit #1 (bit 15): 0 = disabled; 1 = enabled.

• Time Window for Power Limit (bits 23:17): Indicates the length of the time window over which the power limit
will be used by the processor. The numeric value encoded by bits 23:17 is represented by the product
2^Y * F, where F is a single-digit decimal floating-point value between 1.0 and 1.3 with the fraction digit
represented by bits 23:22, and Y is an unsigned integer represented by bits 21:17. The unit of this field is specified
by the “Time Units” field of MSR_RAPL_POWER_UNIT.

• Lock (bit 31): If set, all write attempts to this MSR are ignored until the next RESET.

MSR_DRAM_ENERGY_STATUS is a read-only MSR. It reports the actual energy use for the DRAM domain. This MSR
is updated every ~1 msec.

[Figure: bits 63:32 are Reserved; bits 31:0 hold Total Energy Consumed.]

Figure 14-45. MSR_DRAM_ENERGY_STATUS MSR

MSR_DRAM_POWER_INFO is a read-only MSR. It reports the DRAM power range information for RAPL usage. This
MSR provides maximum/minimum values (derived from electrical specification), thermal specification power of the
DRAM domain. It also provides the largest possible time window for software to program the RAPL interface.

[Figure: bits 53:48 Maximum Time Window; bits 46:32 Maximum Power; bits 30:16 Minimum Power; bits 14:0 Thermal Spec Power.]

Figure 14-46. MSR_DRAM_POWER_INFO Register

•

Thermal Spec Power (bits 14:0): The unsigned integer value is the equivalent of thermal specification power
of the DRAM domain. The unit of this field is specified by the “Power Units” field of MSR_RAPL_POWER_UNIT.

•

Minimum Power (bits 30:16): The unsigned integer value is the equivalent of minimum power derived from
electrical spec of the DRAM domain. The unit of this field is specified by the “Power Units” field of
MSR_RAPL_POWER_UNIT.

•

Maximum Power (bits 46:32): The unsigned integer value is the equivalent of maximum power derived from
the electrical spec of the DRAM domain. The unit of this field is specified by the “Power Units” field of
MSR_RAPL_POWER_UNIT.

•

Maximum Time Window (bits 53:48): The unsigned integer value is the equivalent of largest acceptable
value to program the time window of MSR_DRAM_POWER_LIMIT. The unit of this field is specified by the “Time
Units” field of MSR_RAPL_POWER_UNIT.
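The four fields listed above can be extracted from a raw 64-bit reading as follows. This is a minimal sketch; the function name and dictionary keys are hypothetical, and the values are returned as raw counts in the units named by MSR_RAPL_POWER_UNIT.

```python
def decode_dram_power_info(msr_value: int) -> dict:
    """Split a raw MSR_DRAM_POWER_INFO value into its documented fields."""
    return {
        "thermal_spec_power": msr_value & 0x7FFF,         # bits 14:0
        "minimum_power": (msr_value >> 16) & 0x7FFF,       # bits 30:16
        "maximum_power": (msr_value >> 32) & 0x7FFF,       # bits 46:32
        "maximum_time_window": (msr_value >> 48) & 0x3F,   # bits 53:48
    }
```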

MSR_DRAM_PERF_STATUS is a read-only MSR. It reports the total time for which the DRAM domain was throttled due to
the RAPL power limits. Throttling in this context is defined as going below the OS-requested P-state or T-state. It
has a wrap-around time of many hours. The availability of this MSR is platform specific (see Chapter 2, “Model-Specific Registers (MSRs)” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 4).

[Figure: bits 63:32 are Reserved; bits 31:0 hold Accumulated DRAM Throttled Time.]

Figure 14-47. MSR_DRAM_PERF_STATUS MSR

• Accumulated DRAM Throttled Time (bits 31:0): The unsigned integer value represents the cumulative
time (since the last time this register was cleared) that the DRAM domain has been throttled. The unit of this field is
specified by the “Time Units” field of MSR_RAPL_POWER_UNIT.

CHAPTER 15
MACHINE-CHECK ARCHITECTURE
This chapter describes the machine-check architecture and machine-check exception mechanism found in the
Pentium 4, Intel Xeon, Intel Atom, and P6 family processors. See Chapter 6, “Interrupt 18—Machine-Check Exception (#MC),” for more information on machine-check exceptions. A brief description of the Pentium processor’s
machine check capability is also given.
Additionally, this chapter covers a signaling mechanism that lets software respond to hardware-corrected machine-check errors.

15.1 MACHINE-CHECK ARCHITECTURE

The Pentium 4, Intel Xeon, Intel Atom, and P6 family processors implement a machine-check architecture that
provides a mechanism for detecting and reporting hardware (machine) errors, such as: system bus errors, ECC
errors, parity errors, cache errors, and TLB errors. It consists of a set of model-specific registers (MSRs) that are
used to set up machine checking and additional banks of MSRs used for recording errors that are detected.
The processor signals the detection of an uncorrected machine-check error by generating a machine-check exception (#MC), which is an abort class exception. The implementation of the machine-check architecture does not
ordinarily permit the processor to be restarted reliably after generating a machine-check exception. However, the
machine-check-exception handler can collect information about the machine-check error from the machine-check
MSRs.
Starting with 45 nm Intel 64 processors on which CPUID reports DisplayFamily_DisplayModel as 06H_1AH (see
CPUID instruction in Chapter 3, “Instruction Set Reference, A-L” in the Intel® 64 and IA-32 Architectures Software
Developer’s Manual, Volume 2A), the processor can report information on corrected machine-check errors and
deliver a programmable interrupt for software to respond to MC errors, referred to as the corrected machine-check
error interrupt (CMCI). See Section 15.5 for details.
Intel 64 processors supporting machine-check architecture and CMCI may also support an additional enhancement, namely, support for software recovery from certain uncorrected recoverable machine check errors. See
Section 15.6 for details.

15.2 COMPATIBILITY WITH PENTIUM PROCESSOR

The Pentium 4, Intel Xeon, Intel Atom, and P6 family processors support and extend the machine-check exception
mechanism introduced in the Pentium processor. The Pentium processor reports the following machine-check
errors:

• data parity errors during read cycles
• unsuccessful completion of a bus cycle

The above errors are reported using the P5_MC_TYPE and P5_MC_ADDR MSRs (implementation specific for the
Pentium processor). Use the RDMSR instruction to read these MSRs. See Chapter 2, “Model-Specific Registers
(MSRs)” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 4 for the addresses.
The machine-check error reporting mechanism that Pentium processors use is similar to that used in Pentium 4,
Intel Xeon, Intel Atom, and P6 family processors. When an error is detected, it is recorded in P5_MC_TYPE and
P5_MC_ADDR; the processor then generates a machine-check exception (#MC).
See Section 15.3.3, “Mapping of the Pentium Processor Machine-Check Errors to the Machine-Check Architecture,”
and Section 15.10.2, “Pentium Processor Machine-Check Exception Handling,” for information on compatibility
between machine-check code written to run on the Pentium processors and code written to run on P6 family
processors.

15.3 MACHINE-CHECK MSRS

Machine check MSRs in the Pentium 4, Intel Atom, Intel Xeon, and P6 family processors consist of a set of global
control and status registers and several error-reporting register banks. See Figure 15-1.

[Figure: two groups of 64-bit registers (bits 63:0). Global Control MSRs: IA32_MCG_CAP, IA32_MCG_STATUS, IA32_MCG_CTL, IA32_MCG_EXT_CTL. Error-Reporting Bank Registers (one set for each hardware unit): IA32_MCi_CTL, IA32_MCi_STATUS, IA32_MCi_ADDR, IA32_MCi_MISC, IA32_MCi_CTL2.]

Figure 15-1. Machine-Check MSRs
Each error-reporting bank is associated with a specific hardware unit (or group of hardware units) in the processor.
Use RDMSR and WRMSR to read and to write these registers.

15.3.1 Machine-Check Global Control MSRs

The machine-check global control MSRs include the IA32_MCG_CAP, IA32_MCG_STATUS, and optionally
IA32_MCG_CTL and IA32_MCG_EXT_CTL. See Chapter 2, “Model-Specific Registers (MSRs)” in the Intel® 64 and
IA-32 Architectures Software Developer’s Manual, Volume 4 for the addresses of these registers.

15.3.1.1 IA32_MCG_CAP MSR

The IA32_MCG_CAP MSR is a read-only register that provides information about the machine-check architecture of
the processor. Figure 15-2 shows the layout of the register.

[Figure: bits 7:0 Count; bit 8 MCG_CTL_P; bit 9 MCG_EXT_P; bit 10 MCG_CMCI_P; bit 11 MCG_TES_P; bits 15:12 Reserved; bits 23:16 MCG_EXT_CNT; bit 24 MCG_SER_P; bit 25 MCG_EMC_P; bit 26 MCG_ELOG_P; bit 27 MCG_LMCE_P; bits above 27 Reserved.]

Figure 15-2. IA32_MCG_CAP Register
Where:

•

Count field, bits 7:0 — Indicates the number of hardware unit error-reporting banks available in a particular
processor implementation.

•

MCG_CTL_P (control MSR present) flag, bit 8 — Indicates that the processor implements the
IA32_MCG_CTL MSR when set; this register is absent when clear.

•

MCG_EXT_P (extended MSRs present) flag, bit 9 — Indicates that the processor implements the extended
machine-check state registers found starting at MSR address 180H; these registers are absent when clear.

•

MCG_CMCI_P (Corrected MC error counting/signaling extension present) flag, bit 10 — Indicates
(when set) that the extended state and associated MSRs necessary to support the reporting of an interrupt on a
corrected MC error event and/or count threshold of corrected MC errors are present. A set bit does
not imply that this feature is supported across all banks. Software should check the availability of the necessary
logic on a bank-by-bank basis when using this signaling capability (i.e., bit 30 is settable in the individual
IA32_MCi_CTL2 register).

•

MCG_TES_P (threshold-based error status present) flag, bit 11 — Indicates (when set) that bits 56:53
of the IA32_MCi_STATUS MSR are part of the architectural space. Bits 56:55 are reserved, and bits 54:53 are
used to report threshold-based error status. Note that when MCG_TES_P is not set, bits 56:53 of the
IA32_MCi_STATUS MSR are model-specific.

•

MCG_EXT_CNT, bits 23:16 — Indicates the number of extended machine-check state registers present. This
field is meaningful only when the MCG_EXT_P flag is set.

•

MCG_SER_P (software error recovery support present) flag, bit 24 — Indicates (when set) that the
processor supports software error recovery (see Section 15.6), and IA32_MCi_STATUS MSR bits 56:55 are
used to report the signaling of uncorrected recoverable errors and whether software must take recovery
actions for uncorrected errors. Note that when MCG_TES_P is not set, bits 56:53 of the IA32_MCi_STATUS MSR
are model-specific. If MCG_TES_P is set but MCG_SER_P is not set, bits 56:55 are reserved.

•

MCG_EMC_P (Enhanced Machine Check Capability) flag, bit 25 — Indicates (when set) that the
processor supports enhanced machine check capabilities for firmware first signaling.

•

MCG_ELOG_P (extended error logging) flag, bit 26 — Indicates (when set) that the processor allows
platform firmware to be invoked when an error is detected so that it may provide additional platform specific
information in an ACPI format “Generic Error Data Entry” that augments the data included in machine check
bank registers.
For additional information about extended error logging interface, see
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/enhanced-mca-loggingxeon-paper.pdf.

•

MCG_LMCE_P (local machine check exception) flag, bit 27 — Indicates (when set) that the following
interfaces are present:
— an extended state LMCE_S (located in bit 3 of IA32_MCG_STATUS), and
— the IA32_MCG_EXT_CTL MSR, necessary to support Local Machine Check Exception (LMCE).
A non-zero MCG_LMCE_P indicates that, when LMCE is enabled as described in Section 15.3.1.5, some machine
check errors may be delivered to only a single logical processor.

The effect of writing to the IA32_MCG_CAP MSR is undefined.
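As a quick illustration of the capability bits described above, the following Python sketch decodes a raw IA32_MCG_CAP value. The function and dictionary names are hypothetical; the bit positions are taken from the list above, and MCG_EXT_CNT is reported only when MCG_EXT_P is set, since the field is meaningful only in that case.

```python
# Bit positions of the single-bit capability flags, as listed in the text.
MCG_CAP_FLAGS = {
    "MCG_CTL_P": 8,
    "MCG_EXT_P": 9,
    "MCG_CMCI_P": 10,
    "MCG_TES_P": 11,
    "MCG_SER_P": 24,
    "MCG_EMC_P": 25,
    "MCG_ELOG_P": 26,
    "MCG_LMCE_P": 27,
}


def decode_mcg_cap(value: int) -> dict:
    """Decode a raw IA32_MCG_CAP reading into named fields (hypothetical helper)."""
    caps = {name: bool((value >> bit) & 1) for name, bit in MCG_CAP_FLAGS.items()}
    caps["Count"] = value & 0xFF  # bits 7:0, number of error-reporting banks
    # Bits 23:16 are meaningful only when MCG_EXT_P is set.
    caps["MCG_EXT_CNT"] = (value >> 16) & 0xFF if caps["MCG_EXT_P"] else None
    return caps
```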

15.3.1.2 IA32_MCG_STATUS MSR

The IA32_MCG_STATUS MSR describes the current state of the processor after a machine-check exception has
occurred (see Figure 15-3).

[Figure: bits 63:4 are Reserved. The low four bits are:]
LMCE_S (bit 3): Local machine check exception signaled
MCIP (bit 2): Machine check in progress flag
EIPV (bit 1): Error IP valid flag
RIPV (bit 0): Restart IP valid flag

Figure 15-3. IA32_MCG_STATUS Register
Where:

•

RIPV (restart IP valid) flag, bit 0 — Indicates (when set) that program execution can be restarted reliably
at the instruction pointed to by the instruction pointer pushed on the stack when the machine-check exception
is generated. When clear, the program cannot be reliably restarted at the pushed instruction pointer.

•

EIPV (error IP valid) flag, bit 1 — Indicates (when set) that the instruction pointed to by the instruction
pointer pushed onto the stack when the machine-check exception is generated is directly associated with the
error. When this flag is cleared, the instruction pointed to may not be associated with the error.

•

MCIP (machine check in progress) flag, bit 2 — Indicates (when set) that a machine-check exception was
generated. Software can set or clear this flag. The occurrence of a second Machine-Check Event while MCIP is
set will cause the processor to enter a shutdown state. For information on processor behavior in the shutdown
state, please refer to the description in Chapter 6, “Interrupt and Exception Handling”: “Interrupt 8—Double
Fault Exception (#DF)”.

•

LMCE_S (local machine check exception signaled), bit 3 — Indicates (when set) that a local machine-check exception was generated. This indicates that the current machine-check event was delivered to only this
logical processor.

Bits 63:4 in IA32_MCG_STATUS are reserved. An attempt to write to IA32_MCG_STATUS with any value other
than 0 results in #GP.
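A machine-check handler typically starts by reading these four flags. The sketch below shows one way to extract them; the function name and the returned dictionary are hypothetical conveniences, not an interface defined by this manual.

```python
def decode_mcg_status(value: int) -> dict:
    """Extract the four defined flags from a raw IA32_MCG_STATUS value."""
    return {
        "RIPV": bool(value & 1),           # bit 0, restart IP valid
        "EIPV": bool((value >> 1) & 1),    # bit 1, error IP valid
        "MCIP": bool((value >> 2) & 1),    # bit 2, machine check in progress
        "LMCE_S": bool((value >> 3) & 1),  # bit 3, local machine check signaled
    }
```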

15.3.1.3 IA32_MCG_CTL MSR

The IA32_MCG_CTL MSR is present if the capability flag MCG_CTL_P is set in the IA32_MCG_CAP MSR.
IA32_MCG_CTL controls the reporting of machine-check exceptions. If present, writing all 1s to this register enables
machine-check features and writing all 0s disables machine-check features. All other values are undefined and/or
implementation specific.

15.3.1.4 IA32_MCG_EXT_CTL MSR

The IA32_MCG_EXT_CTL MSR is present if the capability flag MCG_LMCE_P is set in the IA32_MCG_CAP MSR.
IA32_MCG_EXT_CTL.LMCE_EN (bit 0) allows the processor to signal some MCEs to only a single logical processor
in the system.
If MCG_LMCE_P is not set in IA32_MCG_CAP, or platform software has not enabled LMCE by setting
IA32_FEATURE_CONTROL.LMCE_ON (bit 20), any attempt to write or read IA32_MCG_EXT_CTL will result in #GP.
The IA32_MCG_EXT_CTL MSR is cleared on RESET.
Figure 15-4 shows the layout of the IA32_MCG_EXT_CTL register.

[Figure: bit 0 is LMCE_EN, the system software control to enable/disable LMCE; all higher bits are Reserved.]

Figure 15-4. IA32_MCG_EXT_CTL Register
Where:

•

LMCE_EN (local machine check exception enable) flag, bit 0 - System software sets this to allow
hardware to signal some MCEs to only a single logical processor. System software can set LMCE_EN only if the
platform software has configured IA32_FEATURE_CONTROL as described in Section 15.3.1.5.

15.3.1.5 Enabling Local Machine Check

The intended usage of LMCE requires proper configuration by both platform software and system software. Platform software can turn LMCE on by setting bit 20 (LMCE_ON) in IA32_FEATURE_CONTROL MSR (MSR address
3AH).
System software must ensure that both IA32_FEATURE_CONTROL.Lock (bit 0) and
IA32_FEATURE_CONTROL.LMCE_ON (bit 20) are set before attempting to set IA32_MCG_EXT_CTL.LMCE_EN (bit
0). When system software has enabled LMCE, then hardware will determine if a particular error can be delivered
only to a single logical processor. Software should make no assumptions about the type of error that hardware can
choose to deliver as LMCE. The severity and override rules stay the same as described in Table 15-8 to determine
the recovery actions.
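The preconditions described in this section can be expressed as a single predicate over two raw MSR values. This is a sketch under stated assumptions: the function name is hypothetical, and the caller is assumed to have already read IA32_MCG_CAP and IA32_FEATURE_CONTROL (MSR address 3AH) by some platform-specific means that is not shown.

```python
def may_set_lmce_en(mcg_cap: int, feature_control: int) -> bool:
    """Return True when system software may attempt to set LMCE_EN.

    Checks MCG_LMCE_P (IA32_MCG_CAP bit 27) and both the Lock (bit 0)
    and LMCE_ON (bit 20) bits of IA32_FEATURE_CONTROL.
    """
    lmce_present = bool((mcg_cap >> 27) & 1)
    locked = bool(feature_control & 1)
    lmce_on = bool((feature_control >> 20) & 1)
    return lmce_present and locked and lmce_on
```

Checking before the write avoids the #GP that accessing IA32_MCG_EXT_CTL raises when the capability is absent or LMCE has not been turned on by platform software.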

15.3.2 Error-Reporting Register Banks

Each error-reporting register bank can contain the IA32_MCi_CTL, IA32_MCi_STATUS, IA32_MCi_ADDR, and
IA32_MCi_MISC MSRs. The number of reporting banks is indicated by bits [7:0] of IA32_MCG_CAP MSR (address
0179H). The first error-reporting register (IA32_MC0_CTL) always starts at address 400H.
See Chapter 2, “Model-Specific Registers (MSRs)” in the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 4 for addresses of the error-reporting registers in the Pentium 4, Intel Atom, and Intel Xeon
processors, and for addresses of the error-reporting registers in P6 family processors.
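To enumerate the banks, software reads the count from IA32_MCG_CAP and derives each bank's register addresses from the 400H base. The sketch below assumes the commonly documented layout of four consecutive MSRs per bank in the order CTL, STATUS, ADDR, MISC; that stride and ordering are assumptions here and should be confirmed against the Volume 4 register tables for the target processor (some P6 family banks are known to differ). The function names are hypothetical.

```python
MC0_CTL_ADDRESS = 0x400  # address of IA32_MC0_CTL, as stated in the text


def bank_count(mcg_cap: int) -> int:
    """Number of error-reporting banks, from IA32_MCG_CAP bits 7:0."""
    return mcg_cap & 0xFF


def bank_msr_addresses(bank: int) -> dict:
    """MSR addresses for one bank, assuming a stride of four registers per bank."""
    base = MC0_CTL_ADDRESS + 4 * bank
    return {"CTL": base, "STATUS": base + 1, "ADDR": base + 2, "MISC": base + 3}
```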

15.3.2.1 IA32_MCi_CTL MSRs

The IA32_MCi_CTL MSR controls signaling of #MC for errors produced by a particular hardware unit (or group of
hardware units). Each of the 64 flags (EEj) represents a potential error. Setting an EEj flag enables signaling #MC
of the associated error and clearing it disables signaling of the error. Error logging happens regardless of the setting
of these bits. The processor drops writes to bits that are not implemented. Figure 15-5 shows the bit fields of
IA32_MCi_CTL.

[Figure: 64 single-bit flags, EE63 (bit 63) down to EE00 (bit 0). EEj is the error reporting enable flag, where j is 00 through 63.]

Figure 15-5. IA32_MCi_CTL Register

NOTE
For P6 family processors and processors based on Intel Core microarchitecture (excluding those on
which CPUID reports DisplayFamily_DisplayModel as 06H_1AH and onward): the
operating system or executive software must not modify the contents of the IA32_MC0_CTL MSR.
This MSR is internally aliased to the EBL_CR_POWERON MSR and controls platform-specific error
handling features. System specific firmware (the BIOS) is responsible for the appropriate initialization of the IA32_MC0_CTL MSR. P6 family processors only allow the writing of all 1s or all 0s to
the IA32_MCi_CTL MSR.

15.3.2.2 IA32_MCi_STATUS MSRS

Each IA32_MCi_STATUS MSR contains information related to a machine-check error if its VAL (valid) flag is set (see
Figure 15-6). Software is responsible for clearing IA32_MCi_STATUS MSRs by explicitly writing 0s to them; writing
1s to them causes a general-protection exception.

NOTE
Figure 15-6 depicts the IA32_MCi_STATUS MSR when IA32_MCG_CAP[24] = 1,
IA32_MCG_CAP[11] = 1 and IA32_MCG_CAP[10] = 1. When IA32_MCG_CAP[24] = 0 and
IA32_MCG_CAP[11] = 1, bits 56:55 are reserved and bits 54:53 are used for threshold-based error reporting.
When IA32_MCG_CAP[11] = 0, bits 56:53 are part of the “Other Information” field. The use of bits
54:53 for threshold-based error reporting began with Intel Core Duo processors, and is currently
used for cache memory. See Section 15.4, “Enhanced Cache Error reporting,” for more information.
When IA32_MCG_CAP[10] = 0, bits 52:38 are part of the “Other Information” field. The use of bits
52:38 for corrected MC error count is introduced with Intel 64 processor on which CPUID reports
DisplayFamily_DisplayModel as 06H_1AH.
Where:

•

MCA (machine-check architecture) error code field, bits 15:0 — Specifies the machine-check architecture-defined error code for the machine-check error condition detected. The machine-check architecture-defined error codes are guaranteed to be the same for all IA-32 processors that implement the machine-check
architecture. See Section 15.9, “Interpreting the MCA Error Codes,” and Chapter 16, “Interpreting Machine-Check Error Codes”, for information on machine-check error codes.

•

Model-specific error code field, bits 31:16 — Specifies the model-specific error code that uniquely
identifies the machine-check error condition detected. The model-specific error codes may differ among IA-32
processors for the same machine-check error condition. See Chapter 16, “Interpreting Machine-Check Error
Codes” for information on model-specific error codes.

•

Reserved, Error Status, and Other Information fields, bits 56:32 —

  • If IA32_MCG_CAP.MCG_EMC_P[bit 25] is 0, bits 37:32 contain “Other Information” that is implementation-specific and is not part of the machine-check architecture.

  • If IA32_MCG_CAP.MCG_EMC_P is 1, “Other Information” is in bits 36:32. If bit 37 is 0, system firmware
    has not changed the contents of IA32_MCi_STATUS. If bit 37 is 1, system firmware may have edited the
    contents of IA32_MCi_STATUS.

  • If IA32_MCG_CAP.MCG_CMCI_P[bit 10] is 0, bits 52:38 also contain “Other Information” (in the same
    sense as bits 37:32).

[Figure: IA32_MCi_STATUS bit layout, from bit 63 down to bit 0:]
VAL (63): MCi_STATUS register valid
OVER (62): Error overflow
UC (61): Uncorrected error
EN (60): Error reporting enabled
MISCV (59): MCi_MISC register valid
ADDRV (58): MCi_ADDR register valid
PCC (57): Processor context corrupted
S (56): Signaling an uncorrected recoverable (UCR) error***
AR (55): Recovery action required for UCR error***
Threshold-based error status (54:53)**
Corrected Error Count (52:38)
Firmware updated error status indicator (37)*
Other Info (36:32)
MSCOD Model Specific Error Code (31:16)
MCA Error Code (15:0)
* When IA32_MCG_CAP[25] (MCG_EMC_P) is set, bit 37 is not part of “Other Information”.
** When IA32_MCG_CAP[11] (MCG_TES_P) is not set, these bits are model-specific
(part of “Other Information”).
*** When IA32_MCG_CAP[11] or IA32_MCG_CAP[24] are not set, these bits are reserved, or
model-specific (part of “Other Information”).

Figure 15-6. IA32_MCi_STATUS Register

  • If IA32_MCG_CAP[10] is 1, bits 52:38 are architectural (not model-specific). In this case, bits 52:38
    report the value of a 15-bit counter that increments each time a corrected error is observed by the MCA
    recording bank. This count value will continue to increment until cleared by software. The most
    significant bit, 52, is a sticky count overflow bit.

  • If IA32_MCG_CAP[11] is 0, bits 56:53 also contain “Other Information” (in the same sense).

  • If IA32_MCG_CAP[11] is 1, bits 56:53 are architectural (not model-specific). In this case, bits 56:53
    have the following functionality:

      • If IA32_MCG_CAP[24] is 0, bits 56:55 are reserved.

      • If IA32_MCG_CAP[24] is 1, bits 56:55 are defined as follows:

          • S (Signaling) flag, bit 56 - Signals the reporting of UCR errors in this MC bank. See Section 15.6.2
            for additional detail.

          • AR (Action Required) flag, bit 55 - Indicates (when set) that MCA error code specific recovery
            action must be performed by system software at the time this error was signaled. See Section
            15.6.2 for additional detail.

      • If the UC bit (Figure 15-6) is 1, bits 54:53 are undefined.

      • If the UC bit (Figure 15-6) is 0, bits 54:53 indicate the status of the hardware structure that
        reported the threshold-based error. See Table 15-1.

Table 15-1. Bits 54:53 in IA32_MCi_STATUS MSRs when IA32_MCG_CAP[11] = 1 and UC = 0

Bits 54:53   Meaning
00           No tracking - No hardware status tracking is provided for the structure reporting this event.
01           Green - Status tracking is provided for the structure posting the event; the current status is green
             (below threshold). For more information, see Section 15.4, “Enhanced Cache Error reporting”.
10           Yellow - Status tracking is provided for the structure posting the event; the current status is yellow
             (above threshold). For more information, see Section 15.4, “Enhanced Cache Error reporting”.
11           Reserved

•

PCC (processor context corrupt) flag, bit 57 — Indicates (when set) that the state of the processor might
have been corrupted by the error condition detected and that reliable restarting of the processor may not be
possible. When clear, this flag indicates that the error did not affect the processor’s state, and software may be
able to restart. When system software supports recovery, consult Section 15.10.4, “Machine-Check Software
Handler Guidelines for Error Recovery” for additional rules that apply.

•

ADDRV (IA32_MCi_ADDR register valid) flag, bit 58 — Indicates (when set) that the IA32_MCi_ADDR
register contains the address where the error occurred (see Section 15.3.2.3, “IA32_MCi_ADDR MSRs”). When
clear, this flag indicates that the IA32_MCi_ADDR register is either not implemented or does not contain the
address where the error occurred. Do not read these registers if they are not implemented in the processor.

•

MISCV (IA32_MCi_MISC register valid) flag, bit 59 — Indicates (when set) that the IA32_MCi_MISC
register contains additional information regarding the error. When clear, this flag indicates that the
IA32_MCi_MISC register is either not implemented or does not contain additional information regarding the
error. Do not read these registers if they are not implemented in the processor.

•

EN (error enabled) flag, bit 60 — Indicates (when set) that the error was enabled by the associated EEj bit
of the IA32_MCi_CTL register.

•

UC (error uncorrected) flag, bit 61 — Indicates (when set) that the processor did not or was not able to
correct the error condition. When clear, this flag indicates that the processor was able to correct the error
condition.

•

OVER (machine check overflow) flag, bit 62 — Indicates (when set) that a machine-check error occurred
while the results of a previous error were still in the error-reporting register bank (that is, the VAL bit was
already set in the IA32_MCi_STATUS register). The processor sets the OVER flag and software is responsible for
clearing it. In general, enabled errors are written over disabled errors, and uncorrected errors are written over
corrected errors. Uncorrected errors are not written over previous valid uncorrected errors. When
MCG_CMCI_P is set, corrected errors may not set the OVER flag. Software can rely on corrected error count in
IA32_MCi_STATUS[52:38] to determine if any additional corrected errors may have occurred. For more information, see Section 15.3.2.2.1, “Overwrite Rules for Machine Check Overflow”.

•

VAL (IA32_MCi_STATUS register valid) flag, bit 63 — Indicates (when set) that the information within the
IA32_MCi_STATUS register is valid. When this flag is set, the processor follows the rules given for the OVER flag
in the IA32_MCi_STATUS register when overwriting previously valid entries. The processor sets the VAL flag
and software is responsible for clearing it.
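Because the meaning of bits 56:38 depends on IA32_MCG_CAP, a decoder needs both registers. The following Python sketch follows the conditions listed above; the function name, the dictionary keys, and the choice to return `None` for fields that are not architectural on a given processor are all hypothetical. The threshold status is returned as the raw two-bit value of bits 54:53, and the result is meaningful only when VAL is set.

```python
def _bit(value: int, n: int) -> bool:
    return bool((value >> n) & 1)


def decode_mci_status(status: int, mcg_cap: int) -> dict:
    """Decode IA32_MCi_STATUS using the capability bits in IA32_MCG_CAP."""
    fields = {
        "VAL": _bit(status, 63),
        "OVER": _bit(status, 62),
        "UC": _bit(status, 61),
        "EN": _bit(status, 60),
        "MISCV": _bit(status, 59),
        "ADDRV": _bit(status, 58),
        "PCC": _bit(status, 57),
        "mscod": (status >> 16) & 0xFFFF,   # bits 31:16, model-specific code
        "mca_error_code": status & 0xFFFF,  # bits 15:0
        "corrected_error_count": None,
        "S": None,
        "AR": None,
        "threshold_status": None,
    }
    if _bit(mcg_cap, 10):  # MCG_CMCI_P: bits 52:38 are the corrected error count
        fields["corrected_error_count"] = (status >> 38) & 0x7FFF
    if _bit(mcg_cap, 11):  # MCG_TES_P: bits 56:53 are architectural
        if _bit(mcg_cap, 24):  # MCG_SER_P: bits 56:55 carry S and AR
            fields["S"] = _bit(status, 56)
            fields["AR"] = _bit(status, 55)
        if not fields["UC"]:  # bits 54:53 are defined only when UC = 0
            fields["threshold_status"] = (status >> 53) & 0x3
    return fields
```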

15.3.2.2.1 Overwrite Rules for Machine Check Overflow
Table 15-2 shows the overwrite rules for how to treat a second event if the cache has already posted an event to
the MC bank – that is, what to do if the valid bit for an MC bank already is set to 1. When more than one structure
posts events in a given bank, these rules specify whether a new event will overwrite a previous posting or not.
These rules define a priority for uncorrected (highest priority), yellow, and green/unmonitored (lowest priority)
status.
In Table 15-2, the values in the two left-most columns are IA32_MCi_STATUS[54:53].

Table 15-2. Overwrite Rules for Enabled Errors

First Event       Second Event      UC bit   Color       MCA Info
00/green          00/green          0        00/green    either
00/green          yellow            0        yellow      second error
yellow            00/green          0        yellow      first error
yellow            yellow            0        yellow      either
00/green/yellow   UC                1        undefined   second
UC                00/green/yellow   1        undefined   first

If a second event overwrites a previously posted event, the information (as guarded by individual valid bits) in the
MCi bank is entirely from the second event. Similarly, if a first event is retained, all of the information previously
posted for that event is retained. In general, when the logged error or the recent error is a corrected error, the
OVER bit (MCi_Status[62]) may be set to indicate an overflow. When MCG_CMCI_P is set in IA32_MCG_CAP,
system software should consult IA32_MCi_STATUS[52:38] to determine if additional corrected errors may have
occurred. Software may re-read IA32_MCi_STATUS, IA32_MCi_ADDR and IA32_MCi_MISC appropriately to ensure
data collected represent the last error logged.
After software polls a posting and clears the register, the valid bit is no longer set and therefore the meaning of the
rest of the bits, including the yellow/green/00 status field in bits 54:53, is undefined. The yellow/green indication
will only be posted for events associated with monitored structures – otherwise the unmonitored (00) code will be
posted in IA32_MCi_STATUS[54:53].
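The priority ordering stated in this section (uncorrected highest, then yellow, then green/unmonitored) can be modeled as a small ranking function. The sketch below is a simplified model built from that prose and from the rule that an uncorrected error is not written over a previous valid uncorrected error; it is not a description of how hardware implements the rules. The function name and the string encodings of the colors are hypothetical.

```python
def overwrite_outcome(first_uc: bool, first_color: str,
                      second_uc: bool, second_color: str) -> str:
    """Return which event's information the bank holds: 'first', 'second' or 'either'.

    Colors are given as '00', 'green' or 'yellow' and are ignored for UC events.
    """
    def rank(uc: bool, color: str) -> int:
        if uc:
            return 2                         # uncorrected: highest priority
        return 1 if color == "yellow" else 0  # green and unmonitored share the lowest

    first_rank = rank(first_uc, first_color)
    second_rank = rank(second_uc, second_color)
    if second_rank > first_rank:
        return "second"
    if first_rank > second_rank:
        return "first"
    if first_uc:
        return "first"  # an uncorrected error is not overwritten by another one
    return "either"
```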

15.3.2.3 IA32_MCi_ADDR MSRs

The IA32_MCi_ADDR MSR contains the address of the code or data memory location that produced the machine-check error if the ADDRV flag in the IA32_MCi_STATUS register is set (see Figure 15-7, “IA32_MCi_ADDR MSR”).
The IA32_MCi_ADDR register is either not implemented or contains no address if the ADDRV flag in the
IA32_MCi_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will
cause a general protection exception.
The address returned is an offset into a segment, linear address, or physical address. This depends on the error
encountered. When these registers are implemented, these registers can be cleared by explicitly writing 0s to
these registers. Writing 1s to these registers will cause a general-protection exception. See Figure 15-7.

[Figure: Processor without support for Intel 64 architecture: bits 63:36 Reserved; bits 35:0 Address. Processor with support for Intel 64 architecture: bits 63:0 Address*.]
* Useful bits in this field depend on the address methodology in use when the register state is saved.

Figure 15-7. IA32_MCi_ADDR MSR

15.3.2.4 IA32_MCi_MISC MSRs

The IA32_MCi_MISC MSR contains additional information describing the machine-check error if the MISCV flag in
the IA32_MCi_STATUS register is set. The IA32_MCi_MISC MSR is either not implemented or does not contain
additional information if the MISCV flag in the IA32_MCi_STATUS register is clear.
When not implemented in the processor, all reads and writes to this MSR will cause a general protection exception.
When implemented in a processor, these registers can be cleared by explicitly writing all 0s to them; writing 1s to
them causes a general-protection exception to be generated. This register is not implemented in any of the error-reporting register banks for the P6 or Intel Atom family processors.
If both MISCV and IA32_MCG_CAP[24] are set, the IA32_MCi_MISC MSR is defined according to Figure 15-8 to
support software recovery of uncorrected errors (see Section 15.6).


Bits 63:9   Model Specific Information
Bits 8:6    Address Mode
Bits 5:0    Recoverable Address LSB

Figure 15-8. UCR Support in IA32_MCi_MISC Register

•

Recoverable Address LSB (bits 5:0): The lowest valid recoverable address bit. Indicates the position of the least
significant bit (LSB) of the recoverable error address. For example, if the processor logs bits [43:9] of the
address, the LSB sub-field in IA32_MCi_MISC is 01001b (9 decimal). For this example, bits [8:0] of the
recoverable error address in IA32_MCi_ADDR should be ignored.

•

Address Mode (bits 8:6): Address mode for the address logged in IA32_MCi_ADDR. The supported address
modes are given in Table 15-3.

Table 15-3. Address Mode in IA32_MCi_MISC[8:6]

IA32_MCi_MISC[8:6] Encoding   Definition
000                           Segment Offset
001                           Linear Address
010                           Physical Address
011                           Memory Address
100 to 110                    Reserved
111                           Generic

•

Model Specific Information (bits 63:9): Not architecturally defined.
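
The field definitions above can be illustrated with a short C sketch. It assumes IA32_MCi_MISC and IA32_MCi_ADDR have already been read and that both MISCV and IA32_MCG_CAP[24] are set. The helper and enumerator names are hypothetical.

```c
#include <stdint.h>

/* Address Mode encodings from Table 15-3 (values 4 to 6 are reserved). */
enum mc_addr_mode {
    MC_ADDR_SEGMENT_OFFSET = 0,
    MC_ADDR_LINEAR         = 1,
    MC_ADDR_PHYSICAL       = 2,
    MC_ADDR_MEMORY         = 3,
    MC_ADDR_GENERIC        = 7
};

/* Recoverable Address LSB, IA32_MCi_MISC bits 5:0. */
static unsigned mc_misc_addr_lsb(uint64_t misc)
{
    return (unsigned)(misc & 0x3F);
}

/* Address Mode, IA32_MCi_MISC bits 8:6. */
static unsigned mc_misc_addr_mode(uint64_t misc)
{
    return (unsigned)((misc >> 6) & 0x7);
}

/* Clear the IA32_MCi_ADDR bits below the recoverable-address LSB. */
static uint64_t mc_recoverable_addr(uint64_t addr, uint64_t misc)
{
    unsigned lsb = mc_misc_addr_lsb(misc);   /* 0..63, so the shift is well defined */
    return addr & (~0ULL << lsb);
}
```

With an LSB value of 9, as in the example in the text, bits [8:0] of the logged address are discarded.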

15.3.2.4.2 IOMCA
Logging and signaling of errors from the PCI Express domain is governed by the PCI Express Advanced Error Reporting
(AER) architecture. The PCI Express architecture divides errors into two categories: Uncorrectable errors and Correctable
errors. Uncorrectable errors can further be classified as Fatal or Non-Fatal. Uncorrected IO errors are signaled to
the system software either as an AER Message Signaled Interrupt (MSI) or via platform-specific mechanisms such as
NMI. Generally, the signaling mechanism is controlled by BIOS and/or platform firmware. Certain processors
support an error handling mode, called IOMCA mode, in which Uncorrected PCI Express errors are signaled in the
form of a machine check exception and logged in machine check banks.
When a processor is in this mode, Uncorrected PCI Express errors are logged in the MCACOD field of the
IA32_MCi_STATUS register as a Generic I/O error. The corresponding MCA error code is defined in Table 15-9,
“IA32_MCi_Status [15:0] Simple Error Code Encoding”. Machine check logging complements and does not replace
AER logging that occurs inside the PCI Express hierarchy. The PCI Express Root Complex and Endpoints continue to
log the error in accordance with the PCI Express AER mechanism. In IOMCA mode, the MCi_MISC register in the bank that
logged the IOMCA error can optionally contain information that links the machine check logs with the AER logs or proprietary
logs. In such a scenario, the machine check handler can use the contents of MCi_MISC to locate the next level of
error logs corresponding to the same error. Specifically, if MCi_Status.MISCV is 1 and MCACOD is 0x0E0B,
MCi_MISC contains the PCI Express address of the Root Complex device containing the AER logs. Software can
consult the header type and class code registers in the Root Complex device's PCIe configuration space to determine what type of device it is. This Root Complex device can be a PCI Express Root Port, a PCI Express Root
Complex Event Collector, or a proprietary device.


Errors that originate from PCI Express or Legacy Endpoints are logged in the corresponding Root Port in addition to
the generating device. If MISCV=1 and MCi_MISC contains the address of the Root Port or a Root Complex Event
Collector, software can parse the AER logs to learn more about the error.
If MISCV=1 and MCi_MISC points to a device that is neither a Root Complex Event Collector nor a Root Port, software must consult the Vendor ID/Device ID and use device-specific knowledge to locate and interpret the error log
registers. In some cases, the Root Complex device configuration space may not be accessible to the software and
both the Vendor and Device ID read as 0xFFFF.

The format of MCi_MISC for IOMCA errors is shown in Table 15-4.

Table 15-4. Address Mode in IA32_MCi_MISC[8:6]

Bits     Field
63:40    RSVD
39:32    PCI Express Segment number
31:16    PCI Express Requestor ID
15:9     RSVD
8:6      ADDR MODE(1)
5:0      RECOV ADDR LSB(1)

NOTES:
1. Not Applicable if ADDRV=0.

Refer to PCI Express Specification 3.0 for definition of PCI Express Requestor ID and AER architecture. Refer to PCI
Firmware Specification 3.0 for an explanation of PCI Express Segment number and how software can access
configuration space of a PCI Express device given the segment number and Requestor ID.
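
As an illustration of Table 15-4, the following C sketch decodes the location of the Root Complex device from a logged IOMCA error. It is a sketch under stated assumptions: the raw IA32_MCi_STATUS and IA32_MCi_MISC values are passed in, VAL and MISCV are taken to be bits 63 and 59 of IA32_MCi_STATUS, and the Requestor ID is split into bus (bits 15:8), device (bits 7:3) and function (bits 2:0) as for a conventional, non-ARI PCI Express function. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL     (1ULL << 63)   /* assumed bit position */
#define MCI_STATUS_MISCV   (1ULL << 59)   /* assumed bit position */
#define MCACOD_GENERIC_IO  0x0E0Bu        /* MCACOD value named in the text */

/* Hypothetical container for the decoded PCI Express location. */
struct iomca_location {
    uint8_t segment;    /* MCi_MISC bits 39:32 */
    uint8_t bus;        /* Requestor ID bits 15:8 */
    uint8_t device;     /* Requestor ID bits 7:3 */
    uint8_t function;   /* Requestor ID bits 2:0 */
};

/* Returns true only if the bank holds an IOMCA error whose MCi_MISC is valid. */
static bool iomca_decode(uint64_t status, uint64_t misc, struct iomca_location *loc)
{
    if (!(status & MCI_STATUS_VAL) || !(status & MCI_STATUS_MISCV))
        return false;
    if ((status & 0xFFFF) != MCACOD_GENERIC_IO)
        return false;

    uint16_t requestor_id = (uint16_t)((misc >> 16) & 0xFFFF);   /* bits 31:16 */
    loc->segment  = (uint8_t)((misc >> 32) & 0xFF);
    loc->bus      = (uint8_t)(requestor_id >> 8);
    loc->device   = (uint8_t)((requestor_id >> 3) & 0x1F);
    loc->function = (uint8_t)(requestor_id & 0x7);
    return true;
}
```

The decoded segment, bus, device and function identify the configuration space that software would then inspect for the header type, class code, and AER logs.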

15.3.2.5

IA32_MCi_CTL2 MSRs

The IA32_MCi_CTL2 MSR provides the programming interface to use the corrected MC error signaling capability that is
indicated by IA32_MCG_CAP[10] = 1. Software must check for the presence of IA32_MCi_CTL2 on a per-bank
basis.
When IA32_MCG_CAP[10] = 1, the IA32_MCi_CTL2 MSR for each bank exists, i.e. reads and writes to these MSRs
are supported. However, the signaling interface for corrected MC errors may not be supported in all banks.
The layout of IA32_MCi_CTL2 is shown in Figure 15-9:

Bits 63:31   Reserved
Bit 30       CMCI_EN: Enable/disable CMCI
Bits 29:15   Reserved
Bits 14:0    Corrected error count threshold

Figure 15-9. IA32_MCi_CTL2 Register

• Corrected error count threshold, bits 14:0: Software must initialize this field. The value is compared with
the corrected error count field in IA32_MCi_STATUS, bits 38 through 52. An overflow event is signaled to the
CMCI LVT entry (see Table 10-1) in the APIC when the count value equals the threshold value. The new LVT
entry in the APIC is at 02F0H offset from the APIC_BASE. If the CMCI interface is not supported for a particular
bank (but IA32_MCG_CAP[10] = 1), this field will always read 0.

• CMCI_EN (Corrected error interrupt enable/disable/indicator), bit 30: Software sets this bit to
enable the generation of corrected machine-check error interrupt (CMCI). If the CMCI interface is not supported for
a particular bank (but IA32_MCG_CAP[10] = 1), this bit is writeable but will always return 0 for that bank. This
bit also indicates whether CMCI is supported in the corresponding bank. See Section 15.5 for details of
software detection of CMCI facility.
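
The two fields can be manipulated as in the following C sketch, which also shows how the corrected error count is extracted from IA32_MCi_STATUS for comparison. The macro and function names are hypothetical; the bit positions are those given in Figure 15-9 and in the text above.

```c
#include <stdint.h>

#define MCI_CTL2_CMCI_EN        (1ULL << 30)   /* bit 30 */
#define MCI_CTL2_THRESHOLD_MASK 0x7FFFULL      /* bits 14:0 */
#define MCI_STATUS_COUNT_SHIFT  38             /* corrected error count, bits 52:38 */

/* Build a new IA32_MCi_CTL2 value; reserved bits of `old` are preserved. */
static uint64_t mc_ctl2_compose(uint64_t old, unsigned threshold, int enable)
{
    uint64_t v = old & ~(MCI_CTL2_THRESHOLD_MASK | MCI_CTL2_CMCI_EN);
    v |= (uint64_t)threshold & MCI_CTL2_THRESHOLD_MASK;
    if (enable)
        v |= MCI_CTL2_CMCI_EN;
    return v;
}

/* Corrected error count field of IA32_MCi_STATUS. */
static unsigned mc_status_corrected_count(uint64_t status)
{
    return (unsigned)((status >> MCI_STATUS_COUNT_SHIFT) & 0x7FFF);
}
```

The composed value would be written with WRMSR and read back, since a bank without CMCI support returns 0 in both fields.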


Some microarchitectural sub-systems that are the source of corrected MC errors may be shared by more than one
logical processor. Consequently, the facilities for reporting MC errors and controlling mechanisms may be shared
by more than one logical processor. For example, the IA32_MCi_CTL2 MSR is shared between logical processors
sharing a processor core. Software is responsible for programming the IA32_MCi_CTL2 MSR in a manner consistent with
CMCI delivery and usage.
After processor reset, IA32_MCi_CTL2 MSRs are zeroed.

15.3.2.6

IA32_MCG Extended Machine Check State MSRs

The Pentium 4 and Intel Xeon processors implement a variable number of extended machine-check state MSRs.
The MCG_EXT_P flag in the IA32_MCG_CAP MSR indicates the presence of these extended registers, and the
MCG_EXT_CNT field indicates the number of these registers actually implemented. See Section 15.3.1.1,
“IA32_MCG_CAP MSR.” Also see Table 15-5.

Table 15-5. Extended Machine Check State MSRs
in Processors Without Support for Intel 64 Architecture

MSR               Address   Description
IA32_MCG_EAX      180H      Contains state of the EAX register at the time of the machine-check error.
IA32_MCG_EBX      181H      Contains state of the EBX register at the time of the machine-check error.
IA32_MCG_ECX      182H      Contains state of the ECX register at the time of the machine-check error.
IA32_MCG_EDX      183H      Contains state of the EDX register at the time of the machine-check error.
IA32_MCG_ESI      184H      Contains state of the ESI register at the time of the machine-check error.
IA32_MCG_EDI      185H      Contains state of the EDI register at the time of the machine-check error.
IA32_MCG_EBP      186H      Contains state of the EBP register at the time of the machine-check error.
IA32_MCG_ESP      187H      Contains state of the ESP register at the time of the machine-check error.
IA32_MCG_EFLAGS   188H      Contains state of the EFLAGS register at the time of the machine-check error.
IA32_MCG_EIP      189H      Contains state of the EIP register at the time of the machine-check error.
IA32_MCG_MISC     18AH      When set, indicates that a page assist or page fault occurred during DS normal
                            operation.

In processors with support for Intel 64 architecture, 64-bit machine check state MSRs are aliased to the legacy
MSRs. In addition, there may be registers beyond IA32_MCG_MISC. These may include up to five reserved MSRs
(IA32_MCG_RESERVED[1:5]) and save-state MSRs for registers introduced in 64-bit mode. See Table 15-6.

Table 15-6. Extended Machine Check State MSRs
In Processors With Support For Intel 64 Architecture

MSR                      Address     Description
IA32_MCG_RAX             180H        Contains state of the RAX register at the time of the machine-check error.
IA32_MCG_RBX             181H        Contains state of the RBX register at the time of the machine-check error.
IA32_MCG_RCX             182H        Contains state of the RCX register at the time of the machine-check error.
IA32_MCG_RDX             183H        Contains state of the RDX register at the time of the machine-check error.
IA32_MCG_RSI             184H        Contains state of the RSI register at the time of the machine-check error.
IA32_MCG_RDI             185H        Contains state of the RDI register at the time of the machine-check error.
IA32_MCG_RBP             186H        Contains state of the RBP register at the time of the machine-check error.
IA32_MCG_RSP             187H        Contains state of the RSP register at the time of the machine-check error.
IA32_MCG_RFLAGS          188H        Contains state of the RFLAGS register at the time of the machine-check error.
IA32_MCG_RIP             189H        Contains state of the RIP register at the time of the machine-check error.
IA32_MCG_MISC            18AH        When set, indicates that a page assist or page fault occurred during DS normal
                                     operation.
IA32_MCG_RESERVED[1:5]   18BH-18FH   These registers, if present, are reserved.
IA32_MCG_R8              190H        Contains state of the R8 register at the time of the machine-check error.
IA32_MCG_R9              191H        Contains state of the R9 register at the time of the machine-check error.
IA32_MCG_R10             192H        Contains state of the R10 register at the time of the machine-check error.
IA32_MCG_R11             193H        Contains state of the R11 register at the time of the machine-check error.
IA32_MCG_R12             194H        Contains state of the R12 register at the time of the machine-check error.
IA32_MCG_R13             195H        Contains state of the R13 register at the time of the machine-check error.
IA32_MCG_R14             196H        Contains state of the R14 register at the time of the machine-check error.
IA32_MCG_R15             197H        Contains state of the R15 register at the time of the machine-check error.

When a machine-check error is detected on a Pentium 4 or Intel Xeon processor, the processor saves the state of
the general-purpose registers, the R/EFLAGS register, and the R/EIP in these extended machine-check state MSRs.
This information can be used by a debugger to analyze the error.
These registers are read/write-to-zero registers. This means software can read them, but if software writes to
them, only all zeros is allowed. If software attempts to write a non-zero value into one of these registers, a general-protection (#GP) exception is generated. These registers are cleared on a hardware reset (power-up or RESET),
but maintain their contents following a soft reset (INIT reset).
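
The presence check and the write-to-zero rule can be sketched in C as follows. The sketch assumes that MCG_EXT_P is bit 9 and MCG_EXT_CNT is bits 23:16 of IA32_MCG_CAP (see Section 15.3.1.1), and that the extended MSRs occupy consecutive addresses starting at 180H as listed in Table 15-5 and Table 15-6. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_CAP_EXT_P          (1ULL << 9)   /* MCG_EXT_P (assumed bit 9) */
#define MCG_CAP_EXT_CNT_SHIFT  16            /* MCG_EXT_CNT (assumed bits 23:16) */
#define MCG_EXT_MSR_BASE       0x180u        /* IA32_MCG_EAX / IA32_MCG_RAX */

/* Number of extended machine-check state MSRs implemented, 0 if none. */
static unsigned mcg_ext_count(uint64_t mcg_cap)
{
    if (!(mcg_cap & MCG_CAP_EXT_P))
        return 0;
    return (unsigned)((mcg_cap >> MCG_CAP_EXT_CNT_SHIFT) & 0xFF);
}

/* True if `msr` falls inside the implemented extended-state range. */
static bool mcg_ext_msr_implemented(uint64_t mcg_cap, uint32_t msr)
{
    return msr >= MCG_EXT_MSR_BASE &&
           (msr - MCG_EXT_MSR_BASE) < mcg_ext_count(mcg_cap);
}

/* Only an all-zero write is legal; anything else would raise #GP. */
static bool mcg_ext_write_allowed(uint64_t value)
{
    return value == 0;
}
```

A debugger would use the presence check before issuing RDMSR, since reading an unimplemented MSR raises a general-protection exception.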

15.3.3

Mapping of the Pentium Processor Machine-Check Errors
to the Machine-Check Architecture

The Pentium processor reports machine-check errors using two registers: P5_MC_TYPE and P5_MC_ADDR. The
Pentium 4, Intel Xeon, Intel Atom, and P6 family processors map these registers to the IA32_MCi_STATUS and
IA32_MCi_ADDR in the error-reporting register bank. This bank reports on the same type of external bus errors
reported in P5_MC_TYPE and P5_MC_ADDR.
The information in these registers can then be accessed in two ways:

•

By reading the IA32_MCi_STATUS and IA32_MCi_ADDR registers as part of a general machine-check exception
handler written for Pentium 4, Intel Atom and P6 family processors.

•

By reading the P5_MC_TYPE and P5_MC_ADDR registers using the RDMSR instruction.

The second capability permits a machine-check exception handler written to run on a Pentium processor to be run
on a Pentium 4, Intel Xeon, Intel Atom, or P6 family processor. There is a limitation in that information returned by
the Pentium 4, Intel Xeon, Intel Atom, and P6 family processors is encoded differently than information returned
by the Pentium processor. To run a Pentium processor machine-check exception handler on a Pentium 4, Intel
Xeon, Intel Atom, or P6 family processor, the handler must be written to interpret P5_MC_TYPE encodings
correctly.

15.4

ENHANCED CACHE ERROR REPORTING

Starting with Intel Core Duo processors, cache error reporting was enhanced. In earlier Intel processors, cache
status was based on the number of correction events that occurred in a cache. In the new paradigm, called
“threshold-based error status”, cache status is based on the number of lines (ECC blocks) in a cache that incur
repeated corrections. The threshold is chosen by Intel, based on various factors. If a processor supports threshold-based error status, it sets IA32_MCG_CAP[11] (MCG_TES_P) to 1; if not, to 0.


A processor that supports enhanced cache error reporting contains hardware that tracks the operating status of
certain caches and provides an indicator of their “health”. The hardware reports a “green” status when the number
of lines that incur repeated corrections is at or below a pre-defined threshold, and a “yellow” status when the
number of affected lines exceeds the threshold. Yellow status means that the cache reporting the event is operating
correctly, but you should schedule the system for servicing within a few weeks.
Intel recommends that you rely on this mechanism for structures supported by threshold-based error reporting.
The CPU/system/platform response to a yellow event should be less severe than its response to an uncorrected
error. An uncorrected error means that a serious error has actually occurred, whereas the yellow condition is a
warning that the number of affected lines has exceeded the threshold but is not, in itself, a serious event: the error
was corrected and system state was not compromised.
The green/yellow status indicator is not a foolproof early warning for an uncorrected error resulting from the failure
of two bits in the same ECC block. Such a failure can occur and cause an uncorrected error before the yellow
threshold is reached. However, the chance of an uncorrected error increases as the number of affected lines
increases.
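
A minimal C sketch of reading the green/yellow indicator follows. It assumes the encodings of IA32_MCi_STATUS[54:53] are 00 (no tracking), 01 (green), 10 (yellow) and 11 (reserved), as defined where the status register is described, and that the field is only meaningful for a valid corrected error on a processor with MCG_TES_P set. The names are hypothetical.

```c
#include <stdint.h>

#define MCG_CAP_TES_P   (1ULL << 11)   /* IA32_MCG_CAP[11] */
#define MCI_STATUS_VAL  (1ULL << 63)
#define MCI_STATUS_UC   (1ULL << 61)

/* Assumed encodings of IA32_MCi_STATUS[54:53]. */
enum mc_tes_status {
    MC_TES_NOT_TRACKED = 0,
    MC_TES_GREEN       = 1,
    MC_TES_YELLOW      = 2,
    MC_TES_RESERVED    = 3
};

/*
 * Returns the threshold-based error status of a logged corrected error.
 * Reports MC_TES_NOT_TRACKED when the field carries no status information
 * (no MCG_TES_P support, no valid log, or an uncorrected error).
 */
static enum mc_tes_status mc_tes_decode(uint64_t mcg_cap, uint64_t status)
{
    if (!(mcg_cap & MCG_CAP_TES_P))
        return MC_TES_NOT_TRACKED;
    if (!(status & MCI_STATUS_VAL) || (status & MCI_STATUS_UC))
        return MC_TES_NOT_TRACKED;
    return (enum mc_tes_status)((status >> 53) & 0x3);
}
```

Software that sees a yellow result would typically record it and schedule service rather than take immediate action, in line with the guidance above.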

15.5

CORRECTED MACHINE CHECK ERROR INTERRUPT

Corrected machine-check error interrupt (CMCI) is an architectural enhancement to the machine-check architecture. It provides capabilities beyond those of threshold-based error reporting (Section 15.4). With threshold-based
error reporting, software is limited to periodic polling to query the status of hardware corrected MC errors.
CMCI provides a signaling mechanism to deliver a local interrupt based on threshold values that software can
program using the IA32_MCi_CTL2 MSRs.
CMCI is disabled by default. System software is required to enable CMCI for each IA32_MCi bank that supports the
reporting of hardware corrected errors if IA32_MCG_CAP[10] = 1.
System software uses the IA32_MCi_CTL2 MSR to enable/disable the CMCI capability for each bank and to program
threshold values into the IA32_MCi_CTL2 MSR. CMCI is not affected by the CR4.MCE bit, and it is not affected by the
IA32_MCi_CTL MSRs.
To detect the existence of thresholding for a given bank, software writes only bits 14:0 with the threshold value. If
the bits persist, then thresholding is available (and CMCI is available). If the bits are all 0's, then no thresholding
exists. To detect that CMCI signaling exists, software writes a 1 to bit 30 of the MCi_CTL2 register. Upon subsequent
read, if bit 30 = 0, no CMCI is available for this bank and no corrected or UCNA errors will be reported on this bank.
If bit 30 = 1, then CMCI is available and enabled.
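
The write-then-read-back detection described above can be sketched in C. Because RDMSR and WRMSR are privileged, the sketch models one IA32_MCi_CTL2 register with a hypothetical `sim_ctl2` structure whose unimplemented bits always read back as 0; in system software the two accessors would be replaced by real MSR accesses. All names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_CTL2_CMCI_EN        (1ULL << 30)
#define MCI_CTL2_THRESHOLD_MASK 0x7FFFULL

/* User-space stand-in for one IA32_MCi_CTL2 MSR; unimplemented bits read as 0. */
struct sim_ctl2 {
    uint64_t value;
    uint64_t implemented;
};

static uint64_t ctl2_read(const struct sim_ctl2 *r)
{
    return r->value;
}

static void ctl2_write(struct sim_ctl2 *r, uint64_t v)
{
    r->value = v & r->implemented;
}

/* Write a non-zero threshold to bits 14:0 and check whether any bits persist. */
static bool bank_has_thresholding(struct sim_ctl2 *r, unsigned threshold)
{
    uint64_t old = ctl2_read(r);
    ctl2_write(r, (old & ~MCI_CTL2_THRESHOLD_MASK) |
                  (threshold & MCI_CTL2_THRESHOLD_MASK));
    return (ctl2_read(r) & MCI_CTL2_THRESHOLD_MASK) != 0;
}

/* Write 1 to bit 30; if it reads back as 1, CMCI is available and now enabled. */
static bool bank_enable_cmci(struct sim_ctl2 *r)
{
    ctl2_write(r, ctl2_read(r) | MCI_CTL2_CMCI_EN);
    return (ctl2_read(r) & MCI_CTL2_CMCI_EN) != 0;
}
```

Note that a successful probe of bit 30 leaves CMCI enabled for the bank, so the LVT entry and handler should be ready before the probe is run on a live system.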

15.5.1

CMCI Local APIC Interface

The operation of CMCI is depicted in Figure 15-10.

MCi_CTL2:     bit 30 (software writes 1 to enable); error threshold field
MCi_STATUS:   bits 52:38, error count
The error count is compared (?=) with the error threshold; when the count overflows the threshold, a CMCI is
signaled through the CMCI LVT entry in the local APIC (APIC_BASE + 2F0H).

Figure 15-10. CMCI Behavior

CMCI interrupt delivery is configured by writing to the LVT CMCI register entry in the local APIC register space at the
default address of APIC_BASE + 2F0H. A CMCI interrupt can be delivered to more than one logical processor if
multiple logical processors are affected by the associated MC errors. For example, if a corrected bit error in a cache
shared by two logical processors caused a CMCI, the interrupt will be delivered to both logical processors sharing
that microarchitectural sub-system. Similarly, package level errors may cause CMCI to be delivered to all logical
processors within the package. However, system level errors will not be handled by CMCI.
See Section 10.5.1, “Local Vector Table” for details regarding the LVT CMCI register.
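
As a rough sketch, the LVT CMCI register address and a register value can be computed as shown below. The example assumes the general LVT layout from Section 10.5.1 (vector in bits 7:0, delivery mode in bits 10:8, mask in bit 16) and fixed delivery mode; the function names are hypothetical, and the actual write would be a memory-mapped store to the local APIC.

```c
#include <stdint.h>

#define APIC_LVT_CMCI_OFFSET 0x2F0u       /* offset from APIC_BASE */
#define APIC_LVT_MASKED      (1u << 16)   /* mask bit (assumed LVT layout) */

/* Address of the LVT CMCI register for a given APIC base address. */
static uint64_t lvt_cmci_address(uint64_t apic_base)
{
    return apic_base + APIC_LVT_CMCI_OFFSET;
}

/* LVT CMCI value with fixed delivery mode (bits 10:8 = 000). */
static uint32_t lvt_cmci_value(uint8_t vector, int masked)
{
    uint32_t v = vector;
    if (masked)
        v |= APIC_LVT_MASKED;
    return v;
}
```

For instance, an APIC base of FEE00000H places the register at FEE002F0H.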

15.5.2

System Software Recommendation for Managing CMCI and Machine Check Resources

System software must enable and manage CMCI, set up interrupt handlers to service CMCI interrupts delivered to
affected logical processors, program the CMCI LVT entry, and query machine check banks that are shared by more
than one logical processor.
This section describes techniques system software can implement to manage CMCI initialization and to service CMCI
interrupts in an efficient manner that minimizes contention for access to shared MSR resources.

15.5.2.1

CMCI Initialization

Although a CMCI interrupt may be delivered to more than one logical processor depending on the nature of the
corrected MC error, only one instance of the interrupt service routine needs to perform the necessary service and
make queries to the machine-check banks. The following steps describe a technique that limits the amount of
work the system has to do in response to a CMCI.

• To provide maximum flexibility, system software should define a per-thread data structure for each logical
processor to allow equal-opportunity and efficient response to interrupt delivery. Specifically, the per-thread
data structure should include a set of per-bank fields to track which machine check banks it needs to access in
response to a delivered CMCI interrupt. The number of banks that need to be tracked is determined by
IA32_MCG_CAP[7:0].

• Initialization of the per-thread data structure. The initialization of the per-thread data structure must be done serially
on each logical processor in the system. The order in which the logical processors start the per-thread initialization
is arbitrary, but the initialization must observe the following details to satisfy the shared
nature of specific MSR resources:
a. Each thread initializes its data structure to indicate that it does not own any MC bank registers.
b. Each thread examines the IA32_MCi_CTL2[30] indicator for each bank to determine if another thread has
already claimed ownership of that bank.
   • If IA32_MCi_CTL2[30] has been set by another thread, this thread cannot own bank i and should
   proceed to step b. and examine the next machine check bank until all of the machine check banks are
   exhausted.
   • If IA32_MCi_CTL2[30] = 0, proceed to step c.
c. Check whether writing a 1 into IA32_MCi_CTL2[30] returns 1 on a subsequent read to determine whether
this bank can support CMCI.
   • If IA32_MCi_CTL2[30] = 0, this bank does not support CMCI. This thread cannot own bank i and should
   proceed to step b. and examine the next machine check bank until all of the machine check banks are
   exhausted.
   • If IA32_MCi_CTL2[30] = 1, modify the per-thread data structure to indicate this thread claims
   ownership of the MC bank; proceed to initialize the error threshold count (bits 14:0) of that bank as
   described in Section 15.5.2.2, “CMCI Threshold Management”. Then proceed to step b. and examine the next
   machine check bank until all of the machine check banks are exhausted.

After the thread has examined all of the machine check banks, it checks whether it owns any MC banks to service CMCI.
If any bank has been claimed by this thread:
- Ensure that the CMCI interrupt handler has been set up as described in Section 15.5.2.3, “CMCI Interrupt
Handler”.
- Initialize the CMCI LVT entry, as described in Section 15.5.1, “CMCI Local APIC Interface”.
- Log and clear all of the IA32_MCi_Status registers for the banks that this thread owns. This will allow new
errors to be logged.
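
The ownership-claiming steps a. through c. can be sketched in C as shown below. This is an illustration under stated assumptions, not a definitive implementation: each IA32_MCi_CTL2 MSR is modeled by a hypothetical `sim_ctl2` structure so the logic can run in user space, logical processors that share a bank are modeled by passing a pointer to the same structure, and the LVT setup, handler installation and log-and-clear steps are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_CTL2_CMCI_EN        (1ULL << 30)
#define MCI_CTL2_THRESHOLD_MASK 0x7FFFULL
#define MAX_BANKS               32   /* illustrative limit; real count is IA32_MCG_CAP[7:0] */

/* User-space stand-in for one IA32_MCi_CTL2 MSR; unimplemented bits read as 0. */
struct sim_ctl2 { uint64_t value; uint64_t implemented; };

static uint64_t ctl2_read(const struct sim_ctl2 *r) { return r->value; }
static void ctl2_write(struct sim_ctl2 *r, uint64_t v) { r->value = v & r->implemented; }

/* Per-thread record of the banks this logical processor services. */
struct cmci_thread {
    bool     owns[MAX_BANKS];
    unsigned owned;
};

/*
 * banks[i] is this thread's view of bank i. Must be run serially, one
 * logical processor at a time. Returns the number of banks claimed.
 */
static unsigned cmci_claim_banks(struct cmci_thread *t, struct sim_ctl2 **banks,
                                 unsigned bank_count, unsigned threshold)
{
    t->owned = 0;
    for (unsigned i = 0; i < MAX_BANKS; i++)              /* step a */
        t->owns[i] = false;

    for (unsigned i = 0; i < bank_count && i < MAX_BANKS; i++) {
        struct sim_ctl2 *b = banks[i];

        if (ctl2_read(b) & MCI_CTL2_CMCI_EN)              /* step b: owned by another thread */
            continue;

        ctl2_write(b, ctl2_read(b) | MCI_CTL2_CMCI_EN);   /* step c: probe bit 30 */
        if (!(ctl2_read(b) & MCI_CTL2_CMCI_EN))
            continue;                                     /* bank does not support CMCI */

        t->owns[i] = true;                                /* claim the bank */
        t->owned++;
        ctl2_write(b, (ctl2_read(b) & ~MCI_CTL2_THRESHOLD_MASK) |
                      (threshold & MCI_CTL2_THRESHOLD_MASK));
    }
    return t->owned;
}
```

When two logical processors run this in turn on views that share a bank, the first claims the shared bank and the second skips it because bit 30 is already set.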

15.5.2.2

CMCI Threshold Management

The Corrected MC error threshold field, IA32_MCi_CTL2[14:0], is architecturally defined. Specifically, all these bits
are writable by software, but different processor implementations may choose to implement fewer than 15 bits as the
threshold for the overflow comparison with IA32_MCi_STATUS[52:38]. The following describes techniques that
software can use to manage the CMCI threshold so that it is compatible with changes in implementation characteristics:

• Software can set the initial threshold value to 1 by writing 1 to IA32_MCi_CTL2[14:0]. This will cause an overflow
condition on every corrected MC error and generate a CMCI interrupt.

• To increase the threshold and reduce the frequency of CMCI servicing:
a. Find the maximum threshold value a given processor implementation supports. The steps are:
   • Write 7FFFH to IA32_MCi_CTL2[14:0].
   • Read back IA32_MCi_CTL2[14:0]; the value read is the maximum threshold supported by the
   processor.
b. Increase the threshold to a value below the maximum value discovered using step a.
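
Step a. can be sketched in C as follows. As with other examples that touch IA32_MCi_CTL2, the register is modeled by a hypothetical `sim_ctl2` structure in place of real RDMSR/WRMSR accesses. Restoring the previous register value after the probe is a choice made for this example and is not required by the text.

```c
#include <stdint.h>

#define MCI_CTL2_CMCI_EN        (1ULL << 30)
#define MCI_CTL2_THRESHOLD_MASK 0x7FFFULL

/* User-space stand-in for one IA32_MCi_CTL2 MSR; unimplemented bits read as 0. */
struct sim_ctl2 { uint64_t value; uint64_t implemented; };

static uint64_t ctl2_read(const struct sim_ctl2 *r) { return r->value; }
static void ctl2_write(struct sim_ctl2 *r, uint64_t v) { r->value = v & r->implemented; }

/*
 * Write 7FFFH to the threshold field and read it back; the bits that
 * persist give the maximum threshold the implementation supports.
 */
static unsigned cmci_max_threshold(struct sim_ctl2 *r)
{
    uint64_t saved = ctl2_read(r);
    ctl2_write(r, saved | MCI_CTL2_THRESHOLD_MASK);
    unsigned max = (unsigned)(ctl2_read(r) & MCI_CTL2_THRESHOLD_MASK);
    ctl2_write(r, saved);   /* restore the previous threshold (example's choice) */
    return max;
}
```

On an implementation with an 8-bit threshold, for example, the function would return FFH, and software would then choose a working threshold below that value.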

15.5.2.3

CMCI Interrupt Handler

The following describes techniques system software may consider to implement a CMCI service routine:

•

The service routine examines its private per-thread data structure to check which set of MC banks it
owns. If the thread does not have ownership of a given MC bank, proceed to the next MC bank. Ownership
is determined at initialization time, which is described in Section 15.5.2.1.

If the thread had claimed ownership of an MC bank, this technique allows each logical processor to handle
corrected MC errors independently and requires no synchronization to access shared MSR resources. Consult
Example 15-5 for guidelines on logging when processing CMCI.
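
A simplified C sketch of such a service routine follows. It is not the logging procedure of Example 15-5. It only shows the ownership check and the log-and-clear pattern, with the IA32_MCi_STATUS registers modeled as a plain array and with hypothetical names throughout.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)

/* Hypothetical log record. */
struct mc_log_entry {
    unsigned bank;
    uint64_t status;
};

/*
 * Visit only the banks this thread owns, log every valid entry and clear
 * the status register so that new errors can be logged.
 * status_msrs[] stands in for the IA32_MCi_STATUS MSRs.
 * Returns the number of entries logged.
 */
static unsigned cmci_service(const bool owns[], uint64_t status_msrs[],
                             unsigned bank_count,
                             struct mc_log_entry log[], unsigned log_capacity)
{
    unsigned logged = 0;

    for (unsigned i = 0; i < bank_count; i++) {
        if (!owns[i])
            continue;                      /* another thread services this bank */
        uint64_t status = status_msrs[i];
        if (!(status & MCI_STATUS_VAL))
            continue;                      /* nothing logged in this bank */
        if (logged == log_capacity)
            break;                         /* leave remaining banks uncleared */
        log[logged].bank = i;
        log[logged].status = status;
        logged++;
        status_msrs[i] = 0;                /* clear the bank */
    }
    return logged;
}
```

Because each thread touches only the banks recorded in its own data structure, no locking is needed around the status accesses.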

15.6

RECOVERY OF UNCORRECTED RECOVERABLE (UCR) ERRORS

Recovery of uncorrected recoverable machine check errors is an enhancement in machine-check architecture. The
first processor that supports this feature is the 45 nm Intel 64 processor on which CPUID reports
DisplayFamily_DisplayModel as 06H_2EH (see CPUID instruction in Chapter 3, “Instruction Set Reference, A-L” in
the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A). This allows system software to
perform recovery actions on certain classes of uncorrected errors and continue execution.

15.6.1

Detection of Software Error Recovery Support

Software must use bit 24 of IA32_MCG_CAP (MCG_SER_P) to detect the presence of software error recovery
support (see Figure 15-2). When IA32_MCG_CAP[24] is set, this indicates that the processor supports software
error recovery. When this bit is clear, this indicates that there is no support for error recovery from the processor
and the primary responsibility of the machine check handler is logging the machine check error information and
shutting down the system.
The new class of architectural MCA errors from which system software can attempt recovery is called Uncorrected
Recoverable (UCR) Errors. UCR errors are uncorrected errors that have been detected and signaled but have not
corrupted the processor context. For certain UCR errors, this means that once system software has performed a
certain recovery action, it is possible to continue execution on this processor. UCR error reporting provides an error
containment mechanism for data poisoning. The machine check handler will use the error log information from the
error reporting registers to analyze and implement specific error recovery actions for UCR errors.
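
The capability bits of IA32_MCG_CAP referred to in this chapter can be decoded together, as in the C sketch below. The structure and function names are hypothetical; the bit positions (bank count in bits 7:0, MCG_CMCI_P in bit 10, MCG_TES_P in bit 11, MCG_SER_P in bit 24) are those cited in the surrounding sections.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of IA32_MCG_CAP. */
struct mcg_caps {
    unsigned bank_count;   /* bits 7:0 */
    bool     cmci;         /* bit 10, MCG_CMCI_P */
    bool     tes;          /* bit 11, MCG_TES_P */
    bool     ser;          /* bit 24, MCG_SER_P */
};

static struct mcg_caps mcg_cap_decode(uint64_t mcg_cap)
{
    struct mcg_caps c;
    c.bank_count = (unsigned)(mcg_cap & 0xFF);
    c.cmci = ((mcg_cap >> 10) & 1) != 0;
    c.tes  = ((mcg_cap >> 11) & 1) != 0;
    c.ser  = ((mcg_cap >> 24) & 1) != 0;
    return c;
}
```

A machine check handler would consult the `ser` member once at initialization and fall back to log-and-shut-down behavior when it is false.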


15.6.2

UCR Error Reporting and Logging

IA32_MCi_STATUS MSR is used for reporting UCR errors and existing corrected or uncorrected errors. The definition of IA32_MCi_STATUS, including bit fields to identify UCR errors, is shown in Figure 15-6. UCR errors can be
signaled through either the corrected machine check interrupt (CMCI) or machine check exception (MCE) path
depending on the type of the UCR error.
When IA32_MCG_CAP[24] is set, a UCR error is indicated by the following bit settings in the IA32_MCi_STATUS
register:

• Valid (bit 63) = 1
• UC (bit 61) = 1
• PCC (bit 57) = 0

Additional information from the IA32_MCi_MISC and the IA32_MCi_ADDR registers for the UCR error are available
when the ADDRV and the MISCV flags in the IA32_MCi_STATUS register are set (see Section 15.3.2.4). The MCA
error code field of the IA32_MCi_STATUS register indicates the type of UCR error. System software can interpret
the MCA error code field to analyze and identify the necessary recovery action for the given UCR error.
In addition, the IA32_MCi_STATUS register bit fields, bits 56:55, are defined (see Figure 15-6) to provide additional information to help system software to properly identify the necessary recovery action for the UCR error:

•

S (Signaling) flag, bit 56 - Indicates (when set) that a machine check exception was generated for the UCR
error reported in this MC bank and system software needs to check the AR flag and the MCA error code fields in
the IA32_MCi_STATUS register to identify the necessary recovery action for this error. When the S flag in the
IA32_MCi_STATUS register is clear, this UCR error was not signaled via a machine check exception and instead
was reported as a corrected machine check (CMC). System software is not required to take any recovery action
when the S flag in the IA32_MCi_STATUS register is clear.

•

AR (Action Required) flag, bit 55 - Indicates (when set) that MCA error code specific recovery action must be
performed by system software at the time this error was signaled. This recovery action must be completed
successfully before any additional work is scheduled for this processor. When the RIPV flag in the
IA32_MCG_STATUS is clear, an alternative execution stream needs to be provided; when the MCA error code
specific recovery action cannot be successfully completed, system software must shut down
the system. When the AR flag in the IA32_MCi_STATUS register is clear, system software may still take MCA
error code specific recovery action but this is optional; system software can safely resume program execution
at the instruction pointer saved on the stack from the machine check exception when the RIPV flag in the
IA32_MCG_STATUS register is set.

Both the S and the AR flags in the IA32_MCi_STATUS register are defined to be sticky bits, which means that once
set, the processor does not clear them. Only software and a good power-on reset can clear the S and the AR flags.
Both the S and the AR flags are only set when the processor reports the UCR errors (MCG_CAP[24] is set).

15.6.3

UCR Error Classification

With the S and AR flag encoding in the IA32_MCi_STATUS register, UCR errors can be classified as:

•

Uncorrected no action required (UCNA) - a UCR error that is not signaled via a machine check exception and,
instead, is reported to system software as a corrected machine check error. UCNA errors indicate that some
data in the system is corrupted, but the data has not been consumed and the processor state is valid and you
may continue execution on this processor. UCNA errors require no action from system software to continue
execution. A UCNA error is indicated with UC=1, PCC=0, S=0 and AR=0 in the IA32_MCi_STATUS register.

•

Software recoverable action optional (SRAO) - a UCR error that is signaled either via a machine check exception or
CMCI. System software recovery action is optional and not required to continue execution from this machine
check exception. SRAO errors indicate that some data in the system is corrupt, but the data has not been
consumed and the processor state is valid. SRAO errors provide the additional error information for system
software to perform a recovery action. An SRAO error when signaled as a machine check is indicated with
UC=1, PCC=0, S=1, EN=1 and AR=0 in the IA32_MCi_STATUS register. In cases when SRAO is signaled via
CMCI the error signature is indicated via UC=1, PCC=0, S=0. Recovery actions for SRAO errors are MCA error
code specific. The MISCV and the ADDRV flags in the IA32_MCi_STATUS register are set when the additional
error information is available from the IA32_MCi_MISC and the IA32_MCi_ADDR registers. System software
needs to inspect the MCA error code fields in the IA32_MCi_STATUS register to identify the specific recovery
action for a given SRAO error. If MISCV and ADDRV are not set, it is recommended that no system software
error recovery be performed; however, system software can resume execution.

•

Software recoverable action required (SRAR) - a UCR error that requires system software to take a recovery
action on this processor before scheduling another stream of execution on this processor. SRAR errors indicate
that the error was detected and raised at the point of the consumption in the execution flow. An SRAR error is
indicated with UC=1, PCC=0, S=1, EN=1 and AR=1 in the IA32_MCi_STATUS register. Recovery actions are
MCA error code specific. The MISCV and the ADDRV flags in the IA32_MCi_STATUS register are set when the
additional error information is available from the IA32_MCi_MISC and the IA32_MCi_ADDR registers. System
software needs to inspect the MCA error code fields in the IA32_MCi_STATUS register to identify the specific
recovery action for a given SRAR error. If MISCV and ADDRV are not set, it is recommended that system
software shut down the system.
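
The classification can be sketched in C from the flag signatures given above. This is a simplified illustration with hypothetical names. It ignores the EN flag, and it cannot tell a UCNA error from an SRAO error reported via CMCI, because both carry UC=1, PCC=0, S=0; separating those two requires the MCA error code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_PCC (1ULL << 57)
#define MCI_STATUS_S   (1ULL << 56)
#define MCI_STATUS_AR  (1ULL << 55)

enum mc_class {
    MC_CLASS_NONE,   /* no valid error logged */
    MC_CLASS_CE,     /* corrected error */
    MC_CLASS_UCNA,   /* also covers SRAO reported via CMCI (S=0) */
    MC_CLASS_SRAO,   /* SRAO signaled via machine check exception */
    MC_CLASS_SRAR,
    MC_CLASS_UC      /* uncorrected, not recoverable by software */
};

/* ser_p is IA32_MCG_CAP[24]; without it the S and AR flags are not defined. */
static enum mc_class mc_classify(uint64_t status, bool ser_p)
{
    if (!(status & MCI_STATUS_VAL))
        return MC_CLASS_NONE;
    if (!(status & MCI_STATUS_UC))
        return MC_CLASS_CE;
    if (!ser_p || (status & MCI_STATUS_PCC))
        return MC_CLASS_UC;

    bool s  = (status & MCI_STATUS_S)  != 0;
    bool ar = (status & MCI_STATUS_AR) != 0;
    if (s && ar)
        return MC_CLASS_SRAR;
    if (s)
        return MC_CLASS_SRAO;
    if (!ar)
        return MC_CLASS_UCNA;
    return MC_CLASS_UC;   /* S=0 with AR=1 is not a defined signature; treat conservatively */
}
```

Treating the undefined S=0, AR=1 combination as unrecoverable is a conservative choice made for this example.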

Table 15-7 summarizes UCR, corrected, and uncorrected errors.

Table 15-7. MC Error Classifications

Type of Error(1)        UC  EN    PCC  S     AR  Signaling  Software Action                               Example
Uncorrected Error (UC)  1   1     1    x     x   MCE        If EN=1, reset the system, else log and OK
                                                            to keep the system running.
SRAR                    1   1     0    1     1   MCE        For known MCACOD, take specific recovery      Cache to processor load
                                                            action; for unknown MCACOD, must bugcheck.    error.
                                                            If OVER=1, reset system, else take specific
                                                            recovery action.
SRAO                    1   x(2)  0    x(2)  0   MCE/CMC    For known MCACOD, take specific recovery      Patrol scrub and explicit
                                                            action; for unknown MCACOD, OK to keep the    writeback poison errors.
                                                            system running.
UCNA                    1   x     0    0     0   CMC        Log the error and OK to keep the system       Poison detection error.
                                                            running.
Corrected Error (CE)    0   x     x    x     x   CMC        Log the error and no corrective action        ECC in caches and
                                                            required.                                     memory.

NOTES:
1. SRAR, SRAO and UCNA errors are supported by the processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set.
2. EN=1, S=1 when signaled via MCE. EN=x, S=0 when signaled via CMC.

15.6.4

UCR Error Overwrite Rules

In general, the overwrite rules are as follows:

• UCR errors will overwrite corrected errors.
• Uncorrected (PCC=1) errors overwrite UCR (PCC=0) errors.
• UCR errors are not written over previous UCR errors.
• Corrected errors do not write over previous UCR errors.

Regardless of whether the 1st error is retained or the 2nd error overwrites the 1st error, the OVER flag in
the IA32_MCi_STATUS register will be set to indicate an overflow condition. As the S flag and AR flag in the
IA32_MCi_STATUS register are defined to be sticky flags, a second event cannot clear these 2 flags once set;
however, the MC bank information may be filled in for the 2nd error. The table below shows the overwrite rules and
how to treat a second error if the first event is already logged in an MC bank, along with the resulting bit settings of
the UC, PCC, and AR flags in the IA32_MCi_STATUS register. As UCNA and SRAO errors do not require recovery
action from system software to continue program execution, a system reset by system software is not required
unless the AR flag or PCC flag is set for the UCR overflow case (OVER=1, VAL=1, UC=1, PCC=0).

Table 15-8 lists overwrite rules for uncorrected errors, corrected errors, and uncorrected recoverable errors.

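Table 15-8. Overwrite Rules for UC, CE, and UCR Errors

| First Event | Second Event | UC | PCC | S | AR | MCA Bank | Reset System |
|---|---|---|---|---|---|---|---|
| CE | UCR | 1 | 0 | 0 if UCNA, else 1 | 1 if SRAR, else 0 | second | yes, if AR=1 |
| UCR | CE | 1 | 0 | 0 if UCNA, else 1 | 1 if SRAR, else 0 | first | yes, if AR=1 |
| UCNA | UCNA | 1 | 0 | 0 | 0 | first | no |
| UCNA | SRAO | 1 | 0 | 1 | 0 | first | no |
| UCNA | SRAR | 1 | 0 | 1 | 1 | first | yes |
| SRAO | UCNA | 1 | 0 | 1 | 0 | first | no |
| SRAO | SRAO | 1 | 0 | 1 | 0 | first | no |
| SRAO | SRAR | 1 | 0 | 1 | 1 | first | yes |
| SRAR | UCNA | 1 | 0 | 1 | 1 | first | yes |
| SRAR | SRAO | 1 | 0 | 1 | 1 | first | yes |
| SRAR | SRAR | 1 | 0 | 1 | 1 | first | yes |
| UCR | UC | 1 | 1 | undefined | undefined | second | yes |
| UC | UCR | 1 | 1 | undefined | undefined | first | yes |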
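
The overwrite rules can be expressed compactly in code. The following Python sketch is one possible formulation, not a description of the hardware logic: the function name `overwrite`, the severity ranking, and the tuple it returns are assumptions made for illustration. It covers only pairs in which at least one event is a UCR error.

```python
UCR_CLASSES = ("UCNA", "SRAO", "SRAR")
# A second event replaces the logged one only if it ranks strictly higher.
SEVERITY = {"CE": 0, "UCNA": 1, "SRAO": 1, "SRAR": 1, "UC": 2}


def overwrite(first, second):
    """Return (retained bank, S, AR, reset needed) for two events in one bank.

    S and AR are None where their state is undefined (a UC error is involved).
    """
    if first not in SEVERITY or second not in SEVERITY:
        raise ValueError("unknown error class")
    if first not in UCR_CLASSES and second not in UCR_CLASSES:
        raise ValueError("only pairs that include a UCR error are covered")
    retained = "second" if SEVERITY[second] > SEVERITY[first] else "first"
    events = (first, second)
    if "UC" in events:
        return retained, None, None, True
    # S and AR are sticky: once either event sets them, they stay set.
    s = int("SRAO" in events or "SRAR" in events)
    ar = int("SRAR" in events)
    return retained, s, ar, bool(ar)
```

For example, `overwrite("UCNA", "SRAR")` reports that the first error is retained while S and AR are both set, so a reset is required.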

15.7

MACHINE-CHECK AVAILABILITY

The machine-check architecture and machine-check exception (#MC) are model-specific features. Software can
execute the CPUID instruction to determine whether a processor implements these features. Following the execution of the CPUID instruction, the settings of the MCA flag (bit 14) and MCE flag (bit 7) in EDX indicate whether the
processor implements the machine-check architecture and machine-check exception.
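
As a minimal sketch of this check, the following Python function decodes the two feature flags from an EDX value that the caller has already obtained from CPUID leaf 01H. The function name and the dictionary keys are assumptions made for this example; how the EDX value is read is outside the scope of the sketch.

```python
MCE_BIT = 7   # machine-check exception flag in EDX
MCA_BIT = 14  # machine-check architecture flag in EDX


def machine_check_support(edx):
    """Decode the MCE and MCA feature flags from a CPUID.01H:EDX value."""
    return {
        "mce": bool((edx >> MCE_BIT) & 1),
        "mca": bool((edx >> MCA_BIT) & 1),
    }
```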

15.8

MACHINE-CHECK INITIALIZATION

To use the processor’s machine-check architecture, software must initialize the processor to activate the machine-check exception and the error-reporting mechanism.
Example 15-1 gives pseudocode for performing this initialization. This pseudocode checks for the existence of the
machine-check architecture and exception; it then enables machine-check exception and the error-reporting
register banks. The pseudocode shown is compatible with the Pentium 4, Intel Xeon, Intel Atom, P6 family, and
Pentium processors.
Following power up or power cycling, IA32_MCi_STATUS registers are not guaranteed to have valid data until after
they are initially cleared to zero by software (as shown in the initialization pseudocode in Example 15-1). In addition, when using P6 family processors, software must set MCi_STATUS registers to zero when doing a soft-reset.
Example 15-1. Machine-Check Initialization Pseudocode
Check CPUID Feature Flags for MCE and MCA support
IF CPU supports MCE
    THEN
        IF CPU supports MCA
            THEN
                IF (IA32_MCG_CAP.MCG_CTL_P = 1)
                (* IA32_MCG_CTL register is present *)
                    THEN
                        IA32_MCG_CTL ← FFFFFFFFFFFFFFFFH;
                        (* enables all MCA features *)
                FI
                IF (IA32_MCG_CAP.MCG_LMCE_P = 1 and IA32_FEATURE_CONTROL.LOCK = 1 and IA32_FEATURE_CONTROL.LMCE_ON = 1)
                (* IA32_MCG_EXT_CTL register is present and platform has enabled LMCE to permit system software to use LMCE *)
                    THEN
                        IA32_MCG_EXT_CTL ← IA32_MCG_EXT_CTL | 01H;
                        (* System software enables LMCE capability for hardware to signal MCE to a single logical processor *)
                FI
                (* Determine number of error-reporting banks supported *)
                COUNT ← IA32_MCG_CAP.Count;
                MAX_BANK_NUMBER ← COUNT - 1;
                IF (Processor Family is 6H and Processor EXTMODEL:MODEL is less than 1AH)
                    THEN
                        (* Enable logging of all errors except for MC0_CTL register *)
                        FOR error-reporting banks (1 through MAX_BANK_NUMBER)
                            DO
                                IA32_MCi_CTL ← 0FFFFFFFFFFFFFFFFH;
                            OD
                    ELSE
                        (* Enable logging of all errors including MC0_CTL register *)
                        FOR error-reporting banks (0 through MAX_BANK_NUMBER)
                            DO
                                IA32_MCi_CTL ← 0FFFFFFFFFFFFFFFFH;
                            OD
                FI
                (* BIOS clears all errors only on power-on reset *)
                IF (BIOS detects Power-on reset)
                    THEN
                        FOR error-reporting banks (0 through MAX_BANK_NUMBER)
                            DO
                                IA32_MCi_STATUS ← 0;
                            OD
                    ELSE
                        FOR error-reporting banks (0 through MAX_BANK_NUMBER)
                            DO
                                (Optional for BIOS and OS) Log valid errors
                                (OS only) IA32_MCi_STATUS ← 0;
                            OD
                FI
        FI
        Setup the Machine Check Exception (#MC) handler for vector 18 in IDT
        Set the MCE bit (bit 6) in CR4 register to enable Machine-Check Exceptions
FI
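
The IA32_MCG_CAP fields that the initialization pseudocode tests can be extracted as in the following Python sketch. It is illustrative only: the function names and dictionary keys are assumptions, and the caller is assumed to supply the raw IA32_MCG_CAP value together with the processor family and the combined EXTMODEL:MODEL value.

```python
def decode_mcg_cap(mcg_cap):
    """Extract the IA32_MCG_CAP fields used during initialization."""
    return {
        "count": mcg_cap & 0xFF,                   # bits 7:0, number of banks
        "mcg_ctl_p": bool((mcg_cap >> 8) & 1),     # IA32_MCG_CTL is present
        "mcg_ser_p": bool((mcg_cap >> 24) & 1),    # UCR error support
        "mcg_lmce_p": bool((mcg_cap >> 27) & 1),   # local machine check support
    }


def banks_to_enable(mcg_cap, family, ext_model_model):
    """Return the bank numbers whose IA32_MCi_CTL register is written.

    Bank 0 is skipped on family 6H processors with EXTMODEL:MODEL below 1AH,
    mirroring the two FOR loops of the initialization pseudocode.
    """
    count = mcg_cap & 0xFF
    first = 1 if (family == 0x6 and ext_model_model < 0x1A) else 0
    return list(range(first, count))
```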

15.9

INTERPRETING THE MCA ERROR CODES

When the processor detects a machine-check error condition, it writes a 16-bit error code to the MCA error code
field of one of the IA32_MCi_STATUS registers and sets the VAL (valid) flag in that register. The processor may also
write a 16-bit model-specific error code in the IA32_MCi_STATUS register depending on the implementation of the
machine-check architecture of the processor.
The MCA error codes are architecturally defined for Intel 64 and IA-32 processors. To determine the cause of a
machine-check exception, the machine-check exception handler must read the VAL flag for each
IA32_MCi_STATUS register. If the flag is set, the machine-check exception handler must then read the MCA error
code field of the register. It is the encoding of the MCA error code field [15:0] that determines the type of error
being reported and not the register bank reporting it.
There are two types of MCA error codes: simple error codes and compound error codes.

15.9.1

Simple Error Codes

Table 15-9 shows the simple error codes. These unique codes indicate global error information.

Table 15-9. IA32_MCi_STATUS [15:0] Simple Error Code Encoding

| Error Code | Binary Encoding | Meaning |
|---|---|---|
| No Error | 0000 0000 0000 0000 | No error has been reported to this bank of error-reporting registers. |
| Unclassified | 0000 0000 0000 0001 | This error has not been classified into the MCA error classes. |
| Microcode ROM Parity Error | 0000 0000 0000 0010 | Parity error in internal microcode ROM. |
| External Error | 0000 0000 0000 0011 | The BINIT# from another processor caused this processor to enter machine check. (1) |
| FRC Error | 0000 0000 0000 0100 | FRC (functional redundancy check) master/slave error. |
| Internal Parity Error | 0000 0000 0000 0101 | Internal parity error. |
| SMM Handler Code Access Violation | 0000 0000 0000 0110 | An attempt was made by the SMM Handler to execute outside the ranges specified by SMRR. |
| Internal Timer Error | 0000 0100 0000 0000 | Internal timer error. |
| I/O Error | 0000 1110 0000 1011 | Generic I/O error. |
| Internal Unclassified | 0000 01xx xxxx xxxx | Internal unclassified errors. (2) |

NOTES:
1. BINIT# assertion will cause a machine check exception if the processor (or any processor on the same external bus) has BINIT#
observation enabled during power-on configuration (hardware strapping) and if machine check exceptions are enabled (by setting
CR4.MCE = 1).
2. At least one X must equal one. Internal unclassified errors have not been classified.
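
The simple error codes of Table 15-9 lend themselves to a table lookup. The following Python sketch shows one way to do this; the function name `decode_simple` and the use of `None` for codes that are not simple error codes are assumptions made for this example.

```python
SIMPLE_CODES = {
    0x0000: "No Error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM Parity Error",
    0x0003: "External Error",
    0x0004: "FRC Error",
    0x0005: "Internal Parity Error",
    0x0006: "SMM Handler Code Access Violation",
    0x0400: "Internal Timer Error",
    0x0E0B: "I/O Error",
}


def decode_simple(mcacod):
    """Return the simple error code name, or None if the code is not simple."""
    code = mcacod & 0xFFFF
    if code in SIMPLE_CODES:
        return SIMPLE_CODES[code]
    # 0000 01xx xxxx xxxx with at least one x set; the all-zero case
    # (Internal Timer Error) was already matched by the lookup above.
    if (code & 0xFC00) == 0x0400:
        return "Internal Unclassified"
    return None
```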

15.9.2

Compound Error Codes

Compound error codes describe errors related to the TLBs, memory, caches, bus and interconnect logic, and
internal timer. A set of sub-fields is common to all compound errors. These sub-fields describe the type of
access, level in the cache hierarchy, and type of request. Table 15-10 shows the general form of the compound
error codes.

Table 15-10. IA32_MCi_STATUS [15:0] Compound Error Code Encoding

| Type | Form | Interpretation |
|---|---|---|
| Generic Cache Hierarchy | 000F 0000 0000 11LL | Generic cache hierarchy error |
| TLB Errors | 000F 0000 0001 TTLL | {TT}TLB{LL}_ERR |
| Memory Controller Errors | 000F 0000 1MMM CCCC | {MMM}_CHANNEL{CCCC}_ERR |
| Cache Hierarchy Errors | 000F 0001 RRRR TTLL | {TT}CACHE{LL}_{RRRR}_ERR |
| Bus and Interconnect Errors | 000F 1PPT RRRR IILL | BUS{LL}_{PP}_{RRRR}_{II}_{T}_ERR |

The “Interpretation” column in the table indicates the name of a compound error. The name is constructed by
substituting mnemonics for the sub-field names given within curly braces. For example, the error code
ICACHEL1_RD_ERR is constructed from the form:
{TT}CACHE{LL}_{RRRR}_ERR,
where {TT} is replaced by I, {LL} is replaced by L1, and {RRRR} is replaced by RD.

For more information on the “Form” and “Interpretation” columns, see Section 15.9.2.1, “Correction
Report Filtering (F) Bit” through Section 15.9.2.5, “Bus and Interconnect Errors”.
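
The “Form” column of Table 15-10 translates into a mask-and-compare test for each compound type. The following Python sketch is one way to express this; the names `FORMS` and `compound_type` are assumptions made for illustration. The sketch clears bit 12 (F) before matching, and it should be applied only after the simple error codes have been ruled out, because the simple I/O Error code also fits the bus and interconnect form.

```python
F_BIT = 1 << 12

# (mask, value, type): the mask selects the fixed bits of each form, with the
# lettered sub-field bits treated as don't-care.
FORMS = [
    (0xFFFC, 0x000C, "Generic Cache Hierarchy"),
    (0xFFF0, 0x0010, "TLB Errors"),
    (0xFF80, 0x0080, "Memory Controller Errors"),
    (0xFF00, 0x0100, "Cache Hierarchy Errors"),
    (0xF800, 0x0800, "Bus and Interconnect Errors"),
]


def compound_type(mcacod):
    """Return the compound error type for an MCA error code, or None."""
    code = mcacod & 0xFFFF & ~F_BIT
    for mask, value, name in FORMS:
        if (code & mask) == value:
            return name
    return None
```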

15.9.2.1

Correction Report Filtering (F) Bit

Starting with Intel Core Duo processors, bit 12 in the “Form” column in Table 15-10 is used to indicate that a particular posting to a log may be the last posting for corrections in that line/entry, at least for some time:

• 0 in bit 12 indicates “normal” filtering (original P6/Pentium4/Atom/Xeon processor meaning).
• 1 in bit 12 indicates “corrected” filtering (filtering is activated for the line/entry in the posting). Filtering means
that some or all of the subsequent corrections to this entry (in this structure) will not be posted. The enhanced
error reporting introduced with the Intel Core Duo processors is based on tracking the lines affected by
repeated corrections (see Section 15.4, “Enhanced Cache Error reporting”). This capability is indicated by
IA32_MCG_CAP[11]. Only the first few correction events for a line are posted; subsequent redundant
correction events to the same line are not posted. Uncorrected events are always posted.

The behavior of error filtering after crossing the yellow threshold is model-specific. Filtering has meaning only for
corrected errors (UC=0 in IA32_MCi_STATUS MSR). System software must ignore the filtering bit (bit 12) for uncorrected
errors.

15.9.2.2

Transaction Type (TT) Sub-Field

The 2-bit TT sub-field (Table 15-11) indicates the type of transaction (data, instruction, or generic). The sub-field
applies to the TLB, cache, and interconnect error conditions. Note that interconnect error conditions are primarily
associated with P6 family and Pentium processors, which utilize an external APIC bus separate from the system
bus. The generic type is reported when the processor cannot determine the transaction type.

Table 15-11. Encoding for TT (Transaction Type) Sub-Field

| Transaction Type | Mnemonic | Binary Encoding |
|---|---|---|
| Instruction | I | 00 |
| Data | D | 01 |
| Generic | G | 10 |

15.9.2.3

Level (LL) Sub-Field

The 2-bit LL sub-field (see Table 15-12) indicates the level in the memory hierarchy where the error occurred (level
0, level 1, level 2, or generic). The LL sub-field also applies to the TLB, cache, and interconnect error conditions.
The Pentium 4, Intel Xeon, Intel Atom, and P6 family processors support two levels in the cache hierarchy and one
level in the TLBs. Again, the generic type is reported when the processor cannot determine the hierarchy level.

Table 15-12. Level Encoding for LL (Memory Hierarchy Level) Sub-Field

| Hierarchy Level | Mnemonic | Binary Encoding |
|---|---|---|
| Level 0 | L0 | 00 |
| Level 1 | L1 | 01 |
| Level 2 | L2 | 10 |
| Generic | LG | 11 |

15.9.2.4

Request (RRRR) Sub-Field

The 4-bit RRRR sub-field (see Table 15-13) indicates the type of action associated with the error. Actions include
read and write operations, prefetches, cache evictions, and snoops. Generic error is returned when the type of
error cannot be determined. Generic read and generic write are returned when the processor cannot determine the
type of instruction or data request that caused the error. Eviction and snoop requests apply only to the caches. All
of the other requests apply to TLBs, caches and interconnects.

Table 15-13. Encoding of Request (RRRR) Sub-Field

| Request Type | Mnemonic | Binary Encoding |
|---|---|---|
| Generic Error | ERR | 0000 |
| Generic Read | RD | 0001 |
| Generic Write | WR | 0010 |
| Data Read | DRD | 0011 |
| Data Write | DWR | 0100 |
| Instruction Fetch | IRD | 0101 |
| Prefetch | PREFETCH | 0110 |
| Eviction | EVICT | 0111 |
| Snoop | SNOOP | 1000 |
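
With the TT, LL, and RRRR encodings in hand, the name of a cache hierarchy error can be assembled from its MCA error code. The following Python sketch is illustrative: the dictionary and function names are assumptions made for this example, and the mnemonic tables mirror Tables 15-11 through 15-13. Reserved encodings are rejected instead of being given an invented mnemonic.

```python
TT = {0b00: "I", 0b01: "D", 0b10: "G"}
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}
RRRR = {
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD", 0b0100: "DWR",
    0b0101: "IRD", 0b0110: "PREFETCH", 0b0111: "EVICT", 0b1000: "SNOOP",
}


def cache_error_name(mcacod):
    """Build the {TT}CACHE{LL}_{RRRR}_ERR name for a cache hierarchy code."""
    code = mcacod & 0xFFFF & ~(1 << 12)  # ignore the F bit
    if (code & 0xFF00) != 0x0100:        # form 000F 0001 RRRR TTLL
        raise ValueError("not a cache hierarchy error code")
    rrrr = (code >> 4) & 0xF
    tt = (code >> 2) & 0x3
    ll = code & 0x3
    if tt not in TT or rrrr not in RRRR:
        raise ValueError("reserved sub-field encoding")
    return f"{TT[tt]}CACHE{LL[ll]}_{RRRR[rrrr]}_ERR"
```

For example, the code 0111H (RRRR = 0001B, TT = 00B, LL = 01B) yields ICACHEL1_RD_ERR.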

15.9.2.5

Bus and Interconnect Errors

The bus and interconnect errors are defined with the 2-bit PP (participation), 1-bit T (time-out), and 2-bit II
(memory or I/O) sub-fields, in addition to the LL and RRRR sub-fields (see Table 15-14). The bus error conditions
are implementation dependent and related to the type of bus implemented by the processor. Likewise, the interconnect error conditions are predicated on a specific implementation-dependent interconnect model that describes
the connections between the different levels of the storage hierarchy. The type of bus is implementation dependent, and as such is not specified in this document. A bus or interconnect transaction consists of a request involving
an address and a response.

Table 15-14. Encodings of PP, T, and II Sub-Fields

| Sub-Field | Transaction | Mnemonic | Binary Encoding |
|---|---|---|---|
| PP (Participation) | Local processor* originated request | SRC | 00 |
| | Local processor* responded to request | RES | 01 |
| | Local processor* observed error as third party | OBS | 10 |
| | Generic | | 11 |
| T (Time-out) | Request timed out | TIMEOUT | 1 |
| | Request did not time out | NOTIMEOUT | 0 |
| II (Memory or I/O) | Memory Access | M | 00 |
| | Reserved | | 01 |
| | I/O | IO | 10 |
| | Other transaction | | 11 |

NOTE:
* Local processor differentiates the processor reporting the error from other system components (including the APIC, other processors, etc.).
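
A bus and interconnect error code can be split into its sub-fields with shifts and masks that follow the form 000F 1PPT RRRR IILL. The Python sketch below is illustrative only; the function name and the dictionary keys are assumptions, and the sub-fields are returned as raw numeric values to be looked up in the encoding tables.

```python
def decode_bus_error(mcacod):
    """Split a bus and interconnect error code into numeric sub-fields."""
    code = mcacod & 0xFFFF & ~(1 << 12)  # ignore the F bit
    if (code & 0xF800) != 0x0800:        # bit 11 set, bits 15:13 clear
        raise ValueError("not a bus and interconnect error code")
    return {
        "PP": (code >> 9) & 0x3,    # participation, bits 10:9
        "T": (code >> 8) & 0x1,     # time-out, bit 8
        "RRRR": (code >> 4) & 0xF,  # request, bits 7:4
        "II": (code >> 2) & 0x3,    # memory or I/O, bits 3:2
        "LL": code & 0x3,           # level, bits 1:0
    }
```

Applied to the simple I/O Error code 0E0BH, the sketch returns PP = 11B, T = 0, RRRR = 0000B, II = 10B, and LL = 11B.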

15.9.2.6

Memory Controller Errors

The memory controller errors are defined with the 3-bit MMM (memory transaction type), and 4-bit CCCC
(channel) sub-fields. The encodings for MMM and CCCC are defined in Table 15-15.

Table 15-15. Encodings of MMM and CCCC Sub-Fields

| Sub-Field | Transaction | Mnemonic | Binary Encoding |
|---|---|---|---|
| MMM | Generic undefined request | GEN | 000 |
| | Memory read error | RD | 001 |
| | Memory write error | WR | 010 |
| | Address/Command Error | AC | 011 |
| | Memory Scrubbing Error | MS | 100 |
| | Reserved | | 101-111 |
| CCCC | Channel number | CHN | 0000-1110 |
| | Channel not specified | | 1111 |

15.9.3

Architecturally Defined UCR Errors

Software recoverable compound error codes are defined in this section.

15.9.3.1

Architecturally Defined SRAO Errors

The following two SRAO errors are architecturally defined.

• UCR Errors detected by memory controller scrubbing; and
• UCR Errors detected during L3 cache (L3) explicit writebacks.

The MCA error code encodings for these two architecturally-defined UCR errors correspond to sub-classes of
compound MCA error codes (see Table 15-10). Their values and compound encoding format are given in Table
15-16.

Table 15-16. MCA Compound Error Code Encoding for SRAO Errors

| Type | MCACOD Value | MCA Error Code Encoding (1) |
|---|---|---|
| Memory Scrubbing | C0H - CFH | 0000_0000_1100_CCCC; 000F 0000 1MMM CCCC (Memory Controller Error), where memory subfield MMM = 100B (memory scrubbing) and channel subfield CCCC = channel # or generic |
| L3 Explicit Writeback | 17AH | 0000_0001_0111_1010; 000F 0001 RRRR TTLL (Cache Hierarchy Error), where request subfield RRRR = 0111B (Eviction), transaction type subfield TT = 10B (Generic), and level subfield LL = 10B |

NOTES:
1. Note that for both of these errors the correction report filtering (F) bit (bit 12) of the MCA error must be ignored.

Table 15-17 lists values of relevant bit fields of IA32_MCi_STATUS for architecturally defined SRAO errors.

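Table 15-17. IA32_MCi_STATUS Values for SRAO Errors

| SRAO Error | Valid | OVER | UC | EN | MISCV | ADDRV | PCC | S | AR | MCACOD |
|---|---|---|---|---|---|---|---|---|---|---|
| Memory Scrubbing | 1 | 0 | 1 | x (1) | 1 | 1 | 0 | x (1) | 0 | C0H-CFH |
| L3 Explicit Writeback | 1 | 0 | 1 | x (1) | 1 | 1 | 0 | x (1) | 0 | 17AH |

NOTES:
1. When signaled as MCE, EN=1 and S=1. If error was signaled via CMC, then EN=x, and S=0.
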
For both the memory scrubbing and L3 explicit writeback errors, the ADDRV and MISCV flags in the
IA32_MCi_STATUS register are set to indicate that the offending physical address information is available from the
IA32_MCi_MISC and the IA32_MCi_ADDR registers. For the memory scrubbing and L3 explicit writeback errors,
the address mode in the IA32_MCi_MISC register should be set as physical address mode (010b) and the address
LSB information in the IA32_MCi_MISC register should indicate the lowest valid address bit in the address information provided from the IA32_MCi_ADDR register.
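
The address mode and address LSB information can be combined to obtain the usable part of the reported address. The Python sketch below assumes the IA32_MCi_MISC layout in which bits 5:0 hold the recoverable address LSB and bits 8:6 hold the address mode; the function name is an assumption made for this example.

```python
PHYSICAL_ADDRESS_MODE = 0b010


def recoverable_physical_address(addr, misc):
    """Return IA32_MCi_ADDR with the invalid low-order bits cleared.

    Returns None when IA32_MCi_MISC does not indicate physical address mode.
    """
    address_lsb = misc & 0x3F          # lowest valid address bit, bits 5:0
    address_mode = (misc >> 6) & 0x7   # address mode, bits 8:6
    if address_mode != PHYSICAL_ADDRESS_MODE:
        return None
    return addr & ~((1 << address_lsb) - 1)
```

With an address LSB of 6, for example, the result identifies a 64-byte aligned region that contains the error.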
MCE signal is broadcast to all logical processors as outlined in Section 15.10.4.1. If LMCE is supported and enabled,
some errors (not limited to UCR errors) may be delivered to only a single logical processor. System software should
consult IA32_MCG_STATUS.LMCE_S to determine if the MCE signaled is only to this logical processor.

IA32_MCi_STATUS banks can be shared by logical processors within a core or within the same package. So several
logical processors may find an SRAO error in the shared IA32_MCi_STATUS bank but other processors do not find
it in any of the IA32_MCi_STATUS banks. Table 15-18 shows the RIPV and EIPV flag indication in the
IA32_MCG_STATUS register for the memory scrubbing and L3 explicit writeback errors on both the reporting and
non-reporting logical processors.

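Table 15-18. IA32_MCG_STATUS Flag Indication for SRAO Errors

| SRAO Type | Reporting Logical Processors: RIPV | Reporting Logical Processors: EIPV | Non-reporting Logical Processors: RIPV | Non-reporting Logical Processors: EIPV |
|---|---|---|---|---|
| Memory Scrubbing | 1 | 0 | 1 | 0 |
| L3 Explicit Writeback | 1 | 0 | 1 | 0 |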
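
Recognizing the two architecturally defined SRAO errors reduces to a comparison on MCACOD with bit 12 (F) ignored. The following Python sketch is one possible formulation; the function name and the returned tuple are assumptions made for illustration.

```python
def srao_type(mcacod):
    """Identify an architecturally defined SRAO error from its MCACOD.

    Returns (type, channel) or None. The channel is None when CCCC = 1111B
    (channel not specified) and for L3 explicit writeback errors.
    """
    code = mcacod & 0xFFFF & ~(1 << 12)  # the F bit must be ignored
    if 0x00C0 <= code <= 0x00CF:
        channel = code & 0xF
        return ("Memory Scrubbing", None if channel == 0xF else channel)
    if code == 0x017A:
        return ("L3 Explicit Writeback", None)
    return None
```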

15.9.3.2

Architecturally Defined SRAR Errors

The following two SRAR errors are architecturally defined.

• UCR Errors detected on data load; and
• UCR Errors detected on instruction fetch.

Table 15-19. MCA Compound Error Code Encoding for SRAR Errors

| Type | MCACOD Value | MCA Error Code Encoding (1) |
|---|---|---|
| Data Load | 134H | 0000_0001_0011_0100; 000F 0001 RRRR TTLL (Cache Hierarchy Error), where request subfield RRRR = 0011B (Data Load), transaction type subfield TT = 01B (Data), and level subfield LL = 00B (Level 0) |
| Instruction Fetch | 150H | 0000_0001_0101_0000; 000F 0001 RRRR TTLL (Cache Hierarchy Error), where request subfield RRRR = 0101B (Instruction Fetch), transaction type subfield TT = 00B (Instruction), and level subfield LL = 00B (Level 0) |

NOTES:
1. Note that for both of these errors the correction report filtering (F) bit (bit 12) of the MCA error must be ignored.

Table 15-20 lists values of relevant bit fields of IA32_MCi_STATUS for architecturally defined SRAR errors.

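Table 15-20. IA32_MCi_STATUS Values for SRAR Errors

| SRAR Error | Valid | OVER | UC | EN | MISCV | ADDRV | PCC | S | AR | MCACOD |
|---|---|---|---|---|---|---|---|---|---|---|
| Data Load | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 134H |
| Instruction Fetch | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 150H |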

For both the data load and instruction fetch errors, the ADDRV and MISCV flags in the IA32_MCi_STATUS register
are set to indicate that the offending physical address information is available from the IA32_MCi_MISC and the
IA32_MCi_ADDR registers. For these errors, the address mode in the
IA32_MCi_MISC register should be set as physical address mode (010b) and the address LSB information in the
IA32_MCi_MISC register should indicate the lowest valid address bit in the address information provided from the
IA32_MCi_ADDR register.
The MCE signal is broadcast to all logical processors on the system on which the UCR errors are supported, except when
the processor supports LMCE and LMCE is enabled by system software (see Section 15.3.1.5). The
IA32_MCG_STATUS MSR allows system software to distinguish the affected logical processor of an SRAR error
from the other logical processors that observed the SRAR error via an IA32_MCi_STATUS bank.
Table 15-21 shows the RIPV and EIPV flag indication in the IA32_MCG_STATUS register for the data load and
instruction fetch errors on both the reporting and non-reporting logical processors. The recoverable SRAR error
reported by a processor may be continuable, where system software can interpret “continuable”
as follows: the error was isolated and contained. If software can rectify the error condition in the current instruction
stream, the execution context on that logical processor can be continued without loss of information.

Table 15-21. IA32_MCG_STATUS Flag Indication for SRAR Errors

| SRAR Type | Affected Logical Processor: RIPV | EIPV | Continuable | Non-Affected Logical Processors: RIPV | EIPV | Continuable |
|---|---|---|---|---|---|---|
| Recoverable-continuable | 1 | 1 | Yes (1) | 1 | 0 | Yes |
| Recoverable-not-continuable | 0 | x | No | 1 | 0 | Yes |

NOTES:
1. See the definition of the context of “continuable” above and additional detail below.

SRAR Error And Affected Logical Processors
The affected logical processor is the one that has detected and raised an SRAR error at the point of the consumption in the execution flow. The affected logical processor should find the Data Load or the Instruction Fetch error
information in the IA32_MCi_STATUS register that is reporting the SRAR error.
Table 15-21 lists the scenarios in which system software can respond to an SRAR error on an affected logical
processor, according to the RIPV and EIPV values:

• Recoverable-Continuable SRAR Error (RIPV=1, EIPV=1):
For Recoverable-Continuable SRAR errors, the affected logical processor should find that both the
IA32_MCG_STATUS.RIPV and the IA32_MCG_STATUS.EIPV flags are set, indicating that system software may
be able to restart execution from the interrupted context if it is able to rectify the error condition. If system
software cannot rectify the error condition, then it must treat the error as a recoverable error where restarting
execution with the interrupted context is not possible. Restarting without rectifying the error condition will,
in most cases, result in another SRAR error on the same instruction.

• Recoverable-not-continuable SRAR Error (RIPV=0, EIPV=x):
For Recoverable-not-continuable errors, the affected logical processor should find that either
- IA32_MCG_STATUS.RIPV=0, IA32_MCG_STATUS.EIPV=1, or
- IA32_MCG_STATUS.RIPV=0, IA32_MCG_STATUS.EIPV=0.
In either case, this indicates that the error is detected at the instruction pointer saved on the stack for this
machine check exception and restarting execution with the interrupted context is not possible. System
software may take the following recovery action for the affected logical processor:

    • The currently executing thread cannot be continued. System software must terminate the interrupted
    stream of execution and provide a new stream of execution on return from the machine check handler
    for the affected logical processor.

SRAR Error And Non-Affected Logical Processors
The logical processors that observed but were not affected by an SRAR error should find that the RIPV flag in the
IA32_MCG_STATUS register is set and the EIPV flag in the IA32_MCG_STATUS register is cleared, indicating that it
is safe to restart the execution at the instruction saved on the stack for the machine check exception on these
processors after the recovery action is successfully taken by system software.
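
The RIPV and EIPV combinations described in this section can be mapped to their scenarios as in the following Python sketch. It is a minimal illustration that assumes an SRAR error has already been identified from MCACOD; the function name and the returned strings are assumptions made for this example.

```python
RIPV_BIT = 0  # IA32_MCG_STATUS.RIPV
EIPV_BIT = 1  # IA32_MCG_STATUS.EIPV


def srar_scenario(mcg_status):
    """Map the RIPV and EIPV flags to the SRAR scenario for this processor."""
    ripv = (mcg_status >> RIPV_BIT) & 1
    eipv = (mcg_status >> EIPV_BIT) & 1
    if ripv and eipv:
        # Execution may restart only if software rectifies the error first.
        return "affected: recoverable-continuable"
    if not ripv:
        # EIPV may be 0 or 1; the interrupted stream must be terminated.
        return "affected: recoverable-not-continuable"
    # RIPV=1, EIPV=0: this processor only observed the error.
    return "non-affected"
```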

15.9.4

Multiple MCA Errors

When multiple MCA errors are detected within a certain detection window, the processor may aggregate the
reporting of these errors together as a single event, i.e., a single machine-check exception condition. If this occurs,
system software may find multiple MCA errors logged in different MC banks on one logical processor or find
multiple MCA errors logged across different processors for a single machine check broadcast event. In order to
handle multiple UCR errors reported from a single machine check event and possibly recover from multiple errors,
system software may consider the following:

• Whether it can recover from multiple errors is determined by the most severe error reported on the system. If
the most severe error is found to be an unrecoverable error (VAL=1, UC=1, PCC=1 and EN=1) after system
software examines the MC banks of all processors to which the MCA signal is broadcast, recovery from the
multiple errors is not possible and system software needs to reset the system.

• When multiple recoverable errors are reported and no other fatal condition (e.g., overflowed condition for SRAR
error) is found for the reported recoverable errors, it is possible for system software to recover from the
multiple recoverable errors by taking the necessary recovery action for each individual recoverable error. However,
system software can no longer expect a one-to-one relationship between the error information recorded in the
IA32_MCi_STATUS register and the states of the RIPV and EIPV flags in the IA32_MCG_STATUS register, as the
states of the RIPV and the EIPV flags in the IA32_MCG_STATUS register may indicate the information for the
most severe error recorded on the processor. System software is required to use the RIPV flag indication in the
IA32_MCG_STATUS register to make a final decision of recoverability of the errors and find the restart-ability
requirement after examining each IA32_MCi_STATUS register error information in the MC banks.
In certain cases where system software observes more than one SRAR error logged for a single logical
processor, it can no longer rely on affected threads as specified in Table 15-21 above. System software is
recommended to reset the system if this condition is observed.
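
The considerations above can be condensed into a first-pass decision over the errors found on one logical processor. The following Python sketch is a simplified illustration under stated assumptions: each error is given as a pair of an error class string and its OVER flag, the function name and return values are hypothetical, and the final restartability decision based on the RIPV flag is left to the caller.

```python
def multi_error_action(errors):
    """Decide between "reset" and "recover" for one logical processor.

    `errors` is an iterable of (error_class, over) pairs, where error_class
    is one of "CE", "UCNA", "SRAO", "SRAR" or "UC" and over is the OVER flag.
    """
    errors = list(errors)
    # The most severe error decides: an unrecoverable error forces a reset.
    if any(error_class == "UC" for error_class, _ in errors):
        return "reset"
    srar_over_flags = [over for error_class, over in errors if error_class == "SRAR"]
    # An overflowed SRAR error is fatal, and more than one SRAR error on a
    # single logical processor leaves the affected thread ambiguous.
    if any(srar_over_flags) or len(srar_over_flags) > 1:
        return "reset"
    return "recover"
```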

15.9.5

Machine-Check Error Codes Interpretation

Chapter 16, “Interpreting Machine-Check Error Codes,” provides information on interpreting the MCA error code,
model-specific error code, and other information error code fields. For P6 family processors, information has been
included on decoding external bus errors. For Pentium 4 and Intel Xeon processors, information is included on
external bus, internal timer and cache hierarchy errors.

15.10

GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE

The machine-check architecture and error logging can be used in three different ways:

• To detect machine errors during normal instruction execution, using the machine-check exception (#MC).
• To periodically check and log machine errors.
• To examine recoverable UCR errors, determine software recoverability and perform recovery actions via a
machine-check exception handler or a corrected machine-check interrupt handler.

To use the machine-check exception, the operating system or executive software must provide a machine-check
exception handler. This handler may need to be designed specifically for each family of processors.
A special program or utility is required to log machine errors.
Guidelines for writing a machine-check exception handler or a machine-error logging utility are given in the
following sections.

15.10.1 Machine-Check Exception Handler
The machine-check exception (#MC) corresponds to vector 18. To service machine-check exceptions, a trap gate
must be added to the IDT. The pointer in the trap gate must point to a machine-check exception handler. Two
approaches can be taken to designing the exception handler:
1. The handler can merely log all the machine status and error information, then call a debugger or shut down the
system.
2. The handler can analyze the reported error information and, in some cases, attempt to correct the error and
restart the processor.
For Pentium 4, Intel Xeon, Intel Atom, P6 family, and Pentium processors, virtually no machine-check condition
can be corrected (they result in abort-type exceptions). The logging of status and error information is therefore
a baseline implementation requirement.
When IA32_MCG_CAP[24] is clear, consider the following when writing a machine-check exception handler:

• To determine the nature of the error, the handler must read each of the error-reporting register banks. The
count field in the IA32_MCG_CAP register gives the number of register banks. The first register of register bank 0
is at address 400H.

• The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the error information in the register
is valid. If this flag is clear, the registers in that bank do not contain valid error information and do not need to
be checked.

• To write a portable exception handler, only the MCA error code field in the IA32_MCi_STATUS register should be
checked. See Section 15.9, “Interpreting the MCA Error Codes,” for information that can be used to write an
algorithm to interpret this field.

• Correctable errors are corrected automatically by the processor. The UC flag in each IA32_MCi_STATUS register indicates whether the processor automatically corrected an error.

• The PCC and OVER flags in each IA32_MCi_STATUS register and the RIPV flag in the IA32_MCG_STATUS register
indicate whether recovery from the error is possible. If PCC or OVER is set, recovery is not possible. If RIPV is
not set, program execution cannot be restarted reliably. When recovery is not possible, the handler typically
records the error information and signals an abort to the operating system.

• The RIPV flag in the IA32_MCG_STATUS register indicates whether the program can be restarted at the
instruction indicated by the instruction pointer (the address of the instruction pushed on the stack when the
exception was generated). If this flag is clear, the processor may still be able to be restarted (for debugging
purposes) but not without loss of program continuity.

• For unrecoverable errors, the EIPV flag in the IA32_MCG_STATUS register indicates whether the instruction
indicated by the instruction pointer pushed on the stack (when the exception was generated) is related to the
error. If the flag is clear, the pushed instruction may not be related to the error.

• The MCIP flag in the IA32_MCG_STATUS register indicates whether a machine-check exception was generated.
Before returning from the machine-check exception handler, software should clear this flag so that it can be
used reliably by an error logging utility. The MCIP flag also detects recursion. The machine-check architecture
does not support recursion. When the processor detects machine-check recursion, it enters the shutdown
state.
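
Two of the points above translate directly into small helpers: locating the registers of a bank and testing whether recovery is possible. The Python sketch below is illustrative. It assumes the architectural layout of four consecutive MSRs per bank starting at 400H (CTL, STATUS, ADDR, MISC), and the function names are assumptions made for this example.

```python
def bank_msrs(bank):
    """Return the MSR addresses of error-reporting register bank `bank`."""
    base = 0x400 + 4 * bank
    return {"CTL": base, "STATUS": base + 1, "ADDR": base + 2, "MISC": base + 3}


def restart_possible(mci_status, mcg_status):
    """Apply the PCC, OVER and RIPV checks for the IA32_MCG_CAP[24] = 0 case."""
    over = (mci_status >> 62) & 1  # IA32_MCi_STATUS.OVER
    pcc = (mci_status >> 57) & 1   # IA32_MCi_STATUS.PCC
    ripv = mcg_status & 1          # IA32_MCG_STATUS.RIPV
    return not (pcc or over) and bool(ripv)
```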

Example 15-2 gives typical steps carried out by a machine-check exception handler.

Example 15-2. Machine-Check Exception Handler Pseudocode
IF CPU supports MCE
    THEN
        IF CPU supports MCA
            THEN
                call errorlogging routine; (* returns restartability *)
            ELSE (* Pentium(R) processor compatible *)
                READ P5_MC_ADDR;
                READ P5_MC_TYPE;
                report RESTARTABILITY to console;
        FI;
FI;
IF error is not restartable
    THEN
        report RESTARTABILITY to console;
        abort system;
FI;
CLEAR MCIP flag in IA32_MCG_STATUS;

15.10.2 Pentium Processor Machine-Check Exception Handling
Machine-check exception handlers on P6 family, Intel Atom, and later processor families should follow the guidelines
described in Section 15.10.1 and Example 15-2, which check the processor’s support of MCA.

NOTE
On processors that support MCA (CPUID.1.EDX.MCA = 1) reading the P5_MC_TYPE and
P5_MC_ADDR registers may produce invalid data.
When machine-check exceptions are enabled for the Pentium processor (MCE flag is set in control register CR4),
the machine-check exception handler uses the RDMSR instruction to read the error type from the P5_MC_TYPE
register and the machine check address from the P5_MC_ADDR register. The handler then normally reports these
register values to the system console before aborting execution (see Example 15-2).

15.10.3 Logging Correctable Machine-Check Errors
The error handling routine for servicing the machine-check exceptions is responsible for logging uncorrected
errors.
If a machine-check error is correctable, the processor does not generate a machine-check exception for it. To
detect correctable machine-check errors, a utility program must be written that reads each of the machine-check
error-reporting register banks and logs the results in an accounting file or data structure. This utility can be implemented in any of the following ways.

• A system daemon that polls the register banks on an infrequent basis, such as hourly or daily.
• A user-initiated application that polls the register banks and records the exceptions. Here, the actual polling
service is provided by an operating-system driver or through the system call interface.
• An interrupt service routine servicing CMCI can read the MC banks and log the error. Please refer to Section
15.10.4.2 for guidelines on logging correctable machine checks.

Example 15-3 gives pseudocode for an error logging utility.


Example 15-3. Machine-Check Error Logging Pseudocode
Assume that execution is restartable;
IF the processor supports MCA
    THEN
        FOR each bank of machine-check registers
            DO
                READ IA32_MCi_STATUS;
                IF VAL flag in IA32_MCi_STATUS = 1
                    THEN
                        IF ADDRV flag in IA32_MCi_STATUS = 1
                            THEN READ IA32_MCi_ADDR;
                        FI;
                        IF MISCV flag in IA32_MCi_STATUS = 1
                            THEN READ IA32_MCi_MISC;
                        FI;
                        IF MCIP flag in IA32_MCG_STATUS = 1
                            (* Machine-check exception is in progress *)
                            AND PCC flag in IA32_MCi_STATUS = 1
                            OR RIPV flag in IA32_MCG_STATUS = 0
                            (* execution is not restartable *)
                            THEN
                                RESTARTABILITY = FALSE;
                                return RESTARTABILITY to calling procedure;
                        FI;
                        Save time-stamp counter and processor ID;
                        Set IA32_MCi_STATUS to all 0s;
                        Execute serializing instruction (i.e., CPUID);
                FI;
            OD;
FI;

If the processor supports the machine-check architecture, the utility reads through the banks of error-reporting
registers looking for valid register entries. It then saves the values of the IA32_MCi_STATUS, IA32_MCi_ADDR,
IA32_MCi_MISC and IA32_MCG_STATUS registers for each bank that is valid. The routine minimizes processing
time by recording the raw data into a system data structure or file, reducing the overhead associated with polling.
User utilities analyze the collected data in an off-line environment.
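
The bank walk described above can be sketched in C. To keep the sketch self-contained, the register banks are modeled as an in-memory array (struct mc_bank) standing in for RDMSR/WRMSR accesses to IA32_MCi_STATUS, IA32_MCi_ADDR, and IA32_MCi_MISC; the structure and function names are hypothetical, and the time stamp, processor ID, serializing instruction, and restartability check from Example 15-3 are omitted. The flag positions (VAL = bit 63, MISCV = bit 59, ADDRV = bit 58) are architectural.

```c
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_MISCV (1ULL << 59)
#define MCI_STATUS_ADDRV (1ULL << 58)

/* Hypothetical stand-in for one bank of error-reporting MSRs. */
struct mc_bank   { uint64_t status, addr, misc; };

/* Raw record saved for off-line analysis. */
struct mc_record { unsigned bank; uint64_t status, addr, misc; };

/* Log every valid bank into out[] (at most max records) and clear its
 * status register. Returns the number of records written. */
static inline size_t mc_log_banks(struct mc_bank *banks, unsigned nbanks,
                                  struct mc_record *out, size_t max)
{
    size_t n = 0;
    for (unsigned i = 0; i < nbanks && n < max; i++) {
        uint64_t st = banks[i].status;            /* READ IA32_MCi_STATUS */
        if (!(st & MCI_STATUS_VAL))
            continue;                             /* nothing logged here */
        struct mc_record r = { i, st, 0, 0 };
        if (st & MCI_STATUS_ADDRV)
            r.addr = banks[i].addr;               /* ADDR valid only if ADDRV */
        if (st & MCI_STATUS_MISCV)
            r.misc = banks[i].misc;               /* MISC valid only if MISCV */
        out[n++] = r;
        banks[i].status = 0;                      /* clear VAL and OVER */
    }
    return n;
}
```

Only raw register values are stored, which matches the goal of keeping the polling overhead low and deferring interpretation to off-line tools.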
When the MCIP flag is set in the IA32_MCG_STATUS register, a machine-check exception is in progress and the
machine-check exception handler has called the exception logging routine.
Once the logging process has been completed, the exception-handling routine must determine whether execution
can be restarted, which is usually possible when damage has not occurred (the PCC flag is clear in the
IA32_MCi_STATUS register) and when the processor can guarantee that execution is restartable (the RIPV flag is
set in the IA32_MCG_STATUS register). If execution cannot be restarted, the system is not recoverable and the
exception-handling routine should signal the console appropriately before returning the error status to the Operating System kernel for subsequent shutdown.
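
The restartability test in the preceding paragraph reduces to two flag checks. A minimal sketch, assuming the caller passes the saved IA32_MCi_STATUS value of the bank being examined and the current IA32_MCG_STATUS value (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_PCC  (1ULL << 57)  /* processor context corrupt */
#define MCG_STATUS_RIPV (1ULL << 0)   /* restart IP valid */

/* Execution can usually be restarted when the bank reports no context
 * corruption and the processor guarantees a valid restart IP. */
static inline bool mc_is_restartable(uint64_t mci_status, uint64_t mcg_status)
{
    bool pcc  = (mci_status & MCI_STATUS_PCC) != 0;
    bool ripv = (mcg_status & MCG_STATUS_RIPV) != 0;
    return !pcc && ripv;
}
```

A handler would evaluate this for every valid bank and treat the system as unrecoverable if any bank fails the test.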
The machine-check architecture allows buffering of exceptions from a given error-reporting bank although the
Pentium 4, Intel Xeon, Intel Atom, and P6 family processors do not implement this feature. The error logging
routine should provide compatibility with future processors by reading each hardware error-reporting bank's
IA32_MCi_STATUS register and then writing 0s to clear the OVER and VAL flags in this register. The error logging
utility should then re-read the IA32_MCi_STATUS register for the bank to ensure that the VAL flag is clear. The processor
will write the next error into the register bank and set the VAL flag.
Additional information that should be stored by the exception-logging routine includes the processor’s time-stamp
counter value, which provides a mechanism to indicate the frequency of exceptions. A multiprocessing operating
system stores the identity of the processor node incurring the exception using a unique identifier, such as the
processor’s APIC ID (see Section 10.8, “Handling Interrupts”).
The basic algorithm given in Example 15-3 can be modified to provide more robust recovery techniques. For
example, software has the flexibility to attempt recovery using information unavailable to the hardware. Specifically, the machine-check exception handler can, after logging, carefully analyze the error-reporting registers when
the error-logging routine reports an error that does not allow execution to be restarted. These recovery techniques
can use external bus related model-specific information provided with the error report to localize the source of the
error within the system and determine the appropriate recovery strategy.

15.10.4 Machine-Check Software Handler Guidelines for Error Recovery
15.10.4.1 Machine-Check Exception Handler for Error Recovery
When writing a machine-check exception (MCE) handler to support software recovery from Uncorrected Recoverable (UCR) errors, consider the following:

• When IA32_MCG_CAP [24] is zero, no recoverable errors are supported and all machine checks are fatal
  exceptions. The logging of status and error information is therefore a baseline implementation requirement.

• When IA32_MCG_CAP [24] is 1, certain uncorrected errors called uncorrected recoverable (UCR) errors may be
  software recoverable. The handler can analyze the reported error information, and in some cases attempt to
  recover from the uncorrected error and continue execution.

• For processors on which CPUID reports DisplayFamily_DisplayModel as 06H_0EH and onward, an MCA signal is
  broadcast to all logical processors in the system (see CPUID instruction in Chapter 3, “Instruction Set
  Reference, A-L” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A). Because
  machine check MSR resources may be shared among the logical processors on the same package/core, the
  MCE handler may be required to synchronize with the other processors that received a machine check error and
  serialize access to the machine check registers when analyzing, logging and clearing the information in the
  machine check registers.
— On processors that indicate ability for local machine-check exception (MCG_LMCE_P), hardware can choose
to report the error to only a single logical processor if system software has enabled LMCE by setting
IA32_MCG_EXT_CTL[LMCE_EN] = 1 as outlined in Section 15.3.1.5.

• The MCE handler is primarily responsible for processing uncorrected errors. The UC flag in each
  IA32_MCi_STATUS register indicates whether the reported error was corrected (UC=0) or uncorrected (UC=1).
  The MCE handler can optionally log and clear the corrected errors in the MC banks if it implements a software
  algorithm that avoids undesired race conditions with the CMCI or CMC polling handler.

• For uncorrectable errors, the EIPV flag in the IA32_MCG_STATUS register indicates (when set) that the
  instruction pointed to by the instruction pointer pushed onto the stack when the machine-check exception is
  generated is directly associated with the error. When this flag is cleared, the instruction pointed to may not be
  associated with the error.

• The MCIP flag in the IA32_MCG_STATUS register indicates whether a machine-check exception was generated.
  When a machine check exception is generated, it is expected that the MCIP flag in the IA32_MCG_STATUS
  register is set to 1. If it is not set, this machine check was generated by either an INT 18 instruction or some
  piece of hardware signaling an interrupt with vector 18.
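
The capability and status checks in the list above can be sketched as simple decoders. The sketch assumes the caller has already read IA32_MCG_CAP and IA32_MCG_STATUS with RDMSR; the structure and function names are hypothetical. The bit positions used (bank count in IA32_MCG_CAP[7:0], bit 24 for software error recovery, bit 27 for MCG_LMCE_P, and RIPV/EIPV/MCIP in IA32_MCG_STATUS[2:0]) are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_CAP_COUNT_MASK 0xFFULL
#define MCG_CAP_SER_P      (1ULL << 24)  /* software error recovery supported */
#define MCG_CAP_LMCE_P     (1ULL << 27)  /* local machine check supported */

#define MCG_STATUS_RIPV    (1ULL << 0)
#define MCG_STATUS_EIPV    (1ULL << 1)
#define MCG_STATUS_MCIP    (1ULL << 2)

/* Hypothetical summary of the capability bits a handler cares about. */
struct mcg_caps {
    unsigned bank_count;
    bool     ucr_supported;   /* IA32_MCG_CAP[24] */
    bool     lmce_capable;    /* IA32_MCG_CAP[27] */
};

static inline struct mcg_caps mcg_decode_cap(uint64_t mcg_cap)
{
    struct mcg_caps c;
    c.bank_count    = (unsigned)(mcg_cap & MCG_CAP_COUNT_MASK);
    c.ucr_supported = (mcg_cap & MCG_CAP_SER_P) != 0;
    c.lmce_capable  = (mcg_cap & MCG_CAP_LMCE_P) != 0;
    return c;
}

/* MCIP clear on entry means vector 18 was raised by INT 18 or by an
 * interrupt, not by a genuine machine-check exception. */
static inline bool mcg_is_genuine_mce(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_MCIP) != 0;
}
```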

When IA32_MCG_CAP [24] is 1, the following rules can apply when writing a machine check exception (MCE)
handler to support software recovery:

• The PCC flag in each IA32_MCi_STATUS register indicates whether recovery from the error is possible for
  uncorrected errors (UC=1). If the PCC flag is set for enabled uncorrected errors (UC=1 and EN=1), recovery is
  not possible. When recovery is not possible, the MCE handler typically records the error information and signals
  the operating system to reset the system.

• The RIPV flag in the IA32_MCG_STATUS register indicates whether restarting the program execution from the
  instruction pointer saved on the stack for the machine check exception is possible. When the RIPV is set,
  program execution can be restarted reliably when recovery is possible. If the RIPV flag is not set, program
  execution cannot be restarted reliably. In this case the recovery algorithm may involve terminating the current
  program execution and resuming an alternate thread of execution upon return from the machine check handler
  when recovery is possible. When recovery is not possible, the MCE handler signals the operating system to
  reset the system.


• When the EN flag is zero but the VAL and UC flags are one in the IA32_MCi_STATUS register, the reported
  uncorrected error in this bank is not enabled. As uncorrected errors with the EN flag = 0 are not the source of
  machine check exceptions, the MCE handler should log and clear non-enabled errors when the S bit is set and
  should continue searching for enabled errors from the other IA32_MCi_STATUS registers. Note that when
  IA32_MCG_CAP [24] is 0, any uncorrected error condition (VAL=1 and UC=1), including one with the EN
  flag cleared, is fatal and the handler must signal the operating system to reset the system. For the errors that
  do not generate machine check exceptions, the EN flag has no meaning.

• When the VAL flag is one, the UC flag is one, the EN flag is one and the PCC flag is zero in the IA32_MCi_STATUS
  register, the error in this bank is an uncorrected recoverable (UCR) error. The MCE handler needs to examine
  the S flag and the AR flag to find the type of the UCR error for software recovery and determine if software error
  recovery is possible.

• When both the S and the AR flags are clear in the IA32_MCi_STATUS register for the UCR error (VAL=1, UC=1,
  EN=x and PCC=0), the error in this bank is an uncorrected no-action required error (UCNA). UCNA errors are
  uncorrected but do not require any OS recovery action to continue execution. These errors indicate that some
  data in the system is corrupt, but that data has not been consumed and may not be consumed. If that data is
  consumed, a non-UCNA machine check exception will be generated. UCNA errors are signaled in the same way
  as corrected machine check errors and the CMCI and CMC polling handler is primarily responsible for handling
  UCNA errors. Like corrected errors, the MCA handler can optionally log and clear UCNA errors as long as it can
  avoid the undesired race condition with the CMCI or CMC polling handler. As UCNA errors are not the source of
  machine check exceptions, the MCA handler should continue searching for uncorrected or software recoverable
  errors in all other MC banks.

• When the S flag in the IA32_MCi_STATUS register is set for the UCR error (VAL=1, UC=1, EN=1 and PCC=0),
  the error in this bank is software recoverable and it was signaled through a machine-check exception. The AR
  flag in the IA32_MCi_STATUS register further clarifies the type of the software recoverable errors.

• When the AR flag in the IA32_MCi_STATUS register is clear for the software recoverable error (VAL=1, UC=1,
  EN=1, PCC=0 and S=1), the error in this bank is a software recoverable action optional (SRAO) error. The MCE
  handler and the operating system can analyze the IA32_MCi_STATUS [15:0] to implement MCA error code
  specific optional recovery action, but this recovery action is optional. System software can resume the program
  execution from the instruction pointer saved on the stack for the machine check exception when the RIPV flag
  in the IA32_MCG_STATUS register is set.

• Even if the OVER flag in the IA32_MCi_STATUS register is set for the SRAO error (VAL=1, UC=1, EN=1, PCC=0,
  S=1 and AR=0), the MCE handler can take recovery action for the SRAO error logged in the IA32_MCi_STATUS
  register. Since the recovery action for SRAO errors is optional, restarting the program execution from the
  instruction pointer saved on the stack for the machine check exception is still possible for the overflowed SRAO
  error if the RIPV flag in the IA32_MCG_STATUS is set.

• When the AR flag in the IA32_MCi_STATUS register is set for the software recoverable error (VAL=1, UC=1,
  EN=1, PCC=0 and S=1), the error in this bank is a software recoverable action required (SRAR) error. The MCE
  handler and the operating system must take recovery action in order to continue execution after the
  machine-check exception. The MCA handler and the operating system need to analyze the IA32_MCi_STATUS [15:0] to
  determine the MCA error code specific recovery action. If no recovery action can be performed, the operating
  system must reset the system.

• When the OVER flag in the IA32_MCi_STATUS register is set for the SRAR error (VAL=1, UC=1, EN=1, PCC=0,
  S=1 and AR=1), the MCE handler cannot take recovery action as the information of the SRAR error in the
  IA32_MCi_STATUS register was potentially lost due to the overflow condition. Since the recovery action for
  SRAR errors must be taken, the MCE handler must signal the operating system to reset the system.

• When the MCE handler cannot find any uncorrected (VAL=1, UC=1 and EN=1) or any software recoverable
  errors (VAL=1, UC=1, EN=1, PCC=0 and S=1) in any of the IA32_MCi banks of the processors, this is an
  unexpected condition for the MCE handler and the handler should signal the operating system to reset the
  system.

• Before returning from the machine-check exception handler, software must clear the MCIP flag in the
  IA32_MCG_STATUS register. The MCIP flag is used to detect recursion. The machine-check architecture does
  not support recursion. When the processor receives a machine check when MCIP is set, it automatically enters
  the shutdown state.
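
The classification rules in the list above can be condensed into a single decision function over IA32_MCi_STATUS. The following C sketch is one reading of those rules, not a definitive implementation: the enum, the function names, and the order in which the checks are applied are choices made for this illustration, and the ser_p argument is assumed to carry IA32_MCG_CAP[24]. The flag positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_VAL  (1ULL << 63)
#define MCI_OVER (1ULL << 62)
#define MCI_UC   (1ULL << 61)
#define MCI_EN   (1ULL << 60)
#define MCI_PCC  (1ULL << 57)
#define MCI_S    (1ULL << 56)
#define MCI_AR   (1ULL << 55)

/* Hypothetical classification labels for this sketch. */
enum mc_class {
    MC_INVALID,      /* VAL=0: bank holds no error */
    MC_CORRECTED,    /* UC=0 */
    MC_NOT_ENABLED,  /* UC=1, EN=0: not the source of the exception */
    MC_FATAL,        /* no recovery support, or PCC=1 with EN=1 */
    MC_UCNA,         /* UC=1, PCC=0, S=0, AR=0 */
    MC_SRAO,         /* UC=1, EN=1, PCC=0, S=1, AR=0 */
    MC_SRAR,         /* UC=1, EN=1, PCC=0, S=1, AR=1 */
    MC_ILLEGAL       /* S=0 with AR=1 is not a defined combination */
};

static inline enum mc_class mc_classify(uint64_t st, bool ser_p)
{
    if (!(st & MCI_VAL)) return MC_INVALID;
    if (!(st & MCI_UC))  return MC_CORRECTED;
    if (!ser_p)          return MC_FATAL;   /* every uncorrected error is fatal */
    if (st & MCI_PCC)    return (st & MCI_EN) ? MC_FATAL : MC_NOT_ENABLED;
    if (!(st & (MCI_S | MCI_AR))) return MC_UCNA;   /* EN is "don't care" */
    if (!(st & MCI_EN))  return MC_NOT_ENABLED;
    if (!(st & MCI_S))   return MC_ILLEGAL;
    return (st & MCI_AR) ? MC_SRAR : MC_SRAO;
}

/* An overflowed SRAR error cannot be recovered; SRAO tolerates overflow. */
static inline bool mc_must_reset(enum mc_class c, uint64_t st)
{
    return c == MC_FATAL || c == MC_ILLEGAL ||
           (c == MC_SRAR && (st & MCI_OVER) != 0);
}
```

Keeping classification separate from the reset decision lets the same function serve both the MCE handler and a CMC polling handler, which act on different classes.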

Example 15-4 gives pseudocode for an MC exception handler that supports recovery of UCR.


Example 15-4. Machine-Check Error Handler Pseudocode Supporting UCR
MACHINE CHECK HANDLER: (* Called from INT 18 handler *)
NOERROR = TRUE;
ProcessorCount = 0;
IF CPU supports MCA
    THEN
        RESTARTABILITY = TRUE;
        IF (Processor Family = 6 AND DisplayModel ≥ 0EH) OR (Processor Family > 6)
            THEN
                IF (MCG_LMCE = 1)
                    THEN
                        MCA_BROADCAST = FALSE;
                    ELSE
                        MCA_BROADCAST = TRUE;
                FI;
                Acquire SpinLock;
                ProcessorCount++; (* Allowing one logical processor at a time to examine machine check registers *)
                CALL MCA ERROR PROCESSING; (* returns RESTARTABILITY and NOERROR *)
            ELSE
                MCA_BROADCAST = FALSE;
                (* Implement a rendezvous mechanism with the other processors if necessary *)
                CALL MCA ERROR PROCESSING;
        FI;
    ELSE (* Pentium(R) processor compatible *)
        READ P5_MC_ADDR;
        READ P5_MC_TYPE;
        RESTARTABILITY = FALSE;
FI;
IF NOERROR = TRUE
    THEN
        IF NOT (MCG_RIPV = 1 AND MCG_EIPV = 0)
            THEN
                RESTARTABILITY = FALSE;
        FI;
FI;
IF RESTARTABILITY = FALSE
    THEN
        Report RESTARTABILITY to console;
        Reset system;
FI;
IF MCA_BROADCAST = TRUE
    THEN
        IF ProcessorCount = MAX_PROCESSORS
            AND NOERROR = TRUE
            THEN
                Report RESTARTABILITY to console;
                Reset system;
        FI;
        Release SpinLock;
        Wait till ProcessorCount = MAX_PROCESSORS on system;
        (* implement a timeout and abort function if necessary *)
FI;
CLEAR IA32_MCG_STATUS;
RESUME Execution;
(* End of MACHINE CHECK HANDLER *)

MCA ERROR PROCESSING: (* MCA Error Processing Routine called from MCA Handler *)
IF MCIP flag in IA32_MCG_STATUS = 0
    THEN (* MCIP=0 upon MCA is unexpected *)
        RESTARTABILITY = FALSE;
FI;
FOR each bank of machine-check registers
    DO
        CLEAR_MC_BANK = FALSE;
        READ IA32_MCi_STATUS;
        IF VAL Flag in IA32_MCi_STATUS = 1
            THEN
                IF UC Flag in IA32_MCi_STATUS = 1
                    THEN
                        IF Bit 24 in IA32_MCG_CAP = 0
                            THEN (* the processor does not support software error recovery *)
                                RESTARTABILITY = FALSE;
                                NOERROR = FALSE;
                                GOTO LOG MCA REGISTER;
                        FI;
                        (* the processor supports software error recovery *)
                        IF EN Flag in IA32_MCi_STATUS = 0 AND OVER Flag in IA32_MCi_STATUS = 0
                            THEN (* It is a spurious MCA Log. Log and clear the register *)
                                CLEAR_MC_BANK = TRUE;
                                GOTO LOG MCA REGISTER;
                        FI;
                        IF PCC = 1 and EN = 1 in IA32_MCi_STATUS
                            THEN (* processor context might have been corrupted *)
                                RESTARTABILITY = FALSE;
                            ELSE (* It is a uncorrected recoverable (UCR) error *)
                                IF S Flag in IA32_MCi_STATUS = 0
                                    THEN
                                        IF AR Flag in IA32_MCi_STATUS = 0
                                            THEN (* It is a uncorrected no action required (UCNA) error *)
                                                GOTO CONTINUE; (* let CMCI and CMC polling handler to process *)
                                            ELSE
                                                RESTARTABILITY = FALSE; (* S=0, AR=1 is illegal *)
                                        FI;
                                FI;
                                IF RESTARTABILITY = FALSE
                                    THEN (* no need to take recovery action if RESTARTABILITY is already false *)
                                        NOERROR = FALSE;
                                        GOTO LOG MCA REGISTER;
                                FI;
                                (* S in IA32_MCi_STATUS = 1 *)
                                IF AR Flag in IA32_MCi_STATUS = 1
                                    THEN (* It is a software recoverable and action required (SRAR) error *)
                                        IF OVER Flag in IA32_MCi_STATUS = 1
                                            THEN
                                                RESTARTABILITY = FALSE;
                                                NOERROR = FALSE;
                                                GOTO LOG MCA REGISTER;
                                        FI;
                                        IF MCACOD Value in IA32_MCi_STATUS is recognized
                                            AND Current Processor is an Affected Processor
                                            THEN
                                                Implement MCACOD specific recovery action;
                                                CLEAR_MC_BANK = TRUE;
                                            ELSE
                                                RESTARTABILITY = FALSE;
                                        FI;
                                    ELSE (* It is a software recoverable and action optional (SRAO) error *)
                                        IF OVER Flag in IA32_MCi_STATUS = 0 AND
                                            MCACOD in IA32_MCi_STATUS is recognized
                                            THEN
                                                Implement MCACOD specific recovery action;
                                        FI;
                                        CLEAR_MC_BANK = TRUE;
                                FI; (* AR *)
                        FI; (* PCC *)
                        NOERROR = FALSE;
                        GOTO LOG MCA REGISTER;
                    ELSE (* It is a corrected error; continue to the next IA32_MCi_STATUS *)
                        GOTO CONTINUE;
                FI; (* UC *)
        FI; (* VAL *)
        LOG MCA REGISTER:
            SAVE IA32_MCi_STATUS;
            IF MISCV in IA32_MCi_STATUS
                THEN
                    SAVE IA32_MCi_MISC;
            FI;
            IF ADDRV in IA32_MCi_STATUS
                THEN
                    SAVE IA32_MCi_ADDR;
            FI;
            IF CLEAR_MC_BANK = TRUE
                THEN
                    SET all 0 to IA32_MCi_STATUS;
                    IF MISCV in IA32_MCi_STATUS
                        THEN
                            SET all 0 to IA32_MCi_MISC;
                    FI;
                    IF ADDRV in IA32_MCi_STATUS
                        THEN
                            SET all 0 to IA32_MCi_ADDR;
                    FI;
            FI;
        CONTINUE:
    OD;
(* END FOR *)
RETURN;
(* End of MCA ERROR PROCESSING *)
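
The decisions that the handler in Example 15-4 makes after the bank scan can be isolated into two small predicates. The C sketch below is an illustration under stated assumptions: the function names are hypothetical, restartability and noerror are assumed to be the values produced by the scan, and processor_count and max_processors are assumed to be maintained by the rendezvous code, which is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)
#define MCG_STATUS_EIPV (1ULL << 1)

/* If the scan found no error in any bank, execution may resume only when
 * RIPV = 1 and EIPV = 0; otherwise the scan result is kept unchanged. */
static inline bool mce_final_restartability(bool restartability, bool noerror,
                                            uint64_t mcg_status)
{
    if (noerror) {
        bool ripv = (mcg_status & MCG_STATUS_RIPV) != 0;
        bool eipv = (mcg_status & MCG_STATUS_EIPV) != 0;
        if (!(ripv && !eipv))
            restartability = false;
    }
    return restartability;
}

/* With a broadcast machine check, the last processor to arrive resets the
 * system if no processor found an error to account for the exception. */
static inline bool mce_broadcast_requires_reset(unsigned processor_count,
                                                unsigned max_processors,
                                                bool noerror)
{
    return processor_count == max_processors && noerror;
}
```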

15.10.4.2 Corrected Machine-Check Handler for Error Recovery
When writing a corrected machine check handler, which is invoked as a result of CMCI or called from an OS CMC
Polling dispatcher, consider the following:

• The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the error information in the register
  is valid. If this flag is clear, the registers in that bank do not contain valid error information and do not need
  to be checked.

• The CMCI or CMC polling handler is responsible for logging and clearing corrected errors. The UC flag in each
  IA32_MCi_STATUS register indicates whether the reported error was corrected (UC=0) or not (UC=1).

• When IA32_MCG_CAP [24] is one, the CMC handler is also responsible for logging and clearing uncorrected
  no-action required (UCNA) errors. When the UC flag is one but the PCC, S, and AR flags are zero in the
  IA32_MCi_STATUS register, the reported error in this bank is an uncorrected no-action required (UCNA) error.
  In cases when SRAO errors are signaled as UCNA errors via CMCI, software can perform recovery for those errors
  identified in Table 15-16.

• In addition to corrected errors and UCNA errors, the CMC handler optionally logs uncorrected (UC=1 and
  PCC=1), software recoverable machine check errors (UC=1, PCC=0 and S=1), but should avoid clearing those
  errors from the MC banks. Clearing these errors may result in accidentally removing these errors before these
  errors are actually handled and processed by the MCE handler for attempted software error recovery.

Example 15-5 gives pseudocode for a CMCI handler with UCR support.


Example 15-5. Corrected Error Handler Pseudocode with UCR Support
Corrected Error HANDLER: (* Called from CMCI handler or OS CMC Polling Dispatcher *)
IF CPU supports MCA
    THEN
        FOR each bank of machine-check registers
            DO
                READ IA32_MCi_STATUS;
                IF VAL flag in IA32_MCi_STATUS = 1
                    THEN
                        IF UC Flag in IA32_MCi_STATUS = 0 (* It is a corrected error *)
                            THEN
                                GOTO LOG CMC ERROR;
                            ELSE
                                IF Bit 24 in IA32_MCG_CAP = 0
                                    THEN
                                        GOTO CONTINUE;
                                FI;
                                IF S Flag in IA32_MCi_STATUS = 0 AND AR Flag in IA32_MCi_STATUS = 0
                                    THEN (* It is a uncorrected no action required error *)
                                        GOTO LOG CMC ERROR;
                                FI;
                                IF EN Flag in IA32_MCi_STATUS = 0
                                    THEN (* It is a spurious MCA error *)
                                        GOTO LOG CMC ERROR;
                                FI;
                        FI;
                FI;
                GOTO CONTINUE;
                LOG CMC ERROR:
                    SAVE IA32_MCi_STATUS;
                    IF MISCV Flag in IA32_MCi_STATUS
                        THEN
                            SAVE IA32_MCi_MISC;
                            SET all 0 to IA32_MCi_MISC;
                    FI;
                    IF ADDRV Flag in IA32_MCi_STATUS
                        THEN
                            SAVE IA32_MCi_ADDR;
                            SET all 0 to IA32_MCi_ADDR;
                    FI;
                    SET all 0 to IA32_MCi_STATUS;
                CONTINUE:
            OD;
        (* END FOR *)
FI;
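
The per-bank decision made by the corrected error handler in Example 15-5 can be sketched as a pure function of IA32_MCi_STATUS. The enum and function name are hypothetical, and ser_p is assumed to carry IA32_MCG_CAP[24]; the sketch mirrors the branch order of the pseudocode and leaves software recoverable and fatal errors untouched for the MCE handler.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_VAL (1ULL << 63)
#define MCI_UC  (1ULL << 61)
#define MCI_EN  (1ULL << 60)
#define MCI_S   (1ULL << 56)
#define MCI_AR  (1ULL << 55)

/* Hypothetical action labels for this sketch. */
enum cmc_action { CMC_SKIP, CMC_LOG_AND_CLEAR };

static inline enum cmc_action cmc_bank_action(uint64_t st, bool ser_p)
{
    if (!(st & MCI_VAL))
        return CMC_SKIP;                 /* no valid error in this bank */
    if (!(st & MCI_UC))
        return CMC_LOG_AND_CLEAR;        /* corrected error */
    if (!ser_p)
        return CMC_SKIP;                 /* uncorrected, no recovery support */
    if (!(st & (MCI_S | MCI_AR)))
        return CMC_LOG_AND_CLEAR;        /* UCNA error */
    if (!(st & MCI_EN))
        return CMC_LOG_AND_CLEAR;        /* spurious (not enabled) error */
    return CMC_SKIP;                     /* leave for the MCE handler */
}
```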


CHAPTER 16
INTERPRETING MACHINE-CHECK ERROR CODES
Encoding of the model-specific and other information fields is different across processor families. The differences
are documented in the following sections.

16.1 INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY 06H MACHINE ERROR CODES FOR MACHINE CHECK

Section 16.1 provides information for interpreting additional model-specific fields for external bus errors relating to
processor family 06H. References to processor family 06H refer only to IA-32 processors with CPUID signatures listed in Table 16-1.

Table 16-1. CPUID DisplayFamily_DisplayModel Signatures for Processor Family 06H

DisplayFamily_DisplayModel        Processor Families/Processor Number Series
06_0EH                            Intel Core Duo, Intel Core Solo processors
06_0DH                            Intel Pentium M processor
06_09H                            Intel Pentium M processor
06_07H, 06_08H, 06_0AH, 06_0BH    Intel Pentium III Xeon Processor, Intel Pentium III Processor
06_03H, 06_05H                    Intel Pentium II Xeon Processor, Intel Pentium II Processor
06_01H                            Intel Pentium Pro Processor

These errors are reported in the IA32_MCi_STATUS MSRs. They are reported architecturally as compound errors
with a general form of 0000 1PPT RRRR IILL in the MCA error code field. See Chapter 15 for information on the
interpretation of compound error codes. Incremental decoding information is listed in Table 16-2.
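
The compound form 0000 1PPT RRRR IILL can be split into its sub-fields with shifts and masks. The C sketch below assumes the caller passes IA32_MCi_STATUS[15:0]; the structure and function names are hypothetical, and the sub-field values are returned as raw numbers whose meaning is given by the compound error code tables in Chapter 15.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raw sub-fields of a bus and interconnect compound error code. */
struct bus_err_code {
    unsigned pp;    /* bits 10:9 */
    unsigned t;     /* bit 8 */
    unsigned rrrr;  /* bits 7:4 */
    unsigned ii;    /* bits 3:2 */
    unsigned ll;    /* bits 1:0 */
};

/* True when bits 15:11 match the fixed prefix 0000 1. */
static inline bool mca_is_bus_error(uint16_t code)
{
    return (code & 0xF800u) == 0x0800u;
}

static inline struct bus_err_code mca_decode_bus_error(uint16_t code)
{
    struct bus_err_code f;
    f.pp   = (code >> 9) & 0x3u;
    f.t    = (code >> 8) & 0x1u;
    f.rrrr = (code >> 4) & 0xFu;
    f.ii   = (code >> 2) & 0x3u;
    f.ll   = code & 0x3u;
    return f;
}
```

For example, the code 0000 1110 0000 1111 (0E0FH) decodes to PP = 3, T = 0, RRRR = 0, II = 3, and LL = 3.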

Table 16-2. Incremental Decoding Information: Processor Family 06H Machine Error Codes For Machine Check

Type                    Bit No.  Bit Function             Bit Description
MCA error codes (1)     15:0
Model specific errors   18:16    Reserved
Model specific errors   24:19    Bus queue request type   000000 for BQ_DCU_READ_TYPE error
                                                          000010 for BQ_IFU_DEMAND_TYPE error
                                                          000011 for BQ_IFU_DEMAND_NC_TYPE error
                                                          000100 for BQ_DCU_RFO_TYPE error
                                                          000101 for BQ_DCU_RFO_LOCK_TYPE error
                                                          000110 for BQ_DCU_ITOM_TYPE error
                                                          001000 for BQ_DCU_WB_TYPE error
                                                          001010 for BQ_DCU_WCEVICT_TYPE error
                                                          001011 for BQ_DCU_WCLINE_TYPE error
                                                          001100 for BQ_DCU_BTM_TYPE error
                                                          001101 for BQ_DCU_INTACK_TYPE error
                                                          001110 for BQ_DCU_INVALL2_TYPE error
                                                          001111 for BQ_DCU_FLUSHL2_TYPE error
                                                          010000 for BQ_DCU_PART_RD_TYPE error
                                                          010010 for BQ_DCU_PART_WR_TYPE error
                                                          010100 for BQ_DCU_SPEC_CYC_TYPE error
                                                          011000 for BQ_DCU_IO_RD_TYPE error
                                                          011001 for BQ_DCU_IO_WR_TYPE error
                                                          011100 for BQ_DCU_LOCK_RD_TYPE error
                                                          011110 for BQ_DCU_SPLOCK_RD_TYPE error
                                                          011101 for BQ_DCU_LOCK_WR_TYPE error
Model specific errors   27:25    Bus queue error type     000 for BQ_ERR_HARD_TYPE error
                                                          001 for BQ_ERR_DOUBLE_TYPE error
                                                          010 for BQ_ERR_AERR2_TYPE error
                                                          100 for BQ_ERR_SINGLE_TYPE error
                                                          101 for BQ_ERR_AERR1_TYPE error
Model specific errors   28       FRC error                1 if FRC error active
                        29       BERR                     1 if BERR is driven
                        30       Internal BINIT           1 if BINIT driven for this processor
                        31       Reserved
Other information       34:32    Reserved
                        35       External BINIT           1 if BINIT is received from external bus.
                        36       Response parity error    This bit is asserted in IA32_MCi_STATUS if this component has received a
                                                          parity error on the RS[2:0]# pins for a response transaction. The RS
                                                          signals are checked by the RSP# external pin.
                        37       Bus BINIT                This bit is asserted in IA32_MCi_STATUS if this component has received a
                                                          hard error response on a split transaction (one access that has needed to
                                                          be split across the 64-bit external bus interface into two accesses).
                        38       Timeout BINIT            This bit is asserted in IA32_MCi_STATUS if this component has experienced
                                                          a ROB time-out, which indicates that no micro-instruction has been retired
                                                          for a predetermined period of time.
                                                          A ROB time-out occurs when the 15-bit ROB time-out counter carries a 1 out
                                                          of its high order bit. (2) The timer is cleared when a micro-instruction
                                                          retires, an exception is detected by the core processor, RESET is
                                                          asserted, or when a ROB BINIT occurs.
                                                          The ROB time-out counter is prescaled by the 8-bit PIC timer which is a
                                                          divide by 128 of the bus clock (the bus clock is 1:2, 1:3, 1:4 of the core
                                                          clock). When a carry out of the 8-bit PIC timer occurs, the ROB counter
                                                          counts up by one. While this bit is asserted, it cannot be overwritten by
                                                          another error.
                        41:39    Reserved
                        42       Hard error               This bit is asserted in IA32_MCi_STATUS if this component has initiated a
                                                          bus transaction which has received a hard error response. While this bit
                                                          is asserted, it cannot be overwritten.
                        43       IERR                     This bit is asserted in IA32_MCi_STATUS if this component has experienced
                                                          a failure that causes the IERR pin to be asserted. While this bit is
                                                          asserted, it cannot be overwritten.
                        44       AERR                     This bit is asserted in IA32_MCi_STATUS if this component has initiated 2
                                                          failing bus transactions which have failed due to Address Parity Errors
                                                          (AERR asserted). While this bit is asserted, it cannot be overwritten.
                        45       UECC                     The Uncorrectable ECC error bit is asserted in IA32_MCi_STATUS for
                                                          uncorrected ECC errors. While this bit is asserted, the ECC syndrome field
                                                          will not be overwritten.
                        46       CECC                     The correctable ECC error bit is asserted in IA32_MCi_STATUS for corrected
                                                          ECC errors.
                        54:47    ECC syndrome             The ECC syndrome field in IA32_MCi_STATUS contains the 8-bit ECC syndrome
                                                          only if the error was a correctable/uncorrectable ECC error and there
                                                          wasn't a previous valid ECC error syndrome logged in IA32_MCi_STATUS.
                                                          A previous valid ECC error in IA32_MCi_STATUS is indicated by
                                                          IA32_MCi_STATUS.bit45 (uncorrectable error occurred) being asserted. After
                                                          processing an ECC error, machine-check handling software should clear
                                                          IA32_MCi_STATUS.bit45 so that future ECC error syndromes can be logged.
                        56:55    Reserved                 Reserved.
Status register         63:57
validity indicators (1)

NOTES:
1. These fields are architecturally defined. Refer to Chapter 15, “Machine-Check Architecture,” for more information.
2. For processors with a CPUID signature of 06_0EH, a ROB time-out occurs when the 23-bit ROB time-out counter carries a 1 out of its
high order bit.
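
The multi-bit model-specific fields of Table 16-2 can be extracted from a saved IA32_MCi_STATUS value as shown in the C sketch below. The function names are hypothetical. The field positions (bus queue request type in bits 24:19, bus queue error type in bits 27:25, ECC syndrome in bits 54:47) and the error type names come from the table; the name lookup returns NULL for encodings the table does not list.

```c
#include <stddef.h>
#include <stdint.h>

/* Bus queue request type, IA32_MCi_STATUS[24:19]. */
static inline unsigned f6_bq_request_type(uint64_t status)
{
    return (unsigned)((status >> 19) & 0x3Fu);
}

/* Bus queue error type, IA32_MCi_STATUS[27:25]. */
static inline unsigned f6_bq_error_type(uint64_t status)
{
    return (unsigned)((status >> 25) & 0x7u);
}

/* ECC syndrome, IA32_MCi_STATUS[54:47]. */
static inline unsigned f6_ecc_syndrome(uint64_t status)
{
    return (unsigned)((status >> 47) & 0xFFu);
}

/* Map a bus queue error type encoding to its name, or NULL if unlisted. */
static inline const char *f6_bq_error_name(unsigned type)
{
    switch (type) {
    case 0x0: return "BQ_ERR_HARD_TYPE";
    case 0x1: return "BQ_ERR_DOUBLE_TYPE";
    case 0x2: return "BQ_ERR_AERR2_TYPE";
    case 0x4: return "BQ_ERR_SINGLE_TYPE";
    case 0x5: return "BQ_ERR_AERR1_TYPE";
    default:  return NULL;
    }
}
```

These decoders apply only to the processors listed in Table 16-1; other families assign different meanings to the same bit ranges.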

16.2 INCREMENTAL DECODING INFORMATION: INTEL CORE 2 PROCESSOR FAMILY MACHINE ERROR CODES FOR MACHINE CHECK

Table 16-4 provides information for interpreting additional model-specific fields for external bus errors relating to
processors based on Intel Core microarchitecture, which implements the P4 bus specification. Table 16-3 lists the
CPUID signatures for Intel 64 processors that are covered by Table 16-4. These errors are reported in the
IA32_MCi_STATUS MSRs. They are reported architecturally as compound errors with a general form of
0000 1PPT RRRR IILL in the MCA error code field. See Chapter 15 for information on the interpretation of
compound error codes.

Table 16-3. CPUID DisplayFamily_DisplayModel Signatures for Processors Based on Intel Core Microarchitecture

DisplayFamily_DisplayModel   Processor Families/Processor Number Series
06_1DH                       Intel Xeon Processor 7400 series.
06_17H                       Intel Xeon Processor 5200, 5400 series, Intel Core 2 Quad processor Q9650.
06_0FH                       Intel Xeon Processor 3000, 3200, 5100, 5300, 7300 series, Intel Core 2 Quad, Intel Core 2 Extreme,
                             Intel Core 2 Duo processors, Intel Pentium dual-core processors.


Table 16-4. Incremental Bus Error Codes of Machine Check for Processors Based on Intel Core Microarchitecture

Type                    Bit No.  Bit Function             Bit Description
MCA error codes (1)     15:0
Model specific errors   18:16    Reserved
Model specific errors   24:19    Bus queue request type   000001 for BQ_PREF_READ_TYPE error
                                                          000000 for BQ_DCU_READ_TYPE error
                                                          000010 for BQ_IFU_DEMAND_TYPE error
                                                          000011 for BQ_IFU_DEMAND_NC_TYPE error
                                                          000100 for BQ_DCU_RFO_TYPE error
                                                          000101 for BQ_DCU_RFO_LOCK_TYPE error
                                                          000110 for BQ_DCU_ITOM_TYPE error
                                                          001000 for BQ_DCU_WB_TYPE error
                                                          001010 for BQ_DCU_WCEVICT_TYPE error
                                                          001011 for BQ_DCU_WCLINE_TYPE error
                                                          001100 for BQ_DCU_BTM_TYPE error
                                                          001101 for BQ_DCU_INTACK_TYPE error
                                                          001110 for BQ_DCU_INVALL2_TYPE error
                                                          001111 for BQ_DCU_FLUSHL2_TYPE error
                                                          010000 for BQ_DCU_PART_RD_TYPE error
                                                          010010 for BQ_DCU_PART_WR_TYPE error
                                                          010100 for BQ_DCU_SPEC_CYC_TYPE error
                                                          011000 for BQ_DCU_IO_RD_TYPE error
                                                          011001 for BQ_DCU_IO_WR_TYPE error
                                                          011100 for BQ_DCU_LOCK_RD_TYPE error
                                                          011110 for BQ_DCU_SPLOCK_RD_TYPE error
                                                          011101 for BQ_DCU_LOCK_WR_TYPE error
                                                          100100 for BQ_L2_WI_RFO_TYPE error
                                                          100110 for BQ_L2_WI_ITOM_TYPE error
Model specific errors   27:25    Bus queue error type     001 for Address Parity Error
                                                          010 for Response Hard Error
                                                          011 for Response Parity Error
Model specific errors   28       MCE Driven               1 if MCE is driven
                        29       MCE Observed             1 if MCE is observed
                        30       Internal BINIT           1 if BINIT driven for this processor
                        31       BINIT Observed           1 if BINIT is observed for this processor
Other information       33:32    Reserved
                                 PIC and FSB data parity  Data Parity detected on either PIC or FSB access
                                 Reserved
                                 FSB address parity       Address parity error detected:
                                                          1 = Address parity error detected
                                                          0 = No address parity error
                                 Timeout BINIT            This bit is asserted in IA32_MCi_STATUS if this component has experienced
                                                          a ROB time-out, which indicates that no micro-instruction has been retired
                                                          for a predetermined period of time.
                                                          A ROB time-out occurs when the 23-bit ROB time-out counter carries a 1 out
                                                          of its high order bit. The timer is cleared when a micro-instruction
                                                          retires, an exception is detected by the core processor, RESET is
                                                          asserted, or when a ROB BINIT occurs.
                                                          The ROB time-out counter is prescaled by the 8-bit PIC timer which is a
                                                          divide by 128 of the bus clock (the bus clock is 1:2, 1:3, 1:4 of the core
                                                          clock). When a carry out of the 8-bit PIC timer occurs, the ROB counter
                                                          counts up by one. While this bit is asserted, it cannot be overwritten by
                                                          another error.
                        41:39    Reserved
                                 Hard error               This bit is asserted in IA32_MCi_STATUS if this component has initiated a
                                                          bus transaction which has received a hard error response. While this bit
                                                          is asserted, it cannot be overwritten.
                                 IERR                     This bit is asserted in IA32_MCi_STATUS if this component has experienced
                                                          a failure that causes the IERR pin to be asserted. While this bit is
                                                          asserted, it cannot be overwritten.
                                 Reserved
                        54:47    Reserved
                        56:55    Reserved                 Reserved.
Status register         63:57
validity indicators (1)

NOTES:
1. These fields are architecturally defined. Refer to Chapter 15, “Machine-Check Architecture,” for more information.
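
The ROB time-out interval implied by the Timeout BINIT description can be estimated with simple arithmetic. The C sketch below assumes that the time-out counter starts from zero and that each carry out of the prescaler corresponds to exactly 128 bus clocks, as the description states; under those assumptions an n-bit counter carries out of its high order bit after 2^n × 128 bus clocks. The function name is hypothetical.

```c
#include <stdint.h>

/* Bus clocks until an n-bit ROB time-out counter, incremented once per
 * 128 bus clocks, carries out of its high order bit (counter_bits < 57). */
static inline uint64_t rob_timeout_bus_clocks(unsigned counter_bits)
{
    const uint64_t prescale = 128;   /* divide by 128 of the bus clock */
    return (UINT64_C(1) << counter_bits) * prescale;
}
```

For the 23-bit counter described in Table 16-4 this gives 2^30 bus clocks; for a 15-bit counter it gives 2^22 bus clocks. Dividing by the bus clock frequency converts either figure to seconds.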

16.2.1 Model-Specific Machine Check Error Codes for Intel Xeon Processor 7400 Series

Intel Xeon processor 7400 series has machine check register banks that generally follow the description of
Chapter 15 and Section 16.2. Additional error codes specific to Intel Xeon processor 7400 series are described in this
section.
MC4_STATUS[63:0] is the main error logging for the processor’s L3 and front side bus errors for Intel Xeon
processor 7400 series. It supports the L3 Errors, Bus and Interconnect Errors Compound Error Codes in the MCA
Error Code Field.


16.2.1.1 Processor Machine Check Status Register Incremental MCA Error Code Definition

Intel Xeon processor 7400 series uses compound MCA Error Codes for logging its Bus internal machine check
errors, L3 Errors, and Bus/Interconnect Errors. It defines incremental Machine Check error types
(IA32_MC6_STATUS[15:0]) beyond those defined in Chapter 15. Table 16-5 lists these incremental MCA error
code types that apply to IA32_MC6_STATUS. Error code details are specified in MC6_STATUS [31:16] (see
Section 16.2.2), the “Model Specific Error Code” field. The information in the “Other_Info” field
(MC4_STATUS[56:32]) is common to the three processor error types and contains a correctable event count and
specifies the MC6_MISC register format.

Table 16-5. Incremental MCA Error Code Types for Intel Xeon Processor 7400
Processor MCA_Error_Code (MC6_STATUS[15:0])

Type  Error Code                  Binary Encoding        Meaning
C     Internal Error              0000 0100 0000 0000 *  Internal Error Type Code
B     Bus and Interconnect Error  0000 100x 0000 1111    Not used but this encoding is reserved for compatibility with other MCA implementations
                                  0000 101x 0000 1111    Not used but this encoding is reserved for compatibility with other MCA implementations
                                  0000 110x 0000 1111    Not used but this encoding is reserved for compatibility with other MCA implementations
                                  0000 1110 0000 1111 *  Bus and Interconnection Error Type Code
                                  0000 1111 0000 1111    Not used but this encoding is reserved for compatibility with other MCA implementations

The binary encodings marked with an asterisk (*) are the only encodings used by the processor for MC4_STATUS[15:0].
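
As an illustration of how the encodings in Table 16-5 might be checked in software, the following is a minimal sketch. The function name and the returned labels are hypothetical, and the sketch assumes the caller has already read the 64-bit status register value by some platform-specific means.

```python
def classify_xeon7400_mcacod(status):
    """Classify bits 15:0 of a 64-bit status value against Table 16-5.

    The returned labels are illustrative names, not architectural terms.
    """
    mcacod = status & 0xFFFF
    if mcacod == 0x0400:            # 0000 0100 0000 0000
        return "internal"
    if mcacod == 0x0E0F:            # 0000 1110 0000 1111
        return "bus_interconnect"
    # The remaining 0000 1xxx 0000 1111 patterns are listed as not used
    # and reserved for compatibility with other MCA implementations.
    if (mcacod & 0xF8FF) == 0x080F:
        return "reserved"
    return "other"
```

Any value that matches none of the listed patterns is returned as "other" so that the caller can fall back to the architectural decoding in Chapter 15.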

16.2.2

Intel Xeon Processor 7400 Model Specific Error Code Field

16.2.2.1

Processor Model Specific Error Code Field
Type B: Bus and Interconnect Error

Note:

The Model Specific Error Code field is in MC6_STATUS (bits 31:16).

Table 16-6. Type B Bus and Interconnect Error Codes

Bit Num  Sub-Field Name          Description
16       FSB Request Parity      Parity error detected during FSB request phase
19:17    ---                     Reserved
20       FSB Hard Fail Response  “Hard Failure” response received for a local transaction
21       FSB Response Parity     Parity error on FSB response field detected
22       FSB Data Parity         FSB data parity error on inbound data detected
31:23    ---                     Reserved
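
The Type B sub-fields can be tested bit by bit. The sketch below is illustrative only: the names `TYPE_B_FLAGS` and `decode_type_b` are hypothetical, and the single-bit positions (16, 20, 21, 22) are an assumption inferred from the order of the rows between the reserved ranges 19:17 and 31:23.

```python
# Bit positions within the 64-bit status value. The single-bit positions
# are inferred from the table order and should be treated as assumptions.
TYPE_B_FLAGS = {
    16: "FSB Request Parity",
    20: "FSB Hard Fail Response",
    21: "FSB Response Parity",
    22: "FSB Data Parity",
}


def decode_type_b(status):
    """Return the names of the Type B sub-fields set in a status value."""
    return [name for bit, name in sorted(TYPE_B_FLAGS.items())
            if (status >> bit) & 1]
```

Reserved bits are ignored rather than reported, so a value with only bits 19:17 set decodes to an empty list.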

16.2.2.2

Processor Model Specific Error Code Field
Type C: Cache Bus Controller Error
Table 16-7. Type C Cache Bus Controller Error Codes

MC4_STATUS[31:16] (MSCE) Value  Error Description
0000_0000_0000_0001  0001H      Inclusion Error from Core 0
0000_0000_0000_0010  0002H      Inclusion Error from Core 1
0000_0000_0000_0011  0003H      Write Exclusive Error from Core 0
0000_0000_0000_0100  0004H      Write Exclusive Error from Core 1
0000_0000_0000_0101  0005H      Inclusion Error from FSB
0000_0000_0000_0110  0006H      SNP Stall Error from FSB
0000_0000_0000_0111  0007H      Write Stall Error from FSB
0000_0000_0000_1000  0008H      FSB Arb Timeout Error
0000_0000_0000_1010  000AH      Inclusion Error from Core 2
0000_0000_0000_1011  000BH      Write Exclusive Error from Core 2
0000_0010_0000_0000  0200H      Internal Timeout Error
0000_0011_0000_0000  0300H      Internal Timeout Error
0000_0100_0000_0000  0400H      Intel® Cache Safe Technology Queue Full Error or Disabled-ways-in-a-set overflow
0000_0101_0000_0000  0500H      Quiet cycle Timeout Error (correctable)
1100_0000_0000_0010  C002H      Correctable ECC event on outgoing Core 0 data
1100_0000_0000_0100  C004H      Correctable ECC event on outgoing Core 1 data
1100_0000_0000_1000  C008H      Correctable ECC event on outgoing Core 2 data
1110_0000_0000_0010  E002H      Uncorrectable ECC error on outgoing Core 0 data
1110_0000_0000_0100  E004H      Uncorrectable ECC error on outgoing Core 1 data
1110_0000_0000_1000  E008H      Uncorrectable ECC error on outgoing Core 2 data
— all other encodings —         Reserved
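
Because the Type C codes are discrete values rather than independent bits, a lookup table is a natural way to decode them. The following sketch assumes the status value has already been read; `TYPE_C_ERRORS` and `decode_type_c` are hypothetical names introduced for illustration.

```python
# MSCE value (status bits 31:16) -> description, transcribed from Table 16-7.
TYPE_C_ERRORS = {
    0x0001: "Inclusion Error from Core 0",
    0x0002: "Inclusion Error from Core 1",
    0x0003: "Write Exclusive Error from Core 0",
    0x0004: "Write Exclusive Error from Core 1",
    0x0005: "Inclusion Error from FSB",
    0x0006: "SNP Stall Error from FSB",
    0x0007: "Write Stall Error from FSB",
    0x0008: "FSB Arb Timeout Error",
    0x000A: "Inclusion Error from Core 2",
    0x000B: "Write Exclusive Error from Core 2",
    0x0200: "Internal Timeout Error",
    0x0300: "Internal Timeout Error",
    0x0400: "Intel Cache Safe Technology Queue Full Error "
            "or Disabled-ways-in-a-set overflow",
    0x0500: "Quiet cycle Timeout Error (correctable)",
    0xC002: "Correctable ECC event on outgoing Core 0 data",
    0xC004: "Correctable ECC event on outgoing Core 1 data",
    0xC008: "Correctable ECC event on outgoing Core 2 data",
    0xE002: "Uncorrectable ECC error on outgoing Core 0 data",
    0xE004: "Uncorrectable ECC error on outgoing Core 1 data",
    0xE008: "Uncorrectable ECC error on outgoing Core 2 data",
}


def decode_type_c(status):
    """Look up the Type C description for status bits 31:16."""
    msce = (status >> 16) & 0xFFFF
    return TYPE_C_ERRORS.get(msce, "Reserved")
```

Encodings that do not appear in the table, including zero, are reported as "Reserved" to match the last row.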

16.3

INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY WITH
CPUID DISPLAYFAMILY_DISPLAYMODEL SIGNATURE 06_1AH, MACHINE
ERROR CODES FOR MACHINE CHECK

Table 16-8 through Table 16-12 provide information for interpreting additional model-specific fields for memory
controller errors relating to the processor family with CPUID DisplayFamily_DisplayModel signature 06_1AH, which
supports Intel QuickPath Interconnect links. Incremental MC error codes related to the Intel QPI links are reported
in the register banks IA32_MC0 and IA32_MC1, incremental error codes for internal machine checks are reported in
the register bank IA32_MC7, and incremental error codes for the memory controller unit are reported in the register
bank IA32_MC8.

16.3.1

Intel QPI Machine Check Errors
Table 16-8. Intel QPI Machine Check Error Codes for IA32_MC0_STATUS and IA32_MC1_STATUS

Type                                    Bit No.  Bit Function              Bit Description
MCA error codes(1)                      15:0     MCACOD                    Bus error format: 1PPTRRRRIILL
Model specific errors                   16       Header Parity             If 1, QPI header had bad parity
                                        17       Data Parity               If 1, QPI data packet had bad parity
                                        18       Retries Exceeded          If 1, number of QPI retries was exceeded
                                        19       Received Poison           If 1, received a data packet that was marked as poisoned by the sender
                                        21:20    Reserved
                                        22       Unsupported Message       If 1, QPI received a message encoding it does not support
                                        23       Unsupported Credit        If 1, QPI credit type is not supported
                                        24       Receive Flit Overrun      If 1, sender sent too many QPI flits to the receiver
                                        25       Received Failed Response  If 1, indicates that sender sent a failed response to receiver
                                        26       Receiver Clock Jitter     If 1, clock jitter detected in the internal QPI clocking
                                        56:27    Reserved
Status register validity indicators(1)  63:57

NOTES:
1. These fields are architecturally defined. Refer to Chapter 15, “Machine-Check Architecture,” for more information.
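
The model-specific QPI bits of Table 16-8 can be separated from the MCACOD field as in the sketch below. It is only an illustration: `QPI_STATUS_FLAGS` and `decode_qpi_status` are hypothetical names, and the single-bit positions (16 to 19 and 22 to 26) are assumed from the order of the rows around the reserved ranges 21:20 and 56:27.

```python
# Assumed bit positions, inferred from the row order in Table 16-8.
QPI_STATUS_FLAGS = {
    16: "Header Parity",
    17: "Data Parity",
    18: "Retries Exceeded",
    19: "Received Poison",
    22: "Unsupported Message",
    23: "Unsupported Credit",
    24: "Receive Flit Overrun",
    25: "Received Failed Response",
    26: "Receiver Clock Jitter",
}


def decode_qpi_status(status):
    """Split a status value into its MCACOD and the QPI flags that are set."""
    return {
        "mcacod": status & 0xFFFF,
        "errors": [name for bit, name in sorted(QPI_STATUS_FLAGS.items())
                   if (status >> bit) & 1],
    }
```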

Table 16-9. Intel QPI Machine Check Error Codes for IA32_MC0_MISC and IA32_MC1_MISC

Type                      Bit No.  Bit Function  Bit Description
Model specific errors(1)  7:0      QPI Opcode    Message class and opcode from the packet with the error
                          13:8     RTId          QPI Request Transaction ID
                          15:14    Reserved
                          18:16    RHNID         QPI Requestor/Home Node ID
                          23:19    Reserved
                          24       IIB           QPI Interleave/Head Indication Bit

NOTES:
1. Which of these fields are valid depends on the error type.
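
Multi-bit fields such as those in Table 16-9 are usually extracted with a shift and a mask. The sketch below shows one way to do this. The helper `bits` and the function `decode_qpi_misc` are hypothetical, and placing IIB at bit 24 is an assumption based on its position after the reserved range 23:19. Which fields are meaningful still depends on the error type, as the note above states.

```python
def bits(value, hi, lo):
    """Return the field value[hi:lo], with both bounds inclusive."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_qpi_misc(misc):
    """Extract the Table 16-9 fields from a 64-bit MISC register value."""
    return {
        "qpi_opcode": bits(misc, 7, 0),
        "rtid": bits(misc, 13, 8),
        "rhnid": bits(misc, 18, 16),
        "iib": bits(misc, 24, 24),   # assumed bit position
    }
```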

16.3.2

Internal Machine Check Errors
Table 16-10. Machine Check Error Codes for IA32_MC7_STATUS

Type                                    Bit No.  Bit Function                       Bit Description
MCA error codes(1)                      15:0     MCACOD
Model specific errors                   23:16    Reserved
                                        31:24    Reserved except for the following  00h - No Error
                                                                                    03h - Reset firmware did not complete
                                                                                    08h - Received an invalid CMPD
                                                                                    0Ah - Invalid Power Management Request
                                                                                    0Dh - Invalid S-state transition
                                                                                    11h - VID controller does not match POC controller selected
                                                                                    1Ah - MSID from POC does not match CPU MSID
                                        56:32    Reserved
Status register validity indicators(1)  63:57

NOTES:
1. These fields are architecturally defined. Refer to Chapter 15, “Machine-Check Architecture,” for more information.
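
A decoder for the model-specific field in bits 31:24 of IA32_MC7_STATUS might look like the following sketch. The names `MC7_MODEL_SPECIFIC` and `decode_mc7_status` are hypothetical; the code values and descriptions are taken from Table 16-10.

```python
# Status bits 31:24 -> description, transcribed from Table 16-10.
MC7_MODEL_SPECIFIC = {
    0x00: "No Error",
    0x03: "Reset firmware did not complete",
    0x08: "Received an invalid CMPD",
    0x0A: "Invalid Power Management Request",
    0x0D: "Invalid S-state transition",
    0x11: "VID controller does not match POC controller selected",
    0x1A: "MSID from POC does not match CPU MSID",
}


def decode_mc7_status(status):
    """Describe the model-specific error code in status bits 31:24."""
    code = (status >> 24) & 0xFF
    return MC7_MODEL_SPECIFIC.get(code, "Reserved")
```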

16.3.3

Memory Controller Errors
Table 16-11. Incremental Memory Controller Error Codes of Machine Check for IA32_MC8_STATUS

Type                                    Bit No.  Bit Function              Bit Description
MCA error codes(1)                      15:0     MCACOD                    Memory error format: 1MMMCCCC
Model specific errors                   16       Read ECC error            If 1, ECC occurred on a read
                                        17       RAS ECC error             If 1, ECC occurred on a scrub
                                        18       Write parity error        If 1, bad parity on a write
                                        19       Redundancy loss           If 1, error in half of redundant memory
                                        20       Reserved
                                        21       Memory range error        If 1, memory access out of range
                                        22       RTID out of range         If 1, internal ID invalid
                                        23       Address parity error      If 1, bad address parity
                                        24       Byte enable parity error  If 1, bad enable parity
Other information                       37:25    Reserved
                                        52:38    CORE_ERR_CNT              Corrected error count
                                        56:53    Reserved
Status register validity indicators(1)  63:57

NOTES:
1. These fields are architecturally defined. Refer to Chapter 15, “Machine-Check Architecture,” for more information.
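
The IA32_MC8_STATUS layout combines single-bit flags with a 15-bit counter, which the following sketch decodes together. The names are hypothetical, and the flag positions (16 to 19 and 21 to 24) are assumed from the order of the rows between MCACOD and the reserved range 37:25. CORE_ERR_CNT is taken from bits 52:38 as stated in Table 16-11.

```python
# Assumed bit positions, inferred from the row order in Table 16-11.
MC8_STATUS_FLAGS = {
    16: "Read ECC error",
    17: "RAS ECC error",
    18: "Write parity error",
    19: "Redundancy loss",
    21: "Memory range error",
    22: "RTID out of range",
    23: "Address parity error",
    24: "Byte enable parity error",
}


def decode_mc8_status(status):
    """Return the set flags and the corrected error count (bits 52:38)."""
    return {
        "errors": [name for bit, name in sorted(MC8_STATUS_FLAGS.items())
                   if (status >> bit) & 1],
        "corrected_error_count": (status >> 38) & 0x7FFF,
    }
```

The mask 0x7FFF keeps exactly 15 bits, so the reserved bits 56:53 cannot leak into the count.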

Table 16-12. Incremental Memory Controller Error Codes of Machine Check for IA32_MC8_MISC

Type                      Bit No.  Bit Function  Bit Description
Model specific errors(1)  7:0      RTId          Transaction Tracker ID
                          15:8     Reserved
                          17:16    DIMM          DIMM ID which got the error
                          19:18    Channel       Channel ID which got the error
                          31:20    Reserved
                          63:32    Syndrome      ECC Syndrome

NOTES:
1. Which of these fields are valid depends on the error type.
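
The IA32_MC8_MISC fields identify where a memory error occurred. A minimal sketch of the field extraction follows. The function name `decode_mc8_misc` and the dictionary keys are hypothetical, and the validity of each field still depends on the error type.

```python
def decode_mc8_misc(misc):
    """Extract the Table 16-12 fields from a 64-bit MISC register value."""
    return {
        "rtid": misc & 0xFF,                   # bits 7:0
        "dimm": (misc >> 16) & 0x3,            # bits 17:16
        "channel": (misc >> 18) & 0x3,         # bits 19:18
        "syndrome": (misc >> 32) & 0xFFFFFFFF, # bits 63:32
    }
```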

16.4

INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY WITH CPUID
DISPLAYFAMILY_DISPLAYMODEL SIGNATURE 06_2DH, MACHINE ERROR
CODES FOR MACHINE CHECK

Table 16-13 through Table 16-15 provide information for interpreting additional model-specific fields for memory
controller errors relating to the processor family with CPUID DisplayFamily_DisplayModel signature 06_2DH, which
supports Intel QuickPath Interconnect links. Incremental MC error codes related to the Intel QPI links are reported
in the register banks IA32_MC6 and IA32_MC7, incremental error codes for internal machine check errors from the PCU
controller are reported in the register bank IA32_MC4, and incremental error codes for the memory controller unit are
reported in the register banks IA32_MC8-IA32_MC11.
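
Since the bank assignments differ between the 06_1AH and 06_2DH signatures, decoding software needs to select the right tables by signature first. The sketch below is one possible arrangement and is not taken from this manual: `display_signature`, `BANK_UNITS`, and `reporting_unit` are hypothetical names, and the signature computation assumes the usual CPUID.01H:EAX layout (model in bits 7:4, family in bits 11:8, extended model in bits 19:16, extended family in bits 27:20).

```python
def display_signature(eax):
    """Format CPUID.01H:EAX as a DisplayFamily_DisplayModel string."""
    family = (eax >> 8) & 0xF
    model = (eax >> 4) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    display_family = family + ext_family if family == 0xF else family
    display_model = ((ext_model << 4) + model
                     if family in (0x6, 0xF) else model)
    return f"{display_family:02X}_{display_model:02X}H"


# Bank number -> reporting unit, per Sections 16.3 and 16.4.
BANK_UNITS = {
    "06_1AH": {0: "Intel QPI", 1: "Intel QPI",
               7: "internal", 8: "memory controller"},
    "06_2DH": {4: "internal (PCU)", 6: "Intel QPI", 7: "Intel QPI",
               8: "memory controller", 9: "memory controller",
               10: "memory controller", 11: "memory controller"},
}


def reporting_unit(signature, bank):
    """Return the unit that reports incremental codes in a bank, or None."""
    return BANK_UNITS.get(signature, {}).get(bank)
```

A return value of None means that this chapter defines no incremental decoding for that bank on that signature.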

16.4.1

Internal Machine Check Errors
Table 16-13. Machine Check Error Codes for IA32_MC4_STATUS

Type                                    Bit No.  Bit Function                       Bit Description
MCA error codes(1)                      15:0     MCACOD
Model specific errors                   19:16    Reserved except for the following  0000b - No Error
                                                                                    0001b - Non_IMem_Sel
                                                                                    0010b - I_Parity_Error
                                                                                    0011b - Bad_OpCode
                                                                                    0100b - I_Stack_Underflow
                                                                                    0101b - I_Stack_Overflow
                                                                                    0110b - D_Stack_Underflow
                                                                                    0111b - D_Stack_Overflow
                                                                                    1000b - Non-DMem_Sel
                                                                                    1001b - D_Parity_Error
                                        23:20    Reserved
                                        31:24    Reserved except for the following  00h - No Error
                                                                                    0Dh - MC_IMC_FORCE_SR_S3_TIMEOUT
                                                                                    0Eh - MC_CPD_UNCPD_ST_TIMEOUT
                                                                                    0Fh - MC_PKGS_SAFE_WP_TIMEOUT
                                                                                    43h - MC_PECI_MAILBOX_QUIESCE_TIMEOUT
                                                                                    5Ch - MC_MORE_THAN_ONE_LT_AGENT
                                                                                    60h - MC_INVALID_PKGS_REQ_PCH
                                                                                    61h - MC_INVALID_PKGS_REQ_QPI
                                                                                    62h - MC_INVALID_PKGS_RES_QPI
                                                                                    63h - MC_INVALID_PKGC_RES_PCH
                                                                                    64h - MC_INVALID_PKG_STATE_CONFIG
                                                                                    70h - MC_WATCHDG_TIMEOUT_PKGC_SLAVE
                                                                                    71h - MC_WATCHDG_TIMEOUT_PKGC_MASTER
                                                                                    72h - MC_WATCHDG_TIMEOUT_PKGS_MASTER
                                                                                    7Ah - MC_HA_FAILSTS_CHANGE_DETECTED
                                                                                    81h - MC_RECOVERABLE_DIE_THERMAL_TOO_HOT
                                        56:32    Reserved
Status register validity indicators(1)  63:57

NOTES:
1. These fields are architecturally defined. Refer to Chapter 15, “Machine-Check Architecture,” for more information.
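
IA32_MC4_STATUS on the 06_2DH signature carries two independent model-specific codes, one in bits 19:16 and one in bits 31:24. The following sketch decodes both. The names `PCU_CODES_19_16`, `PCU_CODES_31_24`, and `decode_mc4_status` are hypothetical; the code values and mnemonics are taken from Table 16-13.

```python
# Status bits 19:16 -> mnemonic, transcribed from Table 16-13.
PCU_CODES_19_16 = {
    0b0000: "No Error",
    0b0001: "Non_IMem_Sel",
    0b0010: "I_Parity_Error",
    0b0011: "Bad_OpCode",
    0b0100: "I_Stack_Underflow",
    0b0101: "I_Stack_Overflow",
    0b0110: "D_Stack_Underflow",
    0b0111: "D_Stack_Overflow",
    0b1000: "Non-DMem_Sel",
    0b1001: "D_Parity_Error",
}

# Status bits 31:24 -> mnemonic, transcribed from Table 16-13.
PCU_CODES_31_24 = {
    0x00: "No Error",
    0x0D: "MC_IMC_FORCE_SR_S3_TIMEOUT",
    0x0E: "MC_CPD_UNCPD_ST_TIMEOUT",
    0x0F: "MC_PKGS_SAFE_WP_TIMEOUT",
    0x43: "MC_PECI_MAILBOX_QUIESCE_TIMEOUT",
    0x5C: "MC_MORE_THAN_ONE_LT_AGENT",
    0x60: "MC_INVALID_PKGS_REQ_PCH",
    0x61: "MC_INVALID_PKGS_REQ_QPI",
    0x62: "MC_INVALID_PKGS_RES_QPI",
    0x63: "MC_INVALID_PKGC_RES_PCH",
    0x64: "MC_INVALID_PKG_STATE_CONFIG",
    0x70: "MC_WATCHDG_TIMEOUT_PKGC_SLAVE",
    0x71: "MC_WATCHDG_TIMEOUT_PKGC_MASTER",
    0x72: "MC_WATCHDG_TIMEOUT_PKGS_MASTER",
    0x7A: "MC_HA_FAILSTS_CHANGE_DETECTED",
    0x81: "MC_RECOVERABLE_DIE_THERMAL_TOO_HOT",
}


def decode_mc4_status(status):
    """Return the mnemonics for status bits 19:16 and bits 31:24."""
    low = (status >> 16) & 0xF
    high = (status >> 24) & 0xFF
    return (PCU_CODES_19_16.get(low, "Reserved"),
            PCU_CODES_31_24.get(high, "Reserved"))
```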

16.4.2

Intel QPI Machine Check Errors
Table 16-14. Intel QPI MC Error Codes for IA32_MC6_STATUS and IA32_MC7_STATUS

Type                                    Bit No.  Bit Function  Bit Description
MCA error codes(1)                      15:0     MCACOD        Bus error format: 1PPTRRRRIILL
Model specific errors                   56:16    Reserved
Status register validity indicators(1)  63:57

NOTES:
1. These fields are architecturally defined. Refer to Chapter 15, “Machine-Check Architecture,” for more information.
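
For these banks the only decodable field is the MCACOD in the bus error format 1PPTRRRRIILL. The sketch below splits that format into its sub-fields. It assumes the architectural layout 0000 1PPT RRRR IILL described in Chapter 15; the mnemonics in the lookup tables are recalled from that chapter and should be verified against it, and all Python names are hypothetical.

```python
# Sub-field mnemonics assumed from the architectural definition (Chapter 15).
PP = {0: "SRC", 1: "RES", 2: "OBS", 3: "generic"}
RRRR = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
        5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
II = {0: "M", 1: "reserved", 2: "IO", 3: "other"}
LL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_bus_error(mcacod):
    """Decode a 0000 1PPT RRRR IILL error code, or return None."""
    mcacod &= 0xFFFF
    if (mcacod >> 11) != 1:          # bits 15:12 must be 0 and bit 11 must be 1
        return None
    return {
        "participation": PP[(mcacod >> 9) & 0x3],
        "timeout": bool((mcacod >> 8) & 0x1),
        "request": RRRR.get((mcacod >> 4) & 0xF, "reserved"),
        "memory_or_io": II[(mcacod >> 2) & 0x3],
        "level": LL[mcacod & 0x3],
    }
```

The check on bits 15:11 is deliberately strict; an implementation that must tolerate additional bits in the upper part of the field would need to relax it.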

16.4.3

Integrated Memory Controller Machine Check Errors

MC error codes associated with integrated memory controllers are reported in the MSRs IA32_MC8_STATUS through IA32_MC11_STATUS. The supported error codes follow the architectural MCACOD definition type 1MMMCCCC
(see Chapter 15, “Machine-Check Architecture”). MSR_ERROR_CONTROL.[bit 1] can enable additional information logging of the IMC. The additional error information logged by the IMC is stored in IA32_MCi_STATUS and
IA32_MCi_MISC (i = 8, 11).
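
The 1MMMCCCC format mentioned above can be split into a memory transaction type and a channel number. The following sketch assumes the architectural layout 0000 0000 1MMM CCCC from Chapter 15, with CCCC equal to 1111 meaning that the channel is not specified. The MMM mnemonics are recalled from that chapter and should be verified against it; `MMM` and `decode_memory_error` are hypothetical names.

```python
# MMM mnemonics assumed from the architectural definition (Chapter 15).
MMM = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}


def decode_memory_error(mcacod):
    """Decode a 0000 0000 1MMM CCCC error code, or return None."""
    mcacod &= 0xFFFF
    if (mcacod >> 7) != 1:           # bits 15:8 must be 0 and bit 7 must be 1
        return None
    channel = mcacod & 0xF
    return {
        "transaction": MMM.get((mcacod >> 4) & 0x7, "reserved"),
        # 1111b is taken to mean "channel not specified".
        "channel": None if channel == 0xF else channel,
    }
```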

Table 16-15. Intel IMC MC Error Codes for IA32_MCi_STATUS (i= 8, 11)
Type

Bit No.

Bit Function

Bit Description

MCA error codes

15:0

MCACOD

Bus error format: 1PPTRRRRIILL

Model specific
errors

31:16

Reserved except for
the following

Model specific
errors

36:32

Other info

When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log first device
error when corrected error is detected during normal read.

Reserved

56:38
Status register
validity indicators1

See Chapter 15, “Machine-Check Architecture,”

63:57

NOTES:
1. These fields are architecturally defined. Refer to Chapter 15, “Machine-Check Architecture,” for more information.

Table 16-16. Intel IMC MC Error Codes for IA32_MCi_MISC (i= 8, 11)
Type

Bit No.

MCA addr info1

Bit Function

Bit Description

8:0

See Chapter 15, “Machine-Check Architecture,”

Model specific
errors

13:9

• When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log second device
error when corrected error is detected during normal read.
• Otherwise contain parity error if MCi_Status indicates HA_WB_Data or
HA_W_BE parity error.

Model specific
errors