|
|
|
APEI Error INJection
|
|
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
EINJ provides a hardware error injection mechanism
|
|
|
|
It is very useful for debugging and testing of other APEI and RAS features.
|
|
|
|
|
|
|
|
To use EINJ, make sure the following are enabled in your kernel
|
|
|
|
configuration:
|
|
|
|
|
|
|
|
CONFIG_DEBUG_FS
|
|
|
|
CONFIG_ACPI_APEI
|
|
|
|
CONFIG_ACPI_APEI_EINJ
|
|
|
|
|
|
|
|
The user interface of EINJ is debug file system, under the
|
|
|
|
directory apei/einj. The following files are provided.
|
|
|
|
|
|
|
|
- available_error_type
|
|
|
|
Reading this file returns the error injection capability of the
|
|
|
|
platform, that is, which error types are supported. The error type
|
|
|
|
definition is as follow, the left field is the error type value, the
|
|
|
|
right field is error description.
|
|
|
|
|
|
|
|
0x00000001 Processor Correctable
|
|
|
|
0x00000002 Processor Uncorrectable non-fatal
|
|
|
|
0x00000004 Processor Uncorrectable fatal
|
|
|
|
0x00000008 Memory Correctable
|
|
|
|
0x00000010 Memory Uncorrectable non-fatal
|
|
|
|
0x00000020 Memory Uncorrectable fatal
|
|
|
|
0x00000040 PCI Express Correctable
|
|
|
|
0x00000080 PCI Express Uncorrectable fatal
|
|
|
|
0x00000100 PCI Express Uncorrectable non-fatal
|
|
|
|
0x00000200 Platform Correctable
|
|
|
|
0x00000400 Platform Uncorrectable non-fatal
|
|
|
|
0x00000800 Platform Uncorrectable fatal
|
|
|
|
|
|
|
|
The format of file contents are as above, except there are only the
|
|
|
|
available error type lines.
|
|
|
|
|
|
|
|
- error_type
|
|
|
|
This file is used to set the error type value. The error type value
|
|
|
|
is defined in "available_error_type" description.
|
|
|
|
|
|
|
|
- error_inject
|
|
|
|
Write any integer to this file to trigger the error
|
|
|
|
injection. Before this, please specify all necessary error
|
|
|
|
parameters.
|
|
|
|
|
|
|
|
- flags
|
|
|
|
Present for kernel version 3.13 and above. Used to specify which
|
|
|
|
of param{1..4} are valid and should be used by BIOS during injection.
|
|
|
|
Value is a bitmask as specified in ACPI5.0 spec for the
|
|
|
|
SET_ERROR_TYPE_WITH_ADDRESS data structure:
|
|
|
|
Bit 0 - Processor APIC field valid (see param3 below)
|
|
|
|
Bit 1 - Memory address and mask valid (param1 and param2)
|
|
|
|
Bit 2 - PCIe (seg,bus,dev,fn) valid (param4 below)
|
|
|
|
If set to zero, legacy behaviour is used where the type of injection
|
|
|
|
specifies just one bit set, and param1 is multiplexed.
|
|
|
|
|
|
|
|
- param1
|
|
|
|
This file is used to set the first error parameter value. Effect of
|
|
|
|
parameter depends on error_type specified. For example, if error
|
|
|
|
type is memory related type, the param1 should be a valid physical
|
|
|
|
memory address. [Unless "flag" is set - see above]
|
|
|
|
|
|
|
|
- param2
|
|
|
|
This file is used to set the second error parameter value. Effect of
|
|
|
|
parameter depends on error_type specified. For example, if error
|
|
|
|
type is memory related type, the param2 should be a physical memory
|
|
|
|
address mask. Linux requires page or narrower granularity, say,
|
|
|
|
0xfffffffffffff000.
|
|
|
|
|
|
|
|
- param3
|
|
|
|
Used when the 0x1 bit is set in "flag" to specify the APIC id
|
|
|
|
|
|
|
|
- param4
|
|
|
|
Used when the 0x4 bit is set in "flag" to specify target PCIe device
|
|
|
|
|
|
|
|
- notrigger
|
|
|
|
The EINJ mechanism is a two step process. First inject the error, then
|
|
|
|
perform some actions to trigger it. Setting "notrigger" to 1 skips the
|
|
|
|
trigger phase, which *may* allow the user to cause the error in some other
|
|
|
|
context by a simple access to the cpu, memory location, or device that is
|
|
|
|
the target of the error injection. Whether this actually works depends
|
|
|
|
on what operations the BIOS actually includes in the trigger phase.
|
|
|
|
|
|
|
|
BIOS versions based in the ACPI 4.0 specification have limited options
|
|
|
|
to control where the errors are injected. Your BIOS may support an
|
|
|
|
extension (enabled with the param_extension=1 module parameter, or
|
|
|
|
boot command line einj.param_extension=1). This allows the address
|
|
|
|
and mask for memory injections to be specified by the param1 and
|
|
|
|
param2 files in apei/einj.
|
|
|
|
|
|
|
|
BIOS versions using the ACPI 5.0 specification have more control over
|
|
|
|
the target of the injection. For processor related errors (type 0x1,
|
|
|
|
0x2 and 0x4) the APICID of the target should be provided using the
|
|
|
|
param1 file in apei/einj. For memory errors (type 0x8, 0x10 and 0x20)
|
|
|
|
the address is set using param1 with a mask in param2 (0x0 is equivalent
|
|
|
|
to all ones). For PCI express errors (type 0x40, 0x80 and 0x100) the
|
|
|
|
segment, bus, device and function are specified using param1:
|
|
|
|
|
|
|
|
31 24 23 16 15 11 10 8 7 0
|
|
|
|
+-------------------------------------------------+
|
|
|
|
| segment | bus | device | function | reserved |
|
|
|
|
+-------------------------------------------------+
|
|
|
|
|
|
|
|
An ACPI 5.0 BIOS may also allow vendor specific errors to be injected.
|
|
|
|
In this case a file named vendor will contain identifying information
|
|
|
|
from the BIOS that hopefully will allow an application wishing to use
|
|
|
|
the vendor specific extension to tell that they are running on a BIOS
|
|
|
|
that supports it. All vendor extensions have the 0x80000000 bit set in
|
|
|
|
error_type. A file vendor_flags controls the interpretation of param1
|
|
|
|
and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
|
|
|
|
documentation for details (and expect changes to this API if vendors
|
|
|
|
creativity in using this feature expands beyond our expectations).
|
|
|
|
|
|
|
|
Example:
|
|
|
|
# cd /sys/kernel/debug/apei/einj
|
|
|
|
# cat available_error_type # See which errors can be injected
|
|
|
|
0x00000002 Processor Uncorrectable non-fatal
|
|
|
|
0x00000008 Memory Correctable
|
|
|
|
0x00000010 Memory Uncorrectable non-fatal
|
|
|
|
# echo 0x12345000 > param1 # Set memory address for injection
|
|
|
|
# echo 0xfffffffffffff000 > param2 # Mask - anywhere in this page
|
|
|
|
# echo 0x8 > error_type # Choose correctable memory error
|
|
|
|
# echo 1 > error_inject # Inject now
|
|
|
|
|
|
|
|
|
|
|
|
For more information about EINJ, please refer to ACPI specification
|
|
|
|
version 4.0, section 17.5 and ACPI 5.0, section 18.6.
|