summaryrefslogtreecommitdiff
path: root/Documentation/acpi/apei
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/acpi/apei')
-rw-r--r--Documentation/acpi/apei/einj.txt107
-rw-r--r--Documentation/acpi/apei/output_format.txt147
2 files changed, 254 insertions, 0 deletions
diff --git a/Documentation/acpi/apei/einj.txt b/Documentation/acpi/apei/einj.txt
new file mode 100644
index 00000000..e20b6daa
--- /dev/null
+++ b/Documentation/acpi/apei/einj.txt
@@ -0,0 +1,107 @@
+ APEI Error INJection
+ ~~~~~~~~~~~~~~~~~~~~
+
+EINJ provides a hardware error injection mechanism
+It is very useful for debugging and testing of other APEI and RAS features.
+
+To use EINJ, make sure the following are enabled in your kernel
+configuration:
+
+CONFIG_DEBUG_FS
+CONFIG_ACPI_APEI
+CONFIG_ACPI_APEI_EINJ
+
+The user interface of EINJ is debug file system, under the
+directory apei/einj. The following files are provided.
+
+- available_error_type
+ Reading this file returns the error injection capability of the
+ platform, that is, which error types are supported. The error type
+ definition is as follow, the left field is the error type value, the
+ right field is error description.
+
+ 0x00000001 Processor Correctable
+ 0x00000002 Processor Uncorrectable non-fatal
+ 0x00000004 Processor Uncorrectable fatal
+ 0x00000008 Memory Correctable
+ 0x00000010 Memory Uncorrectable non-fatal
+ 0x00000020 Memory Uncorrectable fatal
+ 0x00000040 PCI Express Correctable
+ 0x00000080 PCI Express Uncorrectable fatal
+ 0x00000100 PCI Express Uncorrectable non-fatal
+ 0x00000200 Platform Correctable
+ 0x00000400 Platform Uncorrectable non-fatal
+ 0x00000800 Platform Uncorrectable fatal
+
+ The format of file contents are as above, except there are only the
+ available error type lines.
+
+- error_type
+ This file is used to set the error type value. The error type value
+ is defined in "available_error_type" description.
+
+- error_inject
+ Write any integer to this file to trigger the error
+ injection. Before this, please specify all necessary error
+ parameters.
+
+- param1
+ This file is used to set the first error parameter value. Effect of
+ parameter depends on error_type specified.
+
+- param2
+ This file is used to set the second error parameter value. Effect of
+ parameter depends on error_type specified.
+
+- notrigger
+ The EINJ mechanism is a two step process. First inject the error, then
+ perform some actions to trigger it. Setting "notrigger" to 1 skips the
+ trigger phase, which *may* allow the user to cause the error in some other
+ context by a simple access to the cpu, memory location, or device that is
+ the target of the error injection. Whether this actually works depends
+ on what operations the BIOS actually includes in the trigger phase.
+
+BIOS versions based in the ACPI 4.0 specification have limited options
+to control where the errors are injected. Your BIOS may support an
+extension (enabled with the param_extension=1 module parameter, or
+boot command line einj.param_extension=1). This allows the address
+and mask for memory injections to be specified by the param1 and
+param2 files in apei/einj.
+
+BIOS versions using the ACPI 5.0 specification have more control over
+the target of the injection. For processor related errors (type 0x1,
+0x2 and 0x4) the APICID of the target should be provided using the
+param1 file in apei/einj. For memory errors (type 0x8, 0x10 and 0x20)
+the address is set using param1 with a mask in param2 (0x0 is equivalent
+to all ones). For PCI express errors (type 0x40, 0x80 and 0x100) the
+segment, bus, device and function are specified using param1:
+
+ 31 24 23 16 15 11 10 8 7 0
+ +-------------------------------------------------+
+ | segment | bus | device | function | reserved |
+ +-------------------------------------------------+
+
+An ACPI 5.0 BIOS may also allow vendor specific errors to be injected.
+In this case a file named vendor will contain identifying information
+from the BIOS that hopefully will allow an application wishing to use
+the vendor specific extension to tell that they are running on a BIOS
+that supports it. All vendor extensions have the 0x80000000 bit set in
+error_type. A file vendor_flags controls the interpretation of param1
+and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
+documentation for details (and expect changes to this API if vendors
+creativity in using this feature expands beyond our expectations).
+
+Example:
+# cd /sys/kernel/debug/apei/einj
+# cat available_error_type # See which errors can be injected
+0x00000002 Processor Uncorrectable non-fatal
+0x00000008 Memory Correctable
+0x00000010 Memory Uncorrectable non-fatal
+# echo 0x12345000 > param1 # Set memory address for injection
+# echo 0xfffffffffffff000 > param2 # Mask - anywhere in this page
+# echo 0x8 > error_type # Choose correctable memory error
+# echo 1 > error_inject # Inject now
+
+
+For more information about EINJ, please refer to ACPI specification
+version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/Documentation/acpi/apei/output_format.txt b/Documentation/acpi/apei/output_format.txt
new file mode 100644
index 00000000..0c49c197
--- /dev/null
+++ b/Documentation/acpi/apei/output_format.txt
@@ -0,0 +1,147 @@
+ APEI output format
+ ~~~~~~~~~~~~~~~~~~
+
+APEI uses printk as hardware error reporting interface, the output
+format is as follow.
+
+<error record> :=
+APEI generic hardware error status
+severity: <integer>, <severity string>
+section: <integer>, severity: <integer>, <severity string>
+flags: <integer>
+<section flags strings>
+fru_id: <uuid string>
+fru_text: <string>
+section_type: <section type string>
+<section data>
+
+<severity string>* := recoverable | fatal | corrected | info
+
+<section flags strings># :=
+[primary][, containment warning][, reset][, threshold exceeded]\
+[, resource not accessible][, latent error]
+
+<section type string> := generic processor error | memory error | \
+PCIe error | unknown, <uuid string>
+
+<section data> :=
+<generic processor section data> | <memory section data> | \
+<pcie section data> | <null>
+
+<generic processor section data> :=
+[processor_type: <integer>, <proc type string>]
+[processor_isa: <integer>, <proc isa string>]
+[error_type: <integer>
+<proc error type strings>]
+[operation: <integer>, <proc operation string>]
+[flags: <integer>
+<proc flags strings>]
+[level: <integer>]
+[version_info: <integer>]
+[processor_id: <integer>]
+[target_address: <integer>]
+[requestor_id: <integer>]
+[responder_id: <integer>]
+[IP: <integer>]
+
+<proc type string>* := IA32/X64 | IA64
+
+<proc isa string>* := IA32 | IA64 | X64
+
+<processor error type strings># :=
+[cache error][, TLB error][, bus error][, micro-architectural error]
+
+<proc operation string>* := unknown or generic | data read | data write | \
+instruction execution
+
+<proc flags strings># :=
+[restartable][, precise IP][, overflow][, corrected]
+
+<memory section data> :=
+[error_status: <integer>]
+[physical_address: <integer>]
+[physical_address_mask: <integer>]
+[node: <integer>]
+[card: <integer>]
+[module: <integer>]
+[bank: <integer>]
+[device: <integer>]
+[row: <integer>]
+[column: <integer>]
+[bit_position: <integer>]
+[requestor_id: <integer>]
+[responder_id: <integer>]
+[target_id: <integer>]
+[error_type: <integer>, <mem error type string>]
+
+<mem error type string>* :=
+unknown | no error | single-bit ECC | multi-bit ECC | \
+single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \
+target abort | parity error | watchdog timeout | invalid address | \
+mirror Broken | memory sparing | scrub corrected error | \
+scrub uncorrected error
+
+<pcie section data> :=
+[port_type: <integer>, <pcie port type string>]
+[version: <integer>.<integer>]
+[command: <integer>, status: <integer>]
+[device_id: <integer>:<integer>:<integer>.<integer>
+slot: <integer>
+secondary_bus: <integer>
+vendor_id: <integer>, device_id: <integer>
+class_code: <integer>]
+[serial number: <integer>, <integer>]
+[bridge: secondary_status: <integer>, control: <integer>]
+[aer_status: <integer>, aer_mask: <integer>
+<aer status string>
+[aer_uncor_severity: <integer>]
+aer_layer=<aer layer string>, aer_agent=<aer agent string>
+aer_tlp_header: <integer> <integer> <integer> <integer>]
+
+<pcie port type string>* := PCIe end point | legacy PCI end point | \
+unknown | unknown | root port | upstream switch port | \
+downstream switch port | PCIe to PCI/PCI-X bridge | \
+PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \
+root complex event collector
+
+if section severity is fatal or recoverable
+<aer status string># :=
+unknown | unknown | unknown | unknown | Data Link Protocol | \
+unknown | unknown | unknown | unknown | unknown | unknown | unknown | \
+Poisoned TLP | Flow Control Protocol | Completion Timeout | \
+Completer Abort | Unexpected Completion | Receiver Overflow | \
+Malformed TLP | ECRC | Unsupported Request
+else
+<aer status string># :=
+Receiver Error | unknown | unknown | unknown | unknown | unknown | \
+Bad TLP | Bad DLLP | RELAY_NUM Rollover | unknown | unknown | unknown | \
+Replay Timer Timeout | Advisory Non-Fatal
+fi
+
+<aer layer string> :=
+Physical Layer | Data Link Layer | Transaction Layer
+
+<aer agent string> :=
+Receiver ID | Requester ID | Completer ID | Transmitter ID
+
+Where, [] designate corresponding content is optional
+
+All <field string> description with * has the following format:
+
+field: <integer>, <field string>
+
+Where value of <integer> should be the position of "string" in <field
+string> description. Otherwise, <field string> will be "unknown".
+
+All <field strings> description with # has the following format:
+
+field: <integer>
+<field strings>
+
+Where each string in <fields strings> corresponding to one set bit of
+<integer>. The bit position is the position of "string" in <field
+strings> description.
+
+For more detailed explanation of every field, please refer to UEFI
+specification version 2.3 or later, section Appendix N: Common
+Platform Error Record.