diff options
Diffstat (limited to 'Documentation/acpi/apei')
-rw-r--r-- | Documentation/acpi/apei/einj.txt | 107 | ||||
-rw-r--r-- | Documentation/acpi/apei/output_format.txt | 147 |
2 files changed, 254 insertions, 0 deletions
diff --git a/Documentation/acpi/apei/einj.txt b/Documentation/acpi/apei/einj.txt new file mode 100644 index 00000000..e20b6daa --- /dev/null +++ b/Documentation/acpi/apei/einj.txt @@ -0,0 +1,107 @@ + APEI Error INJection + ~~~~~~~~~~~~~~~~~~~~ + +EINJ provides a hardware error injection mechanism +It is very useful for debugging and testing of other APEI and RAS features. + +To use EINJ, make sure the following are enabled in your kernel +configuration: + +CONFIG_DEBUG_FS +CONFIG_ACPI_APEI +CONFIG_ACPI_APEI_EINJ + +The user interface of EINJ is debug file system, under the +directory apei/einj. The following files are provided. + +- available_error_type + Reading this file returns the error injection capability of the + platform, that is, which error types are supported. The error type + definition is as follow, the left field is the error type value, the + right field is error description. + + 0x00000001 Processor Correctable + 0x00000002 Processor Uncorrectable non-fatal + 0x00000004 Processor Uncorrectable fatal + 0x00000008 Memory Correctable + 0x00000010 Memory Uncorrectable non-fatal + 0x00000020 Memory Uncorrectable fatal + 0x00000040 PCI Express Correctable + 0x00000080 PCI Express Uncorrectable fatal + 0x00000100 PCI Express Uncorrectable non-fatal + 0x00000200 Platform Correctable + 0x00000400 Platform Uncorrectable non-fatal + 0x00000800 Platform Uncorrectable fatal + + The format of file contents are as above, except there are only the + available error type lines. + +- error_type + This file is used to set the error type value. The error type value + is defined in "available_error_type" description. + +- error_inject + Write any integer to this file to trigger the error + injection. Before this, please specify all necessary error + parameters. + +- param1 + This file is used to set the first error parameter value. Effect of + parameter depends on error_type specified. + +- param2 + This file is used to set the second error parameter value. Effect of + parameter depends on error_type specified. + +- notrigger + The EINJ mechanism is a two step process. First inject the error, then + perform some actions to trigger it. Setting "notrigger" to 1 skips the + trigger phase, which *may* allow the user to cause the error in some other + context by a simple access to the cpu, memory location, or device that is + the target of the error injection. Whether this actually works depends + on what operations the BIOS actually includes in the trigger phase. + +BIOS versions based in the ACPI 4.0 specification have limited options +to control where the errors are injected. Your BIOS may support an +extension (enabled with the param_extension=1 module parameter, or +boot command line einj.param_extension=1). This allows the address +and mask for memory injections to be specified by the param1 and +param2 files in apei/einj. + +BIOS versions using the ACPI 5.0 specification have more control over +the target of the injection. For processor related errors (type 0x1, +0x2 and 0x4) the APICID of the target should be provided using the +param1 file in apei/einj. For memory errors (type 0x8, 0x10 and 0x20) +the address is set using param1 with a mask in param2 (0x0 is equivalent +to all ones). For PCI express errors (type 0x40, 0x80 and 0x100) the +segment, bus, device and function are specified using param1: + + 31 24 23 16 15 11 10 8 7 0 + +-------------------------------------------------+ + | segment | bus | device | function | reserved | + +-------------------------------------------------+ + +An ACPI 5.0 BIOS may also allow vendor specific errors to be injected. +In this case a file named vendor will contain identifying information +from the BIOS that hopefully will allow an application wishing to use +the vendor specific extension to tell that they are running on a BIOS +that supports it. All vendor extensions have the 0x80000000 bit set in +error_type. A file vendor_flags controls the interpretation of param1 +and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor +documentation for details (and expect changes to this API if vendors +creativity in using this feature expands beyond our expectations). + +Example: +# cd /sys/kernel/debug/apei/einj +# cat available_error_type # See which errors can be injected +0x00000002 Processor Uncorrectable non-fatal +0x00000008 Memory Correctable +0x00000010 Memory Uncorrectable non-fatal +# echo 0x12345000 > param1 # Set memory address for injection +# echo 0xfffffffffffff000 > param2 # Mask - anywhere in this page +# echo 0x8 > error_type # Choose correctable memory error +# echo 1 > error_inject # Inject now + + +For more information about EINJ, please refer to ACPI specification +version 4.0, section 17.5 and ACPI 5.0, section 18.6. diff --git a/Documentation/acpi/apei/output_format.txt b/Documentation/acpi/apei/output_format.txt new file mode 100644 index 00000000..0c49c197 --- /dev/null +++ b/Documentation/acpi/apei/output_format.txt @@ -0,0 +1,147 @@ + APEI output format + ~~~~~~~~~~~~~~~~~~ + +APEI uses printk as hardware error reporting interface, the output +format is as follow. + +<error record> := +APEI generic hardware error status +severity: <integer>, <severity string> +section: <integer>, severity: <integer>, <severity string> +flags: <integer> +<section flags strings> +fru_id: <uuid string> +fru_text: <string> +section_type: <section type string> +<section data> + +<severity string>* := recoverable | fatal | corrected | info + +<section flags strings># := +[primary][, containment warning][, reset][, threshold exceeded]\ +[, resource not accessible][, latent error] + +<section type string> := generic processor error | memory error | \ +PCIe error | unknown, <uuid string> + +<section data> := +<generic processor section data> | <memory section data> | \ +<pcie section data> | <null> + +<generic processor section data> := +[processor_type: <integer>, <proc type string>] +[processor_isa: <integer>, <proc isa string>] +[error_type: <integer> +<proc error type strings>] +[operation: <integer>, <proc operation string>] +[flags: <integer> +<proc flags strings>] +[level: <integer>] +[version_info: <integer>] +[processor_id: <integer>] +[target_address: <integer>] +[requestor_id: <integer>] +[responder_id: <integer>] +[IP: <integer>] + +<proc type string>* := IA32/X64 | IA64 + +<proc isa string>* := IA32 | IA64 | X64 + +<processor error type strings># := +[cache error][, TLB error][, bus error][, micro-architectural error] + +<proc operation string>* := unknown or generic | data read | data write | \ +instruction execution + +<proc flags strings># := +[restartable][, precise IP][, overflow][, corrected] + +<memory section data> := +[error_status: <integer>] +[physical_address: <integer>] +[physical_address_mask: <integer>] +[node: <integer>] +[card: <integer>] +[module: <integer>] +[bank: <integer>] +[device: <integer>] +[row: <integer>] +[column: <integer>] +[bit_position: <integer>] +[requestor_id: <integer>] +[responder_id: <integer>] +[target_id: <integer>] +[error_type: <integer>, <mem error type string>] + +<mem error type string>* := +unknown | no error | single-bit ECC | multi-bit ECC | \ +single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \ +target abort | parity error | watchdog timeout | invalid address | \ +mirror Broken | memory sparing | scrub corrected error | \ +scrub uncorrected error + +<pcie section data> := +[port_type: <integer>, <pcie port type string>] +[version: <integer>.<integer>] +[command: <integer>, status: <integer>] +[device_id: <integer>:<integer>:<integer>.<integer> +slot: <integer> +secondary_bus: <integer> +vendor_id: <integer>, device_id: <integer> +class_code: <integer>] +[serial number: <integer>, <integer>] +[bridge: secondary_status: <integer>, control: <integer>] +[aer_status: <integer>, aer_mask: <integer> +<aer status string> +[aer_uncor_severity: <integer>] +aer_layer=<aer layer string>, aer_agent=<aer agent string> +aer_tlp_header: <integer> <integer> <integer> <integer>] + +<pcie port type string>* := PCIe end point | legacy PCI end point | \ +unknown | unknown | root port | upstream switch port | \ +downstream switch port | PCIe to PCI/PCI-X bridge | \ +PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \ +root complex event collector + +if section severity is fatal or recoverable +<aer status string># := +unknown | unknown | unknown | unknown | Data Link Protocol | \ +unknown | unknown | unknown | unknown | unknown | unknown | unknown | \ +Poisoned TLP | Flow Control Protocol | Completion Timeout | \ +Completer Abort | Unexpected Completion | Receiver Overflow | \ +Malformed TLP | ECRC | Unsupported Request +else +<aer status string># := +Receiver Error | unknown | unknown | unknown | unknown | unknown | \ +Bad TLP | Bad DLLP | RELAY_NUM Rollover | unknown | unknown | unknown | \ +Replay Timer Timeout | Advisory Non-Fatal +fi + +<aer layer string> := +Physical Layer | Data Link Layer | Transaction Layer + +<aer agent string> := +Receiver ID | Requester ID | Completer ID | Transmitter ID + +Where, [] designate corresponding content is optional + +All <field string> description with * has the following format: + +field: <integer>, <field string> + +Where value of <integer> should be the position of "string" in <field +string> description. Otherwise, <field string> will be "unknown". + +All <field strings> description with # has the following format: + +field: <integer> +<field strings> + +Where each string in <fields strings> corresponding to one set bit of +<integer>. The bit position is the position of "string" in <field +strings> description. + +For more detailed explanation of every field, please refer to UEFI +specification version 2.3 or later, section Appendix N: Common +Platform Error Record. |