F15h CPUs may report a non-DRAM address when reporting an error address
belonging to a CC6 state save area. Add a workaround to detect this
condition and compute the actual DRAM address of the error as documented
in the Revision Guide for AMD Family 15h Models 00h-0Fh Processors.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h and later use a portion of DRAM as a CC6 storage area. BIOS
programs D18F1x[17C:140,7C:40] DRAM Base/Limit accordingly by
subtracting the storage area from the DRAM limit setting. However, in
order for edac to consider that part of DRAM too, we need to include it
into the per-node range.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This warning was wrongfully added for a normal condition - intlvsel
actually selects the destination node when node interleaving is enabled
and it is not a mismatch. For a detailed example, see section 2.8.10.2
"Node Interleaving" in F10h BKDG.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This patch removes superfluous debugging output in the sysfs scrub rate
handler. It also consolidates the error handling in the scrub rate
accessors.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
We check the pointers together but at least one of them could be invalid
due to failed allocation. Since we cannot continue if either of the two
allocations has failed, exit early by freeing them both.
Cc: <stable@kernel.org> # 38.x
Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Fix amd64_debug_display_dimm_sizes() arguments order per convention (pvt
is always first). Also, the now second arg denotes the DCT so adjust its
type.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
A node id can never be negative since we use it as an index into
the DRAM ranges array. This also makes one of the BUG_ON conditions
redundant.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add the PCI device ids required for driver registration. Remove
pvt->ctl_name and use the family descriptor directly, instead. Then,
bump driver version and fixup its format. Finally, enable DRAM ECC
decoding on F15h.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h has the same ECC symbol size options as F10h revD and later so
adjust checks to that. Simplify code a bit.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Drop static tables which map the bits in F2x80 to a chip select size in
favor of functions doing the mapping with some bit fiddling. Also, add
F15 support.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This function is relevant for F10h and higher, and it has only one
callsite so drop its function pointer from the low_ops struct.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h sys_addr to chip select mapping is almost identical to F10h's so
reuse that. Rename functions on that path accordingly.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Replace per-DCT macros with smarter ones, drop hack and look for the
spare rank on all chip selects on a channel.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
When node interleaving is enabled, a subset of the addr[14:12] bits has
to be removed in order to get the normalized DCT address of the DRAM
channel. The actual number of bits to remove is determined by F1x[1,
0][7C:40][IntlvEn]. Do this correctly.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
On revC3 and revE Fam10h machines and later, non-interleaved graphics
framebuffer memory under the 16G mark can be swapped with a region
located at the bottom of memory so that the GPU can use the interleaved
region and thus two channels. Add support for that.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The address bits from MC4_STATUS differ only between K8 and the rest so
no need for a per-family method.
No functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Use the struct mce directly instead of copying from it into a custom
struct err_regs.
No functionality change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The only difference is that F10h used to sport ganged DCTs and F15h
doesn't so adjust the F10h routine and reuse it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Remove reporting of errors with UC bit set - this is done by the MCE
decoding code anyway and this driver deals with DRAM ECC errors only. UC
(NB uncorrectable error) doesn't necessarily mean it is a DRAM error.
Remove unused macros while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The fact whether we are chipkill capable or not does not have any
bearing when computing the channel index on a ganged DCT configuration
so remove that. Also, simplify debug statements. Finally, remove old
error injection leftovers, while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Remove family names from macro names, drop single bit defines and
comment their meaning instead.
No functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* Restrict DCT ganged mode check since only Fam10h supports it
* Adjust DRAM type detection for BD since it only supports DDR3
* Remove second and thus unneeded DCLR read in k8_early_channel_count() - we do
that in read_mc_regs()
* Cleanup comments and remove family names from register macros
* Remove unused defines
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Do not read DBAM regs twice and simplify code around them.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This function maps the system address to the normalized DCT address.
Document what the code does for more clarity and wrap insane bitmasks in
a more understandable macro which generates them. Also, reduce number of
arguments passed to the function. Finally, rename this function to what
it actually does.
No functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cleanup and simplify f10_determine_channel(); make it more readable.
Also drop f10_map_intlv_en_to_shift() in favor of simply counting the
bits in F1x124[DramIntlvEn] which is equivalent.
There should be no functionality change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add a struct representing the DRAM chip select base/limit register
pairs. Concentrate all CS handling in a single function. Also, add CS
looping macros for cleaner, more readable code. While at it, adjust code
to F15h. Finally, do smaller macro names cleanups (remove family names
from register macros) and debug messages clarification.
No functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add a struct representing the DRAM base/limit range pairs and remove all
cached subfields. Replace them with accessor functions, which actually
saves us some space:
text data bss dec hex filename
14712 1577 336 16625 40f1 drivers/edac/amd64_edac_mod.o.after
14831 1609 336 16776 4188 drivers/edac/amd64_edac_mod.o.before
Also, it simplifies the code a lot allowing to merge the K8 and F10h
routines.
No functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h "multiplexes" between the configuration space of the two DRAM
controllers by toggling D18F1x10C[DctCfgSel] while F10h has a different
set of registers for DCT0, and DCT1 in extended PCI config space.
Add DCT configuration space accessors per family thus wrapping all the
different access prerequisites. Clean up code while at it, shorten
names.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Raise the debug level of these routines so that their output get issued
out only when the highest debug level is selected. Otherwise, don't
pollute driver debug output.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add tile support for the EDAC driver, which provides unified system
error (memory, PCI, etc.) reporting. For now, the TILEPro port
reports memory correctable error (CE) only.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Get rid of old users of of_platform_driver in arch/powerpc. Most
of_platform_driver users can be converted to use the platform_bus
directly.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
to edac-core
fix the totally wrong info w.r.t page,row,dimm-label previously reported to
edac-core by i82975x driver
Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
amd64_debug_display_dimm_sizes() reports the distribution of the DIMMs
on each DRAM controller and its chip select sizes. Thus, the last don't
have anything to do with whether we're running in ganged DCT mode or not
- their sizes don't change all of a sudden. Fix that by removing the
ganged-check and dump DCT0's config for DCT1 when in ganged mode since
they're identical.
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>