There are two kexec load syscalls, kexec_load another and kexec_file_load.
kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
split kexec_load syscall code to kernel/kexec.c.
And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.
The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
kexec_load syscall can bypass the checking.
Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.
Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
kexec_load syscall.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Even though it is documented how to specifiy efi parameters, it is
possible to cause a kernel panic due to a dereference of a NULL pointer when
parsing such parameters if "efi" alone is given:
PANIC: early exception 0e rip 10:ffffffff812fb361 error 0 cr2 0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450
[ 0.000000] ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8
[ 0.000000] 0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520
[ 0.000000] 0000000000000069 000000000000005f 0000000000000000 0000000000000000
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff8184bb0f>] dump_stack+0x45/0x57
[ 0.000000] [<ffffffff81f371a1>] early_idt_handler_common+0x81/0xae
[ 0.000000] [<ffffffff812fb361>] ? parse_option_str+0x11/0x90
[ 0.000000] [<ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42
[ 0.000000] [<ffffffff81f376e1>] do_early_param+0x50/0x8a
[ 0.000000] [<ffffffff8106b1b3>] parse_args+0x1e3/0x400
[ 0.000000] [<ffffffff81f37a43>] parse_early_options+0x24/0x28
[ 0.000000] [<ffffffff81f37691>] ? loglevel+0x31/0x31
[ 0.000000] [<ffffffff81f37a78>] parse_early_param+0x31/0x3d
[ 0.000000] [<ffffffff81f3ae98>] setup_arch+0x2de/0xc08
[ 0.000000] [<ffffffff8109629a>] ? vprintk_default+0x1a/0x20
[ 0.000000] [<ffffffff81f37b20>] start_kernel+0x90/0x423
[ 0.000000] [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000] [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
[ 0.000000] RIP 0xffffffff81ba2efc
This panic is not reproducible with "efi=" as this will result in a non-NULL
zero-length string.
Thus, verify that the pointer to the parameter string is not NULL. This is
consistent with other parameter-parsing functions which check for NULL pointers.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The memory error record structure includes as its first field a
bitmask of which subsequent fields are valid. The allows new fields
to be added to the structure while keeping compatibility with older
software that parses these records. This mechanism was used between
versions 2.2 and 2.3 to add four new fields, growing the size of the
structure from 73 bytes to 80. But Linux just added all the new
fields so this test:
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
now make Linux complain about old format records being too short.
Add a definition for the old format of the structure and use that
for the minimum size check. Pass the actual size to cper_print_mem()
so it can sanity check the validation_bits field to ensure that if
a BIOS using the old format sets bits as if it were new, we won't
access fields beyond the end of the structure.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.
The mount points converted and their filesystems are:
/sys/hypervisor/s390/ s390_hypfs
/sys/kernel/config/ configfs
/sys/kernel/debug/ debugfs
/sys/firmware/efi/efivars/ efivarfs
/sys/fs/fuse/connections/ fusectl
/sys/fs/pstore/ pstore
/sys/kernel/tracing/ tracefs
/sys/fs/cgroup/ cgroup
/sys/kernel/security/ securityfs
/sys/fs/selinux/ selinuxfs
/sys/fs/smackfs/ smackfs
Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
So, I'm told this problem exists in the world:
> Subject: Build error in -next due to 'efi: Add esrt support'
>
> Building ia64:defconfig ... failed
> --------------
> Error log:
>
> drivers/firmware/efi/esrt.c:28:31: fatal error: asm/early_ioremap.h: No such file or directory
>
I'm not really sure how it's okay that we have things in asm-generic on
some platforms but not others - is having it the same everywhere not the
whole point of asm-generic?
That said, ia64 doesn't have early_ioremap.h . So instead, since it's
difficult to imagine new IA64 machines with UEFI 2.5, just don't build
this code there.
To me this looks like a workaround - doing something like:
generic-y += early_ioremap.h
in arch/ia64/include/asm/Kbuild would appear to be more correct, but
ia64 has its own early_memremap() decl in arch/ia64/include/asm/io.h ,
and it's a macro. So adding the above /and/ requiring that asm/io.h be
included /after/ asm/early_ioremap.h in all cases would fix it, but
that's pretty ugly as well. Since I'm not going to spend the rest of my
life rectifying ia64 headers vs "generic" headers that aren't generic,
it's much simpler to just not build there.
Note that I've only actually tried to build this patch on x86_64, but
esrt.o still gets built there, and that would seem to demonstrate that
the conditional building is working correctly at all the places the code
built before. I no longer have any ia64 machines handy to test that the
exclusion actually works there.
Signed-off-by: Peter Jones <pjones@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
(Compile-)Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
With the libfdt include fixups to use "" instead of <> in the
latest dtc import in commit 4760597 (scripts/dtc: Update to upstream
version 9d3649bd3be245c9), it is no longer necessary to add explicit
include paths to use libfdt. Remove these across the kernel.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
The SMBIOS3 table should appear before the SMBIOS table in
/sys/firmware/efi/systab. This allows user-space utilities which
support both to pick the SMBIOS3 table with a single pass on systems
where both are implemented. The SMBIOS3 entry point is more capable
than the SMBIOS entry point so it should be preferred.
This follows the same logic as the ACPI20 table being listed before
the ACPI table.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
I spotted two (difficult to hit) bugs while reviewing this.
1) There is a double free bug because we unregister "map_kset" in
add_sysfs_runtime_map_entry() and also efi_runtime_map_init().
2) If we fail to allocate "entry" then we should return
ERR_PTR(-ENOMEM) instead of NULL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Guangyu Sun <guangyu.sun@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Apparently I missed some compiler warnings on 32-bit platforms, where
phys_addr_t isn't the same size as void * and I casted it to make printk
work. Obviously I should have thought "I'm printing some random type,
instead of typecasting I should check Documentation/printk-formats.txt
and see how to do it." o/~ The More You Know ☆彡 o/~
This patch also fixes one other warning about an uninitialized variable
some compiler versions seem to see. You can't actually hit the code
path where it would be uninitialized, because there's a prior test that
would error out, but gcc hasn't figured that out. Anyway, it now has a
test and returns the error at both places.
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Add sysfs files for the EFI System Resource Table (ESRT) under
/sys/firmware/efi/esrt and for each EFI System Resource Entry under
entries/ as a subdir.
The EFI System Resource Table (ESRT) provides a read-only catalog of
system components for which the system accepts firmware upgrades via
UEFI's "Capsule Update" feature. This module allows userland utilities
to evaluate what firmware updates can be applied to this system, and
potentially arrange for those updates to occur.
The ESRT is described as part of the UEFI specification, in version 2.5
which should be available from http://uefi.org/specifications in early
2015. If you're a member of the UEFI Forum, information about its
addition to the standard is available as UEFI Mantis 1090.
For some hardware platforms, additional restrictions may be found at
http://msdn.microsoft.com/en-us/library/windows/hardware/jj128256.aspx ,
and additional documentation may be found at
http://download.microsoft.com/download/5/F/5/5F5D16CD-2530-4289-8019-94C6A20BED3C/windows-uefi-firmware-update-platform.docx
.
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It's not very normal to return 1 on failure and 0 on success. There
isn't a reason for it here, the callers don't care so long as it's
non-zero on failure.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
When allocating memory for the copy of the FDT that the stub
modifies and passes to the kernel, it uses the current size as
an estimate of how much memory to allocate, and increases it page
by page if it turns out to be too small. However, when loading
the FDT from a UEFI configuration table, the estimated size is
left at its default value of zero, and the allocation loop runs
starting from zero all the way up to the allocation size that
finally fits the updated FDT.
Instead, retrieve the size of the FDT from the FDT header when
loading it from the UEFI config table.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
While adding support loading kernel and initrd above 4G to grub2 in legacy
mode, I was referring to efi_high_alloc().
That will allocate buffer for kernel and then initrd, and initrd will
use kernel buffer start as limit.
During testing found two buffers will be overlapped when initrd size is
very big like 400M.
It turns out efi_high_alloc() boundary checking is not right.
end - size will be the new start, and should not compare new
start with max, we need to make sure end is smaller than max.
[ Basically, with the current efi_high_alloc() code it's possible to
allocate memory above 'max', because efi_high_alloc() doesn't check
that the tail of the allocation is below 'max'.
If you have an EFI memory map with a single entry that looks like so,
[0xc0000000-0xc0004000]
And want to allocate 0x3000 bytes below 0xc0003000 the current code
will allocate [0xc0001000-0xc0004000], not [0xc0000000-0xc0003000]
like you would expect. - Matt ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This reverts commit d1a8d66b91.
Ard reported a boot failure when running UEFI under Qemu and Xen and
experimenting with various Tianocore build options,
"As it turns out, when allocating room for the UEFI memory map using
UEFI's AllocatePool (), it may result in two new memory map entries
being created, for instance, when using Tianocore's preallocated region
feature. For example, the following region
0x00005ead5000-0x00005ebfffff [Conventional Memory| | | | | |WB|WT|WC|UC]
may be split like this
0x00005ead5000-0x00005eae2fff [Conventional Memory| | | | | |WB|WT|WC|UC]
0x00005eae3000-0x00005eae4fff [Loader Data | | | | | |WB|WT|WC|UC]
0x00005eae5000-0x00005ebfffff [Conventional Memory| | | | | |WB|WT|WC|UC]
if the preallocated Loader Data region was chosen to be right in the
middle of the original free space.
After patch d1a8d66b91 ("efi/libstub: Call get_memory_map() to
obtain map and desc sizes"), this is not being dealt with correctly
anymore, as the existing logic to allocate room for a single additional
entry has become insufficient."
Mark requested to reinstate the old loop we had before commit
d1a8d66b91, which grows the memory map buffer until it's big enough to
hold the EFI memory map.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Recently instrumentation of builtin functions calls was removed from GCC
5.0. To check the memory accessed by such functions, userspace asan
always uses interceptors for them.
So now we should do this as well. This patch declares
memset/memmove/memcpy as weak symbols. In mm/kasan/kasan.c we have our
own implementation of those functions which checks memory before accessing
it.
Default memset/memmove/memcpy now now always have aliases with '__'
prefix. For files that built without kasan instrumentation (e.g.
mm/slub.c) original mem* replaced (via #define) with prefixed variants,
cause we don't want to check memory accesses there.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kernel Address sanitizer (KASan) is a dynamic memory error detector. It
provides fast and comprehensive solution for finding use-after-free and
out-of-bounds bugs.
KASAN uses compile-time instrumentation for checking every memory access,
therefore GCC > v4.9.2 required. v4.9.2 almost works, but has issues with
putting symbol aliases into the wrong section, which breaks kasan
instrumentation of globals.
This patch only adds infrastructure for kernel address sanitizer. It's
not available for use yet. The idea and some code was borrowed from [1].
Basic idea:
The main idea of KASAN is to use shadow memory to record whether each byte
of memory is safe to access or not, and use compiler's instrumentation to
check the shadow memory on each memory access.
Address sanitizer uses 1/8 of the memory addressable in kernel for shadow
memory and uses direct mapping with a scale and offset to translate a
memory address to its corresponding shadow address.
Here is function to translate address to corresponding shadow address:
unsigned long kasan_mem_to_shadow(unsigned long addr)
{
return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
where KASAN_SHADOW_SCALE_SHIFT = 3.
So for every 8 bytes there is one corresponding byte of shadow memory.
The following encoding used for each shadow byte: 0 means that all 8 bytes
of the corresponding memory region are valid for access; k (1 <= k <= 7)
means that the first k bytes are valid for access, and other (8 - k) bytes
are not; Any negative value indicates that the entire 8-bytes are
inaccessible. Different negative values used to distinguish between
different kinds of inaccessible memory (redzones, freed memory) (see
mm/kasan/kasan.h).
To be able to detect accesses to bad memory we need a special compiler.
Such compiler inserts a specific function calls (__asan_load*(addr),
__asan_store*(addr)) before each memory access of size 1, 2, 4, 8 or 16.
These functions check whether memory region is valid to access or not by
checking corresponding shadow memory. If access is not valid an error
printed.
Historical background of the address sanitizer from Dmitry Vyukov:
"We've developed the set of tools, AddressSanitizer (Asan),
ThreadSanitizer and MemorySanitizer, for user space. We actively use
them for testing inside of Google (continuous testing, fuzzing,
running prod services). To date the tools have found more than 10'000
scary bugs in Chromium, Google internal codebase and various
open-source projects (Firefox, OpenSSL, gcc, clang, ffmpeg, MySQL and
lots of others): [2] [3] [4].
The tools are part of both gcc and clang compilers.
We have not yet done massive testing under the Kernel AddressSanitizer
(it's kind of chicken and egg problem, you need it to be upstream to
start applying it extensively). To date it has found about 50 bugs.
Bugs that we've found in upstream kernel are listed in [5].
We've also found ~20 bugs in out internal version of the kernel. Also
people from Samsung and Oracle have found some.
[...]
As others noted, the main feature of AddressSanitizer is its
performance due to inline compiler instrumentation and simple linear
shadow memory. User-space Asan has ~2x slowdown on computational
programs and ~2x memory consumption increase. Taking into account that
kernel usually consumes only small fraction of CPU and memory when
running real user-space programs, I would expect that kernel Asan will
have ~10-30% slowdown and similar memory consumption increase (when we
finish all tuning).
I agree that Asan can well replace kmemcheck. We have plans to start
working on Kernel MemorySanitizer that finds uses of unitialized
memory. Asan+Msan will provide feature-parity with kmemcheck. As
others noted, Asan will unlikely replace debug slab and pagealloc that
can be enabled at runtime. Asan uses compiler instrumentation, so even
if it is disabled, it still incurs visible overheads.
Asan technology is easily portable to other architectures. Compiler
instrumentation is fully portable. Runtime has some arch-dependent
parts like shadow mapping and atomic operation interception. They are
relatively easy to port."
Comparison with other debugging features:
========================================
KMEMCHECK:
- KASan can do almost everything that kmemcheck can. KASan uses
compile-time instrumentation, which makes it significantly faster than
kmemcheck. The only advantage of kmemcheck over KASan is detection of
uninitialized memory reads.
Some brief performance testing showed that kasan could be
x500-x600 times faster than kmemcheck:
$ netperf -l 30
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
no debug: 87380 16384 16384 30.00 41624.72
kasan inline: 87380 16384 16384 30.00 12870.54
kasan outline: 87380 16384 16384 30.00 10586.39
kmemcheck: 87380 16384 16384 30.03 20.23
- Also kmemcheck couldn't work on several CPUs. It always sets
number of CPUs to 1. KASan doesn't have such limitation.
DEBUG_PAGEALLOC:
- KASan is slower than DEBUG_PAGEALLOC, but KASan works on sub-page
granularity level, so it able to find more bugs.
SLUB_DEBUG (poisoning, redzones):
- SLUB_DEBUG has lower overhead than KASan.
- SLUB_DEBUG in most cases are not able to detect bad reads,
KASan able to detect both reads and writes.
- In some cases (e.g. redzone overwritten) SLUB_DEBUG detect
bugs only on allocation/freeing of object. KASan catch
bugs right before it will happen, so we always know exact
place of first bad read/write.
[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
[2] https://code.google.com/p/address-sanitizer/wiki/FoundBugs
[3] https://code.google.com/p/thread-sanitizer/wiki/FoundBugs
[4] https://code.google.com/p/memory-sanitizer/wiki/FoundBugs
[5] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel#Trophies
Based on work by Andrey Konovalov.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to some scary special case handling noticed in drivers/of, various
bits of the ARM* EFI support patches did duplicate looking for @0
variants of various nodes. Unless on an ancient PPC system, these are
not in fact required. Most instances have become refactored out along
the way, this removes the last one.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
There is no reason to translate guid number to string here.
So remove it in order to not do unneeded work.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This fixes two minor issues in the implementation of get_memory_map():
- Currently, it assumes that sizeof(efi_memory_desc_t) == desc_size,
which is usually true, but not mandated by the spec. (This was added
intentionally to allow future additions to the definition of
efi_memory_desc_t). The way the loop is implemented currently, the
added slack space may be insufficient if desc_size is larger, which in
some corner cases could result in the loop never terminating.
- It allocates 32 efi_memory_desc_t entries first (again, using the size
of the struct instead of desc_size), and frees and reallocates if it
turns out to be insufficient. Few implementations of UEFI have such small
memory maps, which results in a unnecessary allocate/free pair on each
invocation.
Fix this by calling the get_memory_map() boot service first with a '0'
input value for map size to retrieve the map size and desc size from the
firmware and only then perform the allocation, using desc_size rather
than sizeof(efi_memory_desc_t).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The "> 0" here should ">= 0" so we free map_entries[0].
Fixes: 926172d460 ('efi: Export EFI runtime memory mapping to sysfs')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This ensures all stub component are freed when the kernel proper is
done booting, by prefixing the names of all ELF sections that have
the SHF_ALLOC attribute with ".init". This approach ensures that even
implicitly emitted allocated data (like initializer values and string
literals) are covered.
At the same time, remove some __init annotations in the stub that have
now become redundant, and add the __init annotation to handle_kernel_image
which will now trigger a section mismatch warning without it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
In order to support kexec, the kernel needs to be able to deal with the
state of the UEFI firmware after SetVirtualAddressMap() has been called.
To avoid having separate code paths for non-kexec and kexec, let's move
the call to SetVirtualAddressMap() to the stub: this will guarantee us
that it will only be called once (since the stub is not executed during
kexec), and ensures that the UEFI state is identical between kexec and
normal boot.
This implies that the layout of the virtual mapping needs to be created
by the stub as well. All regions are rounded up to a naturally aligned
multiple of 64 KB (for compatibility with 64k pages kernels) and recorded
in the UEFI memory map. The kernel proper reads those values and installs
the mappings in a dedicated set of page tables that are swapped in during
UEFI Runtime Services calls.
Acked-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
In some cases (e.g. Intel Bay Trail machines), the kernel will happily
run in 64-bit even if the underlying UEFI firmware platform is
32-bit. That's great, but it's difficult for userland utilities like
grub-install to do the right thing in such a situation.
The kernel already knows about the size of the firmware via
efi_enabled(EFI_64BIT). Add an extra sysfs interface
/sys/firmware/efi/fw_platform_size to expose that information to
userland for low-level utilities to use.
Signed-off-by: Steve McIntyre <steve@einval.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
On systems with 64 KB pages, it is preferable for UEFI memory map
entries to be 64 KB aligned multiples of 64 KB, because it relieves
us of having to deal with the residues.
So, if EFI_ALLOC_ALIGN is #define'd by the platform, use it to round
up all memory allocations made.
Acked-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Split of the remapping code from efi_config_init() so that the caller
can perform its own remapping. This is necessary to correctly handle
virtually remapped UEFI memory regions under kexec, as efi.systab will
have been updated to a virtual address.
Acked-by: Matt Fleming <matt.fleming@intel.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Call it what it does - "unparse" is plain-misleading.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Matt Domsch changed the dell page to point to the new upstream quite
some time ago; kernel should reflect that here as well.
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Use the helper function trace_seq_buffer_ptr() to get the current location
of the next buffer write of a trace_seq object, instead of open coding
it.
This facilitates the conversion of trace_seq to use seq_buf.
Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This reverts commit 84be880560, which itself reverted my original
attempt to move x86 from #include'ing .c files from across the tree
to using the EFI stub built as a static library.
The issue that affected the original approach was that splitting
the implementation into several .o files resulted in the variable
'efi_early' becoming a global with external linkage, which under
-fPIC implies that references to it must go through the GOT. However,
dealing with this additional GOT entry turned out to be troublesome
on some EFI implementations. (GCC's visibility=hidden attribute is
supposed to lift this requirement, but it turned out not to work on
the 32-bit build.)
Instead, use a pure getter function to get a reference to efi_early.
This approach results in no additional GOT entries being generated,
so there is no need for any changes in the early GOT handling.
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
In the absence of a DTB configuration table, the EFI stub will happily
continue attempting to boot a kernel, despite the fact that this kernel
may not function without a description of the hardware. In this case, as
with a typo'd "dtb=" option (e.g. "dbt=") or many other possible
failures, the only output seen by the user will be the rather terse
output from the EFI stub:
EFI stub: Booting Linux Kernel...
To aid those attempting to debug such failures, this patch adds a notice
when no DTB is found, making the output more helpful:
EFI stub: Booting Linux Kernel...
EFI stub: Generating empty DTB
Additionally, a positive acknowledgement is added when a user-specified
DTB is in use:
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from command line
Similarly, a positive acknowledgement is added when a DTB from a
configuration table is in use:
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
This adds support to the UEFI side for detecting the presence of
a SMBIOS 3.0 64-bit entry point. This allows the actual SMBIOS
structure table to reside at a physical offset over 4 GB, which
cannot be supported by the legacy SMBIOS 32-bit entry point.
Since the firmware can legally provide both entry points, store
the SMBIOS 3.0 entry point in a separate variable, and let the
DMI decoding layer decide which one will be used.
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
commit 5dc3826d9f08 ("efi: Implement mandatory locking for UEFI Runtime
Services") implemented some conditional locking when accessing variable
runtime services that Ingo described as "pretty disgusting".
The intention with the !efi_in_nmi() checks was to avoid live-locks when
trying to write pstore crash data into an EFI variable. Such lockless
accesses are allowed according to the UEFI specification when we're in a
"non-recoverable" state, but whether or not things are implemented
correctly in actual firmware implementations remains an unanswered
question, and so it would seem sensible to avoid doing any kind of
unsynchronized variable accesses.
Furthermore, the efi_in_nmi() tests are inadequate because they don't
account for the case where we call EFI variable services from panic or
oops callbacks and aren't executing in NMI context. In other words,
live-locking is still possible.
Let's just remove the conditional locking altogether. Now we've got the
->set_variable_nonblocking() EFI variable operation we can abort if the
runtime lock is already held. Aborting is by far the safest option.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
There are some circumstances that call for trying to write an EFI
variable in a non-blocking way. One such scenario is when writing pstore
data in efi_pstore_write() via the pstore_dump() kdump callback.
Now that we have an EFI runtime spinlock we need a way of aborting if
there is contention instead of spinning, since when writing pstore data
from the kdump callback, the runtime lock may already be held by the CPU
that's running the callback if we crashed in the middle of an EFI
variable operation.
The situation is sufficiently special that a new EFI variable operation
is warranted.
Introduce ->set_variable_nonblocking() for this use case. It is an
optional EFI backend operation, and need only be implemented by those
backends that usually acquire locks to serialize access to EFI
variables, as is the case for virt_efi_set_variable() where we now grab
the EFI runtime spinlock.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It is a really bad idea to declare variables or parameters that
have the same name as common types. It is valid C, but it gets
surprising if a macro expansion attempts to declare an inner
local with that type. Change the local names to eliminate the
hazard.
Change s16 => str16, s8 => str8.
This resolves warnings seen when using W=2 during make, for instance:
drivers/firmware/efi/vars.c: In function ‘dup_variable_bug’:
drivers/firmware/efi/vars.c:324:44: warning: declaration of ‘s16’ shadows a global declaration [-Wshadow]
static void dup_variable_bug(efi_char16_t *s16, efi_guid_t *vendor_guid,
drivers/firmware/efi/vars.c:328:8: warning: declaration of ‘s8’ shadows a global declaration [-Wshadow]
char *s8;
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
At the moment, there are three architectures debug-printing the EFI memory
map at initialization: x86, ia64, and arm64. They all use different format
strings, plus the EFI memory type and the EFI memory attributes are
similarly hard to decode for a human reader.
Introduce a helper __init function that formats the memory type and the
memory attributes in a unified way, to a user-provided character buffer.
The array "memory_type_name" is copied from the arm64 code, temporarily
duplicating it. The (otherwise optional) braces around each string literal
in the initializer list are dropped in order to match the kernel coding
style more closely. The element size is tightened from 32 to 20 bytes
(maximum actual string length + 1) so that we can derive the field width
from the element size.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[ Dropped useless 'register' keyword, which compiler will ignore ]
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
noefi kernel param means actually disabling efi runtime, Per suggestion
from Leif Lindholm efi=noruntime should be better. But since noefi is
already used in X86 thus just adding another param efi=noruntime for
same purpose.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
noefi param can be used for arches other than X86 later, thus move it
out of x86 platform code.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
We need a way to customize the behaviour of the EFI boot stub, in
particular, we need a way to disable the "chunking" workaround, used
when reading files from the EFI System Partition.
One of my machines doesn't cope well when reading files in 1MB chunks to
a buffer above the 4GB mark - it appears that the "chunking" bug
workaround triggers another firmware bug. This was only discovered with
commit 4bf7111f50 ("x86/efi: Support initrd loaded above 4G"), and
that commit is perfectly valid. The symptom I observed was a corrupt
initrd rather than any kind of crash.
efi= is now used to specify EFI parameters in two very different
execution environments, the EFI boot stub and during kernel boot.
There is also a slight performance optimization by enabling efi=nochunk,
but that's offset by the fact that you're more likely to run into
firmware issues, at least on x86. This is the rationale behind leaving
the workaround enabled by default.
Also provide some documentation for EFI_READ_CHUNK_SIZE and why we're
using the current value of 1MB.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
According to section 7.1 of the UEFI spec, Runtime Services are not fully
reentrant, and there are particular combinations of calls that need to be
serialized. Use a spinlock to serialize all Runtime Services with respect
to all others, even if this is more than strictly needed.
We've managed to get away without requiring a runtime services lock
until now because most of the interactions with EFI involve EFI
variables, and those operations are already serialised with
__efivars->lock.
Some of the assumptions underlying the decision whether locks are
needed or not (e.g., SetVariable() against ResetSystem()) may not
apply universally to all [new] architectures that implement UEFI.
Rather than try to reason our way out of this, let's just implement at
least what the spec requires in terms of locking.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This reverts commit f23cf8bd5c ("efi/x86: efistub: Move shared
dependencies to <asm/efi.h>") as well as the x86 parts of commit
f4f75ad574 ("efi: efistub: Convert into static library").
The road leading to these two reverts is long and winding.
The above two commits were merged during the v3.17 merge window and
turned the common EFI boot stub code into a static library. This
necessitated making some symbols global in the x86 boot stub which
introduced new entries into the early boot GOT.
The problem was that we weren't fixing up the newly created GOT entries
before invoking the EFI boot stub, which sometimes resulted in hangs or
resets. This failure was reported by Maarten on his Macbook pro.
The proposed fix was commit 9cb0e39423 ("x86/efi: Fixup GOT in all
boot code paths"). However, that caused issues for Linus when booting
his Sony Vaio Pro 11. It was subsequently reverted in commit
f3670394c2.
So that leaves us back with Maarten's Macbook pro not booting.
At this stage in the release cycle the least risky option is to revert
the x86 EFI boot stub to the pre-merge window code structure where we
explicitly #include efi-stub-helper.c instead of linking with the static
library. The arm64 code remains unaffected.
We can take another swing at the x86 parts for v3.18.
Conflicts:
arch/x86/include/asm/efi.h
Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org> [arm64]
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Commit 86c8b27a01cf:
"arm64: ignore DT memreserve entries when booting in UEFI mode
prevents early_init_fdt_scan_reserved_mem() from being called for
arm64 kernels booting via UEFI. This was done because the kernel
will use the UEFI memory map to determine reserved memory regions.
That approach has problems in that early_init_fdt_scan_reserved_mem()
also reserves the FDT itself and any node-specific reserved memory.
By chance of some kernel configs, the FDT may be overwritten before
it can be unflattened and the kernel will fail to boot. More subtle
problems will result if the FDT has node specific reserved memory
which is not really reserved.
This patch has the UEFI stub remove the memory reserve map entries
from the FDT as it does with the memory nodes. This allows
early_init_fdt_scan_reserved_mem() to be called unconditionally
so that the other needed reservations are made.
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
spin_is_locked() always returns false for uniprocessor configurations
in several architectures, so do not use WARN_ON with it.
Use lockdep_assert_held() instead to also reduce overhead in
non-debug kernels.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This patch does two things. It passes EFI run time mappings to second
kernel in bootparams efi_info. Second kernel parse this info and create
new mappings in second kernel. That means mappings in first and second
kernel will be same. This paves the way to enable EFI in kexec kernel.
This patch also prepares and passes EFI setup data through bootparams.
This contains bunch of information about various tables and their
addresses.
These information gathering and passing has been written along the lines
of what current kexec-tools is doing to make kexec work with UEFI.
[akpm@linux-foundation.org: s/get_efi/efi_get/g, per Matt]
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original patch is from Ben Hutchings's contribution to debian
kernel. Got Ben's permission to remove the code of efi-pstore.c and
send to linux-efi:
https://github.com/BlankOn/linux-debian/blob/master/debian/patches/features/all/efi-autoload-efivars.patch
efivars is generally useful to have on EFI systems, and in some cases
it may be impossible to load it after a kernel upgrade in order to
complete a boot loader update. At the same time we don't want to waste
memory on non-EFI systems by making them built-in.
Instead, give them module aliases as if they are platform drivers, and
register a corresponding platform device whenever EFI runtime services
are available. This should trigger udev to load them.
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Introduce EFI_PARAVIRT flag. If it is set then kernel runs
on EFI platform but it has not direct control on EFI stuff
like EFI runtime, tables, structures, etc. If not this means
that Linux Kernel has direct access to EFI infrastructure
and everything runs as usual.
This functionality is used in Xen dom0 because hypervisor
has full control on EFI stuff and all calls from dom0 to
EFI must be requested via special hypercall which in turn
executes relevant EFI code in behalf of dom0.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Use early_mem*() instead of early_io*() because all mapped EFI regions
are memory (usually RAM but they could also be ROM, EPROM, EEPROM, flash,
etc.) not I/O regions. Additionally, I/O family calls do not work correctly
under Xen in our case. early_ioremap() skips the PFN to MFN conversion
when building the PTE. Using it for memory will attempt to map the wrong
machine frame. However, all artificial EFI structures created under Xen
live in dom0 memory and should be mapped/unmapped using early_mem*() family
calls which map domain memory.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It appears that the BayTrail-T class of hardware requires EFI in order
to powerdown and reboot and no other reliable method exists.
This quirk is generally applicable to all hardware that has the ACPI
Hardware Reduced bit set, since usually ACPI would be the preferred
method.
Cc: Len Brown <len.brown@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Not only can EfiResetSystem() be used to reboot, it can also be used to
power down machines.
By and large, this functionality doesn't work very well across the range
of EFI machines in the wild, so it should definitely only be used as a
last resort. In an ideal world, this wouldn't be needed at all.
Unfortunately, we're starting to see machines where EFI is the *only*
reliable way to power down, and nothing else, not PCI, not ACPI, works.
efi_poweroff_required() should be implemented on a per-architecture
basis, since exactly when we should be using EFI runtime services is a
platform-specific decision. There's no analogue for reboot because each
architecture handles reboot very differently - the x86 code in
particular is pretty complex.
Patches to enable this for specific classes of hardware will be
submitted separately.
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Implement efi_reboot(), which is really just a wrapper around the
EfiResetSystem() EFI runtime service, but it does at least allow us to
funnel all callers through a single location.
It also simplifies the callsites since users no longer need to check to
see whether EFI_RUNTIME_SERVICES are enabled.
Cc: Tony Luck <tony.luck@intel.com>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This patch changes both x86 and arm64 efistub implementations
from #including shared .c files under drivers/firmware/efi to
building shared code as a static library.
The x86 code uses a stub built into the boot executable which
uncompresses the kernel at boot time. In this case, the library is
linked into the decompressor.
In the arm64 case, the stub is part of the kernel proper so the library
is linked into the kernel proper as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>