Originally 'efi_enabled' indicated whether a kernel was booted from
EFI firmware. Over time its semantics have changed, and it now
indicates whether or not we are booted on an EFI machine with
bit-native firmware, e.g. 64-bit kernel with 64-bit firmware.
The immediate motivation for this patch is the bug report at,
https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
which details how running a platform driver on an EFI machine that is
designed to run under BIOS can cause the machine to become
bricked. Also, the following report,
https://bugzilla.kernel.org/show_bug.cgi?id=47121
details how running said driver can also cause Machine Check
Exceptions. Drivers need a new means of detecting whether they're
running on an EFI machine, as sadly the expression,
if (!efi_enabled)
hasn't been a sufficient condition for quite some time.
Users actually want to query 'efi_enabled' for different reasons -
what they really want access to is the list of available EFI
facilities.
For instance, the x86 reboot code needs to know whether it can invoke
the ResetSystem() function provided by the EFI runtime services, while
the ACPI OSL code wants to know whether the EFI config tables were
mapped successfully. There are also checks in some of the platform
driver code to simply see if they're running on an EFI machine (which
would make it a bad idea to do BIOS-y things).
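A facility-based query of the kind described above could be sketched roughly as follows. This is a user-space illustration only: the facility names, the bitmask variable and the helper names are assumptions made for the example, not necessarily what the patch itself introduces.

```c
#include <stdbool.h>

/* Hypothetical facility bits; the names are illustrative only. */
enum efi_facility {
    EFI_FAC_BOOT,             /* booted via EFI firmware */
    EFI_FAC_CONFIG_TABLES,    /* config tables were mapped */
    EFI_FAC_RUNTIME_SERVICES, /* runtime services (e.g. ResetSystem()) usable */
    EFI_FAC_64BIT,            /* firmware is 64-bit */
};

static unsigned long efi_facilities; /* one bit per available facility */

static void efi_facility_set(enum efi_facility f)
{
    efi_facilities |= 1UL << f;
}

static bool efi_facility_enabled(enum efi_facility f)
{
    return (efi_facilities & (1UL << f)) != 0;
}

/* A BIOS-only platform driver would bail out on any EFI boot. */
static bool bios_driver_may_load(void)
{
    return !efi_facility_enabled(EFI_FAC_BOOT);
}
```

Each caller then asks only about the facility it cares about, instead of overloading one boolean with several meanings.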
This patch is a prereq for the samsung-laptop fix patch.
Cc: David Airlie <airlied@linux.ie>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Peter Jones <pjones@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Steve Langasek <steve.langasek@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Commit d6b2123802 "make sure that we always have a return path from
kernel_execve()" reshuffled kernel_init()/init_post() to ensure that
kernel_execve() has a caller to return to.
It removed the __init annotation from kernel_init() and introduced a
new routine, kernel_init_freeable(), which kernel_init() calls. The
latter, however, is inlined by any reasonable compiler (ARC gcc 4.4 in
this case), causing slight code bloat.
This patch forces kernel_init_freeable() to be noinline, reducing the
.text size:
  bloat-o-meter vmlinux vmlinux_new
  add/remove: 1/0 grow/shrink: 0/1 up/down: 374/-334 (40)
  function                 old     new   delta
  kernel_init_freeable       -     374    +374 (.init.text)
  kernel_init              628     294    -334 (.text)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This reverts commit bd52276fa1 ("x86-64/efi: Use EFI to deal with
platform wall clock (again)"), and the two supporting commits:
  da5a108d05b4: "x86/kernel: remove tboot 1:1 page table creation code"
  185034e72d59: "x86, efi: 1:1 pagetable mapping for virtual EFI calls"
as they all depend semantically on commit 53b87cf088 ("x86, mm:
Include the entire kernel memory map in trampoline_pgd") that got
reverted earlier due to the problems it caused.
This was pointed out by Yinghai Lu, and verified by me on my Macbook Air
that uses EFI.
Pointed-out-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes that have memory, so it should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of setting child_reaper and SIGNAL_UNKILLABLE one way
for the system init process, and another way for pid namespace
init processes, test pid->nr == 1 and use the same code for both.
For the global init this results in SIGNAL_UNKILLABLE being set
much earlier in the initialization process.
This is a small cleanup and it paves the way for allowing unshare and
enter of the pid namespace, as that path, like our global init, also
will not set CLONE_NEWPID.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
gcc-4.1.2 inlines weak functions, which causes FRV to fail when the dummy
thread_info_cache_init() gets inlined into start_kernel().
Signed-off-by: David Howells <dhowells@redhat.com>
Unlike ix86, x86-64 on EFI so far didn't set the
{g,s}et_wallclock accessors to the EFI routines, thus
incorrectly using raw RTC accesses instead.
Simply removing the #ifdef around the respective code isn't
enough, however: while so far early get-time calls were done in
physical mode, this doesn't work properly for x86-64, as virtual
addresses would still need to be set up for all runtime regions
(which wasn't the case on the system I have access to). So
instead the patch moves the call to efi_enter_virtual_mode()
ahead (which in turn allows dropping all code related to calling
efi-get-time in physical mode).
Additionally, the earlier call to efi_set_executable()
requires the CPA code to cope, i.e. during early boot calling
cpa_flush_array() must be avoided, as the first thing this
function does is a BUG_ON(irqs_disabled()).
Also make the two EFI functions in question here static -
they're not being referenced elsewhere.
History:
This commit was originally merged as bacef661ac ("x86-64/efi:
Use EFI to deal with platform wall clock") but it resulted in some
ASUS machines no longer booting due to a firmware bug, and so was
reverted in f026cfa82f. A pre-emptive fix for the buggy ASUS
firmware was merged in 03a1c254975e ("x86, efi: 1:1 pagetable
mapping for virtual EFI calls") so now this patch can be
reapplied.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com> [added commit history]
* allow kernel_execve() to leave the actual return to userland to the
  caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers
  updated accordingly.
* an architecture that does select GENERIC_KERNEL_EXECVE in its
  Kconfig should have its ret_from_kernel_thread() do this:
      call schedule_tail
      call the callback left for it by copy_thread(); if it ever
          returns, that's because it has just done successful kernel_execve()
      jump to return from syscall
  IOW, its only difference from ret_from_fork() is that it does call the
  callback.
* such an architecture should also get rid of ret_from_kernel_execve()
  and __ARCH_WANT_KERNEL_EXECVE
This is the last part of infrastructure patches in that area - from
that point on work on different architectures can live independently.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The only place where kernel_execve() is called without a way to
return to the caller of kernel_thread() callback is init_post().
Reorganize kernel_init()/init_post() - instead of the former
calling the latter in the end (and getting freed by it), have the
latter *begin* with calling the former (and turn the latter into
kernel_thread() callback, of course).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
After both prio_tree users have been converted to use red-black trees,
there is no need to keep around the prio tree library anymore.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ACPI BGRT driver accesses the BIOS logo image when it initializes.
However, ACPI 5.0 (which introduces the BGRT) recommends putting the
logo image in EFI boot services memory, so that the OS can reclaim that
memory. Production systems follow this recommendation, breaking the
ACPI BGRT driver.
Move the bulk of the BGRT code to run during a new EFI late
initialization phase, which occurs after switching EFI to virtual mode,
and after initializing ACPI, but before freeing boot services memory.
Copy the BIOS logo image to kernel memory at that point, and make it
accessible to the BGRT driver. Rework the existing ACPI BGRT driver to
act as a simple wrapper exposing that image (and the properties from the
BGRT) via sysfs.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/93ce9f823f1c1f3bb88bdd662cce08eee7a17f5d.1348876882.git.josh@joshtriplett.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This reverts commit bacef661ac.
This commit has been found to cause serious regressions on a number of
ASUS machines at the least. We probably need to provide a 1:1 map in
addition to the EFI virtual memory map in order for this to work.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reported-and-bisected-by: Jérôme Carretero <cJ-ko@zougloub.eu>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120805172903.5f8bb24c@zougloub.eu
When hotadd_new_pgdat() is called to create new pgdat for a new node, a
fallback zonelist should be created for the new node. There's code to try
to achieve that in hotadd_new_pgdat() as below:
	/*
	 * The node we allocated has no zone fallback lists. For avoiding
	 * to access not-initialized zonelist, build here.
	 */
	mutex_lock(&zonelists_mutex);
	build_all_zonelists(pgdat, NULL);
	mutex_unlock(&zonelists_mutex);
But it doesn't work as expected. When hotadd_new_pgdat() is called, the
new node is still in offline state because node_set_online(nid) hasn't
been called yet. And build_all_zonelists() only builds zonelists for
online nodes as:
	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		build_zonelists(pgdat);
		build_zonelist_cache(pgdat);
	}
So although we intend to create a zonelist for the new pgdat, none is
built. Add a new parameter "pgdat" to build_all_zonelists() so that it
builds zonelists for the new pgdat too.
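The effect of the extra parameter can be modelled with a small user-space sketch. The struct, the array and the function name below are hypothetical stand-ins chosen for the example; only the idea (online nodes plus one explicitly passed, not-yet-online node) comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical miniature of a per-node descriptor. */
struct node {
    bool online;
    bool has_zonelist;
};

static struct node nodes[MAX_NODES];

/*
 * Build zonelists for every online node and, if given, for one extra
 * node that is not online yet (the freshly hot-added one).
 */
static void build_all_zonelists_sketch(struct node *extra)
{
    if (extra && !extra->online)
        extra->has_zonelist = true;

    for (size_t nid = 0; nid < MAX_NODES; nid++)
        if (nodes[nid].online)
            nodes[nid].has_zonelist = true;
}
```

Passing NULL reproduces the old behaviour, where a node that is still offline is silently skipped.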
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Keping Chen <chenkeping@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
main.c has initcall_level_names[] for parse_args to print in debug
messages; add comments to keep them in sync with the initcalls defined
in init.h.
Also add "loadable" to the comment about not using *_initcall macros in
modules, to disambiguate from kernel/params.c and other builtins.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
... and schedule_work() for interrupt/kernel_thread callers
(and yes, now it *is* OK to call from interrupt).
We are guaranteed that __fput() will be done before we return
to userland (or exit). Note that for fput() from a kernel
thread we get an async behaviour; it's almost always OK, but
sometimes you might need to have __fput() completed before
you do anything else. There are two mechanisms for that -
a general barrier (flush_delayed_fput()) and explicit
__fput_sync(). Both should be used with care (as was the
case for fput() from kernel threads all along). See comments
in fs/file_table.c for details.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9fb48c744b ("params: add 3rd arg to option handler callback
signature") added similar lines to dmesg:
  initlevel:0=early, 4 registered initcalls
  initlevel:1=core, 31 registered initcalls
  initlevel:2=postcore, 11 registered initcalls
  initlevel:3=arch, 7 registered initcalls
  initlevel:4=subsys, 40 registered initcalls
  initlevel:5=fs, 30 registered initcalls
  initlevel:6=device, 250 registered initcalls
  initlevel:7=late, 35 registered initcalls
but they don't contain any info for the general user staring at dmesg.
I'm very doubtful the count of initcalls registered per level helps
anyone so drop that output completely.
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit 026cee0086 "params: <level>_initcall-like kernel parameters"
set old-style module parameters to level 0. And we call those level 0
calls where we used to, early in start_kernel().
We also loop through the initcall levels and call the levelled
module_params before the corresponding initcall. Unfortunately level
0 is early_init(), so we call the standard module_param calls twice.
(Turns out most things don't care, but at least ubi.mtd does).
Change the level to -1 for standard module_param calls.
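The double call can be reproduced with a toy model. Everything in this sketch (the struct, the function names, the 0..7 level loop) is an assumption for illustration; it only mirrors the sequence described above: one early pass, then one pass per initcall level.

```c
#include <stddef.h>

/* Hypothetical miniature of a levelled parameter table. */
struct param {
    const char *name;
    int level;   /* -1: plain module_param, N: parsed before initcall level N */
    int calls;   /* how many times the handler ran */
};

/* Run the handler of every parameter whose level lies in [min, max]. */
static void parse_level(struct param *tbl, size_t n, int min, int max)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].level >= min && tbl[i].level <= max)
            tbl[i].calls++;
}

/* Mimic boot: one early pass, then one pass per initcall level 0..7. */
static void boot(struct param *tbl, size_t n, int early_level)
{
    parse_level(tbl, n, early_level, early_level);
    for (int level = 0; level <= 7; level++)
        parse_level(tbl, n, level, level);
}
```

With the early pass and the plain parameters both at level 0, a handler runs twice; moving both to -1 leaves exactly one call, because no initcall level is ever -1.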
Reported-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Unlike ix86, x86-64 on EFI so far didn't set the
{g,s}et_wallclock accessors to the EFI routines, thus
incorrectly using raw RTC accesses instead.
Simply removing the #ifdef around the respective code isn't
enough, however: while so far early get-time calls were done in
physical mode, this doesn't work properly for x86-64, as virtual
addresses would still need to be set up for all runtime regions
(which wasn't the case on the system I have access to). So
instead the patch moves the call to efi_enter_virtual_mode()
ahead (which in turn allows dropping all code related to calling
efi-get-time in physical mode).
Additionally, the earlier call to efi_set_executable()
requires the CPA code to cope, i.e. during early boot calling
cpa_flush_array() must be avoided, as the first thing this
function does is a BUG_ON(irqs_disabled()).
Also make the two EFI functions in question here static -
they're not being referenced elsewhere.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FBFBF5F020000780008637F@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
During early boot, when the scheduler hasn't really been fully set up,
we really can't do blocking allocations because with certain (dubious)
configurations the "might_resched()" calls can actually result in
scheduling events.
We could just make such users always use GFP_ATOMIC, but quite often the
code that does the allocation isn't really aware of the fact that the
scheduler isn't up yet, and forcing that kind of random knowledge on the
initialization code is just annoying and not good for anybody.
And we actually have the 'gfp_allowed_mask' exactly for this reason:
it's just that the kernel init sequence happens to set it to allow
blocking allocations much too early.
So move the 'gfp_allowed_mask' initialization from 'start_kernel()'
(which is some of the earliest init code, and runs with preemption
disabled for good reasons) into 'kernel_init()'. kernel_init() is run
in the newly created thread that will become the 'init' process, as
opposed to the early startup code that runs within the context of what
will be the first idle thread.
So by the time we reach 'kernel_init()', we know that the scheduler must
be at least limping along, because we've already scheduled from the idle
thread into the init thread.
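The masking idea can be illustrated with a short sketch. The flag values and helper names here are hypothetical (the real bit layout is not given in this message); the point is only that every request is filtered through one global mask that is widened once the scheduler is known to work.

```c
/* Hypothetical flag values; the real bit layout is not given here. */
#define SK_GFP_WAIT   0x1u  /* allocation may block */
#define SK_GFP_IO     0x2u
#define SK_GFP_FS     0x4u
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* Assumed early-boot state: blocking-related bits are masked out. */
static unsigned int gfp_allowed_mask = ~(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS);

/* Every allocation filters its flags through the mask. */
static unsigned int effective_gfp(unsigned int requested)
{
    return requested & gfp_allowed_mask;
}

/* Called once the scheduler is known to work, e.g. from the init thread. */
static void allow_blocking_allocations(void)
{
    gfp_allowed_mask = ~0u;
}
```

Callers keep asking for the flags they would normally use; only the point at which the mask is opened up moves.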
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Rientjes <rientjes@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a 3rd arg, named "doing", to unknown-options callbacks invoked
from parse_args(). The arg is passed as:
  1. "Booting kernel" from start_kernel(),
  2. initcall_level_names[i] from do_initcall_level(),
  3. mod->name from load_module(), via parse_args(), parse_one()
parse_args() already has the "name" parameter, which is renamed to
"doing" to better reflect current uses 1,2 above. parse_args() passes
it to an altered parse_one(), which now passes it down into the
unknown option handler callbacks.
The mod->name will be needed to handle dyndbg for loadable modules,
since params passed by modprobe are not qualified (they do not have a
"$modname." prefix), and by the time the unknown-param callback is
called, the module name is not otherwise available.
Minor tweaks:
Add param-name to parse_one's pr_debug(); the current message doesn't
identify the param being handled.
Add a pr_info to print current level and level_name of the initcall,
and number of registered initcalls at that level. This adds 7 lines
to dmesg output, like:
  initlevel:6=device, 172 registered initcalls
Drop "parameters" from initcall_level_names[]; it's unhelpful in the
pr_info() added above. This array is passed into parse_args() by
do_initcall_level().
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 026cee0086 had the side-effect of dropping the '=' from
the unknown boot arguments that are passed to init as environment
variables. This is because parse_args() puts a NUL in the string
where the '=' was when it passes the "param" and "val" pointers
to the parsing subfunctions. Previously, unknown_bootoption() was
the last parse_args() subfunction to run, and it carefully put back
the '=' character. Now the ignore_unknown_bootoption() is the last
one to run, and it wasn't doing the necessary repair, so the
envp params ended up with the embedded NUL and were no longer
seen as valid environment variables by init.
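The split-and-repair step can be shown in isolation. This is a minimal user-space sketch with hypothetical helper names; it handles only the plain param=val form and ignores quoted values.

```c
#include <string.h>

/*
 * Split "param=val" in place the way the text describes: the '=' is
 * overwritten with a NUL. Returns a pointer to the value, or NULL.
 */
static char *split_param(char *arg)
{
    char *eq = strchr(arg, '=');

    if (!eq)
        return NULL;
    *eq = '\0';
    return eq + 1;
}

/* Undo the split so the string is a valid "NAME=value" again. */
static void repair_env_string(char *param, char *val)
{
    if (val && val == param + strlen(param) + 1)
        val[-1] = '=';
}
```

Whichever handler runs last on an unknown option has to perform the repair, since the same buffer is what ends up in init's environment.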
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This patch adds a set of macros that can be used to declare
kernel parameters to be parsed _before_ initcalls at a chosen
level are executed. We rename the now-unused "flags" field of
struct kernel_param as the level. It's signed, for when we
use this for early params as well, in future.
Linker macro collating init calls had to be modified in order
to add additional symbols between levels that are later used
by the init code to split the calls into blocks.
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When (no)bootmem finishes its operation, it passes pages to the buddy
allocator. Since debug_pagealloc_enabled is not set at that point, we
do not protect those pages, which is not what we want with
CONFIG_DEBUG_PAGEALLOC=y.
To fix this, remove debug_pagealloc_enabled. That variable was
introduced by commit 12d6f21e "x86: do not PSE on
CONFIG_DEBUG_PAGEALLOC=y" to get more testing of the CPA (change page
attribute) code. But we now have CONFIG_CPA_DEBUG, which tests CPA.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1322582711-14571-1-git-send-email-sgruszka@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes a lockdep warning on ARM platforms:
  [ 0.000000] WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough?
  [ 0.000000] Call stack leading to lockdep invocation was:
  [ 0.000000] [<c00164bc>] save_stack_trace_tsk+0x0/0x90
  [ 0.000000] [<ffffffff>] 0xffffffff
The warning is caused by the printk inside smp_setup_processor_id().
It is safe to call lockdep_init() first because it doesn't depend on
smp_setup_processor_id(); doing so lets printk be called as early as
possible without a lockdep complaint.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1321508072-23853-1-git-send-email-tom.leiming@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The user may use "foo-bar" for a kernel parameter defined as "foo_bar".
Make sure it works the other way around too.
Apply the equality of dashes and underscores on early_params and __setup
params as well.
The example given in Documentation/kernel-parameters.txt indicates that
this is the intended behaviour.
With the patch the kernel accepts "log-buf-len=1M" as expected.
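A symmetric comparison of this kind can be sketched as below. The function names are hypothetical and the code is a stand-alone illustration, not the kernel's implementation: both sides are normalised before comparing, so it no longer matters which spelling the user or the definition uses.

```c
#include <stdbool.h>

/* Map '-' to '_' so both spellings compare equal. */
static char dash2underscore(char c)
{
    return c == '-' ? '_' : c;
}

/* Compare two parameter names, treating '-' and '_' as the same character. */
static bool param_names_equal(const char *a, const char *b)
{
    for (; *a || *b; a++, b++)
        if (dash2underscore(*a) != dash2underscore(*b))
            return false;
    return true;
}
```

Only the two separator characters are folded together; names that differ in any other way, or in length, still compare unequal.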
https://bugzilla.redhat.com/show_bug.cgi?id=744545
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (neatened implementations)
Initialize jump_labels much, much earlier, so they're available for use
during system setup.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Commit d5767c5353 ("bootup: move 'usermodehelper_enable()' to the end
of do_basic_setup()") moved 'usermodehelper_enable()' to the end of
do_basic_setup(), after the initcalls. But with that change uvesafb
fails to work on my computer, and I lose the boot splash.
So maybe we could call usermodehelper_enable() a little earlier, so
that tasks that need early init with the help of user mode can work.
[ I would *really* prefer that initcalls not call into user space - even
the real 'init' hasn't been execve'd yet, after all! But for uvesafb
it really does look like we don't have much choice.
I considered doing this when we mount the root filesystem, but
depending on config options that is in multiple places. We could do
the usermode helper enable as a rootfs_initcall()..
So I'm just using wang yanqing's trivial patch. It's not wonderful,
but it's simple and should work. We should revisit this some day,
though. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doing it just before starting to call into cpu_idle() made a sick kind
of sense only because the original bug we fixed (see commit
288d5abec831: "Boot up with usermodehelper disabled") was about problems
with some scheduler data structures not being initialized, and they had
better be initialized at that point.
But it really didn't make any other conceptual sense, and doing it after
the initial "schedule()" call for the idle thread actually opened up a
race: what if the main initialization thread did everything without
needing to sleep, and got all the way into user land too? Without
actually having scheduled back to the idle thread?
Now, in normal circumstances that doesn't ever happen, but it looks like
Richard Cochran triggered exactly that on his ARM IXP4xx machines:
"I have some ARM IXP4xx based machines that use the two on chip MAC
ports (aka NPEs). The NPE needs a firmware in order to function.
Ever since the following commit [that 288d5abec8 one], it is no
longer possible to bring up the interfaces during the init scripts."
with a call trace showing an ioctl coming from user space. Richard says:
"The init is busybox, and the startup script does mount, syslogd, and
then ifup, so that all can go by quickly."
The fix is to move the usermodehelper_enable() into the main 'init'
thread, and just put it after we've done all our initcalls. By then,
everything really should be up, but we've obviously not actually started
the user-mode portion of init yet.
Reported-and-tested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a malformed loglevel value (for example "${abc}") is passed on the
kernel cmdline, the loglevel itself is being set to 0.
That then suppresses all following messages, including all the errors
and crashes caused by other malformed cmdline options. This can make
debugging quite tricky.
This patch leaves the previous value of loglevel if the new value is
incorrect and reports an error code in this case.
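The "keep the old value on bad input" behaviour can be sketched in user space as follows. This sketch swaps in strtol() for whatever parsing helper the kernel uses, and the variable name, default value and function name are assumptions for the example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

static int console_loglevel = 7; /* hypothetical default for this sketch */

/* Parse a "loglevel=" value; keep the old level if the value is malformed. */
static int set_loglevel(const char *str)
{
    char *end;
    long level;

    errno = 0;
    level = strtol(str, &end, 10);
    if (end == str || *end != '\0' || errno == ERANGE ||
        level < 0 || level > INT_MAX)
        return -EINVAL;   /* previous console_loglevel stays untouched */

    console_loglevel = (int)level;
    return 0;
}
```

The assignment happens only after the whole string has been validated, so a value such as "${abc}" can no longer silence the console.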
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@sysgo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The core device layer sends tons of uevent notifications for each device
it finds, and if the kernel has been built with a non-empty
CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
helper binary for all these events very early in the boot.
Not only won't the root filesystem even be mounted at that point, we
literally won't have necessarily even initialized all the process
handling data structures at that point, which causes no end of silly
problems even when the usermode helper doesn't actually succeed in
executing.
So just use our existing infrastructure to disable the usermodehelpers
to make the kernel start out with them disabled. We enable them when
we've at least initialized stuff a bit.
Problems related to an uninitialized
init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex
reported by various people.
Reported-by: Manuel Lauss <manuel.lauss@googlemail.com>
Reported-by: Richard Weinberger <richard@nod.at>
Reported-by: Marc Zyngier <maz@misterjones.org>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While it's at its least, make a number of boring nitpicky cleanups to
shmem.c, mostly for consistency of variable naming. Things like "swap"
instead of "entry", "pgoff_t index" instead of "unsigned long idx".
And since everything else here is prefixed "shmem_", better change
init_tmpfs() to shmem_init().
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kdump (2nd) kernel sometimes hangs because of a pending IPI from
the 1st kernel: a kernel panic occurs because the IPI arrives before
call_single_queue is initialized.
To fix the crash, rename init_call_single_data() to call_function_init()
and call it in start_kernel() so that call_single_queue can be
initialized before enabling interrupts.
The details of the crash are:
 (1) 2nd kernel boots up
 (2) A pending IPI from 1st kernel comes when irqs are first enabled
     in start_kernel().
 (3) Kernel tries to handle the interrupt, but call_single_queue
     is not initialized yet at this point. As a result, in the
     generic_smp_call_function_single_interrupt(), NULL pointer
     dereference occurs when list_replace_init() tries to access
     &q->list.next.
Therefore this patch changes the name of init_call_single_data()
to call_function_init() and calls it before local_irq_enable()
in start_kernel().
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Milton Miller <miltonm@bga.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner reports that we now have a boot crash triggered by
CONFIG_CPUMASK_OFFSTACK=y:
  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<c11ae035>] find_next_bit+0x55/0xb0
  Call Trace:
   [<c11addda>] cpumask_any_but+0x2a/0x70
   [<c102396b>] flush_tlb_mm+0x2b/0x80
   [<c1022705>] pud_populate+0x35/0x50
   [<c10227ba>] pgd_alloc+0x9a/0xf0
   [<c103a3fc>] mm_init+0xec/0x120
   [<c103a7a3>] mm_alloc+0x53/0xd0
which was introduced by commit de03c72cfc ("mm: convert
mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
mm_init() vs mm_init_cpumask().
Thomas wrote a patch to just fix the ordering of initialization, but I
hate the new double allocation in the fork path, so I ended up instead
doing some more radical surgery to clean it all up.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On larger systems, because of the numerous ACPI, Bootmem and EFI messages,
the static log buffer overflows before the larger one specified by the
log_buf_len param is allocated. Minimize the overflow by allocating the
new log buffer as soon as possible.
On kernels without memblock, a later call to setup_log_buf from
init/main.c is the fallback.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_PRINTK=n build]
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cpumask_t is a very big struct and cpu_vm_mask is placed in the wrong
position, which might reduce the cache hit ratio.
This patch makes two changes:
1) Move the cpumask to the end of mm_struct, because usually only the
   front bits of the cpumask are accessed when the system has
   cpu-hotplug capability.
2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce the
   memory footprint if cpumask_size() uses nr_cpumask_bits properly in
   the future.
In addition, this patch renames cpu_vm_mask to cpu_vm_mask_var, which
may help to detect out-of-tree cpu_vm_mask users.
This patch makes no functional change.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kmemleak frees objects via RCU and when CONFIG_DEBUG_OBJECTS_RCU_HEAD
is enabled, the RCU callback triggers a call to free_object() in
lib/debugobjects.c. Since kmemleak is initialised before debug objects
initialisation, it may result in a kernel panic during booting. This
patch moves the kmemleak_init() call after debug_objects_mem_init().
Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@kernel.org>
This patchset is a cleanup and a preparation to unshare the pid namespace.
These prerequisites prepare for Eric's patchset to give a file descriptor
to a namespace and join an existing namespace.
This patch:
It turns out that the existing assignment of child_reaper in
copy_process can also handle the initial assignment of child_reaper; we
just need to generalize the test in kernel/fork.c.
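One way such a generalized test could look is sketched below in
user-space C. Everything here is an assumption for illustration: the
*_sketch types and functions are hypothetical, and the condition used
(the new task is number 1 in its pid namespace) is a guess at what a
general test might be, not a quote from kernel/fork.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task_sketch;

/* Hypothetical pid namespace: only tracks who reaps orphans. */
struct pid_ns_sketch {
	struct task_sketch *child_reaper;
};

/* Hypothetical task: its number inside its own namespace. */
struct task_sketch {
	int nr_in_ns;
	struct pid_ns_sketch *ns;
};

/* Assumed general test: the task numbered 1 in a namespace is its reaper. */
static bool is_child_reaper_sketch(const struct task_sketch *t)
{
	return t->nr_in_ns == 1;
}

/*
 * Model of the fork path: the same test covers the very first process
 * and the first process of any later namespace, so no special case is
 * needed elsewhere.
 */
static void copy_process_sketch(struct task_sketch *child,
				struct pid_ns_sketch *ns, int nr)
{
	assert(ns != NULL);
	child->ns = ns;
	child->nr_in_ns = nr;
	if (is_child_reaper_sketch(child))
		ns->child_reaper = child;
}
```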
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During early boot, local IRQs are disabled until the IRQ subsystem is
properly initialized. During this time, no one should enable
local IRQs, and some operations which usually are not allowed with
IRQs disabled, e.g. operations which might sleep or require
communications with other processors, are allowed.
lockdep tracked this with early_boot_irqs_off/on() callbacks.
As other subsystems need this information too, move it to
init/main.c and make it generally available. While at it,
toggle the boolean to early_boot_irqs_disabled instead of
enabled so that it can be initialized with %false and %true
indicates the exceptional condition.
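A minimal user-space sketch of how such a flag might be consumed is
shown below. Only the name early_boot_irqs_disabled comes from the text
above; the *_sketch functions and the irqs_enabled_sketch variable are
hypothetical and merely model the rules described.

```c
#include <assert.h>
#include <stdbool.h>

/* Zero-initialized to false; true marks the exceptional early-boot window. */
static bool early_boot_irqs_disabled;

/* Hypothetical model of the local IRQ-enable state. */
static bool irqs_enabled_sketch;

/* Nobody may enable IRQs while the early-boot window is open. */
static void local_irq_enable_sketch(void)
{
	assert(!early_boot_irqs_disabled);
	irqs_enabled_sketch = true;
}

/*
 * Operations that may sleep are normally illegal with IRQs off, but
 * are tolerated during early boot.
 */
static bool may_sleep_sketch(void)
{
	return irqs_enabled_sketch || early_boot_irqs_disabled;
}

/* Model of the boot path; returns whether sleeping was allowed early. */
static bool start_kernel_sketch(void)
{
	bool allowed_early;

	early_boot_irqs_disabled = true;
	allowed_early = may_sleep_sketch();	/* early init would run here */
	early_boot_irqs_disabled = false;
	local_irq_enable_sketch();
	return allowed_early;
}
```

Because the flag names the exceptional state, a plain static definition
already carries the right default and no initializer is needed.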
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The call to flush_scheduled_work() in do_initcalls() is there to make
sure all works queued to system_wq by initcalls finish before the init
sections are dropped.
However, the call doesn't make much sense at this point - there
already are multiple different workqueues and different subsystems are
free to create and use their own. Ordering requirements are and
should be expressed explicitly.
Drop the call to prepare for the deprecation and removal of
flush_scheduled_work().
Andrew suggested adding a sanity check where the workqueue code checks
whether any pending or running work has the work function in the init
text section. However, checking this for running works requires the
worker to keep track of the current function being executed, and
checking only the pending works will miss most cases. As a violation
will almost always be caught by the usual page fault mechanism, I
don't think it would be worthwhile to make the workqueue code track
extra state just for this.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
perf_event_init() wants to start using IDR trees, whose needs are in
turn satisfied by mm_init().
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222056.206992649@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently we call perf_event_init() from sched_init(). To make the
call more obvious, move it to the canonical location.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222056.093629821@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The perf hardware pmu got initialized at various points in the boot,
some before early_initcall(), some after (notably arch_initcall).
The problem is that the NMI lockup detector is run from early_initcall()
and expects the hardware pmu to be present.
Sanitize this by moving all architecture hardware pmu implementations to
initialize at early_initcall() and move the lockup detector to an explicit
initcall right after that.
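The ordering dependency can be modelled in user-space C as below. This
is a sketch only: all *_sketch names are hypothetical, and the array of
function pointers is a simplified stand-in for an initcall level, not
the kernel's actual initcall machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool pmu_ready;
static bool lockup_detector_ready;

/* Stand-in for an architecture's hardware pmu init. */
static int pmu_init_sketch(void)
{
	pmu_ready = true;
	return 0;
}

/* The detector depends on the pmu and fails if it is not present yet. */
static int lockup_detector_init_sketch(void)
{
	if (!pmu_ready)
		return -1;
	lockup_detector_ready = true;
	return 0;
}

typedef int (*initcall_sketch_t)(void);

/* Run one level of init functions in order; return the failure count. */
static int run_initcalls_sketch(const initcall_sketch_t *calls, size_t n)
{
	int failures = 0;

	assert(calls != NULL);
	for (size_t i = 0; i < n; i++)
		if (calls[i]() != 0)
			failures++;
	return failures;
}

/* All pmu inits run in the early level; the detector follows explicitly. */
static int boot_sketch(void)
{
	static const initcall_sketch_t early[] = { pmu_init_sketch };
	int failures = run_initcalls_sketch(early, sizeof(early) / sizeof(early[0]));

	if (lockup_detector_init_sketch() != 0)
		failures++;
	return failures;
}
```

Calling the detector explicitly after the early level, instead of
registering it in the same level, removes any reliance on the order of
functions within that level.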
Cc: paulus <paulus@samba.org>
Cc: davem <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290707759.2145.119.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.
Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>