The pda rework (commit 3461b0af02)
to remove static boot cpu pdas introduced a performance bug.
_boot_cpu_pda is the actual pda used by the boot cpu and is definitely
not "__read_mostly" and ended up polluting the read mostly section with
writes. This bug caused regression of about 8-10% on certain syscall
intensive workloads.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Acked-by: Mike Travis <travis@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix trampoline sizing bug, save space
While debugging a suspend-to-RAM related issue it occured to me that
if the trampoline code had grown past 4 KB, we would have been
allocating too little memory for it, since the 4 KB size of the
trampoline is hardcoded into arch/x86/kernel/e820.c . Change that
by making the kernel compute the trampoline size and allocate as much
memory as necessary.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Plus add a build time check so this doesn't go unnoticed again.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As a stopgap until Mike Travis's x86-64 gs-based percpu patches are
ready, provide workaround functions for x86_read/write_percpu for
Xen's use.
Specifically, this means that we can't really make use of vcpu
placement, because we can't use a single gs-based memory access to get
to vcpu fields. So disable all that for now.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ying Huang would like setup_data to be reserved, but not included in the
no save range.
Here we try to modify the e820 table to reserve that range early.
also add that in early_res in case bootloader messes up with the ramdisk.
other solution would be
1. add early_res_to_highmem...
2. early_res_to_e820...
but they could reserve another type memory wrongly, if early_res has some
resource reserved early, and not needed later, but it is not removed from
early_res in time. Like the RAMDISK (already handled).
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: andi@firstfloor.org
Tested-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Split x86_64_start_kernel() into two pieces:
The first essentially cleans up after head_64.S. It clears the
bss, zaps low identity mappings, sets up some early exception
handlers.
The second part preserves the boot data, reserves the kernel's
text/data/bss, pagetables and ramdisk, and then starts the kernel
proper.
This split is so that Xen can call the second part to do the set up it
needs done. It doesn't need any of the first part setups, because it
doesn't boot via head_64.S, and its redundant or actively damaging.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Remove the boot_cpu_pda array and pointer table from the data section.
Allocate the pointer table and array during init. do_boot_cpu()
will reallocate the pda in node local memory and if the cpu is being
brought up before the bootmem array is released (after_bootmem = 0),
then it will free the initial pda. This will happen for all cpus
present at system startup.
This removes 512k + 32k bytes from the data section.
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds linked list of struct setup_data supported for i386.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch extracts the common part of head32.c and head64.c into head.c.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ramdisk is reserved via reserve_early in x86_64_start_kernel,
later early_res_to_bootmem() will convert to reservation in bootmem.
so don't need to reserve that again.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
All of early setup runs with interrupts disabled, so there is no
need to set up early exception handlers for vectors >= 32
This saves some minor text size.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge pointed out that looking at the boot_params
struct to determine if the system is running in a paravirtual
environment is not reliable for the Xen case, currently. He also
points out that there already exists a function to determine if
the system is running in a paravirtual environment. So let's use
that instead. This gets rid of the preprocessor test too.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is an add-on to the 64-bit ebda patch. It makes
the functions reserve_ebda_region (renamed from reserve_ebda)
and copy_e820_map equal to the 32-bit versions of the previous
patch.
Changes:
Use u64 and u32 for local variables in copy_e820_map.
The amount of conventional memory and the start of the EBDA are
detected by reading the BIOS data area directly. Paravirtual
environments do not provide this area, so we bail out early
in that case. They will just have to set up a correct memory
map to start with.
Add a safety net for zeroed out BIOS data area.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Explicitly reserve_early the whole address range from the end of
conventional memory as reported by the bios data area up to the
1Mb mark. Regard the info retrieved from the BIOS data area with
a bit of paranoia, though, because some biosses forget to register
the EBDA correctly.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting
from __START_KERNEL_map. The kernel itself only needs _text to _end
mapped in the high alias. On relocatible kernels the ASM setup code
adjusts the compile time created high mappings to the relocation. This
creates invalid pmd entries for negative offsets:
0xffffffff80000000 -> pmd entry: ffffffffff2001e3
It points outside of the physical address space and is marked present.
This starts at the virtual address __START_KERNEL_map and goes up to
the point where the first valid physical address (0x0) is mapped.
Zap the mappings before _text and after _end right away in early
boot. This removes also the invalid entries.
Furthermore it simplifies the range check for high aliases.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86-64 there are several memory allocations before bootmem. To avoid
them stomping on each other they used to be all hard coded in bad_area().
Replace this with an array that is filled as needed.
This cleans up the code considerably and allows to expand its use.
Cc: peterz@infradead.org
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's not too pretty, but I found this made the "PANIC: early exception"
messages become much more reliably useful: 1. print the vector number,
2. print the %cs value, 3. handle error-code-pushing vs non-pushing vectors.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In init/main.c boot_cpu_init() does that later.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
start_kernel is already declared in a generic header file.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch provides a new set of functions for managing the descriptor
tables that can be used instead of putting the raw assembly in .c files.
Remodeling of store_tr() suggested by Frederik Deweerdt.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since the x86 merge, lots of files that referenced their own filenames
are no longer correct. Rather than keep them up to date, just delete
them, as they add no real value.
Additionally:
- fix up comment formatting in scx200_32.c
- Remove a credit from myself in setup_64.c from a time when we had no SCM
- remove longwinded history from tsc_32.c which can be figured out from
git.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o x86_64 kernel needs to be compiled for 2MB aligned addresses. Currently
we are using BUILD_BUG_ON() to warn the user if he has not done so. But
looks like folks are not finding message very intutive and don't open
the respective c file to find problem source. (Bug 8439)
arch/x86_64/kernel/head64.c: In function 'x86_64_start_kernel':
arch/x86_64/kernel/head64.c:70: error: size of array 'type name' is negative
o Using preprocessor directive #error to print a better message if
CONFIG_PHYSICAL_START is not aligned to 2MB boundary.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o X86_64 kernel should run from 2MB aligned address for two reasons.
- Performance.
- For relocatable kernels, page tables are updated based on difference
between compile time address and load time physical address.
This difference should be multiple of 2MB as kernel text and data
is mapped using 2MB pages and PMD should be pointing to a 2MB
aligned address. Life is simpler if both compile time and load time
kernel addresses are 2MB aligned.
o Flag the error at compile time if one is trying to build a kernel which
does not meet alignment restrictions.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With the rewrite of the SMP trampoline and the early page
allocator there is nothing that needs identity mapped pages,
once we start executing C code.
So add zap_identity_mappings into head64.c and remove
zap_low_mappings() from much later in the code. The functions
are subtly different thus the name change.
This also kills boot_level4_pgt which was from an earlier
attempt to move the identity mappings as early as possible,
and is now no longer needed. Essentially I have replaced
boot_level4_pgt with trampoline_level4_pgt in trampoline.S
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Use virtual addresses instead of physical addresses
in copy bootdata. In addition fix the implementation
of the old bootloader convention. Everything is
at real_mode_data always. It is just that sometimes
real_mode_data was relocated by setup.S to not sit at
0x90000.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Remove in-source externs, linux/init.h is included in all cases.
This is a fixups for "Dynamic kernel command-line" patch.
It also includes some uml __init fixups so that we can __initdata also its
command_line.
Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
idt_table is in the .bss section, so clear_bss need to called at first
Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
I think it was only needed for the printks and we can do them later.
I put in a single early_printk so that we know the kernel is alive
(early_printk doesn't need any locks)
This makes some things easier for initialization of unwind for
lockdep, which is needed by later patches.
cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
Instead of hackish manual parsing
Requires earlier i386 patchkit, but also fixes i386 early_printk again.
I removed some obsolete really early parameters which didn't do anything useful.
Also made a few parameters that needed it early (mostly oops printing setup)
Also removed one panic check that wasn't visible without
early console anyways (the early console is now initialized after that
panic)
This cleans up a lot of code.
Signed-off-by: Andi Kleen <ak@suse.de>
This makes it possible to modify CPU flags in command line
options without hacks.
And remove another copy in head64.c
Signed-off-by: Andi Kleen <ak@suse.de>
IO-APIC or local APIC can only be disabled at runtime anyways and
Kconfig has forced these options on for a long time now.
The Kconfigs are kept only now for the benefit of the shared acpi
boot.c code.
Signed-off-by: Andi Kleen <ak@suse.de>
x86_64 uses spinlocks very early - earlier than start_kernel(). So call
lockdep_init() from the arch setup code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Early development of x86-64 Linux was in CVS, but that hasn't been
the case for a long time now. Remove the obsolete $Id$s.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The explicit and implicit calls to setup_early_printk() were passing
inconsistent arguments.
Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch uses a static PDA array early at boot and reallocates processor PDA
with node local memory when kmalloc is ready, just before pda_init.
The boot_cpu_pda is needed since the cpu_pda is used even before pda_init for
that cpu is called (to set the static per-cpu areas offset table etc)
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adding __initdata_* to asm-generic/sections.h
Replaces a lot of open coded externs in arch/x86_64/*
I had to change __bss_end to __bss_stop to match the other architectures.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should zap the low mappings, as soon as possible, so that we can catch
kernel bugs more effectively. Previously early boot had NULL mapped
and didn't trap on NULL references.
This patch introduces boot_level4_pgt, which will always have low identity
addresses mapped. Druing boot, all the processors will use this as their
level4 pgt. On BP, we will switch to init_level4_pgt as soon as we enter C
code and zap the low mappings as soon as we are done with the usage of
identity low mapped addresses. On AP's we will zap the low mappings as
soon as we jump to C code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>