Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.
That came from 9fc34113f6 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.
And revert that cded932b75 x86: fix pmd_bad
and pud_bad to support huge pages. It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.
Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.
Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests. Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.
However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page? get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All pagetables need fundamentally the same setup and destruction, so
just use the same code for everything.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
These new controls toggle experimental support for a new CPU feature,
the straightforward extension of largepages from the pmd level to the
pud level, which allows 1GB (kernel) TLBs instead of 2MB TLBs.
Turn it off by default, as this code has not been tested well enough yet.
Use the CONFIG_DIRECT_GBPAGES=y .config option or gbpages on the
boot line can be used to enable it. If enabled in the .config then
nogbpages boot option disables it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
people sometimes do crazy stuff like building really large static
arrays into their kernels or building allyesconfig kernels. Give
more space to the kernel and push modules up a bit: kernel has
512 MB and modules have 1.5 GB.
Should be enough for a few years ;-)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit cded932b75.
Arjan bisected down a boot-time hang to this, saying:
".. it prevents the kernel to finish booting on my (Penryn based)
laptop. The boot stops right after freeing the init memory."
and while it's not clear exactly what triggers it, at this stage we're
better off just reverting it while Ingo tries to figure out what went
wrong.
Requested-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Nish Aravamudan <nish.aravamudan@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
revert commit cded932b75,
"x86: fix pmd_bad and pud_bad to support huge pages", it causes
a bootup hang, as reported and bisected by Arjan van de Ven.
Bisected-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I recently stumbled upon a problem in the support for huge pages. If a
program using huge pages does not explicitly unmap them, they remain
mapped (and therefore, are lost) after the program exits.
I observed that the free huge page count in /proc/meminfo decreased when
running my program, and it did not increase after the program exited.
After running the program a few times, no more huge pages could be
allocated.
The reason for this seems to be that the x86 pmd_bad and pud_bad
consider pmd/pud entries having the PSE bit set invalid. I think there
is nothing wrong with this bit being set, it just indicates that the
lowest level of translation has been reached. This bit has to be (and
is) checked after the basic validity of the entry has been checked, like
in this fragment from follow_page() in mm/memory.c:
if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
goto no_page_table;
if (pmd_huge(*pmd)) {
BUG_ON(flags & FOLL_GET);
page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
goto out;
}
Note that this code currently doesn't work as intended if the pmd refers
to a huge page, the pmd_huge() check can not be reached if the page is
huge.
Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to
allow for the PSE bit being set fixes this. For similar reasons,
allowing the NX bit being set is necessary, too. I have seen huge pages
having the NX bit set in their pmd entry, which would cause the same
problem.
Signed-Off-By: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to have it at all levels, add pgd_large() which only
returns 0.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting
from __START_KERNEL_map. The kernel itself only needs _text to _end
mapped in the high alias. On relocatible kernels the ASM setup code
adjusts the compile time created high mappings to the relocation. This
creates invalid pmd entries for negative offsets:
0xffffffff80000000 -> pmd entry: ffffffffff2001e3
It points outside of the physical address space and is marked present.
This starts at the virtual address __START_KERNEL_map and goes up to
the point where the first valid physical address (0x0) is mapped.
Zap the mappings before _text and after _end right away in early
boot. This removes also the invalid entries.
Furthermore it simplifies the range check for high aliases.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use a standard list threaded through page->lru for maintaining the pgd
list on PAE. This is the same as 64-bit, and seems saner than using a
non-standard list via page->index.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
do some leftover cleanups in the now unified arch/x86/mm/pageattr.c
file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
64bit already had it.
Needed for later patches.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
based on this patch from Andi Kleen:
| Subject: CPA: Return the page table level in lookup_address()
| From: Andi Kleen <ak@suse.de>
|
| Needed for the next change.
|
| And change all the callers.
and ported it to x86.git.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Put all the defines for mapping pagetable operations to their native
versions (for the non-paravirt case) into one place. Make the
corresponding changes to paravirt.h.
The tricky part here is that when a pagetable entry can't be updated
atomically (ie, 32-bit PAE), we need special handlers for pte_clear,
set_pte_atomic and set_pte_present. However, the other two modes
don't need special handling for these, and can use a common
set_pte(_at) path.
[ mingo@elte.hu: fixes ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move ZERO_PAGE/empty_zero_page to common place.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make various pte accessors common.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make sure pte_t, whatever its definition, has a pte element with type
pteval_t. This allows common code to access it without needing to be
specifically parameterised on what pagetable mode we're compiling for.
For 32-bit, this means that pte_t becomes a union with "pte" and "{
pte_low, pte_high }" (PAE) or just "pte_low" (non-PAE).
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make users of supported_pte_mask common. This has the side-effect of
introducing the variable for 32-bit non-PAE, but I think its a pretty
small cost to simplify the code.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Unify functions to test and set bits in pagetable entries.
NOP: only moves existing code around, without any change to it.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
based on:
Subject: x86/pgtable: unify pagetable accessors
From: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
32 and 64-bit use the same flags for pagetable entries, so make them all common.
[ mingo@elte.hu: fixes ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
add PWT bit to NOCACHE flags. No real difference to CPUs, but needed
later for PAT.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
based on:
Subject: x86: page.h: move and unify types for pagetable entry
From: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It only has a single use, which can be trivially replaced.
Signed-off-by: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch does some whitespace cleanups in the paging code to fix some
checkpatch.pl warnings of my formerly merged cleanup patches.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries
onto a random address (in cases in which mmap() is allowed to perform a
randomization).
The code has been extraced from Ingo's exec-shield patch
http://people.redhat.com/mingo/exec-shield/
[akpm@linux-foundation.org: fix used-uninitialsied warning]
[kamezawa.hiroyu@jp.fujitsu.com: fixed ia32 ELF on x86_64 handling]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch defines the _PAGE_* paging attributes in pgtable_64.h in terms of
the former defined _PAGE_BIT_* values.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
remove asm/bitops.h includes
including asm/bitops directly may cause compile errors. don't include it
and include linux/bitops instead. next patch will deny including asm header
directly.
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86_64 uses 2M page table entries to map its 1-1 kernel space. We also
implement the virtual memmap using 2M page table entries. So there is no
additional runtime overhead over FLATMEM, initialisation is slightly more
complex. As FLATMEM still references memory to obtain the mem_map pointer and
SPARSEMEM_VMEMMAP uses a compile time constant, SPARSEMEM_VMEMMAP should be
superior.
With this SPARSEMEM becomes the most efficient way of handling virt_to_page,
pfn_to_page and friends for UP, SMP and NUMA on x86_64.
[apw@shadowen.org: code resplit, style fixups]
[apw@shadowen.org: vmemmap x86_64: ensure end of section memmap is initialised]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the headers to include/asm-x86 and fixup the
header install make rules
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 34feb2c83b.
Suresh Siddha points out that this one breaks the fundamental
requirement that you cannot free page table pages before the TLB caches
are flushed. The quicklists do not give the same kinds of guarantees
that the mmu_gather structure does, at least not in NUMA configurations.
Requested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reenable kprobes and alternative patching when the kernel text is write
protected by DEBUG_RODATA
Add a general utility function to change write protected text. The new
function remaps the code using vmap to write it and takes care of CPU
synchronization. It also does CLFLUSH to make icache recovery faster.
There are some limitations on when the function can be used, see the
comment.
This is a newer version that also changes the paravirt_ops code.
text_poke also supports multi byte patching now.
Contains bug fixes from Zach Amsden and suggestions from Mathieu
Desnoyers.
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds caching of pgds and puds, pmds, pte. That way we can avoid costly
zeroing and initialization of special mappings in the pgd.
A second quicklist is useful to separate out PGD handling. We can carry the
initialized pgds over to the next process needing them.
Also clean up the pgd_list handling to use regular list macros. There is no
need anymore to avoid the lru field.
Move the add/removal of the pgds to the pgdlist into the constructor /
destructor. That way the implementation is congruent with i386.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nobody is using ptep_test_and_clear_dirty and ptep_clear_flush_dirty. Remove
the functions from all architectures.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kill pte_rdprotect(), pte_exprotect(), pte_mkread(), pte_mkexec(), pte_read(),
pte_exec(), and pte_user() except where arch-specific code is making use of
them.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some changes done a while ago to avoid pounding on ptep_set_access_flags and
update_mmu_cache in some race situations break sun4c which requires
update_mmu_cache() to always be called on minor faults.
This patch reworks ptep_set_access_flags() semantics, implementations and
callers so that it's now responsible for returning whether an update is
necessary or not (basically whether the PTE actually changed). This allow
fixing the sparc implementation to always return 1 on sun4c.
[akpm@linux-foundation.org: fixes, cleanups]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Miller <davem@davemloft.net>
Cc: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eliminate 19439 (!!) sparse warnings like:
include/linux/mm.h:321:22: warning: constant 0xffff810000000000 is so big it is unsigned long
Eliminate 56 sparse warnings like:
arch/x86_64/kernel/setup.c:248:16: warning: constant 0xffffffff80000000 is so big it is unsigned long
Eliminate 5 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0xfffffffffff00000 is so big it is unsigned long
Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:551:37: warning: constant 0xffffc20000000000 is so big it is unsigned long
Eliminate 6 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0xffffffff88000000 is so big it is unsigned long
Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:552:6: warning: constant 0xffffe1ffffffffff is so big it is unsigned long
Eliminate 3 sparse warnings like:
arch/x86_64/kernel/e820.c:186:17: warning: constant 0x3fffffffffff is so big it is long
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make a global linux/const.h header file instead of having multiple,
per-arch files, and convert current users of asm/const.h to use
linux/const.h.
Built on x86_64 and sparc64.
[akpm@linux-foundation.org: fix include/asm-x86_64/Kbuild]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most architectures defined three macros, MK_IOSPACE_PFN(), GET_IOSPACE()
and GET_PFN() in pgtable.h. However, the only callers of any of these
macros are in Sparc specific code, either in arch/sparc, arch/sparc64 or
drivers/sbus.
This patch removes the redundant macros from all architectures except
sparc and sparc64.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)
arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was broken. It adds complexity, for no good reason. Rather than
separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(),
and preferably __pa() too - and just use "virt_to_phys()" instead, which
is more readable and has nicer semantics.
However, right now, just undo the separation, and make __pa_symbol() be
the exact same as __pa(). That fixes the bugs this patch introduced,
and we can do the fairly obvious cleanups later.
Do the new __phys_addr() function (which is now the actual workhorse for
the unified __pa()/__pa_symbol()) as a real external function, that way
all the potential issues with compile/link-time optimizations of
constant symbol addresses go away, and we can also, if we choose to, add
more sanity-checking of the argument.
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86_64 currently simulates a list using the index and private fields of the
page struct. Seems that the code was inherited from i386. But x86_64 does
not use the slab to allocate pgds and pmds etc. So the lru field is not
used by the slab and therefore available.
This patch uses standard list operations on page->lru to realize pgd
tracking.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently __pa_symbol is for use with symbols in the kernel address
map and __pa is for use with pointers into the physical memory map.
But the code is implemented so you can usually interchange the two.
__pa which is much more common can be implemented much more cheaply
if it is it doesn't have to worry about any other kernel address
spaces. This is especially true with a relocatable kernel as
__pa_symbol needs to peform an extra variable read to resolve
the address.
There is a third macro that is added for the vsyscall data
__pa_vsymbol for finding the physical addesses of vsyscall pages.
Most of this patch is simply sorting through the references to
__pa or __pa_symbol and using the proper one. A little of
it is continuing to use a physical address when we have it
instead of recalculating it several times.
swapper_pgd is now NULL. leave_mm now uses init_mm.pgd
and init_mm.pgd is initialized at boot (instead of compile time)
to the physmem virtual mapping of init_level4_pgd. The
physical address changed.
Except for the for EMPTY_ZERO page all of the remaining references
to __pa_symbol appear to be during kernel initialization. So this
should reduce the cost of __pa in the common case, even on a relocated
kernel.
As this is technically a semantic change we need to be on the lookout
for anything I missed. But it works for me (tm).
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
With the rewrite of the SMP trampoline and the early page
allocator there is nothing that needs identity mapped pages,
once we start executing C code.
So add zap_identity_mappings into head64.c and remove
zap_low_mappings() from much later in the code. The functions
are subtly different thus the name change.
This also kills boot_level4_pgt which was from an earlier
attempt to move the identity mappings as early as possible,
and is now no longer needed. Essentially I have replaced
boot_level4_pgt with trampoline_level4_pgt in trampoline.S
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>