The following commit broke CONFIG_RELOCATABLE support on FSL Book-E
parts:
commit 549e8152de
Author: Paul Mackerras <paulus@samba.org>
Date: Sat Aug 30 11:43:47 2008 +1000
powerpc: Make the 64-bit kernel as a position-independent executable
The change to __va and __pa to use PAGE_OFFSET & MEMORY_START causes
problems on the Book-E parts because we don't know MEMORY_START until
after we parse the device tree. We need __va to work properly to even
parse the device tree so we have a chicken an egg. So go back to using
he other definition of __va/__pa on CONFIG_BOOKE and use the
PAGE_OFFSET/MEMORY_START version on "Classic" PPC64.
Also updated casts to handle phys_addr_t being a different size from
unsigned long (ie 36-bit physical on PPC32).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Currently each available hugepage size uses a slightly different
pagetable layout: that is, the bottem level table of pointers to
hugepages is a different size, and may branch off from the normal page
tables at a different level. Every hugepage aware path that needs to
walk the pagetables must therefore look up the hugepage size from the
slice info first, and work out the correct way to walk the pagetables
accordingly. Future hardware is likely to add more possible hugepage
sizes, more layout options and more mess.
This patch, therefore reworks the handling of hugepage pagetables to
reduce this complexity. In the new scheme, instead of having to
consult the slice mask, pagetable walking code can check a flag in the
PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
and the entry also contains the information (eseentially hugepage
shift) necessary to then interpret that table without recourse to the
slice mask. This scheme can be extended neatly to handle multiple
levels of self-describing "special" hugepage pagetables, although for
now we assume only one level exists.
This approach means that only the pagetable allocation path needs to
know how the pagetables should be set out. All other (hugepage)
pagetable walking paths can just interpret the structure as they go.
There already was a flag bit in PGD/PUD/PMD entries for hugepage
directory pointers, but it was only used for debug. We alter that
flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
pointer (normally it would be 1 since the pointer lies in the linear
mapping). This means that asm pagetable walking can test for (and
punt on) hugepage pointers with the same test that checks for
unpopulated page directory entries (beq becomes bge), since hugepage
pointers will always be positive, and normal pointers always negative.
While we're at it, we get rid of the confusing (and grep defeating)
#defining of hugepte_shift to be the same thing as mmu_huge_psizes.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the PTE and pgtable format definitions, along with changes
to the kernel memory map and other definitions related to implementing
support for 64-bit Book3E. This also shields some asm-offset bits that
are currently only relevant on 32-bit
We also move the definition of the "linux" page size constants to
the common mmu.h file and add a few sizes that are relevant to
embedded processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Adds support for the "unused" page hint which can be used in shared
memory partitions to flag pages not in use, which will then be stolen
before active pages by the hypervisor when memory needs to be moved to
LPARs in need of additional memory. Failure to mark pages as 'unused'
makes the LPAR slower to give up unused memory to other partitions.
This adds the kernel parameter 'cmo_free_hint' to disable this
functionality.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for 256KB pages on ppc44x-based boards.
For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).
Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the cases of on-stack arrays, which size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).
With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
occupied by PKMAP addresses leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
value of 10 in support for 16K/64K had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.
Because ELF standard supports only page sizes up to 64K, then you should
use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications, which are to be run with the 256KB-page sized
kernel. If using the older binutils, then you should patch them like follows:
--- binutils/bfd/elf32-ppc.c.orig
+++ binutils/bfd/elf32-ppc.c
-#define ELF_MAXPAGESIZE 0x10000
+#define ELF_MAXPAGESIZE 0x40000
One more restriction we currently have with 256KB page sizes is inability
to use shmem safely, so, for now, the 256KB is available only if you turn
the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
dependency in 'config PPC_256K_PAGES', and use the workaround available here:
http://lkml.org/lkml/2008/12/19/20
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds support for 16k and 64k page sizes on PowerPC 44x processors.
The PGDIR table is much smaller than a page when using 16k or 64k
pages (512 and 32 bytes respectively) so we allocate the PGDIR with
kzalloc() instead of __get_free_pages().
One PTE table covers rather a large memory area when using 16k or 64k
pages (32MB or 512MB respectively), so we can easily put FIXMAP and
PKMAP in the area covered by one PTE table.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vladimir Panfilov <pvr@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There are two issues when we enable CONFIG_RELOCATABLE. The first is due
to the fact that phys_addr_t is now defined in linux/types.h. The second
is due to the fact that the DMA code changes expose memstart_addr to
prom_init.c
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Today's linux-next build (powerpc allyesconfig) failed like this:
In file included from arch/powerpc/include/asm/mmu-hash64.h:17,
from arch/powerpc/include/asm/mmu.h:8,
from arch/powerpc/include/asm/pgtable.h:8,
from arch/powerpc/mm/slb.c:20:
arch/powerpc/include/asm/page.h:76: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'memstart_addr'
arch/powerpc/include/asm/page.h:77: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'kernstart_addr'
Caused by commit 600715dcdf ("generic: add
phys_addr_t for holding physical addresses") from the tip-core tree.
This only fails if CONFIG_RELOCATABLE is set.
So include that instead of asm/types.h in asm/page.h for
the CONFIG_RELOCATABLE case.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: ppc-dev <linuxppc-dev@ozlabs.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
a position-independent executable (PIE) when it is set. This involves
processing the dynamic relocations in the image in the early stages of
booting, even if the kernel is being run at the address it is linked at,
since the linker does not necessarily fill in words in the image for
which there are dynamic relocations. (In fact the linker does fill in
such words for 64-bit executables, though not for 32-bit executables,
so in principle we could avoid calling relocate() entirely when we're
running a 64-bit kernel at the linked address.)
The dynamic relocations are processed by a new function relocate(addr),
where the addr parameter is the virtual address where the image will be
run. In fact we call it twice; once before calling prom_init, and again
when starting the main kernel. This means that reloc_offset() returns
0 in prom_init (since it has been relocated to the address it is running
at), which necessitated a few adjustments.
This also changes __va and __pa to use an equivalent definition that is
simpler. With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
constants (for 64-bit) whereas PHYSICAL_START is a variable (and
KERNELBASE ideally should be too, but isn't yet).
With this, relocatable kernels still copy themselves down to physical
address 0 and run there.
Signed-off-by: Paul Mackerras <paulus@samba.org>
from include/asm-powerpc. This is the result of a
mkdir arch/powerpc/include/asm
git mv include/asm-powerpc/* arch/powerpc/include/asm
Followed by a few documentation/comment fixups and a couple of places
where <asm-powepc/...> was being used explicitly. Of the latter only
one was outside the arch code and it is a driver only built for powerpc.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:
u64 val = PAGE_ALIGN(size);
always returns a value < 4GB even if size is greater than 4GB.
The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.
Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.
See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Added support to allow an 85xx kernel to be run from a non-zero physical
address (useful for cooperative asymmetric multiprocessing situations and
kdump). The support can be configured at compile time by setting
CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
desired.
Alternatively, the kernel build can set CONFIG_RELOCATABLE. Setting this
config option causes the kernel to determine at runtime the physical
addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START. If
CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
header physical address field in the resulting ELF image.
Currently we are limited to running at a physical address that is a
multiple of 256M. This is due to how we map TLBs to cover
lowmem. This should be fixed to allow 64M or maybe even 16M alignment
in the future. It is considered an error to try and run a kernel at a
non-aligned physical address.
All the magic for this support is accomplished by proper initialization
of the kernel memory subsystem and use of ARCH_PFN_OFFSET.
The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.
/dev/mem continues to allow access to any physical address in the system
regardless of how CONFIG_PHYSICAL_START is set.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We can set LOAD_OFFSET and use the AT attribute on sections and the
linker will properly set the physical address of the LOAD program
header for us.
This allows us to know how the PHYSICAL_START the user configured a
kernel with by just looking at the resulting vmlinux ELF.
This is pretty much stolen from how x86 does things in their linker
scripts.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Background: I've implemented 1K/2K page tables for s390. These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM. The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste). The pgstes are only required if the process is using the SIE
instruction. The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).
Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t. For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch. For everybody else it will be a (struct page *). The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor. The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed. pmd_populate will get a pgtable_t instead of a struct page pointer.
To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
asm/elf.h, asm/page.h and asm/user.h don't export to userspace now, so we can
drop #ifdef __KERNEL__ for them.
[k.shutemov@gmail.com: remove #ifdef __KERNEL_]
Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com>
Reviewed-by: David Woodhouse <dwmw2@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nathan Lynch <ntl@pobox.com> reported:
2.6.23-rc1 breaks the build for 64-bit powerpc for me (using
maple_defconfig):
LD vmlinux.o
powerpc64-unknown-linux-gnu-ld: dynreloc miscount for
kernel/built-in.o, section .opd
powerpc64-unknown-linux-gnu-ld: can not edit opd Bad value
make: *** [vmlinux.o] Error 1
However, I see a possibly related binutils patch:
http://article.gmane.org/gmane.comp.gnu.binutils/33650
It was tracked down to be caused by the weak prototype
declaration in mm.h:
__attribute__((weak)) const char *arch_vma_name(struct vm_area_struct *vma);
But there is no need to make the declaration weak - only the definition
needs to be marked weak. So drop the weak declaration. And in the process
drop the duplicate definition in page.h for powerpc.
Note: the arch_vma_name fix for x86_64 needs to be applied first to avoid
breaking x86_64
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Nathan Lynch <ntl@pobox.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For 32-bit systems, powerpc still relies on the 4level-fixup.h hack,
to pretend that the generic pagetable handling stuff is 3-levels
rather than 4. This patch removes this, instead using the newer
pgtable-nopmd.h to handle the elision of both the pud and pmd
pagetable levels (ppc32 pagetables are actually 2 levels).
This removes a little extraneous code, and makes it more easily
compared to the 64-bit pagetable code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since we don't have it active by default, the STRICT_MM_TYPECHECKS
option has bitrotted again. This patch fixes a couple of simple build
fixes if the option is selected. First, pud_t mustn't be defined in
page.h on 32-bit systems, because it conflicts with the version in the
generic pud-folding code. Second, pci_32.c is missing a __pgprot()
wrapper call. Third, a couple of PS3 files use constants of type
pgprot_t when they need the raw values, we add pgprot_val() calls to
fix this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This looks like cruft to me, these functions don't exist AFAICT,
and I can't see that it's possible to even enable DISCONTIGMEM on
powerpc anymore. CC'ing some folks who might know better, based on
the who-touched-it-last principle.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch cleans up some locking & error handling in the ppc vdso and
moves the vdso base pointer from the thread struct to the mm context
where it more logically belongs. It brings the powerpc implementation
closer to Ingo's new x86 one and also adds an arch_vma_name() function
allowing to print [vsdo] in /proc/<pid>/maps if Ingo's x86 vdso patch is
also applied.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to know the base address of the kdump kernel even when we're not a
kdump kernel, so add a #define for it. Move the logic that sets the kdump
kernelbase into kdump.h instead of page.h.
Rename kdump_setup() to setup_kdump_trampoline() to make it clearer what it's
doing, and add an empty definition for the !CRASH_DUMP case to avoid a
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
PowerPC can use generic ones.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PPC32 is still using asm-generic/4level-fixup.h, but asm-powerpc/page.h
was defining pud_t and pgd_t. Depending on the order in which files
got included, this could result in a compilation error. Tweak the ifdef
so that page.h doesn't try to define pud_t on ppc32 (which uses 2-level
page tables).
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a Kconfig variable, CONFIG_CRASH_DUMP, which configures the
built kernel for use as a Kdump kernel.
Currently "all" this involves is changing the value of KERNELBASE to 32 MB.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch separates usage of KERNELBASE and PAGE_OFFSET. I haven't
looked at any of the PPC32 code, if we ever want to support Kdump on
PPC we'll have to do another audit, ditto for iSeries.
This patch makes PAGE_OFFSET the constant, it'll always be 0xC * 1
gazillion for 64-bit.
To get a physical address from a virtual one you subtract PAGE_OFFSET,
_not_ KERNELBASE.
KERNELBASE is the virtual address of the start of the kernel, it's
often the same as PAGE_OFFSET, but _might not be_.
If you want to know something's offset from the start of the kernel
you should subtract KERNELBASE.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There's a bunch of code that compares an address with KERNELBASE to see if
it's a "kernel address", ie. >= KERNELBASE. The proper test is actually to
compare with PAGE_OFFSET, since we're going to change KERNELBASE soon.
So replace all of them with an is_kernel_addr() macro that does that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Merge asm-ppc/page.h and asm-ppc64/page.h into asm-powerpc/page.h,
asm-powerpc/page_32.h and asm-powerpc/page_64.h
Built for PPC (common_defconfig), with ARCH=powerpc, mostly built with
ARCH=ppc (other things break the build). Built and booted on P5 LPAR
for PPC64 with ARCH=ppc/powerpc (pseries_defconfig). Mostly built for
iSeries powerpc.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>