Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid. Improves efficiency of
resched_task and some cpu_idle routines.
* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
and as we hold it during resched_task, then there is no need for an
atomic test and set there. The only other time this should be set is
when the task's quantum expires, in the timer interrupt - this is
protected against because the rq lock is irq-safe.
- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
won't get unset until the task get's schedule()d off.
- If we are running on the same CPU as the task we resched, then set
TIF_NEED_RESCHED and no further action is required.
- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
after TIF_NEED_RESCHED has been set, then we need to send an IPI.
Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.
* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
becomes true, explicitly call schedule(). This makes things a bit clearer
(IMO), but haven't updated all architectures yet.
- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
to the resched_task rules, this isn't needed (and actually breaks the
assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
held). So remove that. Generally one less locked memory op when switching
to the idle thread.
- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
most polling idle loops. The above resched_task semantics allow it to be
set until before the last time need_resched() is checked before going into
a halt requiring interrupt wakeup.
Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
can be always left set, completely eliminating resched IPIs when rescheduling
the idle task.
POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Run idle threads with preempt disabled.
Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?
Might fix the CPU hotplugging hang which Nigel Cunningham noted.
We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, then the CPU offlined.
After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep. The CPU will continue executing
previous idle and have no chance to call play_dead.
By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.
From: alexs <ashepard@u.washington.edu>
PPC build fix
From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
MIPS build fix
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ancient ppcdebug/PPCDBG mechanism is now only used in two places.
First, in the hash setup code, one of the bits allows the size of the
hash table to be reduced by a factor of 8 - which would be better
accomplished with a command line option for that purpose. The other
was a bunch of bus walking related messages in the iSeries code, which
would seem to be insufficient reason to keep the mechanism.
This patch removes the last traces of this mechanism.
Built and booted on iSeries and pSeries POWER5 LPAR (ARCH=powerpc).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel
base page size to 64K. The resulting kernel still boots on any
hardware. On current machines with 4K pages support only, the kernel
will maintain 16 "subpages" for each 64K page transparently.
Note that while real 64K capable HW has been tested, the current patch
will not enable it yet as such hardware is not released yet, and I'm
still verifying with the firmware architects the proper to get the
information from the newer hypervisors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These days, the NACA only exists on iSeries. Therefore, this patch
moves naca.h from include/asm-ppc64 to arch/powerpc/platforms/iseries.
There was one file including naca.h outside of platforms/iseries -
arch/ppc64/kernel/udbg_scc.c. However, that's obviously a hangover
from older days. The include is not necessary, so this patch simply
removes it.
Built and booted on iSeries, built for G5 (which uses udbg_scc.o).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
By parsing the command line earlier, we can add the mem= value to the
flattened device tree and let the generic code sort out the memory limit
for us.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
We had a static memory_limit in prom.c, and then another one defined
in setup_64.c and used in numa.c, which resulted in the kernel crashing
when mem=xxx was given on the command line. This puts the declaration
in system.h and the definition in mem.c. This also moves the
definition of tce_alloc_start/end out of setup_64.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
A few things change for consistency between ppc32 and ppc64:
idle functions return void; *_get_boot_time functions return
unsigned long (i.e. time_t) rather than filling in a struct rtc_time
(since that's useful to the callers and easier for pmac to
generate); *_get_rtc_time and *_set_rtc_time functions take
a struct rtc_time; irq_canonicalize is gone; nvram_sync returns
void.
Signed-off-by: Paul Mackerras <paulus@samba.org>
On ARCH=ppc64 we were getting htab_hash_mask recalculated
to the correct value for our particular machine by accident.
In the merge tree, that code was commented out, so htab_hash_mask
was being corrupted.
We now set ppc64_pft_size instead which gets htab_has_mask
calculated correctly for us later. We should put an
ibm,pft-size property in the device tree at some point.
Also set -mno-minimal-toc in some makefiles.
Allow iSeries to configure PROC_DEVICETREE.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
iSeries_setup.c becomes setup.c
iSeries_setup.h becomes setup.h
mf.c retains its name
Also moved iSeries_[gs]et_rtc_time and iSeries_get_boot_time into
mf.c since they are just small wrappers around mf_ functions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Add the /cpus node and nodes for each cpu, as well as cache size properties,
reg propery, "linux,boot-cpu", and timebase/clock frequency.
With those properties in place we can remove:
- setup_iSeries_cache_sizes()
- code in iSeries_setup_arch() to calculate timebase etc.
- iSeries_calibrate_decr()
- smp_iSeries_numProcs() and simplify smp_iSeries_probe()
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Add /chosen/linux,platform to the device tree so we can remove iSeries
specific code in setup_system() to set systemcfg->platform.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
This patch adds the required nodes to the iSeries device tree to allow
early_init_devtree() to do the lmb setup for us.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Misc steps to incorporate the flat device tree on iSeries.
- define iseries_probe()
- call build_iSeries_Memory_Map() earlier
- return __pa() of the flat device tree from iSeries_early_setup()
- actually call early_setup() for iSeries
- add iseries_md to machdep_calls
- build prom.o for iSeries
- enable /proc/device-tree for iSeries
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
This patch adds infrastructure for creating a fake flattened device tree
on iSeries.
We also need to build prom.o for iSeries which means we'll always need it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
early_setup() calls htab_initialize() which is similar, but not identical
to iSeries_bolt_kernel().
On iSeries the Hypervisor has already inserted some ptes for us, and we
simply have to detect that and bolt them. iSeries_hpte_bolt_or_insert()
implements that logic.
For the case of a non-existing pte we just call iSeries_hpte_insert(). This
appears to work, although it's not entirely equivalent to the old code in
iSeries_make_pte() which panicked if we got a secondary slot. Not sure if
that's important.
Finally we call iSeries_hpte_bolt_or_insert() from create_pte_mapping(),
which is called from htab_initialize() for each lmb region.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
In order to call finish_device_tree() on iSeries we need to define
virt_irq_create_mapping(). We also need to set ppc64_interrupt_controller to
something other than zero. If we want to do interrupt setup via the device
tree on iSeries this code will need some serious work, but it's harmless to
have it there as long as the nodes in the iSeries device tree don't cause
it to be invoked.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Move the iSeries machine specific calls into a machdep_calls struct like
other platforms, rather than setting members of ppc_md explicitly.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
This patch moves power4_enable_pmcs() to arch/ppc64/kernel/pmc.c.
I've tested it on P5 LPAR and P4. It does what it used to.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rename the msChunks struct to get rid of the StUdlY caps and make it a bit
clearer what it's for.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Chunks are 256KB, so use constants for the size/shift/mask, rather than
getting them from the msChunks struct. The iSeries debugger (??) might still
need access to the values in the msChunks struct, so we keep them around
for now, but set them from the constant values.
Replace msChunks_entry typedef with regular u32.
Simplify msChunks_alloc() to manipulate klimit directly, rather than via
a parameter.
Move msChunks_alloc() and msChunks into iSeries_setup.c, as that's where
they're used.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes the use of bitfield types from the ppc64 hash table
manipulation code.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Not sure if we really need this, but it was handy to know which iSeries loop I
was testing.
Be consistent about printing which idle loop we're using, with this patch we
cover all cases.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a compile warning introduced by the previous patches.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- remove min/max yield time, we dont use the values anywhere
- separate shared and dedicated idle loops
- check need_resched again with irqs off to avoid sleeping with pending work
- continually set runlatch off in idle loop, this means we dont need to
turn the runlatch off on exception exit and suffer that associated
cost for all exceptions. (A future patch will turn the runlatch on at
exception entry)
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes up iSeries, pSeries, pmac and maple to set the correct idle
function for each platform.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move iSeries_idle() into iSeries_setup.c, no one else needs to know about it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The xItLpQueue is initalised manually in iSeries_setup_arch(). Move
this code into ItLpQueue.c for a cleaner separation.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The only code outside ItLpQueue.c that refers to spread_lpevents is in
set_apread_lpevents(), so move it inside ItLpQueue.c and make spread_lpevents
static.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The iSeries code keeps a pointer to the ItLpQueue in its paca struct. But
all these pointers end up pointing to the one place, ie. xItLpQueue.
So remove the pointer from the paca struct and just refer to xItLpQueue
directly where needed.
The only complication is that the spread_lpevents logic was implemented by
having a NULL lpqueue pointer in the paca on CPUs that weren't supposed to
process events. Instead we just compare the spread_lpevents value to the
processor id to get the same behaviour.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
pSeries and maple have almost the same code for calibrate_decr,
and BPA would need yet another copy. Instead, I'm moving the
code to arch/ppc64/kernel/time.c.
Some of the related declarations were missing from header
files, so I'm moving those as well.
It makes sense to merge this with the pmac function of the
same name, so we end up having just one implemetation for
iSeries and one for Open Firmware based machines.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch allows iSeries to build with CONFIG_PCI=n. This is useful for
partitions that have only virtual I/O.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/asm-ppc64/iSeries/LparData.h just included a whole lot of other files
to declare variables that would be better declared in those other files. So,
remove it. This will reduce that number of things needed to be included in
most cases to access the relevant variables.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/asm-ppc64/iSeries/iSeries_proc.h just contains a declaration of a
function that no longer exists. Remove it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that we have HZ=1000 there is much less of a need for decr_overclock.
Remove it.
Leave spread_lpevents but move it into iSeries_setup.c. We should look at
making event spreading the default some day.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!