While sending interrupts to a cpu to repeatedly wake a thread, on occasion
that thread will take a full timer tick cycle (4002 usec in my case)
to wakeup.
The problem concerns a race condition in the code around the safe_halt()
call in the default_idle() routine. Setting 'nohalt' on the kernel
command line causes the long wakeups to disappear.
void
default_idle (void)
{
local_irq_enable();
while (!need_resched()) {
--> if (can_do_pal_halt)
--> safe_halt();
else
A timer tick could arrive between the check for !need_resched and the
actual call to safe_halt() (which does a pal call to PAL_HALT_LIGHT).
By the time the timer tick completes, a thread that might now need to run
could get held up for as long as a timer tick waiting for the halted cpu.
I'm proposing that we disable irq's and check need_resched again before
calling safe_halt(). Does anyone see any problem with this approach?
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make sure to reduce the rating of the ITC clock if ITCs are drifty. If they
are drifting then we have not synchronized the ITC values, nor are we doing
the jitter compensation (useless since drift may increase the differentials
arbitrarily).
Without this patch it is possible that the ITC clock becomes selected as
the system clock on systems with drifty ITCs which will result in
nanosleep hanging.
One can still select the itc clock manually on such systems via
clocksource=itc
(Produces nice hangs on SGI Altix.)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If the sn2_rtc clock is present then it is a must have since sn2_rtc
provides a synchronized time source on Altix systems. So elevate
the priority to 450. Otherwise the ITC would take precendence. Altix
systems currently do not boot because the ITC clocksource is broken. It
seems to assume that ITCs are synchronized and as a result nanosleep
hangs (may be fixed in a different patch).
While we are at it: Remove the sn2_mc definition. The sn2_rtc has a fixed
address. No point in reading the address from memory. Removing it avoids
touching one cacheline.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In error path we must unlock irq_desc[irq].lock before we change
'irq'.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove unused TIF_NOTIFY_RESUME flag for all processor architectures. The
flag was not used excecpt on IA-64 where the patch replaces it with
TIF_PERFMON_WORK.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, destroy_and_reserve_irq() sets irq_status[irq] UNUSED using
clear_irq_vector() and sets irq_status[irq] RSVD using reserve_irq().
But there is a race window because vector_lock is once released between
them. This patch fixes this race window.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that interrupts are not initialized correctly at PCI
hotplug or driver reloading time.
By vector domain change, the iosapic_rte_info structure was changed to
be on the iosapic_intr_info[irq].rtes list even after the interrupts
are unregistered. So iosapic_intr_info[irq].rtes list must not be
checked to see if there are registered interrupts (RTEs) on the
irq. We must check iosapic_intr_info[irq].count counter instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes a few duplicate includes from arch/ia64/
Acked-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This removes the requirement for callers to get_cpu() to check in simple
cases. i386 and x86_64 already received a similar treatment.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the following section mismatch warnings:
WARNING: vmlinux.o(.text+0x41902): Section mismatch: reference to .init.text:__alloc_bootmem (between 'ia64_mca_cpu_init' and 'ia64_do_tlb_purge')
WARNING: vmlinux.o(.text+0x49222): Section mismatch: reference to .init.text:__alloc_bootmem (between 'register_intr' and 'iosapic_register_intr')
WARNING: vmlinux.o(.text+0x62beb2): Section mismatch: reference to .init.text:__alloc_bootmem_node (between 'hubdev_init_node' and 'cnodeid_get_geoid')
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When comparing a pointer, it's clearer to compare it to NULL than to 0.
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Signed-off-by: Tony Luck <tony.luck@intel.com>
b716395e2b added code to handle
a compatability issue with 32bit quota tools, but the new compat
routines are only needed when CONFIG_COMPAT=y (and with this set
to 'n' there are compilation problems since some new typedefs are
not visible).
Reported by Doug Chapman. Fix tuned by a cast of thousands (Andi,
Andreas, Arthur, HPA, Willy)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Forgot to adjust this one with the acpi autoloading patches
in commit 8c8eb78f67
Acked-by: Myron Stowe <myron.stowe@hp.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix wrong return value in parse_vector_domain().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ia64's acpi_gsi_to_irq() function assumes irq == vector. But in
fact irq can be different from vector. This patch fix this wrong
assumption.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add some sanity checks into __bind_irq_vector().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
u32* volatile cyclone_timer means volatile auto pointer to u32,
which is clearly not what had been intended (we never even take
the address of that variable, let alone pass it to something that
could change it behind our back). u32 volatile * is what the
authors apparently wanted to say, but in reality we don't need that
qualifier there at all - it's (properly) only passed to iomem helpers
which takes care of that stuff just fine.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
pcibios_setup (between 'pci_setup' and 'quirk_mellanox_tavor')
setup_profiling_timer (between 'write_profile' and 'delayed_put_task_struct')
Signed-off-by: Tony Luck <tony.luck@intel.com>
In 741f98fe29 Sam added full
checking across the entire vmlinux image. This flushed out
a dozen new section mismatch warnings. Start the whack-a-mole
game again to stomp them out.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jens has added a partial_page thing in splice whcih conflicts with the ia64
one. Rename ia64 out of the way. (ia64 chose poorly).
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Parse the machvec command line option outside of the early_param()
so that ia64_mv is set before any console intialisation that
may result from early_param parsing.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This change fixes a panic when assign_irq_vector(irq) is called with
irq = AUTO_ASSIGN.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As it was a synonym for (CONFIG_ACPI && CONFIG_X86),
the ifdefs for it were more clutter than they were worth.
For ia64, just add a few stubs in anticipation of future
S3 or S4 support.
Signed-off-by: Len Brown <len.brown@intel.com>
This is a merge of Peter Keilty's initial patch (which was
revived by Bob Picco) for this with Hidetoshi Seto's fixes
and scaling improvements.
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Update arch/ia64/defconfig: select 64K pagesize
Same for arch/ia64/configs/tiger_defconfig + CONFIG_COMPAT=n
Signed-off-by: Tony Luck <tony.luck@intel.com>
> arch/ia64/kernel/iosapic.c:597: warning: 'iosapic_free_rte' defined but not used
>
> This isn't spurious, the only call to iosapic_free_rte() has been removed, but there
> is still a call to iosapic_alloc_rte() ... which means we must have a memory leak.
I did it on purpose (and gave the warning a miss...) and I consider
iosapic_free_rte() is no longer needed.
I decided to remain iosapic_rte_info to keep gsi-to-irq binding
after device disable. Indeed it needs some extra memory, but it
is only "sizeof(iosapic_rte_info) * <the number of removed devices>"
bytes and has no memory leak becasue re-enabled devices use the
iosapic_rte_info which they used before disabling.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
sys_fallocate for ia64. This uses an empty slot #1303 erroneously
marked as reserved for move_pages (which had already been allocated
as syscall #1276)
Signed-Off-By: Dave Chinner <dgc@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since Ingo's recent scheduler rewrite which was merged as commit
0437e109e1 sched_cacheflush is unused.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
the old mm into the new mm.
We create the new mm before the binfmt code runs, and place the new stack at
the very top of the address space. Once the binfmt code runs and figures out
where the stack should be, we move it downwards.
It is a bit peculiar in that we have one task with two mm's, one of which is
inactive.
[a.p.zijlstra@chello.nl: limit stack size]
Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Hugh Dickins <hugh@veritas.com>
[bunk@stusta.de: unexport bprm_mm_init]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently most of the per cpu data, which is accessed by different cpus,
has a ____cacheline_aligned_in_smp attribute. Move all this data to the
new per cpu shared data section: .data.percpu.shared_aligned.
This will seperate the percpu data which is referenced frequently by other
cpus from the local only percpu data.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
per cpu data section contains two types of data. One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus. In the current kernel, these two sets are
not clearely separated out. This can potentially cause the same data
cacheline shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.
One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.
This patch:
Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I realise jprobes are a razor-blades-included type of interface, but that
doesn't mean we can't try and make them safer to use. This guy I know once
wrote code like this:
struct jprobe jp = { .kp.symbol_name = "foo", .entry = "jprobe_foo" };
And then his kernel exploded. Oops.
This patch adds an arch hook, arch_deref_entry_point() (I don't like it
either) which takes the void * in a struct jprobe, and gives back the text
address that it represents.
We can then use that in register_jprobe() to check that the entry point we're
passed is actually in the kernel text, rather than just some random value.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer. This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).
[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on usage and testing over the past couple of years, kprobes on
i386, ia64, powerpc and x86_64 is no longer EXPERIMENTAL.
This is a follow-up to Robert P.J. Day's patch making "Instrumentation
support" non-EXPERIMENTAL:
http://marc.info/?l=linux-kernel&m=118396955423812&w=2
Arch maintainers for sparc64, avr32 and s390 need to take a similar call.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the kernelcore= parameter for x86.
Once all patches are applied, a new command-line parameter exist and a new
sysctl. This patch adds the necessary documentation.
From: Yasunori Goto <y-goto@jp.fujitsu.com>
When "kernelcore" boot option is specified, kernel can't boot up on ia64
because of an infinite loop. In addition, the parsing code can be handled
in an architecture-independent manner.
This patch uses common code to handle the kernelcore= parameter. It is
only available to architectures that support arch-independent zone-sizing
(i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
ignore the boot parameter.
[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-CPU vector domain support for IA64_DIG. It is enabled by
adding the "vector=percpu" boot option.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add per-CPU vector domain support for IA64_GENERIC. It is enabled by
adding the "vector=percpu" boot option.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add support for IRQ migration across vector domain.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add fundamental support for multiple vector domain. There still exists
only one vector domain even with this patch. IRQ migration across
domain is not supported yet by this patch.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add mapping tables between irqs and vectors, and its management code.
This is necessary for supporting multiple vector domain because 1:1
mapping between irq and vector will be changed to n:1.
The irq == vector relationship between irqs and vectors is explicitly
remained for percpu interrupts, platform interrupts, isa IRQs and
vectors assigned using assign_irq_vector() because some programs might
depend on it.
And I should consider the following problem.
When pci drivers enabled/disabled devices dynamically, its irq number
is changed to the different one. Therefore, suspend/resume code may
happen problem.
To fix this problem, I bound gsi to irq.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Need to check if irq is sharable amoung handlers when searching
sharable IOSAPIC irq.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>