The two macros need special care to use:
Assume rc, ctxt, ops and done exist outside of them.
Can goto outside.
Considering the fact that these are used only in decode functions,
moving these right after do_insn_fetch() seems to be a right thing
to improve the readability.
We also rename do_fetch_insn_byte() to do_insn_fetch_byte() to be
consistent.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c
files, and I need it in a third one to fix a powerpc bug, so let's
first move it into a header
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Removing __init from check_platform_magic since it is called by
xen_unplug_emulated_devices in non-init contexts (It probably gets inlined
because of -finline-functions-called-once, removing __init is more to avoid
mismatch being reported).
Signed-off-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In the past we would only use the function's value if the
returned value was not equal to 'acpi_sci_override_gsi'. Meaning
that the INT_SRV_OVR values for global and source irq were different.
But it is OK to use the function's value even when the global
and source irq are the same.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In the past (2.6.38) the 'xen_allocate_pirq_gsi' would allocate
an entry in a Linux IRQ -> {XEN_IRQ, type, event, ..} array. All
of that has been removed in 2.6.39 and the Xen IRQ subsystem uses
an linked list that is populated when the call to
'xen_allocate_irq_gsi' (universally done from any of the xen_bind_*
calls) is done. The 'xen_allocate_pirq_gsi' is a NOP and there is
no need for it anymore so lets remove it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
As the code paths that are guarded by CONFIG_XEN_DOM0 already depend
on CONFIG_ACPI so the extra #ifdef is not required. The earlier
patch that added them in had done its job.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
.. which means we can preset of NR_IRQS_LEGACY interrupts using
the 'acpi_get_override_irq' API before this loop.
This means that we can get the IRQ's polarity (and trigger) from either
the ACPI (or MP); or use the default values. This fixes a bug if we did
not have an IOAPIC we would not been able to preset the IRQ's polarity
if the MP table existed.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Since they are only called once and the rest of the pci_xen_*
functions follow the same pattern of setup.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In the past we would guard those code segments to be dependent
on CONFIG_XEN_DOM0 (which depends on CONFIG_ACPI) so this patch is
not stricly necessary. But the next patch will merge common
HVM and initial domain code and we want to make sure the CONFIG_ACPI
dependency is preserved - as HVM code does not depend on CONFIG_XEN_DOM0.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Update the out-dated comment at the beginning of the file.
Also provide the copyrights of folks who have been contributing
to this code lately.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The file is hard to read. Move the code around so that
the contents of it follows a uniform format:
- setup GSIs - PV, HVM, and initial domain case
- then MSI/MSI-x setup - PV, HVM and then initial domain case.
- then MSI/MSI-x teardown - same order.
- lastly, the __init functions in PV, HVM, and initial domain order.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The code in setup_ioapic_irq() determines the Destination Field,
so why not also include it in the debug printk output that gets
displayed when the boot parameter "apic=debug" is used.
Before the change, "dmesg" will show:
IOAPIC[0]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0)
IOAPIC[0]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0)
IOAPIC[0]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0) ...
After the change, you will see:
IOAPIC[0]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
IOAPIC[0]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
IOAPIC[0]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0) ...
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708184603.2734.91071.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When IOAPIC data is displayed in "dmesg" with the help of the
boot parameter "apic=debug" certain values are not formatted
correctly wrt their size.
In the "dmesg" snippet below, note that the output for "max
redirection entries", and "IO APIC version" which are each
defined to be just 8-bits long are displayed as 2 bytes in
length. Similarly, "Dst" under the "IRQ redirection table"
should only be 8-bits long.
IO APIC #0......
...
...
.... register #01: 00170020
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0020
...
...
.... IRQ redirection table:
NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
00 000 1 0 0 0 0 0 0 00
01 000 0 0 0 0 0 0 0 31
02 000 0 0 0 0 0 0 0 30
03 000 1 0 0 0 0 0 0 33
...
...
Do some formatting clean up, so you will see output like below:
IO APIC #0......
...
...
.... register #01: 00170020
....... : max redirection entries: 17
....... : PRQ implemented: 0
....... : IO APIC version: 20
...
...
.... IRQ redirection table:
NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
00 00 1 0 0 0 0 0 0 00
01 00 0 0 0 0 0 0 0 31
02 00 0 0 0 0 0 0 0 30
03 00 1 0 0 0 0 0 0 33
...
...
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708184557.2734.61830.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 2706a0bf7b ("x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit
too") enabled AMD NUMA for 32bit too. Unfortunately, SPARSEMEM
on 32bit had rather coarse (512MiB) addr->node mapping
granularity due to lack of space in page->flags. This led to
boot failure on certain AMD NUMA machines which had 128MiB
alignment on nodes.
Patches to properly detect this condition and reject NUMA
configuration are posted[1] but deemed too pervasive for merge
at this point (-rc6). Disable AMD NUMA for 32bit for now and
re-enable once the detection logic is merged.
[1] http://thread.gmane.org/gmane.linux.kernel/1161279/focus=1162583
Reported-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Conny Seidel <conny.seidel@amd.com>
Link: http://lkml.kernel.org/r/20110711083432.GC943@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Detect Xen before HyperV because in Viridian compatibility mode Xen
presents itself as HyperV. Move Xen to the top since it seems more
likely that Xen would emulate VMware than vice versa.
Signed-off-by: Anupam Chanda <achanda@nicira.com>
Link: http://lkml.kernel.org/r/1310150570-26810-1-git-send-email-achanda@nicira.com
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
nr_cpus allows one to specify number of possible cpus in the system.
Current assumption seems to be that first cpu to show up is boot cpu
and this assumption will be broken in kdump scenario where we can be
booting on a non boot cpu with nr_cpus=1.
It might happen that first cpu we parse is not the cpu we boot on and
later we ignore boot cpu. Though code later seems to recognize this
anomaly and forcibly sets boot cpu in physical cpu map with following
warning.
if (!physid_isset(hard_smp_processor_id(), phys_cpu_present_map)) {
printk(KERN_WARNING
"weird, boot CPU (#%d) not listed by the BIOS.\n",
hard_smp_processor_id());
physid_set(hard_smp_processor_id(), phys_cpu_present_map);
}
This patch waits for boot cpu to show up and starts ignoring the cpus
once we have hit (nr_cpus - 1) number of cpus. So effectively we are
reserving one slot out of nr_cpus for boot cpu explicitly.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20110708171926.GF2930@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
I triggered a triple fault with gcc 4.5.1 because it did not
honor the inline annotation to arch_local_save_flags() function
and that function was added to the pool of functions traced by
the function tracer.
When preempt_schedule() called arch_local_save_flags() (called
by irqs_disabled()), it was traced, but the first thing the
function tracer does is disable preemption. When it enables
preemption, the NEED_RESCHED flag will not have been cleared and
the preemption check will trigger the call to preempt_schedule()
again.
Although the dynamic function tracer crashed immediately, the
static version of the function tracer (CONFIG_DYNAMIC_FTRACE is
not set) actually was able to show where the problem was.
swapper-1 3.N.. 103885us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103886us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103886us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103887us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103887us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103888us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103888us : arch_local_save_flags <-preempt_schedule
It went on for a while before it triple faulted with a corrupted
stack.
The arch_local_save_flags and arch_local_irq_* functions should
not be traced. Even though they are marked as inline, gcc may
still make them a function and enable tracing of them.
The simple solution is to just mark them as notrace. I had to
add the <linux/types.h> for this file to include the notrace
tag.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Previously we would check for acpi_sci_override_gsi == gsi every time
a PCI device was enabled. That works during early bootup, but later
on it could lead to triggering unnecessarily the acpi_gsi_to_irq(..) lookup.
The reason is that acpi_sci_override_gsi was declared in __initdata and
after early bootup could contain bogus values.
This patch moves the check for acpi_sci_override_gsi to the
site where the ACPI SCI is preset.
CC: stable@kernel.org
Reported-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
Tested-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
[http://lists.xensource.com/archives/html/xen-devel/2011-07/msg00154.html]
Suggested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Testing suggests that at least some Lenovos and some Intels will
fail to reboot via EFI, attempting to jump to an unmapped
physical address. In the long run we could handle this by
providing a page table with a 1:1 mapping of physical addresses,
but for now it's probably just easier to assume that ACPI or
legacy methods will be present and reboot via those.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some BIOSes will reset the Intel MISC_ENABLE MSR (specifically the
XD_DISABLE bit) when resuming from S3, which can interact poorly with
ebba638ae7. In 32bit PAE mode, this can
lead to a fault when EFER is restored by the kernel wakeup routines,
due to it setting the NX bit for a CPU that (thanks to the BIOS reset)
now incorrectly thinks it lacks the NX feature. (64bit is not affected
because it uses a common CPU bring-up that specifically handles the
XD_DISABLE bit.)
The need for MISC_ENABLE being restored so early is specific to the S3
resume path. Normally, MISC_ENABLE is saved in save_processor_state(),
but this happens after the resume header is created, so just reproduce
the logic here. (acpi_suspend_lowlevel() creates the header, calls
do_suspend_lowlevel, which calls save_processor_state(), so the saved
processor context isn't available during resume header creation.)
[ hpa: Consider for stable if OK in mainline ]
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Link: http://lkml.kernel.org/r/20110707011034.GA8523@outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> 2.6.38+
Add a driver for the ACPI-based EC event interface found on the
OLPC XO-1.5 laptop. This enables notification of battery/AC power events,
and enables various devices to be used as wakeup sources through regular
ACPI mechanisms.
This driver can't be built as a module, because some drivers need to know
at boot-time if SCI-based functionality is available via
olpc_ec_wakeup_available().
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-12-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add a driver to configure the XO-1 RTC via CS5536 MSRs, to be used as a
system wakeup source via olpc-xo1-pm.
Device detection is based on finding the relevant device tree node.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-11-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
EC events indicate change in AC power connectivity, battery state of
charge, battery error, battery presence, etc. Send notifications to
the power supply subsystem when changes are detected.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-10-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Configure the XO-1's lid switch GPIO to trigger an SCI interrupt,
and correctly expose this input device which can be used as a wakeup
source.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-9-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The EC in the OLPC XO-1 delivers GPE events to provide various
notifications. Add the basic code for GPE/EC event processing and
enable the ebook switch, which can be used as a wakeup source.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-8-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Update the EC SCI masks with recent additions.
Add functions to query SCI events and set the wakeup mask, to be used by
followup patches.
Add functions to tweak an event mask used to select certain EC events as
a system wakeup source. Also add a function to determine if EC wakeup
functionality is available, as this depends on child drivers (different
for each laptop model) to configure the SCI interrupt.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-7-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The System Control Interrupt is used in the OLPC XO-1 to control various
features of the laptop. Add the driver base and the power button
functionality.
This driver can't be built as a module, because functionality added in
future patches means that some drivers need to know at boot-time whether
SCI-based functionality is available.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-6-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add code needed for basic suspend/resume of the XO-1 laptop.
Based on earlier work by Jordan Crouse, Andres Salomon, and others.
This patch incorporates all earlier feedback from Thomas Gleixner. To
clarify a certain point (now more obvious in the code itself):
On resume, OpenFirmware returns execution to Linux in protected mode
with a kernel-compatible GDT already set up. The changes and
simplifications suggested have all been included.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-5-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Based on earlier review comments, we'll no longer try to stick all of
our XO-1 goodies in a single driver. We'll split it into a power management
driver, and an EC/SCI driver.
As a first step, rename olpc-xo1 to olpc-xo1-pm, and make it builtin
instead of modular.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-4-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Move these definitions into the relevant header file.
This was requested in the review of the upcoming XO-1 suspend/resume code.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-3-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
In response to new device tree code in the kernel, OLPC will start
using it for probing of certain devices. However, some firmware fixes
are needed to put the devicetree into a usable state.
Retain compatibility with old firmware by fixing up the device tree
at boot-time if it does not contain the new nodes/properties that
we need for probing. This is the same approach taken on PPC platforms.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-2-git-send-email-dsd@laptop.org
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Andres Salomon <dilinger@queued.net>
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Since git commit
660e34cebf x86: reorder reboot method
preferences,
my Acer Aspire One hangs on reboot. It appears that its ACPI method
for rebooting is broken. The attached patch adds a quirk so that the
machine will reboot via the BIOS.
[ hpa: verified that the ACPI control on this machine is just plain broken. ]
Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au>
Link: http://lkml.kernel.org/r/w439iki5vl.wl%25peter@chubb.wattle.id.au
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Consumers of the table pointers in struct efi check for
EFI_INVALID_TABLE_ADDR to determine validity, hence these
pointers should all be pre-initialized to this value (rather
than zero).
Noticed by the discrepancy between efivars' systab sysfs entry
showing all tables (and their pointers) despite the code
intending to only display the valid ones. No other bad effects
known, but having the various table parsing routines bogusly
access physical address zero is certainly not very desirable
(even though they're unlikely to find anything useful there).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Link: http://lkml.kernel.org/r/4E13100A020000780004C256@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
in order to restore it later in ret_from_intr.
It is convenient because we save its value in the irq regs
and it's easily restored using the leave instruction.
However this is a kind of abuse of the frame pointer which
role is to help unwinding the kernel by chaining frames
together, each node following the return address to the
previous frame.
But although we are breaking the frame by changing the stack
pointer, there is no preceding return address before the new
frame. Hence using the frame pointer to link the two stacks
breaks the stack unwinders that find a random value instead of
a return address here.
There is no workaround that can work in every case. We are using
the fixup_bp_irq_link() function to dereference that abused frame
pointer in the case of non nesting interrupt (which means stack
changed).
But that doesn't fix the case of interrupts that don't change the
stack (but we still have the unconditional frame link), which is
the case of hardirq interrupting softirq. We have no way to detect
this transition so the frame irq link is considered as a real frame
pointer and the return address is dereferenced but it is still a
spurious one.
There are two possible results of this: either the spurious return
address, a random stack value, luckily belongs to the kernel text
and then the unwinding can continue and we just have a weird entry
in the stack trace. Or it doesn't belong to the kernel text and
unwinding stops there.
This is the reason why stacktraces (including perf callchains) on
irqs that interrupted softirqs don't work very well.
To solve this, we don't save the old stack pointer on rbp anymore
but we save it to a scratch register that we push on the new
stack and that we pop back later on irq return.
This preserves the whole frame chain without spurious return addresses
in the middle and drops the need for the horrid fixup_bp_irq_link()
workaround.
And finally irqs that interrupt softirq are sanely unwinded.
Before:
99.81% perf [kernel.kallsyms] [k] perf_pending_event
|
--- perf_pending_event
irq_work_run
smp_irq_work_interrupt
irq_work_interrupt
|
|--41.60%-- __read
| |
| |--99.90%-- create_worker
| | bench_sched_messaging
| | cmd_bench
| | run_builtin
| | main
| | __libc_start_main
| --0.10%-- [...]
After:
1.64% swapper [kernel.kallsyms] [k] perf_pending_event
|
--- perf_pending_event
irq_work_run
smp_irq_work_interrupt
irq_work_interrupt
|
|--95.00%-- arch_irq_work_raise
| irq_work_queue
| __perf_event_overflow
| perf_swevent_overflow
| perf_swevent_event
| perf_tp_event
| perf_trace_softirq
| __do_softirq
| call_softirq
| do_softirq
| irq_exit
| |
| |--73.68%-- smp_apic_timer_interrupt
| | apic_timer_interrupt
| | |
| | |--96.43%-- amd_e400_idle
| | | cpu_idle
| | | start_secondary
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
The unwinder backlink in interrupt entry is very useless.
It's actually not part of the stack frame chain and thus is
never used.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Just for clarity in the code. Have a first block that handles
the frame pointer and a separate one that handles pt_regs
pointer and its use.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
The save_regs function that saves the regs on low level
irq entry is complicated because of the fact it changes
its stack in the middle and also because it manipulates
data allocated in the caller frame and accesses there
are directly calculated from callee rsp value with the
return address in the middle of the way.
This complicates the static stack offsets calculation and
require more dynamic ones. It also needs a save/restore
of the function's return address.
To simplify and optimize this, turn save_regs() into a
macro.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
When regs are passed to dump_stack(), we fetch the frame
pointer from the regs but the stack pointer is taken from
the current frame.
Thus the frame and stack pointers may not come from the same
context. For example this can result in the unwinder to
think the context is in irq, due to the current value of
the stack, but the frame pointer coming from the regs points
to a frame from another place. It then tries to fix up
the irq link but ends up dereferencing a random frame
pointer that doesn't belong to the irq stack:
[ 9131.706906] ------------[ cut here ]------------
[ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
[ 9131.707003] Hardware name: AMD690VM-FMH
[ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
[ 9131.707003] Modules linked in:
[ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
[ 9131.707003] Call Trace:
[ 9131.707003] <IRQ> [<ffffffff8104bd4a>] warn_slowpath_common+0x7a/0xb0
[ 9131.707003] [<ffffffff8104be21>] warn_slowpath_fmt+0x41/0x50
[ 9131.707003] [<ffffffff8178b873>] ? bad_to_user+0x6d/0x10be
[ 9131.707003] [<ffffffff8100c2da>] dump_trace+0x2aa/0x330
[ 9131.707003] [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
[ 9131.707003] [<ffffffff8101b164>] perf_callchain_kernel+0x54/0x70
[ 9131.707003] [<ffffffff810d391f>] perf_prepare_sample+0x19f/0x2a0
[ 9131.707003] [<ffffffff810d546c>] __perf_event_overflow+0x16c/0x290
[ 9131.707003] [<ffffffff810d5430>] ? __perf_event_overflow+0x130/0x290
[ 9131.707003] [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
[ 9131.707003] [<ffffffff8100fbb9>] ? sched_clock+0x9/0x10
[ 9131.707003] [<ffffffff810752e5>] ? T.375+0x15/0x90
[ 9131.707003] [<ffffffff81084da4>] ? trace_hardirqs_on_caller+0x64/0x180
[ 9131.707003] [<ffffffff810817bd>] ? trace_hardirqs_off+0xd/0x10
[ 9131.707003] [<ffffffff810d5764>] perf_event_overflow+0x14/0x20
[ 9131.707003] [<ffffffff810d588c>] perf_swevent_hrtimer+0x11c/0x130
[ 9131.707003] [<ffffffff817821a1>] ? error_exit+0x51/0xb0
[ 9131.707003] [<ffffffff81072e93>] __run_hrtimer+0x83/0x1e0
[ 9131.707003] [<ffffffff810d5770>] ? perf_event_overflow+0x20/0x20
[ 9131.707003] [<ffffffff81073256>] hrtimer_interrupt+0x106/0x250
[ 9131.707003] [<ffffffff812a3bfd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 9131.707003] [<ffffffff81024833>] smp_apic_timer_interrupt+0x53/0x90
[ 9131.707003] [<ffffffff81789053>] apic_timer_interrupt+0x13/0x20
[ 9131.707003] <EOI> [<ffffffff817821a1>] ? error_exit+0x51/0xb0
[ 9131.707003] [<ffffffff8178219c>] ? error_exit+0x4c/0xb0
[ 9131.707003] ---[ end trace b2560d4876709347 ]---
Fix this by simply taking the stack pointer from regs->sp
when regs are provided.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to prepare for fetching the stack pointer from the
regs when possible in dump_trace() instead of taking the
local one, save the current stack pointer in perf live regs saving.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
it wasn't converted by commit 44c10138fd ("PCI: Change all
drivers to use pci_device->revision") before being moved to
arch/x86/...
Do it now at last -- and save one level of indentation...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/201107012242.08347.sshtylyov@ru.mvista.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
During 32/64 NUMA init unification, commit 797390d855 ("x86-32,
NUMA: use sparse_memory_present_with_active_regions()") made
32bit mm init call memory_present() automatically from
active_regions instead of leaving it to each NUMA init path.
This commit description is inaccurate - memory_present() calls
aren't the same for flat and numaq. After the commit,
memory_present() is only called for the intersection of e820 and
NUMA layout. Before, on flatmem, memory_present() would be
called from 0 to max_pfn. After, it would be called only on the
areas that e820 indicates to be populated.
This is how x86_64 works and should be okay as memmap is allowed
to contain holes; however, x86_32 DISCONTIGMEM is missing
early_pfn_valid(), which makes memmap_init_zone() assume that
memmap doesn't contain any hole. This leads to the following
oops if e820 map contains holes as it often does on machine with
near or more 4GiB of memory by calling pfn_to_page() on a pfn
which isn't mapped to a NUMA node, a reported by Conny Seidel:
BUG: unable to handle kernel paging request at 000012b0
IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2
*pdpt =3D 0000000000000000 *pde =3D f000eef3f000ee00
Oops: 0000 [#1] SMP
last sysfs file:
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1
EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0
EIP is at memmap_init_zone+0x6c/0xf2
EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80
ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=3Dc19fe000 task=3Dc1a07f60 task.ti=3Dc19fe000)
Stack:
00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58
c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030
c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe
Call Trace:
[<c1a80f24>] free_area_init_node+0x358/0x385
[<c1a81384>] free_area_init_nodes+0x420/0x487
[<c1a79326>] paging_init+0x114/0x11b
[<c1a6cb13>] setup_arch+0xb37/0xc0a
[<c1a69554>] start_kernel+0x76/0x316
[<c1a690a8>] i386_start_kernel+0xa8/0xb0
This patch fixes the bug by defining early_pfn_valid() to be the
same as pfn_valid() when DISCONTIGMEM.
Reported-bisected-and-tested-by: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: hans.rosenfeld@amd.com
Cc: Christoph Lameter <cl@linux.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Link: http://lkml.kernel.org/r/20110628094107.GB3386@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The v1 PMU does not have any fixed counters. Using the v2 constraints,
which do have fixed counters, causes an additional choice to be present
in the weight calculation, but not when actually scheduling the event,
leading to an event being not scheduled at all.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-3-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure. This is ugly and doesn't scale if a
single callback services many perf_events.
Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event->overflow_handler_context.
All callers are updated.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a NODE level to the generic cache events which is used to measure
local vs remote memory accesses. Like all other cache events, an
ACCESS is HIT+MISS, if there is no way to distinguish between reads
and writes do reads only etc..
The below needs filling out for !x86 (which I filled out with
unsupported events).
I'm fairly sure ARM can leave it like that since it doesn't strike me as
an architecture that even has NUMA support. SH might have something since
it does appear to have some NUMA bits.
Sparc64, PowerPC and MIPS certainly want a good look there since they
clearly are NUMA capable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the OFFCORE registers are fully symmetric, try the other one
when the specified one is already in use.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1306141897.18455.8.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds Intel Sandy Bridge offcore_response support by
providing the low-level constraint table for those events.
On Sandy Bridge, there are two offcore_response events. Each uses
its own dedictated extra register. But those registers are NOT shared
between sibling CPUs when HT is on unlike Nehalem/Westmere. They are
always private to each CPU. But they still need to be controlled within
an event group. All events within an event group must use the same
value for the extra MSR. That's not controlled by the second patch in
this series.
Furthermore on Sandy Bridge, the offcore_response events have NO
counter constraints contrary to what the official documentation
indicates, so drop the events from the contraint table.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145712.GA7304@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>