Yet another variant of the Dell Latitude series which requires
reboot=pci.
From the E5420 bug report by Daniel J Blueman:
> The E6420 is affected also (same platform, different casing and
> features), which provides an external confirmation of the issue; I can
> submit a patch for that later or include it if you prefer:
> http://linux.koolsolutions.com/2009/08/04/howto-fix-linux-hangfreeze-during-reboots-and-restarts/
Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
Rebooting on the Dell E5420 often hangs with the keyboard or ACPI
methods, but is reliable via the PCI method.
[ hpa: this was deferred because we believed for a long time that the
recent reshuffling of the boot priorities in commit
660e34cebf fixed this platform.
Unfortunately that turned out to be incorrect. ]
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Link: http://lkml.kernel.org/r/1305248699-2347-1-git-send-email-daniel.blueman@gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c
files, and I need it in a third one to fix a powerpc bug, so let's
first move it into a header
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Commit 2706a0bf7b ("x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit
too") enabled AMD NUMA for 32bit too. Unfortunately, SPARSEMEM
on 32bit had rather coarse (512MiB) addr->node mapping
granularity due to lack of space in page->flags. This led to
boot failure on certain AMD NUMA machines which had 128MiB
alignment on nodes.
Patches to properly detect this condition and reject NUMA
configuration are posted[1] but deemed too pervasive for merge
at this point (-rc6). Disable AMD NUMA for 32bit for now and
re-enable once the detection logic is merged.
[1] http://thread.gmane.org/gmane.linux.kernel/1161279/focus=1162583
Reported-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Conny Seidel <conny.seidel@amd.com>
Link: http://lkml.kernel.org/r/20110711083432.GC943@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Previously we would check for acpi_sci_override_gsi == gsi every time
a PCI device was enabled. That works during early bootup, but later
on it could lead to triggering unnecessarily the acpi_gsi_to_irq(..) lookup.
The reason is that acpi_sci_override_gsi was declared in __initdata and
after early bootup could contain bogus values.
This patch moves the check for acpi_sci_override_gsi to the
site where the ACPI SCI is preset.
CC: stable@kernel.org
Reported-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
Tested-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
[http://lists.xensource.com/archives/html/xen-devel/2011-07/msg00154.html]
Suggested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Testing suggests that at least some Lenovos and some Intels will
fail to reboot via EFI, attempting to jump to an unmapped
physical address. In the long run we could handle this by
providing a page table with a 1:1 mapping of physical addresses,
but for now it's probably just easier to assume that ACPI or
legacy methods will be present and reboot via those.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some BIOSes will reset the Intel MISC_ENABLE MSR (specifically the
XD_DISABLE bit) when resuming from S3, which can interact poorly with
ebba638ae7. In 32bit PAE mode, this can
lead to a fault when EFER is restored by the kernel wakeup routines,
due to it setting the NX bit for a CPU that (thanks to the BIOS reset)
now incorrectly thinks it lacks the NX feature. (64bit is not affected
because it uses a common CPU bring-up that specifically handles the
XD_DISABLE bit.)
The need for MISC_ENABLE being restored so early is specific to the S3
resume path. Normally, MISC_ENABLE is saved in save_processor_state(),
but this happens after the resume header is created, so just reproduce
the logic here. (acpi_suspend_lowlevel() creates the header, calls
do_suspend_lowlevel, which calls save_processor_state(), so the saved
processor context isn't available during resume header creation.)
[ hpa: Consider for stable if OK in mainline ]
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Link: http://lkml.kernel.org/r/20110707011034.GA8523@outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> 2.6.38+
Since git commit
660e34cebf x86: reorder reboot method
preferences,
my Acer Aspire One hangs on reboot. It appears that its ACPI method
for rebooting is broken. The attached patch adds a quirk so that the
machine will reboot via the BIOS.
[ hpa: verified that the ACPI control on this machine is just plain broken. ]
Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au>
Link: http://lkml.kernel.org/r/w439iki5vl.wl%25peter@chubb.wattle.id.au
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
During 32/64 NUMA init unification, commit 797390d855 ("x86-32,
NUMA: use sparse_memory_present_with_active_regions()") made
32bit mm init call memory_present() automatically from
active_regions instead of leaving it to each NUMA init path.
This commit description is inaccurate - memory_present() calls
aren't the same for flat and numaq. After the commit,
memory_present() is only called for the intersection of e820 and
NUMA layout. Before, on flatmem, memory_present() would be
called from 0 to max_pfn. After, it would be called only on the
areas that e820 indicates to be populated.
This is how x86_64 works and should be okay as memmap is allowed
to contain holes; however, x86_32 DISCONTIGMEM is missing
early_pfn_valid(), which makes memmap_init_zone() assume that
memmap doesn't contain any hole. This leads to the following
oops if e820 map contains holes as it often does on machine with
near or more 4GiB of memory by calling pfn_to_page() on a pfn
which isn't mapped to a NUMA node, a reported by Conny Seidel:
BUG: unable to handle kernel paging request at 000012b0
IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2
*pdpt =3D 0000000000000000 *pde =3D f000eef3f000ee00
Oops: 0000 [#1] SMP
last sysfs file:
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1
EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0
EIP is at memmap_init_zone+0x6c/0xf2
EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80
ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=3Dc19fe000 task=3Dc1a07f60 task.ti=3Dc19fe000)
Stack:
00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58
c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030
c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe
Call Trace:
[<c1a80f24>] free_area_init_node+0x358/0x385
[<c1a81384>] free_area_init_nodes+0x420/0x487
[<c1a79326>] paging_init+0x114/0x11b
[<c1a6cb13>] setup_arch+0xb37/0xc0a
[<c1a69554>] start_kernel+0x76/0x316
[<c1a690a8>] i386_start_kernel+0xa8/0xb0
This patch fixes the bug by defining early_pfn_valid() to be the
same as pfn_valid() when DISCONTIGMEM.
Reported-bisected-and-tested-by: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: hans.rosenfeld@amd.com
Cc: Christoph Lameter <cl@linux.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Link: http://lkml.kernel.org/r/20110628094107.GB3386@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the past we would use the GSI value to preset the ACPI SCI
IRQ which worked great as GSI == IRQ:
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
While that is most often seen, there are some oddities:
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low level)
which means that GSI 20 (or pin 20) is to be overriden for IRQ 9.
Our code that presets the interrupt for ACPI SCI however would
use the GSI 20 instead of IRQ 9 ending up with:
xen: sci override: global_irq=20 trigger=0 polarity=1
xen: registering gsi 20 triggering 0 polarity 1
xen: --> pirq=20 -> irq=20
xen: acpi sci 20
.. snip..
calling acpi_init+0x0/0xbc @ 1
ACPI: SCI (IRQ9) allocation failed
ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20110413/evevent-119)
ACPI: Unable to start the ACPI Interpreter
as the ACPI interpreter made a call to 'acpi_gsi_to_irq' which got nine.
It used that value to request an IRQ (request_irq) and since that was not
present it failed.
The fix is to recognize that for interrupts that are overriden (in our
case we only care about the ACPI SCI) we should use the IRQ number
to present the IRQ instead of the using GSI. End result is that we get:
xen: sci override: global_irq=20 trigger=0 polarity=1
xen: registering gsi 20 triggering 0 polarity 1
xen: --> pirq=20 -> irq=9 (gsi=9)
xen: acpi sci 9
which fixes the ACPI interpreter failing on startup.
CC: stable@kernel.org
Reported-by: Liwei <xieliwei@gmail.com>
Tested-by: Liwei <xieliwei@gmail.com>
[http://lists.xensource.com/archives/html/xen-devel/2011-06/msg01727.html]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Simple enough - we use an extern defined symbol which is not
defined when CONFIG_SMP is not defined. This fixes the linker
dying.
CC: stable@kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
%rip-relative addressing is relative to the first byte of the next instruction,
so we need to add %rip only after we've fetched any immediate bytes.
Based on original patch by Li Xin <xin.li@intel.com>.
Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Li Xin <xin.li@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Trying to build the Intel SCU Watchdog fails for me with gcc 4.6.0 -
$ gcc --version | head -n 1
gcc (GCC) 4.6.0 20110513 (prerelease)
like this :
CC drivers/watchdog/intel_scu_watchdog.o
In file included from drivers/watchdog/intel_scu_watchdog.c:49:0:
/home/jj/src/linux-2.6/arch/x86/include/asm/apb_timer.h: In function ‘apbt_time_init’:
/home/jj/src/linux-2.6/arch/x86/include/asm/apb_timer.h:65:42: warning: ‘return’ with a value, in function returning void [enabled by default]
drivers/watchdog/intel_scu_watchdog.c: In function ‘intel_scu_watchdog_init’:
drivers/watchdog/intel_scu_watchdog.c:468:2: error: implicit declaration of function ‘sfi_get_mtmr’ [-Werror=implicit-function-declaration]
drivers/watchdog/intel_scu_watchdog.c:468:32: warning: assignment makes pointer from integer without a cast [enabled by default]
cc1: some warnings being treated as errors
make[1]: *** [drivers/watchdog/intel_scu_watchdog.o] Error 1
make: *** [drivers/watchdog/intel_scu_watchdog.o] Error 2
Additionally, linux/types.h is needlessly being included twice in
drivers/watchdog/intel_scu_watchdog.c
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
commit 21a3c96 uses node_start/end_pfn(nid) for detection start/end
of nodes. But, it's not defined in linux/mmzone.h but defined in
/arch/???/include/mmzone.h which is included only under
CONFIG_NEED_MULTIPLE_NODES=y.
Then, we see
mm/page_cgroup.c: In function 'page_cgroup_init':
mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'
So, fixiing page_cgroup.c is an idea...
But node_start_pfn()/node_end_pfn() is a very generic macro and
should be implemented in the same manner for all archs.
(m32r has different implementation...)
This patch removes definitions of node_start/end_pfn() in each archs
and defines a unified one in linux/mmzone.h. It's not under
CONFIG_NEED_MULTIPLE_NODES, now.
A result of macro expansion is here (mm/page_cgroup.c)
for !NUMA
start_pfn = ((&contig_page_data)->node_start_pfn);
end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
for NUMA (x86-64)
start_pfn = ((node_data[nid])->node_start_pfn);
end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
Changelog:
- fixed to avoid using "nid" twice in node_end_pfn() macro.
Reported-and-acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 128-bit multiply in pvclock.h was missing an output constraint for
EDX which caused a register corruption to appear. Thanks to Ulrich for
diagnosing the EDX corruption and Avi for providing this fix.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The condition is opposite, it always maps huge page for the dirty tracked page
Reported-by: Steve <stefan.bosak@gmail.com>
Signed-off-by: Steve <stefan.bosak@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Only decache guest CR3 value if vcpu->arch.cr3 is stale.
Fixes loadvm with live guest.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Markus Schade <markus.schade@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
On 3.0-rc1 I get
In file included from arch/x86/kvm/mmu.c:2856:
arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
In file included from arch/x86/kvm/mmu.c:2852:
arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
caused by 6e2ca7d180. According to Takuya
Yoshikawa, ptep_user won't be used uninitialized so shut up gcc.
Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Link: http://lkml.kernel.org/r/20110530094604.GC21833@liondog.tnic
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Commit 916f676f8d started reserving boot service code since some systems
require you to keep that code around until SetVirtualAddressMap is called.
However, in some cases those areas will overlap with reserved regions.
The proper medium-term fix is to fix the bootloader to prevent the
conflicts from occurring by moving the kernel to a better position,
but the kernel should check for this possibility, and only reserve regions
which can be reserved.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Link: http://lkml.kernel.org/r/4DF7A005.1050407@gmail.com
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The earlier attempts (24bdb0b62c)
at fixing this problem caused other problems to surface (PV guests
with no PCI passthrough would have SWIOTLB turned on - which meant
64MB of precious contingous DMA32 memory being eaten up per guest).
The problem was: "on xen we add an extra memory region at the end of
the e820, and on this particular machine this extra memory region
would start below 4g and cross over the 4g boundary:
[0xfee01000-0x192655000)
Unfortunately e820_end_of_low_ram_pfn does not expect an
e820 layout like that so it returns 4g, therefore initial_memory_mapping
will map [0 - 0x100000000), that is a memory range that includes some
reserved memory regions."
The memory range was the IOAPIC regions, and with the 1-1 mapping
turned on, it would map them as RAM, not as MMIO regions. This caused
the hypervisor to complain. Fortunately this is experienced only under
the initial domain so we guard for it.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
.. As it won't actually power off the machine.
Reported-by: Sven Köhler <sven.koehler@gmail.com>
Tested-by: Sven Köhler <sven.koehler@gmail.com>
Signed-off-by: Tom Goetz <tom.goetz@virtualcomputer.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The MAXSMP config option requires CPUMASK_OFFSTACK, which in turn
requires we init the memory for the maps while we bring up the cpus.
MAXSMP also increases NR_CPUS to 4096. This increase in size exposed an
issue in the argument construction for multicalls from
xen_flush_tlb_others. The args should only need space for the actual
number of cpus.
Also in 2.6.39 it exposes a bootup problem.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8157a1d3>] set_cpu_sibling_map+0x123/0x30d
...
Call Trace:
[<ffffffff81039a3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<ffffffff819dc4db>] xen_smp_prepare_cpus+0x36/0x135
..
CC: stable@kernel.org
Signed-off-by: Andrew Jones <drjones@redhat.com>
[v2: Updated to compile on 3.0]
[v3: Updated to compile when CONFIG_SMP is not defined]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In some rare cases, nmis are generated immediately after the nmi
handler of the cpu was started. This causes the counter not to be
enabled. Before enabling the nmi handlers we need to set variable
ctr_running first and make sure its value is written to memory.
Also, the patch makes all existing barriers a memory barrier instead
of a compiler barrier only.
Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: <stable@kernel.org> # .35+
Signed-off-by: Robert Richter <robert.richter@amd.com>
[ Also from Ben Hutchings <ben@decadent.org.uk> and Vitaliy Ivanov
<vitalivanov@gmail.com> ]
Commit 06ae40ce07 ("x86 idle: EXPORT_SYMBOL(default_idle, pm_idle)
only when APM demands it") removed the export for pm_idle/default_idle
unless the apm module was modularised and CONFIG_APM_CPU_IDLE was set.
But the apm module uses pm_idle/default_idle unconditionally,
CONFIG_APM_CPU_IDLE only affects the bios idle threshold. Adjust the
export accordingly.
[ Used #ifdef instead of #if defined() as it's shorter, and what both
Ben and Vitaliy used.. Andy, you're out-voted ;) - Linus ]
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When I added 3448a19da4
I forgot about the special uv handling code for this, so this
patch fixes it up.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Ingo Molnar
Signed-off-by: Dave Airlie <airlied@redhat.com>
Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function. This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.
With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.
Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the following build failure:
drivers/built-in.o: In function `early_init_dt_check_for_initrd':
/home/florian/dev/kernel/x86/linux-2.6-x86/drivers/of/fdt.c:571:
undefined reference to `early_init_dt_setup_initrd_arch'
make: *** [.tmp_vmlinux1] Error 1
which happens as soon as we enable initrd support on a x86 devicetree
platform such as Intel CE4100.
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Maxime Bizon <mbizon@freebox.fr>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: stable@kernel.org # 2.6.39
Link: http://lkml.kernel.org/r/201106061015.50039.ffainelli@freebox.fr
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We only need to set max_pfn_mapped to the last pfn mapped on x86_64 to
make sure that cleanup_highmap doesn't remove important mappings at
_end.
We don't need to do this on x86_32 because cleanup_highmap is not called
on x86_32. Besides lowering max_pfn_mapped on x86_32 has the unwanted
side effect of limiting the amount of memory available for the 1:1
kernel pagetable allocation.
This patch reverts the x86_32 part of the original patch.
CC: stable@kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
After a newly plugged CPU sets the cpu_online bit it enables
interrupts and goes idle. The cpu which brought up the new cpu waits
for the cpu_online bit and when it observes it, it sets the cpu_active
bit for this cpu. The cpu_active bit is the relevant one for the
scheduler to consider the cpu as a viable target.
With forced threaded interrupt handlers which imply forced threaded
softirqs we observed the following race:
cpu 0 cpu 1
bringup(cpu1);
set_cpu_online(smp_processor_id(), true);
local_irq_enable();
while (!cpu_online(cpu1));
timer_interrupt()
-> wake_up(softirq_thread_cpu1);
-> enqueue_on(softirq_thread_cpu1, cpu0);
^^^^
cpu_notify(CPU_ONLINE, cpu1);
-> sched_cpu_active(cpu1)
-> set_cpu_active((cpu1, true);
When an interrupt happens before the cpu_active bit is set by the cpu
which brought up the newly onlined cpu, then the scheduler refuses to
enqueue the woken thread which is bound to that newly onlined cpu on
that newly onlined cpu due to the not yet set cpu_active bit and
selects a fallback runqueue. Not really an expected and desirable
behaviour.
So far this has only been observed with forced hard/softirq threading,
but in theory this could happen without forced threaded hard/softirqs
as well. It's probably unobservable as it would take a massive
interrupt storm on the newly onlined cpu which causes the softirq loop
to wake up the softirq thread and an even longer delay of the cpu
which waits for the cpu_online bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org # 2.6.39
Some PCIe cards ship with a PCI-PCIe bridge which is not
visible as a PCI device in Linux. But the device-id of the
bridge is present in the IOMMU tables which causes a boot
crash in the IOMMU driver.
This patch fixes by removing these cards from the IOMMU
handling. This is a pure -stable fix, a real fix to handle
this situation appriatly will follow for the next merge
window.
Cc: stable@kernel.org # > 2.6.32
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Unfortunatly there are systems where the AMD IOMMU does not
cover all devices. This breaks with the current driver as it
initializes the global dma_ops variable. This patch limits
the AMD IOMMU to the devices listed in the IVRS table fixing
DMA for devices not covered by the IOMMU.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The driver contains several loops counting on an u16 value
where the exit-condition is checked against variables that
can have values up to 0xffff. In this case the loops will
never exit. This patch fixed 3 such loops.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Commit f6511935f4 moved the permission check for io instructions
to the ->check_perm callback. It failed to copy the port value from RDX
register for string and "in,out ax,dx" instructions.
Fix it by reading RDX register at decode stage when appropriate.
Fixes FC8.32 installation.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
b->args[] has MC_ARGS elements, so the comparison here should be
">=" instead of ">". Otherwise we read past the end of the array
one space.
CC: stable@kernel.org
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The flags field of struct resource from linux/ioport.h is "unsigned
long". Change the "type" parameter of coalesce_windows() function to
match that field. This fixes the following warning messages when
compiling with "make C=1 W=1 bzImage modules":
arch/x86/pci/acpi.c: In function ‘coalesce_windows’:
arch/x86/pci/acpi.c:198: warning: conversion to ‘long unsigned int’ from ‘int’ may change the sign of the result
arch/x86/pci/acpi.c:203: warning: conversion to ‘long unsigned int’ from ‘int’ may change the sign of the result
Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
A logic error in mwait_play_dead() causes the kernel to use
mwait even on cpus which don't support it, such as KVM virtual
cpus.
Introduced by:
349c004e3d31: x86: A fast way to check capabilities of the current cpu
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=36222
Reported-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1306758237-9327-1-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Without an IRQ chip set, we now get a WARN_ON and no timer interrupt. This
prevents booting.
Fortunately, the fix is a one-liner: set up the timer IRQ like everything
else.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .39.x
mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT on SMP hardware that supports it.
But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle().
Deprecate mwait_idle() and the "idle=mwait" cmdline param
to simplify the x86 idle code.
After this change, kernels configured with
(!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware
that support MWAIT will simply use HLT. If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be used.
cc: x86@kernel.org
cc: stable@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
We'd rather that modern machines not check if HLT works on
every entry into idle, for the benefit of machines that had
marginal electricals 15-years ago. If those machines are still running
the upstream kernel, they can use "idle=poll". The only difference
will be that they'll now invoke HLT in machine_hlt().
cc: x86@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
We don't want to export the pm_idle function pointer to modules.
Currently CONFIG_APM_CPU_IDLE w/ CONFIG_APM_MODULE forces us to.
CONFIG_APM_CPU_IDLE is of dubious value, it runs only on 32-bit
uniprocessor laptops that are over 10 years old. It calls into
the BIOS during idle, and is known to cause a number of machines
to fail.
Removing CONFIG_APM_CPU_IDLE and will allow us to stop exporting
pm_idle. Any systems that were calling into the APM BIOS
at run-time will simply use HLT instead.
cc: x86@kernel.org
cc: Jiri Kosina <jkosina@suse.cz>
cc: stable@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
In the long run, we don't want default_idle() or (pm_idle)() to
be exported outside of process.c. Start by not exporting them
to modules, unless the APM build demands it.
cc: x86@kernel.org
cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
The workaround for AMD erratum 400 uses the term "c1e" falsely suggesting:
1. Intel C1E is somehow involved
2. All AMD processors with C1E are involved
Use the string "amd_c1e" instead of simply "c1e" to clarify that
this workaround is specific to AMD's version of C1E.
Use the string "e400" to clarify that the workaround is specific
to AMD processors with Erratum 400.
This patch is text-substitution only, with no functional change.
cc: x86@kernel.org
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
32bit and 64bit on x86 are tested and working. The rest I have looked
at closely and I can't find any problems.
setns is an easy system call to wire up. It just takes two ints so I
don't expect any weird architecture porting problems.
While doing this I have noticed that we have some architectures that are
very slow to get new system calls. cris seems to be the slowest where
the last system calls wired up were preadv and pwritev. avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h. frv is
behind with perf_event_open being the last syscall wired up. On h8300
the last system call wired up was epoll_wait. On m32r the last system
call wired up was fallocate. mn10300 has recvmmsg as the last system
call wired up. The rest seem to at least have syncfs wired up which was
new in the 2.6.39.
v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
v7: ported to Linus's latest post 2.6.39 tree.
> arch/blackfin/include/asm/unistd.h | 3 ++-
> arch/blackfin/mach-common/entry.S | 1 +
Acked-by: Mike Frysinger <vapier@gentoo.org>
Oh - ia64 wiring looks good.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commit 44259b1abf
Author: Andy Lutomirski <luto@MIT.EDU>
x86-64: Move vread_tsc into a new file with sensible options
Removed the -pg from tsc.o which caused the function graph tracer
to go into an infinite function call recursion as it uses the tsc
internally outside its recursion protection, thus tracing the tsc
breaks the function graph tracer.
This commit also added the file vread_tsc_64.c that gets used
by vdso but failed to prevent GCOV from monkeying with it,
causing userspace to try to access kernel data when GCOV was
enabled.
Thanks to Thomas Gleixner for pointing out GCOV as the likely
culprit that added strange kernel accesses into the vread_tsc()
call.
Cc: Author: Andy Lutomirski <luto@MIT.EDU>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
By the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT,
CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are not used
to test for existence of find bitops anymore.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Blackfin arch, like the x86 arch, needs to adjust the PC manually
after a breakpoint is hit as normally this is handled by the remote gdb.
However, rather than starting another arch ifdef mess, create a common
GDB_ADJUSTS_BREAK_OFFSET define for any arch to opt-in via their kgdb.h.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Dongdong Deng <dongdong.deng@windriver.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>