If oprofilefs_ulong_from_user() is called with count equals zero, *val
remains unchanged. Depending on the implementation it might be
uninitialized. Fixing users of oprofilefs_ulong_ from_user().
We missed these s390 changes with:
913050b oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Robert Richter <robert.richter@amd.com>
The following patch makes the microcode update code path
actually invoke the perf_check_microcode() function and
thus potentially renabling SNB PEBS.
By default, CONFIG_MICROCODE_OLD_INTERFACE is
forced to Y in arch/x86/Kconfig. There is no
way to disable this. That means that the code
path used in arch/x86/kernel/microcode_core.c
did not include the call to perf_check_microcode().
Thus, even though the microcode was updated to a
version that fixes the SNB PEBS problem, perf_event
would still return EOPNOTSUPP when enabling precise
sampling.
This patch simply adds a call to perf_check_microcode()
in the call path used when OLD_INTERFACE=y.
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: peterz@infradead.org
Cc: andi@firstfloor.org
Link: http://lkml.kernel.org/r/20120824133434.GA8014@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit b1acf1bb54.
Something went horribly wrong when I did savedefconfig, not sure what,
but what's in there is busted so let's revert it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For certain speculative events on Power7, 'perf stat' reports far higher
event count than 'perf record' for the same event.
As described in following commit, a performance monitor exception is raised
even when the the performance events are rolled back.
commit 0837e3242c
Author: Anton Blanchard <anton@samba.org>
Date: Wed Mar 9 14:38:42 2011 +1100
perf_event_interrupt() records an event only when an overflow occurs. But
this check for overflow is a simple 'if (val < 0)'.
Because the events are rolled back, this check for overflow fails and the
event is not recorded. perf_event_interrupt() later uses pmc_overflow() to
detect the overflow and resets the counters and the events are lost completely.
To properly detect the overflow of rolled back events, use pmc_overflow()
even when recording events.
To reproduce:
$ cat strcpy.c
#include <stdio.h>
#include <string.h>
main()
{
char buf[256];
alarm(5);
while(1)
strcpy(buf, "string1");
}
$ perf record -e r20014 ./strcpy
$ perf report -n > report.1
$ perf stat -e r20014 > report.2
# Compare report.1 and report.2
Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The enhanced prefetch hint patches corrupt the condition register
that was used to check if we are in interrupt. Fix this by using cr1.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
"powerpc: Use enhanced touch instructions in POWER7
copy_to_user/copy_from_user" was applied twice. Remove one.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Directly comparing current->personality against PER_LINUX32 doesn't work
in cases when any of the personality flags stored in the top three bytes
are used.
Directly forcefully setting personality to PER_LINUX32 or PER_LINUX
discards any flags stored in the top three bytes
Use personality() macro to compare only PER_MASK bytes and make sure that
we are setting only the bits that should be set, instead of overwriting
the whole value.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Checking for device mask to cover the whole IOMMU table is too strict.
IOMMU allocators should handle mask constraint properly for each
allocation.
The patch enables to use old AirPort Extreme cards on PowerMacs with
more than 1GB of memory; without the patch the driver init fails with:
b43-pci-bridge 0001:01:01.0: Warning: IOMMU window too big for device mask
b43-pci-bridge 0001:01:01.0: mask: 0x3fffffff, table end: 0x80000000
b43-phy0 ERROR: The machine/kernel does not support the required 30-bit DMA mask
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For powerpc BooKE and e200, singlestep is handled on the critical/dbg
exception stack. This causes current_thread_info() to fail for kgdb
internal, so previously We work around this issue by copying
the thread_info from the kernel stack before calling kgdb_handle_exception,
and copying it back afterwards.
But actually we don't do this properly. We should backup current_thread_info
then restore that when exit.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to skip a breakpoint exception when it occurs after
a breakpoint has already been removed.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The kgdb_single_step flag has the possibility to indefinitely
hang the system on an SMP system.
The x86 arch have the same problem, and that problem was fixed by
commit 8097551d9ab9b9e3630(kgdb,x86: do not set kgdb_single_step
on x86). This patch does the same behaviors as x86's patch.
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add several #includes that mpic_msgr relies on being pulled implicitly,
which only happens on certain configs.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Meador Inge <meador_inge@mentor.com>
Cc: Jia Hongtao <B38951@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently if you are doing a global perf recording with hardware
breakpoints (ie perf record -e mem:0xdeadbeef -a), you can oops with:
Faulting instruction address: 0xc000000000738890
cpu 0xc: Vector: 300 (Data Access) at [c0000003f76af8d0]
pc: c000000000738890: .hw_breakpoint_handler+0xa0/0x1e0
lr: c000000000738830: .hw_breakpoint_handler+0x40/0x1e0
sp: c0000003f76afb50
msr: 8000000000001032
dar: 6f0
dsisr: 42000000
current = 0xc0000003f765ac00
paca = 0xc00000000f262a00 softe: 0 irq_happened: 0x01
pid = 6810, comm = loop-read
enter ? for help
[c0000003f76afbe0] c00000000073cd04 .notifier_call_chain.isra.0+0x84/0xe0
[c0000003f76afc80] c00000000073cdbc .notify_die+0x3c/0x60
[c0000003f76afd20] c0000000000139f0 .do_dabr+0x40/0xf0
[c0000003f76afe30] c000000000005a9c handle_dabr_fault+0x14/0x48
--- Exception: 300 (Data Access) at 0000000010000480
SP (ff8679e0) is in userspace
This is because we don't check to see if the break point is associated
with task before we deference the task_struct pointer.
This changes the update to use current.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are a few whitespace goolies in xmon.c, some of them appear to
be my fault. Fix them all in one go.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since the printk internals were reworked the xmon 'dl' command which
dumps the content of __log_buf has stopped working.
It is now a structured buffer, so just dumping it doesn't really work.
Use the helpers added for kgdb to print out the content.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Yocto (Built by Poky 7.0) 1.2 root filesystems fail to boot,
at least over nfs, with:
Failed to mount /dev: No such device
Configuring DEVTMPFS fixes it.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
run make savedefconfig on fsl defconfigs.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Using 'select' in Kconfig is hard, a platform cannot just
enable a driver without also making sure that its subsystem
is there. Also, there is no actual code dependency between
the platform and the gpio leds driver.
Without this patch, building without LEDS_CLASS esults in:
drivers/built-in.o: In function `create_gpio_led.part.2':
governor_userspace.c:(.devinit.text+0x5a58): undefined reference to `led_classdev_register'
drivers/built-in.o: In function `gpio_led_remove':
governor_userspace.c:(.devexit.text+0x6b8): undefined reference to `led_classdev_unregister'
This reverts 8733f53c6 "ARM: ux500: Kconfig: Compile in leds-gpio
support for Snowball" that introduced the regression and did not
provide a helpful explanation.
In order to leave the GPIO LED code still present in normal
builds, this also enables the symbol in u8500_defconfig, in addition
to the other LED drivers that are already selected there.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Lee Jones <lee.jones@linaro.org>
The head-v7.S contains a call to the generic cpu_suspend function,
which is only available when selected by the i.MX6 code. As
pointed out by Shawn Guo, i.MX5 does not actually use any
functions defined in head-v7.S. It is also needed only for
the i.MX6 power management code and for the SMP code, so
we can restrict building this file to situations in which
at least one of those two is present.
Finally, other platforms with a similar file call it headsmp.S,
so we can rename it to the same for consistency.
Without this patch, building imx5 standalone results in:
arch/arm/mach-imx/built-in.o: In function `v7_cpu_resume':
arch/arm/mach-imx/head-v7.S:104: undefined reference to `cpu_resume'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Eric Miao <eric.miao@linaro.org>
Cc: stable@vger.kernel.org
If the kernel is compiled with gcc 4.6.0 which supports -mfentry,
then use that instead of mcount.
With mcount, frame pointers are forced with the -pg option and we
get something like:
<can_vma_merge_before>:
55 push %rbp
48 89 e5 mov %rsp,%rbp
53 push %rbx
41 51 push %r9
e8 fe 6a 39 00 callq ffffffff81483d00 <mcount>
31 c0 xor %eax,%eax
48 89 fb mov %rdi,%rbx
48 89 d7 mov %rdx,%rdi
48 33 73 30 xor 0x30(%rbx),%rsi
48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi
With -mfentry, frame pointers are no longer forced and the call looks
like this:
<can_vma_merge_before>:
e8 33 af 37 00 callq ffffffff81461b40 <__fentry__>
53 push %rbx
48 89 fb mov %rdi,%rbx
31 c0 xor %eax,%eax
48 89 d7 mov %rdx,%rdi
41 51 push %r9
48 33 73 30 xor 0x30(%rbx),%rsi
48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi
This adds the ftrace hook at the beginning of the function before a
frame is set up, and allows the function callbacks to be able to access
parameters. As kprobes now can use function tracing (at least on x86)
this speeds up the kprobe hooks that are at the beginning of the
function.
Link: http://lkml.kernel.org/r/20120807194100.130477900@goodmis.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The i.MX cpufreq implementation uses the CPU_FREQ_TABLE helpers,
so it needs to select that code to be built. This problem has
apparently existed since the i.MX cpufreq code was first merged
in v2.6.37.
Building IMX without CPU_FREQ_TABLE results in:
arch/arm/plat-mxc/built-in.o: In function `mxc_cpufreq_exit':
arch/arm/plat-mxc/cpufreq.c:173: undefined reference to `cpufreq_frequency_table_put_attr'
arch/arm/plat-mxc/built-in.o: In function `mxc_set_target':
arch/arm/plat-mxc/cpufreq.c:84: undefined reference to `cpufreq_frequency_table_target'
arch/arm/plat-mxc/built-in.o: In function `mxc_verify_speed':
arch/arm/plat-mxc/cpufreq.c:65: undefined reference to `cpufreq_frequency_table_verify'
arch/arm/plat-mxc/built-in.o: In function `mxc_cpufreq_init':
arch/arm/plat-mxc/cpufreq.c:154: undefined reference to `cpufreq_frequency_table_cpuinfo'
arch/arm/plat-mxc/cpufreq.c:162: undefined reference to `cpufreq_frequency_table_get_attr'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Yong Shen <yong.shen@linaro.org>
Cc: stable@vger.kernel.org
The ksz9021rn_phy_fixup and mx6q_sabrelite functions try to
set up an ethernet phy if they can. They do check whether
phylib is enabled, but unfortunately the functions can only
be called from platform code if phylib is builtin, not
if it is a module
Without this patch, building with a modular phylib results in:
arch/arm/mach-imx/mach-imx6q.c: In function 'imx6q_sabrelite_init':
arch/arm/mach-imx/mach-imx6q.c:120:5: error: 'ksz9021rn_phy_fixup' undeclared (first use in this function)
arch/arm/mach-imx/mach-imx6q.c:120:5: note: each undeclared identifier is reported only once for each function it appears in
The bug was originally reported by Artem Bityutskiy but only
partially fixed in ef441806 "ARM: imx6q: register phy fixup only when
CONFIG_PHYLIB is enabled".
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
This moves the imx5 pm code out of the list of unconditionally
compiled files for imx5, mirroring what we already do for imx6
and how it was done before the code was move from mach-mx5 to
mach-imx in v3.3.
Without this patch, building with CONFIG_PM disabled results in:
arch/arm/mach-imx/pm-imx5.c:202:116: error: redefinition of 'imx51_pm_init'
arch/arm/mach-imx/include/mach-imx/common.h:154:91: note: previous definition of 'imx51_pm_init' was here
arch/arm/mach-imx/pm-imx5.c:209:116: error: redefinition of 'imx53_pm_init'
arch/arm/mach-imx/include/mach-imx/common.h:155:91: note: previous definition of 'imx53_pm_init' was here
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: stable@vger.kernel.org
The new omap4 cpuidle implementation currently requires
ARCH_NEEDS_CPU_IDLE_COUPLED, which only works on SMP.
This patch makes it possible to build a non-SMP kernel
for that platform. This is not normally desired for
end-users but can be useful for testing.
Without this patch, building rand-0y2jSKT results in:
drivers/cpuidle/coupled.c: In function 'cpuidle_coupled_poke':
drivers/cpuidle/coupled.c:317:3: error: implicit declaration of function '__smp_call_function_single' [-Werror=implicit-function-declaration]
It's not clear if this patch is the best solution for
the problem at hand. I have made sure that we can now
build the kernel in all configurations, but that does
not mean it will actually work on an OMAP44xx.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
When we are finished with return PFNs to the hypervisor, then
populate it back, and also mark the E820 MMIO and E820 gaps
as IDENTITY_FRAMEs, we then call P2M to set areas that can
be used for ballooning. We were off by one, and ended up
over-writting a P2M entry that most likely was an IDENTITY_FRAME.
For example:
1-1 mapping on 40000->40200
1-1 mapping on bc558->bc5ac
1-1 mapping on bc5b4->bc8c5
1-1 mapping on bc8c6->bcb7c
1-1 mapping on bcd00->100000
Released 614 pages of unused memory
Set 277889 page(s) to 1-1 mapping
Populating 40200-40466 pfn range: 614 pages added
=> here we set from 40466 up to bc559 P2M tree to be
INVALID_P2M_ENTRY. We should have done it up to bc558.
The end result is that if anybody is trying to construct
a PTE for PFN bc558 they end up with ~PAGE_PRESENT.
CC: stable@vger.kernel.org
Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If the controller has no PCIe module attached, accessing of the device
configuration space causes a data bus error. Avoid this by checking the
status of the PCIe link in advance, and indicate an error if the link
is down.
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Cc: stable@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4293/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The binding doc and dts use properties "fsl,{cd,wp}-internal" while
esdhc driver uses "fsl,{cd,wp}-controller". Fix binding doc and dts
to get them match driver code.
Reported-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Acked-by: Chris Ball <cjb@laptop.org>
Though commit 602bf40 (ARM: imx6: exit coherency when shutting down
a cpu) improves the stability of imx6q cpu hotplug a lot, there are
still hangs seen with a more stressful hotplug testing.
It's expected that once imx_enable_cpu(cpu, false) is called, the cpu
will be taken down by hardware immediately, and the code after that
will not get any chance to execute. However, this is not always the
case from the testing. The cpu could possibly be alive for a few
cycles before hardware actually takes it down. So rather than letting
cpu execute some code that could cause a hang in these cycles, let's
make the cpu spin there and wait for hardware to take it down.
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
This issue was recently observed on an AMD C-50 CPU where a patch of
maximum size was applied.
Commit be62adb492 ("x86, microcode, AMD: Simplify ucode verification")
added current_size in get_matching_microcode(). This is calculated as
size of the ucode patch + 8 (ie. size of the header). Later this is
compared against the maximum possible ucode patch size for a CPU family.
And of course this fails if the patch has already maximum size.
Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1344361461-10076-1-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The sub-register used to access the stack (sp, esp, or rsp) is not
determined by the address size attribute like other memory references,
but by the stack segment's B bit (if not in x86_64 mode).
Fix by using the existing stack_mask() to figure out the correct mask.
This long-existing bug was exposed by a combination of a27685c33a
(emulate invalid guest state by default), which causes many more
instructions to be emulated, and a seabios change (possibly a bug) which
causes the high 16 bits of esp to become polluted across calls to real
mode software interrupts.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Although the possible race described in
commit 85b7059169
KVM: MMU: fix shrinking page from the empty mmu
was correct, the real cause of that issue was a more trivial bug of
mmu_shrink() introduced by
commit 1952639665
KVM: MMU: do not iterate over all VMs in mmu_shrink()
Here is the bug:
if (kvm->arch.n_used_mmu_pages > 0) {
if (!nr_to_scan--)
break;
continue;
}
We skip VMs whose n_used_mmu_pages is not zero and try to shrink others:
in other words we try to shrink empty ones by mistake.
This patch reverses the logic so that mmu_shrink() can free pages from
the first VM whose n_used_mmu_pages is not zero. Note that we also add
comments explaining the role of nr_to_scan which is not practically
important now, hoping this will be improved in the future.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Probably a leftover from the early days of self-patching, p6nops
are marked __initconst_or_module, which causes them to be
discarded in a non-modular kernel. If something later triggers
patching, it will overwrite kernel code with garbage.
Reported-by: Tomas Racek <tracek@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: qemu-devel@nongnu.org
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When one CPU is going down and this CPU is the last one in irq
affinity, current code is setting cpu_all_mask as the new
affinity for that irq.
But for some systems (such as in Medfield Android mobile) the
firmware sends the interrupt to each CPU in the irq affinity
mask, averaged, and cpu_all_mask includes all potential CPUs,
i.e. offline ones as well.
So replace cpu_all_mask with cpu_online_mask.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This comment is no longer true. We support up to 2^16 CPUs
because __ticket_t is an u16 if NR_CPUS is larger than 256.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Each page mapped in a process's address space must be correctly
accounted for in _mapcount. Normally the rules for this are
straightforward but hugetlbfs page table sharing is different. The page
table pages at the PMD level are reference counted while the mapcount
remains the same.
If this accounting is wrong, it causes bugs like this one reported by
Larry Woodman:
kernel BUG at mm/filemap.c:135!
invalid opcode: 0000 [#1] SMP
CPU 22
Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
Pid: 18001, comm: mpitest Tainted: G W 3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
RIP: 0010:[<ffffffff8112cfed>] [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
Call Trace:
delete_from_page_cache+0x40/0x80
truncate_hugepages+0x115/0x1f0
hugetlbfs_evict_inode+0x18/0x30
evict+0x9f/0x1b0
iput_final+0xe3/0x1e0
iput+0x3e/0x50
d_kill+0xf8/0x110
dput+0xe2/0x1b0
__fput+0x162/0x240
During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
shared page tables with the check dst_pte == src_pte. The logic is if
the PMD page is the same, they must be shared. This assumes that the
sharing is between the parent and child. However, if the sharing is
with a different process entirely then this check fails as in this
diagram:
parent
|
------------>pmd
src_pte----------> data page
^
other--------->pmd--------------------|
^
child-----------|
dst_pte
For this situation to occur, it must be possible for Parent and Other to
have faulted and failed to share page tables with each other. This is
possible due to the following style of race.
PROC A PROC B
copy_hugetlb_page_range copy_hugetlb_page_range
src_pte == huge_pte_offset src_pte == huge_pte_offset
!src_pte so no sharing !src_pte so no sharing
(time passes)
hugetlb_fault hugetlb_fault
huge_pte_alloc huge_pte_alloc
huge_pmd_share huge_pmd_share
LOCK(i_mmap_mutex)
find nothing, no sharing
UNLOCK(i_mmap_mutex)
LOCK(i_mmap_mutex)
find nothing, no sharing
UNLOCK(i_mmap_mutex)
pmd_alloc pmd_alloc
LOCK(instantiation_mutex)
fault
UNLOCK(instantiation_mutex)
LOCK(instantiation_mutex)
fault
UNLOCK(instantiation_mutex)
These two processes are not poing to the same data page but are not
sharing page tables because the opportunity was missed. When either
process later forks, the src_pte == dst pte is potentially insufficient.
As the check falls through, the wrong PTE information is copied in
(harmless but wrong) and the mapcount is bumped for a page mapped by a
shared page table leading to the BUG_ON.
This patch addresses the issue by moving pmd_alloc into huge_pmd_share
which guarantees that the shared pud is populated in the same critical
section as pmd. This also means that huge_pte_offset test in
huge_pmd_share is serialized correctly now which in turn means that the
success of the sharing will be higher as the racing tasks see the pud
and pmd populated together.
Race identified and changelog written mostly by Mel Gorman.
{akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
Reported-by: Larry Woodman <lwoodman@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 7c5763b845 (drivers:misc: Remove MISC_DEVICES config option) removed
CONFIG_MISC_DEVICES option, so remove the occurrences from the config files
as well.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Commit ec2212088c ("Disintegrate asm/system.h for Alpha") removed
asm/system.h however arch/alpha/oprofile/common.c requires definitions
that were shifted from asm/system.h to asm/special_insns.h. Include
that.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following build error occurred during an alpha build:
net/core/sock.c:274:36: error: initializer element is not constant
Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });
The above line contains two compound literals. It also uses a designated
initializer to initialize the field enabled. A compound literal is not a
constant expression.
The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.
Cc: <stable@vger.kernel.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit ec2212088c ("Disintegrate asm/system.h for Alpha"), the
fpu.h header which we install for userland started depending on
special_insns.h which is not installed.
However, fpu.h only uses that for __KERNEL__ code, so protect the
inclusion the same way to avoid build breakage in glibc:
/usr/include/asm/fpu.h:4:31: fatal error: asm/special_insns.h: No such file or directory
Cc: stable@vger.kernel.org
Reported-by: Matt Turner <mattst88@gentoo.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit d065bd810b ("mm: retry page fault when blocking on disk
transfer") and 37b23e0525 ("x86,mm: make pagefault killable")
introduced changes into the x86 pagefault handler for making the page
fault handler retryable as well as killable.
These changes reduce the mmap_sem hold time, which is crucial during OOM
killer invocation.
Port these changes to ALPHA.
Signed-off-by: Mohd. Faris <mohdfarisq2010@gmail.com>
Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
New helper: current_thread_info(). Allows to do a bunch of odd syscalls
in C. While we are at it, there had never been a reason to do
osf_getpriority() in assembler. We also get "namespace"-aware (read:
consistent with getuid(2), etc.) behaviour from getx?id() syscalls now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to x86/sparc/powerpc implementations except:
1) we implement an extremely efficient has_zero()/find_zero()
sequence with both prep_zero_mask() and create_zero_mask()
no-operations.
2) Our output from prep_zero_mask() differs in that only the
lowest eight bits are used to represent the zero bytes
nevertheless it can be safely ORed with other similar masks
from prep_zero_mask() and forms input to create_zero_mask(),
the two fundamental properties prep_zero_mask() must satisfy.
Tests on EV67 and EV68 CPUs revealed that the generic code is
essentially as fast (to within 0.5% of CPU cycles) of the old
Alpha specific code for large quadword-aligned strings, despite
the 30% extra CPU instructions executed. In contrast, the
generic code for unaligned strings is substantially slower (by
more than a factor of 3) than the old Alpha specific code.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add sys_process_vm_readv and sys_process_vm_writev to Alpha.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we export SOCK_NONBLOCK to user space but that conflicts with
the definition from glibc leading to compilation errors in user programs
(e.g. see Debian bug #658460).
The generic socket.h restricts the definition of SOCK_NONBLOCK to the
kernel, as does the MIPS specific socket.h, so let's do the same on
Alpha.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some of the arguments to {g,s}etsockopt are passed in userland pointers.
If we try to use the 64bit entry point, we end up sometimes failing.
For example, dhcpcd doesn't run in x32:
# dhcpcd eth0
dhcpcd[1979]: version 5.5.6 starting
dhcpcd[1979]: eth0: broadcasting for a lease
dhcpcd[1979]: eth0: open_socket: Invalid argument
dhcpcd[1979]: eth0: send_raw_packet: Bad file descriptor
The code in particular is getting back EINVAL when doing:
struct sock_fprog pf;
setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &pf, sizeof(pf));
Diving into the kernel code, we can see:
include/linux/filter.h:
struct sock_fprog {
unsigned short len;
struct sock_filter __user *filter;
};
net/core/sock.c:
case SO_ATTACH_FILTER:
ret = -EINVAL;
if (optlen == sizeof(struct sock_fprog)) {
struct sock_fprog fprog;
ret = -EFAULT;
if (copy_from_user(&fprog, optval, sizeof(fprog)))
break;
ret = sk_attach_filter(&fprog, sk);
}
break;
arch/x86/syscalls/syscall_64.tbl:
54 common setsockopt sys_setsockopt
55 common getsockopt sys_getsockopt
So for x64, sizeof(sock_fprog) is 16 bytes. For x86/x32, it's 8 bytes.
This comes down to the pointer being 32bit for x32, which means we need
to do structure size translation. But since x32 comes in directly to
sys_setsockopt, it doesn't get translated like x86.
After changing the syscall table and rebuilding glibc with the new kernel
headers, dhcp runs fine in an x32 userland.
Oddly, it seems like Linus noted the same thing during the initial port,
but I guess that was missed/lost along the way:
https://lkml.org/lkml/2011/8/26/452
[ hpa: tagging for -stable since this is an ABI fix. ]
Bugzilla: https://bugs.gentoo.org/423649
Reported-by: Mads <mads@ab3.no>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Link: http://lkml.kernel.org/r/1345320697-15713-1-git-send-email-vapier@gentoo.org
Cc: H. J. Lu <hjl.tools@gmail.com>
Cc: <stable@vger.kernel.org> v3.4..v3.5
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
If P2M leaf is completly packed with INVALID_P2M_ENTRY or with
1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
with either a p2m_missing or p2m_identity respectively. The old
page (which was created via extend_brk or was grafted on from the
mfn_list) can be re-used for setting new PFNs.
This also means we can remove git commit:
5bc6f9888d
xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back
which tried to fix this.
and make the amount that is required to be reserved much smaller.
CC: stable@vger.kernel.org # for 3.5 only.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Previous attempts to add platform probing of the Audio related devices
only call from non-DT initialisation functions. This patch extends that
functionality to the Device Tree related ones too.
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The platform attempts to register platform device 'snd_soc_u8500'
which doesn't actually exist. Here we change the reference to the
correct one 'snd_soc_mop500'.
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>