Given enough debug options some modules can grow large enough
that the GOT table gets bigger than 4K. On s390 the modules
are compiled with -fpic which limits the GOT to 4K. The end
result is a module that is loaded but won't work.
Add a sanity check to apply_rela and return with an error if
a relocation error is detected for a module.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
__ARCH_WANT_SYS_RT_SIGACTION,
__ARCH_WANT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore
CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} -
can be assumed always set.
There are two ways to express an interruption subclass:
- As a bitmask, as used in cr6.
- As a number, as used in the I/O interruption word.
Unfortunately, we have treated the I/O interruption word as if it
contained the bitmask as well, which went unnoticed so far as
- (not-yet-released) qemu made the same mistake, and
- Linux guest kernels don't check the isc value in the I/O interruption
word for subchannel interrupts.
Make sure that we treat the I/O interruption word correctly.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Only alpha and sparc are unusual - they have ka_restorer in it.
And nobody needs that exposed to userland.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Since ed4f209 "s390/time: fix sched_clock() overflow" a new helper function
is used to avoid overflows when converting TOD format values to nanosecond
values.
The kvm interrupt code formerly however only worked by accident because of
an overflow. It tried to program a timer that would expire in more than ~29
years. Because of the old TOD-to-nanoseconds overflow bug the real expiry
value however was much smaller, but now it isn't anymore.
This however triggers yet another bug in the function that programs the clock
comparator s390_next_ktime(): if the absolute "expires" value is after 2042
this will result in an overflow and the programmed value is lower than the
current TOD value which immediatly triggers a clock comparator (= timer)
interrupt.
Since the timer isn't expired it will be programmed immediately again and so
on... the result is a dead system.
To fix this simply program the maximum possible value if an overflow is
detected.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v3.3+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Instructions with long displacement have a signed displacement.
Currently the sign bit is interpreted as 2^20: Lets fix it by doing the
sign extension from 20bit to 32bit and then use it as a signed variable
in the addition (see kvm_s390_get_base_disp_rsy).
Furthermore, there are lots of "int" in that code. This is problematic,
because shifting on a signed integer is undefined/implementation defined
if the bit value happens to be negative.
Fortunately the promotion rules will make the right hand side unsigned
anyway, so there is no real problem right now.
Let's convert them anyway to unsigned where appropriate to avoid
problems if the code is changed or copy/pasted later on.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
On store status we need to copy the current state of registers
into a save area. Currently we might save stale versions:
The sie state descriptor doesnt have fields for guest ACRS,FPRS,
those registers are simply stored in the host registers. The host
program must copy these away if needed. We do that in vcpu_put/load.
If we now do a store status in KVM code between vcpu_put/load, the
saved values are not up-to-date. Lets collect the ACRS/FPRS before
saving them.
This also fixes some strange problems with hotplug and virtio-ccw,
since the low level machine check handler (on hotplug a machine check
will happen) will revalidate all registers with the content of the
save area.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
While remotely reading the cputime of a task running in a
full dynticks CPU, the values stored in utime/stime fields
of struct task_struct may be stale. Its values may be those
of the last kernel <-> user transition time snapshot and
we need to add the tickless time spent since this snapshot.
To fix this, flush the cputime of the dynticks CPUs on
kernel <-> user transition and record the time / context
where we did this. Then on top of this snapshot and the current
time, perform the fixup on the reader side from task_times()
accessors.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[fixed kvm module related build errors]
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Definitions and macros for implementing soreusport.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On s390, an architecture-specific implementation of the function
pmdp_set_wrprotect() is missing and the generic version is currently
being used. The generic version does not flush the tlb as it would be
needed on s390 when modifying an active pmd, which can lead to subtle
tlb errors on s390 when using transparent hugepages.
This patch adds an s390-specific implementation of pmdp_set_wrprotect()
including the missing tlb flush.
Cc: stable@vger.kernel.org
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
While a privileged program can open a raw socket, attach some
restrictive filter and drop its privileges (or send the socket to an
unprivileged program through some Unix socket), the filter can still
be removed or modified by the unprivileged program. This commit adds a
socket option to lock the filter (SO_LOCK_FILTER) preventing any
modification of a socket filter program.
This is similar to OpenBSD BIOCLOCK ioctl on bpf sockets, except even
root is not allowed change/drop the filter.
The state of the lock can be read with getsockopt(). No error is
triggered if the state is not changed. -EPERM is returned when a user
tries to remove the lock or to change/remove the filter while the lock
is active. The check is done directly in sk_attach_filter() and
sk_detach_filter() and does not affect only setsockopt() syscall.
Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
the variable inti should be freed in the branch CPUSTAT_STOPPED.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Converting a 64 Bit TOD format value to nanoseconds means that the value
must be divided by 4.096. In order to achieve that we multiply with 125
and divide by 512.
When used within sched_clock() this triggers an overflow after appr.
417 days. Resulting in a sched_clock() return value that is much smaller
than previously and therefore may cause all sort of weird things in
subsystems that rely on a monotonic sched_clock() behaviour.
To fix this implement a tod_to_ns() helper function which converts TOD
values without overflow and call this function from both places that
open coded the conversion: sched_clock() and kvm_s390_handle_wait().
Cc: stable@kernel.org
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The xfs module uses a lot of tracepoint, with TRACEPOINTS=y and a
few debugging options the GOT table of the xfs module will get
bigger than 4K. To get a working xfs module it needs to be compiled
with -fPIC instead of -fpic. To play safe use -fPIC for all modules.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The pfn calculation in pmd_pfn() is broken for thp, because it uses
HPAGE_SHIFT instead of the normal PAGE_SHIFT. This is fixed by removing
the distinction between thp and normal pmds in that function, and always
using PAGE_SHIFT.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
commit b080935c86
kvm: Directly account vtime to system on guest switch
also removed the irq_disable/enable around kvm guest switch, which
is correct in itself. Unfortunately, there is a BUG ON that (correctly)
checks for preemptible to cover the call to rcu later on.
(Introduced with commit 8fa2206821
KVM: make guest mode entry to be rcu quiescent state)
This check might trigger depending on the kernel config.
Lets make sure that no preemption happens during kvm_guest_enter.
We can enable preemption again after the call to
rcu_virt_note_context_switch returns.
Please note that we continue to run s390 guests with interrupts
enabled.
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Just map the read*_relaxed() functions to their corresponding read*() functions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Export cpu_topology symbol, so it's available for modules.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Export pm_power_off symbol. Needed by at least one of the new device
drivers that come with CONFIG_PCI.
And all other architectures export that symbol as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Define isa_dma_bridge_buggy. Needed to make pci quirks compile:
drivers/pci/quirks.c: In function ‘quirk_isa_dma_hangs’:
drivers/pci/quirks.c:88:7: error: ‘isa_dma_bridge_buggy’ undeclared (first use in this function)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Count CPU Restart events and make them visible via /proc/interrupts.
Every CPU hotplug (online) event will increase the per cpu counter.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that irq sum accounting for /proc/stat's "intr" line works again we
have the oddity that the sum field (first field) contains only the sum
of the second (external irqs) and third field (I/O interrupts).
The reason for that is that these two fields are already sums of all other
fields. So if we would sum up everything we would count every interrupt
twice.
This is broken since the split interrupt accounting was merged two years
ago: 052ff461c8 "[S390] irq: have detailed
statistics for interrupt types".
To fix this remove the split interrupt fields from /proc/stat's "intr"
line again and only have them in /proc/interrupts.
This restores the old behaviour, seems to be the only sane fix and mimics
a behaviour from other architectures where /proc/interrupts also contains
more than /proc/stat's "intr" line does.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For more than two years, since f2c66cd8ee
"/proc/stat: scalability of irq num per cpu" the output of /proc/stat is
broken.
The first field in the "intr" line should contain the sum of all interrupts,
however since the above mentioned change it is always zero.
The reason for that is that a per cpu irq sum variable had been introduced
which got incremented when calling kstat_incr_irqs_this_cpu(). However
on s390 we directly incremented only the per cpu per irq counter by accessing
the array element via kstat_cpu(smp_processor_id()).irqs[...].
So fix this and use the kstat_incr_irqs_this_cpu() wrapper which increments
both: the per cpu per irq counter and the per cpu irq sum counter.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Get rid of these:
arch/s390/pci/pci_dma.c:16:29: warning: ‘zpci_ioat_dt’ defined but not used [-Wunused-variable]
arch/s390/pci/pci.c:164:12: warning: ‘zpci_store_fib’ defined but not used [-Wunused-function]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixes this section mismatch:
WARNING: vmlinux.o(.text+0x145e4): Section mismatch in reference from the function
smp_add_present_cpu() to the function .cpuinit.text:register_cpu()
The function smp_add_present_cpu() references
the function __cpuinit register_cpu().
This is often because smp_add_present_cpu lacks a __cpuinit
annotation or the annotation of register_cpu is wrong.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The debug_register/unregister_view() functions call debugfs_remove()
while holding the debug_info spinlock. Because debugfs_remove() takes
a mutex and therefore can sleep this is not allowed. To fix the problem
we give up the debug_info lock before calling debugfs_remove().
The following shows the lockdep message:
[ INFO: possible circular locking dependency detected ]
-------------------------------------------------------
rmmod/4379 is trying to acquire lock:
(&sb->s_type->i_mutex_key#2){+.+.+.}, at: [<00000000003acae2>] debugfs_remove+0x5e/0xa
but task is already holding lock:
(&(&rc->lock)->rlock){-.-...}, at: [<000000000010a5ae>] debug_unregister_view+0x3a/0xd
which lock already depends on the new lock.
-> #0 (&sb->s_type->i_mutex_key#2){+.+.+.}:
[<00000000001b1644>] validate_chain+0x880/0x1154
[<00000000001b4d6c>] __lock_acquire+0x414/0xc44
[<00000000001b5c16>] lock_acquire+0xbe/0x178
[<0000000000614016>] mutex_lock_nested+0x66/0x36c
[<00000000003acae2>] debugfs_remove+0x5e/0xac
[<000000000010a620>] debug_unregister_view+0xac/0xd0
[<000003ff8002f140>] qeth_core_exit+0x48/0xf08 [qeth]
[<00000000001c35a4>] SyS_delete_module+0x1a4/0x260
[<0000000000618134>] sysc_noemu+0x22/0x28
[<000003fffd4704da>] 0x3fffd4704da
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a new capability, KVM_CAP_S390_CSS_SUPPORT, which will pass
intercepts for channel I/O instructions to userspace. Only I/O
instructions interacting with I/O interrupts need to be handled
in-kernel:
- TEST PENDING INTERRUPTION (tpi) dequeues and stores pending
interrupts entirely in-kernel.
- TEST SUBCHANNEL (tsch) dequeues pending interrupts in-kernel
and exits via KVM_EXIT_S390_TSCH to userspace for subchannel-
related processing.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Explicitely catch all channel I/O related instructions intercepts
in the kernel and set condition code 3 for them.
This paves the way for properly handling these instructions later
on.
Note: This is not architecture compliant (the previous code wasn't
either) since setting cc 3 is not the correct thing to do for some
of these instructions. For Linux guests, however, it still has the
intended effect of stopping css probing.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Add support for injecting machine checks (only repressible
conditions for now).
This is a bit more involved than I/O interrupts, for these reasons:
- Machine checks come in both floating and cpu varieties.
- We don't have a bit for machine checks enabling, but have to use
a roundabout approach with trapping PSW changing instructions and
watching for opened machine checks.
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Add support for handling I/O interrupts (standard, subchannel-related
ones and rudimentary adapter interrupts).
The subchannel-identifying parameters are encoded into the interrupt
type.
I/O interrupts are floating, so they can't be injected on a specific
vcpu.
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
These tables are never modified.
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This fixes up all of the smaller arches that had __dev* markings for
their platform-specific drivers.
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, __devinitdata,
__devinitconst, and __devexit from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Myron Stowe <myron.stowe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Yong Zhang <yong.zhang0@gmail.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Jan Glauber <jang@linux.vnet.ibm.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>