schedule() has the special "TASK_INTERRUPTIBLE && signal_pending()" case,
this allows us to do
current->state = TASK_INTERRUPTIBLE;
schedule();
without fear to sleep with pending signal.
However, the code like
current->state = TASK_KILLABLE;
schedule();
is not right, schedule() doesn't take TASK_WAKEKILL into account. This means
that mutex_lock_killable(), wait_for_completion_killable(), down_killable(),
schedule_timeout_killable() can miss SIGKILL (and btw the second SIGKILL has
no effect).
Introduce the new helper, signal_pending_state(), and change schedule() to
use it. Hopefully it will have more users, that is why the task's state is
passed separately.
Note this "__TASK_STOPPED | __TASK_TRACED" check in signal_pending_state().
This is needed to preserve the current behaviour (ptrace_notify). I hope
this check will be removed soon, but this (afaics good) change needs the
separate discussion.
The fast path is "(state & (INTERRUPTIBLE | WAKEKILL)) + signal_pending(p)",
basically the same that schedule() does now. However, this patch of course
bloats schedule().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now we are using register_e820_active_regions() instead of
add_active_range() directly. So end_pfn could be different between the
value in early_node_map to node_end_pfn.
So we need to make shrink_active_range() smarter.
shrink_active_range() is a generic MM function in mm/page_alloc.c but
it is only used on 32-bit x86. Should we move it back to some file in
arch/x86?
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes a typo in the name of a config variable.
Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Reviewed-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Minor source code cleanup of page flags in mm/page_alloc.c.
Move the definition of the groups of bits to page-flags.h.
The purpose of this clean up is that the next patch will
conditionally add a page flag to the groups. Doing that
in a header file is cleaner than adding #ifdefs to the
C code.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ehea driver was recently changed[1] to use walk_memory_resource() to
detect the system's memory layout. However, walk_memory_resource() is
available only when memory hotplug is enabled. So CONFIG_EHEA was
made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a
network driver to have such a dependency.
Make the declaration of walk_memory_resource() and its powerpc
implementation (ehea is powerpc-specific) unconditionally available.
[1] 48cfb14f8b
"ehea: Add DLPAR memory remove support"
[2] fb7b6ca2b6
"ehea: Add dependency to Kconfig"
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 73f20e58b1 ("FAT_VALID_MEDIA():
remove pointless test") wrongly added the new fat_valid_media() function
to the userspace-visible part of include/linux/msdos_fs.h
Move it to the part of include/linux/msdos_fs.h that is not exported to
userspace.
Reported-by: Onur Küçük <onur@pardus.org.tr>
Reported-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To get zeroed out memory from a particular NUMA node. To be used by
sunrpc.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When posting:
[PATCH 1/8] Scaling msgmni to the amount of lowmem
(see http://article.gmane.org/gmane.linux.kernel/637849/) I changed the
MSGPOOL value to make it fit what is said in the man pages (i.e. a size
in bytes).
But Michael Kerrisk rightly complained that this change could affect the
ABI. So I'm posting this patch to make MSGPOOL expressed back in Kbytes.
Michael, on his side, has fixed the man page.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces memory_read_from_buffer().
The only difference between memory_read_from_buffer() and
simple_read_from_buffer() is which address space the function copies to.
simple_read_from_buffer copies to user space memory.
memory_read_from_buffer copies to normal memory.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Doug Warzecha <Douglas_Warzecha@dell.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: Abhay Salunke <Abhay_Salunke@dell.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Markus Rechberger <markus.rechberger@amd.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Len Brown <lenb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Brian King <brking@us.ibm.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Andrew Vasquez <linux-driver@qlogic.com>
Cc: Seokmann Ju <seokmann.ju@qlogic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bluetooth will be able to use this.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Although if people have questions about ARCnet, perhaps it's _better_
for them to be mailing dwmw2@cam.ac.uk about it...
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on,
similarly to what is done for the LAPIC timers, otherwise PIT interrupts
will be delayed until an unrelated event causes an exit.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The RT folks over at RedHat found an issue w.r.t. hotplug support which
was traced to problems with the cpupri infrastructure in the scheduler:
https://bugzilla.redhat.com/show_bug.cgi?id=449676
This bug affects 23-rt12+, 24-rtX, 25-rtX, and sched-devel. This patch
applies to 25.4-rt4, though it should trivially apply to most cpupri enabled
kernels mentioned above.
It turned out that the issue was that offline cpus could get inadvertently
registered with cpupri so that they were erroneously selected during
migration decisions. The end result would be an OOPS as the offline cpu
had tasks routed to it.
This patch generalizes the old join/leave domain interface into an
online/offline interface, and adjusts the root-domain/hotplug code to
utilize it.
I was able to easily reproduce the issue prior to this patch, and am no
longer able to reproduce it after this patch. I can offline cpus
indefinately and everything seems to be in working order.
Thanks to Arnaldo (acme), Thomas, and Peter for doing the legwork to point
me in the right direction. Also thank you to Peter for reviewing the
early iterations of this patch.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch removes 24 bytes of padding and allows 1 extra object per
slab on my fedora based config.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Current IRQ affinity interface does not provide a way to set affinity
for the IRQs that will be allocated/activated in the future.
This patch creates /proc/irq/default_smp_affinity that lets users set
default affinity mask for the newly allocated IRQs. Changing the default
does not affect affinity masks for the currently active IRQs, they
have to be changed explicitly.
Updated based on Paul J's comments and added some more documentation.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: a.p.zijlstra@chello.nl
Cc: tglx@linutronix.de
Cc: rdunlap@xenotime.net
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
These were removed in commit 26d507fcfef7f7d0cd2eec874a87169cc121c835:
> -#define V4L2_CID_HCENTER (V4L2_CID_BASE+22)
> -#define V4L2_CID_VCENTER (V4L2_CID_BASE+23)
> -#define V4L2_CID_LASTP1 (V4L2_CID_BASE+24) /*
> last CID + 1 */
> +
> +/* Deprecated, use V4L2_CID_PAN_RESET and V4L2_CID_TILT_RESET */
> +#define V4L2_CID_HCENTER_DEPRECATED (V4L2_CID_BASE+22)
> +#define V4L2_CID_VCENTER_DEPRECATED (V4L2_CID_BASE+23)
But there was no warning in Documentation/feature-removal-schedule.txt
and I'm receiving reports that it's breaking userspace apps (the
gstreamer-v4l2 plugin breaks in Fedora rawhide). You can't just pull
things from the published userspace API like that.
Please can we revert the addition of _DEPRECATED to these ioctl
definitions. Perhaps we can add a runtime warning if they actually get
used? Or a compile-time warning if we can manage that?
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
- Make ata_sff_altstatus private so nobody uses it by mistake
- Drop the 400nS delay from it
Add
ata_sff_irq_status - encapsulates the IRQ check logic
This function keeps the existing behaviour for altstatus using devices. I
actually suspect the logic was wrong before the changes but -rc isn't the
time to play with that
ata_sff_sync - ensure writes hit the device
Really we want an io* operation for 'is posted' eg ioisposted(ioaddr) so
that we can fix the nasty delay this causes on most systems.
- ata_sff_pause - 400nS delay
Ensure the command hit the device and delay 400nS
- ata_sff_dma_pause
Ensure the I/O hit the device and enforce an HDMA1:0 transition delay.
Requires altstatus register exists, BUG if not so we don't risk
corruption in MWDMA modes. (UDMA the checksum will save your backside in
theory)
The only other complication then is devices with their own handlers.
rb532 can use dma_pause but scc needs to access its own altstatus
register for internal errata workarounds so directly call the drivers own
altstatus function.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The field was supposed to allow the creation of an anycast route by
assigning an anycast address to an address prefix. It was never
implemented so this field is unused and serves no purpose. Remove it.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also removes an unused policy entry for an attribute which is
only used in kernel->user direction.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also removes an obsolete check for the unused flag RTCF_MASQ.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tty layer provides a callback that is used when the line discipline
is changed. Some hardware uses this to configure hardware specific
features such as IrDA mode on serial ports. Unfortunately the serial
layer does not provide this feature or pass it down to drivers.
Blackfin used to hack around this by rewriting the tty ops, but those are
now properly shared and const so the hack fails. Instead provide the
proper operations.
This change plus a follow up from the Blackfin guys is needed to avoid
blackfin losing features in this release.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since mmc_spi.h uses irqreturn_t type, it should include appropriate
header, otherwise build will break if users didn't include it (some of
them do not use interrupts).
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In dynamic ftrace, the mcount function starts off pointing to a stub
function that just returns.
On start up, the call to the stub is modified to point to a "record_ip"
function. The job of the record_ip function is to add the function to
a pre-allocated hash list. If the function is already there, it simply is
ignored, otherwise it is added to the list.
Later, a ftraced daemon wakes up and calls kstop_machine if any functions
have been recorded, and changes the calls to the recorded functions to
a simple nop. If no functions were recorded, the daemon goes back to sleep.
The daemon wakes up once a second to see if it needs to update any newly
recorded functions into nops. Usually it does not, but if a lot of code
has been executed for the first time in the kernel, the ftraced daemon
will call kstop_machine to update those into nops.
The problem currently is that there's no way to stop the daemon from doing
this, and it can cause unneeded latencies (800us which for some is bothersome).
This patch adds a new file /debugfs/tracing/ftraced_enabled. If the daemon
is active, reading this will return "enabled\n" and "disabled\n" when the
daemon is not running. To disable the daemon, the user can echo "0" or
"disable" into this file, and "1" or "enable" to re-enable the daemon.
Since the daemon is used to convert the functions into nops to increase
the performance of the system, I also added that anytime something is
written into the ftraced_enabled file, kstop_machine will run if there
are new functions that have been detected that need to be converted.
This way the user can disable the daemon but still be able to control the
conversion of the mcount calls to nops by simply,
"echo 0 > /debugfs/tracing/ftraced_enabled"
when they need to do more conversions.
To see the number of converted functions:
"cat /debugfs/tracing/dyn_ftrace_total_info"
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Source code out there hard-codes a notion of what the
_LINUX_CAPABILITY_VERSION #define means in terms of the semantics of the
raw capability system calls capget() and capset(). Its unfortunate, but
true.
Since the confusing header file has been in a released kernel, there is
software that is erroneously using 64-bit capabilities with the semantics
of 32-bit compatibilities. These recently compiled programs may suffer
corruption of their memory when sys_getcap() overwrites more memory than
they are coded to expect, and the raising of added capabilities when using
sys_capset().
As such, this patch does a number of things to clean up the situation
for all. It
1. forces the _LINUX_CAPABILITY_VERSION define to always retain its
legacy value.
2. adopts a new #define strategy for the kernel's internal
implementation of the preferred magic.
3. deprecates v2 capability magic in favor of a new (v3) magic
number. The functionality of v3 is entirely equivalent to v2,
the only difference being that the v2 magic causes the kernel
to log a "deprecated" warning so the admin can find applications
that may be using v2 inappropriately.
[User space code continues to be encouraged to use the libcap API which
protects the application from details like this. libcap-2.10 is the first
to support v3 capabilities.]
Fixes issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=447518.
Thanks to Bojan Smojver for the report.
[akpm@linux-foundation.org: s/depreciate/deprecate/g]
[akpm@linux-foundation.org: be robust about put_user size]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Bojan Smojver <bojan@rexursive.com>
Cc: stable@kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
The SW_RADIO code for EV_SW events has a name that is not descriptive
enough of its intended function, and could induce someone to think
KEY_RADIO is its EV_KEY counterpart, which is false.
Rename it to SW_RFKILL_ALL, and document what this event is for. Keep
the old name around, to avoid userspace ABI breaks.
The SW_RFKILL_ALL event is meant to be used by rfkill master switches. It
is not bound to a particular radio switch type, and usually applies to all
types. It is semantically tied to master rfkill switches that enable or
disable every radio in a system.
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
virtio allows drivers to suppress callbacks (ie. interrupts) for
efficiency (no locking, it's just an optimization).
There's a similar mechanism for the host to suppress notifications
coming from the guest: in that case, we ignore the suppression if the
ring is completely full.
It turns out that life is simpler if the host similarly ignores
callback suppression when the ring is completely empty: the network
driver wants to free up old packets in a timely manner, and otherwise
has to use a timer to poll.
We have to remove the code which ignores interrupts when the driver
has disabled them (again, it had no locking and hence was unreliable
anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since commit 72e61eb40b (virtio: change config
to guest endian) config space is no longer fixed endian.
Lets change the virtio_blk_config variables.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty,
This patch is a prereq for the virtio_blk blocksize patch, please apply it
first.
Adding an u32 value to the virtio_blk_config unconvered a small bug the config
space defintions:
v is a pointer, to we have to use sizeof(*v) instead of sizeof(v).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that by itself, having a "hardware" random generator does very
little: you should probably run "rngd" in your guest to feed this into
the kernel entropy pool.
Included:
virtio_rng: dont use vmalloced addresses for virtio
If virtio_rng is build as a module, random_data is an address
in vmalloc space. As virtio expects guest real addresses, this
can cause any kind of funny behaviour, so lets allocate
random_data dynamically with kmalloc.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Hello Rusty,
sometimes it is useful to share a disk (e.g. usr). To avoid file system
corruption, the disk should be mounted read-only in that case. This patch
adds a new feature flag, that allows the host to specify, if the disk should
be considered read-only.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Create the dev_set_name function now so that various subsystems can
start changing over to it before other changes in 2.6.27 will make it
compulsory.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
improve the sysbench ramp-up phase and its peak throughput on
a 16way NUMA box, by turning on WAKE_AFFINE:
tip/sched tip/sched+wake-affine
-------------------------------------------------
1: 700 830 +15.65%
2: 1465 1391 -5.28%
4: 3017 3105 +2.81%
8: 5100 6021 +15.30%
16: 10725 10745 +0.19%
32: 10135 10150 +0.16%
64: 9338 9240 -1.06%
128: 8599 8252 -4.21%
256: 8475 8144 -4.07%
-------------------------------------------------
SUM: 57558 57882 +0.56%
this change also improves lat_ctx from 6.69 usecs to 1.11 usec:
$ ./lat_ctx -s 0 2
"size=0k ovr=1.19
2 1.11
$ ./lat_ctx -s 0 2
"size=0k ovr=1.22
2 6.69
in sysbench it's an overall win with some weakness at the lots-of-clients
side. That happens because we now under-balance this workload
a bit. To counter that effect, turn on NEWIDLE:
wake-idle wake-idle+newidle
-------------------------------------------------
1: 830 834 +0.43%
2: 1391 1401 +0.65%
4: 3105 3091 -0.43%
8: 6021 6046 +0.42%
16: 10745 10736 -0.08%
32: 10150 10206 +0.55%
64: 9240 9533 +3.08%
128: 8252 8355 +1.24%
256: 8144 8384 +2.87%
-------------------------------------------------
SUM: 57882 58591 +1.21%
as a bonus this not only improves the many-clients case but
also improves the (more important) rampup phase.
sysbench is a workload that quickly breaks down if the
scheduler over-balances, so since it showed an improvement
under NEWIDLE this change is definitely good.
Yanmin Zhang reported:
Comparing with 2.6.25, volanoMark has big regression with kernel 2.6.26-rc1.
It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito.
With bisect, I located the following patch:
| 18d95a2832 is first bad commit
| commit 18d95a2832
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Sat Apr 19 19:45:00 2008 +0200
|
| sched: fair-group: SMP-nice for group scheduling
Revert it so that we get v2.6.25 behavior.
Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently it uses a single static char array, but that risks
being corrupted when multiple users issue message notes at the
same time. Make the buffers dynamically allocated when the trace
is setup and make them per-cpu instead.
The default max message size of 1k is also very large, the
interface is mainly for small text notes. So shrink it to 128 bytes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Allows messages to be inserted into blktrace streams.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch implements Xen save/restore and migration.
Saving is triggered via xenbus, which is polled in
drivers/xen/manage.c. When a suspend request comes in, the kernel
prepares itself for saving by:
1 - Freeze all processes. This is primarily to prevent any
partially-completed pagetable updates from confusing the suspend
process. If CONFIG_PREEMPT isn't defined, then this isn't necessary.
2 - Suspend xenbus and other devices
3 - Stop_machine, to make sure all the other vcpus are quiescent. The
Xen tools require the domain to run its save off vcpu0.
4 - Within the stop_machine state, it pins any unpinned pgds (under
construction or destruction), performs canonicalizes various other
pieces of state (mostly converting mfns to pfns), and finally
5 - Suspend the domain
Restore reverses the steps used to save the domain, ending when all
the frozen processes are thawed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Without console= arguments on the kernel command line, the first
console to register becomes enabled and the preferred console (the one
behind /dev/console). This is normally tty (assuming
CONFIG_VT_CONSOLE is enabled, which it commonly is).
This is okay as long tty is a useful console. But unless we have the
PV framebuffer, and it is enabled for this domain, tty0 in domU is
merely a dummy. In that case, we want the preferred console to be the
Xen console hvc0, and we want it without having to fiddle with the
kernel command line. Commit b8c2d3dfbc
did that for us.
Since we now have the PV framebuffer, we want to enable and prefer tty
again, but only when PVFB is enabled. But even then we still want to
enable the Xen console as well.
Problem: when tty registers, we can't yet know whether the PVFB is
enabled. By the time we can know (xenstore is up), the console setup
game is over.
Solution: enable console tty by default, but keep hvc as the preferred
console. Change the preferred console to tty when PVFB probes
successfully, unless we've been given console kernel parameters.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
While debugging latencies in the RT kernel, I found that it would be nice
to be able to filter away functions from the trace than just to filter
on functions.
I added a new interface to the debugfs tracing directory called
set_ftrace_notrace
When dynamic frace is enabled, this lets you filter away functions that will
not be recorded in the trace. It is similar to adding 'notrace' to those
functions but by doing it without recompiling the kernel.
Here's how set_ftrace_filter and set_ftrace_notrace interact. Remember, if
set_ftrace_filter is set, it removes all functions from the trace execpt for
those listed in the set_ftrace_filter. set_ftrace_notrace will prevent those
functions from being traced.
If you were to set one function in both set_ftrace_filter and
set_ftrace_notrace and that function was the same, then you would end up
with an empty trace.
the set of functions to trace is:
set_ftrace_filter == empty then
all functions not in set_ftrace_notrace
else
set of the set_ftrace_filter and not in set of set_ftrace_notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Based on Roland's patch. This approach was suggested by Austin Clements
from the very beginning, and then by Linus.
As Austin pointed out, the execing task can be killed by SI_TIMER signal
because exec flushes the signal handlers, but doesn't discard the pending
signals generated by posix timers. Perhaps not a bug, but people find this
surprising. See http://bugzilla.kernel.org/show_bug.cgi?id=10460
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Austin Clements <amdragon+kernelbugzilla@mit.edu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a journal checksum error is detected, the ext4 filesystem will call
ext4_error(), and the mount will either continue, become a read-only
mount, or cause a kernel panic based on the superblock flags
indicating the user's preference of what to do in case of filesystem
corruption being detected.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Align i2c_device_id.driver_data to 8 bytes to not fail on crossbuilds.
(Added in d2653e92732bd3911feff6bee5e23dbf959381db.)
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
The use of #defines with '##' pre-processor concatenation is a useful
way to form several symbol names with a common pattern. But when there
is just a single name obtained from that #define, it's just obfuscation.
Better to just write the plain symbol name, as is.
The following patch is a result of my wasting ten minutes looking through
the kernel to figure out what 'PB_migrate_end' meant, and forgetting what
I came to do, by the time I figured out that the #define PB_range macro
defined it.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove three extern declarations for routines
that don't exist. Fix a typo in a comment.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As git-grep shows, open_softirq() is always called with the last argument
being NULL
block/blk-core.c: open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
kernel/hrtimer.c: open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL);
kernel/rcuclassic.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/rcupreempt.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
kernel/softirq.c: open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
kernel/softirq.c: open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
This observation has already been made by Matthew Wilcox in June 2002
(http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html)
"I notice that none of the current softirq routines use the data element
passed to them."
and the situation hasn't changed since them. So it appears we can safely
remove that extra argument to save 128 (54) bytes of kernel data (text).
Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
.. allowing it to be write-protected just as other read-only data
under CONFIG_DEBUG_RODATA.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>