Early 4.3 versions of gcc apparently aggressively optimize the raw
time accumulation loop, replacing it with a divide.
On 32bit systems, this causes the following link errors:
undefined reference to `__umoddi3'
undefined reference to `__udivdi3'
The gcc issue has been fixed in 4.4 and greater.
This patch replaces the accumulation loop with a do_div, as suggested
by Linus.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
CC: Jason Wessel <jason.wessel@windriver.com>
CC: Larry Finger <Larry.Finger@lwfinger.net>
CC: Ingo Molnar <mingo@elte.hu>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 3bcf3860a4 (and the
accompanying commit c1e5c95402 "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).
The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.
Fix up various conflicts due to later fsnotify work.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The tv_nsec is a long and when added to the shifted interval it can wrap
and become negative which later causes looping problems in the
getrawmonotonic(). The edge case occurs when the system has slept for
a short period of time of ~2 seconds.
A trace printk of the values in this patch illustrate the problem:
ftrace time stamp: log
43.716079: logarithmic_accumulation: raw: 3d0913 tv_nsec d687faa
43.718513: logarithmic_accumulation: raw: 3d0913 tv_nsec da588bd
43.722161: logarithmic_accumulation: raw: 3d0913 tv_nsec de291d0
46.349925: logarithmic_accumulation: raw: 7a122600 tv_nsec e1f9ae3
46.349930: logarithmic_accumulation: raw: 1e848980 tv_nsec 8831c0e3
The kernel starts looping at 46.349925 in the getrawmonotonic() due to
the negative value from adding the raw value to tv_nsec.
A simple solution is to accumulate into a u64, and then normalize it
to a timespec_t.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
[ Reworked variable names and simplified some of the code. - John ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a dummy printk function for the maintenance of unused printks through gcc
format checking, and also so that side-effect checking is maintained too.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Secure discard is the same as discard except that all copies of the
discarded sectors (perhaps created by garbage collection) must also be
erased.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Gardiner <bengardiner@nanometrics.ca>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current kfifo scatterlist implementation will not work with chained
scatterlists. It assumes that struct scatterlist arrays are allocated
contiguously, which is not the case when chained scatterlists (struct
sg_table) are in use.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simply replace the whole kfifo.c and kfifo.h files with the new generic
version and fix the kerneldoc API template file.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the new version of the kfifo API files kfifo.c and kfifo.h.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
copy_to/from_user() returns the number of bytes remaining to be copied.
It never returns a negative value. The correct return code is -EFAULT and
not -EIO.
All the callers check for non-zero returns so that's Ok, but the return
code is passed to the user so we should fix this.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Simon Kagstrom <simon.kagstrom@netinsight.net>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are missing the oops end marker for the exception based WARN implementation
in lib/bug.c. This is useful for logfile analysis tools.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To keep panic_timeout accuracy when running under a hypervisor, the
current implementation only spins on long time (1 second) calls to mdelay.
That brings a good effect, but the problem is the keyboard LEDs don't
blink at all on that situation.
This patch changes to call to panic_blink_enter() between every mdelay and
keeps blinking in spite of long spin timer mode.
The time to call to mdelay is now 100ms. Even this change will keep
panic_timeout accuracy enough when running under a hypervisor.
Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alloc_pidmap() calculates max_scan so that if the initial offset != 0 we
inspect the first map->page twice. This is correct, we want to find the
unused bits < offset in this bitmap block. Add the comment.
But it doesn't make any sense to stop the find_next_offset() loop when we
are looking into this map->page for the second time. We have already
already checked the bits >= offset during the first attempt, it is fine to
do this again, no matter if we succeed this time or not.
Remove this hard-to-understand code. It optimizes the very unlikely case
when we are going to fail, but slows down the more likely case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Salman Qazi <sqazi@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A program that repeatedly forks and waits is susceptible to having the
same pid repeated, especially when it competes with another instance of
the same program. This is really bad for bash implementation.
Furthermore, many shell scripts assume that pid numbers will not be used
for some length of time.
Race Description:
A B
// pid == offset == n // pid == offset == n + 1
test_and_set_bit(offset, map->page)
test_and_set_bit(offset, map->page);
pid_ns->last_pid = pid;
pid_ns->last_pid = pid;
// pid == n + 1 is freed (wait())
// Next fork()...
last = pid_ns->last_pid; // == n
pid = last + 1;
Code to reproduce it (Running multiple instances is more effective):
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
// The distance mod 32768 between two pids, where the first pid is expected
// to be smaller than the second.
int PidDistance(pid_t first, pid_t second) {
return (second + 32768 - first) % 32768;
}
int main(int argc, char* argv[]) {
int failed = 0;
pid_t last_pid = 0;
int i;
printf("%d\n", sizeof(pid_t));
for (i = 0; i < 10000000; ++i) {
if (i % 32786 == 0)
printf("Iter: %d\n", i/32768);
int child_exit_code = i % 256;
pid_t pid = fork();
if (pid == -1) {
fprintf(stderr, "fork failed, iteration %d, errno=%d", i, errno);
exit(1);
}
if (pid == 0) {
// Child
exit(child_exit_code);
} else {
// Parent
if (i > 0) {
int distance = PidDistance(last_pid, pid);
if (distance == 0 || distance > 30000) {
fprintf(stderr,
"Unexpected pid sequence: previous fork: pid=%d, "
"current fork: pid=%d for iteration=%d.\n",
last_pid, pid, i);
failed = 1;
}
}
last_pid = pid;
int status;
int reaped = wait(&status);
if (reaped != pid) {
fprintf(stderr,
"Wait return value: expected pid=%d, "
"got %d, iteration %d\n",
pid, reaped, i);
failed = 1;
} else if (WEXITSTATUS(status) != child_exit_code) {
fprintf(stderr,
"Unexpected exit status %x, iteration %d\n",
WEXITSTATUS(status), i);
failed = 1;
}
}
}
exit(failed);
}
Thanks to Ted Tso for the key ideas of this implementation.
Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
exit_ptrace() takes tasklist_lock unconditionally. We need this lock to
avoid the race with ptrace_traceme(), it acts as a barrier.
Change its caller, forget_original_parent(), to call exit_ptrace() under
tasklist_lock. Change exit_ptrace() to drop and reacquire this lock if
needed.
This allows us to add the fastpath list_empty(ptraced) check. In the
likely no-tracees case exit_ptrace() just returns and we avoid the lock()
+ unlock() sequence.
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> suggested to add this
check, and he reports that this change adds about 11% improvement in some
tests.
Suggested-and-tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original code didn't leave enough space for a NULL terminator. These
strings are copied with strcpy() into fixed length buffers in
cgroup_root_from_opts().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Reviewd-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Ben Blum <bblum@andrew.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There may be cases (most obviously, sysfs-writable charp parameters) where
a module needs to prevent sysfs access to parameters.
Rather than express this in terms of a big lock, the functions are
expressed in terms of what they protect against. This is clearer, esp.
if the implementation changes to a module-level or even param-level lock.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Since this section can be read-only (they're in .rodata), they should
always have been const. Minor flow-through various functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Instead of using a "I kmalloced this" flag, we keep track of the kmalloced
strings and use that list to check if we need to kfree (in practice, the
list is very short).
This means that kparams can be const again, and plugs a leak. This
is important for drivers/usb/gadget/nokia.c which gets modprobe/rmmod'ed
frequently on the N9000.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
This allows us to generalize the KPARAM_KMALLOCED flag, by calling a function
on every parameter when a module is unloaded.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.
The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).
Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are). This causes some
harmless warnings until we fix them (in following patches).
To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Alessandro Rubini <rubini@ipvvis.unipv.it>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
This is modern style, and good to do before we start changing things.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
An audit by Dongdong Deng revealed that most driver-author-written param
calls don't handle val == NULL (which happens when parameters are specified
with no =, eg "foo" instead of "foo=1").
The only real case to use this is boolean, so handle it specially for that
case and remove a source of bugs for everyone else.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dongdong Deng <dongdong.deng@windriver.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Add three helpers that retrieve a refcounted copy of the root and cwd
from the supplied fs_struct.
get_fs_root()
get_fs_pwd()
get_fs_root_and_pwd()
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix kernel-doc warning, add @timer description:
Warning(kernel/timer.c:335): No description found for parameter 'timer'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kmsg_dump takes care to sample the global variables
inside a spinlock, but then goes on to use the same
variables outside the spinlock region too.
Use the correct variable. This will make the race
window smaller.
Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reorder elements in structure cpu_stopper to remove alignment padding on
64 bit builds, this shrinks its size from 40 to 32 bytes saving 8 bytes
per cpu.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove duplicate definition of ARRAY_SIZE(), which was never used anyway.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cleanup, no functional changes.
- __set_personality() always changes ->exec_domain/personality, the
special case when ->exec_domain remains the same buys nothing but
complicates the code. Unify both cases to simplify the code.
- The -EINVAL check in sys_personality() was never right. If we assume
that set_personality() can fail we should check the value it returns
instead of verifying that task->personality was actually changed.
Remove it. Before the previous patch it was possible to hit this case
due to overflow problems, but this -EINVAL just indicated the kernel
bug.
OTOH, probably it makes sense to change lookup_exec_domain() to return
ERR_PTR() instead of default_exec_domain if the search in exec_domains
list fails, and report this error to the user-space. But this means
another user-space change, and we have in-kernel users which need fixes.
For example, PER_OSF4 falls into PER_MASK for unkown reason and nobody
cares to register this domain.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Wenming Zhang <wezhang@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When taking a memory snapshot in hibernate_snapshot(), all (directly
called) memory allocations use GFP_ATOMIC. Hence swap misusage during
hibernation never occurs.
But from a pessimistic point of view, there is no guarantee that no page
allcation has __GFP_WAIT. It is better to have a global indication "we
enter hibernation, don't use swap!".
This patch tries to freeze new-swap-allocation during hibernation. (All
user processes are frozenm so swapin is not a concern).
This way, no updates will happen to swap_map[] between
hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free()
is called. We can be assured that swap corruption will not occur.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This a complete rewrite of the oom killer's badness() heuristic which is
used to determine which task to kill in oom conditions. The goal is to
make it as simple and predictable as possible so the results are better
understood and we end up killing the task which will lead to the most
memory freeing while still respecting the fine-tuning from userspace.
Instead of basing the heuristic on mm->total_vm for each task, the task's
rss and swap space is used instead. This is a better indication of the
amount of memory that will be freeable if the oom killed task is chosen
and subsequently exits. This helps specifically in cases where KDE or
GNOME is chosen for oom kill on desktop systems instead of a memory
hogging task.
The baseline for the heuristic is a proportion of memory that each task is
currently using in memory plus swap compared to the amount of "allowable"
memory. "Allowable," in this sense, means the system-wide resources for
unconstrained oom conditions, the set of mempolicy nodes, the mems
attached to current's cpuset, or a memory controller's limit. The
proportion is given on a scale of 0 (never kill) to 1000 (always kill),
roughly meaning that if a task has a badness() score of 500 that the task
consumes approximately 50% of allowable memory resident in RAM or in swap
space.
The proportion is always relative to the amount of "allowable" memory and
not the total amount of RAM systemwide so that mempolicies and cpusets may
operate in isolation; they shall not need to know the true size of the
machine on which they are running if they are bound to a specific set of
nodes or mems, respectively.
Root tasks are given 3% extra memory just like __vm_enough_memory()
provides in LSMs. In the event of two tasks consuming similar amounts of
memory, it is generally better to save root's task.
Because of the change in the badness() heuristic's baseline, it is also
necessary to introduce a new user interface to tune it. It's not possible
to redefine the meaning of /proc/pid/oom_adj with a new scale since the
ABI cannot be changed for backward compatability. Instead, a new tunable,
/proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may
be used to polarize the heuristic such that certain tasks are never
considered for oom kill while others may always be considered. The value
is added directly into the badness() score so a value of -500, for
example, means to discount 50% of its memory consumption in comparison to
other tasks either on the system, bound to the mempolicy, in the cpuset,
or sharing the same memory controller.
/proc/pid/oom_adj is changed so that its meaning is rescaled into the
units used by /proc/pid/oom_score_adj, and vice versa. Changing one of
these per-task tunables will rescale the value of the other to an
equivalent meaning. Although /proc/pid/oom_adj was originally defined as
a bitshift on the badness score, it now shares the same linear growth as
/proc/pid/oom_score_adj but with different granularity. This is required
so the ABI is not broken with userspace applications and allows oom_adj to
be deprecated for future removal.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The three oom killer sysctl variables (sysctl_oom_dump_tasks,
sysctl_oom_kill_allocating_task, and sysctl_panic_on_oom) are better
declared in include/linux/oom.h rather than kernel/sysctl.c.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We'll need the path to implement the flags field for statvfs support.
We do have it available in all callers except:
- ecryptfs_statfs. This one doesn't actually need vfs_statfs but just
needs to do a caller to the lower filesystem statfs method.
- sys_ustat. Add a non-exported statfs_by_dentry helper for it which
doesn't won't be able to fill out the flags field later on.
In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
of the misleading vfs prefix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 6ee0578b (workqueue: mark init_workqueues as early_initcall)
made workqueue SMP initialization depend on workqueue_cpu_callback(),
which however was registered as hotcpu_notifier() and didn't get
called if CONFIG_HOTPLUG_CPU is not set. This made gcwqs on non-boot
CPUs not create their initial workers leading to boot failures. Fix
it by making it a cpu_notifier.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-bisected-by: walt <w41ter@gmail.com>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
works in schecule_on_each_cpu() is a percpu pointer but was missing
__percpu markup. Add it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The blktrace driver currently needs the BKL, but
we should not need to take that in the block layer,
so just push it down into the driver itself.
It is quite likely that the BKL is not actually
required in blktrace code and could be removed
in a follow-on patch.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.
Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
We really shouldn't be asking userspace to create new root filesystems.
So follow along with all of the other in-kernel filesystems, and provide
a mount point in sysfs.
For cgroupfs, this should be in /sys/fs/cgroup/ This change provides
that mount point when the cgroup filesystem is registered in the kernel.
Acked-by: Paul Menage <menage@google.com>
Acked-by: Dhaval Giani <dhaval.giani@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The kernel/hotplug sysctl variable (/proc/sys/kernel/hotplug file) was
made conditional on CONFIG_NET by commit
f743ca5e10 (applied in 2.6.18) to fix
problems with undefined references in 2.6.16 when CONFIG_HOTPLUG=y &&
!CONFIG_NET, but this restriction is no longer needed.
This patch makes the kernel/hotplug sysctl variable depend only on
CONFIG_HOTPLUG.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Randy Dunlap <randy.dunlap@oracle.COM>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The kernel console interface stores the number of lines it is
configured to use. The kdb debugger can greatly benefit by knowing how
many lines there are on the console for the pager functionality
without having the end user compile in the setting or have to
repeatedly change it at run time.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: David Airlie <airlied@linux.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
When an arch such as mips and microblaze does not implement either HW
or software single stepping the debug core should re-enter kdb. The
kdb code will properly ignore the single step operation. Attempting
to single step the kernel without software or hardware support causes
unpredictable kernel crashes.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
In systems with more than one processor it is desirable to look at the
per cpu trace buffers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Add in a helper function to allow the kdb shell to dump the ftrace
buffer.
Modify trace.c to expose the capability to iterate over the ftrace
buffer in a read only capacity.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Presently the usable registers definitions on x86 are not contiguous
for kgdb. The x86 kgdb uses a case statement for the sparse register
accesses. The array which defines the registers (dbg_reg_def) should
not be used directly in order to safely work with sparse register
definitions.
Specifically there was a problem when gdb accesses ORIG_AX, which is
accessed only through the case statement.
This patch encodes register memory using the size information provided
from the debugger which avoids the need to look up the size of the
register. The dbg_set_reg() function always further validates the
inputs from the debugger.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
The gdbserial 'p' and 'P' packets allow gdb to individually get and
set registers instead of querying for all the available registers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The kdb shell specification includes the ability to get and set
architecture specific registers by name.
For the time being individual register get and set will be implemented
on a per architecture basis. If an architecture defines
DBG_MAX_REG_NUM > 0 then kdb and the gdbstub will use the capability
for individually getting and setting architecture specific registers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The gdb debugger understands how to parse short versions of the thread
reference string as long as the bytes are paired in sets of two
characters. The kgdb implementation was always sending 8 leading
zeros which could be omitted, and further optimized in the case of
non-negative thread numbers. The negative numbers are used to
reference a specific cpu in the case of kgdb.
An example of the previous i386 stop packet looks like:
T05thread:00000000000003bb;
New stop packet response:
T05thread:03bb;
The previous ThreadInfo response looks like:
m00000000fffffffe,0000000000000001,0000000000000002,0000000000000003,0000000000000004,0000000000000005,0000000000000006,0000000000000007,000000000000000c,0000000000000088,000000000000008a,000000000000008b,000000000000008c,000000000000008d,000000000000008e,00000000000000d4,00000000000000d5,00000000000000dd
New ThreadInfo response:
mfffffffe,01,02,03,04,05,06,07,0c,88,8a,8b,8c,8d,8e,d4,d5,dd
A few bytes saved means better response time when using kgdb over a
serial line.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
When a secondary CPU is being brought up, it is not uncommon for
printk() to be invoked when cpu_online(smp_processor_id()) == 0. The
case that I witnessed personally was on MIPS:
http://lkml.org/lkml/2010/5/30/4
If (can_use_console() == 0), printk() will spool its output to log_buf
and it will be visible in "dmesg", but that output will NOT be echoed to
the console until somebody calls release_console_sem() from a CPU that
is online. Therefore, the boot time messages from the new CPU can get
stuck in "limbo" for a long time, and might suddenly appear on the
screen when a completely unrelated event (e.g. "eth0: link is down")
occurs.
This patch modifies the console code so that any pending messages are
automatically flushed out to the console whenever a CPU hotplug
operation completes successfully or aborts.
The issue was seen on 2.6.34.
Original patch by Kevin Cernekee with cleanups by akpm and additional fixes
by Santosh Shilimkar. This patch superseeds
https://patchwork.linux-mips.org/patch/1357/.
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
To: <mingo@elte.hu>
To: <akpm@linux-foundation.org>
To: <simon.kagstrom@netinsight.net>
To: <David.Woodhouse@intel.com>
To: <lethal@linux-sh.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: <linux-mips@linux-mips.org>
Reviewed-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/1534/
LKML-Reference: <ede63b5a20af951c755736f035d1e787772d7c28@localhost>
LKML-Reference: <EAF47CD23C76F840A9E7FCE10091EFAB02C5DB6D1F@dbde02.ent.ti.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
On my (32-bit x86) machine, sys_init_module() uses 124 bytes of stack
once load_module() is inlined.
This effectively reverts ffb4ba76 which inlined it due to stack
pressure.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>