|
|
|
#ifndef _LINUX__INIT_TASK_H
|
|
|
|
#define _LINUX__INIT_TASK_H
|
|
|
|
|
|
|
|
#include <linux/file.h>
|
|
|
|
#include <linux/rcupdate.h>
|
|
|
|
#include <linux/irqflags.h>
|
|
|
|
#include <linux/utsname.h>
|
[PATCH] lockdep: core
Do 'make oldconfig' and accept all the defaults for new config options -
reboot into the kernel and if everything goes well it should boot up fine and
you should have /proc/lockdep and /proc/lockdep_stats files.
Typically if the lock validator finds some problem it will print out
voluminous debug output that begins with "BUG: ..." and which syslog output
can be used by kernel developers to figure out the precise locking scenario.
What does the lock validator do? It "observes" and maps all locking rules as
they occur dynamically (as triggered by the kernel's natural use of spinlocks,
rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a
new locking scenario, it validates this new rule against the existing set of
rules. If this new rule is consistent with the existing set of rules then the
new rule is added transparently and the kernel continues as normal. If the
new rule could create a deadlock scenario then this condition is printed out.
When determining validity of locking, all possible "deadlock scenarios" are
considered: assuming arbitrary number of CPUs, arbitrary irq context and task
context constellations, running arbitrary combinations of all the existing
locking scenarios. In a typical system this means millions of separate
scenarios. This is why we call it a "locking correctness" validator - for all
rules that are observed the lock validator proves it with mathematical
certainty that a deadlock could not occur (assuming that the lock validator
implementation itself is correct and its internal data structures are not
corrupted by some other kernel subsystem). [see more details and conditionals
of this statement in include/linux/lockdep.h and
Documentation/lockdep-design.txt]
Furthermore, this "all possible scenarios" property of the validator also
enables the finding of complex, highly unlikely multi-CPU multi-context races
via single single-context rules, increasing the likelyhood of finding bugs
drastically. In practical terms: the lock validator already found a bug in
the upstream kernel that could only occur on systems with 3 or more CPUs, and
which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
That bug was found and reported on a single-CPU system (!). So in essence a
race will be found "piecemail-wise", triggering all the necessary components
for the race, without having to reproduce the race scenario itself! In its
short existence the lock validator found and reported many bugs before they
actually caused a real deadlock.
To further increase the efficiency of the validator, the mapping is not per
"lock instance", but per "lock-class". For example, all struct inode objects
in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached,
then there are 10,000 lock objects. But ->inotify_mutex is a single "lock
type", and all locking activities that occur against ->inotify_mutex are
"unified" into this single lock-class. The advantage of the lock-class
approach is that all historical ->inotify_mutex uses are mapped into a single
(and as narrow as possible) set of locking rules - regardless of how many
different tasks or inode structures it took to build this set of rules. The
set of rules persist during the lifetime of the kernel.
To see the rough magnitude of checking that the lock validator does, here's a
portion of /proc/lockdep_stats, fresh after bootup:
lock-classes: 694 [max: 2048]
direct dependencies: 1598 [max: 8192]
indirect dependencies: 17896
all direct dependencies: 16206
dependency chains: 1910 [max: 8192]
in-hardirq chains: 17
in-softirq chains: 105
in-process chains: 1065
stack-trace entries: 38761 [max: 131072]
combined max dependencies: 2033928
hardirq-safe locks: 24
hardirq-unsafe locks: 176
softirq-safe locks: 53
softirq-unsafe locks: 137
irq-safe locks: 59
irq-unsafe locks: 176
The lock validator has observed 1598 actual single-thread locking patterns,
and has validated all possible 2033928 distinct locking scenarios.
More details about the design of the lock validator can be found in
Documentation/lockdep-design.txt, which can also found at:
http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
[bunk@stusta.de: cleanups]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
19 years ago
|
|
|
#include <linux/lockdep.h>
|
|
|
|
#include <linux/ipc.h>
|
|
|
|
#include <linux/pid_namespace.h>
|
|
|
|
#include <linux/user_namespace.h>
|
|
|
|
|
|
|
|
#define INIT_FDTABLE \
|
|
|
|
{ \
|
|
|
|
.max_fds = NR_OPEN_DEFAULT, \
|
|
|
|
.fd = &init_files.fd_array[0], \
|
[PATCH] Shrinks sizeof(files_struct) and better layout
1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
platforms, lowering kmalloc() allocated space by 50%.
2) Reduce the size of (files_struct), using a special 32 bits (or
64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
close_on_exec_init and open_fds_init fields. This save some ram (248
bytes per task) as most tasks dont open more than 32 files. D-Cache
footprint for such tasks is also reduced to the minimum.
3) Reduce size of allocated fdset. Currently two full pages are
allocated, that is 32768 bits on x86 for example, and way too much. The
minimum is now L1_CACHE_BYTES.
UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the
same cache line)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
19 years ago
|
|
|
.close_on_exec = (fd_set *)&init_files.close_on_exec_init, \
|
|
|
|
.open_fds = (fd_set *)&init_files.open_fds_init, \
|
|
|
|
.rcu = RCU_HEAD_INIT, \
|
|
|
|
.next = NULL, \
|
|
|
|
}
|
|
|
|
|
|
|
|
#define INIT_FILES \
|
|
|
|
{ \
|
|
|
|
.count = ATOMIC_INIT(1), \
|
|
|
|
.fdt = &init_files.fdtab, \
|
|
|
|
.fdtab = INIT_FDTABLE, \
|
|
|
|
.file_lock = __SPIN_LOCK_UNLOCKED(init_task.file_lock), \
|
[PATCH] Shrinks sizeof(files_struct) and better layout
1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
platforms, lowering kmalloc() allocated space by 50%.
2) Reduce the size of (files_struct), using a special 32 bits (or
64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
close_on_exec_init and open_fds_init fields. This save some ram (248
bytes per task) as most tasks dont open more than 32 files. D-Cache
footprint for such tasks is also reduced to the minimum.
3) Reduce size of allocated fdset. Currently two full pages are
allocated, that is 32768 bits on x86 for example, and way too much. The
minimum is now L1_CACHE_BYTES.
UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the
same cache line)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
19 years ago
|
|
|
.next_fd = 0, \
|
|
|
|
.close_on_exec_init = { { 0, } }, \
|
|
|
|
.open_fds_init = { { 0, } }, \
|
|
|
|
.fd_array = { NULL, } \
|
|
|
|
}
|
|
|
|
|
|
|
|
#define INIT_KIOCTX(name, which_mm) \
|
|
|
|
{ \
|
|
|
|
.users = ATOMIC_INIT(1), \
|
|
|
|
.dead = 0, \
|
|
|
|
.mm = &which_mm, \
|
|
|
|
.user_id = 0, \
|
|
|
|
.next = NULL, \
|
|
|
|
.wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.wait), \
|
|
|
|
.ctx_lock = __SPIN_LOCK_UNLOCKED(name.ctx_lock), \
|
|
|
|
.reqs_active = 0U, \
|
|
|
|
.max_reqs = ~0U, \
|
|
|
|
}
|
|
|
|
|
|
|
|
#define INIT_MM(name) \
|
|
|
|
{ \
|
|
|
|
.mm_rb = RB_ROOT, \
|
|
|
|
.pgd = swapper_pg_dir, \
|
|
|
|
.mm_users = ATOMIC_INIT(2), \
|
|
|
|
.mm_count = ATOMIC_INIT(1), \
|
|
|
|
.mmap_sem = __RWSEM_INITIALIZER(name.mmap_sem), \
|
|
|
|
.page_table_lock = __SPIN_LOCK_UNLOCKED(name.page_table_lock), \
|
|
|
|
.mmlist = LIST_HEAD_INIT(name.mmlist), \
|
|
|
|
.cpu_vm_mask = CPU_MASK_ALL, \
|
|
|
|
}
|
|
|
|
|
|
|
|
#define INIT_SIGNALS(sig) { \
|
|
|
|
.count = ATOMIC_INIT(1), \
|
|
|
|
.wait_chldexit = __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
|
|
|
|
.shared_pending = { \
|
|
|
|
.list = LIST_HEAD_INIT(sig.shared_pending.list), \
|
|
|
|
.signal = {{0}}}, \
|
|
|
|
.posix_timers = LIST_HEAD_INIT(sig.posix_timers), \
|
|
|
|
.cpu_timers = INIT_CPU_TIMERS(sig.cpu_timers), \
|
|
|
|
.rlim = INIT_RLIMITS, \
|
|
|
|
.pgrp = 0, \
|
|
|
|
.tty_old_pgrp = NULL, \
|
|
|
|
{ .__session = 0}, \
|
|
|
|
}
|
|
|
|
|
|
|
|
extern struct nsproxy init_nsproxy;
|
|
|
|
#define INIT_NSPROXY(nsproxy) { \
|
|
|
|
.pid_ns = &init_pid_ns, \
|
|
|
|
.count = ATOMIC_INIT(1), \
|
|
|
|
.nslock = __SPIN_LOCK_UNLOCKED(nsproxy.nslock), \
|
|
|
|
.uts_ns = &init_uts_ns, \
|
|
|
|
.mnt_ns = NULL, \
|
|
|
|
INIT_IPC_NS(ipc_ns) \
|
|
|
|
.user_ns = &init_user_ns, \
|
|
|
|
}
|
|
|
|
|
|
|
|
#define INIT_SIGHAND(sighand) { \
|
|
|
|
.count = ATOMIC_INIT(1), \
|
|
|
|
.action = { { { .sa_handler = NULL, } }, }, \
|
|
|
|
.siglock = __SPIN_LOCK_UNLOCKED(sighand.siglock), \
|
signal/timer/event: signalfd core
This patch series implements the new signalfd() system call.
I took part of the original Linus code (and you know how badly it can be
broken :), and I added even more breakage ;) Signals are fetched from the same
signal queue used by the process, so signalfd will compete with standard
kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
seems to be working fine on my Dual Opteron machine. I made a quick test
program for it:
http://www.xmailserver.org/signafd-test.c
The signalfd() system call implements signal delivery into a file descriptor
receiver. The signalfd file descriptor if created with the following API:
int signalfd(int ufd, const sigset_t *mask, size_t masksize);
The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
signalfd file.
The "mask" allows to specify the signal mask of signals that we are interested
in. The "masksize" parameter is the size of "mask".
The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
will return POLLIN when signals are available to be dequeued. As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can use
used together with epoll(2) too.
The read(2) system call will return a "struct signalfd_siginfo" structure in
the userspace supplied buffer. The return value is the number of bytes copied
in the supplied buffer, or -1 in case of error. The read(2) call can also
return 0, in case the sighand structure to which the signalfd was attached,
has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
return -EAGAIN in case no signal is available.
If the size of the buffer passed to read(2) is lower than sizeof(struct
signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
return -ERESTARTSYS in case a signal hits the process. The format of the
struct signalfd_siginfo is, and the valid fields depends of the (->code &
__SI_MASK) value, in the same way a struct siginfo would:
struct signalfd_siginfo {
__u32 signo; /* si_signo */
__s32 err; /* si_errno */
__s32 code; /* si_code */
__u32 pid; /* si_pid */
__u32 uid; /* si_uid */
__s32 fd; /* si_fd */
__u32 tid; /* si_fd */
__u32 band; /* si_band */
__u32 overrun; /* si_overrun */
__u32 trapno; /* si_trapno */
__s32 status; /* si_status */
__s32 svint; /* si_int */
__u64 svptr; /* si_ptr */
__u64 utime; /* si_utime */
__u64 stime; /* si_stime */
__u64 addr; /* si_addr */
};
[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years ago
|
|
|
.signalfd_list = LIST_HEAD_INIT(sighand.signalfd_list), \
|
|
|
|
}
|
|
|
|
|
|
|
|
extern struct group_info init_groups;
|
|
|
|
|
|
|
|
#define INIT_STRUCT_PID { \
|
|
|
|
.count = ATOMIC_INIT(1), \
|
|
|
|
.nr = 0, \
|
|
|
|
/* Don't put this struct pid in pid_hash */ \
|
|
|
|
.pid_chain = { .next = NULL, .pprev = NULL }, \
|
|
|
|
.tasks = { \
|
|
|
|
{ .first = &init_task.pids[PIDTYPE_PID].node }, \
|
|
|
|
{ .first = &init_task.pids[PIDTYPE_PGID].node }, \
|
|
|
|
{ .first = &init_task.pids[PIDTYPE_SID].node }, \
|
|
|
|
}, \
|
|
|
|
.rcu = RCU_HEAD_INIT, \
|
|
|
|
}
|
|
|
|
|
|
|
|
#define INIT_PID_LINK(type) \
|
|
|
|
{ \
|
|
|
|
.node = { \
|
|
|
|
.next = NULL, \
|
|
|
|
.pprev = &init_struct_pid.tasks[type].first, \
|
|
|
|
}, \
|
|
|
|
.pid = &init_struct_pid, \
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* INIT_TASK is used to set up the first task table, touch at
|
|
|
|
* your own risk!. Base=0, limit=0x1fffff (=2MB)
|
|
|
|
*/
|
|
|
|
#define INIT_TASK(tsk) \
|
|
|
|
{ \
|
|
|
|
.state = 0, \
|
|
|
|
.stack = &init_thread_info, \
|
|
|
|
.usage = ATOMIC_INIT(2), \
|
|
|
|
.flags = 0, \
|
|
|
|
.lock_depth = -1, \
|
|
|
|
.prio = MAX_PRIO-20, \
|
|
|
|
.static_prio = MAX_PRIO-20, \
|
|
|
|
.normal_prio = MAX_PRIO-20, \
|
|
|
|
.policy = SCHED_NORMAL, \
|
|
|
|
.cpus_allowed = CPU_MASK_ALL, \
|
|
|
|
.mm = NULL, \
|
|
|
|
.active_mm = &init_mm, \
|
|
|
|
.run_list = LIST_HEAD_INIT(tsk.run_list), \
|
|
|
|
.ioprio = 0, \
|
|
|
|
.time_slice = HZ, \
|
|
|
|
.tasks = LIST_HEAD_INIT(tsk.tasks), \
|
|
|
|
.ptrace_children= LIST_HEAD_INIT(tsk.ptrace_children), \
|
|
|
|
.ptrace_list = LIST_HEAD_INIT(tsk.ptrace_list), \
|
|
|
|
.real_parent = &tsk, \
|
|
|
|
.parent = &tsk, \
|
|
|
|
.children = LIST_HEAD_INIT(tsk.children), \
|
|
|
|
.sibling = LIST_HEAD_INIT(tsk.sibling), \
|
|
|
|
.group_leader = &tsk, \
|
|
|
|
.group_info = &init_groups, \
|
|
|
|
.cap_effective = CAP_INIT_EFF_SET, \
|
|
|
|
.cap_inheritable = CAP_INIT_INH_SET, \
|
|
|
|
.cap_permitted = CAP_FULL_SET, \
|
|
|
|
.keep_capabilities = 0, \
|
|
|
|
.user = INIT_USER, \
|
|
|
|
.comm = "swapper", \
|
|
|
|
.thread = INIT_THREAD, \
|
|
|
|
.fs = &init_fs, \
|
|
|
|
.files = &init_files, \
|
|
|
|
.signal = &init_signals, \
|
|
|
|
.sighand = &init_sighand, \
|
|
|
|
.nsproxy = &init_nsproxy, \
|
|
|
|
.pending = { \
|
|
|
|
.list = LIST_HEAD_INIT(tsk.pending.list), \
|
|
|
|
.signal = {{0}}}, \
|
|
|
|
.blocked = {{0}}, \
|
|
|
|
.alloc_lock = __SPIN_LOCK_UNLOCKED(tsk.alloc_lock), \
|
|
|
|
.journal_info = NULL, \
|
|
|
|
.cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \
|
|
|
|
.fs_excl = ATOMIC_INIT(0), \
|
|
|
|
.pi_lock = __SPIN_LOCK_UNLOCKED(tsk.pi_lock), \
|
|
|
|
.pids = { \
|
|
|
|
[PIDTYPE_PID] = INIT_PID_LINK(PIDTYPE_PID), \
|
|
|
|
[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID), \
|
|
|
|
[PIDTYPE_SID] = INIT_PID_LINK(PIDTYPE_SID), \
|
|
|
|
}, \
|
|
|
|
INIT_TRACE_IRQFLAGS \
|
[PATCH] lockdep: core
Do 'make oldconfig' and accept all the defaults for new config options -
reboot into the kernel and if everything goes well it should boot up fine and
you should have /proc/lockdep and /proc/lockdep_stats files.
Typically if the lock validator finds some problem it will print out
voluminous debug output that begins with "BUG: ..." and which syslog output
can be used by kernel developers to figure out the precise locking scenario.
What does the lock validator do? It "observes" and maps all locking rules as
they occur dynamically (as triggered by the kernel's natural use of spinlocks,
rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a
new locking scenario, it validates this new rule against the existing set of
rules. If this new rule is consistent with the existing set of rules then the
new rule is added transparently and the kernel continues as normal. If the
new rule could create a deadlock scenario then this condition is printed out.
When determining validity of locking, all possible "deadlock scenarios" are
considered: assuming arbitrary number of CPUs, arbitrary irq context and task
context constellations, running arbitrary combinations of all the existing
locking scenarios. In a typical system this means millions of separate
scenarios. This is why we call it a "locking correctness" validator - for all
rules that are observed the lock validator proves it with mathematical
certainty that a deadlock could not occur (assuming that the lock validator
implementation itself is correct and its internal data structures are not
corrupted by some other kernel subsystem). [see more details and conditionals
of this statement in include/linux/lockdep.h and
Documentation/lockdep-design.txt]
Furthermore, this "all possible scenarios" property of the validator also
enables the finding of complex, highly unlikely multi-CPU multi-context races
via single single-context rules, increasing the likelyhood of finding bugs
drastically. In practical terms: the lock validator already found a bug in
the upstream kernel that could only occur on systems with 3 or more CPUs, and
which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
That bug was found and reported on a single-CPU system (!). So in essence a
race will be found "piecemail-wise", triggering all the necessary components
for the race, without having to reproduce the race scenario itself! In its
short existence the lock validator found and reported many bugs before they
actually caused a real deadlock.
To further increase the efficiency of the validator, the mapping is not per
"lock instance", but per "lock-class". For example, all struct inode objects
in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached,
then there are 10,000 lock objects. But ->inotify_mutex is a single "lock
type", and all locking activities that occur against ->inotify_mutex are
"unified" into this single lock-class. The advantage of the lock-class
approach is that all historical ->inotify_mutex uses are mapped into a single
(and as narrow as possible) set of locking rules - regardless of how many
different tasks or inode structures it took to build this set of rules. The
set of rules persist during the lifetime of the kernel.
To see the rough magnitude of checking that the lock validator does, here's a
portion of /proc/lockdep_stats, fresh after bootup:
lock-classes: 694 [max: 2048]
direct dependencies: 1598 [max: 8192]
indirect dependencies: 17896
all direct dependencies: 16206
dependency chains: 1910 [max: 8192]
in-hardirq chains: 17
in-softirq chains: 105
in-process chains: 1065
stack-trace entries: 38761 [max: 131072]
combined max dependencies: 2033928
hardirq-safe locks: 24
hardirq-unsafe locks: 176
softirq-safe locks: 53
softirq-unsafe locks: 137
irq-safe locks: 59
irq-unsafe locks: 176
The lock validator has observed 1598 actual single-thread locking patterns,
and has validated all possible 2033928 distinct locking scenarios.
More details about the design of the lock validator can be found in
Documentation/lockdep-design.txt, which can also found at:
http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
[bunk@stusta.de: cleanups]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
19 years ago
|
|
|
INIT_LOCKDEP \
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
#define INIT_CPU_TIMERS(cpu_timers) \
|
|
|
|
{ \
|
|
|
|
LIST_HEAD_INIT(cpu_timers[0]), \
|
|
|
|
LIST_HEAD_INIT(cpu_timers[1]), \
|
|
|
|
LIST_HEAD_INIT(cpu_timers[2]), \
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
#endif
|