Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.
In a few cases it was obvious that we would never end up calling
vmtruncate, so it was left out in the opencoded variant:
spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clear ATTR_SIZE earlier
ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
In addition to that, ncpfs called inode_setattr with handcrafted iattrs,
which allowed us to trim down the opencoded variant.
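For illustration, the open-coded replacement in a typical ->setattr method
looks roughly like the sketch below. The VFS helpers are the stock ones,
but the function itself is made up rather than taken from any filesystem
touched by this patch, and the attribute-copy helper name may differ by
kernel version:

static int example_setattr(struct dentry *dentry, struct iattr *attr)
{
        struct inode *inode = dentry->d_inode;
        int error;

        error = inode_change_ok(inode, attr);
        if (error)
                return error;

        /* the truncate call is now visible in the filesystem method */
        if ((attr->ia_valid & ATTR_SIZE) &&
            attr->ia_size != i_size_read(inode)) {
                error = vmtruncate(inode, attr->ia_size);
                if (error)
                        return error;
        }

        /* copy the remaining attributes and dirty the inode */
        setattr_copy(inode, attr);
        mark_inode_dirty(inode);
        return 0;
}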
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We don't currently name our generic fsync implementations very well.
The no-op implementation for in-memory filesystems is called
simple_sync_file, which doesn't make much sense to start with,
and the generic one for simple filesystems is called simple_fsync,
which can lead to some confusion.
This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect. In addition, add some documentation for both methods.
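As a rough illustration (the surrounding file_operations entries are
placeholders, not taken from this patch), the renamed helpers are meant
to be wired up like this:

/* in-memory filesystem: nothing to write back, fsync is a no-op */
static const struct file_operations inmem_fops = {
        .mmap   = generic_file_mmap,
        .fsync  = noop_fsync,
};

/* simple on-disk filesystem: write the data and sync the inode */
static const struct file_operations simple_disk_fops = {
        .mmap   = generic_file_mmap,
        .fsync  = generic_file_fsync,
};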
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include
those headers directly instead of assuming availability. As this
conversion needs to touch a large number of source files, the following
script is used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. i.e. if only gfp is used,
gfp.h; if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
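As a hypothetical illustration of the kind of edit the script makes (not
an actual hunk from this conversion), a file that only used
kmalloc()/kfree() would gain the include in its existing alphabetical
block:

 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/slab.h>	/* was previously pulled in indirectly via percpu.h */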
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition, and for others adding it to an
implementation .h or embedding .c file was more appropriate. This
step added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as a bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of the
specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Make sure the compiler won't do weird things with limits. E.g. fetching
them twice may return 2 different values after writable limits are
implemented.
I.e. either use the rlimit helpers added in commit
3e10e716ab
or ACCESS_ONCE if they are not applicable.
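A minimal sketch of the two options (the function and the check are
invented for illustration, not a specific hunk from this patch):

static int example_check(unsigned long size)
{
        /* preferred: the helper reads the limit exactly once */
        unsigned long limit = rlimit(RLIMIT_STACK);

        /* where the helper doesn't apply, force a single fetch instead: */
        /* limit = ACCESS_ONCE(current->signal->rlim[RLIMIT_STACK].rlim_cur); */

        return size > limit ? -ENOMEM : 0;
}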
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is no longer any use of the include2/ directory.
The generated files have moved to include/generated.
Drop all references to said directory.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
size_t len cannot be less than 0.
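The class of dead check being removed, as a made-up illustration (len is
unsigned, so the branch can never be taken):

static ssize_t example_write(const char __user *buf, size_t len)
{
        if (len < 0)            /* always false for a size_t */
                return -EINVAL;
        /* ... */
        return len;
}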
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
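Illustrative sketch of the result (handler and table names are
placeholders): the method table becomes read-only data, and assigning it
to the now-const vm_ops pointer still compiles.

static int example_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        return VM_FAULT_SIGBUS;
}

static const struct vm_operations_struct example_vm_ops = {
        .fault  = example_fault,
};

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
        vma->vm_ops = &example_vm_ops;  /* pointer-to-const assignment */
        return 0;
}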
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the last users of markers have migrated to the event
tracer, we can kill off the (now orphaned) support code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090917173527.GA1699@lst.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I wrote sputrace before the generic tracing infrastructure was available.
Now that we have the generic event tracer we can convert it over and
remove a lot of code:
8 files changed, 45 insertions(+), 285 deletions(-)
To use it, make sure CONFIG_EVENT_TRACING is enabled and then enable
the spufs trace channel by
echo 1 > /sys/kernel/debug/tracing/events/spufs/spufs_context/enable
and then read the trace records using e.g.
cat /sys/kernel/debug/tracing/trace
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 45db924089 ("powerpc/spufs: Remove
double check for non-negative dentry") removed the only user of the
out_dput label, so remove it and the code following it.
Gets rid of this warning:
arch/powerpc/platforms/cell/spufs/inode.c: In function 'spufs_create':
arch/powerpc/platforms/cell/spufs/inode.c:647: warning: label 'out_dput' defined but not used
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch removes an unnecessary double check that the dentry returned by
lookup_create() is actually non-negative. Since lookup_create() itself returns
an error in this case, just remove the check.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to offset by *pos bytes, not *pos words.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Based on an original patch from Roel Kluin <roel.kluin@gmail.com>.
The write size calculated during regs and fpcr writes may currently
go negative. Because size is unsigned, this will wrap, and our
check for EFBIG will fail.
Instead, do the check for EFBIG before subtracting from size.
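A sketch of the fixed ordering (the bound and buffer names are
placeholders standing in for the real register-file size):

static ssize_t example_regs_write(char *regs, size_t regs_size,
                                  const char *buf, size_t size, loff_t *pos)
{
        if (*pos >= regs_size)                  /* test before subtracting ... */
                return -EFBIG;
        size = min_t(size_t, regs_size - *pos, size);   /* ... so this can't wrap */

        memcpy(regs + *pos, buf, size);
        *pos += size;
        return size;
}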
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
spuctx_switch_state() warns if ktime goes backwards, but it
sometimes compares an uninitialized value, which showed that
the data was unreliable when we actually saw the warning.
Initialize it to the current time in order to get correct data.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The spufs context directory contents definitions are not changed after
initialisation, so we can declare them as const. We can do the same
with the spu coredump reader callbacks too.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we may set up the MFC for isolated mode initialisation with
the purge still active. This means that DMAs required to perform the
init do not happen.
This change clears the purge status after doing the purge, so that
the isolated init can proceed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: new timer API
Based on an idea from Martin Josefsson with the help of
Patrick McHardy and Stephen Hemminger:
introduce the mod_timer_pending() API which is a mod_timer()
offspring that is a no-op on already removed timers.
(regular mod_timer() re-activates non-pending timers.)
This is useful for the networking code in that it can
allow unserialized mod_timer_pending() timer-forwarding
calls, but a single del_timer*() will stop the timer
from being reactivated again.
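A hypothetical networking-style user, to show the intended pairing (the
struct and field names are made up):

struct conn {
        struct timer_list       timeout;
};

static void conn_saw_packet(struct conn *c)
{
        /* pushes the deadline out only while the timer is still pending;
         * it never re-arms a timer that has already been deleted */
        mod_timer_pending(&c->timeout, jiffies + 30 * HZ);
}

static void conn_teardown(struct conn *c)
{
        /* a single delete stops the unserialized forwarding calls above
         * from reactivating the timer */
        del_timer_sync(&c->timeout);
}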
Also while at it:
- optimize the regular mod_timer() path some more: the
timer-stat and a debug check were needlessly duplicated
in __mod_timer().
- make the exports come straight after the function, as
most other exports in timer.c already did.
- eliminate __mod_timer() as an external API, change the
users to mod_timer().
The regular mod_timer() code path is not impacted
significantly, due to inlining optimizations and due to
the simplifications.
Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While reviewing ocfs2 code, I found 2 typos of "successfull". After
doing grep "successfull " in the kernel tree, 22 typos were found in
total -- great minds always think alike :)
This patch fixes all the similar typos. Thanks to Randy for the ack and comments.
Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.
i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Impact: New APIs
The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.
(Also replaces powerpc internal uses of node_to_cpumask).
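For example (assuming the pointer-returning replacement is
cpumask_of_node(), as in mainline; the surrounding function is made up):

static int example_count_node_cpus(int node)
{
        const struct cpumask *mask = cpumask_of_node(node);     /* no cpumask_t copy on the stack */
        int cpu, count = 0;

        for_each_cpu(cpu, mask)
                count++;
        return count;
}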
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit. We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.
This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages that have I = 0 on platforms
that need it. The hash code clears it if the feature bit is not set.
It also adds some clean accessors to set up various valid combinations
of access flags and changes various bits of code to use them instead.
This should help having the PTE actually containing the bit
combinations that we really want.
I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitly from the TLB miss. I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we can end up in an infinite loop if we get a signal
while the kernel has faulted in spufs_ps_fault. Eg:
alarm(1);
write(fd, some_spu_psmap_register_address, 4);
- the write's copy_from_user will fault on the ps mapping, and
signal_pending will be non-zero. Since returning from the fault
handler never clears TIF_SIGPENDING, we'll just keep faulting,
resulting in an unkillable process using 100% of CPU.
This change returns VM_FAULT_SIGBUS if there's a fatal signal pending,
letting us escape the loop.
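Roughly, the escape hatch looks like the following sketch (the handler
name and the rest of the body are placeholders):

static int example_ps_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        if (fatal_signal_pending(current))
                return VM_FAULT_SIGBUS;         /* let the fatal signal kill us */

        /* ... normal fault handling of the problem-state mapping ... */
        return VM_FAULT_NOPAGE;
}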
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Pass credentials through dentry_open() so that the COW creds patch can have
SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
when it opens its null chardev.
The security_dentry_open() call also now takes a creds pointer, as does the
dentry_open hook in struct security_operations.
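A sketch of a post-change caller, assuming the updated signature takes
the cred as its final argument (error handling elided; the wrapper
function itself is made up):

static struct file *example_open_null(struct dentry *dentry,
                                      struct vfsmount *mnt,
                                      const struct cred *cred)
{
        /* dentry_open() consumes the dentry/mnt references it is given */
        return dentry_open(dget(dentry), mntget(mnt), O_RDWR, cred);
}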
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
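For illustration, the mechanical shape of the substitution (the
surrounding permission check is invented):

static int example_permission_check(void)
{
        uid_t uid  = current_uid();     /* was: current->uid  */
        uid_t euid = current_euid();    /* was: current->euid */

        return (uid == 0 || euid == 0) ? 0 : -EPERM;
}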
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: James Morris <jmorris@namei.org>
This patch adds a comment to clarify why atomic_dec_if_positive is being used
to decrement the gang's aff_sched_count on SPU context unbind.
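A hedged sketch of the call site and the kind of comment added (the
exact wording in the patch may differ):

        /*
         * The counter may already be zero here (the context need not have
         * been counted into an affinity placement), so use the conditional
         * decrement rather than a plain atomic_dec() that could go negative.
         */
        atomic_dec_if_positive(&gang->aff_sched_count);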
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
This patch improves readability of the code responsible for trying to find
a node with enough SPUs not committed to other affinity gangs.
An additional check is also added, to avoid taking into account gangs that
have no SPU affinity.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
With most file readers (eg cat, dd), reading a context's regs file will
result in two reads: the first to read the data, and the second to
return EOF. Because each read performs a spu_acquire_saved, we end up
descheduling and re-scheduling the context twice.
This change does a simple check to see if we'd return EOF before
calling spu_acquire_saved(), saving the extra schedule operation.
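In sketch form (REGS_SIZE stands in for the real register-file size; the
function is not the actual spufs code):

#define REGS_SIZE       (128 * 16)      /* placeholder bound */

static ssize_t example_regs_read(char __user *buf, size_t len, loff_t *pos)
{
        if (*pos >= REGS_SIZE)
                return 0;       /* EOF: no need to spu_acquire_saved() at all */

        /* ... acquire the saved context, copy out, release ... */
        return 0;
}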
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, read() on the sputrace log will block until the read buffer
is full. This makes it difficult to retrieve the end of the buffer, as
the user will need to read with the right-sized buffer.
In a similar manner to commit 91553a1b5e0df006a3573a88d98ee7cd48a3818a, this
change makes the switch_log read return if there has already been data
read.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we use ctx->mapping_lock and ctx->switch_log->lock for the
context switch log. The mapping lock only prevents concurrent open()s,
so we require the switch_log->lock for reads.
Since writes to the switch log buffer occur on context switches, we're
better off synchronising with the state_mutex, which is held during a
switch. Since we're serialised throughout the buffer reads and writes,
we can use the state mutex to protect open and release too, and
can now kfree() the log buffer on release. This allows us to perform
the switch log notify without taking any extra locks.
Because the buffer is only present while the file is open, we can use
it to prevent multiple simultaneous openers.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, read() on the sputrace buffer will only return data when
the user buffer is exhausted. This may mean that we never see the
end of the event log, unless we read() with exactly the right-sized
buffer.
This change makes sputrace_read not block if we have data ready to
return.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, sputrace will start logging to the event buffer before the
log buffer has been open()ed. This results in a heap of "lost samples"
warnings if the sputrace file hasn't yet been opened.
Since the buffer is reset on open() anyway, there's no need to enable
logging when no-one has opened the log.
Because open clears the log, make it return EBUSY for multiple open
calls.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We need a marker_synchronize_unregister() before the end of exit() to make sure
all probe callers have exited the non-preemptible section and thus are no
longer executing the probe code.
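The intended ordering in module exit, sketched (the probe-removal helper
is a placeholder for the actual marker_probe_unregister() calls):

static void remove_all_probes(void)
{
        /* the marker_probe_unregister() calls for each sputrace
         * marker would go here */
}

static void __exit sputrace_exit(void)
{
        remove_all_probes();

        /* wait for every in-flight probe caller to leave the
         * non-preemptible section before anything is torn down */
        marker_synchronize_unregister();

        /* now the trace buffer / relay channel can safely go away */
}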
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where it's required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe it's been in -mm
since then.
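With the const in place, a parser table can be declared as read-only
data; a made-up example (token names are illustrative, and nfsroot's
table would use __initconst instead):

enum { Opt_uid, Opt_gid, Opt_err };

static const match_table_t example_tokens = {
        { Opt_uid, "uid=%u" },
        { Opt_gid, "gid=%u" },
        { Opt_err, NULL },
};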
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A mutex_unlock(&gang->aff_mutex) in spufs_create_context() is missing
in case spufs_context_open() fails. As a result, spu_create syscall
and spu_get_idle() may block.
This patch adds the mutex_unlock.
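The shape of the fix, paraphrased rather than quoted from the diff:

        mutex_lock(&gang->aff_mutex);
        /* ... directory and context setup ... */
        ret = spufs_context_open(dentry, mnt);
        if (ret < 0) {
                mutex_unlock(&gang->aff_mutex); /* the unlock this patch adds */
                goto out;
        }
        mutex_unlock(&gang->aff_mutex);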
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Andre Detsch <adetsch@br.ibm.com>
Currently, an empty spufs root inode has an nlink count of 1. However,
the directory has two links: / -> spu and /spu/ -> .
This change increments the link count of the root inode in spufs.
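Sketch of the kind of adjustment involved when the root inode is set up
(the surrounding code is paraphrased, not quoted):

        /* the root directory is referenced both by the mount point and
         * by its own "." entry, so bump the link count set at creation */
        inc_nlink(inode);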
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We currently have a race when scheduling a context to a SPE -
after we have found a runnable context in spusched_tick, the same
context may have been scheduled by spu_activate().
This may result in a panic if we try to unschedule a context that has
been freed in the meantime.
This change exits spu_schedule() if the context has already been
scheduled, so we don't end up scheduling it twice.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We currently have a race for a free SPE. With one thread doing a
spu_yield(), and another doing a spu_activate():
thread 1                                thread 2
spu_yield(oldctx)                       spu_activate(ctx)
  __spu_deactivate(oldctx)
  spu_unschedule(oldctx, spu)
  spu->alloc_state = SPU_FREE
                                        spu = spu_get_idle(ctx)
                                          - searches for a SPE in
                                            state SPU_FREE, gets
                                            the spu just
                                            freed by thread 1
                                        spu_schedule(ctx, spu)
                                        spu->alloc_state = SPU_USED
  spu_schedule(newctx, spu)
    - assumes spu is still free
    - tries to schedule context on
      already-used spu
This change introduces a 'free_spu' flag to spu_unschedule, to indicate
whether or not the function should free the spu after descheduling the
context. We only set this flag if we're not going to re-schedule
another context on this SPU.
Add a comment to document this behaviour.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Commit 8d5636fbca introduced a reference
count on SPU contexts during find_victim, but this may cause a leak in
the reference count if we later find a better contender for a context to
unschedule.
Take the reference only after we've found our victim context, so we
don't do the extra get_spu_context().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Based on an original patch from Christoph Hellwig <hch@lst.de>.
Currently, there is a possible reference-after-free in the spusched
code - contexts may be freed after we have released their state_mutex
in spusched_tick and find_victim.
This change takes a reference to the context before releasing the
mutex, so that the context doesn't get destroyed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>