fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.
Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
and ppc). Patches for s390(x) and ia64 are already available from
previous posts, but it was decided that they should be added later
once fallocate is in the mainline. Hence not including those patches
in this take.
2. Changes to glibc,
a) to support fallocate() system call
b) to make posix_fallocate() and posix_fallocate64() call fallocate()
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Not all the world is an i386. Many architectures need 64-bit arguments to be
aligned in suitable pairs of registers, and the original
sys_sync_file_range(int, loff_t, loff_t, int) was therefore wasting an
argument register for padding after the first integer. Since we don't
normally have more than 6 arguments for system calls, that left no room for
the final argument on some architectures.
Fix this by introducing sys_sync_file_range2(int, int, loff_t, loff_t) which
all fits nicely. In fact, ARM already had that, but called it
sys_arm_sync_file_range. Move it to fs/sync.c and rename it, then implement
the needed compatibility routine. And stop the missing syscall check from
bitching about the absence of sys_sync_file_range() if we've implemented
sys_sync_file_range2() instead.
Tested on PPC32 and with 32-bit and 64-bit userspace on PPC64.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed that almost all architectures implemented exactly the same
sys32_sysinfo... except parisc, where a bug was to be found in handling of
the uptime. So let's remove a whole whack of code for fun and profit.
Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it
would be the best tested.
This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but
instead extracting out the common code from sys_sysinfo.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system. They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example. The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.
Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.
This patch:
Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.
The stat functions then returns the full 64-bit inode number where
available and where possible. If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.
Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.
Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.
Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.
It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.
[akpm: alpha build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since sys_sysctl is deprecated start allow it to be compiled out. This
should catch any remaining user space code that cares, and paves the way
for further sysctl cleanups.
[akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create compat_sys_adjtimex and use it an all appropriate places.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a copy of the compatibility version of struct timex in each 64 bit
architecture. This patch just creates a global one and replaces all the
usages of the old ones.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a regression which was introduced by moving ppc32 to use
the same sort of lockless gettimeofday as ppc64 has been using for
some time. This involves getting the timebase and performing some
simple arithmetic to convert it to seconds and microseconds. However,
the factor and offset used there weren't being updated when NTP
varied the tick length using adjtimex. 64-bit didn't notice the
problem because it had a hook in the 32-bit adjtimex compat routine
that attempted to work out what the generic timekeeping code would
do and alter the factor and offset to match. However, that code
was very complex and it wasn't clear that it still matched what the
generic code would do.
Now we use the generic current_tick_length() routine that was recently
added to check that the current tick will be as long as we expect; if
not we recompute the factor and offset. This keeps gettimeofday and
xtime in sync. In addition we check that gettimeofday hasn't got ahead
of xtime on each timer interrupt; if it has, we resync.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The comment in compat.c is wrong, every architecture provides a
get_compat_sigevent() for the IPC compat code already.
This basically moves the x86_64 version to common code and removes all the
others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matthew Wilcox wondered why we need these functions. We don't.
Remove them and just use the "normal" versions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch merges platform codes. systemcfg->platform is no longer used,
systemcfg use in general is deprecated as much as possible (and renamed
_systemcfg before it gets completely moved elsewhere in a future patch),
_machine is now used on ppc64 along as ppc32. Platform codes aren't gone
yet but we are getting a step closer. A bunch of asm code in head[_64].S
is also turned into C code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This allows us to get rid of one type of entry in systbl.S.
In passing we remove the duplicate compat_sys_getdents and
compat_sys_utimes for which there are generic versions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
This makes ppc use the syscalls.c from arch/powerpc/kernel, exports
copy_and_flush from head_32.S for use by prom_init.c (ARCH=powerpc),
and consolidates the sys_fadvise64_64 implementations for 32-bit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
and rename it to pci.c. This also required moving
arch/ppc64/kernel/pci.h into include/asm-powerpc (called
ppc-pci.h.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
The pciconfig_iobase, pciconfig_read and pciconfig_write system calls
were only implemented for 32-bit processes; for 64-bit processes they
returned an ENOSYS error. This allows them to be used by 64-bit
processes as well. The X server uses pciconfig_iobase at least, and
this change is necessary to allow a 64-bit X server to work on my G5.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch pulls the PCI-related junk out of struct device_node and
puts it in a separate structure, struct pci_dn. The device_node now
just has a void * pointer in it, which points to a struct pci_dn for
nodes that represent PCI devices. It could potentially be used in
future for device-specific data for other sorts of devices, such as
virtual I/O devices.
Signed-off-by: Paul Mackerras <paulus@samba.org>
64 bit architectures all implement their own compatibility sys_open(),
when in fact the difference is simply not forcing the O_LARGEFILE
flag. So use the a common function instead.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove some unnecessary includes, an out of date comment and a prototype for
sys_timer_create (which is now in syscalls.h)
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows iSeries to build with CONFIG_PCI=n. This is useful for
partitions that have only virtual I/O.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes some bugs in the ppc64 PER_LINUX32 implementation,
noted by Juergen Kreileder:
* uname(2) doesn't respect PER_LINUX32, it returns 'ppc64' instead of 'ppc'
* Child processes of a PER_LINUX32 process don't inherit PER_LINUX32
Along the way I took the opportunity to move things around so that
sys_ppc32.c only has 32-bit syscall emulation functions and to remove
the obsolete "fakeppc" command line option.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!