The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.
The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.
We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.
- check the global sum once every interval (this will delay zero detection
for some interval, so it's probably a showstopper for vfsmounts).
- keep a local count and only taking the global sum when local reaches 0 (this
is difficult for vfsmounts, because we can't hold preempt off for the life of
a reference, so a counter would need to be per-thread or tied strongly to a
particular CPU which requires more locking).
- keep a local difference of increments and decrements, which allows us to sum
the total difference and hence find the refcount when summing all CPUs. Then,
keep a single integer "long" refcount for slow and long lasting references,
and only take the global sum of local counters when the long refcount is 0.
This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.
This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.
This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Add support for mtd repartition based on the block
device BLKPG interface:
BLKPG_ADD_PARTITION - for partition creation;
BLKPG_DEL_PARTITION - for partition delete
The usage is based on BLKPG ioctl called with
struct blkpg_ioctl_arg argument which includes the
reference to struct blkpg_partition discribing the
partition offset and length.
Disadvantage: there is no implementation for mtd
flags control. The flags are always borrowed from
the master device.
Signed-off-by: Roman Tereshonkov <roman.tereshonkov@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
If "ur_idx" is wrong we could go past the end of the array. The
"ur_idx" comes from root so it's not a huge deal, but adding a sanity
check makes the code more robust.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There were some improvements and additions necessary in the
comments explaining of the expansion of nand_ecclayout, the
introduction of nand_ecclayout_user, and the deprecation of the
ioctl ECCGETLAYOUT.
Also, I found a better placement for the macro MTD_MAX_ECCPOS_ENTRIES;
next to the definition of MTD_MAX_OOBFREE_ENTRIES in mtd-abi.h. The macro
is really only important for the ioctl code (found in drivers/mtd/mtdchar.c)
but since there are small edits being made to the user-space header, I
figured this is a better location.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
struct nand_ecclayout is too small for many new chips; OOB regions can be as
large as 448 bytes and may increase more in the future. Thus, copying that
struct to user-space with the ECCGETLAYOUT ioctl is not a good idea; the ioctl
would have to be updated every time there's a change to the current largest
size.
Instead, the old nand_ecclayout is renamed to nand_ecclayout_user and a
new struct nand_ecclayout is created that can accomodate larger sizes and
expand without affecting the user-space. struct nand_ecclayout can still
be used in board drivers without modification -- at least for now.
A new function is provided to convert from the new to the old in order to
allow the deprecated ioctl to continue to work with truncated data. Perhaps
the ioctl, the conversion process, and the struct nand_ecclayout_user can be
removed altogether in the future.
Note: There are comments in nand/davinci_nand.c::nand_davinci_probe()
regarding this issue; this driver (and maybe others) can be updated to
account for extra space. All kernel drivers can use the expanded
nand_ecclayout as a drop-in replacement and ignore its benefits.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
All these files use the big kernel lock in a trivial
way to serialize their private file operations,
typically resulting from an earlier semi-automatic
pushdown from VFS.
None of these drivers appears to want to lock against
other code, and they all use the BKL as the top-level
lock in their file operations, meaning that there
is no lock-order inversion problem.
Consequently, we can remove the BKL completely,
replacing it with a per-file mutex in every case.
Using a scripted approach means we can avoid
typos.
file=$1
name=$2
if grep -q lock_kernel ${file} ; then
if grep -q 'include.*linux.mutex.h' ${file} ; then
sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
else
sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
fi
sed -i ${file} \
-e "/^#include.*linux.mutex.h/,$ {
1,/^\(static\|int\|long\)/ {
/^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);
} }" \
-e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
-e '/[ ]*cycle_kernel_lock();/d'
else
sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
-e '/cycle_kernel_lock()/d'
fi
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: linux-mtd@lists.infradead.org
For no-mmu systems mmap() on RAM/ROM devices already works
but for systems with mmu it probably was not tested and
doesn't work.
This patch allows using mmap() on MTD RAM/ROM devices on systems
with MMU. It has been tested on mpc5121e based platform with
MR0A16A MRAM device attached over LocalBus.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patchs adds a way for user space programs to find out whether a
flash sector is locked. An optional driver method in the mtd_info struct
provides the information.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use memdup_user when user data is immediately copied into the
allocated region.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@
- to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+ to = memdup_user(from,size);
if (
- to==NULL
+ IS_ERR(to)
|| ...) {
<+... when != goto l1;
- -ENOMEM
+ PTR_ERR(to)
...+>
}
- if (copy_from_user(to, from, size) != 0) {
- <+... when != goto l2;
- -EFAULT
- ...+>
- }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We cannot modify file->f_mapping->backing_dev_info, because it will corrupt
backing device of device node inode, since file->f_mapping is equal to
inode->i_mapping (see __dentry_open() in fs/open.c).
Let's introduce separate inode for MTD device with appropriate backing
device.
[dwmw2: Refactor to keep it all entirely within mtdchar.c; use iget_locked()]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
These are the last remaining device drivers using
the ->ioctl file operation in the drivers directory
(except from v4l drivers).
[fweisbec: drop i8k pushdown as it has been done from
procfs pushdown branch already]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
register_chrdev() registers minor numbers up to 255, but we can now
potentially have much larger numbers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
MAX_MTD_DEVICES is about to be removed.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In mtd_ioctl MEMGETREGIONINFO the region_user_info pointer ur
is cast in __kernel space. This produces a number of sparse warnings
like:
warning: cast removes address space of expression
warning: incorrect type in initializer (different address spaces)
expected unsigned int const [noderef] <asn:1>*register __p
got unsigned int *<noident>
Since argp is already a void __user * just use it dirrectly without
the cast and make ur a __user *.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Remove all references to MTD ioctls from fs/compat_ioctl.c and let
them all be handled by mtd_compat_ioctl().
Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
1) Move the MEMREADOOB/MEMWRITEOOB compat_ioctl wrappers from
fs/compat_ioctl.c into mtdchar.c . Original request was here:
http://lkml.org/lkml/2009/4/1/295
2) Add missing COMPATIBLE_IOCTL lines, so that mtd-utils does not error
out when running in 64/32 compatibility mode.
LKML-Reference: <200904011650.22928.arnd@arndb.de>
Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
New MEMERASE/MEMREADOOB/MEMWRITEOOB ioctls are needed in order to support
64-bit offsets into large NAND flash devices.
Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Update driver model support in the MTD framework, so it fits
better into the current udev-based hotplug framework:
- Each mtd_info now has a device node. MTD drivers should set
the dev.parent field to point to the physical device, before
setting up partitions or otherwise declaring MTDs.
- Those device nodes always map to /sys/class/mtdX device nodes,
which no longer depend on MTD_CHARDEV.
- Those mtdX sysfs nodes have a "starter set" of attributes;
it's not yet sufficient to replace /proc/mtd.
- Enabling MTD_CHARDEV provides /sys/class/mtdXro/ nodes and the
/sys/class/mtd*/dev attributes (for udev, mdev, etc).
- Include a MODULE_ALIAS_CHARDEV_MAJOR macro. It'll work with
udev creating the /dev/mtd* nodes, not just a static rootfs.
So the sysfs structure is pretty much what you'd expect, except
that readonly chardev nodes are a bit quirky.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Present backing device capabilities for MTD character device files to allow
NOMMU mmap to do direct mapping where possible.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Bernd Schmidt <bernd.schmidt@analog.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The mtdchar module is missing the char-major-90-* alias that would cause
it to be auto-loaded when a device of that type is opened. This patch
adds the alia..
Signed-off-by: Scott James Remnant <scott@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
MTD internal API presently uses 32-bit values to represent
device size. This patch updates them to 64-bits but leaves
the external API unchanged. Extending the external API
is a separate issue for several reasons. First, no one
needs it at the moment. Secondly, whether the implementation
is done with IOCTLs, sysfs or both is still debated. Thirdly
external API changes require the internal API to be accepted
first.
Note that although the MTD API will be able to support 64-bit
device sizes, existing drivers do not and are not required
to do so, although NAND base has been updated.
In general, changing from 32-bit to 64-bit values cause little
or no changes to the majority of the code with the following
exceptions:
- printk message formats
- division and modulus of 64-bit values
- NAND base support
- 32-bit local variables used by mtdpart and mtdconcat
- naughtily assuming one structure maps to another
in MEMERASE ioctl
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The MEMGETREGIONINFO ioctl() in mtdchar.c was clobbering user memory by
overwriting more than intended, due the size of struct mtd_erase_region_info
changing in commit 0ecbc81adf ('Support
for auto locking flash on power up').
Fix avoids this by copying struct members one by one with put_user(), as there
is no longer a convenient struct to use the size of as the length argument to
copy_to_user().
Signed-off-by: Zev Weiss <zevweiss@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Now that we can tell when we have one of the newer DataFlash chips,
optionally expose the 128 bytes of OTP memory they provide. Tested
on at45db642 revision B and D chips.
Switch mtdchar over to a generic HAVE_MTD_OTP flag instead of adding
another #ifdef for each type of chip whose driver has OTP support.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use einfo, oinfo for the inner erase_info and otp_info structs used in
individual case statements.
drivers/mtd/mtdchar.c:582:26: warning: symbol 'info' shadows an earlier one
drivers/mtd/mtdchar.c:380:23: originally declared here
drivers/mtd/mtdchar.c:596:26: warning: symbol 'info' shadows an earlier one
drivers/mtd/mtdchar.c:380:23: originally declared here
drivers/mtd/mtdchar.c:704:19: warning: symbol 'info' shadows an earlier one
drivers/mtd/mtdchar.c:380:23: originally declared here
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The copy_to_user was casting away the address space to get the offset of
the length member. Use offsetof() instead and add it to the void __user
*argp.
drivers/mtd/mtdchar.c:527:23: warning: cast removes address space of expression
drivers/mtd/mtdchar.c:527:23: warning: incorrect type in argument 1 (different address spaces)
drivers/mtd/mtdchar.c:527:23: expected void [noderef] <asn:1>*to
drivers/mtd/mtdchar.c:527:23: got unsigned int *<noident>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Once upon a time, the MTD repository was using CVS.
This patch therefore removes all usages of the no longer updated CVS
keywords from the MTD code.
This also includes code that printed them to the user.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
struct class_device is going away, this converts the code to use struct
device instead.
Signed-off-by: Tony Jones <tonyj@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
"include/linux/mtd/mtd.h" declares "mtd_oob_ops.retlen" as size_t, which
is 64 bits on targets with a 64 bit addressing. The MEMWRITEOOB ioctl
calls copy_to_user() to write it back to "mtd_oob_buf.length", which is
declared in "include/linux/mtd-abi.h" as uint32_t. Since powerpc is a
big endian architecture, this only copies the upper 32 bits of the
address, which is always 0.
Signed-off-by: David Scidmore <dscidmore@xes-inc.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
sh:
drivers/mtd/mtdchar.c: In function `mtd_mmap':
drivers/mtd/mtdchar.c:817: error: dereferencing pointer to incomplete type
drivers/mtd/mtdchar.c:817: error: `VM_SHARED' undeclared (first use in this function)
drivers/mtd/mtdchar.c:817: error: (Each undeclared identifier is reported only once
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ops.len member is not initialized, because it is unused for this
operation. The length check needs to use ops.ooblen instead
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove unused and broken mtd->ecctype and mtd->eccsize fields
from struct mtd_info. Do not remove them from userspace API
data structures (don't want to breake userspace) but mark them
as obsolete by a comment. Any userspace program which uses them
should be half-broken anyway, so this is more about saving
data structure size.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
get_mtd_device() returns NULL in case of any failure. Teach it to return an
error code instead. Fix all users as well.
Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
As was discussed between Ricard Wanderlöf, David Woodhouse, Artem
Bityutskiy and me, the current API for reading/writing OOB is confusing.
The thing that introduces confusion is the need to specify ops.len
together with ops.ooblen for reads/writes that concern only OOB not data
area. So, ops.len is overloaded: when ops.datbuf != NULL it serves to
specify the length of the data read, and when ops.datbuf == NULL, it
serves to specify the full OOB read length.
The patch inlined below is the slightly updated version of the previous
patch serving the same purpose, but with the new Artem's comments taken
into account.
Artem, BTW, thanks a lot for your valuable input!
Signed-off-by: Vitaly Wool <vwool@ru.mvista.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
1. The ECCGETLAYOUT ioctl copy_to_user() call has a superfluous '&'
causing the resulting information to be garbage rather than the intended
mtd->ecclayout.
2. The MEMGETOOBSEL misses copying mtd->ecclayout->eccbytes so the
resulting field of the returned structure contains garbage.
Signed-off-by: Ricard Wanderlöf <ricardw@axis.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Allow lseek(mtdchar_fd, 0, SEEK_END) to succeed, which currently fails
with EINVAL.
lseek(fd, 0, SEEK_END) should result into the same fileposition as
lseek(fd, 0, SEEK_SET) + read(fd, buf, length(fd))
Furthermore, lseek(fd, 0, SEEK_CUR) should return the current file position,
which in case of an encountered EOF should not result in EINVAL
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>