This patch is part of a patch-set which changes the MTD interface
from 'mtd->func()' form to 'mtd_func()' form. We need this because
we want to add common code to to all drivers in the mtd core level,
which is impossible with the current interface when MTD clients
call driver functions like 'read()' or 'write()' directly.
At this point we just introduce a new inline wrapper function, but
later some of them are expected to gain more code. E.g., the input
parameters check should be moved to the wrappers rather than be
duplicated at many drivers.
This particular patch introduced the 'mtd_erase()' interface. The
following patches add all the other interfaces one by one.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We are going to re-work the MTD interface and change 'mtd->write()' to
'mtd_write()', 'mtd->read()' to 'mtd_read()' and so forth for all functions
in the 'struct mtd_info' structure.
However, mtdchar.c has its own 'mtd_read()', 'mtd_write()', etc functions
which collide with our changes. This patch renames these functions
to 'mtdchar_read()', 'mtdchar_write()', etc.
Additionally, to make the 'mtdchar.c' file look consistent, rename
similarly all the other functions starting with 'mtd_'.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Replace direct i_nlink updates with the respective updater function
(inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
The ecctype and eccsize fields have been obsolete for a while. Since they
don't have any users, we can kill them and leave padding in their place
for now.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Implement a new ioctl for writing both page data and OOB to flash at the
same time. This ioctl is intended to be a generic interface that can
replace other ioctls (MEMWRITEOOB and MEMWRITEOOB64) and cover the
functionality of several other old ones, e.g., MEMWRITE can:
* write autoplaced OOB instead of using ECCGETLAYOUT (deprecated) and
working around the reserved areas
* write raw (no ECC) OOB instead of using MTDFILEMODE to set the
per-file-descriptor MTD_FILE_MODE_RAW
* write raw (no ECC) data instead of using MTDFILEMODE
(MTD_FILE_MODE_RAW) and using standard character device "write"
This ioctl is especially useful for MLC NAND, which cannot be written
twice (i.e., we cannot successfully write the page data and OOB in two
separate operations). Instead, MEMWRITE can write both in a single
operation.
Note that this ioctl is not affected by the MTD file mode (i.e.,
MTD_FILE_MODE_RAW vs. MTD_FILE_MODE_NORMAL), since it receives its write
mode as an input parameter.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
These modes hold their state only for the life of their file descriptor,
and they overlap functionality with the MTD_OPS_* modes. Particularly,
MTD_MODE_RAW and MTD_OPS_RAW cover the same function: to provide raw
(i.e., without ECC) access to the flash. In fact, although it may not be
clear, MTD_MODE_RAW implied that operations should enable the
MTD_OPS_RAW mode.
Thus, we should be specific on what each mode means. This is a start,
where MTD_FILE_MODE_* actually represents a "file mode," not necessarily
a true global MTD mode.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
These modes are not necessarily for OOB only. Particularly, MTD_OOB_RAW
affected operations on in-band page data as well. To clarify these
options and to emphasize that their effect is applied per-operation, we
change the primary prefix to MTD_OPS_.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
This fixes issues with `nanddump -n' and the MEMREADOOB[64] ioctls on
hardware that performs error correction when reading only OOB data. A
driver for such hardware needs to know when we're doing a RAW vs. a
normal write, but mtd_do_read_oob does not pass such information to the
lower layers (e.g., NAND). We should pass MTD_OOB_RAW or MTD_OOB_PLACE
based on the MTD file mode.
For now, most drivers can get away with just setting:
chip->ecc.read_oob_raw = chip->ecc.read_oob
This is done by default; but for systems that behave as described above,
you must supply your own replacement function.
This was tested with nandsim as well as on actual SLC NAND.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Jim Quinlan <jim2101024@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
This fixes issues with `nandwrite -n -o' and the MEMWRITEOOB[64] ioctls
on hardware that writes ECC when writing OOB. The problem arises as
follows: `nandwrite -n' can write page data to flash without applying
ECC, but when used with the `-o' option, ECC is applied (incorrectly),
contrary to the `--noecc' option.
I found that this is the case because my hardware computes and writes
ECC data to flash upon either OOB write or page write. Thus, to support
a proper "no ECC" write, my driver must know when we're performing a raw
OOB write vs. a normal ECC OOB write. However, MTD does not pass any raw
mode information to the write_oob functions. This patch addresses the
problems by:
1) Passing MTD_OOB_RAW down to lower layers, instead of just defaulting
to MTD_OOB_PLACE
2) Handling MTD_OOB_RAW within the NAND layer's `nand_do_write_oob'
3) Adding a new (replaceable) function pointer in struct ecc_ctrl; this
function should support writing OOB without ECC data. Current
hardware often can use the same OOB write function when writing
either with or without ECC
This was tested with nandsim as well as on actual SLC NAND.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Jim Quinlan <jim2101024@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Previous generations of MTDs all used OOB sizes that were powers of 2,
(e.g., 64, 128). However, newer generations of flash, especially NAND,
use irregular OOB sizes that are not powers of 2 (e.g., 218, 224, 448).
This means we cannot use masks like "mtd->oobsize - 1" to assume that we
will get a proper bitmask for OOB operations.
These masks are really only intended to hide the "page" portion of the
offset, leaving any OOB offset intact, so a masking with the writesize
(which *is* always a power of 2) is valid and makes more sense.
This has been tested for read/write of NAND devices (nanddump/nandwrite)
using nandsim and actual NAND flash.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Start moving away from the MTD_DEBUG_LEVEL messages. The dynamic
debugging feature is a generic kernel feature that provides more
flexibility.
(See Documentation/dynamic-debug-howto.txt)
Also fix some punctuation, indentation, and capitalization that went
along with the affected lines.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
This comment was unclear regarding which NAND functions do and do not
support ECC on the spare area. This update should reflect the current
status of the NAND system but can be updated if changes are made in
the standard functions.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <dedekind1@gmail.com>
While the standard NAND OOB functions do not do ECC on the spare area,
it is possible for a driver to supply its own OOB ECC functions (e.g., HW
ECC). nand_do_read_oob should act like nand_do_read_ops in checking the
ECC stats and returning -EBADMSG or -EUCLEAN on uncorrectable errors or
correctable bitflips, respectively. These error codes could be used in
flash-based BBT code or in YAFFS, for example.
Doing this, however, messes with the behavior of mtd_do_readoob. Now,
mtd_do_readoob should check whether we had -EUCLEAN or -EBADMSG errors
and discard those as "non-fatal" so that the ioctls can still succeed
with (possibly uncorrected) data.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
On writes in MODE_RAW the mtd_oob_ops struct is not sufficiently
initialized which may cause nandwrite to fail. With this patch
it is possible to write raw nand/oob data without additional ECC
(either for testing or when some sectors need different oob layout
e.g. bootloader) like
nandwrite -n -r -o /dev/mtd0 <myfile>
Signed-off-by: Peter Wippich <pewi@gw-instruments.de>
Cc: stable@kernel.org
Tested-by: Ricard Wanderlof <ricardw@axis.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
For a number of file systems that don't have a mount point (e.g. sockfs
and pipefs), they are not marked as long term. Therefore in
mntput_no_expire, all locks in vfs_mount lock are taken instead of just
local cpu's lock to aggregate reference counts when we release
reference to file objects. In fact, only local lock need to have been
taken to update ref counts as these file systems are in no danger of
going away until we are ready to unregister them.
The attached patch marks file systems using kern_mount without
mount point as long term. The contentions of vfs_mount lock
is now eliminated. Before un-registering such file system,
kern_unmount should be called to remove the long term flag and
make the mount point ready to be freed.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that none of the drivers use CONFIG_MTD_PARTITIONS we can remove
it from Kconfig and the last remaining uses.
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Replace direct call to kmalloc for a potentially large, contiguous
buffer allocation with one to mtd_kmalloc_up_to which helps ensure the
operation can succeed under low-memory, highly- fragmented situations
albeit somewhat more slowly.
Signed-off-by: Grant Erickson <marathon96@gmail.com>
Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>
Tested-by: Stefano Babic <sbabic@denx.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Instead of splitting refcount between (per-cpu) mnt_count
and (SMP-only) mnt_longrefs, make all references contribute
to mnt_count again and keep track of how many are longterm
ones.
Accounting rules for longterm count:
* 1 for each fs_struct.root.mnt
* 1 for each fs_struct.pwd.mnt
* 1 for having non-NULL ->mnt_ns
* decrement to 0 happens only under vfsmount lock exclusive
That allows nice common case for mntput() - since we can't drop the
final reference until after mnt_longterm has reached 0 due to the rules
above, mntput() can grab vfsmount lock shared and check mnt_longterm.
If it turns out to be non-zero (which is the common case), we know
that this is not the final mntput() and can just blindly decrement
percpu mnt_count. Otherwise we grab vfsmount lock exclusive and
do usual decrement-and-check of percpu mnt_count.
For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
namespace.c uses the latter in places where we don't already hold
vfsmount lock exclusive and opencodes a few remaining spots where
we need to manipulate mnt_longterm.
Note that we mostly revert the code outside of fs/namespace.c back
to what we used to have; in particular, normal code doesn't need
to care about two kinds of references, etc. And we get to keep
the optimization Nick's variant had bought us...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.
The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.
We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.
- check the global sum once every interval (this will delay zero detection
for some interval, so it's probably a showstopper for vfsmounts).
- keep a local count and only taking the global sum when local reaches 0 (this
is difficult for vfsmounts, because we can't hold preempt off for the life of
a reference, so a counter would need to be per-thread or tied strongly to a
particular CPU which requires more locking).
- keep a local difference of increments and decrements, which allows us to sum
the total difference and hence find the refcount when summing all CPUs. Then,
keep a single integer "long" refcount for slow and long lasting references,
and only take the global sum of local counters when the long refcount is 0.
This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.
This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.
This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Function mtd_has_master renamed as mtd_is_partition to follow the function logic.
The patch fixes the problem of checking the right mtd device for partition creation.
To delete partition checking is not needed here so as it is done in mtd_del_partition.
By master we consider the mtd device which does not belong to any partition.
Signed-off-by: Roman Tereshonkov <roman.tereshonkov@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Structure mtd_info_user is copied to userland with padding byted
between "type" and "flags" fields uninitialized. It leads to leaking
of contents of kernel stack memory.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Add support for mtd repartition based on the block
device BLKPG interface:
BLKPG_ADD_PARTITION - for partition creation;
BLKPG_DEL_PARTITION - for partition delete
The usage is based on BLKPG ioctl called with
struct blkpg_ioctl_arg argument which includes the
reference to struct blkpg_partition discribing the
partition offset and length.
Disadvantage: there is no implementation for mtd
flags control. The flags are always borrowed from
the master device.
Signed-off-by: Roman Tereshonkov <roman.tereshonkov@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
If "ur_idx" is wrong we could go past the end of the array. The
"ur_idx" comes from root so it's not a huge deal, but adding a sanity
check makes the code more robust.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There were some improvements and additions necessary in the
comments explaining of the expansion of nand_ecclayout, the
introduction of nand_ecclayout_user, and the deprecation of the
ioctl ECCGETLAYOUT.
Also, I found a better placement for the macro MTD_MAX_ECCPOS_ENTRIES;
next to the definition of MTD_MAX_OOBFREE_ENTRIES in mtd-abi.h. The macro
is really only important for the ioctl code (found in drivers/mtd/mtdchar.c)
but since there are small edits being made to the user-space header, I
figured this is a better location.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
struct nand_ecclayout is too small for many new chips; OOB regions can be as
large as 448 bytes and may increase more in the future. Thus, copying that
struct to user-space with the ECCGETLAYOUT ioctl is not a good idea; the ioctl
would have to be updated every time there's a change to the current largest
size.
Instead, the old nand_ecclayout is renamed to nand_ecclayout_user and a
new struct nand_ecclayout is created that can accomodate larger sizes and
expand without affecting the user-space. struct nand_ecclayout can still
be used in board drivers without modification -- at least for now.
A new function is provided to convert from the new to the old in order to
allow the deprecated ioctl to continue to work with truncated data. Perhaps
the ioctl, the conversion process, and the struct nand_ecclayout_user can be
removed altogether in the future.
Note: There are comments in nand/davinci_nand.c::nand_davinci_probe()
regarding this issue; this driver (and maybe others) can be updated to
account for extra space. All kernel drivers can use the expanded
nand_ecclayout as a drop-in replacement and ignore its benefits.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
All these files use the big kernel lock in a trivial
way to serialize their private file operations,
typically resulting from an earlier semi-automatic
pushdown from VFS.
None of these drivers appears to want to lock against
other code, and they all use the BKL as the top-level
lock in their file operations, meaning that there
is no lock-order inversion problem.
Consequently, we can remove the BKL completely,
replacing it with a per-file mutex in every case.
Using a scripted approach means we can avoid
typos.
file=$1
name=$2
if grep -q lock_kernel ${file} ; then
if grep -q 'include.*linux.mutex.h' ${file} ; then
sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
else
sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
fi
sed -i ${file} \
-e "/^#include.*linux.mutex.h/,$ {
1,/^\(static\|int\|long\)/ {
/^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);
} }" \
-e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
-e '/[ ]*cycle_kernel_lock();/d'
else
sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
-e '/cycle_kernel_lock()/d'
fi
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: linux-mtd@lists.infradead.org
For no-mmu systems mmap() on RAM/ROM devices already works
but for systems with mmu it probably was not tested and
doesn't work.
This patch allows using mmap() on MTD RAM/ROM devices on systems
with MMU. It has been tested on mpc5121e based platform with
MR0A16A MRAM device attached over LocalBus.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patchs adds a way for user space programs to find out whether a
flash sector is locked. An optional driver method in the mtd_info struct
provides the information.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use memdup_user when user data is immediately copied into the
allocated region.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@
- to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+ to = memdup_user(from,size);
if (
- to==NULL
+ IS_ERR(to)
|| ...) {
<+... when != goto l1;
- -ENOMEM
+ PTR_ERR(to)
...+>
}
- if (copy_from_user(to, from, size) != 0) {
- <+... when != goto l2;
- -EFAULT
- ...+>
- }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We cannot modify file->f_mapping->backing_dev_info, because it will corrupt
backing device of device node inode, since file->f_mapping is equal to
inode->i_mapping (see __dentry_open() in fs/open.c).
Let's introduce separate inode for MTD device with appropriate backing
device.
[dwmw2: Refactor to keep it all entirely within mtdchar.c; use iget_locked()]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
These are the last remaining device drivers using
the ->ioctl file operation in the drivers directory
(except from v4l drivers).
[fweisbec: drop i8k pushdown as it has been done from
procfs pushdown branch already]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
register_chrdev() registers minor numbers up to 255, but we can now
potentially have much larger numbers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
MAX_MTD_DEVICES is about to be removed.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In mtd_ioctl MEMGETREGIONINFO the region_user_info pointer ur
is cast in __kernel space. This produces a number of sparse warnings
like:
warning: cast removes address space of expression
warning: incorrect type in initializer (different address spaces)
expected unsigned int const [noderef] <asn:1>*register __p
got unsigned int *<noident>
Since argp is already a void __user * just use it dirrectly without
the cast and make ur a __user *.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Remove all references to MTD ioctls from fs/compat_ioctl.c and let
them all be handled by mtd_compat_ioctl().
Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>