This is a patch to allocate pgdat and per node data area for ia64. The size
for them can be calculated by compute_pernodesize().
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is to refresh node_data[] array for ia64. As I mentioned previous
patches, ia64 has copies of information of pgdat address array on each node as
per node data.
At v2 of node_add, this function used stop_machine_run() to update them. (I
wished that they were copied safety as much as possible.) But, in this patch,
this arrays are just copied simply, and set node_online_map bit after
completion of pgdat initialization.
So, kernel must touch NODE_DATA() macro after checking node_online_map().
(Current code has already done it.) This is more simple way for just
hot-add.....
Note : It will be problem when hot-remove will occur,
because, even if online_map bit is set, kernel may
touch NODE_DATA() due to race condition. :-(
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When new node becomes enable by hot-add, new sysfs file must be created for
new node. So, if new node is enabled by add_memory(), register_one_node() is
called to create it. In addition, I386's arch_register_node() and a part of
register_nodes() of powerpc are consolidated to register_one_node() as a
generic_code().
This is tested by Tiger4(IPF) with node hot-plug emulation.
Signed-off-by: Keiichiro Tokunaga <tokuanga.keiich@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows hot-add memory which is not aligned to section.
Now, hot-added memory has to be aligned to section size. Considering big
section sized archs, this is not useful.
When hot-added memory is registerd as iomem resoruce by iomem resource
patch, we can make use of that information to detect valid memory range.
Note: With this, not-aligned memory can be registerd. To allow hot-add
memory with holes, we have to do more work around add_memory().
(It doesn't allows add memory to already existing mem section.)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When node is hot-added, kswapd for the node should start. This export kswapd
start function as kswapd_run() to use at add_memory().
[akpm@osdl.org: daemonize() isn't needed when using the kthread API]
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Refresh NODE_DATA() for generic archs. In this case, NODE_DATA(nid) ==
node_data[nid]. node_data[] is array of address of pgdat. So, refresh is
quite simple.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For node hotplug, basically we have to allocate new pgdat. But, there are
several types of implementations of pgdat.
1. Allocate only pgdat.
This style allocate only pgdat area.
And its address is recorded in node_data[].
It is most popular style.
2. Static array of pgdat
In this case, all of pgdats are static array.
Some archs use this style.
3. Allocate not only pgdat, but also per node data.
To increase performance, each node has copy of some data as
a per node data. So, this area must be allocated too.
Ia64 is this style. Ia64 has the copies of node_data[] array
on each per node data to increase performance.
In this series of patches, treat (1) as generic arch.
generic archs can use generic function. (2) and (3) should have
its own if necessary.
This patch defines pgdat allocator.
Updating NODE_DATA() macro function is in other patch.
Signed-off-by: Yasonori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is to find node id from acpi's handle of memory_device in DSDT. _PXM for
the new node can be found by acpi_get_pxm() by using new memory's handle. So,
node id can be found by pxm_to_nid_map[].
This patch becomes simpler than v2 of node hot-add patch.
Because old add_memory() function doesn't have node id parameter.
So, kernel must find its handle by physical address via DSDT again.
But, v3 just give node id to add_memory() now.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the name of old add_memory() to arch_add_memory. And use node id to
get pgdat for the node at NODE_DATA().
Note: Powerpc's old add_memory() is defined as __devinit. However,
add_memory() is usually called only after bootup.
I suppose it may be redundant. But, I'm not well known about powerpc.
So, I keep it. (But, __meminit is better at least.)
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce the Kconfig entry and actually switch to a 64bit value, if
wanted, for resource_size_t.
Based on a patch series originally from Vivek Goyal <vgoyal@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Based on a patch series originally from Vivek Goyal <vgoyal@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Based on a patch series originally from Vivek Goyal <vgoyal@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Based on a patch series originally from Vivek Goyal <vgoyal@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
But do not change it from what it currently is (unsigned long)
Based on a patch series originally from Vivek Goyal <vgoyal@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- When xdatum is removed, a new xdatum with 'delete marker' is
written. (version==0xffffffff means 'delete marker')
- When xref is removed, a new xref with 'delete marker' is written.
(odd-numbered xseqno means 'delete marker')
- delete_xattr_(datum/xref)_delay() are new deletion functions
are added. We can only use them if we can detect the target
obsolete xdatum/xref as a orphan or errir one.
(e.g when inode deletion, or detecting crc error)
[1/3] jffs2-xattr-v6-01-delete_marker.patch
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
so that userspace can be notified of dock and undock events.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Implement ata_port_max_devices(). This function returns the number of
possible devices on a port. This will be used by new PM
implementation.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
From: Andrew Morton <akpm@osdl.org>
s390:
In file included from drivers/scsi/libata-bmdma.c:39: include/linux/libata.h:391: error: field 'sgent' has incomplete type
include/linux/libata.h:392: error: field 'pad_sgent' has incomplete type
include/linux/libata.h: In function 'ata_sg_is_last': include/linux/libata.h:849: error: arithmetic on pointer to an incomplete type
include/linux/libata.h:849: error: arithmetic on pointer to an incomplete type
include/linux/libata.h: In function 'ata_qc_next_sg':
include/linux/libata.h:869: error: increment of pointer to unknown structure
include/linux/libata.h:869: error: arithmetic on pointer to an incomplete type
include/linux/libata.h:869: error: arithmetic on pointer to an incomplete type
include/linux/libata.h:869: error: arithmetic on pointer to an incomplete type
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Partially revert 74d0a988d3aa359b6b8a8536c8cb92cce02ca5d5:
[PATCH] PCI: Move various PCI IDs to header file
libata policy is to avoid use of named PCI device ID constants.
These are often single-use constants, which have little value over
direct numeric constants save for constant include/linux/pci_ids.h
patching/merging headaches.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
I've always found this flag confusing. Now that devfs is no longer around, it
has been renamed, and the documentation for when this flag should be used has
been updated.
Also fixes all drivers that use this flag.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is the first patch in a series of patches that removes devfs
support from the kernel. This patch removes the core devfs code, and
its private header file.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit c7b2eff059.
Hugh Dickins explains:
"It seems too little tested: "losetup -d /dev/loop0" fails with
EINVAL because nothing sets lo_thread; but even when you patch
loop_thread() to set lo->lo_thread = current, it can't survive
more than a few dozen iterations of the loop below (with a tmpfs
mounted on /tst):
j=0
cp /dev/zero /tst
while :
do
let j=j+1
echo "Doing pass $j"
losetup /dev/loop0 /tst/zero
mkfs -t ext2 -b 1024 /dev/loop0 >/dev/null 2>&1
mount -t ext2 /dev/loop0 /mnt
umount /mnt
losetup -d /dev/loop0
done
it collapses with failed ioctl then BUG_ON(!bio).
I think the original lo_done completion was more subtle and safe
than the kthread conversion has allowed for."
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In timekeeping code, one often does need to use conversion constants. Naming
these leads to code that's easier to understand, showing the reader between
which units the conversion is made.
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add proper conditionals to be able to build with CONFIG_MODULES=n.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If no unwinding is possible at all for a certain exception instance,
fall back to the old style call trace instead of not showing any trace
at all.
Also, allow setting the stack trace mode at the command line.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the generic bits needed to enable reliable stack traces based
on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will
enable x86-64 and i386 to make use of this.
Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities
for improvement.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sometimes e.g. with crashme the compat layer warnings can be noisy.
Add a way to turn them off by gating all output through compat_printk
that checks a global sysctl. The default is not changed.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- record the 'event' count on each individual device (they
might sometimes be slightly different now)
- add a new value for 'sb_dirty': '3' means that the super
block only needs to be updated to record a clean<->dirty
transition.
- Prefer odd event numbers for dirty states and even numbers
for clean states
- Using all the above, don't update the superblock on
a spare device if the update is just doing a clean-dirty
transition. To accomodate this, a transition from
dirty back to clean might now decrement the events counter
if nothing else has changed.
The net effect of this is that spare drives will not see any IO requests
during normal running of the array, so they can go to sleep if that is what
they want to do.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If md is asked to store a bitmap in a file, it tries to hold onto the page
cache pages for that file, manipulate them directly, and call a cocktail of
operations to write the file out. I don't believe this is a supportable
approach.
This patch changes the approach to use the same approach as swap files. i.e.
bmap is used to enumerate all the block address of parts of the file and we
write directly to those blocks of the device.
swapfile only uses parts of the file that provide a full pages at contiguous
addresses. We don't have that luxury so we have to cope with pages that are
non-contiguous in storage. To handle this we attach buffers to each page, and
store the addresses in those buffers.
With this approach the pagecache may contain data which is inconsistent with
what is on disk. To alleviate the problems this can cause, md invalidates the
pagecache when releasing the file. If the file is to be examined while the
array is active (a non-critical but occasionally useful function), O_DIRECT io
must be used. And new version of mdadm will have support for this.
This approach simplifies a lot of code:
- we no longer need to keep a list of pages which we need to wait for,
as the b_endio function can keep track of how many outstanding
writes there are. This saves a mempool.
- -EAGAIN returns from write_page are no longer possible (not sure if
they ever were actually).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
md/bitmap currently has a separate thread to wait for writes to the bitmap
file to complete (as we cannot get a callback on that action).
However this isn't needed as bitmap_unplug is called from process context and
waits for the writeback thread to do it's work. The same result can be
achieved by doing the waiting directly in bitmap_unplug.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the needlessly global md_print_devices() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The "industry standard" DDF format allows for a stripe/offset layout where
data is duplicated on different stripes. e.g.
A B C D
D A B C
E F G H
H E F G
(columns are drives, rows are stripes, LETTERS are chunks of data).
This is similar to raid10's 'far' mode, but not quite the same. So enhance
'far' mode with a 'far/offset' option which follows the layout of DDFs
stripe/offset.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For a while we have had checkpointing of resync. The version-1 superblock
allows recovery to be checkpointed as well, and this patch implements that.
Due to early carelessness we need to add a feature flag to signal that the
recovery_offset field is in use, otherwise older kernels would assume that a
partially recovered array is in fact fully recovered.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a lot of commonality between raid5.c and raid6main.c. This patches
merges both into one module called raid456. This saves a lot of code, and
paves the way for online raid5->raid6 migrations.
There is still duplication, e.g. between handle_stripe5 and handle_stripe6.
This will probably be cleaned up later.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>