The break_lock data structure and code for spinlocks is quite nasty.
Not only does it double the size of a spinlock but it changes locking to
a potentially less optimal trylock.
Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
__raw_spin_is_contended that uses the lock data itself to determine whether
there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
not set.
Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
decouple it from the spinlock implementation, and make it typesafe (rwlocks
do not have any need_lockbreak sites -- why do they even get bloated up
with that break_lock then?).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The patch below updates the jbd stats patch to 2.6.20/jbd2.
The initial patch was posted by Alex Tomas in December 2005
(http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
It provides statistics via procfs such as transaction lifetime and size.
Sometimes, investigating performance problems, i find useful to have
stats from jbd about transaction's lifetime, size, etc. here is a
patch for review and inclusion probably.
for example, stats after creation of 3M files in htree directory:
[root@bob ~]# cat /proc/fs/jbd/sda/history
R/C tid wait run lock flush log hndls block inlog ctime write drop close
R 261 8260 2720 0 0 750 9892 8170 8187
C 259 750 0 4885 1
R 262 20 2200 10 0 770 9836 8170 8187
R 263 30 2200 10 0 3070 9812 8170 8187
R 264 0 5000 10 0 1340 0 0 0
C 261 8240 3212 4957 0
R 265 8260 1470 0 0 4640 9854 8170 8187
R 266 0 5000 10 0 1460 0 0 0
C 262 8210 2989 4868 0
R 267 8230 1490 10 0 4440 9875 8171 8188
R 268 0 5000 10 0 1260 0 0 0
C 263 7710 2937 4908 0
R 269 7730 1470 10 0 3330 9841 8170 8187
R 270 0 5000 10 0 830 0 0 0
C 265 8140 3234 4898 0
C 267 720 0 4849 1
R 271 8630 2740 20 0 740 9819 8170 8187
C 269 800 0 4214 1
R 272 40 2170 10 0 830 9716 8170 8187
R 273 40 2280 0 0 3530 9799 8170 8187
R 274 0 5000 10 0 990 0 0 0
where,
R - line for transaction's life from T_RUNNING to T_FINISHED
C - line for transaction's checkpointing
tid - transaction's id
wait - for how long we were waiting for new transaction to start
(the longest period journal_start() took in this transaction)
run - real transaction's lifetime (from T_RUNNING to T_LOCKED
lock - how long we were waiting for all handles to close
(time the transaction was in T_LOCKED)
flush - how long it took to flush all data (data=ordered)
log - how long it took to write the transaction to the log
hndls - how many handles got to the transaction
block - how many blocks got to the transaction
inlog - how many blocks are written to the log (block + descriptors)
ctime - how long it took to checkpoint the transaction
write - how many blocks have been written during checkpointing
drop - how many blocks have been dropped during checkpointing
close - how many running transactions have been closed to checkpoint this one
all times are in msec.
[root@bob ~]# cat /proc/fs/jbd/sda/info
280 transaction, each upto 8192 blocks
average:
1633ms waiting for transaction
3616ms running transaction
5ms transaction was being locked
1ms flushing data (in ordered mode)
1799ms logging transaction
11781 handles per transaction
5629 blocks per transaction
5641 logged blocks per transaction
Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.
If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).
We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state. The locking there is subtle though (as
everywhere in JBD ;(). We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.
Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either. Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Mingming Cao originally did this work, and Shaggy reproduced it using some
scripts from her.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a simple copy of the files in fs/jbd to fs/jbd2 and
/usr/incude/linux/[ext4_]jbd.h to /usr/include/[ext4_]jbd2.h
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove whitespace from ext3 and jbd, before we clone ext4.
Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
jbd_sync_bh releases journal->j_list_lock. Add a lock annotation to this
function so that sparse can check callers for lock pairing, and so that
sparse will not complain about this function since it intentionally uses
the lock in this manner.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split the checkpoint list of the transaction into two lists. In the first
list we keep the buffers that need to be submitted for IO. In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete. This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch reverts commit f93ea411b73594f7d144855fd34278bcf34a9afc:
[PATCH] jbd: split checkpoint lists
This broke journal_flush() for OCFS2, which is its method of being sure
that metadata is sent to disk for another node.
And two related commits 8d3c7fce2d and
43c3e6f5ab with the subjects:
[PATCH] jbd: log_do_checkpoint fix
[PATCH] jbd: remove_transaction fix
These seem to be incremental bugfixes on the original patch and as such are
no longer needed.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While checkpointing we have to check that our transaction still is in the
checkpoint list *and* (not or) that it's not just a different transaction
with the same address.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split the checkpoint list of the transaction into two lists. In the first
list we keep the buffers that need to be submitted for IO. In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete. This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We must be sure that the current data in buffer are sent to disk. Hence we
have to call ll_rw_block() with SWRITE.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug in list scanning that can cause us to skip the last buffer on the
checkpoint list (and hence fail to do any progress under some rather
unfavorable conditions).
The problem is we first do jh=next_jh and then test
} while (jh!=last_jh);
Hence we skip the last buffer on the list (if it was not the only buffer on
the list). As we already do jh=next_jh; in the beginning of the loop we
are safe to just remove the assignment in the end. It can happen that 'jh'
will be freed at the point we test jh != last_jh but that does not matter
as we never *dereference* the pointer.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix possible false assertion failure in log_do_checkpoint(). We might fail
to detect that we actually made a progress when cleaning up the checkpoint
lists if we don't retry after writing something to disk. The patch was
confirmed to fix observed assertion failures for several users.
When we flushed some buffers we need to retry scanning the list.
Otherwise we can fail to detect our progress.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!