On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote:
> I remember I talked with Arjan about this time ago. Basically, since 1)
> you can drop an epoll fd inside another epoll fd 2) callback-based wakeups
> are used, you can see a wake_up() from inside another wake_up(), but they
> will never refer to the same lock instance.
> Think about:
>
> dfd = socket(...);
> efd1 = epoll_create();
> efd2 = epoll_create();
> epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
> epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
>
> When a packet arrives to the device underneath "dfd", the net code will
> issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
> callback wakeup entry on that queue, and the wake_up() performed by the
> "dfd" net code will end up in ep_poll_callback(). At this point epoll
> (efd1) notices that it may have some event ready, so it needs to wake up
> the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
> that ends up in another wake_up(), after having checked about the
> recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to
> avoid stack blasting. Never hit the same queue, to avoid loops like:
>
> epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
> epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
> epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
> epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
>
> The code "if (tncur->wq == wq || ..." prevents re-entering the same
> queue/lock.
Since the epoll code is very careful to not nest same instance locks
allow the recursion.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a way to use tc filters on vlan tag even if tag is buried in
skb due to hardware acceleration.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if_addrlabel.h is needed for iproute2 usage.
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the following no longer used functions:
- rtattr_parse()
- rtattr_strlcpy()
- __rtattr_parse_nested_compat()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bumps the AGP interface to 0.103.
Certain Intel chipsets contains a global write buffer, and this can require
flushing from the drm or X.org to make sure all data has hit RAM before
initiating a GPU transfer, due to a lack of coherency with the integrated
graphics device and this buffer.
This just adds generic support to the AGP interfaces, a follow-on patch
will add support to the Intel driver to use this interface.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add some comments explaining the fields of the kmem_cache_cpu structure.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The NUMA defrag works by allocating objects from partial slabs on remote
nodes. Rename it to
remote_node_defrag_ratio
to be clear about this.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently early kernel messages, i.e., those from uncompression, go to the
debugging UART. And if it is enabled in the platform configuration, but
not initialized by the bootloader, the machine hangs, waiting for UART
status change. Besides, having those messages on another UART - typically
the console UART - may be preferrable. This patch allows selecting the
UART in kernel configuration.
Signed-off-by: Guennadi Liakhovetski <lg@denx.de>
Acked-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
After discussions with Anthony Liguori, it seems that the virtio
balloon can be made even simpler. Here's my attempt.
The device configuration tells the driver how much memory it should
take from the guest (ie. balloon size). The guest feeds the page
numbers it has taken via one virtqueue.
A second virtqueue feeds the page numbers the driver wants back: if
the device has the VIRTIO_BALLOON_F_MUST_TELL_HOST bit, then this
queue is compulsory, otherwise it's advisory (and the guest can simply
fault the pages back in).
This driver can be enhanced later to deflate the balloon via a
shrinker, oom callback or we could even go for a complete set of
in-guest regulators.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As Avi pointed out, as we continue to massage the virtio PCI ABI, we can make
things a little more friendly to users by utilizing the PCI revision field to
indicate which version of the ABI we're using. This is a hard ABI version
and incrementing it will cause the guest driver to break.
This is the necessary changes to virtio_pci to support this.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a PCI device that implements a transport for virtio. It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A reset function solves three problems:
1) It allows us to renegotiate features, eg. if we want to upgrade a
guest driver without rebooting the guest.
2) It gives us a clean way of shutting down virtqueues: after a reset,
we know that the buffers won't be used by the host, and
3) It helps the guest recover from messed-up drivers.
So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.
We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1) Turn GSO on virtio net into an all-or-nothing (keep checksumming
separate). Having multiple bits is a pain: if you can't support something
you should handle it in software, which is still a performance win.
2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to
IPv6 or v4.
3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do
checksumming).
4) Add csum and gso params to virtio_net to allow more testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's far easier to deal with packets if we don't have to parse the
packet to figure out the header length to know how much to pull into
the skb data. Add the field to the virtio_net_hdr struct (and fix the
spaces that somehow crept in there).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This field has been unused since an older version of virtio. Remove
it now before we freeze the ABI.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au.
The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things". Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Using unsigned int resulted in silent truncation of the upper 32-bit
on x86_64 resulting in an OOPS since the ring was being initialized
wrong.
Please reconsider my previous patch to just use PAGE_ALIGN(). Open
coding this sort of stuff, no matter how simple it seems, is just
asking for this sort of trouble.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Various drivers want to know when their configuration information
changes: the balloon driver is the immediate user, but the network
driver may one day have a "carrier" status as well.
This introduces that callback (lguest doesn't use it yet).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.
Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previously we used a type/len pair within the config space, but this
seems overkill. We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.
The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use it in virtio_net (replacing buggy version there), it's also going
to be used by TAP for partial csum support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
fcntl(F_GETLK,..) can return pid of process for not current pid namespace
(if process is belonged to the several namespaces). It is true also for
pids in /proc/locks. So correct behavior is saving pointer to the struct
pid of the process lock owner.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Function do_timer_interrupt_hook() don't take argument regs,
and structure hrtimer_sleeper don't have member cb_pending.
So delete comments refering to these symbols.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
There is an unmatched parenthesis in the locking commentary of radix_tree.h
which is trivially fixed by the patch below.
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Having the macro to prevent multiple inclusion of
include/linux/dma-mapping.h contain the prefix "_ASM" is just begging
for possible confusion some day.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
PHY read/write functions can potentially sleep (e.g., a PHY accessed
via I2C). The following changes were made to account for this:
* Change spin locks to mutex locks
* Add a BUG_ON() to phy_read() phy_write() to warn against
calling them from an interrupt context.
* Use work queue for PHY state machine handling since
it can potentially sleep
* Change phydev lock from spinlock to mutex
Signed-off-by: Nate Case <ncase@xes-inc.com>
Acked-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 312b1485fb made __INIT_REFOK expand
into .section .section ".ref.text", "ax". Since the assembler doesn't
tolerate stuttering in the source that broke all MIPS builds.
Since with this change Sam downgraded __INIT_REFOK to just a backward
compat thing and there being only a single use in the MIPS arch code the
best solution is to delete both of __INIT_REFOK and __INITDATA_REFOK (which
was equally broken) being unused anyway these can be deleted.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Move the declaration of device_pm_schedule_removal() to device.h
and make it exported, as it will be used directly by some drivers
for unregistering device objects during suspend/resume cycles in a
safe way.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit 6c723d5bd8.
It caused build errors on non-x86 platforms, config file confusion, and
even some boot errors on some x86-64 boxes. All around, not quite ready
for prime-time :(
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Move check_dma_crc() to ide-dma.c and add inline version for
CONFIG_BLK_DEV_IDEDMA=n case.
* Rename check_dma_crc() to ide_check_dma_crc().
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* siimage.c: use hwif->sata_scr[SATA_{ERROR,STATUS}_OFFSET] instead of
SATA_{ERROR,STATUS}_REG macros.
* Remove no longer needed SATA_*_REG macros.
While at it:
* Remove needless SATA Status register read from sil_sata_reset_poll().
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* ->nice0 and ->nice2 ide_drive_t fields are always zero so remove them.
* IDE_NICE_0 and IDE_NICE_2 defines from <linux/hdreg.h> are no longer
used by any kernel code so cover them with #ifndef/#endif __KERNEL__.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Un-static create_proc_ide_drives() and call it from ide_device_add_all().
While at it:
* Rename create_proc_ide_drives() to ide_proc_port_register_devices().
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Factor out devices setup from ide_acpi_init() to
ide_acpi_port_init_devices().
* Call ide_acpi_port_init_devices() in ide_device_add_all().
While at it:
* Remove no longer needed 'drive' field from struct ide_acpi_drive_link.
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ->port_init_devs method to ide_hwif_t for a host specific
initialization of devices on a port. Call the new method from
ide_port_init_devices().
* Convert ht6560b, qd65xx and opti621 host drivers to use the new
->port_init_devs method.
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Use the same bit for IDE_HFLAG_CS5520 and IDE_HFLAG_VDMA host flags
(both are used only by cs5520 host driver currently).
* Add IDE_HFLAG_NO_IO32_BIT host flag and use it instead of ->no_io_32bit
ide_hwif_t field.
* Add IDE_HFLAG_NO_UNMASK_IRQS host flag, then convert dtc2278 and rz1000
host drivers to use it.
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Factor out code for finding ide_hwifs[] slot from ide_register_hw()
to ide_deprecated_find_port().
* Convert bast-ide, ide-cs and delkin_cb host drivers to use ide_device_add()
instead of ide_register_hw() (while at it drop doing "ide_unregister()" loop
which tries to unregister _all_ IDE interfaces if useable ide_hwifs[] slot
cannot be find).
This patch leaves us with only two ide_register_hw() users:
- drivers/macintosh/mediabay.c
- drivers/ide/ide.c
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add 'init_default' (flag for calling init_hwif_default()) and 'restore'
(flag for calling ide_hwif_restore()) arguments to ide_unregister().
* Update ide_unregister() users to set 'init_default' and 'restore' flags.
* No need to set 'init_default' flag in ide_register_hw() if the setup done
by init_hwif_default() is going to be overridden by ide_init_port_hw().
* No need to set 'init_default' and 'restore' flags in cleanup_module().
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Also, move xfer_func_t typedef to the ide.h since it is used by two drivers
now (more coming).
Bart:
- use __func__ while at it
Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ->cable_detect method to ide_hwif_t.
* Call the new method in ide_init_port() if:
- the host supports UDMA modes > UDMA2 ('hwif->ultra_mask & 78')
- DMA initialization was successful (if hwif->dma_base is not set
ide_init_port() sets hwif->ultra_mask to zero)
- "idex=ata66" is not used ('hwif->cbl != ATA_CBL_PATA40_SHORT')
* Convert PCI host drivers to use ->cable_detect method.
While at it:
* Factor out cable detection to separate functions (if not already done).
* hpt366.c/it8213.c/slc90e66.c:
- don't check cable type if "idex=ata66" is used
* pdc202xx_new.c:
- add __devinit tag to pdcnew_cable_detect()
* pdc202xx_old.c:
- rename pdc202xx_old_cable_detect() to pdc2026x_old_cable_detect()
- add __devinit tag to pdc2026x_old_cable_detect()
Reviewed-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Remove 'struct pci_dev *dev' argument from ide_hwif_setup_dma().
* Un-static ide_hwif_setup_dma() and add CONFIG_BLK_DEV_IDEDMA_PCI=n version.
* Add 'const struct ide_port_info *d' argument to ide_device_add[_all]().
* Factor out generic ports init from ide_pci_setup_ports() to ide_init_port(),
move it to ide-probe.c and call it in in ide_device_add_all() instead of
ide_pci_setup_ports().
* Move ->mate setup to ide_device_add_all() from ide_port_init().
* Add IDE_HFLAG_NO_AUTOTUNE host flag for host drivers that don't enable
->autotune currently.
* Setup hwif->chipset in ide_init_port() but iff pi->chipset is set
(to not override setup done by ide_hwif_configure()).
* Add ETRAX host handling to ide_device_add_all().
* cmd640.c: set IDE_HFLAG_ABUSE_* also for CONFIG_BLK_DEV_CMD640_ENHANCED=n.
* pmac.c: make pmac_ide_setup_dma() return an error value and move DMA masks
setup to pmac_ide_setup_device().
* Add 'struct ide_port_info' instances to legacy host drivers, pass them to
ide_device_add() calls and then remove open-coded ports initialization.
Reviewed-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>