Finally, this patch lets virtio_net receive GSO packets in addition
to sending them. This can definitely be optimised for the non-GSO
case. For comparison, the Xen approach stores one page in each skb
and uses subsequent skbs' pages to construct an SG skb instead of
preallocating the maximum number of pages per skb.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added feature bits)
This patch adds some basic ethtool operations to virtio_net so
I could test SG without GSO (which was really useful because TSO
turned out to be buggy :)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (remove MTU setting)
On Mon, 2008-05-26 at 17:42 +1000, Rusty Russell wrote:
> If we fail to transmit a packet, we assume the queue is full and put
> the skb into last_xmit_skb. However, if more space frees up before we
> xmit it, we loop, and the result can be transmitting the same skb twice.
>
> Fix is simple: set skb to NULL if we've used it in some way, and check
> before sending.
...
> diff -r 564237b31993 drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c Mon May 19 12:22:00 2008 +1000
> +++ b/drivers/net/virtio_net.c Mon May 19 12:24:58 2008 +1000
> @@ -287,21 +287,25 @@ again:
> free_old_xmit_skbs(vi);
>
> /* If we has a buffer left over from last time, send it now. */
> - if (vi->last_xmit_skb) {
> + if (unlikely(vi->last_xmit_skb)) {
> if (xmit_skb(vi, vi->last_xmit_skb) != 0) {
> /* Drop this skb: we only queue one. */
> vi->dev->stats.tx_dropped++;
> kfree_skb(skb);
> + skb = NULL;
> goto stop_queue;
> }
> vi->last_xmit_skb = NULL;
With this, we may drop an skb and then later in the function discover that
we could have sent it after all. Poor wee skb :)
How about the incremental patch below?
Cheers,
Mark.
Subject: [PATCH] virtio_net: Delay dropping tx skbs
Currently we drop the skb in start_xmit() if we have a
queued buffer and fail to transmit it.
However, if we delay dropping it until we've stopped the
queue and enabled the tx notification callback, then there
is a chance space might become available for it.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:
u64 val = PAGE_ALIGN(size);
always returns a value < 4GB even if size is greater than 4GB.
The problem resides in the PAGE_MASK definition (from include/asm-x86/page.h for
example):
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
The "~" is performed on a 32-bit value, so everything ANDed with
PAGE_MASK that is greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.
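As an illustration (a userspace sketch, not the kernel patch itself), the
truncation and the typeof()-based fix can be reproduced with the program
below; build it with gcc -m32 so that unsigned long is 32 bits wide:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Old definition: the mask is only as wide as unsigned long. */
#define OLD_PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* ALIGN()-style definition: the mask takes the type of addr, so 64-bit
 * values survive on 32-bit architectures. */
#define NEW_PAGE_ALIGN(addr) \
        (((addr) + (typeof(addr))(PAGE_SIZE) - 1) & ~((typeof(addr))(PAGE_SIZE) - 1))

int main(void)
{
        uint64_t size = 0x100000001ULL;   /* > 4GB, not page aligned */

        printf("old: %#llx\n", (unsigned long long)OLD_PAGE_ALIGN(size));
        printf("new: %#llx\n", (unsigned long long)NEW_PAGE_ALIGN(size));
        return 0;
}

With -m32 the old macro prints 0x1000, the new one 0x100001000.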
Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h and into
include/linux/mm.h.
See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Although mv643xx_eth has no hardware support for inserting a vlan
tag by twiddling some bits in the TX descriptor, it does support
hardware TX checksumming on packets where the IP header starts at a
limited set of offsets other than the usual 14 bytes into the packet.
This patch sets mv643xx_eth's ->vlan_features to NETIF_F_SG |
NETIF_F_IP_CSUM, which prevents the stack from checksumming vlan'ed
packets in software, and, if a vlan tag is present on a transmitted
packet, notifies the hardware of this fact by setting the right
bits in the TX descriptor.
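In outline, the change amounts to something like the following sketch
(TX_VLAN_TAG is a placeholder name for whichever descriptor bit the
hardware actually defines; the exact code in the driver differs):

/* probe time: vlan devices stacked on top of us may hand us SG
 * packets that still need checksumming */
dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM;

/* transmit time: flag vlan-tagged packets so the hardware computes
 * the checksum at the right offset (TX_VLAN_TAG is illustrative) */
if (vlan_tx_tag_present(skb))
        cmd_sts |= TX_VLAN_TAG;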
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
When there is a link status change (link or phy status interrupt),
print a message notifying the user of the new link status.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The mv643xx_eth hardware has a provision for polling the PHY's
MII management registers to obtain the (R)(G)MII interface speed
(10/100/1000) and duplex (half/full) and pause (off/symmetric)
settings to use to talk to the PHY.
The driver currently does not make use of this feature. Instead,
whenever there is a link status change event, it reads the current
link parameters from the PHY, and programs those parameters into
the mv643xx_eth MAC by hand.
This patch switches the mv643xx_eth driver to letting the MAC
auto-determine the (R)(G)MII link parameters by PHY polling, if there
is a PHY present. For PHYless ports (when e.g. the (R)(G)MII
interface is connected to a hardware switch), we keep hardcoding the
MII interface parameters.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Instead of hardcoding MII register addresses and values, use the
symbolic constants defined in linux/mii.h.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The mv643xx_eth driver currently limits DMA bursts to 32 bytes. Using
the largest burst size (128 bytes) gives a couple of percentage points
of improvement in throughput tests, and the docs say that the 128 byte
default should not need to be changed, so use 128 byte bursts instead.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The recommended sequence for waiting for the transmit path to clear
after disabling all of the transmit queues is to wait for the
TX_FIFO_EMPTY bit in the Port Status register to become set and for
the TX_IN_PROGRESS bit to clear.
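A sketch of what such a wait loop looks like (rdl(), PORT_STATUS,
TX_FIFO_EMPTY and TX_IN_PROGRESS stand in for the driver's actual
register accessor and bit definitions; no timeout handling shown):

/* Wait for the transmit path to drain: FIFO empty and no transmit
 * in progress. Illustrative sketch only. */
do {
        u32 port_status = rdl(mp, PORT_STATUS(mp->port_num));

        if ((port_status & TX_FIFO_EMPTY) &&
            !(port_status & TX_IN_PROGRESS))
                break;
        udelay(10);
} while (1);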
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The maximum receive packet size field in the Port Serial Control
register controls at what size received packets are flagged
overlength in the receive descriptor, but it doesn't prevent
overlength packets from being DMAd to memory and signaled to the
host like other received packets.
mv643xx_eth does not support receiving jumbo frames in 10/100 mode,
but setting the packet threshold to larger than 1522 bytes in 10/100
mode won't cause breakage by itself.
If we really want to enforce maximum packet size on the receiving
end instead of on the sending end where it should be done, we can
always just add a length check to the software receive handler
instead of relying on the hardware to do the comparison for us.
What's more, changing the maximum packet size field requires
temporarily disabling the RX/TX paths. So once the link comes
up in 10/100 Mb/s mode or 1000 Mb/s mode, we'd have to disable it
again just to set the right maximum packet size field (1522 in
10/100 Mb/s mode or 9700 in 1000 Mb/s mode), just so that we can
offload one comparison operation to hardware that we might as well
do in software, assuming that we'd want to do it at all.
Contrary to what the documentation suggests, there is no harm in
just setting a 9700 byte maximum packet size in 10/100 mode, so use
the largest maximum packet size setting for all modes.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The mv643xx_eth driver allows doing transmit reclaim from within the
napi poll routine, but after doing reclaim, it would forget to check
the free transmit descriptor count and wake up the transmit queue if
the reclaim caused enough descriptors for a new packet to become
available. This would cause the netdev watchdog to occasionally kick
in during certain workloads with combined receive and transmit traffic.
Fix this by adding a wakeup check identical to the one in the
interrupt handler to the napi poll routine.
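Roughly, the check added to the poll routine mirrors the one in the
interrupt handler (field and constant names below are approximate and
for illustration only):

/* After reclaiming, wake the queue if enough descriptors are now
 * free for a maximally fragmented packet. */
if (netif_queue_stopped(mp->dev) &&
    mp->tx_ring_size - mp->tx_desc_count >= MAX_DESCS_PER_SKB)
        netif_wake_queue(mp->dev);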
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
When the ethernet link goes down while mv643xx_eth is transmitting
data, transmit DMA can stop before all queued transmit descriptors
have been processed. But even the descriptors that _have_ been
processed might not be properly marked as done before the transmit
DMA unit shuts down.
Then when the link comes up again, the hardware transmit pointer
might have advanced while not all previous packet descriptors have
been marked as transmitted, causing software transmit reclaim to
hang waiting for the hardware to finish transmitting a descriptor
that it has already skipped.
This patch forcibly reclaims all packets on the transmit ring on a
link down interrupt, and then resyncs the hardware transmit pointer to
what the software's idea of the first free descriptor is. Also, we
need to prevent re-waking the transmit queue if we get a 'transmit
done' interrupt at the same time as a 'link down' interrupt, which
this patch does as well.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The previously merged TX hang erratum workaround ("mv643xx_eth:
work around TX hang hardware issue") assumes that TX_END interrupts
are delivered simultaneously with or after their corresponding TX
interrupts, but this is not always true in practice.
In particular, it appears that TX_END interrupts are issued as soon
as descriptor fetch returns an invalid descriptor, which may happen
before earlier descriptors have been fully transmitted and written
back to memory as being done.
This hardware behavior can lead to a situation where the current
driver code mistakenly assumes that the MAC has given up transmitting
before noticing the packets that it is in fact still currently working
on, causing the driver to re-kick the transmit queue, which will only
cause the MAC to re-fetch the invalid head descriptor, and generate
another TX_END interrupt, et cetera, until the packets in the pipe
finally finish transmitting and have their descriptors written back
to memory, which will then finally break the loop.
Fix this by having the erratum workaround not check the number of
unfinished descriptors, but instead compare the software's idea of
what the head descriptor pointer should be to the hardware's head
descriptor pointer (which is updated under the same conditions as
those that generate the TX_END interrupt, i.e. possibly before all
previous descriptors have been transmitted and written back).
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Evgeniy Polyakov noticed that drivers/net/e1000e/netdev.c:e1000_netpoll()
was calling e1000_clean_tx_irq() without taking the TX lock.
David Miller suggested removing the call altogether, since in this
callpath there are periodic calls to ->poll() anyway, which will do
e1000_clean_tx_irq() and garbage-collect any finished TX ring
descriptors.
This fix solved the e1000e+netconsole crashes I've been seeing:
=============================================================================
BUG skbuff_head_cache: Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b
INFO: Allocated in __alloc_skb+0x2c/0x110 age=0 cpu=0 pid=5098
INFO: Freed in __kfree_skb+0x31/0x80 age=0 cpu=1 pid=4440
INFO: Slab 0xc16cc140 objects=16 used=1 fp=0xf658ae00 flags=0x400000c3
INFO: Object 0xf658ae00 @offset=3584 fp=0xf658af00
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an mlx4 device with default FW (which gives a UAR BAR size of 8 MB)
is used in a system with 64 KB pages, then there are only 8192/64==128
UAR pages available. However, the first 128 UAR pages are reserved
for use with event queue doorbells, so no UAR pages are available to
do anything else with, which means that the driver cannot work.
The current driver fails with a fairly cryptic "Failed to allocate
driver access region, aborting" message in this situation. Fix the
driver to detect the problem earlier and print out a clearer
description of the problem and a suggestion of how to fix it (use a
new firmware image).
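The early check is essentially an arithmetic sanity test at init time;
approximately (a fragment from the init path; field names and message
text paraphrased):

/* num_uars = UAR BAR size / system page size; the first 128 pages
 * are reserved for EQ doorbells, so <= 128 leaves us nothing. */
if (dev->caps.num_uars <= 128) {
        mlx4_err(dev, "Only %d UAR pages (need more than 128)\n",
                 dev->caps.num_uars);
        mlx4_err(dev, "Increase firmware log2_uar_bar_megabytes?\n");
        return -ENODEV;
}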
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for the following operations to mlx4 when device firmware
supports them:
- Send with invalidate and local invalidate send queue work requests;
- Allocate/free fast register MRs;
- Allocate/free fast register MR page lists;
- Fast register MR send queue work requests;
- Local DMA L_Key.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
They have never been used in this port of the driver. It has only
ever been used on the ColdFire SoC ethernet core.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
These ifdefs are leftovers from the time when the driver was running
on a ppc.
Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
I found config FADS only in ppc/Kconfig. Bye bye relic.
Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
It is unnecessary to stop the queue and turn off the carrier in the
shutdown routine. With the new netdev_queue code this causes warnings.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_carrier_{on,off}() handles starting and stopping packet
flow into the driver. So there is no reason to invoke netif_stop_queue()
and netif_wake_queue() in response to link status events.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reworks the error handling in r6040_init_one
in order not to leak resources and to correctly unmap and release
the PCI regions of the MAC. Also prefix printks with the driver name
for clarity.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch bumps the release number of the r6040 driver. There have
been quite a few versions of it out there, but this one is the one
people should report bugs against.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch allows the MAC to handle the RX FIFO full
and no-descriptor-available interrupts. While we are at it,
replace the hardcoded TX interrupt value with its corresponding definition.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch changes the default waiting time for a packet, which,
along with our previous r6040_rx path, was causing huge delays
(160 to 230 ms) when talking to another host.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Define all the descriptor status bits the MAC can set.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch completely reworks the RX path in order to be
more accurate about what is going on with the MAC.
We no longer read the error from the MLSR register; instead we read
the descriptor status register, which reflects the error per descriptor.
We now allocate skbs on the fly in r6040_rx, and we handle allocation
failure instead of simply dropping the packet. Remove the
rx_free_desc counter from the r6040_private structure since we
allocate skbs in the RX path.
r6040_rx_buf_alloc is now unused and is removed.
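The allocation-failure handling in the new RX loop looks roughly like
this (a simplified sketch; the label, buffer size constant and counter
used here are illustrative, not the driver's exact code):

new_skb = netdev_alloc_skb(dev, MAX_BUF_SIZE);
if (!new_skb) {
        /* No replacement buffer: keep the old one in the ring so the
         * descriptor stays usable, and account the lost frame. */
        dev->stats.rx_dropped++;
        goto next_descr;
}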
Signed-Off-By: Joerg Albert <jal2@gmx.de>
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
We did not call napi_disable when taking down the interface,
which should be done. Finally, initialize lp->dev only when
everything is set up.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Joseph Fannin <jfannin@gmail.com> and Takashi Iwai <tiwai@suse.de>
noticed that commit 073a345c04
("mv643xx_eth: clarify irq masking and unmasking") broke the
mv643xx_eth build when NETPOLL is enabled, due to it not renaming
one instance of INT_CAUSE_EXT in mv643xx_eth_netpoll(). This patch
takes care of that instance as well.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: Dale Farnsworth <dale@farnsworth.org>
Cc: Joseph Fannin <jfannin@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Rework the RX buffer allocation function so that we do not
leak memory in case we cannot allocate skbs for the
RX path. Propagate the errors to the r6040_up function,
where we call the RX buffer allocation function.
Also rename the r6040_alloc_txbufs function to
r6040_init_txbufs, to reflect what it really does.
Signed-Off-By: Joerg Albert <jal2@gmx.de>
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add a helper function which only modifies the R6040 MAC registers;
use it when we time out and on adapter initialization. Fix
the scheduling-while-atomic bug in the timeout routine due
to the reallocation of rx/tx buffers.
Signed-Off-By: Joerg Albert <jal2@gmx.de>
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch fixes a null pointer access in r6040_rx due
to lp->dev not being initialized.
Also fix the TX timeouts: the TX irq was not re-enabled on RX irq.
Signed-Off-By: Joerg Albert <jal2@gmx.de>
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Prefix all functions inside the r6040 driver with r6040 to
avoid namespace clashes.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch allows Windows Mobile 6 devices to be used for
tethering -- that is, used as modems. It was requested by
AdamW in kernel bugzilla:
http://bugzilla.kernel.org/show_bug.cgi?id=11119
and on the Mandriva kernel-discuss list. It has been tested and
confirmed to work by Peterl:
http://forum.eeeuser.com/viewtopic.php?pid=323543#p323543
This patch is based on the patch in the above kernel bugzilla,
which is from the usb-rndis-lite tree.
[ dbrownell@users.sourceforge.net: misc fixes ]
Signed-off-by: Thomas Backlund <tmb@mandriva.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>