ConnectX can work more efficiently if the CPU cache line size is passed
to it with the INIT_HCA firmware command.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
catas_reset() uses pointer to mlx4_priv, but mlx4_priv is not valid
after call mlx4_restart_one().
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the mlx4 driver uses the same name for interrupts for every
device in the system. This can make it very confusing trying to work
out exactly which device MSI-X interrupts are for. Change the driver
to add the PCI name of the device to the interrupt name.
Signed-off-by: Arputham Benjamin <abenjamin@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On the error path of mlx4_init_hca(), mlx4_close_hca() is called,
followed by mlx4_free_icms() and mlx4_UNMAP_FA(). But both those
functions are also called from mlx4_close_hca(), which leads to a
double free.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current implementation allocates a single host page for EQ context
memory, which was OK when we only allocated a few EQs. However, since
we now allocate an EQ for each CPU core, this patch removes the
hard-coded limit (which we exceed with 4 KB pages and 128 byte EQ
context entries with 32 CPUs) and uses the same ICM table code as all
other context tables, which ends up simplifying the code quite a bit
while fixing the problem.
This problem was actually hit in practice on a dual-socket Nehalem box
with 16 real hardware threads and sufficiently odd ACPI tables that it
shows on boot
SMP: Allowing 32 CPUs, 16 hotplug CPUs
so num_possible_cpus() ends up 32, and mlx4 ends up creating 33 MSI-X
interrupts and 33 EQs. This mlx4 bug means that mlx4 can't even
initialize at all on this quite mainstream system.
Cc: <stable@kernel.org>
Reported-by: Eli Cohen <eli@mellanox.co.il>
Tested-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code used two calls to pci_request_region() to get the two BARs
for the mlx4 device, for no particularly good reason. Clean up the code
a little by converting this to a single call to pci_request_regions().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In a couple of cases collapse some extra code like:
int retval = NETDEV_TX_OK;
...
return retval;
into
return NETDEV_TX_OK;
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NETPOLL API requires that interrupts remain disabled in
netpoll_send_skb(). The use of "A functions set" in the NETPOLL API
callbacks causes the interrupts to get enabled and can lead to kernel
instability.
The solution is to use "B functions set" to prevent the irqs from
getting enabled while in netpoll_send_skb().
A functions set:
local_irq_disable()/local_irq_enable()
spin_lock_irq()/spin_unlock_irq()
spin_trylock_irq()/spin_unlock_irq()
B functions set:
local_irq_save()/local_irq_restore()
spin_lock_irqsave()/spin_unlock_irqrestore()
spin_trylock_irqsave()/spin_unlock_irqrestore()
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the length is less or equal to frag_prefix_size in the first iteration
we write skb_frags_rx[-1] and read from priv->frag_info[-1]
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We use 1:1 mapping between QPs and SRQs on receive side,
so additional indirection level not required. Allocated the receive
buffers for the RSS QPs.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in using more QPs then actual number of receive rings.
If the RSS function for two streams gives the same result modulo number
of rings, they will arrive to the same RX ring anyway.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the net device is identified as "sender" (number of sent packets
is higher then the number of received packets and the incoming packets are
small), set the moderation time to its low limit.
We do it because the incoming packets are acks, and we don't want to delay them
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
In cases of fragmented skb, with the data pointers being wrapped around
the TX buffer, the completion handling code would not forward the data
pointer and the firs fragment was unmapped several times, while others
were not unmapped at all.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The values in the advertising field are typically ADVERTISED_xxx, not
SUPPORTED_xxx. Both SUPPORTED_10000baseT_Full and
ADVERTISED_1000baseT_Full have the same value.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
struct ethtool_cmd E;
@@
*E.advertising = SUPPORTED_10000baseT_Full
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
MT26468 (PCI ID 0x6764) devices can expose multiple physical
functions. The current driver only handles the primary physical
function. For other functions, the QUERY_FW firmware command will
fail with the CMD_STAT_MULTI_FUNC_REQ error code. Don't try to drive
such devices, but print a message saying the driver is skipping those
devices rather than just "QUERY_FW command failed."
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
[ Rather than keeping unsupported devices bound to the driver, simply
print a more informative error message and exit - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch converts the remaining occurences of raw return values to their
symbolic counterparts in ndo_start_xmit() functions that were missed by the
previous automatic conversion.
Additionally code that assumed the symbolic value of NETDEV_TX_OK to be zero
is changed to explicitly use NETDEV_TX_OK.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5d23a1d2 ("net: replace dma_sync_single with
dma_sync_single_for_cpu") replaced uses of the deprectated function
dma_sync_single() with calls to dma_sync_single_for_cpu(). However,
to be correct, the code should do a sync for_cpu() before touching the
memory and for_device() after it's done.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Our RX rings are always full, there is no need to check whether
we need to fill them or not. If we fail to allocate a new socket
buffer, the incoming packet is dropped an the ring remains full.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
This check that verifies that the LSO header along with control
segment and first data segment do not cross 128 bytes is no longer
required.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When closing the port, we stop all transmit queues under the transmit
lock. It ensures that we will not attempt to transmit new packets after
the physical port was closed.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
After we moved to be a multi queue device, need to stop/start
all of our transmit queues.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need this check in the transmit function
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reporting the counter's value through 'ethtool -S'
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When both MSI-X and legacy INTx fail to generate an interrupt, the
driver frees the MSI-X interrupts twice. Fix this by clearing the
have_irq flag for the MSI-X interrupts when they are freed the first
time. This is the same bug that was reported in ib_mthca by Yinghai
Lu <yhlu.kernel@gmail.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If mlx4_create_eq() would fail for one of EQ's assigned for
completion handling, the code would try to free the same EQ
we failed to create.
The crash was found by Christoph Lameter
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default the driver opens 8 TX queues (defined by MLX4_EN_NUM_TX_RINGS).
If the driver is configured to support Per Priority Flow Control, we open
8 additional TX rings.
dev->real_num_tx_queues is always set to be MLX4_EN_NUM_TX_RINGS.
The mlx4_en_select_queue() function uses standard hashing (skb_tx_hash)
in case that PPFC is not supported or the skb contain a vlan tag,
otherwise the queue is selected according to vlan priority.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interrupt moderation should not depend on number of incoming
bytes, but on number of incoming packets.
The previous scheme caused very high interrupts rate for small
messages when big MTU was configured.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the initialization of one of the ports failed,
there is no need to fail the other one as well.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
en_params.c file now only handles Ethtool functionality
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
For each debug message, the message will show interface name in case
that the net device was registered, and PCI bus ID with port number
if we were not registered yet. Messages that are not port/netdev specific
stayed in the old format
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the transmit queue gets full we enable interrupts for TX completions
There was a race that we handled the TX queue both from the interrupt context
and from the transmit function. Using "spin_trylock_irq()" ensures this
doesn't happen.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces dma_sync_single() with dma_sync_single_for_cpu() because
dma_sync_single() is an obsolete API; include/linux/dma-mapping.h says:
/* Backwards compat, remove in 2.7.x */
#define dma_sync_single dma_sync_single_for_cpu
#define dma_sync_sg dma_sync_sg_for_cpu
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commits 9d21493b4b
and 08baf56108
(net: tx scalability works : trans_start)
(net: txq_trans_update() helper)
Now that core network takes care of trans_start updates, dont do it
in drivers themselves, if possible. Multi queue drivers can
avoid one cache miss (on dev->trans_start) in their start_xmit()
handler.
Exceptions are NETIF_F_LLTX drivers (vxge & tehuti)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current MTT allocator uses kmalloc() to allocate a buffer for its
buddy allocator, and thus is limited in the amount of MTT segments
that it can control. As a result, the size of memory that can be
registered is limited too. This patch uses a module parameter to
control the number of MTT entries that each segment represents,
allowing more memory to be registered with the same number of
segments.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In case of allocation failure, the actual ring size is rounded down to
nearest power of 2. The remaining descriptors are freed.
The CQ and SRQ are allocated with the actual size and the mask is updated.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Napi structures are being created each time we open a port, but when
the port is closed the napi structure is only disabled but not removed.
This bug caused hang while removing the driver.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the return after the goto. We want the goto because it frees
memory as well as returning err.
Found by smatch (http://repo.or.cz/w/smatch.git).
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we failed to allocate new fragments for receive buffer,
the packet should be dropped and packets should be reused.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of mlx4_en_activate_cq() failure, the cleanup
code would go to rx_err and try to disable unactivated rings.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the msi_x option is enabled but pci_enable_msix() fails (not
enough vectors are available etc), the entries array was not freed on
the error path.
Signed-off-by: Nicolas Morey-Chaisemartin <nicolas.morey-chaisemartin@ext.bull.net>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If creating a workqueue fails, don't jump to the error path where that
same workqueue is destroyed, since destroy_workqueue() can't handle a
NULL pointer.
This was spotted by the Coverity checker (CID 2617).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The per ring counters are implemented in SW. Now moving to have the total
counters as the sum of all rings. This way the numbers will always be consistent
and we no longer depend on HW buffer size limitations for those counters
that can be insufficient in some cases.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>