Return address in use, if some other kernel code has the SAP.
Propogate out error codes from netfilter registration and unwind.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Small optimizations of bridge forwarding path.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow mulitcast reception of datagrams (similar to UDP).
All sockets bound to the same SAP receive a clone.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is legal for an application to bind to a SAP that is also being
used by the kernel. This happens if the bridge module binds to the
STP SAP, and the user wants to have a daemon for STP as well.
It is possible to have kernel doing STP on one bridge, but
let application do RSTP on another bridge.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The receive hander pointer might be modified during network changes
of protocol. So use rcu_dereference (only matters on alpha).
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
LLC receive is broken for SOCK_DGRAM.
If an application does recv() on a datagram socket and there
is no data present, don't return "not connected". Instead, just
do normal datagram semantics.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
for_each_cpu() is going away (and is gone in -mm).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Locks down user pages and sets up for DMA in tcp_recvmsg, then calls
dma_async_try_early_copy in tcp_v4_do_rcv
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Any socket recv of less than this ammount will not be offloaded
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an extra argument to sk_eat_skb, and make it move early copied
packets to the async_wait_queue instead of freeing them.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds an async_wait_queue and some additional fields to tcp_sock, and a
dma_cookie_t to sk_buff.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provides for pinning user space pages in memory, copying to iovecs,
and copying from sk_buffs including fragmented and chained sk_buffs.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attempts to allocate per-CPU DMA channels
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provides an API for offloading memory copies to DMA devices
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex. This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.
Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.
This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention. However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.
Surprisingly, these changes even shrink the object code:
add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Documentation/infiniband/core_locking.txt says:
All of the methods in struct ib_device exported by a low-level
driver must be fully reentrant. The low-level driver is required to
perform all synchronization necessary to maintain consistency, even
if multiple function calls using the same object are run
simultaneously.
However, mthca's modify_qp, modify_srq and resize_cq methods are
currently not reentrant. Add a mutex to the QP, SRQ and CQ structures
so that these calls can be properly serialized.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some error paths after the mthca_alloc_mailbox() call in mthca_modify_qp()
just do a "return -EINVAL" without freeing the mailbox. Convert these
returns to "goto out" to avoid leaking the mailbox storage.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Factor out common code for adding a userspace object to an idr into a
function idr_add_uobj(). This shrinks both the source and object code:
add/remove: 1/0 grow/shrink: 0/6 up/down: 57/-220 (-163)
function old new delta
idr_add_uobj - 57 +57
ib_uverbs_create_ah 543 512 -31
ib_uverbs_create_srq 662 630 -32
ib_uverbs_reg_mr 737 699 -38
ib_uverbs_create_cq 639 600 -39
ib_uverbs_alloc_pd 485 446 -39
ib_uverbs_create_qp 1020 979 -41
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In error paths when destroying an object, uverbs should not decrement
associated objects' usecnt, since ib_dereg_mr(), ib_destroy_qp(),
etc. already do that.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If ibdev->alloc_ucontext() fails then ib_uverbs_get_context() does not
unlock file->mutex before returning error.
Signed-off by: Ganapathi CH <cganapathi@novell.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use new ib_init_ah_from_wc() and ib_init_ah_from_path() helper
functions to clean up the IB CM.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a call to initialize address handle attributes given a path record.
This is used by the CM, and would be useful for users of UD QPs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a function to initialize address handle attributes from a work
completion. This functionality is duplicated by both verbs and the CM.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The P_Key is provided into a SIDR REQ in two places, once as a
parameter, and again in the path record. Remove the P_Key as a
parameter and always use the one given in the path record.
This change has no practical effect on ABI functionality.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Misc cleanups in ib_srp:
1) I think that it is more efficient to move the req entries from req_list
to free_list in srp_reconnect_target (rather than rebuild the free_list).
(In any case this code is shorter).
2) This allows us to reuse code in srp_reset_device and srp_reconnect_target
and call a new function srp_reset_req.
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There has been a change in the format of port identifiers between
revision 10 of the SRP specification and the current revision 16A.
Revision 10 specifies port identifier format as
lower 8 bytes : GUID upper 8 bytes : Extension
Whereas revision 16A specifies it as
lower 8 bytes : Extension upper 8 bytes : GUID
There are older targets (e.g. SilverStorm Virtual Fibre Channel
Bridge) which conform to revision 10 of the SRP specification.
The I/O class of revision 10 is 0xFF00 and the I/O class of revision
16A is 0x0100.
For supporting older targets, this patch:
1) Adds a new optional target creation parameter "io_class". Default
value of io_class is 0x0100 (i.e. revision 16A)
2) Uses the correct port identifier format for targets with IO class
of 0xFF00 (i.e. conforming to revision 10)
Signed-off-by: Ramachandra K <rkuchimanchi@silverstorm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add enum values for I/O Class values from rev. 10 and rev. 16a SRP
drafts. The values are used to detect targets that implement obsolete
revisions of SRP, so that the initiator can use the old format for
port identifier when connecting to them.
Signed-off-by: Ramachandra K <rkuchimanchi@silverstorm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When creating a FMR pool, query the IB device and use the returned
max_map_map_per_fmr attribute as for the max number of FMR remaps. If
the device does not suport querying this attribute, use the original
IB_FMR_MAX_REMAPS (32) default.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Report the true max_map_per_fmr value from mthca_query_device(),
taking into account the change in FMR remapping introduced by the
Sinai performance optimization.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change the mthca snoop of MADs that set PortInfo to check if the SM
has set the client reregister bit, and if it has, generate a client
reregister event. If the bit is not set, just generate a LID change
event as usual.
Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move ipath's struct port_info into <rdma/ib_smi.h>, so that it can be
used by mthca to implement client reregister support.
Remove the __attribute__((packed)) because all the members of the struct
are naturally aligned anyway.
Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Handle client reregister events by treating them just like LID or
SM changes -- flush all cached paths and rejoin multicast groups.
Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add IB_EVENT_CLIENT_REREGISTER to enum so low-level drivers can
generate "client reregister" events.
Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix misaligned access faults on ia64: never cast a misaligned
neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer
instead. The memcpy was being optimized to use full word accesses
because the compiler thought that union ib_gid is always aligned.
The cast in IPOIB_GID_ARG is safe, since it is fixed to access each
byte separately.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The comparisons of priv->tx_tail to ah->last_send in ipoib_free_ah()
and ipoib_post_receive() are slightly unsafe, because priv->tx_lock is
not held and hence a stale value of ah->last_send might be used, which
would lead to freeing an AH before the driver was really done with it.
The simple way to fix this is to the optimization of early free from
ipoib_free_ah() and unconditionally queue AHs for reaping, and then
take priv->tx_lock in __ipoib_reap_ah().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Check GID/LID for requester side when searching for request which
matches received response. This is in order to guarantee uniqueness
if the same TID is used when requesting via multiple source LIDs (when
LMC is not zero). Use ports' cached LMC to perform the check.
Further, do not perform LID check for direct-routed packets, since
the permissive LID makes a proper check impossible.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an LMC cache to struct ib_device, and add a function
ib_get_cached_lmc() to query the cache.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
destroy_workqueue() already does flush_workqueue().
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
It's perfectly valid for a connection to an SRP target to have a
request limit of 0, so get rid of the message about it, which can spam
kernel logs even with printk_ratelimit(). Keep a count of such events
in a "zero_req_lim" SCSI host attribute instead, so someone who cares
can look at the statistics.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Handle IB_CM_DREQ_ERROR and IB_CM_DREQ_RECEIVED events from the CM,
instead of just printing "Unhandled CM event". In the case of
DREQ_ERROR, just ignore the event -- a TIMEWAIT_EXIT will be generated
also. For DREQ_RECEIVED, send a DREP in response to shut the
connection down cleanly.
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make the sg_tablesize used by SRP adjustable at module load time via a
module parameter. Calculate the corresponding IU length required to
support this.
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Allow userspace to throttle traffic on a given connection to a target
port by adding "max_cmd_per_lun=xyz" to lower the cmd_per_lun value
set for that scsi_host.
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>