Commit 38617c64 ("RDMA/addr: Add support for translating IPv6
addresses") broke the build when CONFIG_IPV6=n, because the ib_addr
module unconditionally attempted to call ipv6_chk_addr() and other
IPv6 functions that are not defined when IPv6 is disabled. Fix this
by only building IPv6 support if CONFIG_IPV6 is turned on, and
add a Kconfig dependency to prevent the ib_addr code from being built
in when IPv6 is built modular.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Handle AF_INET6 cases where required, and use struct sockaddr_storage
wherever an IPv6 address might be stored.
Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for translating AF_INET6 addresses to the IB address
translation service. This requires using struct sockaddr_storage
instead of struct sockaddr wherever an IPv6 address might be stored,
and adding cases to handle IPv6 in addition to IPv4 to the various
translation functions.
Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun's commit 7b595756ec made sysfs
attribute->owner unnecessary. But the field was left in the structure to
ease the merge. It's been over a year since that change and it is now
time to start killing attribute->owner along with its users - one arch at
a time!
This patch is attempt #1 to get rid of attribute->owner only for
CONFIG_X86_64 or CONFIG_X86_32 . We will deal with other arches later on
as and when possible - avr32 will be the next since that is something I
can test. Compile (make allyesconfig / make allmodconfig / custom config)
and boot tested.
akpm: the idea is that we put the declaration of sttribute.owner inside
`#ifndef CONFIG_X86'. But that proved to be too ambitious for now because
new usages kept on turning up in subsystem trees.
[akpm: remove the ifdef for now]
Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that device_create() has been audited, rename things back to the
original call to be sane.
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use krealloc() instead of kmalloc() followed by memcpy() when resizing
the MAD module's snoop table.
Also put parentheses around the new table size to avoid calculating
the wrong size to allocate, which fixes a bug pointed out by Haven
Hash <haven.hash@isilon.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In case of error, the function ucma_alloc_multicast() returns a NULL
pointer, but never returns an ERR pointer. So after a call to this
function, an IS_ERR test should be replaced by a NULL test.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@match bad_is_err_test@
expression x, E;
@@
x = ucma_alloc_multicast(...)
... when != x = E
IS_ERR(x)
// </smpl>
Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
commit 110cf374 ("infiniband: make cm_device use a struct device and
not a kobject.") introduced a memory leak, since it deleted
cm_release_dev_obj(), which was where cm_dev was freed. Fix this by
freeing the leaked structure after calling device_unregister().
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This fixes the problem of incoming BMA responses being dropped due to
a bad "is response" check. Fix the test to use the ib_response_mad()
predicate, which correctly handles BMA MADs.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=988>.
Signed-off-by: Michael Brooks <michael.brooks@qlogic.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In case of error, the function ib_create_send_mad() returns an ERR
pointer, but never returns a NULL pointer. So testing the return
value for error should be done with IS_ERR, not by comparing with
NULL.
A simplified version of the semantic patch that makes this change is
as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@correct_null_test@
expression x,E;
statement S1, S2;
@@
x = ib_create_send_mad(...)
<... when != x = E
if (
(
- x@p2 != NULL
+ ! IS_ERR ( x )
|
- x@p2 == NULL
+ IS_ERR( x )
)
)
S1
else S2
...>
? x = E;
// </smpl>
Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There are a few places where the RDMA CM code handles IPv6 by doing
struct sockaddr addr;
u8 pad[sizeof(struct sockaddr_in6) -
sizeof(struct sockaddr)];
This is fragile and ugly; handle this in a better way with just
struct sockaddr_storage addr;
[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
switch to struct sockaddr_storage and get rid of padding arrays in
struct rdma_addr. ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove IB_ACCESS_LOCAL_WRITE from qp.qp_access_flags because this
attribute is only used to set remote permissions.
Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If update_sm_ah() fails, it leaves the port's sm_ah as NULL. Then if
the device or module is removed, ib_sa_remove_one() will dereference a
NULL pointer when it calls kref_put(). Fix this by testing if sm_ah
is NULL before dropping the reference.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Consumers that want to re-use their QPs in new connections need to
know when the QP has exited the timewait state. Report the timewait
event through the rdma_cm.
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an RDMA_CM_EVENT_ADDR_CHANGE event can be used by rdma-cm
consumers that wish to have their RDMA sessions always use the same
links (eg <hca/port>) as the IP stack does. In the current code, this
does not happen when bonding is used and fail-over happened but the IB
link used by an already existing session is operating fine.
Use the netevent notification for sensing that a change has happened
in the IP stack, then scan the rdma-cm ID list to see if there is an
ID that is "misaligned" with respect to the IP stack, and deliver
RDMA_CM_EVENT_ADDR_CHANGE for this ID. The consumer can act on the
event or just ignore it.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This object really should be a struct device, or at least contain a
pointer to a struct device, as it is trying to create a separate device
tree outside of the main device tree. This patch fixes this problem.
It is needed for the class core rework that is being done in the driver
core.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This pointer really is a struct ib_device, not a struct device, so name
it properly to help prevent confusion.
This makes the followon patch in this series much smaller and easier to
understand as well.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.
This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable. This means that cma_disable_remove()
now is more properly named to cma_disable_callback(), and
cma_enable_remove() can now be removed because it just would become a
trivial wrapper around mutex_unlock().
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.
In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device. Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available. These stats are per-device
and more importantly -not- per port.
Details:
- Add union rdma_protocol_stats in ib_verbs.h. This union allows
defining transport-specific stats. Currently only iwarp stats are
defined.
- Add struct iw_protocol_stats to define the current set of iwarp
protocol stats.
- Add new ib_device method called get_proto_stats() to return protocol
statistics.
- Add logic in core/sysfs.c to create iwarp protocol stats attributes
if the device is an RNIC and has a get_proto_stats() method.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically says that this transition isn't
allowed. This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.
Fix up the qp_state_table[] to make RESET->ERR not allowed.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement). The new operations are:
- Allocate an ib_mr for use in fast register work requests.
- Allocate/free a physical buffer lists for use in fast register work
requests. This allows device drivers to allocate this memory as
needed for use in posting send requests (eg via dma_alloc_coherent).
- New send queue work requests:
* send with remote invalidate
* fast register memory region
* local invalidate memory region
* RDMA read with invalidate local memory region (iWARP only)
Consumer interface details:
- A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
to indicate device support for these features.
- New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
IB_WR_RDMA_READ_WITH_INV are added.
- A new consumer API function, ib_alloc_mr() is added to allocate
fast register memory regions.
- New consumer API functions, ib_alloc_fast_reg_page_list() and
ib_free_fast_reg_page_list() are added to allocate and free
device-specific memory for fast registration page lists.
- A new consumer API function, ib_update_fast_reg_key(), is added to
allow the key portion of the R_Key and L_Key of a fast registration
MR to be updated. Consumers call this if desired before posting
a IB_WR_FAST_REG_MR work request.
Consumers can use this as follows:
- MR is allocated with ib_alloc_mr().
- Page list memory is allocated with ib_alloc_fast_reg_page_list().
- MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
- MR made VALID and bound to a specific page list via
ib_post_send(IB_WR_FAST_REG_MR)
- MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
invalidate operation.
- MR is deallocated with ib_dereg_mr()
- page lists dealloced via ib_free_fast_reg_page_list().
Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ). For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.
The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key. The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch solves a race that occurs after an event occurs that causes
the SA query module to flush its SM address handle (AH). When SM AH
becomes invalid and needs an update it is handled by the global
workqueue. On the other hand this event is also handled in the IPoIB
driver by queuing work in the ipoib_workqueue that does multicast
joins. Although queuing is in the right order, it is done to 2
different workqueues and so there is no guarantee that the first to be
queued is the first to be executed.
This causes a problem because IPoIB may end up sending an request to
the old SM, which will take a long time to time out (since the old SM
is gone); this leads to a much longer than necessary interruption in
multicast traffer.
The patch sets the SA query module's SM AH to NULL when the event
occurs, and until update_sm_ah() is done, any request that needs sm_ah
fails with -EAGAIN return status.
For consumers, the patch doesn't make things worse. Before the patch,
MADs are sent to the wrong SM so the request gets lost. Consumers can
be improved if they examine the return code and respond to EAGAIN
properly but even without an improvement the situation is not getting
worse.
Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The license text for several files references a third software license
that was inadvertently copied in. Update the license to what was
intended. This update was based on a request from HP.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove explicit lock_kernel() calls and document why the code is safe.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Remove explicit lock_kernel() calls and document why the code is safe.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
All of the open() functions which don't need the BKL on their face may
still depend on its acquisition to serialize opens against driver
initialization. So make those functions acquire then release the BKL to be
on the safe side.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This documents the fact that somebody looked at the relevant open()
functions and concluded that, due to their trivial nature, no locking was
needed.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code. However, after the conversion, is_closed flag is checked
incorrectly in ib_uverbs_async_handler(). As a result, no async
events are ever passed to applications.
Found by: Ronni Zimmerman <ronniz@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On a 64-bit architecture, if ib_umem_get() is called with a size value
that is so big that npages is negative when cast to int, then the
length of the page list passed to get_user_pages(), namely
min_t(int, npages, PAGE_SIZE / sizeof (struct page *))
will be negative, and get_user_pages() will immediately return 0 (at
least since 900cf086, "Be more robust about bad arguments in
get_user_pages()"). This leads to an infinite loop in ib_umem_get(),
since the code boils down to:
while (npages) {
ret = get_user_pages(...);
npages -= ret;
}
Fix this by taking the minimum as unsigned longs, so that the value of
npages is never truncated.
The impact of this bug isn't too severe, since the value of npages is
checked against RLIMIT_MEMLOCK, so a process would need to have an
astronomical limit or have CAP_IPC_LOCK to be able to trigger this,
and such a process could already cause lots of mischief. But it does
let buggy userspace code cause a kernel lock-up; for example I hit
this with code that passes a negative value into a memory registartion
function where it is promoted to a huge u64 value.
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a low-level driver returns IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED,
handle_outgoing_dr_smp() doesn't clean up properly. The fix is to
kfree the local data and break, rather than falling through. This was
observed with the ipath driver, but could happen with any driver.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is a race from when a device is created with device_create() and
then the drvdata is set with a call to dev_set_drvdata() in which a
sysfs file could be open, yet the drvdata will be NULL, causing all
sorts of bad things to happen.
This patch fixes the problem by using the new function,
device_create_drvdata().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add support for modifying CQ parameters for controlling event
generation moderation.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions. Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated. Add this new union to struct
ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().
Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.
Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit. The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing). Remove the flag from
all existing drivers that set it until we know which ones are OK.
The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.
This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure that a device implements the modify_srq and reg_phys_mr
optional methods before calling them.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO. The create_flags member will also be useful for XRC
and ehca low-latency QP support.
Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>