Clean up the errors as shown when 'let c_space_errors=1' is set in vim.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eliminate some padding in the structure by rearranging the members.
sizeof(struct ib_uverbs_event_file) is now 72 bytes (from 80) and
more members now fit in the first cacheline.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some large systems may support more than IB_UVERBS_MAX_DEVICES
(currently 32).
This change allows us to support more devices in a backwards-compatible
manner. The first IB_UVERBS_MAX_DEVICES keep the same major/minor
device numbers that they've always had.
If there are more than IB_UVERBS_MAX_DEVICES, we then dynamically
request a new major device number (new minors start at 0).
This change increases the maximum number of HCAs to 64 (from 32).
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change is not useful by itself, but sets us up for a future change
that allows us to support more than IB_UVERBS_MAX_DEVICES in a system.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change is not useful by itself, but it sets us up for a future
change that allows us to dynamically allocate device numbers in case
we have more than IB_UVERBS_MAX_DEVICES in the system.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
dev_table's raison d'etre was to associate an inode back to a struct
ib_uverbs_device.
However, now that we've converted ib_uverbs_device to contain an
embedded cdev (instead of a *cdev), we can use the container_of()
macro and cast back to the containing device.
There's no longer any need for dev_table, so get rid of it.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Instead of storing a pointer to a cdev, embed the entire struct cdev.
This change allows us to use the container_of() macro in
ib_uverbs_open() in a future patch.
This change increases the size of struct ib_uverbs_device to 168 bytes
across 3 cachelines from 80 bytes in 2 cachelines. However, we
rearrange the members so that everything fits into the first cacheline
except for the struct cdev. Finally, we don't touch the cdev in any
fastpaths, so this change shouldn't negatively affect performance.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently the iSER receive completion flow takes the session lock
twice. Optimize it to avoid the first one by letting
iser_task_rdma_finalize() be called only from the cleanup_task
callback invoked by iscsi_free_task, thus reducing the contention on
the session lock between the scsi command submission to the scsi
command completion flows.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
libiscsi passthrough mode invokes the transport xmit calls directly
without first going through an internal queue, unlike the other mode,
which uses a queue and a xmitworker thread. Now that the "cant_sleep"
prerequisite of iscsi_host_alloc is met, move to use it. Handling
xmit errors is now done by the passthrough flow of libiscsi. Since
the queue/worker aren't used in this mode, the code that schedules the
xmitworker is removed.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove unnecessary checks for the IB connection state and for QP
overflow, as conn state changes are reported by iSER to libiscsi and
handled there. QP overflow is theoretically possible only when
unsolicited data-outs are used; anyway it's being checked and handled
by HW drivers.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Two minor flows in iSER's data path still use allocations; move them
to be atomic as a preperation step towards moving to use libiscsi
passthrough mode.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Simplify and shrink the logic/code used for the send descriptors.
Changes include removing struct iser_dto (an unnecessary abstraction),
using struct iser_regd_buf only for handling SCSI commands, using
dma_sync instead of dma_map/unmap, etc.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use a different CQ for send completions, where send completions are
polled by the interrupt-driven receive completion handler. Therefore,
interrupts aren't used for the send CQ.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Now that both the posting and reaping of receive buffers is done in
the completion path, the counter of outstanding buffers not be atomic.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently, the recv buffer posting logic is based on the transactional
nature of iSER which allows for posting a buffer before sending a PDU.
Change this to post only when the number of outstanding recv buffers
is below a water mark and in a batched manner, thus simplifying and
optimizing the data path. Use a pre-allocated ring of recv buffers
instead of allocating from kmem cache. A special treatment is given
to the login response buffer whose size must be 8K unlike the size of
buffers used for any other purpose which is 128 bytes.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We will make a major change in the recv buffer posting logic, after
which the problem commit bba7ebb "avoid recv buffer exhaustion caused
by unexpected PDUs" comes to solve doesn't exist any more, so revert it.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change the nes driver to return -ENOMEM on SQ/RQ overflow to match the
return code of other RDMA HW drivers (e.g cxgb3, ehca, mlx4, mthca).
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Acked-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is a double disconnect during AE processing, causing crashes.
While fixing the crash, also simplify the AE handling code.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a listener is destroyed and there is an MPA response pending for
loopback connection, the active side cm_node gets destroyed twice:
once in cm_event_connect_error() and again in nes_accept()/nes_reject().
Increment the cm_node's refcount so it's not destroyed by
cm_event_connect_error().
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
After running long iterative MPI tests, sometimes ethtool reports a
"CM Destroy Listener" count more than the "CM Create Listener" count.
This inconsistency is fixed by making counter variables atomic.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the caller does not pass a valid in_wc to process_mad(), return MAD
failure status, as it is not possible to generate a valid MAD redirect
response (and redirects are the only MAD responses ehca generates).
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Dunno, what was the idea, it wasn't used for a long time.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The max_dest_rd_atomic and max_qp_rd_atomic values are properly
returned by query_qp(), so there should not be an error returned when
they are queried.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The irq_spinlock is only taken in tasklet context, so it is safe not to
disable hardware interrupts.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
struct ib_qp already holds a pointer to the ib device. No need to dive to the
hw device object to retrieve it.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch replaces dev->mc_count in all drivers (hopefully I didn't miss
anything). Used spatch and did small tweaks and conding style changes when
it was suitable.
Jirka
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure compiler won't do weird things with limits by using the
rlimit helpers added in 3e10e716 ("resource: add helpers for fetching
rlimits"). E.g. fetching them twice may return 2 different values
after writable limits are implemented.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As of commit f56bcd8 ("IPoIB: Use separate CQ for UD send
completions"), there are no TX interrupts. Change the ethtool code
not to report TX moderation settings, so users will not be misled to
think they can control TX interrupt moderation. Pointed out by Alex
Vainman <alexv@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Failure to rearm a CQ means the cxgb3 device is wedged, but we shouldn't
kill the whole system with a BUG_ON() if this happens.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Revert the following change from commit 6f8372b6 ("RDMA/cm: fix
loopback address support")
The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address. (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdma_bind_addr,
no device is associated with the rdma_cm_id. Fix this.
It turns out that important apps such as Open MPI depend on
rdma_bind_addr() NOT associating any RDMA device when binding to a
loopback address. Open MPI is being updated to deal with this, but at
least until a new Open MPI release is available, maintain the previous
behavior: allow rdma_bind_addr() to succeed, but do not bind to a
device.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Use the %pM kernel extension to display the MAC address.
The only difference in the output is that the MAC address is
shown in the usual colon-separated hex notation.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct misspelled "CONFIG_IPv6" that was introduced in commit
d14714df ("IB/addr: Fix IPv6 routing lookup"). The config variable
should be all uppercase.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
[ This was my fault when I munged the original patch. - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In mlx4_ib_post_recv(), we should check the queue for overflow using
recv_cq instead of send_cq (current code looks like a copy-and-paste
mistake).
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As for memfree mthca hardware, ConnectX also requires SRQ WQE scatter
entries to be initialized with the invalid L_Key at SRQ creation time.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the "ignoring return value of '...', declared with attribute
warn_unused_result" compiler warning in several users of the new kfifo
API.
It removes the __must_check attribute from kfifo_in() and
kfifo_in_locked() which must not necessary performed.
Fix the allocation bug in the nozomi driver file, by moving out the
kfifo_alloc from the interrupt handler into the probe function.
Fix the kfifo_out() and kfifo_out_locked() users to handle a unexpected
end of fifo.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rename kfifo_put... into kfifo_in... to prevent miss use of old non in
kernel-tree drivers
ditto for kfifo_get... -> kfifo_out...
Improve the prototypes of kfifo_in and kfifo_out to make the kerneldoc
annotations more readable.
Add mini "howto porting to the new API" in kfifo.h
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the pointer to the spinlock out of struct kfifo. Most users in
tree do not actually use a spinlock, so the few exceptions now have to
call kfifo_{get,put}_locked, which takes an extra argument to a
spinlock.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a new generic kernel FIFO implementation.
The current kernel fifo API is not very widely used, because it has to
many constrains. Only 17 files in the current 2.6.31-rc5 used it.
FIFO's are like list's a very basic thing and a kfifo API which handles
the most use case would save a lot of development time and memory
resources.
I think this are the reasons why kfifo is not in use:
- The API is to simple, important functions are missing
- A fifo can be only allocated dynamically
- There is a requirement of a spinlock whether you need it or not
- There is no support for data records inside a fifo
So I decided to extend the kfifo in a more generic way without blowing up
the API to much. The new API has the following benefits:
- Generic usage: For kernel internal use and/or device driver.
- Provide an API for the most use case.
- Slim API: The whole API provides 25 functions.
- Linux style habit.
- DECLARE_KFIFO, DEFINE_KFIFO and INIT_KFIFO Macros
- Direct copy_to_user from the fifo and copy_from_user into the fifo.
- The kfifo itself is an in place member of the using data structure, this save an
indirection access and does not waste the kernel allocator.
- Lockless access: if only one reader and one writer is active on the fifo,
which is the common use case, no additional locking is necessary.
- Remove spinlock - give the user the freedom of choice what kind of locking to use if
one is required.
- Ability to handle records. Three type of records are supported:
- Variable length records between 0-255 bytes, with a record size
field of 1 bytes.
- Variable length records between 0-65535 bytes, with a record size
field of 2 bytes.
- Fixed size records, which no record size field.
- Preserve memory resource.
- Performance!
- Easy to use!
This patch:
Since most users want to have the kfifo as part of another object,
reorganize the code to allow including struct kfifo in another data
structure. This requires changing the kfifo_alloc and kfifo_init
prototypes so that we pass an existing kfifo pointer into them. This
patch changes the implementation and all existing users.
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Always set bad_wr when an immediate error is detected. Return ENOMEM
for queue full instead of EINVAL to match other drivers.
Signed-off-by: Frank Zago <fzago@systemfabricworks.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the remote node's ethernet address changes, the connection keeps
trying to connect using the old address. The connection wil continue
failing until the driver is unloaded and loaded again (eiter reboot or
rmmod). Fix this by checking that the NIC has the correct address
before starting a connection.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A FIN that is received during an MPA start up sequence causes a
timeout in iwcm.c. The connection has not been completely closed so
the iwcm code is waiting for resources to be cleaned up. This closes
the connection so everything cleans up correctly.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We fail when creating many qps as kmap() fails for sq_vbase.
Fix this by doing kunmap() as soon as we are done with sq_vbase.
We do kunmap() in one of the locations below:
(1) nes_destroy_qp()
(2) nes_accept()
(3) nes_connect_event
We keep a flag to avoid multiple calls to kunmap().
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
STags are generated randomly but the driver does not correctly prevent
a zero STag. Using STag zero is privileged and causes a user space
application to fail. This change prevents the driver from trying to
allocate a zero STag.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
While running a Xansation test, an active side node crashed. The
problem started on the passive side, which generated an STtag that was
0. The passive side sent a TERMINATE instead of an MPA REJECT msg.
The active side, receives TERMINATE and sends connect_err() and set
the cm_node state to CLOSED. The passive side sends FIN + ACK after
TERMINATE. Active side ends up in handle_ack_pkt() and send_reset().
send_reset() consumes 1 cm_node's ref_count. Because the cm_node is
in CLOSED state, which means that cm_node will be destroyed after
completion of the connect_err() indication, CM will crash after
send_reset().
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the listener is destroyed for a loopback connection, the listener
node gets a reset event. This causes a crash as the listener is not
expecting a reset event. Code review of cm_event_reset() during
debugging showed the cm_id ref count is incremented after calling its
event handler and not before.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
While running IMP_EXT's window test, we saw a crash in nes_accept().
Here is the sequence of what happened:
(1) In MVAPICH2, connect request is received for port #0.
FIX: Add a nes_connect() check to make sure local or remote tcp port
is not 0.
(2) Remote node's (passive) TCP stack sends a reset when it gets a
connect request because of port = 0. Active side set the connect
error to IW_CM_EVENT_STATUS_REJECTED when it received the RST from
remote node.
FIX: The corect error code is -ECONNRESET.
(3) Wrong error code of IW_CM_EVENT_STATUS_REJECTED causes the core to
destroy its listener ports. Here there are connections that may
have sent an MPA request up and waiting for accept or reject. But
the listener and its cm_nodes have been freed already causing the
crash noticed.
FIX: The cm_node is freed only if its state is not
NES_CM_STATE_MPAREQ_RCVD. If cm_node's state is
NES_CM_STATE_MPAREQ_RCVD then its new state is set to
NES_CM_STATE_LISTENER_DESTROYED and it is not freed. When
nes_accept() or nes_reject() is received, its state is checked
for NES_CM_STATE_LISTENER_DESTROYED and in this case the cm_node
is freed and error is returned.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>