Introduce a new API to register RPC services on IPv6 interfaces to allow
the NFS server and lockd to advertise on IPv6 networks.
Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind
protocol version 4 to contact the local rpcbind daemon. The version 4
SET/UNSET procedures allow services to register address families besides
AF_INET, register at specific network interfaces, and register transport
protocols besides UDP and TCP. All of this functionality is exposed via
the new rpcb_v4_register() kernel API.
A user-space rpcbind daemon implementation that supports version 4 of the
rpcbind protocol is required in order to make use of this new API.
Note that rpcbind version 3 is sufficient to support the new rpcbind
facilities listed above, but most extant implementations use version 4.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Try to make the comment here a little more clear and concise.
Also, this macro definition seems unnecessary.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The cl_chatty flag alows us to control whether a given rpc client leaves
"server X not responding, timed out"
messages in the syslog. Such messages make sense for ordinary nfs
clients (where an unresponsive server means applications on the
mountpoint are probably hanging), but not for the callback client (which
can fail more commonly, with the only result just of disabling some
optimizations).
Previously cl_chatty was removed, do to lack of users; reinstate it, and
use it for the nfsd's callback client.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change the WR context pool to be shared across mount points. This
reduces the RDMA transport memory footprint significantly since
idle mounts don't consume WR context memory.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Add a dma map count in order to verify that all DMA mapping resources
have been freed when the transport is closed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Modify the RDMA_READ processing to use the reply and chunk list mapping data
types. Also add a special purpose 'hdr_count' field in in the context to hold
the header page count instead of overloading the SGE length field and
corrupting the DMA map length.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Create a new data structure to hold the remote client address space
to local server address space mapping.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
cleanup:
Document token header size with a #define instead of open-coding it.
Don't needlessly increment "ptr" past the beginning of the header
which makes the values passed to functions more understandable and
eliminates the need for extra "krb5_hdr" pointer.
Clean up some intersecting white-space issues flagged by checkpatch.pl.
This leaves the checksum length hard-coded at 8 for DES. A later patch
cleans that up.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Since we no longer make any distinction between shutdown signals with
nfsd, then it becomes easier to just standardize on a particular signal
to use to bring it down (SIGINT, in this case).
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:
- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
take a task arg and and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The svc_rdma_send_error function is called when an RPCRDMA protocol
error is detected. This function attempts to post an error reply message.
Since an error posting to a transport in error is ignored, change
the return type to void.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Some providers may wait while destroying adapter resources.
Since it is possible that the last reference is put on the
dto_tasklet, the actual destroy must be scheduled as a work item.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Replace the one-off linked list implementation used to implement the
context cache with the standard Linux list_head lists. Add a context
counter to catch resource leaks. A WARN_ON will be added later to
ensure that we've freed all contexts.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
An NFS_WRITE requires a set of RDMA_READ requests to fetch the write
data from the client. There are two principal pieces of data that
need to be tracked: the list of pages that comprise the completed RPC
and the SGE of dma mapped pages to refer to this list of pages. Previously
this whole bit was managed as a linked list of contexts with the
context containing the page list buried in this list. This patch
simplifies this processing by not keeping a linked list, but rather only
a pionter from the last submitted RDMA_READ's context to the context
that maps the set of pages that describe the RPC. This significantly
simplifies this code path. SGE contexts are cleaned up inline in the DTO
path instead of at read completion time.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Clean up: Update the RPC server's TCP record marker decoder to match the
constructs used by the RPC client's TCP socket transport.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that the nfs4 callback thread uses the kthread API, there are no
more users of svc_create_thread(). Remove it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Consistently use unsigned (u32 vs. s32) for seqnum.
In get_mic function, send the local copy of seq_send,
rather than the context version.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
cleanup: When adding new encryption types, the checksum length
can be different for each enctype. Face the fact that the
current code only supports DES which has a checksum length of 8.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This field is set once and never used; probably some artifact of an
earlier implementation idea.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This adds IPv6 support to the interfaces that are used to express nfsd
exports. All addressed are stored internally as IPv6; backwards
compatibility is maintained using mapped addresses.
Thanks to Bruce Fields, Brian Haley, Neil Brown and Hideaki Joshifuji
for comments
Signed-off-by: Aurelien Charbon <aurelien.charbon@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Brian Haley <brian.haley@hp.com>
Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
NFSv4 requires us to ensure that we break the TCP connection before we're
allowed to retransmit a request. However in the case where we're
retransmitting several requests that have been sent on the same
connection, we need to ensure that we don't interfere with the attempt to
reconnect and/or break the connection again once it has been established.
We therefore introduce a 'connection' cookie that is bumped every time a
connection is broken. This allows requests to track if they need to force a
disconnection.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We need to try to ensure that we always use the same credentials whenever
we re-establish the clientid on the server. If not, the server won't
recognise that we're the same client, and so may not allow us to recover
state.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
With the recent change to generic creds, we can no longer use
cred->cr_ops->cr_name to distinguish between RPCSEC_GSS principals and
AUTH_SYS/AUTH_NULL identities. Replace it with the rpc_authops->au_name
instead...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
or not we have someone waiting for buffer memory. Convert the SUNRPC layer
to use the same idiom.
Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
fact, the most common case will be that there is nobody waiting for buffer
space.
SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
limited by the application window. Ensure that we follow the same idiom as
the rest of the networking layer here too.
Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
write_space() doesn't keep waking things up on xprt->pending.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We need the ability to treat 'generic' creds specially, since they want to
bind instances of the auth cred instead of binding themselves.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add an rpc credential that is not tied to any particular auth mechanism,
but that can be cached by NFS, and later used to look up a cred for
whichever auth mechanism that turns out to be valid when the RPC call is
being made.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current RPCAUTH_LOOKUP_ROOTCREDS flag only works for AUTH_SYS
authentication, and then only as a special case in the code. This patch
removes the auth_sys special casing, and replaces it with generic code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we've tightened up the locking rules for RPC queue wakeups, we can
remove the RCU-safe kfree calls...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is designed to replace the timeout timer in the individual rpc_tasks.
By putting the timer function in the wait queue, we will eventually be able
to reduce the total number of timers in use by the RPC subsystem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
An audit of the current RPC timeout functions shows that they don't really
ever need to run in the softirq context. As long as the softirq is
able to signal that the wakeup is due to a timeout (which it can do by
setting task->tk_status to -ETIMEDOUT) then the callback functions can just
run as standard task->tk_callback functions (in the rpciod/process
context).
The only possible border-line case would be xprt_timer() for the case of
UDP, when the callback is used to reduce the size of the transport
congestion window. In testing, however, the effect of moving that update
to a callback would appear to be minor.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In all cases where we currently use rpc_wake_up_task(), we almost always
know on which waitqueue the rpc_task is actually sleeping. This will allows
us to simplify the queue locking in a future patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A lot of the work done by the rpc_release() callback is inappropriate for
rpciod as it will often involve things like starting a new rpc call in
order to clean up state after an interrupted NFSv4 open() call, or
calls to mntput(), etc.
This patch allows the caller of rpc_run_task() to specify that the
rpc_release callback should run on a different workqueue than the default
rpciod_workqueue.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is a void function attempting to return the return value from
another void function, which seems harmless but extremely weird, and
apparently makes some compilers complain.
While we're there, clean up a little (e.g. the switch statement had a
minor style problem and seemed overkill as long as there's only one
case).
Thanks to Trond for noticing this.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the initialzation in __svc_create_thread that happens prior to
thread creation to a new function. Export the function to allow
services to have better control over the svc_rqst structs.
Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This file defines the data types used by the SVCRDMA transport module.
The principle data structure is the transport specific extension to
the svcxprt structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Create a transport independent version of the svc_sock_names function.
The toclose capability of the svc_sock_names service can be implemented
using the svc_xprt_find and svc_xprt_close services.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add a new svc function that allows a service to query whether a
transport instance has already been created. This is used in lockd
to determine whether or not a transport needs to be created when
a lockd instance is brought up.
Specifying 0 for the address family or port is effectively a wild-card,
and will result in matching the first transport in the service's list
that has a matching class name.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add a file that when read lists the set of registered svc
transports.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Some transports have a header in front of the RPC header. The current
defer/revisit processing considers only the iov_len and arg_len to
determine how much to back up when saving the original request
to revisit. Add a field to the rqstp structure to save the size
of the transport header so svc_defer can correctly compute
the start of a request.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial patch moves all of the transport independent
functions from the svcsock.c file to the transport independent svc_xprt.c
file.
In addition the following formatting changes were made:
- White space cleanup
- Function signatures on single line
- The inline directive was removed
- Lines over 80 columns were reformatted
- The term 'socket' was changed to 'transport' in comments
- The SMP comment was moved and updated.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>