This patch implements receive flow steering (RFS). RFS steers
received packets for layer 3 and 4 processing to the CPU where
the application for the corresponding flow is running. RFS is an
extension of Receive Packet Steering (RPS).
The basic idea of RFS is that when an application calls recvmsg
(or sendmsg) the application's running CPU is stored in a hash
table that is indexed by the connection's rxhash which is stored in
the socket structure. The rxhash is passed in skb's received on
the connection from netif_receive_skb. For each received packet,
the associated rxhash is used to look up the CPU in the hash table,
if a valid CPU is set then the packet is steered to that CPU using
the RPS mechanisms.
The convolution of the simple approach is that it would potentially
allow OOO packets. If threads are thrashing around CPUs or multiple
threads are trying to read from the same sockets, a quickly changing
CPU value in the hash table could cause rampant OOO packets--
we consider this a non-starter.
To avoid OOO packets, this solution implements two types of hash
tables: rps_sock_flow_table and rps_dev_flow_table.
rps_sock_table is a global hash table. Each entry is just a CPU
number and it is populated in recvmsg and sendmsg as described above.
This table contains the "desired" CPUs for flows.
rps_dev_flow_table is specific to each device queue. Each entry
contains a CPU and a tail queue counter. The CPU is the "current"
CPU for a matching flow. The tail queue counter holds the value
of a tail queue counter for the associated CPU's backlog queue at
the time of last enqueue for a flow matching the entry.
Each backlog queue has a queue head counter which is incremented
on dequeue, and so a queue tail counter is computed as queue head
count + queue length. When a packet is enqueued on a backlog queue,
the current value of the queue tail counter is saved in the hash
entry of the rps_dev_flow_table.
And now the trick: when selecting the CPU for RPS (get_rps_cpu)
the rps_sock_flow table and the rps_dev_flow table for the RX queue
are consulted. When the desired CPU for the flow (found in the
rps_sock_flow table) does not match the current CPU (found in the
rps_dev_flow table), the current CPU is changed to the desired CPU
if one of the following is true:
- The current CPU is unset (equal to RPS_NO_CPU)
- Current CPU is offline
- The current CPU's queue head counter >= queue tail counter in the
rps_dev_flow table. This checks if the queue tail has advanced
beyond the last packet that was enqueued using this table entry.
This guarantees that all packets queued using this entry have been
dequeued, thus preserving in order delivery.
Making each queue have its own rps_dev_flow table has two advantages:
1) the tail queue counters will be written on each receive, so
keeping the table local to interrupting CPU s good for locality. 2)
this allows lockless access to the table-- the CPU number and queue
tail counter need to be accessed together under mutual exclusion
from netif_receive_skb, we assume that this is only called from
device napi_poll which is non-reentrant.
This patch implements RFS for TCP and connected UDP sockets.
It should be usable for other flow oriented protocols.
There are two configuration parameters for RFS. The
"rps_flow_entries" kernel init parameter sets the number of
entries in the rps_sock_flow_table, the per rxqueue sysfs entry
"rps_flow_cnt" contains the number of entries in the rps_dev_flow
table for the rxqueue. Both are rounded to power of two.
The obvious benefit of RFS (over just RPS) is that it achieves
CPU locality between the receive processing for a flow and the
applications processing; this can result in increased performance
(higher pps, lower latency).
The benefits of RFS are dependent on cache hierarchy, application
load, and other factors. On simple benchmarks, we don't necessarily
see improvement and sometimes see degradation. However, for more
complex benchmarks and for applications where cache pressure is
much higher this technique seems to perform very well.
Below are some benchmark results which show the potential benfit of
this patch. The netperf test has 500 instances of netperf TCP_RR
test with 1 byte req. and resp. The RPC test is an request/response
test similar in structure to netperf RR test ith 100 threads on
each host, but does more work in userspace that netperf.
e1000e on 8 core Intel
No RFS or RPS 104K tps at 30% CPU
No RFS (best RPS config): 290K tps at 63% CPU
RFS 303K tps at 61% CPU
RPC test tps CPU% 50/90/99% usec latency Latency StdDev
No RFS/RPS 103K 48% 757/900/3185 4472.35
RPS only: 174K 73% 415/993/2468 491.66
RFS 223K 73% 379/651/1382 315.61
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The af_packet protocol is used by Perl to do ioctls as reported by
Stephane Riviere:
"Net::RawIP relies on SIOCGIFADDR et SIOCGIFHWADDR to get the IP and MAC
addresses of the network interface."
But in a new network namespace these ioctl fail because it is disabled for
a namespace different from the init_net_ns.
These two lines should not be there as af_inet and af_packet are
namespace aware since a long time now. I suppose we forget to remove these
lines because we sent the af_packet first, before af_inet was supported.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Reported-by: Stephane Riviere <stephane.riviere@regis-dgac.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
tx_queue is used as a temporary queue when not allowed to queue skb
directly to the hw device driver (which may sleep). Most paths flush
it before returning, but ppp_start() currently cannot. Make sure we
don't leave skbs pointing to a non-existent device.
Thanks to Michael Barkowski for reporting this problem.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some future hardware will also require some antenna
overrides so make the current logic more generic;
right now it is semantically based on a workaround
for off-channel reception but the reasons for the
new antenna overrides will be different.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Off-channel reception is acceptable in monitor
mode, and checking for monitor mode this way is
not really correct anyway since it could be the
case while operating.
Now iwl_is_monitor_mode() is no longer used so
remove it completely.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Monitor mode operation need not (and probably should
not) affect scanning this way since real monitoring
can not properly happen while scanning anyway.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
The flag name is a little misleading, this
flag instructs the device to ignore bluetooth
messages for purposes of frame transmissions,
so rename the flag to TX_CMD_FLG_IGNORE_BT.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Some future hardware will require a different command to
be sent for bluetooth coexist, so make this a virtual
method that can be changed on a per-device basis.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Since multiple new devices having similar uCode architecture and use same
registers address, remove more reference to 5000 series to eliminate the
confusion.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Perform sanity check for turn on aggregation tid. Also remove the
option for turn on all the aggregation tids at once since it is
deprecated function and not being used.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
For 6000g2 series of NICs, PA type is determined by uCode, driver do not
have to set the register for internal/external PA. It is a workaround
just for 6000 series NICs.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Add hardware revision for 6000g2 series of NIC
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
This function is called twice in a row, remove the second one.
Signed-off-by: Shanyu Zhao <shanyu.zhao@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Use the show uCode statistics function for uCode debugging purposes only; it
is being duplicated in both debugfs and sysfs. remove the one from sysfs.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
This patch is to bring up 6000 Series 2x2 AGN Gen2 adapters.
Seperate various version numbers from 6000 Series definitions;
Add module firmware declaration for the new adapters;
Add additional device IDs and subsystem IDs;
Signed-off-by: Shanyu Zhao <shanyu.zhao@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Some definition for eeprom apply to more than 5000 series device, change
the name to reflect it for easy reading.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Scan requesting doesn't need to be asynchronous
since all code paths leading up to it can sleep.
Make the scan request a new util operation that
is hw-specific (to account for 3945 vs. agn)
and call it right in place.
This patch moves a lot of code into iwlagn as
it need not be in iwlcore.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
I keep checking what "priv->scan" is, so rename
it to "priv->scan_cmd" which more clearly tells
us what it is.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Since we no longer do a multi-pass scan,
keeping track of how long each pass took
is pointless since there will only be one.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
This logic is just confusing, if anything it
belongs into mac80211. Also, even if we do
scan during the EAPOL handshake, that will
not cause any problems, just a short delay.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
With atomic bitops, test_and_{set,clear}_bit
should be used instead of separate test_bit
and set_bit/clear_bit.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Since mac80211 will now never request scanning
multiple bands, we can remove all the associated
logic and scan a single band only in each scan.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Even the initial single/dual stream values will be overridden later when
issue link quality command; but still make sense not to use hard-code
value during initialization. Single/Dual stream mask are used to indicate the
best antenna for SISO/MIMO; different NIC has different tx antenna
configuration; so the parameter need to based on the valid tx antenna.
1x2 device: single tx antenna available, only SISO is valid
configuration, but still need to set up MIMO configuration, so set it up
with antenna A & B as default.
2x2 device: two tx antenna available, dual_stream will use both valid
antenna.
3x3 device: three tx antenna available, skip the first antenna and
choice the second and third antenna for dual_stream.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
For 6000 series, the 2.4G HT40 band regulatory settings address in EEPROM
was off by 2.
Before the fix, you'll see this in dmesg:
[79535.788877] ieee80211 phy8: U iwl_mod_ht40_chan_info HT40 Ch. 7 [2.4GHz]
WIDE (0x61 0dBm): Ad-Hoc not supported
[79535.788880] ieee80211 phy8: U iwl_mod_ht40_chan_info HT40 Ch. 11 [2.4GHz]
WIDE (0x61 0dBm): Ad-Hoc not supported
And after the fix:
[91132.688706] ieee80211 phy14: U iwl_mod_ht40_chan_info HT40 Ch. 7 [2.4GHz]
IBSS ACTIVE WIDE (0x6f 0dBm): Ad-Hoc supported
[91132.688709] ieee80211 phy14: U iwl_mod_ht40_chan_info HT40 Ch. 11 [2.4GHz]
IBSS ACTIVE WIDE (0x6f 0dBm): Ad-Hoc supported
Signed-off-by: Shanyu Zhao <shanyu.zhao@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
When an internal scan is started, nothing protects the
is_internal_short_scan variable which can cause crashes,
cf. https://bugzilla.kernel.org/show_bug.cgi?id=15667.
Fix this by making the short scan request use the mutex
for locking, which requires making the request go to a
work struct so that it can sleep.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
iwl_wimax_coex_cmd.flags can be really uninitialized, so fix
that.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In mac80211 we always check both scan_req->ie and scan_req->ie_len
against zero before usage, in iwlwifi we should do the same.
Remove not needed "left -= ie_len" while at it.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
wl1251 has WLAN_IRQ pin for generating interrupts to host processor,
which is mandatory in SPI mode and optional in SDIO mode (which can
use SDIO interrupts instead). However TI recommends using deditated
IRQ line for SDIO too.
Add support for using dedicated interrupt line with SDIO, but also leave
ability to switch to SDIO interrupts in case it's needed.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes a bunch of endian issues that
were exposed by sparse. It's a miracle that the driver
worked at all till now.
The Lord be praised.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If a WMI command has timed out for some reason,
a late WMI response would end up updating the
response region of a new WMI request that has been
issued in the meantime.
Fix this race condition by dropping a WMI response
if a new WMI command has been issued.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There is no point in trying to set the LED pin
when the module is being unloaded. The target
would be reset anyway.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds macros at certain places
which could be optimized for multiple register writes.
The performance of ath9k_htc improves considerably,
especially reducing the latency involved in a scan run.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Programming the opmode in the HW can be done
before the assoc_id and STA_ID registers are
setup. This helps ath9k_htc when multiple register
writes are used.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds support for writing multiple registers
in a single USB command.
Specific calls from the HW code that performs multiple
register writes would be modified to make use of this
in subsequent patches.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is required to implement delayed/buffered
register writes in ath9k_htc.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch cleans up beacon configuration,
removing a redundant interface type check
and updating beacon interval in the correct place.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add -D__CHECK_ENDIAN__ to driver ccflags so that sparse will
always check endianness by default.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
According to tests, both TSF lower and upper registers kept counting, so
the higher part could have been updated after the lower part has been
read, as shown in the following log where the upper part is read first
and the lower part next.
tsf = {00000003-fffffffd}
tsf = {00000003-00000001}
tsf = {00000004-0000000b}
This patch corrects this by checking that the upper part has not been
changed while the lower part was read. It has been tested in an IBSS
network where artifical IBSS merges have been done in order to trigger
hundreds of rollover for the TSF lower part.
It follows the logic mentionned by Derek, with only 2 register reads
needed at each additional steps instead of 3 (the minimum number of
register reads is still 3).
Signed-off-by: Benoit Papillault <benoit.papillault@free.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The first AR9003 hardware family device supported is the
AR9300, which has the vendor:device id 168c:0030
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
LDPC is enabled by the rate control if the its determined
that the target peer supports LDPC. We would have already
intersected the HT capabilities so if our peer supports
LDPC so do we.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
LDPC will be enabled through the rate control algorithm
for each buffer the the tx_info flags.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>