The check for multicast shouldn't exclude broadcast type addresses.
This reverts the incorrect change done in 2.6.13.
The broadcast address is a multicast address and should be excluded
from being a valid_ether_address for use in bridging or device address.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Select a block size for IO based on the read and write block size
combinations, and whether the card supports partial block reads
and/or partial block writes.
If we are able to satisfy block reads but not block writes, mark
the device read only.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
From: Benjamin LaHaise <bcrl@kvack.org>
In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the
shared info structure results in gcc being forced to do a reload of the
pointer since it has no information on possible aliasing. Fix this by
using a pointer to refer to skb_shared_info.
By initializing skb_shared_info sequentially, the write combining buffers
can reduce the number of memory transactions to a single write. Reorder
the initialization in __alloc_skb() to match the structure definition.
There is also an alignment issue on 64 bit systems with skb_shared_info
by converting nr_frags to a short everything packs up nicely.
Also, pass the slab cache pointer according to the fclone flag instead
of using two almost identical function calls.
This raises bw_unix performance up to a peak of 707KB/s when combined
with the spinlock patch. It should help other networking protocols, too.
Signed-off-by: David S. Miller <davem@davemloft.net>
And actually, with this, the whole pppox layer can basically
be removed and subsumed into pppoe.c, no other pppox sub-protocol
implementation exists and we've had this thing for at least 4
years.
Signed-off-by: David S. Miller <davem@davemloft.net>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.
Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)
This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.
This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)
I made a global stubstitute to change all existing occurences to make
them const.
This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_init can be done as a core_initcall instead of calling
it directly in init/main.c
Also I removed an out of date #ifdef.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a new feature for netem in 2.6.16. It adds the ability to
randomly corrupt packets with netem. A version was done by
Hagen Paul Pfeifer, but I redid it to handle the cases of backwards
compatibility with netlink interface and presence of hardware checksum
offload. It is useful for testing hardware offload in devices.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As DCCP needs to be called in the same spots.
Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upcoming patches will make, for instance, ip_sockglue.c need just this enum
and not all of tcp.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet6_hash_connect, making it possible to ditch
dccp_v6_hash_connect and share the same code with TCP instead.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.
Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.
tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using sk->sk_protocol instead of IPPROTO_TCP.
Will be used by DCCPv6 in the next changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
it is left on the socket receive queue. This means that when we detect
a checksum error we have to be careful when trying to free the packet
as someone could have dequeued it in the time being.
Currently this delicate logic is duplicated three times between UDPv4,
UDPv6 and RAWv6. This patch moves them into a one place and simplifies
the code somewhat.
This is based on a suggestion by Eric Dumazet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And inet6_rsk_offset in inet_request_sock, for the same reasons as
inet_sock's pinfo6 member.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.
(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)
This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets. Extensions to the SELinux LSM are
included that leverage the patch for this purpose.
This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.
Patch purpose:
The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association. Such access
controls augment the existing ones based on network interface and IP
address. The former are very coarse-grained, and the latter can be
spoofed. By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.
Patch design approach:
The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.
A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.
Patch implementation details:
On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools). This is enforced in xfrm_state_find.
On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.
The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.
Also, if IPSec is used without security contexts, the impact is
minimal. The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.
Testing:
The pfkey interface is tested using the ipsec-tools. ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.
The xfrm_user interface is tested via ad hoc programs that set
security contexts. These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface. Testing of sa functions was done by tracing kernel
behavior.
Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
readpage(), prepare_write(), and commit_write() callers are updated to
understand the special return code AOP_TRUNCATED_PAGE in the style of
writepage() and WRITEPAGE_ACTIVATE. AOP_TRUNCATED_PAGE tells the caller that
the callee has unlocked the page and that the operation should be tried again
with a new page. OCFS2 uses this to detect and work around a lock inversion in
its aop methods. There should be no change in behaviour for methods that don't
return AOP_TRUNCATED_PAGE.
WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
made enums so that kerneldoc can be used to document their semantics.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Configfs, a file system for userspace-driven kernel object configuration.
The OCFS2 stack makes extensive use of this for propagation of cluster
configuration information into kernel.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Also use the PnP functions to start/stop the devices during the suspend so
that drivers will not have to duplicate this code.
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add support for the CS5535 Audio device. I've fixed up some errors as per
Takashi's advice from the thread:
http://lkml.org/lkml/2005/9/15/119
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
cs5535 is a 32bit x86 only device using weird CPU features
Signed-off-by: Jaya Kumar <jayakumar.alsa@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This patch fixes a problem when we use well known kernel symbols as module
names.
For example, if module source name is current.c, idle_stack.c or etc.,
we have a bad KBUILD_MODNAME value.
For example, KBUILD_MODNAME will be "get_current()" instead of "current", or
"(init_thread_union.stack)" instead of "idle_task".
The trick is to define a stringify macro on the commandline - named
KBUILD_STR for namespace reasons - and then to stringify the module
name.
There are a few uses of KBUILD_MODNAME throughout the tree but the usage
is for debug and will not be harmed by this change so left untouched for now.
While at it KBUILD_BASENAME was changed too. Any spinlock usage in the
unix module would have created wrong section names without it.
Usage in spinlock.h fixed so it no longer stringify KBUILD_BASENAME.
Original patch from Ustyogov Roman - all bugs introduced by me.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Fix n_r3964 timeouts (hardcoded for 100Hz)
Also the include of <asm/termios.h> in 'n_r3964.h' is unnecessary and
prevents using the header file in any application that has to include
<termios.h> due to duplicate definition of 'struct termio'.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently a simple
void foo(void) { preempt_enable(); }
produces the following code on ARM:
foo:
bic r3, sp, #8128
bic r3, r3, #63
ldr r2, [r3, #4]
ldr r1, [r3, #0]
sub r2, r2, #1
tst r1, #4
str r2, [r3, #4]
blne preempt_schedule
mov pc, lr
The problem is that the TIF_NEED_RESCHED flag is loaded _before_ the
preemption count is stored back, hence any interrupt coming within that
3 instruction window causing TIF_NEED_RESCHED to be set won't be
seen and scheduling won't happen as it should.
Nothing currently prevents gcc from performing that reordering. There
is already a barrier() before the decrement of the preemption count, but
another one is needed between this and the TIF_NEED_RESCHED flag test
for proper code ordering.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan's crosscompile page [1] shows, that one regression in 2.6.15-rc is
that the v850 defconfig does no longer compile.
The compile error is:
<-- snip -->
...
CC arch/v850/kernel/setup.o
In file included from /usr/src/ctest/rc/kernel/arch/v850/kernel/setup.c:17:
/usr/src/ctest/rc/kernel/include/linux/irq.h:13:43: asm/smp.h: No such file or directory
make[2]: *** [arch/v850/kernel/setup.o] Error 1
<-- snip -->
The #include <asm/smp.h> in irq.h was intruduced in 2.6.15-rc.
Since include/linux/irq.h needs code from asm/smp.h only in the
CONFIG_SMP=y case and linux/smp.h #include's asm/smp.h only in the
CONFIG_SMP=y case, I'm suggesting this patch to #include <linux/smp.h>
in irq.h.
I've tested the compilation with both CONFIG_SMP=y and CONFIG_SMP=n
on i386.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's currently a diagnostic printk in relay_switch_subbuf() meant as
a warning if you accidentally try to log an event larger than the
sub-buffer size.
The problem is if this happens while logging from somewhere it's not
safe to be doing printks, such as in the scheduler, you can end up with
a deadlock. This patch removes the warning from relay_switch_subbuf()
and instead prints some diagnostic info when the channel is closed.
Thanks to Mathieu Desnoyers for pointing out the problem and
suggesting a fix.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ensure we call unmap_mapping_range() and sync dirty pages to disk before
doing an NFS direct write.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I reported a problem and gave hints to the solution, but nobody seemed
to react. So I prepared a patch against 2.6.14.4.
Tested on 2.6.14.4 with "ip monitor addr" and with the program
attached, while adding and removing IPv6 address. Both programs didn't
receive any messages. Tested 2.6.14.4 + this patch, and both programs
received add and remove messages.
Signed-off-by: Kristian Slavov <kristian.slavov@nomadiclab.com>
Acked-by: Jamal Hadi salim <hadi@cyberus.ca>
ACKed-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This (and the three subsequent patches) is working well on OMAP H4 with
2.6.15-rc4 kernel and passes the LTP fs test.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sparc64, i386 and x86_64 have support for a special data section dedicated
to rarely updated data that is frequently read. The section was created to
avoid false sharing of those rarely read data with frequently written kernel
data.
This patch creates such a data section for ia64 and will group rarely written
data into this section.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The logic that decides that a fork() might be able to avoid copying a VM
area when it can be re-created by page faults didn't know about the new
vm_insert_page() case.
Also make some things a bit more anal wrt VM_PFNMAP.
Pointed out by Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if a LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to a artificial low
value to overcome memory and feedback issues.
Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For tape we need to control the retries. This patch adds a retries
counter on the request for REQ_BLOCK_PC commands originating from
scsi_execute* to use. REQ_BLOCK_PC commands comming from the block
layer SG_IO path continue to use the retires set in the ULD init_command.
(scsi_execute* does not set the gendisk so we do not execute
the init_command in that path).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().
Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.
I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
To send async requests we need these two functions exported.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Some motherboards (such as the Asus P5V800-MX) ship a
PCI_DEVICE_ID_VIA_82C586_1 IDE controller alongside a VT8251 southbridge.
This southbridge is currently unrecognised in the via82cxxx IDE driver,
preventing those users from getting DMA access to disks.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Some hardware does not support the PACKET command at all.
Other hardware supports ATAPI, but the driver does something nasty such
as calling BUG() when an ATAPI command is issued.
For these such cases, we mark them with a new flag, ATA_FLAG_NO_ATAPI.
Initial version contributed by Ben Collins.