This patch adds device tree binding documentation for the SATA
controller found on NVIDIA Tegra SoCs.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add documentation of the fsl,no-spread-spectrum option.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add the documentation for the electrical properties for the iMX SATA
controller. There are many values for these, and listing them would
be error prone. Refer readers to the device documentation and driver
source code for these details.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
To: Tejun Heo <tj@kernel.org>,linux-ide@vger.kernel.org
The Freescale i.MX SATA controller mostly conforms to the AHCI
interface, but there are some special extensions at integration level
like clocks settings and hardware parameters.
Let's create a separate bindings doc for imx sata controller, so that
more imx specific properties can be added later without messing up the
generic ahci-platform bindings.
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add support for Sensirion SHTC1 and compatible temperature and humidity
sensors.
Signed-off-by: Tomas Pop <tomas.pop@sensirion.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
This patch adds a description of eBPFs instruction encoding in order
to bring the documentation in line with the implementation.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the term eBPF is used anyway on mailing list discussions, lets
also document that in the main BPF documentation file and replace a
couple of occurrences with eBPF terminology to be more clear.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document new compatible strings for clock provided by the PRCM
(Power/Reset/Clock Management) unit.
Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com>
Signed-off-by: Emilio López <emilio@elopez.com.ar>
Support for the USB gates and resets on A31 has been recently added
using a new compatible, so let's document it here.
Signed-off-by: Emilio López <emilio@elopez.com.ar>
The macro 'A' used in internal BPF interpreter:
#define A regs[insn->a_reg]
was easily confused with the name of classic BPF register 'A', since
'A' would mean two different things depending on context.
This patch is trying to clean up the naming and clarify its usage in the
following way:
- A and X are names of two classic BPF registers
- BPF_REG_A denotes internal BPF register R0 used to map classic register A
in internal BPF programs generated from classic
- BPF_REG_X denotes internal BPF register R7 used to map classic register X
in internal BPF programs generated from classic
- internal BPF instruction format:
struct sock_filter_int {
__u8 code; /* opcode */
__u8 dst_reg:4; /* dest register */
__u8 src_reg:4; /* source register */
__s16 off; /* signed offset */
__s32 imm; /* signed immediate constant */
};
- BPF_X/BPF_K is 1 bit used to encode source operand of instruction
In classic:
BPF_X - means use register X as source operand
BPF_K - means use 32-bit immediate as source operand
In internal:
BPF_X - means use 'src_reg' register as source operand
BPF_K - means use 32-bit immediate as source operand
Suggested-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit documents the new support for "marvell,armada-{375,380}-wdt"
compatible strings and the extra 'reg' entry requirement.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Matt aded this plane property before we had a table giving a summary of
the properties. Add it there.
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Extend the year to 2014 in the copyright.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This panel is used by nyan-big and can be supported by the simple-panel
driver.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
[treding@nvidia.com: add device tree binding document]
Signed-off-by: Thierry Reding <treding@nvidia.com>
The remap_file_pages() system call is used to create a nonlinear
mapping, that is, a mapping in which the pages of the file are mapped
into a nonsequential order in memory. The advantage of using
remap_file_pages() over using repeated calls to mmap(2) is that the
former approach does not require the kernel to create additional VMA
(Virtual Memory Area) data structures.
Supporting of nonlinear mapping requires significant amount of
non-trivial code in kernel virtual memory subsystem including hot paths.
Also to get nonlinear mapping work kernel need a way to distinguish
normal page table entries from entries with file offset (pte_file).
Kernel reserves flag in PTE for this purpose. PTE flags are scarce
resource especially on some CPU architectures. It would be nice to free
up the flag for other usage.
Fortunately, there are not many users of remap_file_pages() in the wild.
It's only known that one enterprise RDBMS implementation uses the
syscall on 32-bit systems to map files bigger than can linearly fit into
32-bit virtual address space. This use-case is not critical anymore
since 64-bit systems are widely available.
The plan is to deprecate the syscall and replace it with an emulation.
The emulation will create new VMAs instead of nonlinear mappings. It's
going to work slower for rare users of remap_file_pages() but ABI is
preserved.
One side effect of emulation (apart from performance) is that user can
hit vm.max_map_count limit more easily due to additional VMAs. See
comment for DEFAULT_MAX_MAP_COUNT for more details on the limit.
[akpm@linux-foundation.org: fix spello]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memory allocation stack trace is not always useful for debugging a
memory leak (e.g. radix_tree_preload). This function, when called,
updates the stack trace for an already allocated object.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory reclaim always uses swappiness of the reclaim target memcg
(origin of the memory pressure) or vm_swappiness for global memory
reclaim. This behavior was consistent (except for difference between
global and hard limit reclaim) because swappiness was enforced to be
consistent within each memcg hierarchy.
After "mm: memcontrol: remove hierarchy restrictions for swappiness and
oom_control" each memcg can have its own swappiness independent of
hierarchical parents, though, so the consistency guarantee is gone.
This can lead to an unexpected behavior. Say that a group is explicitly
configured to not swapout by memory.swappiness=0 but its memory gets
swapped out anyway when the memory pressure comes from its parent with a
It is also unexpected that the knob is meaningless without setting the
hard limit which would trigger the reclaim and enforce the swappiness.
There are setups where the hard limit is configured higher in the
hierarchy by an administrator and children groups are under control of
somebody else who is interested in the swapout behavior but not
necessarily about the memory limit.
From a semantic point of view swappiness is an attribute defining anon
vs.
file proportional scanning of LRU which is memcg specific (unlike
charges which are propagated up the hierarchy) so it should be applied
to the particular memcg's LRU regardless where the memory pressure comes
from.
This patch removes vmscan_swappiness() and stores the swappiness into
the scan_control structure. mem_cgroup_swappiness is then used to
provide the correct value before shrink_lruvec is called. The global
vm_swappiness is used for the root memcg.
[hughd@google.com: oopses immediately when booted with cgroup_disable=memory]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When writing to a sysctl string, each write, regardless of VFS position,
begins writing the string from the start. This means the contents of
the last write to the sysctl controls the string contents instead of the
first:
open("/proc/sys/kernel/modprobe", O_WRONLY) = 1
write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
write(1, "/bin/true", 9) = 9
close(1) = 0
$ cat /proc/sys/kernel/modprobe
/bin/true
Expected behaviour would be to have the sysctl be "AAAA..." capped at
maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
contents of the second write. Similarly, multiple short writes would
not append to the sysctl.
The old behavior is unlike regular POSIX files enough that doing audits
of software that interact with sysctls can end up in unexpected or
dangerous situations. For example, "as long as the input starts with a
trusted path" turns out to be an insufficient filter, as what must also
happen is for the input to be entirely contained in a single write
syscall -- not a common consideration, especially for high level tools.
This provides kernel.sysctl_writes_strict as a way to make this behavior
act in a less surprising manner for strings, and disallows non-zero file
position when writing numeric sysctls (similar to what is already done
when reading from non-zero file positions). For now, the default (0) is
to warn about non-zero file position use, but retain the legacy
behavior. Setting this to -1 disables the warning, and setting this to
1 enables the file position respecting behavior.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: move misplaced hunk, per Randy]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg. This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).
Usage: add "crash_kexec_post_notifiers" to kernel boot option.
Note that this actually increases risks of the failure of kdump. This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Satoru MORIYA <satoru.moriya.br@hitachi.com>
Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Examples introducing neccesity of RMB+WMP pair reads as
A=3 READ B
www rrrrrr
B=4 READ A
Note the opposite order of reads vs writes.
But the first example without barriers reads as
A=3 READ A
B=4 READ B
There are 4 outcomes in the first example.
But if someone new to the concept tries to insert barriers like this:
A=3 READ A
www rrrrrr
B=4 READ B
he will still get all 4 possible outcomes, because "READ A" is first.
All this can be utterly confusing because barrier pair seems to be
superfluous. In short, fixup first example to match latter examples
with barriers.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linked article in seq_file.txt still uses create_proc_entry which was
removed in commit 80e928f7eb ("proc: Kill create_proc_entry()").
This patch adds information for kernel 3.10 and above
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update the SubmittingPatches process to include howto about the new
'Fixes:' tag to be used when a patch fixes an issue in a previous commit
(found by git-bisect for example).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add structure for parsed BPB information, struct fat_bios_param_block,
and move all of the deserialization and validation logic from
fat_fill_super() into fat_read_bpb().
Add a 'dos1xfloppy' mount option to infer DOS 2.x BIOS Parameter Block
defaults from block device geometry for ancient floppies and floppy
images, as a fall-back from the default BPB parsing logic.
When fat_read_bpb() finds an invalid FAT filesystem and dos1xfloppy is
set, fall back to fat_read_static_bpb(). fat_read_static_bpb()
validates that the entire BPB is zero, and that the floppy has a
DOS-style 8086 code bootstrapping header. Then it fills in default BPB
values from media size and a table.[0]
Media size is assumed to be static for archaic FAT volumes. See also:
[1].
Fixes kernel.org bug #42617.
[0]: https://en.wikipedia.org/wiki/File_Allocation_Table#Exceptions
[1]: http://www.win.tue.nl/~aeb/linux/fs/fat/fat-1.html
[hirofumi@mail.parknet.co.jp: fix missed error code]
Signed-off-by: Conrad Meyer <cse.cem@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This enables the setting of a custom clock name for the clock provided by
the hym8563 rtc.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MPU DPLL on OMAP5, DRA75x, DRA72x has a limitation on the maximum
frequency it can be locked at. Duty Cycle Correction circuit is used
to recover a correct duty cycle for achieving higher frequencies
(hardware internally switches output to M3 output(CLKOUTHIF) from M2
output (CLKOUT)).
So provide support to setup required data to handle Duty cycle by
the setting up the minimum frequency for DPLL. 1.4GHz is common
for all these devices and is based on Technical Reference Manual
information for OMAP5432((SWPU282U) chapter 3.6.3.3.1 "DPLLs Output
Clocks Parameters", and equivalent information from DRA75x, DRA72x
documentation(SPRUHP2E, SPRUHI2P).
Signed-off-by: Nishanth Menon <nm@ti.com>
[t-kristo@ti.com: updated for latest dpll init API call]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
This patch provides the documentation of the device bindings
for the AMD 10GbE platform driver.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Douglas Anderson, recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.
While transitioning between frequencies few platforms may temporarily switch to
a stable frequency, waiting for the main PLL to stabilize.
For example: When we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
and 300MHz. And so udelay behaves badly.
To get this fixed in a generic way, introduce another set of callbacks
get_intermediate() and target_intermediate(), only for drivers with
target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
get_intermediate() should return a stable intermediate frequency platform wants
to switch to, and target_intermediate() should set CPU to that frequency,
before jumping to the frequency corresponding to 'index'. Core will take care of
sending notifications and driver doesn't have to handle them in
target_intermediate() or target_index().
NOTE: ->target_index() should restore to policy->restore_freq in case of
failures as core would send notifications for that.
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With the recent addition of the drm_set_unique() function, devices can
now be registered without requiring a drm_bus. Add a brief description
to the DRM docbook to show how that can be achieved.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Describe how devices are registered using the drm_*_init() functions.
Adding this to docbook requires a largish set of changes to the comments
in drm_{pci,usb,platform}.c since they are doxygen-style rather than
proper kernel-doc and therefore mess with the docbook generation.
While at it, mark usage of drm_put_dev() as discouraged in favour of
calling drm_dev_unregister() and drm_dev_unref() directly.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The DSI controllers are powered by a (typically 1.2V) regulator. Usually
this is always on, so there was no need to support enabling or disabling
it thus far. But in order not to consume any power when DSI is inactive,
give the driver a chance to enable or disable the supply as needed.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Revert commit 18ebc0f404 "drm/tegra: hdmi: Enable VDD earlier for
hotplug/DDC" and instead add a new supply for the +5V pin on the HDMI
connector.
The vdd-supply property refers to the regulator that supplies the
AVDD_HDMI input on Tegra, rather than the +5V HDMI connector pin. This
was never a problem before, because all boards had that pin hooked up to
a regulator that was always on. Starting with Dalmore and continuing
with Venice2, the +5V pin is controllable via a GPIO. For reasons
unknown, the GPIO ended up as the controlling GPIO of the AVDD_HDMI
supply in the Dalmore and Venice2 DTS files. But that's not correct.
Instead, a separate supply must be introduced so that the +5V pin can be
controlled separately from the supplies that feed the HDMI block within
Tegra.
A new hdmi-supply property is introduced that takes the place of the
vdd-supply and vdd-supply is only enabled when HDMI is enabled rather
than all the time.
Signed-off-by: Thierry Reding <treding@nvidia.com>
This panel is sold by Toradex for Colibri T20/T30 and Apalis T30
evaluation kits.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The EDT ETM0700G0DH6 and ET070080DH6 are 7" 800x480 panels,
which can be supported by the simple panel driver.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Our mutexes have gone a long ways since the original
implementation back in 2005/2006. However, the mutex-design.txt
document is still stuck in the past, to the point where most of
the information there is practically useless and, more
important, simply incorrect. This patch pretty much rewrites it
to resemble what we have nowadays.
Since regular semaphores are almost much extinct in the kernel
(most users now rely on mutexes or rwsems), it no longer makes
sense to have such a close comparison, which was copied from
most of the cover letter when Ingo introduced the generic mutex
subsystem.
Note that ww_mutexes are intentionally left out, leaving things
as generic as possible.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: tim.c.chen@linux.intel.com
Cc: paulmck@linux.vnet.ibm.com
Cc: waiman.long@hp.com
Cc: jason.low2@hp.com
Cc: aswin@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1401338203.2618.11.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds some documentation on the different cpu families
supported by arch/powerpc.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For atomic, it will be quite necessary to not need to care so much
about locking order. And 'state' gives us a convenient place to stash a
ww_ctx for any sort of update that needs to grab multiple crtc locks.
Because we will want to eventually make locking even more fine grained
(giving locks to planes, connectors, etc), split out drm_modeset_lock
and drm_modeset_acquire_ctx to track acquired locks.
Atomic will use this to keep track of which locks have been acquired
in a transaction.
v1: original
v2: remove a few things not needed until atomic, for now
v3: update for v3 of connection_mutex patch..
v4: squash in docbook
v5: doc tweaks/fixes
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When a module is built into the kernel the module_init() function
becomes an initcall. Sometimes debugging through dynamic debug can
help, however, debugging built in kernel modules is typically done by
changing the .config, recompiling, and booting the new kernel in an
effort to determine exactly which module caused a problem.
This patchset can be useful stand-alone or combined with initcall_debug.
There are cases where some initcalls can hang the machine before the
console can be flushed, which can make initcall_debug output inaccurate.
Having the ability to skip initcalls can help further debugging of these
scenarios.
Usage: initcall_blacklist=<list of comma separated initcalls>
ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and
the log contains:
blacklisting initcall sgi_uv_sysfs_init
...
...
initcall sgi_uv_sysfs_init blacklisted
ex) added "initcall_blacklist=foo_bar,sgi_uv_sysfs_init" as a kernel parameter
and the log contains:
blacklisting initcall foo_bar
blacklisting initcall sgi_uv_sysfs_init
...
...
initcall sgi_uv_sysfs_init blacklisted
[akpm@linux-foundation.org: tweak printk text]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Rob Landley <rob@landley.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pr_debug() and related debug print macros all differ from the normal
pr_XXX() macros, in that the normal ones print unconditionally, while
the debug macros are compiled out unless DEBUG is defined or
CONFIG_DYNAMIC_DEBUG is set. This isn't obvious, and the only way to
find this out is either to review the actual printk.h code or to read
CodingStyle, and the message there doesn't highlight the fact.
Change Documentation/CodingStyle to clearly indicate that pr_debug() and
related debug printing macros behave differently than all other pr_XXX()
macros, and attempt to clarify when and where the different debug
printing methods might be used.
Add short comment to printk.h above the pr_XXX() macros indicating that
while these macros print unconditionally, pr_debug() does not.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Joe Perches <joe@perches.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Existing description is worded in a way which almost encourages setting of
vfs_cache_pressure above 100, possibly way above it.
Users are left in a dark what this numeric value is - an int? a
percentage? what the scale is?
As a result, we are getting reports about noticeable performance
degradation from users who have set vfs_cache_pressure to ridiculously
high values - because they thought there is no downside to it.
Via code inspection it's obvious that this value is treated as a
percentage. This patch changes text to reflect this fact, and adds a
cautionary paragraph advising against setting vfs_cache_pressure sky high.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently memory error handler handles action optional errors in the
deferred manner by default. And if a recovery aware application wants
to handle it immediately, it can do it by setting PF_MCE_EARLY flag.
However, such signal can be sent only to the main thread, so it's
problematic if the application wants to have a dedicated thread to
handler such signals.
So this patch adds dedicated thread support to memory error handler. We
have PF_MCE_EARLY flags for each thread separately, so with this patch
AO signal is sent to the thread with PF_MCE_EARLY flag set, not the main
thread. If you want to implement a dedicated thread, you call prctl()
to set PF_MCE_EARLY on the thread.
Memory error handler collects processes to be killed, so this patch lets
it check PF_MCE_EARLY flag on each thread in the collecting routines.
No behavioral change for all non-early kill cases.
Tony said:
: The old behavior was crazy - someone with a multithreaded process might
: well expect that if they call prctl(PF_MCE_EARLY) in just one thread, then
: that thread would see the SIGBUS with si_code = BUS_MCEERR_A0 - even if
: that thread wasn't the main thread for the process.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Kamil Iskra <iskra@mcs.anl.gov>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chen Gong <gong.chen@linux.jf.intel.com>
Cc: <stable@vger.kernel.org> [3.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kmemcg is currently under development and lacks some important features.
In particular, it does not have support of kmem reclaim on memory pressure
inside cgroup, which practically makes it unusable in real life. Let's
warn about it in both Kconfig and Documentation to prevent complaints
arising.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When it was introduced, zone_reclaim_mode made sense as NUMA distances
punished and workloads were generally partitioned to fit into a NUMA
node. NUMA machines are now common but few of the workloads are
NUMA-aware and it's routine to see major performance degradation due to
zone_reclaim_mode being enabled but relatively few can identify the
problem.
Those that require zone_reclaim_mode are likely to be able to detect
when it needs to be enabled and tune appropriately so lets have a
sensible default for the bulk of users.
This patch (of 2):
zone_reclaim_mode causes processes to prefer reclaiming memory from
local node instead of spilling over to other nodes. This made sense
initially when NUMA machines were almost exclusively HPC and the
workload was partitioned into nodes. The NUMA penalties were
sufficiently high to justify reclaiming the memory. On current machines
and workloads it is often the case that zone_reclaim_mode destroys
performance but not all users know how to detect this. Favour the
common case and disable it by default. Users that are sophisticated
enough to know they need zone_reclaim_mode will detect it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Seems we all agree that information about SECTION, e.g. section size,
sections per memory block should be kept as kernel internals, and not
exposed to userspace.
This patch updates Documentation/memory-hotplug.txt to refer to memory
blocks instead of memory sections where appropriate and added a
paragraph to explain that memory blocks are made of memory sections.
The documentation update is mostly provided by Nathan.
Also, as end_phys_index in code is actually not the end section id, but
the end memory block id, which should always be the same as phys_index.
So it is removed here.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>