These are blocking some CPUs in the LITTLE cluster from entering deep
idle because the driver assumes that display rendering work occurs on a
hardcoded set of CPUs, which is false. The scope of this is also quite
large, which increases power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Combined with LTO, this yields a consistent 5% boost to procfs I/O
performance right off the bat (as measured with callbench). The spin
lock functions constitute some of the hottest code paths in the kernel;
inlining them to improve performance makes sense.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's plenty of room on the stack for a few more inlined bytes here
and there. The measured stack usage at runtime is still safe without
this, and performance is surely improved at a microscopic level, so
remove it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A measurably significant amount of CPU time is spent in these routines
while the camera is open. These are also responsible for a grotesque
amount of dmesg spam, so let's nuke them.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This call to smp_processor_id() forces gic_raise_softirq() to require
being called while preemption is disabled, which isn't an actual
requirement. When called without preemption disabled, smp_processor_id()
is thus used incorrectly and generates a warning splat with the relevant
kernel debug options enabled.
Get rid of the useless pr_devel message outright to fix the incorrect
smp_processor_id() usage.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
In order to prevent redundant entry creation by racing against itself,
mb_cache_entry_create scans through a large hash-list of all current
entries in order to see if another allocation for the requested new
entry has been made. Furthermore, it allocates memory for a new entry
before scanning through this hash-list, which results in that allocated
memory being discarded when the requested new entry is already present.
This happens more than half the time.
Speed up cache entry creation by keeping a small linked list of
requested new entries in progress, and scanning through that first
instead of the large hash-list. Additionally, don't bother allocating
memory for a new entry until it's known that the allocated memory will
be used.
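The idea can be sketched roughly as follows (hypothetical names, not the
actual mbcache change): racing creators publish their key on a short
"in progress" list first, so duplicates are caught before any memory
allocation or hash-list walk.

  #include <linux/list.h>
  #include <linux/spinlock.h>
  #include <linux/types.h>

  struct busy_entry {
          struct list_head node;
          u32 key;
  };

  static LIST_HEAD(busy_list);
  static DEFINE_SPINLOCK(busy_lock);

  /* Returns true if another task is already creating an entry for @key,
   * in which case the caller skips both the hash scan and the allocation. */
  static bool mark_creation_in_progress(struct busy_entry *e, u32 key)
  {
          struct busy_entry *iter;

          spin_lock(&busy_lock);
          list_for_each_entry(iter, &busy_list, node) {
                  if (iter->key == key) {
                          spin_unlock(&busy_lock);
                          return true;
                  }
          }
          e->key = key;
          list_add(&e->node, &busy_list);
          spin_unlock(&busy_lock);
          return false;
  }

In this sketch the entry would remove itself from the short list once it
has been inserted into the hash table, so memory only ever gets allocated
for entries that are actually used.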
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
For the vast majority of mmio operations in this driver, explicit memory
barriers aren't needed either because a data dependency between a read
and write already exists, or because of the presence of the spin locks
which execute a full memory barrier.
Removing all the unneeded explicit barriers considerably reduces
overhead for pinctrl operations, which in turn benefits things like i2c.
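As a rough illustration of the pattern (not the driver's actual code;
the struct and register layout are made up), the _relaxed accessors can
be used inside a section that the spinlock already orders:

  #include <linux/io.h>
  #include <linux/spinlock.h>

  struct fake_pinctrl {
          void __iomem *base;
          spinlock_t lock;
  };

  static void fake_pinctrl_set_bit(struct fake_pinctrl *pctrl, u32 reg, u32 bit)
  {
          unsigned long flags;
          u32 val;

          spin_lock_irqsave(&pctrl->lock, flags);
          /* readl()/writel() barriers are unnecessary here; the lock and the
           * read-modify-write data dependency already order the accesses. */
          val = readl_relaxed(pctrl->base + reg);
          writel_relaxed(val | bit, pctrl->base + reg);
          spin_unlock_irqrestore(&pctrl->lock, flags);
  }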
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's no reason to hold an RCU read lock the entire time while
optimistically spinning for a rwsem. This can needlessly lengthen RCU
grace periods and slow down synchronize_rcu() when it doesn't brute
force the RCU grace period via rcupdate.rcu_expedited=1.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's no reason to hold an RCU read lock the entire time while
optimistically spinning for a mutex lock. This can needlessly lengthen
RCU grace periods and slow down synchronize_rcu() when it doesn't brute
force the RCU grace period via rcupdate.rcu_expedited=1.
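A sketch of the pattern that this and the preceding rwsem change
describe (not the actual locking code): hold the RCU read lock only
around each owner check rather than across the entire spin loop.

  #include <linux/rcupdate.h>
  #include <linux/sched.h>

  /* One spin iteration: the RCU read-side critical section covers only the
   * owner dereference, so a long spin no longer extends RCU grace periods. */
  static bool owner_still_running(struct task_struct * const *owner_p,
                                  struct task_struct *owner)
  {
          bool running;

          rcu_read_lock();
          running = READ_ONCE(*owner_p) == owner && owner->on_cpu;
          rcu_read_unlock();

          return running;
  }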
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
It isn't guaranteed a CPU will idle upon calling lpm_cpuidle_enter(),
since it could abort early at the need_resched() check. In this case,
it's possible for an IPI to be sent to this "idle" CPU needlessly, thus
wasting power. For the same reason, it's also wasteful to keep a CPU
marked idle even after it's woken up.
Shrink the window in which CPUs are marked idle to be as small as
possible in order to improve power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The pm_qos callback currently suffers from a number of pitfalls: it
sends IPIs to CPUs that may not be idle, waits for those IPIs to finish
propagating while preemption is disabled (resulting in a long busy wait
for the pm_qos_update_target() caller), and needlessly calls a no-op
function when the IPIs are processed.
Optimize the pm_qos notifier by only sending IPIs to CPUs that are
idle, and by using arch_send_wakeup_ipi_mask() instead of
smp_call_function_many(). Using IPI_WAKEUP instead of IPI_CALL_FUNC,
which is what smp_call_function_many() uses behind the scenes, has the
benefit of doing zero work upon receipt of the IPI; IPI_WAKEUP is
designed purely for sending an IPI without a payload, whereas
IPI_CALL_FUNC does unwanted extra work just to run the empty
smp_callback() function.
Determining which CPUs are idle is done efficiently with an atomic
bitmask instead of using the wake_up_if_idle() API, which checks the
CPU's runqueue in an RCU read-side critical section and under a spin
lock; that is not very efficient compared to a simple atomic bitwise
operation. A cpumask isn't needed for this because NR_CPUS is
guaranteed to fit within a word.
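A simplified sketch of the scheme (hypothetical names; it assumes
arch_send_wakeup_ipi_mask() is exposed for this use, as described
above, and that NR_CPUS fits in one long):

  #include <linux/bitops.h>
  #include <linux/compiler.h>
  #include <linux/cpumask.h>

  /* One bit per CPU, updated locklessly with atomic bitops. */
  static unsigned long cpus_in_idle;

  static void report_cpu_idle(int cpu, bool idle)
  {
          if (idle)
                  set_bit(cpu, &cpus_in_idle);
          else
                  clear_bit(cpu, &cpus_in_idle);
  }

  /* pm_qos notifier path: IPI only the CPUs currently marked idle. The
   * bitmask word doubles as the cpumask passed to the wakeup IPI. */
  static void wake_idle_cpus(void)
  {
          if (READ_ONCE(cpus_in_idle))
                  arch_send_wakeup_ipi_mask(to_cpumask(&cpus_in_idle));
  }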
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
An empty IPI is useful for cpuidle to wake sleeping CPUs without causing
them to do unnecessary work upon receipt of the IPI. IPI_WAKEUP fills
this use-case nicely, so let it be used outside of the ACPI parking
protocol.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
None of the pm_qos functions actually run in interrupt context; if some
driver calls pm_qos_update_target in interrupt context then it's already
broken. There's no need to disable interrupts while holding pm_qos_lock,
so don't do it.
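In other words (hypothetical helper, not the real pm_qos code), the
critical section can use a plain spin_lock() because no caller runs in
interrupt context:

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(example_qos_lock);
  static int example_qos_target;

  static void example_qos_update_target(int value)
  {
          /* Was spin_lock_irqsave(); interrupts can stay enabled here. */
          spin_lock(&example_qos_lock);
          example_qos_target = value;
          spin_unlock(&example_qos_lock);
  }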
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit 1e5a5b5e00e9706cd48e3c87de1607fcaa5214d2.
This doesn't make sense for a few reasons. Firstly, upstream uses this
mutex code and it works fine on all arches; why should arm be any
different?
Secondly, once the mutex owner starts to spin on `wait_lock`,
preemption is disabled and the owner will be in an actively-running
state. The optimistic mutex spinning occurs when the lock owner is
actively running on a CPU, and while the optimistic spinning takes
place, no attempt to acquire `wait_lock` is made by the new waiter.
Therefore, it is guaranteed that new mutex waiters which optimistically
spin will not contend the `wait_lock` spin lock that the owner needs to
acquire in order to make forward progress.
Another potential source of `wait_lock` contention can come from tasks
that call mutex_trylock(), but this isn't actually problematic (and if
it were, it would affect the MUTEX_SPIN_ON_OWNER=n use-case too). This
won't introduce significant contention on `wait_lock` because the
trylock code exits before attempting to lock `wait_lock`, specifically
when the atomic mutex counter indicates that the mutex is already
locked. So in reality, the amount of `wait_lock` contention that can
come from mutex_trylock() amounts to only one task. And once it
finishes, `wait_lock` will no longer be contended and the previous
mutex owner can proceed with cleanup.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit 0db49c2550a09458db188fb7312c66783c5af104.
This results in kmalloc() abuse to find a large number of contiguous
pages, which thrashes the page allocator and hurts overall performance.
I couldn't reproduce the improved MTP throughput that this commit
claimed either, so just revert it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit a9a60c58e0fa21c41ac284282949187b13bdd756.
This results in kmalloc() abuse to find a large number of contiguous
pages, which thrashes the page allocator and hurts overall performance.
I couldn't reproduce the improved MTP throughput that this commit
claimed either, so just revert it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The scope of this driver's lock usage is extremely wide, leading to
excessively long lock hold times. Additionally, there is lots of
excessive linked-list traversal and unnecessary dynamic memory
allocation in a critical path, causing poor performance across the
board.
Fix all of this by greatly reducing the scope of the locks used and by
significantly reducing the amount of operations performed when
msm_dma_map_sg_attrs() is called. The entire driver's code is overhauled
for better cleanliness and performance.
Note that ION must be modified to pass a known structure via the private
dma_buf pointer, so that the IOMMU driver can prevent races when
operating on the same buffer concurrently. This is the only way to
eliminate said buffer races without hurting the IOMMU driver's
performance.
Some additional members are added to the device struct as well to make
these various performance improvements possible.
This also removes the manual cache maintenance since ION already handles
it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
commit b312b4f0e2f9 ("iommu: arm-smmu: Preallocate memory for map
only on failure") had the following two errors:
1. The return code checked when map_sg fails and we preallocate
   is wrong. The check should be for 0 and not -ENOMEM, so the
   preallocation never happens when map_sg fails.
2. map_sg could have mapped some elements of the sglist before
   failing. With the proper check, we would call map_sg again for
   the same size, which would lead to a double map of the elements
   that were already mapped.
Fix this by returning the actual ret code from arm_lpae_map_sg()
and checking it against -ENOMEM to decide whether to preallocate.
Also, unmap any partial iovas that were mapped previously.
Change-Id: Ifee7c0bed6b9cf1c35ebb4a03d51a1a80ab0ed58
Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
page allocation failure: order:0, mode:0x2088020(GFP_ATOMIC|__GFP_ZERO)
Call trace:
[<ffffff80080f15c8>] dump_backtrace+0x0/0x248
[<ffffff80080f1894>] show_stack+0x18/0x28
[<ffffff8008484984>] dump_stack+0x98/0xc0
[<ffffff8008231b0c>] warn_alloc+0x114/0x134
[<ffffff8008231f7c>] __alloc_pages_nodemask+0x3e8/0xd30
[<ffffff8008232b2c>] alloc_pages_exact+0x4c/0xa4
[<ffffff800866bec4>] arm_smmu_alloc_pages_exact+0x188/0x1bc
[<ffffff8008664b28>] io_pgtable_alloc_pages_exact+0x30/0xa0
[<ffffff8008664ff8>] __arm_lpae_alloc_pages+0x40/0x1c8
[<ffffff8008665cb4>] __arm_lpae_map+0x224/0x3b4
[<ffffff8008665b98>] __arm_lpae_map+0x108/0x3b4
[<ffffff8008666474>] arm_lpae_map+0x78/0x9c
[<ffffff800866aed4>] arm_smmu_map+0x80/0xdc
[<ffffff800866015c>] iommu_map+0x118/0x284
[<ffffff8008c66294>] cam_smmu_alloc_firmware+0x188/0x3c0
[<ffffff8008cc8afc>] cam_icp_mgr_hw_open+0x88/0x874
[<ffffff8008cca030>] cam_icp_mgr_acquire_hw+0x2d4/0xc9c
[<ffffff8008c5fe84>] cam_context_acquire_dev_to_hw+0xb0/0x26c
[<ffffff8008cd0ce0>] __cam_icp_acquire_dev_in_available+0x1c/0xf0
[<ffffff8008c5ea98>] cam_context_handle_acquire_dev+0x5c/0x1a8
[<ffffff8008c619b4>] cam_node_handle_ioctl+0x30c/0xdc8
[<ffffff8008c62640>] cam_subdev_compat_ioctl+0xe4/0x1dc
[<ffffff8008bcf8bc>] subdev_compat_ioctl32+0x40/0x68
[<ffffff8008bd3858>] v4l2_compat_ioctl32+0x64/0x1780
In order to avoid order-0 page allocation failures during the smmu
map operation, the existing implementation preallocates the required
memory using GFP_KERNEL so as to make sure that there is sufficient
page table memory available and the atomic allocation succeeds during
the map operation. This might not be necessary for every single map
call, as the atomic allocation might succeed most of the time. Hence,
preallocate the necessary memory only when the map operation fails due
to insufficient memory, and then retry the map operation with the
preallocated memory. This solution applies only to map calls made from
a non-atomic context.
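Roughly, the preallocate-on-failure retry described here (and corrected
in the previous commit) looks like the sketch below; do_map_sg_atomic()
and prealloc_pgtable_mem() are hypothetical stand-ins for the driver's
internals, not real arm-smmu functions.

  #include <linux/iommu.h>
  #include <linux/scatterlist.h>

  /* Hypothetical helpers standing in for the driver's internals. */
  int do_map_sg_atomic(struct iommu_domain *domain, unsigned long iova,
                       struct scatterlist *sg, unsigned int nents, int prot,
                       size_t *mapped);
  void prealloc_pgtable_mem(struct iommu_domain *domain,
                            struct scatterlist *sg, unsigned int nents);

  static int map_sg_with_prealloc(struct iommu_domain *domain,
                                  unsigned long iova, struct scatterlist *sg,
                                  unsigned int nents, int prot, size_t *mapped)
  {
          int ret;

          /* Fast path: atomic page-table allocations usually succeed. */
          ret = do_map_sg_atomic(domain, iova, sg, nents, prot, mapped);
          if (ret != -ENOMEM)
                  return ret;

          /* Undo any elements that were mapped before the failure, so the
           * retry doesn't double-map part of the sglist. */
          if (*mapped)
                  iommu_unmap(domain, iova, *mapped);
          *mapped = 0;

          /* Preallocate page-table memory with GFP_KERNEL, then retry. */
          prealloc_pgtable_mem(domain, sg, nents);
          return do_map_sg_atomic(domain, iova, sg, nents, prot, mapped);
  }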
Change-Id: I417f311c2224eb863d6c99612b678bbb2dd3db58
Signed-off-by: Swathi Sridhar <swatsrid@codeaurora.org>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
When memory is leaking, it's going to be harder to allocate more memory,
making it more likely for this failure condition inside of kmemleak to
manifest itself. This is extremely frustrating since kmemleak kills
itself upon the first instance of memory allocation failure.
Bypass that and make kmemleak more resilient when memory is running low.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The memory allocated dynamically here is just used to store a single
instance of a struct. Allocate both possible structs on the stack
instead of allocating them dynamically to improve performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Trying to wait for fences that have already been signaled incurs a high
setup cost, since dynamic memory allocation must be used. Avoiding this
overhead when it isn't needed improves performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A measurably significant amount of CPU time is spent on logging events
for debugging purposes in lpm_cpuidle_enter. Kill the useless logging to
reduce overhead.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A lot of CPU time is wasted on allocating, populating, and copying
debug names back and forth with userspace when they're not actually
needed. We can't simply remove the name buffers from the various sync
data structures, because we must preserve ABI compatibility with
userspace; instead, we can just pretend the name fields of the
user-shared structs aren't there. This massively reduces the size of the
memory allocated for these data structures and the amount of data passed
to and from userspace, and it eliminates a kzalloc() entirely from
sync_file_ioctl_fence_info(), thus improving graphics performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Giving userspace intimate control over CPU latency requirements is
nonsense. Userspace can't even stop itself from being preempted, so
there's no reason for it to have access to a mechanism primarily used to
eliminate CPU delays on the order of microseconds.
Remove userspace's ability to send pm_qos requests so that it can't hurt
power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This allows pm_qos votes of, say, 100 us to select power levels with
exit latencies equal to 100 us. The extra microsecond of exit latency
doesn't hurt.
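The comparison change boils down to something like this (hypothetical
helper, not the actual low-power-mode driver code):

  #include <linux/types.h>

  /* A low-power mode is allowed when its exit latency does not exceed the
   * pm_qos limit, i.e. the check is <= rather than the previous <. */
  static bool lpm_level_allowed(u32 exit_latency_us, u32 qos_limit_us)
  {
          return exit_latency_us <= qos_limit_us;
  }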
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Generating a sync fence name by allocating memory dynamically and using
scnprintf in a hot path results in excessive CPU time wasted on unneeded
debug info. Remove the name generation entirely to cut down CPU waste in
the GPU's rendering hot path.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
POPP constantly attempts to lower the GPU's frequency behind the
governor's back in order to save power; however, the GPU governor in use
(msm-adreno-tz) is very good at determining the GPU's load and selecting
an appropriate frequency to run the GPU at.
POPP was created long ago, perhaps when msm-adreno-tz didn't exist or
didn't work so well, so it is clearly obsolete. Remove it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Waking the GPU upon touch wastes power when the screen is being touched
in a way that does not induce animation or any actual need for GPU usage.
Instead of preemptively waking the GPU on touch input, wake it up upon
receiving an IOCTL_KGSL_GPU_COMMAND ioctl, since it is a sign that the GPU
will soon be needed.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Currently, the kgsl worker thread is erroneously ranked right below
Android's audio threads in terms of priority.
The kgsl worker thread is in the critical path for rendering frames to
the display, so increase its priority to match the priority of the
display commit threads.
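A sketch of the change (not the actual kgsl code; the SCHED_FIFO
priority value here is an assumption chosen to illustrate matching the
display commit threads):

  #include <linux/sched.h>
  #include <uapi/linux/sched/types.h>

  static void boost_worker_priority(struct task_struct *worker)
  {
          struct sched_param param = { .sched_priority = 16 };

          /* Promote the worker to SCHED_FIFO so frame work isn't starved. */
          sched_setscheduler(worker, SCHED_FIFO, &param);
  }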
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
cpuidle was disabled while entering suspend as part of commit
8651f97bd9 in order to work around some
ACPI bugs. However, there's no reason to do this on modern
platforms. Leaving cpuidle enabled can result in improved power
consumption if dpm_resume_noirq runs for a significant time.
Change-Id: Ie182785b176f448698c0264eba554d1e315e8a06
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The synchronize_rcu() in namespace_unlock() is called every time
a filesystem is unmounted. If a great many filesystems are mounted,
this can cause a noticeable slow-down in, for example, system shutdown.
The sequence:
mkdir -p /tmp/Mtest/{0..5000}
time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
time umount /tmp/Mtest/*
on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
100 seconds to unmount them.
Boot the same VM with 1 CPU and it takes 18 seconds to mount the
tmpfs filesystems, but only 36 to unmount.
If we change the synchronize_rcu() to synchronize_rcu_expedited()
the umount time on a 4-cpu VM drops to 0.6 seconds.
I think this 200-fold speed up is worth the slightly high system
impact of using synchronize_rcu_expedited().
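The change itself is a one-liner in namespace_unlock(), sketched below
with the surrounding code elided for brevity:

  static void namespace_unlock(void)
  {
          /* ... collect the mounts being detached ... */

          /* Expedited grace period: umount latency in the benchmark above
           * drops from ~100s to ~0.6s on the 4-cpu VM. */
          synchronize_rcu_expedited();    /* was: synchronize_rcu() */

          /* ... free the detached mounts ... */
  }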
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The page allocator wakes all kswapds in an allocation context's allowed
nodemask in the slow path, so it doesn't make sense to have the kswapd-
waiter count per NUMA node. Instead, it should be a global counter
to stop all kswapds when there are no failed allocation requests.
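Sketched, the bookkeeping becomes a single global counter (hypothetical
names, not the exact mm/ change):

  #include <linux/atomic.h>

  static atomic_long_t kswapd_waiters = ATOMIC_LONG_INIT(0);

  /* Failed allocations register themselves in the slow path... */
  static void kswapd_waiter_inc(void)
  {
          atomic_long_inc(&kswapd_waiters);
  }

  /* ...and unregister once their allocation finally succeeds or gives up. */
  static void kswapd_waiter_dec(void)
  {
          atomic_long_dec(&kswapd_waiters);
  }

  /* Any kswapd, on any node, can stop as soon as nobody is waiting. */
  static bool kswapd_work_pending(void)
  {
          return atomic_long_read(&kswapd_waiters) != 0;
  }

A later commit in this series, which stops kswapd early when no failed
allocations remain, presumably keys off the same kind of counter.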
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
PAGE_ALLOC_COSTLY_ORDER allocations can cause vmpressure to incorrectly
think that memory pressure is high, when it's really just that the
allocation's high order is difficult to satisfy. When this rare scenario
occurs, ignore the input to vmpressure to avoid sending out a spurious
high-pressure signal.
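The guard amounts to something like this (a sketch, not the exact
mm/vmpressure.c code):

  #include <linux/mmzone.h>

  /* Failures above PAGE_ALLOC_COSTLY_ORDER reflect fragmentation, not a
   * genuine shortage of memory, so they shouldn't feed pressure stats. */
  static bool vmpressure_should_account(unsigned int order)
  {
          return order <= PAGE_ALLOC_COSTLY_ORDER;
  }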
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
It can be normal for a dying process to have its page allocation request
fail when it has an OOM or LMK kill pending. In this case, it's actually
detrimental to print out a massive allocation failure message because
this means the running process needs to die quickly and release its
memory, which is slowed down slightly by the massive kmsg splat. The
allocation failure message is also a false positive in this case, since
the failure is intentional rather than being the result of an inability
to allocate memory.
Suppress the allocation failure warning for processes that are killed to
release memory in order to expedite their death and remedy the kmsg
confusion from seeing spurious allocation failure messages.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Caching the window size can result in delayed or inaccurate pressure
reports. Since calculating a fresh window size is cheap, do so all the
time instead of relying on a stale, cached value.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
When no pages are scanned, it usually means no zones were reclaimable
and nothing could be done. In this case, the reported pressure should be
100 to elicit help from any listeners. This fixes the vmpressure
framework not working when memory pressure is very high.
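Conceptually, with a simplified stand-in for the real pressure
calculation (not the exact vmpressure math):

  static unsigned long calc_pressure(unsigned long scanned,
                                     unsigned long reclaimed)
  {
          if (!scanned)
                  return 100;     /* nothing reclaimable: maximum pressure */

          if (reclaimed >= scanned)
                  return 0;       /* reclaim kept up: no pressure */

          return 100 - (100 * reclaimed / scanned);
  }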
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Although userspace processes can't directly help with kernel memory
pressure, killing userspace processes can relieve kernel memory if they
are responsible for that pressure in the first place. It doesn't make
sense to exclude any allocation types knowing that userspace can indeed
affect all memory pressure, so don't exclude any allocation types from
the pressure calculations.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Keeping kswapd running when all the failed allocations that invoked it
are satisfied incurs a high overhead due to unnecessary page eviction
and writeback, as well as spurious VM pressure events to various
registered shrinkers. When kswapd doesn't need to work to make an
allocation succeed anymore, stop it prematurely to save resources.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
These are causing parts of techpack/audio to get rebuilt on every build
for no reason.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Using a per-cpu thread pool, we can reduce the scheduling latency
compared to the workqueue implementation. With this patch, scheduling
latency and its variation are reduced, as the per-cpu threads are
high-priority kthread_workers.
The results were evaluated on arm64 Android devices running a 5.10
kernel. The table below shows the resulting improvements in total
scheduling latency for the same app launch benchmark runs with 50
iterations. Scheduling latency is the latency between when the task
(workqueue kworker vs. kthread_worker) became eligible to run and when
it actually started running.
+-------------------------+-----------+----------------+---------+
| | workqueue | kthread_worker | diff |
+-------------------------+-----------+----------------+---------+
| Average (us) | 15253 | 2914 | -80.89% |
| Median (us) | 14001 | 2912 | -79.20% |
| Minimum (us) | 3117 | 1027 | -67.05% |
| Maximum (us) | 30170 | 3805 | -87.39% |
| Standard deviation (us) | 7166 | 359 | |
+-------------------------+-----------+----------------+---------+
Background: Boot times and cold app launch benchmarks are very
important to the Android ecosystem as they directly translate to
responsiveness from the user's point of view. While erofs provides a
lot of important features like space savings, we saw some performance
penalty in cold app launch benchmarks in a few scenarios. Analysis
showed that the significant variance was coming from the scheduling
cost, while the decompression cost was more or less the same. With a
per-cpu thread pool, we can see from the table above that this
variation is reduced by ~80% on average. This problem was discussed at
LPC 2022; the slides and talk are linked at [1].
[1] https://lpc.events/event/16/contributions/1338/
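A rough sketch of the mechanism (not the exact erofs patch; error
handling and teardown are omitted, and the priority policy and names
are assumptions):

  #include <linux/cpumask.h>
  #include <linux/err.h>
  #include <linux/kthread.h>
  #include <linux/percpu.h>
  #include <linux/sched.h>

  static struct kthread_worker * __percpu *z_workers;

  static int init_percpu_workers(void)
  {
          int cpu;

          z_workers = alloc_percpu(struct kthread_worker *);
          if (!z_workers)
                  return -ENOMEM;

          for_each_online_cpu(cpu) {
                  struct kthread_worker *w;

                  w = kthread_create_worker_on_cpu(cpu, 0, "z_worker/%u", cpu);
                  if (IS_ERR(w))
                          continue;
                  /* High-priority worker so decompression isn't delayed
                   * behind ordinary CFS tasks. */
                  sched_set_fifo_low(w->task);
                  *per_cpu_ptr(z_workers, cpu) = w;
          }
          return 0;
  }

Decompression work would then be queued with kthread_queue_work() on
the local CPU's worker instead of going through the system workqueue.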
Link: https://lore.kernel.org/lkml/Y+DP6V9fZG7XPPGy@debian/
Change-Id: I454da5bc17f285d99047b93dc1fc70444f287156
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
- This fixes the below warnings:
In file included from ../fs/f2fs/super.c:28:
../include/linux/lz4.h:221:12: warning: 'LZ4_compress_fast' declared 'static' but never defined [-Wunused-function]
221 | static int LZ4_compress_fast(const char *source, char *dest, int inputSize,
| ^~~~~~~~~~~~~~~~~
../include/linux/lz4.h:245:12: warning: 'LZ4_compress_destSize' declared 'static' but never defined [-Wunused-function]
245 | static int LZ4_compress_destSize(const char *source, char *dest, int *sourceSizePtr,
| ^~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:361:13: warning: 'LZ4_resetStreamHC' declared 'static' but never defined [-Wunused-function]
361 | static void LZ4_resetStreamHC(LZ4_streamHC_t *streamHCPtr, int compressionLevel);
| ^~~~~~~~~~~~~~~~~
../include/linux/lz4.h:376:17: warning: 'LZ4_loadDictHC' declared 'static' but never defined [-Wunused-function]
376 | static int LZ4_loadDictHC(LZ4_streamHC_t *streamHCPtr, const char *dictionary,
| ^~~~~~~~~~~~~~
../include/linux/lz4.h:415:12: warning: 'LZ4_compress_HC_continue' declared 'static' but never defined [-Wunused-function]
415 | static int LZ4_compress_HC_continue(LZ4_streamHC_t *streamHCPtr, const char *src,
| ^~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:434:12: warning: 'LZ4_saveDictHC' declared 'static' but never defined [-Wunused-function]
434 | static int LZ4_saveDictHC(LZ4_streamHC_t *streamHCPtr, char *safeBuffer,
| ^~~~~~~~~~~~~~
../include/linux/lz4.h:450:29: warning: 'LZ4_resetStream' declared 'static' but never defined [-Wunused-function]
450 | static __always_inline void LZ4_resetStream(LZ4_stream_t *LZ4_stream);
| ^~~~~~~~~~~~~~~
../include/linux/lz4.h:507:12: warning: 'LZ4_compress_fast_continue' declared 'static' but never defined [-Wunused-function]
507 | static int LZ4_compress_fast_continue(LZ4_stream_t *streamPtr, const char *src,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:521:12: warning: 'LZ4_setStreamDecode' declared 'static' but never defined [-Wunused-function]
521 | static int LZ4_setStreamDecode(LZ4_streamDecode_t *LZ4_streamDecode,
| ^~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:560:12: warning: 'LZ4_decompress_safe_continue' declared 'static' but never defined [-Wunused-function]
560 | static int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:599:12: warning: 'LZ4_decompress_fast_continue' declared 'static' but never defined [-Wunused-function]
599 | static int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:622:12: warning: 'LZ4_decompress_safe_usingDict' declared 'static' but never defined [-Wunused-function]
622 | static int LZ4_decompress_safe_usingDict(const char *source, char *dest,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:645:12: warning: 'LZ4_decompress_fast_usingDict' declared 'static' but never defined [-Wunused-function]
645 | static int LZ4_decompress_fast_usingDict(const char *source, char *dest,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
- 64KB seems to not behave well under high memory pressure, hence let's reduce it to 16KB, which is the default.
Suggested-by: vantoman <mustafa.vantom@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The big pcluster feature has been merged for a year and has been mostly
stable now.
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220407050505.12683-1-huyue2@coolpad.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Whilst we currently provide smp_cond_load_acquire() and
atomic_cond_read_acquire(), there are cases where the ACQUIRE semantics are
not required because of a subsequent fence or release operation once the
conditional loop has exited.
This patch adds relaxed versions of the conditional spinning primitives
to avoid unnecessary barrier overhead on architectures such as arm64.
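A hypothetical usage sketch, assuming the caller's ordering comes from
a later barrier rather than from the conditional load itself:

  #include <asm/barrier.h>

  static int wait_for_nonzero(int *flag)
  {
          int val;

          /* Spin until *flag != 0; VAL is the macro's name for the loaded
           * value. No ACQUIRE barrier is implied on loop exit. */
          val = smp_cond_load_relaxed(flag, VAL != 0);

          /* A subsequent fence provides whatever ordering the caller needs. */
          smp_rmb();
          return val;
  }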
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>