These are blocking some CPUs in the LITTLE cluster from entering deep
idle because the driver assumes that display rendering work occurs on a
hardcoded set of CPUs, which is false. The scope of this is also quite
large, which increases power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Combined with LTO, this yields a consistent 5% boost to procfs I/O
performance right off the bat (as measured with callbench). The spin
lock functions constitute some of the hottest code paths in the kernel;
inlining them to improve performance makes sense.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's plenty of room on the stack for a few more inlined bytes here
and there. The measured stack usage at runtime is still safe without
this, and performance is surely improved at a microscopic level, so
remove it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A measurably significant amount of CPU time is spent in these routines
while the camera is open. These are also responsible for a grotesque
amount of dmesg spam, so let's nuke them.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This call to smp_processor_id() forces gic_raise_softirq() to require
being called while preemption is disabled, which isn't an actual
requirement. When called without preemption disabled, smp_processor_id()
is thus used incorrectly and generates a warning splat with the relevant
kernel debug options enabled.
Get rid of the useless pr_devel message outright to fix the incorrect
smp_processor_id() usage.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
In order to prevent redundant entry creation by racing against itself,
mb_cache_entry_create scans through a large hash-list of all current
entries in order to see if another allocation for the requested new
entry has been made. Furthermore, it allocates memory for a new entry
before scanning through this hash-list, which results in that allocated
memory being discarded when the requested new entry is already present.
This happens more than half the time.
Speed up cache entry creation by keeping a small linked list of
requested new entries in progress, and scanning through that first
instead of the large hash-list. Additionally, don't bother allocating
memory for a new entry until it's known that the allocated memory will
be used.
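The idea can be sketched roughly as follows (hypothetical names, not the
actual mbcache change): racing creators publish their key on a short
"in progress" list first, so duplicates are caught before any memory
allocation or hash-list walk.

  #include <linux/list.h>
  #include <linux/spinlock.h>
  #include <linux/types.h>

  struct busy_entry {
          struct list_head node;
          u32 key;
  };

  static LIST_HEAD(busy_list);
  static DEFINE_SPINLOCK(busy_lock);

  /* Returns true if another task is already creating an entry for @key,
   * in which case the caller skips both the hash scan and the allocation. */
  static bool mark_creation_in_progress(struct busy_entry *e, u32 key)
  {
          struct busy_entry *iter;

          spin_lock(&busy_lock);
          list_for_each_entry(iter, &busy_list, node) {
                  if (iter->key == key) {
                          spin_unlock(&busy_lock);
                          return true;
                  }
          }
          e->key = key;
          list_add(&e->node, &busy_list);
          spin_unlock(&busy_lock);
          return false;
  }

In this sketch the entry would remove itself from the short list once it
has been inserted into the hash table, so memory only ever gets allocated
for entries that are actually used.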
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
For the vast majority of mmio operations in this driver, explicit memory
barriers aren't needed either because a data dependency between a read
and write already exists, or because of the presence of the spin locks
which execute a full memory barrier.
Removing all the unneeded explicit barriers considerably reduces
overhead for pinctrl operations, which in turn benefits things like i2c.
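As a rough illustration of the pattern (not the driver's actual code;
the struct and register layout are made up), the _relaxed accessors can
be used inside a section that the spinlock already orders:

  #include <linux/io.h>
  #include <linux/spinlock.h>

  struct fake_pinctrl {
          void __iomem *base;
          spinlock_t lock;
  };

  static void fake_pinctrl_set_bit(struct fake_pinctrl *pctrl, u32 reg, u32 bit)
  {
          unsigned long flags;
          u32 val;

          spin_lock_irqsave(&pctrl->lock, flags);
          /* readl()/writel() barriers are unnecessary here; the lock and the
           * read-modify-write data dependency already order the accesses. */
          val = readl_relaxed(pctrl->base + reg);
          writel_relaxed(val | bit, pctrl->base + reg);
          spin_unlock_irqrestore(&pctrl->lock, flags);
  }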
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's no reason to hold an RCU read lock the entire time while
optimistically spinning for a rwsem. This can needlessly lengthen RCU
grace periods and slow down synchronize_rcu() when it doesn't brute
force the RCU grace period via rcupdate.rcu_expedited=1.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's no reason to hold an RCU read lock the entire time while
optimistically spinning for a mutex lock. This can needlessly lengthen
RCU grace periods and slow down synchronize_rcu() when it doesn't brute
force the RCU grace period via rcupdate.rcu_expedited=1.
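A sketch of the pattern that this and the preceding rwsem change
describe (not the actual locking code): hold the RCU read lock only
around each owner check rather than across the entire spin loop.

  #include <linux/rcupdate.h>
  #include <linux/sched.h>

  /* One spin iteration: the RCU read-side critical section covers only the
   * owner dereference, so a long spin no longer extends RCU grace periods. */
  static bool owner_still_running(struct task_struct * const *owner_p,
                                  struct task_struct *owner)
  {
          bool running;

          rcu_read_lock();
          running = READ_ONCE(*owner_p) == owner && owner->on_cpu;
          rcu_read_unlock();

          return running;
  }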
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
It isn't guaranteed a CPU will idle upon calling lpm_cpuidle_enter(),
since it could abort early at the need_resched() check. In this case,
it's possible for an IPI to be sent to this "idle" CPU needlessly, thus
wasting power. For the same reason, it's also wasteful to keep a CPU
marked idle even after it's woken up.
Shrink the window in which CPUs are marked idle to be as small as
possible in order to improve power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The pm_qos callback currently suffers from a number of pitfalls: it
sends IPIs to CPUs that may not be idle, waits for those IPIs to finish
propagating while preemption is disabled (resulting in a long busy wait
for the pm_qos_update_target() caller), and needlessly calls a no-op
function when the IPIs are processed.
Optimize the pm_qos notifier by only sending IPIs to CPUs that are
idle, and by using arch_send_wakeup_ipi_mask() instead of
smp_call_function_many(). Using IPI_WAKEUP instead of IPI_CALL_FUNC,
which is what smp_call_function_many() uses behind the scenes, has the
benefit of doing zero work upon receipt of the IPI; IPI_WAKEUP is
designed purely for sending an IPI without a payload, whereas
IPI_CALL_FUNC does unwanted extra work just to run the empty
smp_callback() function.
Determining which CPUs are idle is done efficiently with an atomic
bitmask instead of using the wake_up_if_idle() API, which checks the
CPU's runqueue in an RCU read-side critical section and under a spin
lock; that is not very efficient compared to a simple atomic bitwise
operation. A cpumask isn't needed for this because NR_CPUS is
guaranteed to fit within a word.
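A simplified sketch of the scheme (hypothetical names; it assumes
arch_send_wakeup_ipi_mask() is exposed for this use, as described
above, and that NR_CPUS fits in one long):

  #include <linux/bitops.h>
  #include <linux/compiler.h>
  #include <linux/cpumask.h>

  /* One bit per CPU, updated locklessly with atomic bitops. */
  static unsigned long cpus_in_idle;

  static void report_cpu_idle(int cpu, bool idle)
  {
          if (idle)
                  set_bit(cpu, &cpus_in_idle);
          else
                  clear_bit(cpu, &cpus_in_idle);
  }

  /* pm_qos notifier path: IPI only the CPUs currently marked idle. The
   * bitmask word doubles as the cpumask passed to the wakeup IPI. */
  static void wake_idle_cpus(void)
  {
          if (READ_ONCE(cpus_in_idle))
                  arch_send_wakeup_ipi_mask(to_cpumask(&cpus_in_idle));
  }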
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
An empty IPI is useful for cpuidle to wake sleeping CPUs without causing
them to do unnecessary work upon receipt of the IPI. IPI_WAKEUP fills
this use-case nicely, so let it be used outside of the ACPI parking
protocol.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
None of the pm_qos functions actually run in interrupt context; if some
driver calls pm_qos_update_target in interrupt context then it's already
broken. There's no need to disable interrupts while holding pm_qos_lock,
so don't do it.
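In other words (hypothetical helper, not the real pm_qos code), the
critical section can use a plain spin_lock() because no caller runs in
interrupt context:

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(example_qos_lock);
  static int example_qos_target;

  static void example_qos_update_target(int value)
  {
          /* Was spin_lock_irqsave(); interrupts can stay enabled here. */
          spin_lock(&example_qos_lock);
          example_qos_target = value;
          spin_unlock(&example_qos_lock);
  }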
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit 1e5a5b5e00e9706cd48e3c87de1607fcaa5214d2.
This doesn't make sense for a few reasons. Firstly, upstream uses this
mutex code and it works fine on all arches; why should arm be any
different?
Secondly, once the mutex owner starts to spin on `wait_lock`,
preemption is disabled and the owner will be in an actively-running
state. The optimistic mutex spinning occurs when the lock owner is
actively running on a CPU, and while the optimistic spinning takes
place, no attempt to acquire `wait_lock` is made by the new waiter.
Therefore, it is guaranteed that new mutex waiters which optimistically
spin will not contend the `wait_lock` spin lock that the owner needs to
acquire in order to make forward progress.
Another potential source of `wait_lock` contention can come from tasks
that call mutex_trylock(), but this isn't actually problematic (and if
it were, it would affect the MUTEX_SPIN_ON_OWNER=n use-case too). This
won't introduce significant contention on `wait_lock` because the
trylock code exits before attempting to lock `wait_lock`, specifically
when the atomic mutex counter indicates that the mutex is already
locked. So in reality, the amount of `wait_lock` contention that can
come from mutex_trylock() amounts to only one task. And once it
finishes, `wait_lock` will no longer be contended and the previous
mutex owner can proceed with cleanup.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit 0db49c2550a09458db188fb7312c66783c5af104.
This results in kmalloc() abuse to find a large number of contiguous
pages, which thrashes the page allocator and hurts overall performance.
I couldn't reproduce the improved MTP throughput that this commit
claimed either, so just revert it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit a9a60c58e0fa21c41ac284282949187b13bdd756.
This results in kmalloc() abuse to find a large number of contiguous
pages, which thrashes the page allocator and hurts overall performance.
I couldn't reproduce the improved MTP throughput that this commit
claimed either, so just revert it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The scope of this driver's lock usage is extremely wide, leading to
excessively long lock hold times. Additionally, there is lots of
excessive linked-list traversal and unnecessary dynamic memory
allocation in a critical path, causing poor performance across the
board.
Fix all of this by greatly reducing the scope of the locks used and by
significantly reducing the amount of operations performed when
msm_dma_map_sg_attrs() is called. The entire driver's code is overhauled
for better cleanliness and performance.
Note that ION must be modified to pass a known structure via the private
dma_buf pointer, so that the IOMMU driver can prevent races when
operating on the same buffer concurrently. This is the only way to
eliminate said buffer races without hurting the IOMMU driver's
performance.
Some additional members are added to the device struct as well to make
these various performance improvements possible.
This also removes the manual cache maintenance since ION already handles
it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
commit b312b4f0e2f9 ("iommu: arm-smmu: Preallocate memory for map
only on failure") had the following two errors:
1. The return code checked when map_sg fails and we preallocate
   is wrong. The check should be for 0 and not -ENOMEM, so the
   preallocation never happens when map_sg fails.
2. map_sg could have mapped some elements of the sglist before
   failing. With the proper check, we would call map_sg again for
   the same size, which would lead to a double map of the elements
   that were already mapped.
Fix this by returning the actual ret code from arm_lpae_map_sg()
and checking it against -ENOMEM to decide whether to preallocate.
Also, unmap any partial iovas that were mapped previously.
Change-Id: Ifee7c0bed6b9cf1c35ebb4a03d51a1a80ab0ed58
Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
page allocation failure: order:0, mode:0x2088020(GFP_ATOMIC|__GFP_ZERO)
Call trace:
[<ffffff80080f15c8>] dump_backtrace+0x0/0x248
[<ffffff80080f1894>] show_stack+0x18/0x28
[<ffffff8008484984>] dump_stack+0x98/0xc0
[<ffffff8008231b0c>] warn_alloc+0x114/0x134
[<ffffff8008231f7c>] __alloc_pages_nodemask+0x3e8/0xd30
[<ffffff8008232b2c>] alloc_pages_exact+0x4c/0xa4
[<ffffff800866bec4>] arm_smmu_alloc_pages_exact+0x188/0x1bc
[<ffffff8008664b28>] io_pgtable_alloc_pages_exact+0x30/0xa0
[<ffffff8008664ff8>] __arm_lpae_alloc_pages+0x40/0x1c8
[<ffffff8008665cb4>] __arm_lpae_map+0x224/0x3b4
[<ffffff8008665b98>] __arm_lpae_map+0x108/0x3b4
[<ffffff8008666474>] arm_lpae_map+0x78/0x9c
[<ffffff800866aed4>] arm_smmu_map+0x80/0xdc
[<ffffff800866015c>] iommu_map+0x118/0x284
[<ffffff8008c66294>] cam_smmu_alloc_firmware+0x188/0x3c0
[<ffffff8008cc8afc>] cam_icp_mgr_hw_open+0x88/0x874
[<ffffff8008cca030>] cam_icp_mgr_acquire_hw+0x2d4/0xc9c
[<ffffff8008c5fe84>] cam_context_acquire_dev_to_hw+0xb0/0x26c
[<ffffff8008cd0ce0>] __cam_icp_acquire_dev_in_available+0x1c/0xf0
[<ffffff8008c5ea98>] cam_context_handle_acquire_dev+0x5c/0x1a8
[<ffffff8008c619b4>] cam_node_handle_ioctl+0x30c/0xdc8
[<ffffff8008c62640>] cam_subdev_compat_ioctl+0xe4/0x1dc
[<ffffff8008bcf8bc>] subdev_compat_ioctl32+0x40/0x68
[<ffffff8008bd3858>] v4l2_compat_ioctl32+0x64/0x1780
In order to avoid order-0 page allocation failures during the smmu
map operation, the existing implementation preallocates the required
memory using GFP_KERNEL so as to make sure that there is sufficient
page table memory available and the atomic allocation succeeds during
the map operation. This might not be necessary for every single map
call, as the atomic allocation might succeed most of the time. Hence,
preallocate the necessary memory only when the map operation fails due
to insufficient memory, and then retry the map operation with the
preallocated memory. This solution applies only to map calls made from
a non-atomic context.
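Roughly, the preallocate-on-failure retry described here (and corrected
in the previous commit) looks like the sketch below; do_map_sg_atomic()
and prealloc_pgtable_mem() are hypothetical stand-ins for the driver's
internals, not real arm-smmu functions.

  #include <linux/iommu.h>
  #include <linux/scatterlist.h>

  /* Hypothetical helpers standing in for the driver's internals. */
  int do_map_sg_atomic(struct iommu_domain *domain, unsigned long iova,
                       struct scatterlist *sg, unsigned int nents, int prot,
                       size_t *mapped);
  void prealloc_pgtable_mem(struct iommu_domain *domain,
                            struct scatterlist *sg, unsigned int nents);

  static int map_sg_with_prealloc(struct iommu_domain *domain,
                                  unsigned long iova, struct scatterlist *sg,
                                  unsigned int nents, int prot, size_t *mapped)
  {
          int ret;

          /* Fast path: atomic page-table allocations usually succeed. */
          ret = do_map_sg_atomic(domain, iova, sg, nents, prot, mapped);
          if (ret != -ENOMEM)
                  return ret;

          /* Undo any elements that were mapped before the failure, so the
           * retry doesn't double-map part of the sglist. */
          if (*mapped)
                  iommu_unmap(domain, iova, *mapped);
          *mapped = 0;

          /* Preallocate page-table memory with GFP_KERNEL, then retry. */
          prealloc_pgtable_mem(domain, sg, nents);
          return do_map_sg_atomic(domain, iova, sg, nents, prot, mapped);
  }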
Change-Id: I417f311c2224eb863d6c99612b678bbb2dd3db58
Signed-off-by: Swathi Sridhar <swatsrid@codeaurora.org>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
When memory is leaking, it's going to be harder to allocate more memory,
making it more likely for this failure condition inside of kmemleak to
manifest itself. This is extremely frustrating since kmemleak kills
itself upon the first instance of memory allocation failure.
Bypass that and make kmemleak more resilient when memory is running low.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The memory allocated dynamically here is just used to store a single
instance of a struct. Allocate both possible structs on the stack
instead of allocating them dynamically to improve performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Trying to wait for fences that have already been signaled incurs a high
setup cost, since dynamic memory allocation must be used. Avoiding this
overhead when it isn't needed improves performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A measurably significant amount of CPU time is spent on logging events
for debugging purposes in lpm_cpuidle_enter. Kill the useless logging to
reduce overhead.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A lot of CPU time is wasted on allocating, populating, and copying
debug names back and forth with userspace when they're not actually
needed. We can't simply remove the name buffers from the various sync
data structures, because we must preserve ABI compatibility with
userspace; instead, we can just pretend the name fields of the
user-shared structs aren't there. This massively reduces the size of the
memory allocated for these data structures and the amount of data passed
to and from userspace, and it eliminates a kzalloc() entirely from
sync_file_ioctl_fence_info(), thus improving graphics performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Giving userspace intimate control over CPU latency requirements is
nonsense. Userspace can't even stop itself from being preempted, so
there's no reason for it to have access to a mechanism primarily used to
eliminate CPU delays on the order of microseconds.
Remove userspace's ability to send pm_qos requests so that it can't hurt
power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This allows pm_qos votes of, say, 100 us to select power levels with
exit latencies equal to 100 us. The extra microsecond of exit latency
doesn't hurt.
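The comparison change boils down to something like this (hypothetical
helper, not the actual low-power-mode driver code):

  #include <linux/types.h>

  /* A low-power mode is allowed when its exit latency does not exceed the
   * pm_qos limit, i.e. the check is <= rather than the previous <. */
  static bool lpm_level_allowed(u32 exit_latency_us, u32 qos_limit_us)
  {
          return exit_latency_us <= qos_limit_us;
  }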
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Generating a sync fence name by allocating memory dynamically and using
scnprintf in a hot path results in excessive CPU time wasted on unneeded
debug info. Remove the name generation entirely to cut down CPU waste in
the GPU's rendering hot path.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
POPP constantly attempts to lower the GPU's frequency behind the
governor's back in order to save power; however, the GPU governor in use
(msm-adreno-tz) is very good at determining the GPU's load and selecting
an appropriate frequency to run the GPU at.
POPP was created long ago, perhaps when msm-adreno-tz didn't exist or
didn't work so well, so it is clearly obsolete. Remove it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Waking the GPU upon touch wastes power when the screen is being touched
in a way that does not induce animation or any actual need for GPU usage.
Instead of preemptively waking the GPU on touch input, wake it up upon
receiving an IOCTL_KGSL_GPU_COMMAND ioctl, since it is a sign that the GPU
will soon be needed.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Currently, the kgsl worker thread is erroneously ranked right below
Android's audio threads in terms of priority.
The kgsl worker thread is in the critical path for rendering frames to
the display, so increase its priority to match the priority of the
display commit threads.
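A sketch of the change (not the actual kgsl code; the SCHED_FIFO
priority value here is an assumption chosen to illustrate matching the
display commit threads):

  #include <linux/sched.h>
  #include <uapi/linux/sched/types.h>

  static void boost_worker_priority(struct task_struct *worker)
  {
          struct sched_param param = { .sched_priority = 16 };

          /* Promote the worker to SCHED_FIFO so frame work isn't starved. */
          sched_setscheduler(worker, SCHED_FIFO, &param);
  }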
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
cpuidle was disabled while entering suspend as part of commit
8651f97bd9 in order to work around some
ACPI bugs. However, there's no reason to do this on modern
platforms. Leaving cpuidle enabled can result in improved power
consumption if dpm_resume_noirq runs for a significant time.
Change-Id: Ie182785b176f448698c0264eba554d1e315e8a06
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The synchronize_rcu() in namespace_unlock() is called every time
a filesystem is unmounted. If a great many filesystems are mounted,
this can cause a noticeable slow-down in, for example, system shutdown.
The sequence:
mkdir -p /tmp/Mtest/{0..5000}
time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
time umount /tmp/Mtest/*
on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
100 seconds to unmount them.
Boot the same VM with 1 CPU and it takes 18 seconds to mount the
tmpfs filesystems, but only 36 to unmount.
If we change the synchronize_rcu() to synchronize_rcu_expedited()
the umount time on a 4-cpu VM drops to 0.6 seconds.
I think this 200-fold speed up is worth the slightly high system
impact of using synchronize_rcu_expedited().
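The change itself is a one-liner in namespace_unlock(), sketched below
with the surrounding code elided for brevity:

  static void namespace_unlock(void)
  {
          /* ... collect the mounts being detached ... */

          /* Expedited grace period: umount latency in the benchmark above
           * drops from ~100s to ~0.6s on the 4-cpu VM. */
          synchronize_rcu_expedited();    /* was: synchronize_rcu() */

          /* ... free the detached mounts ... */
  }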
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The page allocator wakes all kswapds in an allocation context's allowed
nodemask in the slow path, so it doesn't make sense to have the kswapd-
waiter count per NUMA node. Instead, it should be a global counter
to stop all kswapds when there are no failed allocation requests.
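Sketched, the bookkeeping becomes a single global counter (hypothetical
names, not the exact mm/ change):

  #include <linux/atomic.h>

  static atomic_long_t kswapd_waiters = ATOMIC_LONG_INIT(0);

  /* Failed allocations register themselves in the slow path... */
  static void kswapd_waiter_inc(void)
  {
          atomic_long_inc(&kswapd_waiters);
  }

  /* ...and unregister once their allocation finally succeeds or gives up. */
  static void kswapd_waiter_dec(void)
  {
          atomic_long_dec(&kswapd_waiters);
  }

  /* Any kswapd, on any node, can stop as soon as nobody is waiting. */
  static bool kswapd_work_pending(void)
  {
          return atomic_long_read(&kswapd_waiters) != 0;
  }

A later commit in this series, which stops kswapd early when no failed
allocations remain, presumably keys off the same kind of counter.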
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
PAGE_ALLOC_COSTLY_ORDER allocations can cause vmpressure to incorrectly
think that memory pressure is high, when it's really just that the
allocation's high order is difficult to satisfy. When this rare scenario
occurs, ignore the input to vmpressure to avoid sending out a spurious
high-pressure signal.
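The guard amounts to something like this (a sketch, not the exact
mm/vmpressure.c code):

  #include <linux/mmzone.h>

  /* Failures above PAGE_ALLOC_COSTLY_ORDER reflect fragmentation, not a
   * genuine shortage of memory, so they shouldn't feed pressure stats. */
  static bool vmpressure_should_account(unsigned int order)
  {
          return order <= PAGE_ALLOC_COSTLY_ORDER;
  }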
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
It can be normal for a dying process to have its page allocation request
fail when it has an OOM or LMK kill pending. In this case, it's actually
detrimental to print out a massive allocation failure message because
this means the running process needs to die quickly and release its
memory, which is slowed down slightly by the massive kmsg splat. The
allocation failure message is also a false positive in this case, since
the failure is intentional rather than being the result of an inability
to allocate memory.
Suppress the allocation failure warning for processes that are killed to
release memory in order to expedite their death and remedy the kmsg
confusion from seeing spurious allocation failure messages.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Caching the window size can result in delayed or inaccurate pressure
reports. Since calculating a fresh window size is cheap, do so all the
time instead of relying on a stale, cached value.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
When no pages are scanned, it usually means no zones were reclaimable
and nothing could be done. In this case, the reported pressure should be
100 to elicit help from any listeners. This fixes the vmpressure
framework not working when memory pressure is very high.
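Conceptually, with a simplified stand-in for the real pressure
calculation (not the exact vmpressure math):

  static unsigned long calc_pressure(unsigned long scanned,
                                     unsigned long reclaimed)
  {
          if (!scanned)
                  return 100;     /* nothing reclaimable: maximum pressure */

          if (reclaimed >= scanned)
                  return 0;       /* reclaim kept up: no pressure */

          return 100 - (100 * reclaimed / scanned);
  }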
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Although userspace processes can't directly help with kernel memory
pressure, killing userspace processes can relieve kernel memory if they
are responsible for that pressure in the first place. It doesn't make
sense to exclude any allocation types knowing that userspace can indeed
affect all memory pressure, so don't exclude any allocation types from
the pressure calculations.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Keeping kswapd running when all the failed allocations that invoked it
are satisfied incurs a high overhead due to unnecessary page eviction
and writeback, as well as spurious VM pressure events to various
registered shrinkers. When kswapd doesn't need to work to make an
allocation succeed anymore, stop it prematurely to save resources.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
These are causing parts of techpack/audio to get rebuilt on every build
for no reason.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Using a per-cpu thread pool, we can reduce the scheduling latency
compared to the workqueue implementation. With this patch, scheduling
latency and its variation are reduced, as the per-cpu threads are
high-priority kthread_workers.
The results were evaluated on arm64 Android devices running a 5.10
kernel. The table below shows the resulting improvements in total
scheduling latency for the same app launch benchmark runs with 50
iterations. Scheduling latency is the latency between when the task
(workqueue kworker vs. kthread_worker) became eligible to run and when
it actually started running.
+-------------------------+-----------+----------------+---------+
| | workqueue | kthread_worker | diff |
+-------------------------+-----------+----------------+---------+
| Average (us) | 15253 | 2914 | -80.89% |
| Median (us) | 14001 | 2912 | -79.20% |
| Minimum (us) | 3117 | 1027 | -67.05% |
| Maximum (us) | 30170 | 3805 | -87.39% |
| Standard deviation (us) | 7166 | 359 | |
+-------------------------+-----------+----------------+---------+
Background: Boot times and cold app launch benchmarks are very
important to the Android ecosystem as they directly translate to
responsiveness from the user's point of view. While erofs provides a
lot of important features like space savings, we saw some performance
penalty in cold app launch benchmarks in a few scenarios. Analysis
showed that the significant variance was coming from the scheduling
cost, while the decompression cost was more or less the same. With a
per-cpu thread pool, we can see from the table above that this
variation is reduced by ~80% on average. This problem was discussed at
LPC 2022; the slides and talk are linked at [1].
[1] https://lpc.events/event/16/contributions/1338/
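A rough sketch of the mechanism (not the exact erofs patch; error
handling and teardown are omitted, and the priority policy and names
are assumptions):

  #include <linux/cpumask.h>
  #include <linux/err.h>
  #include <linux/kthread.h>
  #include <linux/percpu.h>
  #include <linux/sched.h>

  static struct kthread_worker * __percpu *z_workers;

  static int init_percpu_workers(void)
  {
          int cpu;

          z_workers = alloc_percpu(struct kthread_worker *);
          if (!z_workers)
                  return -ENOMEM;

          for_each_online_cpu(cpu) {
                  struct kthread_worker *w;

                  w = kthread_create_worker_on_cpu(cpu, 0, "z_worker/%u", cpu);
                  if (IS_ERR(w))
                          continue;
                  /* High-priority worker so decompression isn't delayed
                   * behind ordinary CFS tasks. */
                  sched_set_fifo_low(w->task);
                  *per_cpu_ptr(z_workers, cpu) = w;
          }
          return 0;
  }

Decompression work would then be queued with kthread_queue_work() on
the local CPU's worker instead of going through the system workqueue.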
Link: https://lore.kernel.org/lkml/Y+DP6V9fZG7XPPGy@debian/
Change-Id: I454da5bc17f285d99047b93dc1fc70444f287156
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
- This fixes the below warnings:
In file included from ../fs/f2fs/super.c:28:
../include/linux/lz4.h:221:12: warning: 'LZ4_compress_fast' declared 'static' but never defined [-Wunused-function]
221 | static int LZ4_compress_fast(const char *source, char *dest, int inputSize,
| ^~~~~~~~~~~~~~~~~
../include/linux/lz4.h:245:12: warning: 'LZ4_compress_destSize' declared 'static' but never defined [-Wunused-function]
245 | static int LZ4_compress_destSize(const char *source, char *dest, int *sourceSizePtr,
| ^~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:361:13: warning: 'LZ4_resetStreamHC' declared 'static' but never defined [-Wunused-function]
361 | static void LZ4_resetStreamHC(LZ4_streamHC_t *streamHCPtr, int compressionLevel);
| ^~~~~~~~~~~~~~~~~
../include/linux/lz4.h:376:17: warning: 'LZ4_loadDictHC' declared 'static' but never defined [-Wunused-function]
376 | static int LZ4_loadDictHC(LZ4_streamHC_t *streamHCPtr, const char *dictionary,
| ^~~~~~~~~~~~~~
../include/linux/lz4.h:415:12: warning: 'LZ4_compress_HC_continue' declared 'static' but never defined [-Wunused-function]
415 | static int LZ4_compress_HC_continue(LZ4_streamHC_t *streamHCPtr, const char *src,
| ^~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:434:12: warning: 'LZ4_saveDictHC' declared 'static' but never defined [-Wunused-function]
434 | static int LZ4_saveDictHC(LZ4_streamHC_t *streamHCPtr, char *safeBuffer,
| ^~~~~~~~~~~~~~
../include/linux/lz4.h:450:29: warning: 'LZ4_resetStream' declared 'static' but never defined [-Wunused-function]
450 | static __always_inline void LZ4_resetStream(LZ4_stream_t *LZ4_stream);
| ^~~~~~~~~~~~~~~
../include/linux/lz4.h:507:12: warning: 'LZ4_compress_fast_continue' declared 'static' but never defined [-Wunused-function]
507 | static int LZ4_compress_fast_continue(LZ4_stream_t *streamPtr, const char *src,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:521:12: warning: 'LZ4_setStreamDecode' declared 'static' but never defined [-Wunused-function]
521 | static int LZ4_setStreamDecode(LZ4_streamDecode_t *LZ4_streamDecode,
| ^~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:560:12: warning: 'LZ4_decompress_safe_continue' declared 'static' but never defined [-Wunused-function]
560 | static int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:599:12: warning: 'LZ4_decompress_fast_continue' declared 'static' but never defined [-Wunused-function]
599 | static int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:622:12: warning: 'LZ4_decompress_safe_usingDict' declared 'static' but never defined [-Wunused-function]
622 | static int LZ4_decompress_safe_usingDict(const char *source, char *dest,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/linux/lz4.h:645:12: warning: 'LZ4_decompress_fast_usingDict' declared 'static' but never defined [-Wunused-function]
645 | static int LZ4_decompress_fast_usingDict(const char *source, char *dest,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
- 64KB seems to not behave well under high memory pressure, hence let's reduce it to 16KB, which is the default.
Suggested-by: vantoman <mustafa.vantom@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The big pcluster feature has been merged for a year and has been mostly
stable now.
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220407050505.12683-1-huyue2@coolpad.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Whilst we currently provide smp_cond_load_acquire() and
atomic_cond_read_acquire(), there are cases where the ACQUIRE semantics are
not required because of a subsequent fence or release operation once the
conditional loop has exited.
This patch adds relaxed versions of the conditional spinning primitives
to avoid unnecessary barrier overhead on architectures such as arm64.
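A hypothetical usage sketch, assuming the caller's ordering comes from
a later barrier rather than from the conditional load itself:

  #include <asm/barrier.h>

  static int wait_for_nonzero(int *flag)
  {
          int val;

          /* Spin until *flag != 0; VAL is the macro's name for the loaded
           * value. No ACQUIRE barrier is implied on loop exit. */
          val = smp_cond_load_relaxed(flag, VAL != 0);

          /* A subsequent fence provides whatever ordering the caller needs. */
          smp_rmb();
          return val;
  }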
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>