Just raising a victim's priority to the highest SCHED_FAIR priority (the
minimum nice value) isn't enough to make it fully preempt everything in
SCHED_FAIR, which is important to make sure victims die quickly.
Resource-wise this isn't very burdensome, since the RT priority is just
set to zero and dying victims don't have much work left to do: they only
need enough CPU time to finish exiting. SCHED_RR is used over SCHED_FIFO
so that CPU time is divided evenly between the victims, helping them all
finish at around the same time, as fast as possible.
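As a minimal sketch of the approach (not the verbatim Simple LMK code;
the helper name is illustrative), promoting a victim task to SCHED_RR
with RT priority 0 could look like this:

  #include <linux/sched.h>

  /* Promote one victim task so it preempts every SCHED_FAIR task */
  static void promote_victim(struct task_struct *vtsk)
  {
          static const struct sched_param rt_prio_zero; /* .sched_priority = 0 */

          /* _nocheck: no permission checks, the kernel is doing this itself */
          sched_setscheduler_nocheck(vtsk, SCHED_RR, &rt_prio_zero);
  }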
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Simple LMK tries to wait until all of the victims it kills have their
memory freed; however, sometimes victims can take a while to die, which
can block Simple LMK from killing more processes in time when needed.
After the specified timeout elapses, Simple LMK will stop waiting and
make itself available to kill more processes.
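A minimal sketch of the bounded wait, with illustrative names (the real
tunable and completion variable in Simple LMK may differ):

  #include <linux/completion.h>
  #include <linux/jiffies.h>
  #include <linux/printk.h>

  #define KILL_TIMEOUT_MS 200 /* illustrative value, not the real tunable */

  static DECLARE_COMPLETION(victims_freed);

  static void wait_for_victims_to_die(void)
  {
          /* Stop waiting after the timeout so new kills aren't blocked */
          if (!wait_for_completion_timeout(&victims_freed,
                                           msecs_to_jiffies(KILL_TIMEOUT_MS)))
                  pr_debug("simple_lmk: timed out waiting for victims\n");
  }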
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
set_user_nice() doesn't schedule, and although set_cpus_allowed_ptr()
can schedule, it will only do so when the specified task cannot run on
the new set of allowed CPUs. Since cpu_all_mask is used,
set_cpus_allowed_ptr() will never schedule. Therefore, both the priority
elevation and the cpus_allowed change can be moved inside the task lock
to simplify and speed things up.
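An illustrative ordering based on the reasoning above (a sketch, not the
verbatim Simple LMK code):

  #include <linux/cpumask.h>
  #include <linux/sched.h>

  static void boost_victim(struct task_struct *victim)
  {
          task_lock(victim);
          /* Neither call schedules here, so both are safe under the task lock */
          set_cpus_allowed_ptr(victim, cpu_all_mask);
          set_user_nice(victim, MIN_NICE);
          /* ... mark the task as a victim, send SIGKILL, etc. ... */
          task_unlock(victim);
  }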
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
exit_mmap() is responsible for freeing the vast majority of an mm's
memory; in order to unblock Simple LMK faster, report an mm as freed as
soon as exit_mmap() finishes.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
The OOM killer sets the TIF_MEMDIE thread flag for its victims to alert
other kernel code that the current process was killed due to memory
pressure, and needs to finish whatever it's doing quickly. In the page
allocator this allows victim processes to quickly allocate memory using
emergency reserves. This is especially important when memory pressure is
high; if all processes are taking a while to allocate memory, then our
victim processes will face the same problem and can potentially get
stuck in the page allocator for a while rather than die expeditiously.
To ensure that victim processes die quickly, set TIF_MEMDIE for the
entire victim thread group.
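A minimal sketch of marking the whole thread group (illustrative helper
name, not the exact patch):

  #include <linux/rcupdate.h>
  #include <linux/sched.h>
  #include <linux/sched/signal.h>

  static void mark_victim_thread_group(struct task_struct *victim)
  {
          struct task_struct *t;

          rcu_read_lock();
          for_each_thread(victim, t)
                  set_tsk_thread_flag(t, TIF_MEMDIE);
          rcu_read_unlock();
  }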
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Queuing up reclaim requests while a reclaim is in progress doesn't make
sense, since the additional reclaims may not be needed after the
existing reclaim completes. This would cause Simple LMK to go berserk
during periods of high memory pressure where kswapd would fire off
reclaim requests nonstop.
Make Simple LMK ignore new reclaim requests until an existing reclaim is
finished to prevent a slaughter-fest.
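A sketch of the gating logic, with illustrative names:

  #include <linux/atomic.h>

  static atomic_t reclaim_in_progress = ATOMIC_INIT(0);

  void do_reclaim(void); /* hypothetical helper: picks and kills victims */

  static void simple_lmk_reclaim_request(void)
  {
          /* Ignore the request if a reclaim is already running */
          if (atomic_cmpxchg(&reclaim_in_progress, 0, 1))
                  return;

          do_reclaim();
          atomic_set(&reclaim_in_progress, 0);
  }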
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
After commit "simple_lmk: Make reclaim deterministic", Simple LMK's
behavior changed and thus requires some slight re-tuning to make it work
well again.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Using a parameter to pass around an unmodified pointer to a global
variable is crufty; just use the `victims` variable directly instead.
Also, compress the code in simple_lmk_init_set() a bit to make it look
cleaner.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
The 20 ms delay in the reclaim thread is a hacky fudge factor that can
cause Simple LMK to behave wildly differently depending on the
circumstances of when it is invoked. When kswapd doesn't get enough CPU
time to finish up and go back to sleep within 20 ms, Simple LMK performs
superfluous reclaims.
This is suboptimal, so make Simple LMK more deterministic by eliminating
the delay and instead queuing up reclaim requests from kswapd.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
When the reclaim thread writes to victims_to_kill on one CPU, it expects
the updated value to be immediately reflected on all CPUs in order for
simple_lmk_mm_freed() to work correctly. Due to the lack of memory
barriers to guarantee multicopy atomicity, simple_lmk_mm_freed() can be
given a victim's mm without knowing the correct victims_to_kill value,
which can cause the reclaim thread to remain stuck waiting forever for
all victims to be freed. This scenario, despite being rare, has been
observed.
Fix this by using proper atomic helpers with memory barriers.
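One way to express the required ordering (a sketch; the actual patch may
use different primitives and names): publish the count with release
semantics and read it with acquire semantics so simple_lmk_mm_freed()
always observes a count that matches the victims it is handed.

  #include <linux/atomic.h>

  static atomic_t victims_to_kill = ATOMIC_INIT(0);

  /* Reclaim thread: publish the count after the victim list is written */
  static void publish_victim_count(int nr)
  {
          atomic_set_release(&victims_to_kill, nr);
  }

  /* simple_lmk_mm_freed(): read the count before touching the victim list */
  static int read_victim_count(void)
  {
          return atomic_read_acquire(&victims_to_kill);
  }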
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
cmpxchg() is only atomic with respect to the local CPU, so it cannot be
relied on with how it's used in Simple LMK. Switch to fully atomic
operations instead.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Previously, pages_found would be calculated using an uninitialized
variable. Fix it.
Reported-by: Julian Liu <wlootlxt123@gmail.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
This is a complete low memory killer solution for Android that is small
and simple. Processes are killed according to the priorities that
Android gives them, so that the least important processes are always
killed first. Processes are killed until memory deficits are satisfied,
as observed from kswapd struggling to free up pages. Simple LMK stops
killing processes when kswapd finally goes back to sleep.
The only tunables are the desired amount of memory to be freed per
reclaim event and desired frequency of reclaim events. Simple LMK tries
to free at least the desired amount of memory per reclaim and waits
until all of its victims' memory is freed before proceeding to kill more
processes.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
WALT has check_for_migration(), which calls find_energy_efficient_cpu().
With CASS, find_energy_efficient_cpu() is irrelevant, and
check_for_migration() isn't particularly useful even without that call,
so don't use it when CASS is used with WALT. There's no need for
IS_ENABLED(CONFIG_SCHED_WALT) here since the function is already guarded
by CONFIG_SCHED_WALT.
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
The Capacity Aware Superset Scheduler (CASS) optimizes runqueue selection
of CFS tasks. By using CPU capacity as a basis for comparing the relative
utilization between different CPUs, CASS fairly balances load across CPUs
of varying capacities. This results in improved multi-core performance,
especially when CPUs are overutilized because CASS doesn't clip a CPU's
utilization when it eclipses the CPU's capacity.
As a superset of capacity-aware scheduling, CASS implements a hierarchy
of criteria to decide which CPU a task should wake on when multiple CPUs
have the same relative utilization. This way, single-core performance,
latency, and cache affinity are all optimized where possible.
CASS doesn't feature explicit energy awareness but its basic load balancing
principle results in decreased overall energy, often better than what is
possible with explicit energy awareness. By fairly balancing load based on
relative utilization, all CPUs are kept at their lowest P-state necessary
to satisfy the overall load at any given moment.
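A minimal sketch of the core comparison (not the actual CASS code):
relative utilization is a CPU's utilization scaled against its capacity,
and the less relatively utilized CPU wins.

  /*
   * Returns true if CPU a is less relatively utilized than CPU b, i.e.
   * util_a / cap_a < util_b / cap_b, computed without division.
   */
  static bool cass_less_utilized(unsigned long util_a, unsigned long cap_a,
                                 unsigned long util_b, unsigned long cap_b)
  {
          return util_a * cap_b < util_b * cap_a;
  }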
This version of CASS is adjusted to work on older kernels.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: clarencelol <clarencekuiek@icloud.com>
When the rotator is actually used (still an unsolved question in
computer science), these PM QoS requests block some CPUs in the LITTLE
cluster from entering deep idle because the driver assumes that display
rotating work occurs on a hardcoded set of CPUs, which is false. We
already have the IRQ PM QoS machinery for display rendering operations
that actually matter, so this cruft is unneeded.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
These are blocking some CPUs in the LITTLE cluster from entering deep
idle because the driver assumes that display rendering work occurs on a
hardcoded set of CPUs, which is false. The scope of this is also quite
large, which increases power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Combined with LTO, this yields a consistent 5% boost to procfs I/O
performance right off the bat (as measured with callbench). The spin
lock functions constitute some of the hottest code paths in the kernel;
inlining them to improve performance makes sense.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's plenty of room on the stack for a few more inlined bytes here
and there. The measured stack usage at runtime is still safe without
this, and performance is surely improved at a microscopic level, so
remove it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A measurably significant amount of CPU time is spent in these routines
while the camera is open. These are also responsible for a grotesque
amount of dmesg spam, so let's nuke them.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This call to smp_processor_id() forces gic_raise_softirq() to require
being called while preemption is disabled, which isn't an actual
requirement. When called without preemption disabled, smp_processor_id()
is thus used incorrectly and generates a warning splat with the relevant
kernel debug options enabled.
Get rid of the useless pr_devel message outright to fix the incorrect
smp_processor_id() usage.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
In order to prevent redundant entry creation by racing against itself,
mb_cache_entry_create scans through a large hash-list of all current
entries in order to see if another allocation for the requested new
entry has been made. Furthermore, it allocates memory for a new entry
before scanning through this hash-list, which results in that allocated
memory being discarded when the requested new entry is already present.
This happens more than half the time.
Speed up cache entry creation by keeping a small linked list of
requested new entries in progress, and scanning through that first
instead of the large hash-list. Additionally, don't bother allocating
memory for a new entry until it's known that the allocated memory will
be used.
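A sketch of the idea (illustrative structure and names, not the actual
mbcache patch): keep a short list of creations in flight and check it
before allocating a new entry or walking the big hash list.

  #include <linux/list.h>
  #include <linux/spinlock.h>

  struct pending_create {
          struct list_head list;
          u32 key;
          u64 value;
  };

  static LIST_HEAD(pending_creates);
  static DEFINE_SPINLOCK(pending_lock);

  /* Returns true if another caller is already creating this entry */
  static bool entry_create_in_progress(u32 key, u64 value)
  {
          struct pending_create *p;
          bool busy = false;

          spin_lock(&pending_lock);
          list_for_each_entry(p, &pending_creates, list) {
                  if (p->key == key && p->value == value) {
                          busy = true;
                          break;
                  }
          }
          spin_unlock(&pending_lock);

          return busy;
  }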
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
For the vast majority of mmio operations in this driver, explicit memory
barriers aren't needed either because a data dependency between a read
and write already exists, or because of the presence of the spin locks
which execute a full memory barrier.
Removing all the unneeded explicit barriers considerably reduces
overhead for pinctrl operations, which in turn benefits things like i2c.
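An illustrative before/after of the accessor change (the helper name is
hypothetical, not the exact msm pinctrl code): inside a spinlock-protected
section, the relaxed MMIO helpers avoid the extra barriers that readl()
and writel() imply.

  #include <linux/bitops.h>
  #include <linux/io.h>
  #include <linux/spinlock.h>

  static void pctrl_set_bit(void __iomem *reg, unsigned int bit,
                            spinlock_t *lock)
  {
          unsigned long flags;
          u32 val;

          spin_lock_irqsave(lock, flags);
          val = readl_relaxed(reg);            /* was: readl(reg) */
          writel_relaxed(val | BIT(bit), reg); /* was: writel(...) */
          spin_unlock_irqrestore(lock, flags);
  }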
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's no reason to hold an RCU read lock the entire time while
optimistically spinning for a rwsem. This can needlessly lengthen RCU
grace periods and slow down synchronize_rcu() when it doesn't brute
force the RCU grace period via rcupdate.rcu_expedited=1.
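A simplified sketch of the narrower RCU scope (not the exact rwsem code;
assumes the owner field provided by CONFIG_RWSEM_SPIN_ON_OWNER):

  #include <linux/rcupdate.h>
  #include <linux/rwsem.h>
  #include <linux/sched.h>

  /* Spin while the owner is running, taking the RCU read lock only
   * around the owner dereference instead of across the whole loop. */
  static void spin_while_owner_running(struct rw_semaphore *sem)
  {
          for (;;) {
                  struct task_struct *owner;
                  bool on_cpu;

                  rcu_read_lock();
                  owner = READ_ONCE(sem->owner);
                  on_cpu = owner && owner->on_cpu;
                  rcu_read_unlock();

                  if (!on_cpu)
                          break;

                  cpu_relax();
          }
  }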
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
There's no reason to hold an RCU read lock the entire time while
optimistically spinning for a mutex lock. This can needlessly lengthen
RCU grace periods and slow down synchronize_rcu() when it doesn't brute
force the RCU grace period via rcupdate.rcu_expedited=1.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
It isn't guaranteed a CPU will idle upon calling lpm_cpuidle_enter(),
since it could abort early at the need_resched() check. In this case,
it's possible for an IPI to be sent to this "idle" CPU needlessly, thus
wasting power. For the same reason, it's also wasteful to keep a CPU
marked idle even after it's woken up.
Shrink the window during which CPUs are marked idle to be as small as
possible in order to improve power consumption.
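A sketch of the shrunken marked-idle window (the mask, helper names, and
the assumption that NR_CPUS fits in an atomic_t are all illustrative):

  #include <linux/atomic.h>
  #include <linux/bitops.h>
  #include <linux/sched.h>

  static atomic_t cpus_in_idle = ATOMIC_INIT(0);

  static int lpm_enter(int cpu)
  {
          /* Abort before being marked idle, so no wakeup IPI is wasted */
          if (need_resched())
                  return -EBUSY;

          atomic_or(BIT(cpu), &cpus_in_idle);
          cpu_do_idle();  /* stand-in for the platform's low-power entry */
          atomic_andnot(BIT(cpu), &cpus_in_idle);

          return 0;
  }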
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The pm_qos callback currently suffers from a number of pitfalls: it
sends IPIs to CPUs that may not be idle, waits for those IPIs to finish
propagating while preemption is disabled (resulting in a long busy wait
for the pm_qos_update_target() caller), and needlessly calls a no-op
function when the IPIs are processed.
Optimize the pm_qos notifier by only sending IPIs to CPUs that are
idle, and by using arch_send_wakeup_ipi_mask() instead of
smp_call_function_many(). Using IPI_WAKEUP instead of IPI_CALL_FUNC,
which is what smp_call_function_many() uses behind the scenes, has the
benefit of doing zero work upon receipt of the IPI; IPI_WAKEUP is
designed purely for sending an IPI without a payload, whereas
IPI_CALL_FUNC does unwanted extra work just to run the empty
smp_callback() function.
Determining which CPUs are idle is done efficiently with an atomic
bitmask instead of the wake_up_if_idle() API, which checks the CPU's
runqueue inside an RCU read-side critical section and under a spin lock;
that's not very efficient compared to a simple atomic bitwise operation.
A cpumask isn't needed for this because NR_CPUS is guaranteed to fit
within a word.
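A simplified sketch of the notifier flow described above, reusing the
illustrative cpus_in_idle mask from the sketch earlier (the callback name
is hypothetical; arch_send_wakeup_ipi_mask() is the arm64 wakeup-IPI API
exposed by the IPI_WAKEUP change below):

  #include <linux/atomic.h>
  #include <linux/bitops.h>
  #include <linux/cpumask.h>
  #include <linux/notifier.h>

  static int lpm_latency_notify(struct notifier_block *nb,
                                unsigned long latency_us, void *data)
  {
          unsigned long idle = (unsigned long)atomic_read(&cpus_in_idle);
          struct cpumask mask;
          int cpu;

          cpumask_clear(&mask);
          for_each_set_bit(cpu, &idle, nr_cpu_ids)
                  cpumask_set_cpu(cpu, &mask);

          /* Payload-free wakeup IPI, only to CPUs that are actually idle */
          if (!cpumask_empty(&mask))
                  arch_send_wakeup_ipi_mask(&mask);

          return NOTIFY_OK;
  }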
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
An empty IPI is useful for cpuidle to wake sleeping CPUs without causing
them to do unnecessary work upon receipt of the IPI. IPI_WAKEUP fills
this use-case nicely, so let it be used outside of the ACPI parking
protocol.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
None of the pm_qos functions actually run in interrupt context; if some
driver calls pm_qos_update_target in interrupt context then it's already
broken. There's no need to disable interrupts while holding pm_qos_lock,
so don't do it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit 1e5a5b5e00e9706cd48e3c87de1607fcaa5214d2.
This doesn't make sense for a few reasons. Firstly, upstream uses this
mutex code and it works fine on all arches; why should arm be any
different?
Secondly, once the mutex owner starts to spin on `wait_lock`,
preemption is disabled and the owner will be in an actively-running
state. The optimistic mutex spinning occurs when the lock owner is
actively running on a CPU, and while the optimistic spinning takes
place, no attempt to acquire `wait_lock` is made by the new waiter.
Therefore, it is guaranteed that new mutex waiters which optimistically
spin will not contend the `wait_lock` spin lock that the owner needs to
acquire in order to make forward progress.
Another potential source of `wait_lock` contention can come from tasks
that call mutex_trylock(), but this isn't actually problematic (and if
it were, it would affect the MUTEX_SPIN_ON_OWNER=n use-case too). This
won't introduce significant contention on `wait_lock` because the
trylock code exits before attempting to lock `wait_lock`, specifically
when the atomic mutex counter indicates that the mutex is already
locked. So in reality, the amount of `wait_lock` contention that can
come from mutex_trylock() amounts to only one task. And once it
finishes, `wait_lock` will no longer be contended and the previous
mutex owner can proceed with clean up.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit 0db49c2550a09458db188fb7312c66783c5af104.
This results in kmalloc() abuse to find a large number of contiguous
pages, which thrashes the page allocator and hurts overall performance.
I couldn't reproduce the improved MTP throughput that this commit
claimed either, so just revert it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This reverts commit a9a60c58e0fa21c41ac284282949187b13bdd756.
This results in kmalloc() abuse to find a large number of contiguous
pages, which thrashes the page allocator and hurts overall performance.
I couldn't reproduce the improved MTP throughput that this commit
claimed either, so just revert it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The scope of this driver's lock usage is extremely wide, leading to
excessively long lock hold times. Additionally, there is a lot of
excessive linked-list traversal and unnecessary dynamic memory
allocation in a critical path, causing poor performance across the
board.
Fix all of this by greatly reducing the scope of the locks used and by
significantly reducing the amount of operations performed when
msm_dma_map_sg_attrs() is called. The entire driver's code is overhauled
for better cleanliness and performance.
Note that ION must be modified to pass a known structure via the private
dma_buf pointer, so that the IOMMU driver can prevent races when
operating on the same buffer concurrently. This is the only way to
eliminate said buffer races without hurting the IOMMU driver's
performance.
Some additional members are added to the device struct as well to make
these various performance improvements possible.
This also removes the manual cache maintenance since ION already handles
it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
commit b312b4f0e2f9 ("iommu: arm-smmu: Preallocate memory for map
only on failure") had the following two errors:
1. The return code checked when map_sg fails and we preallocate is
wrong. The check should be for 0, not -ENOMEM, so the preallocation
never happens when map_sg fails.
2. map_sg could have mapped some elements of the sglist before
failing. With the proper check, we would call map_sg again on the
same size, which would lead to double-mapping the previously mapped
elements of the sglist.
Fix this by returning the actual ret code from arm_lpae_map_sg() and
checking it against -ENOMEM to decide whether to preallocate. Also,
unmap any partial iovas that were mapped previously.
Change-Id: Ifee7c0bed6b9cf1c35ebb4a03d51a1a80ab0ed58
Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
page allocation failure: order:0, mode:0x2088020(GFP_ATOMIC|__GFP_ZERO)
Call trace:
[<ffffff80080f15c8>] dump_backtrace+0x0/0x248
[<ffffff80080f1894>] show_stack+0x18/0x28
[<ffffff8008484984>] dump_stack+0x98/0xc0
[<ffffff8008231b0c>] warn_alloc+0x114/0x134
[<ffffff8008231f7c>] __alloc_pages_nodemask+0x3e8/0xd30
[<ffffff8008232b2c>] alloc_pages_exact+0x4c/0xa4
[<ffffff800866bec4>] arm_smmu_alloc_pages_exact+0x188/0x1bc
[<ffffff8008664b28>] io_pgtable_alloc_pages_exact+0x30/0xa0
[<ffffff8008664ff8>] __arm_lpae_alloc_pages+0x40/0x1c8
[<ffffff8008665cb4>] __arm_lpae_map+0x224/0x3b4
[<ffffff8008665b98>] __arm_lpae_map+0x108/0x3b4
[<ffffff8008666474>] arm_lpae_map+0x78/0x9c
[<ffffff800866aed4>] arm_smmu_map+0x80/0xdc
[<ffffff800866015c>] iommu_map+0x118/0x284
[<ffffff8008c66294>] cam_smmu_alloc_firmware+0x188/0x3c0
[<ffffff8008cc8afc>] cam_icp_mgr_hw_open+0x88/0x874
[<ffffff8008cca030>] cam_icp_mgr_acquire_hw+0x2d4/0xc9c
[<ffffff8008c5fe84>] cam_context_acquire_dev_to_hw+0xb0/0x26c
[<ffffff8008cd0ce0>] __cam_icp_acquire_dev_in_available+0x1c/0xf0
[<ffffff8008c5ea98>] cam_context_handle_acquire_dev+0x5c/0x1a8
[<ffffff8008c619b4>] cam_node_handle_ioctl+0x30c/0xdc8
[<ffffff8008c62640>] cam_subdev_compat_ioctl+0xe4/0x1dc
[<ffffff8008bcf8bc>] subdev_compat_ioctl32+0x40/0x68
[<ffffff8008bd3858>] v4l2_compat_ioctl32+0x64/0x1780
In order to avoid order-0 page allocation failures during the SMMU map
operation, the existing implementation preallocates the required memory
using GFP_KERNEL so as to make sure that there is sufficient page table
memory available and the atomic allocation succeeds during the map
operation. This might not be necessary for every single map call, as the
atomic allocation might succeed most of the time. Hence, preallocate the
necessary memory only when the map operation fails due to insufficient
memory, and then retry the map operation with the preallocated memory.
This solution applies only to map calls made from a non-atomic context.
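A simplified sketch of that retry flow; every struct and helper name
here is hypothetical rather than taken from the actual arm-smmu code:

  #include <linux/gfp.h>

  struct smmu_map_req;                                 /* hypothetical */
  int do_map_sg(struct smmu_map_req *req);             /* hypothetical: returns
                                                          -ENOMEM when page-table
                                                          allocation fails */
  int prealloc_pgtable_mem(struct smmu_map_req *req,
                           gfp_t gfp);                 /* hypothetical */

  static int smmu_map_sg_with_prealloc(struct smmu_map_req *req)
  {
          int ret;

          /* First attempt: page tables come from atomic allocations */
          ret = do_map_sg(req);
          if (ret != -ENOMEM)
                  return ret;

          /*
           * Atomic allocation failed and we're in a non-atomic context:
           * preallocate page-table memory with GFP_KERNEL and retry.
           */
          ret = prealloc_pgtable_mem(req, GFP_KERNEL);
          if (ret)
                  return ret;

          return do_map_sg(req);
  }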
Change-Id: I417f311c2224eb863d6c99612b678bbb2dd3db58
Signed-off-by: Swathi Sridhar <swatsrid@codeaurora.org>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
When memory is leaking, it's going to be harder to allocate more memory,
making it more likely for this failure condition inside of kmemleak to
manifest itself. This is extremely frustrating since kmemleak kills
itself upon the first instance of memory allocation failure.
Bypass that and make kmemleak more resilient when memory is running low.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
The memory allocated dynamically here is just used to store a single
instance of a struct. Allocate both possible structs on the stack
instead of allocating them dynamically to improve performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Trying to wait for fences that have already been signaled incurs a high
setup cost, since dynamic memory allocation must be used. Avoiding this
overhead when it isn't needed improves performance.
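A sketch of the early-out, assuming the dma_fence API (the driver's
actual fence type and wait helper may differ):

  #include <linux/dma-fence.h>

  static long wait_fence(struct dma_fence *fence, long timeout)
  {
          /* Already signaled: skip allocating and registering a waiter */
          if (dma_fence_is_signaled(fence))
                  return timeout ? timeout : 1;

          return dma_fence_wait_timeout(fence, true, timeout);
  }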
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A measurably significant amount of CPU time is spent on logging events
for debugging purposes in lpm_cpuidle_enter. Kill the useless logging to
reduce overhead.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
A lot of CPU time is wasted on allocating, populating, and copying
debug names back and forth with userspace when they're not actually
needed. We can't just remove the name buffers from the various sync data
structures though because we must preserve ABI compatibility with
userspace, but instead we can just pretend the name fields of the
user-shared structs aren't there. This massively reduces the size of the
memory allocated for these data structures and the amount of data copied
to and from userspace, and eliminates a kzalloc() entirely from
sync_file_ioctl_fence_info(), thus improving graphics performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
Giving userspace intimate control over CPU latency requirements is
nonsense. Userspace can't even stop itself from being preempted, so
there's no reason for it to have access to a mechanism primarily used to
eliminate CPU delays on the order of microseconds.
Remove userspace's ability to send pm_qos requests so that it can't hurt
power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>
This allows pm_qos votes of, say, 100 us to select power levels with
exit latencies equal to 100 us. The extra microsecond of
exit latency doesn't hurt.
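The change boils down to making the comparison inclusive (illustrative
helper, not the actual lpm-levels code):

  #include <linux/types.h>

  static bool lpm_level_allowed(u32 exit_latency_us, u32 latency_req_us)
  {
          /* '<=' rather than '<': an exact match is now allowed */
          return exit_latency_us <= latency_req_us;
  }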
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ruchit <ruchitmarathe@gmail.com>