You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
kernel_samsung_sm7125/Documentation/scheduler/sched-pelt.c

126 lines
2.4 KiB

/*
* The following program is used to generate the constants for
* computing sched averages.
*
* ==============================================================
* C program (compile with -lm)
* ==============================================================
*/
#include <math.h>
#include <stdio.h>
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
#define HALFLIFE { 32, 16, 8 }
#define SHIFT 32
double y;
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
void calc_runnable_avg_yN_inv(const int halflife)
{
int i;
unsigned int x;
printf("static const u32 runnable_avg_yN_inv[] = {");
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
for (i = 0; i < halflife; i++) {
x = ((1UL<<32)-1)*pow(y, i);
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
if (i % 4 == 0) printf("\n\t");
printf("0x%8x,", x);
}
printf("\n};\n\n");
}
int sum;
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
void calc_runnable_avg_yN_sum(const int halflife)
{
int i;
printf("static const u32 runnable_avg_yN_sum[] = {\n\t 0,");
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
for (i = 1; i <= halflife; i++) {
if (i == 1)
sum *= y;
else
sum = sum*y + 1024*y;
if (i % 11 == 0)
printf("\n\t");
printf("%5d,", sum);
}
printf("\n};\n\n");
}
int n;
long max;
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
void calc_converged_max(const int halflife)
{
long last = 0, y_inv = ((1UL<<32)-1)*y;
for (; ; n++) {
if (n > -1)
max = ((max*y_inv)>>SHIFT) + 1024;
/*
* This is the same as:
* max = max*y + 1024;
*/
if (last == max)
break;
last = max;
}
n--;
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
printf("#define LOAD_AVG_PERIOD %d\n", halflife);
printf("#define LOAD_AVG_MAX %ld\n", max);
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
printf("#define LOAD_AVG_MAX_N %d\n\n", n);
}
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
void calc_accumulated_sum_32(const int halflife)
{
int i, x = sum;
printf("static const u32 __accumulated_sum_N32[] = {\n\t 0,");
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
for (i = 1; i <= n/halflife+1; i++) {
if (i > 1)
x = x/2 + sum;
if (i % 6 == 0)
printf("\n\t");
printf("%6d,", x);
}
printf("\n};\n\n");
}
void main(void)
{
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
int hl_value[] = HALFLIFE;
int hl_count = sizeof(hl_value) / sizeof(int);
int hl_idx, halflife;
printf("/* SPDX-License-Identifier: GPL-2.0 */\n");
printf("/* Generated by Documentation/scheduler/sched-pelt; do not modify. */\n");
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
for (hl_idx = 0; hl_idx < hl_count; ++hl_idx) {
halflife = hl_value[hl_idx];
y = pow(0.5, 1/(double)halflife);
sum = 1024;
/* first period */
max = 1024;
n = -1;
printf("\n#ifdef CONFIG_PELT_UTIL_HALFLIFE_%d\n", halflife);
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
calc_runnable_avg_yN_inv(halflife);
calc_runnable_avg_yN_sum(halflife);
calc_converged_max(halflife);
/*
* calc_accumulated_sum_32(halflife) precomputed load sum table of half-life,
* not used yet.
*/
printf("#endif\n");
FROMLIST: sched/fair: add support to tune PELT ramp/decay timings The PELT half-life is the time [ms] required by the PELT signal to build up a 50% load/utilization, starting from zero. This time is currently hardcoded to be 32ms, a value which seems to make sense for most of the workloads. However, 32ms has been verified to be too long for certain classes of workloads. For example, in the mobile space many tasks affecting the user-experience run with a 16ms or 8ms cadence, since they need to match the common 60Hz or 120Hz refresh rate of the graphics pipeline. This contributed so fare to the idea that "PELT is too slow" to properly track the utilization of interactive mobile workloads, especially compared to alternative load tracking solutions which provides a better representation of tasks demand in the range of 10-20ms. A faster PELT ramp-up time could give some advantages to speed-up the time required for the signal to stabilize and thus to better represent task demands in the mobile space. As a downside, it also reduces the decay time, and thus we forget the load/utilization of sleeping tasks (or idle CPUs) faster. Fortunately, since the integration of the utilization estimation support in mainline kernel: commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT") a fast decay time is no longer an issue for tasks utilization estimation. Although estimated utilization does not slow down the decay of blocked utilization on idle CPUs, for mobile workloads this seems not to be a major concern compared to the benefits in interactivity responsiveness. Let's add a compile time option to choose the PELT speed which better fits for a specific system. By default the current 32ms half-life is used, but we can also compile a kernel to use a faster ramp-up time of either 16ms or 8ms. These two configurations have been verified to give PELT a further improvement in performance, compared to other out-of-tree load tracking solutions, when it comes to track interactive workloads thus better supporting both tasks placements and frequencies selections. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Turner <pjt@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org [ backport from LKML: Message-ID: <20180409165134.707-1-patrick.bellasi@arm.com> ] Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Change-Id: I50569748918b799ac4bf4e7d2b387253080a0fd2
7 years ago
}
}