Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.
A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.
Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
add error handling for cpufreq_register_driver() error
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: cpufreq@lists.linux.org.uk
Signed-off-by: Dave Jones <davej@redhat.com>
* Replace previous instances of the cpumask_of_cpu_ptr* macros
with a the new (lvalue capable) generic cpumask_of_cpu().
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* This patch replaces the dangerous lvalue version of cpumask_of_cpu
with new cpumask_of_cpu_ptr macros. These are patterned after the
node_to_cpumask_ptr macros.
In general terms, if there is a cpumask_of_cpu_map[] then a pointer to
the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map
is provided when there is a large NR_CPUS count, reducing
greatly the amount of code generated and stack space used for
cpumask_of_cpu(). The pointer to the cpumask_t value is needed for
calling set_cpus_allowed_ptr() to reduce the amount of stack space
needed to pass the cpumask_t value.
If there isn't a cpumask_of_cpu_map[], then a temporary variable is
declared and filled in with value from cpumask_of_cpu(cpu) as well as
a pointer variable pointing to this temporary variable. Afterwards,
the pointer is used to reference the cpumask value. The compiler
will optimize out the extra dereference through the pointer as well
as the stack space used for the pointer, resulting in identical code.
A good example of the orthogonal usages is in net/sunrpc/svc.c:
case SVC_POOL_PERCPU:
{
unsigned int cpu = m->pool_to[pidx];
cpumask_of_cpu_ptr(cpumask, cpu);
*oldmask = current->cpus_allowed;
set_cpus_allowed_ptr(current, cpumask);
return 1;
}
case SVC_POOL_PERNODE:
{
unsigned int node = m->pool_to[pidx];
node_to_cpumask_ptr(nodecpumask, node);
*oldmask = current->cpus_allowed;
set_cpus_allowed_ptr(current, nodecpumask);
return 1;
}
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit 2d474871e2fb092eb46a0930aba5442e10eb96cc
Author: Mike Travis <travis@sgi.com>
Date: Mon May 12 21:21:13 2008 +0200
Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software. When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required. This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.
To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw). If the cpufreq driver does not provide a value, fall
back to policy->cpus.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
We checked the hardware freq with OS cached freq value in get_cur_freqon_cpu().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
* Use new set_cpus_allowed_ptr() function added by previous patch,
which instead of passing the "newly allowed cpus" cpumask_t arg
by value, pass it by pointer:
-int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
+int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
* Cleanup uses of CPU_MASK_ALL.
* Collapse other NR_CPUS changes to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
Use pointers to cpumask_t arguments whenever possible.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: Len Brown <len.brown@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the following static arrays sized by NR_CPUS to
per_cpu data variables:
acpi_cpufreq_data *drv_data[NR_CPUS]
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cpu_data is currently an array defined using NR_CPUS. This means that
we overallocate since we will rarely really use maximum configured cpus.
When NR_CPU count is raised to 4096 the size of cpu_data becomes
3,145,728 bytes.
These changes were adopted from the sparc64 (and ia64) code. An
additional field was added to cpuinfo_x86 to be a non-ambiguous cpu
index. This corresponds to the index into a cpumask_t as well as the
per_cpu index. It's used in various places like show_cpuinfo().
cpu_data is defined to be the boot_cpu_data structure for the NON-SMP
case.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is from an earlier message from 'Christoph Lameter':
cpu_core_map is currently an array defined using NR_CPUS. This means that
we overallocate since we will rarely really use maximum configured cpu.
If we put the cpu_core_map into the per cpu area then it will be allocated
for each processor as it comes online.
This means that the core map cannot be accessed until the per cpu area
has been allocated. Xen does a weird thing here looping over all processors
and zeroing the masks that are not yet allocated and that will be zeroed
when they are allocated. I commented the code out.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Three main sets of changes:
1) dmi_get_system_info() return value should have been marked const,
since callers should not be changing that data.
2) const-ify DMI internals, since DMI firmware tables should,
whenever possible, be marked const to ensure we never ever write to
that data area.
3) const-ify DMI API, to enable marking tables const where possible
in low-level drivers.
And if we're really lucky, this might enable some additional
optimizations on the part of the compiler.
The bulk of the changes are #2 and #3, which are interrelated. #1 could
have been a separate patch, but it was so small compared to the others,
it was easier to roll it into this changeset.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch addresses some issues in x86/x86-64 acpi-cpufreq driver:
1. Current memory allocation for acpi_perf_data is actually open-coded
alloc_percpu(). The patch defines and handles acpi_perf_data as percpu
data. The code will be cleaner and easier to be maintained with this
change.
2. Won't load driver in acpi_cpufreq_early_init() failure case.
3. Add __init for acpi_cpufreq_early_init().
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
The local variable "covered" is used without initialization in i386
acpi-cpufreq driver. The initial value of covered should be 0. The bug
will cause memory leak when hit. The following patch fixes this bug.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make sure that the _PSS list is sorted in
descending order by typical power dissipation.
http://bugzilla.kernel.org/show_bug.cgi?id=7880
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
During recent acpi-cpufreq changes, writing to PERF_CTL msr
changed from RMW of entire 64 bit to RMW of low 32 bit and clearing of
upper 32 bit. Fix it back to do a proper RMW of the MSR.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
cmd.val was used uninitialized on the line below.
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Fixes the oops in cpufreq_stats with acpi_cpufreq driver. The issue was
that the frequency was reported as 0 in acpi-cpufreq.c. The bug is due to
different indicies for freq_table and ACPI perf table.
Also adds a check in cpufreq_stats to check for error return from
freq_table_get_index() and avoid using the error return value.
Patch fixes the issue reported at
http://www.ussg.iu.edu/hypermail/linux/kernel/0611.2/0629.html
and also other similar issue here
http://bugme.osdl.org/show_bug.cgi?id=7383 comment 53
Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Check the correct variable and set policy->cur upon acpi-cpufreq
initialization to allow the userspace governor to be used as default.
Signed-off-by: Mattia Dongili <malattia@linux.it>
Acked-by: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Fix the bug in duplicate states elimination in acpi-cpufreq.
Bug: Due to duplicate state elimiation in the loop earlier, the number
of valid_states can be less than perf->state_count, in which case
freq_table was ending up with some garbage/uninitialized entries
in the table.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
From: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
acpi-cpufreq needs the same patch as the previous speedstep-centrino change.
Additionally, the centrino driver can have its ifdef moved out a little
further to eliminate some more code/variables.
Signed-off-by: Dave Jones <davej@redhat.com>
acpi-cpufreq.c, speedstep-centrino.c: warning: 'sw_any_bug_dmi_table' defined but not used
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Enable ondemand governor and acpi-cpufreq to use IA32_APERF and IA32_MPERF MSR
to get active frequency feedback for the last sampling interval. This will
make ondemand take right frequency decisions when hardware coordination of
frequency is going on.
Without APERF/MPERF, ondemand can take wrong decision at times due
to underlying hardware coordination or TM2.
Example:
* CPU 0 and CPU 1 are hardware cooridnated.
* CPU 1 running at highest frequency.
* CPU 0 was running at highest freq. Now ondemand reduces it to
some intermediate frequency based on utilization.
* Due to underlying hardware coordination with other CPU 1, CPU 0 continues to
run at highest frequency (as long as other CPU is at highest).
* When ondemand samples CPU 0 again next time, without actual frequency
feedback from APERF/MPERF, it will think that previous frequency change
was successful and can go to wrong target frequency. This is because it
thinks that utilization it has got this sampling interval is when running at
intermediate frequency, rather than actual highest frequency.
More information about IA32_APERF IA32_MPERF MSR:
Refer to IA-32 Intel® Architecture Software Developer's Manual at
http://developer.intel.com
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Recent speedstep-centrino unification onto acpi-cpufreq patchset broke
cpuinfo_cur_freq interface in /sys/../cpuinfo/, when MSR was used for
transitions. Attached patch fixes that breakage.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Only change the frequency if the state previously set is different
from what we are trying to set. We don't really have to get the current
frequency at this point.
Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Add in the support for Intel Enhanced Speedstep - MSR based transitions.
With this change, the ACPI based support in speedstep-centrino can be
deprecated and duplicate code in that driver can be marked for removal.
Much easier to maintain and support this way. This also reduces the
user misconfigurations and questions on which driver is to be used
under which CPUs to support Enhanced Speedstep.
Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Some clean up and redsign of the driver. Mainly making it easier to add
support for multiple sub-mechanisms of changing frequency. Currently this
driver supports only ACPI SYSTEM_IO address space. With the changes
below it is easier to add support for other address spaces like Intel
Enhanced Speedstep which uses MSR (ACPI FIXED_FEATURE_HARDWARE) to do the
transitions.
Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This patchset has refresh/rebase of a bunch of patches/bugfixes related to
acpi-cpufreq that were sent earlier on this list.
patch 1/8
Patch that fixes a bug in swcoordination code in acpi-cpufreq
patch 2/8 through patch 7/8
Grand unification of ACPI based speedstep-centrino and acpi-cpufreq drivers.
ACPI allows P-state transitions in multiple ways. Like using IO ports or using
processor native method (MSR). Without this patch, IO port based P-state
transitions are handled in acpi-cpufreq driver and MSR based transitions on
Intel CPUs are handled in speedstep-centrino driver. Even though most of the
code in these two drivers should be similar, except for final changing/checking
of frequency (one driver does it using IO port and other does it through
MSR), we have duplicated code in these two drivers. There are also issues
around BIOSes supporting both MSR and IO port and which driver should be
loaded first in standard installations.
The patchset combines functionality of these two driver into acpi-cpufreq
driver. ACPI based functionality in speedstep-centrino is marked deprecated
and will be removed in future. speedstep-centrino will continue to work
on systems that depend on older non-ACPI table based P-state chanes.
* 2/8 - Patch that reorganizes the code in acpi-cpufreq, cleaning it up
a little and making it easier to add MSR support later.
* 3/8 - Pull in the MSR based transition support into acpi-cpufreq.
* 4/8 - Mark speedstep-centrino deprecated. Change the order in Makefile to
load acpi-cpufreq first and speedstep-centrino later, in cases where both
are configured in.
* 5/8 - lindent acpi-cpufreq.c
* 6/8 - Minor change to eliminate the check of current frequency on
notifications. We can use last set frequency instead.
* 7/8 - Make cpufreq->get of acpi_cpufreq work correctly again.
There will be a patch in future that removes ACPI based support in
speedstep-centrino in coming months.
patch 8/8
Add support for IA32_APERF and IA32_MPERF MSR and get the actual frequency
from these MSRs and use it to determine the next frequency target in ondemand
governor
This patch:
There is a bug in software coordination patch in acpi-cpufreq, due to which
frequency will only be set on first CPU of any coordinated group.
Bug identified by Denis, was not recognised earlier as there are no platforms
yet that use software coordination with acpi-cpufreq driver.
Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Make the sections proper and get rid of section mismatch warnings.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Some buggy BIOSes do a "software any" kind of coordination without telling
about it to OS. So, when OS sets frequency on one CPU on these platforms,
it will also impact all the other logical CPUs that are in the same power
domain. Attached patch is a workaround for those buggy BIOSes.
Patch should be a noop on the normal non-buggy platforms.
Applies over previously sent acpi-cpufreq and software coordination
bug fix patch
Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Ignore the return value of early_init_acpi(), as it can give false error
messages. If there is something really wrong, then register_driver will
fail cleanly with EINVAL later.
[ background: modprobe acpi-cpufreq on systems not capable of speed-scaling
started failing with 'invalid argument', where previously it would only
ever -ENODEV
I'm not 100% happy with the solution. It'd be better to handle
failure properly, but this is a low-impact change for 2.6.18
We can always revisit doing this better in .19 --davej.]
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Note how any error from acpi_processor_preregister_performance is ignored.
From: bert hubert <bert.hubert@netherlabs.nl>
Signed-off-by: Dave Jones <davej@redhat.com>
Treat HW coordination as independent CPUs.
This enables per-cpu monintoring of P-states
http://bugzilla.kernel.org/show_bug.cgi?id=5737
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Convert a few stragglers over to for_each_possible_cpu(), remove
for_each_cpu().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This prevents annoying messages being printed when it gets
loaded on a machine that doesn't have support scaling via ACPI.
Signed-off-by: Dave Jones <davej@redhat.com>
cpu_online_map doesn't exist if !CONFIG_SMP.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
They previously tried to figure this out on their own.
Suggested by Venkatesh.
Cc: venkatesh.pallipadi@intel.com
Cc: davej@redhat.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>