kernel_samsung_sm7125

jenna

Author	SHA1	Message	Date
Glauber de Oliveira Costa	f72a9ef979	x86: cleanup write_tsc write_tsc() does not need to be enclosed in any paravirt closure, as it uses wrmsr(). So we rip off the duplicate in msr.h and the definition from paravirt.h Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Glauber de Oliveira Costa	a4746364da	x86: adjust PVOP_CALL/VCALL macros for 64-bit This patch adjust the PVOP_VCALL and PVOP_CALL macros to work with x86_64. It has a different calling convention, and we use auxiliary macros to account for both calling conventions as cleanly as possible Comments are adjusted accordingly. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Nick Piggin	314cdbefd1	x86: FIFO ticket spinlocks Introduce ticket lock spinlocks for x86 which are FIFO. The implementation is described in the comments. The straight-line lock/unlock instruction sequence is slightly slower than the dec based locks on modern x86 CPUs, however the difference is quite small on Core2 and Opteron when working out of cache, and becomes almost insignificant even on P4 when the lock misses cache. trylock is more significantly slower, but they are relatively rare. On an 8 core (2 socket) Opteron, spinlock unfairness is extremely noticable, with a userspace test having a difference of up to 2x runtime per thread, and some threads are starved or "unfairly" granted the lock up to 1 000 000 (!) times. After this patch, all threads appear to finish at exactly the same time. The memory ordering of the lock does conform to x86 standards, and the implementation has been reviewed by Intel and AMD engineers. The algorithm also tells us how many CPUs are contending the lock, so lockbreak becomes trivial and we no longer have to waste 4 bytes per spinlock for it. After this, we can no longer spin on any locks with preempt enabled and cannot reenable interrupts when spinning on an irq safe lock, because at that point we have already taken a ticket and the would deadlock if the same CPU tries to take the lock again. These are questionable anyway: if the lock happens to be called under a preempt or interrupt disabled section, then it will just have the same latency problems. The real fix is to keep critical sections short, and ensure locks are reasonably fair (which this patch does). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	17 years ago
Glauber de Oliveira Costa	75b8bb3e56	x86: change write_ldt_entry signature this patch changes the signature of write_ldt_entry. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Glauber de Oliveira Costa	014b15be30	x86: change write_gdt_entry signature. This patch changes the write_gdt_entry function signature. Instead of the old "a" and "b" parameters, it now receives a pointer to a desc_struct, and the size of the entry being handled. This is because x86_64 can have some 16-byte entries as well as 8-byte ones. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Glauber de Oliveira Costa	8d947344c4	x86: change write_idt_entry signature this patch changes write_idt_entry signature. It now takes a gate_desc instead of the a and b parameters. It will allow it to be later unified between i386 and x86_64. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Glauber de Oliveira Costa	6b68f01baa	x86: unify struct desc_ptr This patch unifies struct desc_ptr between i386 and x86_64. They can be expressed in the exact same way in C code, only having to change the name of one of them. As Xgt_desc_struct is ugly and big, this is the one that goes away. There's also a padding field in i386, but it is not really needed in the C structure definition. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Glauber de Oliveira Costa	c9dcda5ce4	x86: change write msr functions interface This patche changes the native_write_msr() and friends interface to explicitly take 2 32-bit registers instead of a 64-bit value. The change will ease the merge with 64-bit code. As the 64-bit value will be passed as two registers anyway in i386, the PVOP_CALL interface has to account for that and use low/high parameters It would force the x86_64 version to be different. The change does not make i386 generated code less efficient. As said above, it would get the values from two registers anyway. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Glauber de Oliveira Costa	b8d1fae7db	x86: change rdpmc interface the rdpmc instruction gets a counter argument in rcx. However, the i386 version was ignoring it. To make both x86_64 and i386 versions the same, as well as to comply with the instruction semantics, this parameter is added in the i386 version Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
H. Peter Anvin	faca62273b	x86: use generic register name in the thread and tss structures This changes size-specific register names (eip/rip, esp/rsp, etc.) to generic names in the thread and tss structures. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Glauber de Oliveira Costa	6abcd98ffa	x86: irqflags consolidation This patch consolidates the irqflags include files containing common paravirt definitions. The native definition for interrupt handling, halt, and such, are the same for 32 and 64 bit, and they are kept in irqflags.h. the differences are split in the arch-specific files. The syscall function, irq_enable_sysexit, has a very specific i386 naming, and its name is then changed to a more general one. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	17 years ago
Thomas Gleixner	42e0a9aa5d	x86: use u32 for some lapic functions Use u32 so 32 and 64bit have the same interface. Andrew Morton: xen, lguest build fixes Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	17 years ago
Jeremy Fitzhardinge	8965c1c095	paravirt: clean up lazy mode handling Currently, the set_lazy_mode pv_op is overloaded with 5 functions: 1. enter lazy cpu mode 2. leave lazy cpu mode 3. enter lazy mmu mode 4. leave lazy mmu mode 5. flush pending batched operations This complicates each paravirt backend, since it needs to deal with all the possible state transitions, handling flushing, etc. In particular, flushing is quite distinct from the other 4 functions, and seems to just cause complication. This patch removes the set_lazy_mode operation, and adds "enter" and "leave" lazy mode operations on mmu_ops and cpu_ops. All the logic associated with enter and leaving lazy states is now in common code (basically BUG_ONs to make sure that no mode is current when entering a lazy mode, and make sure that the mode is current when leaving). Also, flush is handled in a common way, by simply leaving and re-entering the lazy mode. The result is that the Xen, lguest and VMI lazy mode implementations are much simpler. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>	18 years ago
Jeremy Fitzhardinge	93b1eab3d2	paravirt: refactor struct paravirt_ops into smaller pv_*_ops This patch refactors the paravirt_ops structure into groups of functionally related ops: pv_info - random info, rather than function entrypoints pv_init_ops - functions used at boot time (some for module_init too) pv_misc_ops - lazy mode, which didn't fit well anywhere else pv_time_ops - time-related functions pv_cpu_ops - various privileged instruction ops pv_irq_ops - operations for managing interrupt state pv_apic_ops - APIC operations pv_mmu_ops - operations for managing pagetables There are several motivations for this: 1. Some of these ops will be general to all x86, and some will be i386/x86-64 specific. This makes it easier to share common stuff while allowing separate implementations where needed. 2. At the moment we must export all of paravirt_ops, but modules only need selected parts of it. This allows us to export on a case by case basis (and also choose which export license we want to apply). 3. Functional groupings make things a bit more readable. Struct paravirt_ops is now only used as a template to generate patch-site identifiers, and to extract function pointers for inserting into jmp/calls when patching. It is only instantiated when needed. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>	18 years ago
Thomas Gleixner	96a388de5d	i386/x86_64: move headers to include/asm-x86 Move the headers to include/asm-x86 and fixup the header install make rules Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	18 years ago
Andi Kleen	ab144f5ec6	i386: Make patching more robust, fix paravirt issue Commit `19d36ccdc3` "x86: Fix alternatives and kprobes to remap write-protected kernel text" uses code which is being patched for patching. In particular, paravirt_ops does patching in two stages: first it calls paravirt_ops.patch, then it fills any remaining instructions with nop_out(). nop_out calls text_poke() which calls lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val): that call site is one of the places we patch. If we always do patching as one single call to text_poke(), we only need make sure we're not patching the memcpy in text_poke itself. This means the prototype to paravirt_ops.patch needs to change, to marshal the new code into a buffer rather than patching in place as it does now. It also means all patching goes through text_poke(), which is known to be safe (apply_alternatives is also changed to make a single patch). AK: fix compilation on x86-64 (bad rusty!) AK: fix boot on x86-64 (sigh) AK: merged with other patches Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Jeremy Fitzhardinge	688340ea34	Add a sched_clock paravirt_op The tsc-based get_scheduled_cycles interface is not a good match for Xen's runstate accounting, which reports everything in nanoseconds. This patch replaces this interface with a sched_clock interface, which matches both Xen and VMI's requirements. In order to do this, we: 1. replace get_scheduled_cycles with sched_clock 2. hoist cycles_2_ns into a common header 3. update vmi accordingly One thing to note: because sched_clock is implemented as a weak function in kernel/sched.c, we must define a real function in order to override this weak binding. This means the usual paravirt_ops technique of using an inline function won't work in this case. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Cc: john stultz <johnstul@us.ibm.com>	18 years ago
Jeremy Fitzhardinge	d572929cdd	paravirt: helper to disable all IO space In a virtual environment, device drivers such as legacy IDE will waste quite a lot of time probing for their devices which will never appear. This helper function allows a paravirt implementation to lay claim to the whole iomem and ioport space, thereby disabling all device drivers trying to claim IO resources. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Rusty Russell <rusty@rustcorp.com.au>	18 years ago
Jeremy Fitzhardinge	6996d3b63f	paravirt: add a hook for once the allocator is ready Add a hook so that the paravirt backend knows when the allocator is ready. This is useful for the obvious reason that the allocator is available, but the other side-effect of having the bootmem allocator available is that each page now has an associated "struct page". Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	18 years ago
Jeremy Fitzhardinge	fdb4c338c8	paravirt: add an "mm" argument to alloc_pt It's useful to know which mm is allocating a pagetable. Xen uses this to determine whether the pagetable being added to is pinned or not. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	18 years ago
Björn Steinbrink	b9e3614f44	fix nmi_watchdog=2 bootup hang wrmsrl() is broken, dropping the upper 32bits of the value to be written. This broke the NMI watchdog on AMD hardware. (and it probably broke other code too.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Eric W. Biederman	5a18c92aab	Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization" This reverts commit `c9ccf30d77`. Entering the kernel at startup_32 without passing our real mode data in %esi, and without guaranteeing that physical and virtual addresses are identity mapped makes head.S impossible to maintain. The only user of this infrastructure is lguest which is not merged so nothing we currently support will break by removing this over designed nightmare, and only the pending lguest patches will be affected. The pending Xen patches have a different entry point that they use. We are currently discussing what Xen and lguest need to do to boot the kernel in a more normal fashion so using startup_32 in this weird manner is clearly not their long term direction. So let's remove this code in head.S before it causes brain damage to people trying to maintain head.S Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Zachary Amsden <zach@vmware.com> CC: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
H. Peter Anvin	29bd443377	i386: remove unused rdtsc() macro All users to the two-part rdtsc() macro have already switched to using rdtscl() or rdtscll(). Remove the now-obsolete macro. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Jeremy Fitzhardinge	4cdd9c8931	[PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear In shadow mode hypervisors, ptep_get_and_clear achieves the desired purpose of keeping the shadows in sync by issuing a native_get_and_clear, followed by a call to pte_update, which indicates the PTE has been modified. Direct mode hypervisors (Xen) have no need for this anyway, and will trap the update using writable pagetables. This means no hypervisor makes use of ptep_get_and_clear; there is no reason to have it in the paravirt-ops structure. Change confusing terminology about raw vs. native functions into consistent use of native_pte_xxx for operations which do not invoke paravirt-ops. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de>	18 years ago
Jeremy Fitzhardinge	1a45b7aaa5	[PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers Replace all the open-coded macros for generating calls with a pair of more general macros (__PVOP_CALL/VCALL), and redefine all the PVOP_V?CALL[0-4] in terms of them. [ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h" (paravirt_ops-document-asm-i386-paravirth.patch) ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu>	18 years ago
Jeremy Fitzhardinge	4e0fa85602	[PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi Remove #defines, add enum for PARAVIRT_LAZY_FLUSH. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	18 years ago
Jeremy Fitzhardinge	ce6234b529	[PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages Xen and VMI both have special requirements when mapping a highmem pte page into the kernel address space. These can be dealt with by adding a new kmap_atomic_pte() function for mapping highptes, and hooking it into the paravirt_ops infrastructure. Xen specifically wants to map the pte page RO, so this patch exposes a helper function, kmap_atomic_prot, which maps the page with the specified page protections. This also adds a kmap_flush_unused() function to clear out the cached kmap mappings. Xen needs this to clear out any potential stray RW mappings of pages which will become part of a pagetable. [ Zach - vmi.c will need some attention after this patch. It wasn't immediately obvious to me what needs to be done. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>	18 years ago
Jeremy Fitzhardinge	a27fe809b8	[PATCH] i386: PARAVIRT: revert map_pt_hook. Back out the map_pt_hook to clear the way for kmap_atomic_pte. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>	18 years ago
Jeremy Fitzhardinge	d4c104771a	[PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op This patch adds a pv_op for flush_tlb_others. Linux running on native hardware uses cross-CPU IPIs to flush the TLB on any CPU which may have a particular mm's pagetable entries cached in its TLB. This is inefficient in a paravirtualized environment, since the hypervisor knows which real CPUs actually contain cached mappings, which may be a small subset of a guest's VCPUs. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	18 years ago
Jeremy Fitzhardinge	63f70270cc	[PATCH] i386: PARAVIRT: add common patching machinery Implement the actual patching machinery. paravirt_patch_default() contains the logic to automatically patch a callsite based on a few simple rules: - if the paravirt_op function is paravirt_nop, then patch nops - if the paravirt_op function is a jmp target, then jmp to it - if the paravirt_op function is callable and doesn't clobber too much for the callsite, call it directly paravirt_patch_default is suitable as a default implementation of paravirt_ops.patch, will remove most of the expensive indirect calls in favour of either a direct call or a pile of nops. Backends may implement their own patcher, however. There are several helper functions to help with this: paravirt_patch_nop nop out a callsite paravirt_patch_ignore leave the callsite as-is paravirt_patch_call patch a call if the caller and callee have compatible clobbers paravirt_patch_jmp patch in a jmp paravirt_patch_insns patch some literal instructions over the callsite, if they fit This patch also implements more direct patches for the native case, so that when running on native hardware many common operations are implemented inline. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws> Acked-by: Ingo Molnar <mingo@elte.hu>	18 years ago
Jeremy Fitzhardinge	294688c028	[PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h Clean things up, and broadly document: - the paravirt_ops functions themselves - the patching mechanism Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au>	18 years ago
Jeremy Fitzhardinge	f8822f4201	[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable Wrap a set of interesting paravirt_ops calls in a wrapper which makes the callsites available for patching. Unfortunately this is pretty ugly because there's no way to get gcc to generate a function call, but also wrap just the callsite itself with the necessary labels. This patch supports functions with 0-4 arguments, and either void or returning a value. 64-bit arguments must be split into a pair of 32-bit arguments (lower word first). Small structures are returned in registers. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws>	18 years ago
Jeremy Fitzhardinge	42c24fa22e	[PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register Fix a few clobbers to include the return register. The clobbers set is the set of all registers modified (or may be modified) by the code snippet, regardless of whether it was deliberate or accidental. Also, make sure that callsites which are used in contexts which don't allow clobbers actually save and restore all clobberable registers. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	18 years ago
Jeremy Fitzhardinge	d582203578	[PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure Use patch type identifiers derived from the offset of the operation in the paravirt_ops structure. This avoids having to maintain a separate enum for patch site types. Also, since the identifier is derived from the offset into paravirt_ops, the offset can be derived from the identifier. This is used to remove replicated information in the various callsite macros, which has been a source of bugs in the past. This patch also drops the fused save_fl+cli operation, which doesn't really add much and makes things more complex - specifically because it breaks the 1:1 relationship between identifiers and offsets. If this operation turns out to be particularly beneficial, then the right answer is to define a new entrypoint for it. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	18 years ago
Jeremy Fitzhardinge	98de032b68	[PATCH] i386: PARAVIRT: rename struct paravirt_patch to paravirt_patch_site for clarity Rename struct paravirt_patch to paravirt_patch_site, so that it clearly refers to a callsite, and not the patch which may be applied to that callsite. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	18 years ago
Jeremy Fitzhardinge	d6dd61c831	[PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction Add hooks to allow a paravirt implementation to track the lifetime of an mm. Paravirtualization requires three hooks, but only two are needed in common code. They are: arch_dup_mmap, which is called when a new mmap is created at fork arch_exit_mmap, which is called when the last process reference to an mm is dropped, which typically happens on exit and exec. The third hook is activate_mm, which is called from the arch-specific activate_mm() macro/function, and so doesn't need stub versions for other architectures. It's called when an mm is first used. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: linux-arch@vger.kernel.org Cc: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: Ingo Molnar <mingo@elte.hu>	18 years ago
Jeremy Fitzhardinge	5311ab62cd	[PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing Normally when running in PAE mode, the 4th PMD maps the kernel address space, which can be shared among all processes (since they all need the same kernel mappings). Xen, however, does not allow guests to have the kernel pmd shared between page tables, so parameterize pgtable.c to allow both modes of operation. There are several side-effects of this. One is that vmalloc will update the kernel address space mappings, and those updates need to be propagated into all processes if the kernel mappings are not intrinsically shared. In the non-PAE case, this is done by maintaining a pgd_list of all processes; this list is used when all process pagetables must be updated. pgd_list is threaded via otherwise unused entries in the page structure for the pgd, which means that the pgd must be page-sized for this to work. Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE pgd to page aligned anyway, so this patch forces the pgd to be page aligned+sized when the kernel pmd is unshared, to accomodate both these requirements. Also, since there may be several distinct kernel pmds (if the user/kernel split is below 3G), there's no point in allocating them from a slab cache; they're just allocated with get_free_page and initialized appropriately. (Of course the could be cached if there is just a single kernel pmd - which is the default with a 3G user/kernel split - but it doesn't seem worthwhile to add yet another case into this code). [ Many thanks to wli for review comments. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	18 years ago
Jeremy Fitzhardinge	b239fb2501	[PATCH] i386: PARAVIRT: Hooks to set up initial pagetable This patch introduces paravirt_ops hooks to control how the kernel's initial pagetable is set up. In the case of a native boot, the very early bootstrap code creates a simple non-PAE pagetable to map the kernel and physical memory. When the VM subsystem is initialized, it creates a proper pagetable which respects the PAE mode, large pages, etc. When booting under a hypervisor, there are many possibilities for what paging environment the hypervisor establishes for the guest kernel, so the constructon of the kernel's pagetable depends on the hypervisor. In the case of Xen, the hypervisor boots the kernel with a fully constructed pagetable, which is already using PAE if necessary. Also, Xen requires particular care when constructing pagetables to make sure all pagetables are always mapped read-only. In order to make this easier, kernel's initial pagetable construction has been changed to only allocate and initialize a pagetable page if there's no page already present in the pagetable. This allows the Xen paravirt backend to make a copy of the hypervisor-provided pagetable, allowing the kernel to establish any more mappings it needs while keeping the existing ones. A slightly subtle point which is worth highlighting here is that Xen requires all kernel mappings to share the same pte_t pages between all pagetables, so that updating a kernel page's mapping in one pagetable is reflected in all other pagetables. This makes it possible to allocate a page and attach it to a pagetable without having to explicitly enumerate that page's mapping in all pagetables. And: +From: "Eric W. Biederman" <ebiederm@xmission.com> If we don't set the leaf page table entries it is quite possible that will inherit and incorrect page table entry from the initial boot page table setup in head.S. So we need to redo the effort here, so we pick up PSE, PGE and the like. Hypervisors like Xen require that their page tables be read-only, which is slightly incompatible with our low identity mappings, however I discussed this with Jeremy he has modified the Xen early set_pte function to avoid problems in this area. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>	18 years ago
Jeremy Fitzhardinge	3dc494e86d	[PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries Add a set of accessors to pack, unpack and modify page table entries (at all levels). This allows a paravirt implementation to control the contents of pgd/pmd/pte entries. For example, Xen uses this to convert the (pseudo-)physical address into a machine address when populating a pagetable entry, and converting back to pphys address when an entry is read. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu>	18 years ago
Jeremy Fitzhardinge	4587623360	[PATCH] i386: PARAVIRT: use paravirt_nop to consistently mark no-op operations Add a _paravirt_nop function for use as a stub for no-op operations, and paravirt_nop #defined void * version to make using it easier (since all its uses are as a void ). This is useful to allow the patcher to automatically identify noop operations so it can simply nop out the callsite. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> [mingo] but only as a cleanup of the current open-coded (void ) casts. My problem with this is that it loses the types. Not that there is much to check for, but still, this adds some assumptions about how function calls look like	18 years ago
Rusty Russell	90a0a06aa8	[PATCH] i386: rationalize paravirt wrappers paravirt.c used to implement native versions of all low-level functions. Far cleaner is to have the native versions exposed in the headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply #define XXX native_XXX. There are several nice side effects: 1) write_dt_entry() now takes the correct "struct Xgt_desc_struct " not "void ". 2) load_TLS is reintroduced to the for loop, not manually unrolled with a #error in case the bounds ever change. 3) Macros become inlines, with type checking. 4) Access to the native versions is trivial for KVM, lguest, Xen and others who might want it. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@muc.de> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	18 years ago
Zachary Amsden	49f1971051	[PATCH] Proper fix for highmem kmap_atomic functions for VMI for 2.6.21 Since lazy MMU batching mode still allows interrupts to enter, it is possible for interrupt handlers to try to use kmap_atomic, which fails when lazy mode is active, since the PTE update to highmem will be delayed. The best workaround is to issue an explicit flush in kmap_atomic_functions case; this is the only way nested PTE updates can happen in the interrupt handler. Thanks to Jeremy Fitzhardinge for noting the bug and suggestions on a fix. This patch gets reverted again when we start 2.6.22 and the bug gets fixed differently. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: Andi Kleen <ak@muc.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Al Viro	192cd59bd9	[PATCH] fastcall still doesn't make sense in paravirt Andi had removed a bunch of those, but one more had creeped in... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Zachary Amsden	e30fab3ad3	[PATCH] vmi: pit override The time_init_hook in paravirt-ops no longer functions in the correct manner after the integration of the hrtimers code. The problem is that now the call path for time initialization is: time_init : late_time_init = hpet_time_init; late_time_init -> hpet_time_init: setup_pit_timer (BAD) do_time_init --> (via paravirt.h) time_init_hook --> (via arch_hooks.h) time_init_hook (in SUBARCH/setup.c) If this isn't confusing enough, the paravirt case goes through an indirect function pointer in the paravirt-ops table. The problem is, by the time the paravirt hook is called, the pit timer is already enabled. But paravirt guests have their own timer, and don't want to use the PIT. Rather than intensify the struggle for power going on here, just make it all nice and simple and just unconditionally do all timer setup in the late_time_init hook. This also has the advantage of enabling timers in the same place in all code paths, so everyone has the same bugs and we don't have outliers who break other code because they turn on timer too early or too late. So the paravirt-ops time init function is now by default hpet_time_init, which is the time init function used for native hardware. Paravirt guests have the chance to override this when they setup the paravirt-ops table, and should need no change. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Zachary Amsden	eda08b1bef	[PATCH] vmi: paravirt drop udelay op Not respecting udelay causes problems with any virtual hardware that is passed through to real hardware. This can be noticed by any device that interacts with the real world in real time - like AP startup, which takes real time. Or keyboard LEDs, which should blink in real-time. Or floppy drives, but only when passed through to a real floppy controller on OSes which can't sufficiently buffer the floppy commands to emulate a zero latency floppy. Or IDE drives, when connecting to a physical CDROM. This was mostly a hack to get the kernel to boot faster, but it introduced a number of misvirtualization bugs, and Alan and Pavel argued pretty strongly against it. We were the only client, and now want to clean up this cruft. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Zachary Amsden	9a1c13e91f	[PATCH] vmi: fix highpte Provide a PT map hook for HIGHPTE kernels to designate where they are mapping page tables. This information is required so the physical address of PTE updates can be determined; otherwise, the mm layer would have to carry the physical address all the way to each PTE modification callsite, which is even more hideous that the macros required to provide the proper hooks. So lets not mess up arch neutral code to achieve this, but keep the horror in an #ifdef HIGHPTE in include/asm-i386/pgtable.h. I had to use macros here because some types are not yet defined in all the include paths for this header. This patch is absolutely required for HIGHPTE kernels to operate properly with VMI. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Zachary Amsden	1182d8528b	[PATCH] vmi: cpu cycles fix In order to share the common code in tsc.c which does CPU Khz calibration, we need to make an accurate value of CPU speed available to the tsc.c code. This value loses a lot of precision in a VM because of the timing differences with real hardware, but we need it to be as precise as possible so the guest can make accurate time calculations with the cycle counters. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Zachary Amsden	6cb9a8350a	[PATCH] vmi: sched clock paravirt op fix The custom_sched_clock hook is broken. The result from sched_clock needs to be in nanoseconds, not in CPU cycles. The TSC is insufficient for this purpose, because TSC is poorly defined in a virtual environment, and mostly represents real world time instead of scheduled process time (which can be interrupted without notice when a virtual machine is descheduled). To make the scheduler consistent, we must expose a different nature of time, that is scheduled time. So deprecate this custom_sched_clock hack and turn it into a paravirt-op, as it should have been all along. This allows the tsc.c code which converts cycles to nanoseconds to be shared by all paravirt-ops backends. It is unfortunate to add a new paravirt-op, but this is a very distinct abstraction which is clearly different for all virtual machine implementations, and it gets rid of an ugly indirect function which I ashamedly admit I hacked in to try to get this to work earlier, and then even got in the wrong units. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	18 years ago
Andi Kleen	1a1eecd1c2	[PATCH] i386: Remove fastcall in paravirt.[ch] Not needed because fastcall is always default now Signed-off-by: Andi Kleen <ak@suse.de>	18 years ago
Zachary Amsden	bbab4f3bb7	[PATCH] i386: vMI timer patches VMI timer code. It works by taking over the local APIC clock when APIC is configured, which requires a couple hooks into the APIC code. The backend timer code could be commonized into the timer infrastructure, but there are some pieces missing (stolen time, in particular), and the exact semantics of when to do accounting for NO_IDLE need to be shared between different hypervisors as well. So for now, VMI timer is a separate module. [Adrian Bunk: cleanups] Subject: VMI timer patches Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>	18 years ago

15 Commits (f72a9ef979c5a828c64deb88ebba743f7d899907)