This patch provides a shim between the kernel-internal cleancache
API (see Documentation/mm/cleancache.txt) and the Xen Transcendent
Memory ABI (see http://oss.oracle.com/projects/tmem).
Xen tmem provides "hypervisor RAM" as an ephemeral page-oriented
pseudo-RAM store for cleancache pages, shared cleancache pages,
and frontswap pages. Tmem provides enterprise-quality concurrency,
full save/restore and live migration support, compression
and deduplication.
A presentation showing up to 8% faster performance and up to 52%
reduction in sectors read on a kernel compile workload, despite
aggressive in-kernel page reclamation ("self-ballooning") can be
found at:
http://oss.oracle.com/projects/tmem/dist/documentation/presentations/TranscendentMemoryXenSummit2010.pdf
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
We only supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings. Meaning that we grants
operations would "contain the machine address of the PTE to update"
If the flag is unset, then the grant operation is
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE which resulted in Xen hypervisor
being upset with us:
(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228 x86_64 debug=y Not tainted ]----
and crashing us.
This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.
On the m2p_remove_override path we provide the same parameter.
Sadly in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTE's. Hence the implementation for
it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.
[v1: Added documentation details, made it return -EOPNOTSUPP instead
of trying to do a half-way implementation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Various merges over time have led to rather a mish-mash of indentation.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel@lists.xensource.com
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them. Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly. This reduces kernel code size quite a bit and reduces
its complexity.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Change the irq handler of evtchns and pirqs that don't need EOI (pirqs
that correspond to physical edge interrupts) to handle_edge_irq.
Use handle_fasteoi_irq for pirqs that need eoi (they generally
correspond to level triggered irqs), no risk in loosing interrupts
because we have to EOI the irq anyway.
This change has the following benefits:
- it uses the very same handlers that Linux would use on native for the
same irqs (handle_edge_irq for edge irqs and msis, and
handle_fasteoi_irq for everything else);
- it uses these handlers in the same way native code would use them: it
let Linux mask\unmask and ack the irq when Linux want to mask\unmask
and ack the irq;
- it fixes a problem occurring when a driver calls disable_irq() in its
handler: the old code was unconditionally unmasking the evtchn even if
the irq is disabled when irq_eoi was called.
See Documentation/DocBook/genericirq.tmpl for more informations.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Fixed space/tab issues]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Device suspend/resume infrastructure is used not only by the suspend
and hibernate code in kernel/power, but also by APM, Xen and the
kexec jump feature. However, commit 40dc166cb5
(PM / Core: Introduce struct syscore_ops for core subsystems PM)
failed to add syscore_suspend() and syscore_resume() calls to that
code, which generally leads to breakage when the features in question
are used.
To fix this problem, add the missing syscore_suspend() and
syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c
and drivers/xen/manage.c.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
They were used to check if the queue does not have QUEUE_FLAG_DEAD
set. That is not necessary anymore as the 'submit_io' call
ends up doing that for us.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
After the commit 0faa8cca88
(" xen/blkback: remove per-queue plugging") we forgot
to retrieve the 'struct request_queue' from the block device.
This puts the functionality back in and fixes a NULL pointer
bug.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The commit 976222e05e
xen/blkback: Move the check for misaligned I/O higher.
moved it a bit to high. The preq->vbdev was not set, so the
check for misaligned I/O would cause a NULL pointer derefence.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We only supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings. Meaning that we grants
operations would "contain the machine address of the PTE to update"
If the flag is unset, then the grant operation is
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE which resulted in Xen hypervisor
being upset with us:
(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228 x86_64 debug=y Not tainted ]----
and crashing us.
This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.
On the m2p_remove_override path we provide the same parameter.
Sadly in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTE's. Hence the implementation for
it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.
[v1: Added documentation details, made it return -EOPNOTSUPP instead
of trying to do a half-way implementation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The previous name ('fast_flush_area') had nothing to do with what it
does right now. Changing the names so that the code dealing with
mapping pages in and out of the guest is called xen_blkbk_[map|unmap].
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We move it up higher to be in same loop that actually computes
the sector number.
This way, all of the code that deals with verifying that the
request is correct is all done before we do any of the
page mapping, I/O submission, etc.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We take out the chunk of code dealing with mapping to the guest
of pages into the xen_blk_map_buf code. And we also move the
vbd_translate to be done much earlier.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Moving it so that the code that 'fast_flush_area' code is
close to the code that deals with it so that the reader
won't lose focus.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We seperate the bio allocation (bio_alloc) from the bio submission so
that the error paths are much easier, and also so that the bio
submission can be done in one tight loop. It also makes the
plug/unplug calls much much easier.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
commit 7eaceaccab ("block: remove per-queue plugging")
added two new interfaces to plug and unplug: blk_start_plug
and blk_finish_plug. Lets use those.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+ sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring_area->addr;
WARNING: line over 80 characters
+ BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE);
as breaking them up really does not help that much.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
checkpatch.pl suggested that we don't use the typdef in common.h
and this triggered this avalanche of patches.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The WRITE_BARRIER was missing the REQ_WRITE option. This
was causing the blktap to die.
Signed-off-by: Tom Goetz <tom.goetz@virtualcomputer.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The patch titled:"xen/blkback: Use 'vzalloc' for page arrays and pre-allocate pages."
allocates the structures and its member variables using the 'vzalloc'.
Daniel Stodden pointed out that vzalloc is good when we use
big number of pages - while these are at the max two pages.
We can do this using kzalloc. Also the GFP_HIGHMEM does not
work properly with Xen, so take that out.
We will have to revisit this when a "get_empty_pages_and_pagevec"
type API shows up to leverage that.
BugLink: http://mid.gmane.org/1299898639.11681.227.camel@agari.van.xensource.com
CC: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Instead of doing copy grants lets do mapping grants using
the M2P(and P2M) override mechanism.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
drivers/xen/blkback/blkback.c
Previously we would allocate the array for page using 'kmalloc' which
we can as easily do with 'vzalloc'. The pre-allocation of pages
was done a bit differently in the past - it used to be that
the balloon driver would export "alloc_empty_pages_and_pagevec"
which would have in one function created an array, allocated
the pages, balloned the pages out (so the memory behind those
pages would be non-present), and provide us those pages.
This was OK as those pages were shared between other guest and
the only thing we needed was to "swizzel" the MFN of those pages
to point to the other guest MFN. We can still "swizzel" the MFNs
using the M2P (and P2M) override API calls, but for the sake of
simplicity we are dropping the balloon API calls. We can return
to those later on.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Following in the steps of patch:
"xen: Union the blkif_request request specific fields" this patch
changes the blkback. Per the original patch:
"Prepare for extending the block device ring to allow request
specific fields, by moving the request specific fields for
reads, writes and barrier requests to a union member."
Cc: Owen Smith <owen.smith@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bundle the lot of discrete variables into a single structure.
This is based on what was done in the xen-netback driver:
xen: netback: Move global/static variables into struct xen_netbk.
(094944631cc5a9d6e623302c987f78117c0bf7ac)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cherry-pick and modified from 69d64727c42eecd47fdf82c15a54474d21a4012a
("blkback/blktap2: simplify address translations"):
"There are quite a number of places where e.g. page->va->page
translations happen.
Besides yielding smaller code (source and binary), a second goal is to
make it easier to determine where virtual addresses of pages allocated
through alloc_empty_pages_and_pagevec() are really used (in turn in
order to determine whether using highmem pages would be possible
there)."
The second goal is not the purpose of this patch - it is just to
make it easier to read the code.
linux-2.6-pvops:
* Stripped drivers/xen/gntdev/*
* Stripped drivers/xen/netback/*
[v2: Stripped blktap off]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
A guest can cause the backend driver to leak a kernel thread. Such
leaked threads hold references to the device, whichmakes the device
impossible to tear down. If shut down, the guest remains a zombie
domain, the xenwatch process hangs, and most xm commands will stop
working.
This patch tries to do the following for blkback:
- identify/extract idempotent teardown operations,
- add/move the invocation of said teardown operation
right before we're about to allocate new resources in the
Connected states.
[ linux-2.6.18-xen.hg 59f097ef181b ]
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
First cut at flushing blkback data when first connecting
blkback. This should avoid the pygrub issues we are experiencing
in (RedHat bugzilla) 466681.
[ 2.6.18-xen.hg commit 63b4d7f56688 ]
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Support dynamic resizing of virtual block devices. This patch supports
both file backed block devices as well as physical devices that can be
dynamically resized on the host side.
Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
blkbk is rather generic for a modular distro style kernel.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
We neglect to check the return value of xenbus_register_backend
and take actions when that fails. This patch fixes that and adds
code to deal with those type of failures.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
drivers/xen/Makefile
And if the other domain forgot to clean up its PIRQs we don't need
to fail the operation. Just take a note of it and continue on.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We need this to find the real Xen PIRQ value for a device
that requests an MSI or MSI-X. In the past we used
'xen_gsi_from_irq' since that function would return
an Xen PIRQ or GSI depending on the provided IRQ. Now that
we have seperated that we need to use the correct
function.
[v2: Deal with rebase on stable/irq.cleanup]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>