As with other functions, batch the CPU data cache flushes and don't keep
recalculating PTE addresses.
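For illustration only (not the actual patch), the batched pattern looks
roughly like this, assuming a contiguous run of PTEs and the existing
domain_flush_cache() helper; pfn_to_pte() is an assumed name:

    /* Illustrative sketch only: clear a run of PTEs with a moving pointer
     * instead of recalculating the PTE address each iteration, then flush
     * the CPU data cache once for the whole run rather than per PTE. */
    static void clear_pte_run(struct dmar_domain *domain,
                              unsigned long start_pfn, unsigned long last_pfn)
    {
        struct dma_pte *first_pte, *pte;

        first_pte = pte = pfn_to_pte(domain, start_pfn);   /* assumed helper */
        while (start_pfn <= last_pfn) {
            dma_clear_pte(pte);
            pte++;
            start_pfn++;
        }
        domain_flush_cache(domain, first_pte,
                           (void *)pte - (void *)first_pte);
    }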
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The loop condition was wrong -- we should free a PMD only if its
_entire_ range is within the range we're intending to clear. The
early-termination condition was right, but not the loop.
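As an illustrative sketch of the distinction (helper and variable names
assumed, not the actual patch):

    /* 'tmp' is the first pfn covered by this page-directory entry and
     * level_size(level) the number of pfns it spans.  The table below it
     * may be freed only when that whole span lies inside
     * [start_pfn, last_pfn], not merely when it starts there. */
    if (tmp >= start_pfn && tmp + level_size(level) - 1 <= last_pfn) {
        dma_clear_pte(pte);                 /* unhook the lower table */
        free_pgtable_page(child_table);     /* 'child_table' assumed  */
    }
    if (tmp > last_pfn)
        break;                              /* early termination: unchanged */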
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Instead of calling domain_pfn_mapping() repeatedly with single or
small numbers of pages, just pass the sglist in. It can optimise the
number of cache flushes like domain_pfn_mapping() does, and gives a huge
speedup for large scatterlists.
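A hypothetical sketch of the idea (the helpers named below are assumed,
not the new function itself):

    /* Walk the scatterlist once, writing PTEs per segment, then flush the
     * whole run of PTEs in one go instead of flushing inside every
     * domain_pfn_mapping() call. */
    struct scatterlist *sg;
    unsigned long pfn = start_pfn;
    int i;

    for_each_sg(sglist, sg, nelems, i) {
        unsigned long npages = (sg->offset + sg->length + VTD_PAGE_SIZE - 1)
                                   >> VTD_PAGE_SHIFT;

        write_ptes(domain, pfn, sg_phys(sg), npages, prot);   /* assumed */
        pfn += npages;
    }
    flush_pte_range(domain, start_pfn, pfn);                  /* assumed */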
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There's no need for the separate iommu_alloc_iova() function, and
certainly not for it to be global. Remove the underscores while we're at
it.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
As with dma_pte_clear_range(), don't keep flushing a single PTE at a
time. And also micro-optimise the setting of PTE values rather than
using the helper functions to do all the masking.
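Roughly, the micro-optimisation amounts to computing the PTE value once
and stepping it, instead of calling the masking helpers on every page
(sketch only):

    /* Build the PTE value once, then just store and advance it, rather
     * than calling the dma_set_pte_*() helpers on every iteration. */
    u64 pteval = ((u64)phys_pfn << VTD_PAGE_SHIFT) | prot;

    while (nr_pages--) {
        pte->val = pteval;           /* direct store, no masking helpers */
        pte++;
        pteval += VTD_PAGE_SIZE;     /* next 4KiB VT-d page */
    }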
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It's a bit silly to repeatedly call domain_flush_cache() for each PTE
individually, as we clear it. Instead, batch them up and flush a whole
range at a time. We might as well refrain from recalculating the PTE
address from scratch each time round the loop too.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This is fairly broken anyway -- it doesn't take hotplug into account.
We should probably be checking page_is_ram() instead.
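A minimal sketch of the sort of check meant here, assuming we iterate
system pfns while building the 1:1 map (the mapping helper below is
hypothetical):

    /* page_is_ram() asks the kernel's own view of memory instead of
     * open-coding a raw e820 walk here. */
    if (page_is_ram(pfn))
        identity_map_pfn(domain, pfn);   /* hypothetical helper */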
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Most of its callers are having to shift for themselves anyway, so we might
as well do it in iommu_flush_iotlb_psi().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
... and use it in the trivial cases; the other callers want individual
(and bisectable) attention, since I screwed them up the first time...
Make the BUG_ON() happen on too-large virtual address rather than
physical address, too. That's the one we care about.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use unaligned address for domain->max_addr. That algorithm isn't ideal
anyway -- we should probably just look at the last iova in the tree.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
With some cleanup of intel_unmap_page(), intel_unmap_sg() and
vm_domain_exit() to no longer play with 64-bit addresses.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Add some helpers for converting between VT-d and normal system pfns,
since system pages can be larger than VT-d pages.
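For example, helpers of roughly this shape (illustrative; the names in
the patch may differ), given that VT-d page tables always use 4KiB pages
while the system page size may be larger:

    /* VT-d page tables always use 4KiB (VTD_PAGE_SHIFT) pages, while the
     * system PAGE_SHIFT may be larger (e.g. 16KiB or 64KiB pages). */
    static inline unsigned long mm_to_dma_pfn(unsigned long mm_pfn)
    {
        return mm_pfn << (PAGE_SHIFT - VTD_PAGE_SHIFT);
    }

    static inline unsigned long dma_to_mm_pfn(unsigned long dma_pfn)
    {
        return dma_pfn >> (PAGE_SHIFT - VTD_PAGE_SHIFT);
    }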
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There's no need for the GFX workaround now that we have 'iommu=pt' for
the cases where people really care about performance. There's no need
for a special case for just one type of device.
This also speeds up the iommu=pt path and reduces memory usage by
setting up the si_domain _once_ and then using it for all devices,
rather than giving each device its own private page tables.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In caching mode, domain ID 0 is reserved for flushing non-present to
present mappings. The device IOTLB doesn't need to be flushed in this
case. Previously we were avoiding the flush for domain zero even when the
IOMMU wasn't in caching mode and domain zero wasn't special.
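Loosely, the intended condition is (sketch only; cap_caching_mode() is
the existing capability helper, the device-IOTLB flush name is assumed):

    /* In caching mode, DID 0 is only used to flush stale non-present ->
     * present entries; no device can have cached such a translation, so
     * the device IOTLB flush is skipped only in that specific case. */
    if (!cap_caching_mode(iommu->cap) || did != 0)
        iommu_flush_dev_iotlb(domain, addr, mask);   /* name assumed */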
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Drop the e820 scanning and use the existing function for finding valid
RAM regions to add to the 1:1 mapping.
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
To support domain-isolation usages, the platform hardware must be
capable of uniquely identifying the requestor (source-id) for each
interrupt message. Without source-id checking for interrupt remapping,
a rogue guest/VM with assigned devices can launch interrupt attacks to
bring down another guest/VM or the VMM itself.
This patch adds source-id checking for interrupt remapping, and then
really isolates interrupts for guests/VMs with assigned devices.
Because the PCI subsystem is not yet initialized when the IOAPIC entries
are set up, use read_pci_config_byte() to access PCI config space
directly.
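For illustration only (not the literal patch), using the early
config-space accessor might look like this; set_irte_source_id() is a
hypothetical name:

    /* The PCI core is not up yet, so read config space directly to find
     * out whether the function at bus/devfn is a bridge before deciding
     * which source-id to program into the IRTE. */
    u8 type = read_pci_config_byte(bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                                   PCI_HEADER_TYPE);
    if ((type & 0x7f) == PCI_HEADER_TYPE_BRIDGE)
        set_irte_source_id(irte, bus, devfn);        /* hypothetical */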
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
An interrupt remapping table entry (IRTE) is 128 bits. Currently,
modify_irte and free_irte only set the low 64 bits of the irte, so the
high 64 bits -- and with them the source-id -- are ignored. This patch
sets the whole 128 bits of the irte when modifying or freeing it. The
following source-id checking patch depends on this.
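Concretely, 'setting the whole 128 bits' means writing both 64-bit halves
(sketch; the real code may need atomic 64-bit stores):

    /* Both halves must be written: the low 64 bits hold the vector and
     * destination fields, the high 64 bits hold the source-id used for
     * verification. */
    irte->low  = new_irte->low;
    irte->high = new_irte->high;
    /* the IRTE cache flush and IEC invalidation then follow as before */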
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Identity mapping for the IOMMU defines a single domain that 1:1-maps all
PCI devices to all usable memory.
This reduces map/unmap overhead in the DMA API and improves IOMMU
performance. On 10Gb network cards, Netperf shows no performance
degradation compared to non-IOMMU performance.
This method may lose some of the DMA remapping benefits, such as
isolation.
The patch sets up identity mapping for all PCI devices to all usable
memory. In the DMA API there is then no overhead for maintaining page
tables, invalidating the IOTLB, flushing caches, etc.
32-bit DMA devices don't use the identity mapping domain; they keep
their own page tables so that memory beyond 4GiB can still be remapped
into their addressable range.
When the kernel option iommu=pt is given, pass-through is tried first.
If pass-through succeeds, the IOMMU uses pass-through; if pass-through
is not supported in hardware or fails for any reason, the IOMMU falls
back to identity mapping.
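Illustrative only (the helper name and the exact memory-top check are
assumptions): the test that keeps 32-bit-limited devices out of the
shared identity domain is conceptually:

    /* A device whose DMA mask cannot address all of physical memory must
     * not share the 1:1 si_domain (buffers above 4GiB would be outside
     * its reach), so it falls back to its own remapping page tables. */
    static int should_identity_map(struct pci_dev *pdev)
    {
        u64 ram_top = (u64)max_pfn << PAGE_SHIFT;

        if (pdev->dev.dma_mask && *pdev->dev.dma_mask < ram_top)
            return 0;       /* 32-bit-limited device: use remapping */
        return 1;           /* identity-map via si_domain */
    }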
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use msix_mask_irq() instead of direct use of writel, so as not to clear
preserved bits in the Vector Control register [31:1].
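In other words, the mask bit is updated with a read-modify-write that
preserves bits [31:1] of Vector Control, roughly (sketch; 'entry_base' is
an assumed iomem pointer to the vector table entry):

    /* Only bit 0 of Vector Control is the per-vector mask bit; the rest
     * of the register is reserved/preserved, so never write it blindly. */
    u32 ctrl = readl(entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL);

    ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
    if (mask)
        ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
    writel(ctrl, entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL);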
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The previous MSI-X fix (8d18101853) had three bugs. First, it didn't
move the write that disabled the vector, which led to writing garbage to
the MSI-X vector (spotted by Michael Ellerman). Second, it didn't fix
the PCI resume case. Third, it had a race window where the device could
generate an interrupt before the MSI-X registers were programmed
(leading to a DMA to random addresses).
Fortunately, the MSI-X capability has a bit to mask all the vectors.
By setting this bit instead of clearing the enable bit, we can ensure
the device will not generate spurious interrupts. Since the capability
is now enabled, the NIU device will not have a problem with the reads
and writes to the MSI-X registers being in the original order in the code.
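Conceptually, the mask-all approach amounts to (sketch; 'pos' is the
MSI-X capability offset):

    /* Keep MSI-X enabled but set Function Mask so the device cannot
     * raise any vector while the table is being (re)programmed. */
    u16 ctrl;

    pci_read_config_word(dev, pos + PCI_MSIX_FLAGS, &ctrl);
    ctrl |= PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL;
    pci_write_config_word(dev, pos + PCI_MSIX_FLAGS, ctrl);

    /* the vector table is programmed here, then MASKALL is cleared */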
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
By keeping a pointer to the root port link, we can remove the loops in
get_root_port_link() that search for it.
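I.e. the root pointer is filled in once when the link state is allocated,
roughly (sketch):

    /* A root port has no parent link; everything below it just inherits
     * the cached pointer, so get_root_port_link() no longer has to loop. */
    if (!link->parent)
        link->root = link;
    else
        link->root = link->parent->root;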
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We don't need the 'has_switch' field in the struct pcie_link_state.
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cleanup for calc_L0S_latency() and calc_L1_latency().
- Separate exit latency and acceptable latency calculation.
- Some minor cleanups.
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In the current ASPM implementation, callers of pcie_set_clock_pm() check
the Clock PM capability of the link or the current Clock PM state of the
link. This check should be done in pcie_set_clock_pm() itself.
This patch moves those checks into pcie_set_clock_pm(). It also
introduces pcie_set_clkpm_nocheck(), equivalent to the old
pcie_set_clock_pm(), for callers who want to change the Clock PM state
regardless of the Clock PM capability or current Clock PM state. In
addition, this patch renames pcie_set_clock_pm() to pcie_set_clkpm() for
consistency.
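The resulting split looks roughly like this (sketch; field names follow
the ones used later in this series):

    /* Checked entry point: bail out if the link has no Clock PM
     * capability or is already in the requested state, otherwise do the
     * raw update. */
    static void pcie_set_clkpm(struct pcie_link_state *link, int enable)
    {
        if (!link->clk_pm_capable || link->clk_pm_enable == enable)
            return;
        pcie_set_clkpm_nocheck(link, enable);
    }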
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Clean up ASPM initialization by refactoring some functionality, renaming
functions, and moving things around.
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In the current ASPM implementation, there are many functions that
take a pointer to struct pci_dev corresponding to the upstream component
of the link as a parameter. But, since those functions handle PCI
express link state, a pointer to struct pcie_link_state is more
suitable than a pointer to struct pci_dev. Changing a parameter to a
pointer to struct pcie_link_state makes ASPM code much simpler and
easier to read. This patch also contains some minor cleanups. This patch
doesn't have any functional change.
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cleanup for some fields in pcie_link_state.
- Add comments.
- make "downstream_has_switch" field 1-bit.
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The "clk_pm_capable", "clk_pm_enable" and "bios_clk_state" fields in
the struct pcie_link_state only take 1-bit value. So those fields
don't need to be defined as unsigned int. This patch makes those
fields 1-bit, and cleans up some related code.
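I.e. the fields become single-bit bitfields in struct pcie_link_state
(sketch):

    /* Each flag only ever holds 0 or 1, so a 1-bit bitfield is enough. */
    unsigned int clk_pm_capable:1;   /* link supports Clock PM */
    unsigned int clk_pm_enable:1;    /* Clock PM currently enabled */
    unsigned int bios_clk_state:1;   /* Clock PM state set up by BIOS */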
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Clean up latency related data structures for ASPM.
- Introduce struct aspm_latency for exit latency and acceptable
latency management (see the sketch below). With this change, struct
endpoint_state is no longer needed.
- We don't need to hold both upstream latency and downstream latency
in the current implementation.
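The new structure is essentially just an L0s/L1 pair used for both the
exit-latency and acceptable-latency roles (sketch):

    /* One latency pair serves both the exit-latency and the
     * acceptable-latency roles, which is what makes the old
     * struct endpoint_state unnecessary. */
    struct aspm_latency {
        u32 l0s;   /* L0s latency (nsec) */
        u32 l1;    /* L1 latency (nsec)  */
    };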
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>