Pull x86 mm changes from Ingo Molnar:
 "PCID support, 5-level paging support, Secure Memory Encryption support

  The main changes in this cycle are support for three new, complex
  hardware features of x86 CPUs:

   - Add 5-level paging support, which is a new hardware feature on
     upcoming Intel CPUs allowing up to 128 PB of virtual address space
     and 4 PB of physical RAM space - a 512-fold increase over the old
     limits. (Supercomputers of the future forecasting hurricanes on an
     ever warming planet can certainly make good use of more RAM.)

     Many of the necessary changes went upstream in previous cycles,
     v4.14 is the first kernel that can enable 5-level paging.

     This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
     default.

     (By Kirill A. Shutemov)

   - Add 'encrypted memory' support, which is a new hardware feature on
     upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
     RAM to be encrypted and decrypted (mostly) transparently by the
     CPU, with a little help from the kernel to transition to/from
     encrypted RAM. Such RAM should be more secure against various
     attacks like RAM access via the memory bus and should make the
     radio signature of memory bus traffic harder to intercept (and
     decrypt) as well.

     This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
     by default.

     (By Tom Lendacky)

   - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
     hardware feature that attaches an address space tag to TLB entries
     and thus allows to skip TLB flushing in many cases, even if we
     switch mm's.

     (By Andy Lutomirski)

  All three of these features were in the works for a long time, and
  it's coincidence of the three independent development paths that they
  are all enabled in v4.14 at once"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
  x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
  x86/mm: Use pr_cont() in dump_pagetable()
  x86/mm: Fix SME encryption stack ptr handling
  kvm/x86: Avoid clearing the C-bit in rsvd_bits()
  x86/CPU: Align CR3 defines
  x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
  acpi, x86/mm: Remove encryption mask from ACPI page protection type
  x86/mm, kexec: Fix memory corruption with SME on successive kexecs
  x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
  x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
  x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
  x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
  x86/mm: Allow userspace have mappings above 47-bit
  x86/mm: Prepare to expose larger address space to userspace
  x86/mpx: Do not allow MPX if we have mappings above 47-bit
  x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
  x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
  x86/mm/dump_pagetables: Fix printout of p4d level
  x86/mm/dump_pagetables: Generalize address normalization
  x86/boot: Fix memremap() related build failure
  ...
commit b1b6f83ac9

@@ -0,0 +1,68 @@
Secure Memory Encryption (SME) is a feature found on AMD processors.

SME provides the ability to mark individual pages of memory as encrypted using
the standard x86 page tables. A page that is marked encrypted will be
automatically decrypted when read from DRAM and encrypted when written to
DRAM. SME can therefore be used to protect the contents of DRAM from physical
attacks on the system.

A page is encrypted when a page table entry has the encryption bit set (see
below on how to determine its position). The encryption bit can also be
specified in the cr3 register, allowing the PGD table to be encrypted. Each
successive level of page tables can also be encrypted by setting the encryption
bit in the page table entry that points to the next table. This allows the full
page table hierarchy to be encrypted. Note, this means that just because the
encryption bit is set in cr3, doesn't imply the full hierarchy is encrypted.
Each page table entry in the hierarchy needs to have the encryption bit set to
achieve that. So, theoretically, you could have the encryption bit set in cr3
so that the PGD is encrypted, but not set the encryption bit in the PGD entry
for a PUD, which results in the PUD pointed to by that entry not being
encrypted.
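
To illustrate how the encryption bit combines with an otherwise ordinary page
table entry, here is a minimal C sketch. The bit position, the masking and the
helper name are assumptions made up for this example, not kernel API (the
kernel derives the real bit position from CPUID, as described below):

        #include <stdint.h>

        /* Assumed bit position, as reported by CPUID 0x8000001f[ebx], bits 5:0 */
        #define ENC_BIT_POS     47
        #define ENC_BIT_MASK    (1ULL << ENC_BIT_POS)

        /* Hypothetical helper: build a page table entry whose target page is encrypted */
        static uint64_t make_encrypted_pte(uint64_t paddr, uint64_t prot_flags)
        {
                /* The encryption bit sits in the physical-address portion of the entry */
                return (paddr & ~0xfffULL) | prot_flags | ENC_BIT_MASK;
        }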

Support for SME can be determined through the CPUID instruction. The CPUID
function 0x8000001f reports information related to SME:

        0x8000001f[eax]:
                Bit[0] indicates support for SME
        0x8000001f[ebx]:
                Bits[5:0]  pagetable bit number used to activate memory
                           encryption
                Bits[11:6] reduction in physical address space, in bits,
                           when memory encryption is enabled (this only
                           affects system physical addresses, not guest
                           physical addresses)

If support for SME is present, MSR 0xc0010010 (MSR_K8_SYSCFG) can be used to
determine if SME is enabled and/or to enable memory encryption:

        0xc0010010:
                Bit[23]   0 = memory encryption features are disabled
                          1 = memory encryption features are enabled

Linux relies on BIOS to set this bit if BIOS has determined that the reduction
in the physical address space as a result of enabling memory encryption (see
CPUID information above) will not conflict with the address space resource
requirements for the system. If this bit is not set upon Linux startup then
Linux itself will not set it and memory encryption will not be possible.
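
A small user-space sketch of the CPUID check described above (the MSR portion
can only be read from ring 0 and is omitted; this is an illustration, not the
kernel's implementation, which appears in sme_enable() later in this diff):

        #include <stdio.h>
        #include <cpuid.h>

        int main(void)
        {
                unsigned int eax, ebx, ecx, edx;

                /* Make sure the 0x8000001f leaf exists at all */
                if (__get_cpuid_max(0x80000000, NULL) < 0x8000001f)
                        return 1;

                __cpuid(0x8000001f, eax, ebx, ecx, edx);
                if (!(eax & 1)) {
                        printf("SME not supported\n");
                        return 1;
                }

                printf("SME supported: encryption bit %u, PA reduction %u bits\n",
                       ebx & 0x3f, (ebx >> 6) & 0x3f);
                return 0;
        }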

The state of SME in the Linux kernel can be documented as follows:
        - Supported:
          The CPU supports SME (determined through CPUID instruction).

        - Enabled:
          Supported and bit 23 of MSR_K8_SYSCFG is set.

        - Active:
          Supported, Enabled and the Linux kernel is actively applying
          the encryption bit to page table entries (the SME mask in the
          kernel is non-zero; see the sketch below).
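
The "Active" state thus boils down to the kernel's SME mask being non-zero.
A minimal sketch of what such a helper can look like (the code later in this
diff uses sme_active(); its exact definition lives in the mem_encrypt headers
and may differ in detail):

        #include <linux/types.h>

        extern unsigned long sme_me_mask;

        /* Sketch: SME is "active" when a non-zero encryption mask is applied */
        static inline bool sme_active(void)
        {
                return !!sme_me_mask;
        }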

SME can also be enabled and activated in the BIOS. If SME is enabled and
activated in the BIOS, then all memory accesses will be encrypted and it will
not be necessary to activate the Linux memory encryption support. If the BIOS
merely enables SME (sets bit 23 of MSR_K8_SYSCFG), then Linux can activate
memory encryption by default (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y) or
by supplying mem_encrypt=on on the kernel command line. However, if the BIOS
does not enable SME, then Linux will not be able to activate memory encryption,
even if it is configured to do so by default or the mem_encrypt=on command line
parameter is specified.

@@ -0,0 +1,64 @@
== Overview ==

Original x86-64 was limited by 4-level paging to 256 TiB of virtual address
space and 64 TiB of physical address space. We are already bumping into
this limit: some vendors offer servers with 64 TiB of memory today.

To overcome the limitation, upcoming hardware will introduce support for
5-level paging. It is a straight-forward extension of the current page
table structure, adding one more layer of translation.

It bumps the limits to 128 PiB of virtual address space and 4 PiB of
physical address space. This "ought to be enough for anybody" ©.

QEMU 2.9 and later support 5-level paging.

Virtual memory layout for 5-level paging is described in
Documentation/x86/x86_64/mm.txt

== Enabling 5-level paging ==

CONFIG_X86_5LEVEL=y enables the feature.

So far, a kernel compiled with the option enabled will be able to boot
only on machines that support the feature -- see the 'la57' flag in
/proc/cpuinfo.
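
One way to perform that check programmatically is to scan /proc/cpuinfo for
the flag. A small user-space sketch (not part of the kernel sources):

        #include <stdio.h>
        #include <string.h>

        /* Returns 1 if the CPU advertises 57-bit linear addresses (5-level paging) */
        static int cpu_has_la57(void)
        {
                char line[4096];
                FILE *f = fopen("/proc/cpuinfo", "r");
                int found = 0;

                if (!f)
                        return 0;

                while (!found && fgets(line, sizeof(line), f))
                        found = !strncmp(line, "flags", 5) && strstr(line, " la57");

                fclose(f);
                return found;
        }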

The plan is to implement boot-time switching between 4- and 5-level paging
in the future.

== User-space and large virtual address space ==

On x86, 5-level paging enables a 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. It's known that
at least some JIT compilers use higher bits in pointers to encode their
information. It collides with valid pointers with 5-level paging and
leads to crashes.

To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.

But userspace can ask for allocation from the full address space by
specifying a hint address (with or without MAP_FIXED) above 47-bits.

If the hint address is above 47-bit but MAP_FIXED is not specified, we try
to look for an unmapped area at the specified address. If it's already
occupied, we look for an unmapped area in the *full* address space, rather
than in the 47-bit window.

A high hint address would only affect the allocation in question, but not
any future mmap()s.

Specifying a high hint address on an older kernel or on a machine without
5-level paging support is safe. The hint will be ignored and the kernel will
fall back to allocation from the 47-bit address space.

This approach makes it easy for an application's memory allocator to become
aware of the large address space without manually tracking allocated virtual
address space.
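
A minimal illustration of opting a single allocation into the full address
space with a high hint address (the hint value and mapping size here are
arbitrary example values):

        #include <stdio.h>
        #include <sys/mman.h>

        int main(void)
        {
                /* Any hint above the 47-bit boundary opts this mmap() into the full space */
                void *hint = (void *)(1UL << 48);
                size_t len = 2UL << 20;         /* 2 MiB */

                void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }

                /* On a 5-level paging kernel this may land above 47 bits; otherwise it falls back */
                printf("mapped at %p\n", p);
                munmap(p, len);
                return 0;
        }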

One important case we need to handle here is interaction with MPX.
MPX (without the MAWA extension) cannot handle addresses above 47-bit, so we
need to make sure that MPX cannot be enabled if we already have a VMA above
the boundary, and forbid creating such VMAs once MPX is enabled.

@@ -0,0 +1,80 @@
/*
 * AMD Memory Encryption Support
 *
 * Copyright (C) 2016 Advanced Micro Devices, Inc.
 *
 * Author: Tom Lendacky <thomas.lendacky@amd.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */

#ifndef __X86_MEM_ENCRYPT_H__
#define __X86_MEM_ENCRYPT_H__

#ifndef __ASSEMBLY__

#include <linux/init.h>

#include <asm/bootparam.h>

#ifdef CONFIG_AMD_MEM_ENCRYPT

extern unsigned long sme_me_mask;

void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
                         unsigned long decrypted_kernel_vaddr,
                         unsigned long kernel_len,
                         unsigned long encryption_wa,
                         unsigned long encryption_pgd);

void __init sme_early_encrypt(resource_size_t paddr,
                              unsigned long size);
void __init sme_early_decrypt(resource_size_t paddr,
                              unsigned long size);

void __init sme_map_bootdata(char *real_mode_data);
void __init sme_unmap_bootdata(char *real_mode_data);

void __init sme_early_init(void);

void __init sme_encrypt_kernel(void);
void __init sme_enable(struct boot_params *bp);

/* Architecture __weak replacement functions */
void __init mem_encrypt_init(void);

void swiotlb_set_mem_attributes(void *vaddr, unsigned long size);

#else   /* !CONFIG_AMD_MEM_ENCRYPT */

#define sme_me_mask     0UL

static inline void __init sme_early_encrypt(resource_size_t paddr,
                                            unsigned long size) { }
static inline void __init sme_early_decrypt(resource_size_t paddr,
                                            unsigned long size) { }

static inline void __init sme_map_bootdata(char *real_mode_data) { }
static inline void __init sme_unmap_bootdata(char *real_mode_data) { }

static inline void __init sme_early_init(void) { }

static inline void __init sme_encrypt_kernel(void) { }
static inline void __init sme_enable(struct boot_params *bp) { }

#endif  /* CONFIG_AMD_MEM_ENCRYPT */

/*
 * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
 * writing to or comparing values from the cr3 register.  Having the
 * encryption mask set in cr3 enables the PGD entry to be encrypted and
 * avoid special case handling of PGD allocations.
 */
#define __sme_pa(x)             (__pa(x) | sme_me_mask)
#define __sme_pa_nodebug(x)     (__pa_nodebug(x) | sme_me_mask)

#endif  /* __ASSEMBLY__ */

#endif  /* __X86_MEM_ENCRYPT_H__ */

@@ -0,0 +1,593 @@
/*
 * AMD Memory Encryption Support
 *
 * Copyright (C) 2016 Advanced Micro Devices, Inc.
 *
 * Author: Tom Lendacky <thomas.lendacky@amd.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */

#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/dma-mapping.h>
#include <linux/swiotlb.h>
#include <linux/mem_encrypt.h>

#include <asm/tlbflush.h>
#include <asm/fixmap.h>
#include <asm/setup.h>
#include <asm/bootparam.h>
#include <asm/set_memory.h>
#include <asm/cacheflush.h>
#include <asm/sections.h>
#include <asm/processor-flags.h>
#include <asm/msr.h>
#include <asm/cmdline.h>

static char sme_cmdline_arg[] __initdata = "mem_encrypt";
static char sme_cmdline_on[] __initdata = "on";
static char sme_cmdline_off[] __initdata = "off";

/*
 * Since SME related variables are set early in the boot process they must
 * reside in the .data section so as not to be zeroed out when the .bss
 * section is later cleared.
 */
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);

/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);

/*
 * This routine does not change the underlying encryption setting of the
 * page(s) that map this memory. It assumes that eventually the memory is
 * meant to be accessed as either encrypted or decrypted but the contents
 * are currently not in the desired state.
 *
 * This routine follows the steps outlined in the AMD64 Architecture
 * Programmer's Manual Volume 2, Section 7.10.8 Encrypt-in-Place.
 */
static void __init __sme_early_enc_dec(resource_size_t paddr,
                                       unsigned long size, bool enc)
{
        void *src, *dst;
        size_t len;

        if (!sme_me_mask)
                return;

        local_flush_tlb();
        wbinvd();

        /*
         * There are limited number of early mapping slots, so map (at most)
         * one page at time.
         */
        while (size) {
                len = min_t(size_t, sizeof(sme_early_buffer), size);

                /*
                 * Create mappings for the current and desired format of
                 * the memory. Use a write-protected mapping for the source.
                 */
                src = enc ? early_memremap_decrypted_wp(paddr, len) :
                            early_memremap_encrypted_wp(paddr, len);

                dst = enc ? early_memremap_encrypted(paddr, len) :
                            early_memremap_decrypted(paddr, len);

                /*
                 * If a mapping can't be obtained to perform the operation,
                 * then eventual access of that area in the desired mode
                 * will cause a crash.
                 */
                BUG_ON(!src || !dst);

                /*
                 * Use a temporary buffer, of cache-line multiple size, to
                 * avoid data corruption as documented in the APM.
                 */
                memcpy(sme_early_buffer, src, len);
                memcpy(dst, sme_early_buffer, len);

                early_memunmap(dst, len);
                early_memunmap(src, len);

                paddr += len;
                size -= len;
        }
}

void __init sme_early_encrypt(resource_size_t paddr, unsigned long size)
{
        __sme_early_enc_dec(paddr, size, true);
}

void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
{
        __sme_early_enc_dec(paddr, size, false);
}

static void __init __sme_early_map_unmap_mem(void *vaddr, unsigned long size,
                                             bool map)
{
        unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;
        pmdval_t pmd_flags, pmd;

        /* Use early_pmd_flags but remove the encryption mask */
        pmd_flags = __sme_clr(early_pmd_flags);

        do {
                pmd = map ? (paddr & PMD_MASK) + pmd_flags : 0;
                __early_make_pgtable((unsigned long)vaddr, pmd);

                vaddr += PMD_SIZE;
                paddr += PMD_SIZE;
                size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
        } while (size);

        __native_flush_tlb();
}

void __init sme_unmap_bootdata(char *real_mode_data)
{
        struct boot_params *boot_data;
        unsigned long cmdline_paddr;

        if (!sme_active())
                return;

        /* Get the command line address before unmapping the real_mode_data */
        boot_data = (struct boot_params *)real_mode_data;
        cmdline_paddr = boot_data->hdr.cmd_line_ptr | ((u64)boot_data->ext_cmd_line_ptr << 32);

        __sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), false);

        if (!cmdline_paddr)
                return;

        __sme_early_map_unmap_mem(__va(cmdline_paddr), COMMAND_LINE_SIZE, false);
}

void __init sme_map_bootdata(char *real_mode_data)
{
        struct boot_params *boot_data;
        unsigned long cmdline_paddr;

        if (!sme_active())
                return;

        __sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), true);

        /* Get the command line address after mapping the real_mode_data */
        boot_data = (struct boot_params *)real_mode_data;
        cmdline_paddr = boot_data->hdr.cmd_line_ptr | ((u64)boot_data->ext_cmd_line_ptr << 32);

        if (!cmdline_paddr)
                return;

        __sme_early_map_unmap_mem(__va(cmdline_paddr), COMMAND_LINE_SIZE, true);
}

void __init sme_early_init(void)
{
        unsigned int i;

        if (!sme_me_mask)
                return;

        early_pmd_flags = __sme_set(early_pmd_flags);

        __supported_pte_mask = __sme_set(__supported_pte_mask);

        /* Update the protection map with memory encryption mask */
        for (i = 0; i < ARRAY_SIZE(protection_map); i++)
                protection_map[i] = pgprot_encrypted(protection_map[i]);
}

/* Architecture __weak replacement functions */
void __init mem_encrypt_init(void)
{
        if (!sme_me_mask)
                return;

        /* Call into SWIOTLB to update the SWIOTLB DMA buffers */
        swiotlb_update_mem_attributes();

        pr_info("AMD Secure Memory Encryption (SME) active\n");
}

void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
{
        WARN(PAGE_ALIGN(size) != size,
             "size is not page-aligned (%#lx)\n", size);

        /* Make the SWIOTLB buffer area decrypted */
        set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
}

static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
                                 unsigned long end)
{
        unsigned long pgd_start, pgd_end, pgd_size;
        pgd_t *pgd_p;

        pgd_start = start & PGDIR_MASK;
        pgd_end = end & PGDIR_MASK;

        pgd_size = (((pgd_end - pgd_start) / PGDIR_SIZE) + 1);
        pgd_size *= sizeof(pgd_t);

        pgd_p = pgd_base + pgd_index(start);

        memset(pgd_p, 0, pgd_size);
}

#define PGD_FLAGS       _KERNPG_TABLE_NOENC
#define P4D_FLAGS       _KERNPG_TABLE_NOENC
#define PUD_FLAGS       _KERNPG_TABLE_NOENC
#define PMD_FLAGS       (__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)

static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
                                     unsigned long vaddr, pmdval_t pmd_val)
{
        pgd_t *pgd_p;
        p4d_t *p4d_p;
        pud_t *pud_p;
        pmd_t *pmd_p;

        pgd_p = pgd_base + pgd_index(vaddr);
        if (native_pgd_val(*pgd_p)) {
                if (IS_ENABLED(CONFIG_X86_5LEVEL))
                        p4d_p = (p4d_t *)(native_pgd_val(*pgd_p) & ~PTE_FLAGS_MASK);
                else
                        pud_p = (pud_t *)(native_pgd_val(*pgd_p) & ~PTE_FLAGS_MASK);
        } else {
                pgd_t pgd;

                if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
                        p4d_p = pgtable_area;
                        memset(p4d_p, 0, sizeof(*p4d_p) * PTRS_PER_P4D);
                        pgtable_area += sizeof(*p4d_p) * PTRS_PER_P4D;

                        pgd = native_make_pgd((pgdval_t)p4d_p + PGD_FLAGS);
                } else {
                        pud_p = pgtable_area;
                        memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
                        pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;

                        pgd = native_make_pgd((pgdval_t)pud_p + PGD_FLAGS);
                }
                native_set_pgd(pgd_p, pgd);
        }

        if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
                p4d_p += p4d_index(vaddr);
                if (native_p4d_val(*p4d_p)) {
                        pud_p = (pud_t *)(native_p4d_val(*p4d_p) & ~PTE_FLAGS_MASK);
                } else {
                        p4d_t p4d;

                        pud_p = pgtable_area;
                        memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
                        pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;

                        p4d = native_make_p4d((pudval_t)pud_p + P4D_FLAGS);
                        native_set_p4d(p4d_p, p4d);
                }
        }

        pud_p += pud_index(vaddr);
        if (native_pud_val(*pud_p)) {
                if (native_pud_val(*pud_p) & _PAGE_PSE)
                        goto out;

                pmd_p = (pmd_t *)(native_pud_val(*pud_p) & ~PTE_FLAGS_MASK);
        } else {
                pud_t pud;

                pmd_p = pgtable_area;
                memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
                pgtable_area += sizeof(*pmd_p) * PTRS_PER_PMD;

                pud = native_make_pud((pmdval_t)pmd_p + PUD_FLAGS);
                native_set_pud(pud_p, pud);
        }

        pmd_p += pmd_index(vaddr);
        if (!native_pmd_val(*pmd_p) || !(native_pmd_val(*pmd_p) & _PAGE_PSE))
                native_set_pmd(pmd_p, native_make_pmd(pmd_val));

out:
        return pgtable_area;
}

static unsigned long __init sme_pgtable_calc(unsigned long len)
{
        unsigned long p4d_size, pud_size, pmd_size;
        unsigned long total;

        /*
         * Perform a relatively simplistic calculation of the pagetable
         * entries that are needed. Those mappings will be covered by 2MB
         * PMD entries so we can conservatively calculate the required
         * number of P4D, PUD and PMD structures needed to perform the
         * mappings. Incrementing the count for each covers the case where
         * the addresses cross entries.
         */
        if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
                p4d_size = (ALIGN(len, PGDIR_SIZE) / PGDIR_SIZE) + 1;
                p4d_size *= sizeof(p4d_t) * PTRS_PER_P4D;
                pud_size = (ALIGN(len, P4D_SIZE) / P4D_SIZE) + 1;
                pud_size *= sizeof(pud_t) * PTRS_PER_PUD;
        } else {
                p4d_size = 0;
                pud_size = (ALIGN(len, PGDIR_SIZE) / PGDIR_SIZE) + 1;
                pud_size *= sizeof(pud_t) * PTRS_PER_PUD;
        }
        pmd_size = (ALIGN(len, PUD_SIZE) / PUD_SIZE) + 1;
        pmd_size *= sizeof(pmd_t) * PTRS_PER_PMD;

        total = p4d_size + pud_size + pmd_size;

        /*
         * Now calculate the added pagetable structures needed to populate
         * the new pagetables.
         */
        if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
                p4d_size = ALIGN(total, PGDIR_SIZE) / PGDIR_SIZE;
                p4d_size *= sizeof(p4d_t) * PTRS_PER_P4D;
                pud_size = ALIGN(total, P4D_SIZE) / P4D_SIZE;
                pud_size *= sizeof(pud_t) * PTRS_PER_PUD;
        } else {
                p4d_size = 0;
                pud_size = ALIGN(total, PGDIR_SIZE) / PGDIR_SIZE;
                pud_size *= sizeof(pud_t) * PTRS_PER_PUD;
        }
        pmd_size = ALIGN(total, PUD_SIZE) / PUD_SIZE;
        pmd_size *= sizeof(pmd_t) * PTRS_PER_PMD;

        total += p4d_size + pud_size + pmd_size;

        return total;
}

void __init sme_encrypt_kernel(void)
{
        unsigned long workarea_start, workarea_end, workarea_len;
        unsigned long execute_start, execute_end, execute_len;
        unsigned long kernel_start, kernel_end, kernel_len;
        unsigned long pgtable_area_len;
        unsigned long paddr, pmd_flags;
        unsigned long decrypted_base;
        void *pgtable_area;
        pgd_t *pgd;

        if (!sme_active())
                return;

        /*
         * Prepare for encrypting the kernel by building new pagetables with
         * the necessary attributes needed to encrypt the kernel in place.
         *
         * One range of virtual addresses will map the memory occupied
         * by the kernel as encrypted.
         *
         * Another range of virtual addresses will map the memory occupied
         * by the kernel as decrypted and write-protected.
         *
         * The use of write-protect attribute will prevent any of the
         * memory from being cached.
         */

        /* Physical addresses give us the identity mapped virtual addresses */
        kernel_start = __pa_symbol(_text);
        kernel_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
        kernel_len = kernel_end - kernel_start;

        /* Set the encryption workarea to be immediately after the kernel */
        workarea_start = kernel_end;

        /*
         * Calculate required number of workarea bytes needed:
         *   executable encryption area size:
         *     stack page (PAGE_SIZE)
         *     encryption routine page (PAGE_SIZE)
         *     intermediate copy buffer (PMD_PAGE_SIZE)
         *   pagetable structures for the encryption of the kernel
         *   pagetable structures for workarea (in case not currently mapped)
         */
        execute_start = workarea_start;
        execute_end = execute_start + (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
        execute_len = execute_end - execute_start;

        /*
         * One PGD for both encrypted and decrypted mappings and a set of
         * PUDs and PMDs for each of the encrypted and decrypted mappings.
         */
        pgtable_area_len = sizeof(pgd_t) * PTRS_PER_PGD;
        pgtable_area_len += sme_pgtable_calc(execute_end - kernel_start) * 2;

        /* PUDs and PMDs needed in the current pagetables for the workarea */
        pgtable_area_len += sme_pgtable_calc(execute_len + pgtable_area_len);

        /*
         * The total workarea includes the executable encryption area and
         * the pagetable area.
         */
        workarea_len = execute_len + pgtable_area_len;
        workarea_end = workarea_start + workarea_len;

        /*
         * Set the address to the start of where newly created pagetable
         * structures (PGDs, PUDs and PMDs) will be allocated. New pagetable
         * structures are created when the workarea is added to the current
         * pagetables and when the new encrypted and decrypted kernel
         * mappings are populated.
         */
        pgtable_area = (void *)execute_end;

        /*
         * Make sure the current pagetable structure has entries for
         * addressing the workarea.
         */
        pgd = (pgd_t *)native_read_cr3_pa();
        paddr = workarea_start;
        while (paddr < workarea_end) {
                pgtable_area = sme_populate_pgd(pgd, pgtable_area,
                                                paddr,
                                                paddr + PMD_FLAGS);

                paddr += PMD_PAGE_SIZE;
        }

        /* Flush the TLB - no globals so cr3 is enough */
        native_write_cr3(__native_read_cr3());

        /*
         * A new pagetable structure is being built to allow for the kernel
         * to be encrypted. It starts with an empty PGD that will then be
         * populated with new PUDs and PMDs as the encrypted and decrypted
         * kernel mappings are created.
         */
        pgd = pgtable_area;
        memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
        pgtable_area += sizeof(*pgd) * PTRS_PER_PGD;

        /* Add encrypted kernel (identity) mappings */
        pmd_flags = PMD_FLAGS | _PAGE_ENC;
        paddr = kernel_start;
        while (paddr < kernel_end) {
                pgtable_area = sme_populate_pgd(pgd, pgtable_area,
                                                paddr,
                                                paddr + pmd_flags);

                paddr += PMD_PAGE_SIZE;
        }

        /*
         * A different PGD index/entry must be used to get different
         * pagetable entries for the decrypted mapping. Choose the next
         * PGD index and convert it to a virtual address to be used as
         * the base of the mapping.
         */
        decrypted_base = (pgd_index(workarea_end) + 1) & (PTRS_PER_PGD - 1);
        decrypted_base <<= PGDIR_SHIFT;

        /* Add decrypted, write-protected kernel (non-identity) mappings */
        pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
        paddr = kernel_start;
        while (paddr < kernel_end) {
                pgtable_area = sme_populate_pgd(pgd, pgtable_area,
                                                paddr + decrypted_base,
                                                paddr + pmd_flags);

                paddr += PMD_PAGE_SIZE;
        }

        /* Add decrypted workarea mappings to both kernel mappings */
        paddr = workarea_start;
        while (paddr < workarea_end) {
                pgtable_area = sme_populate_pgd(pgd, pgtable_area,
                                                paddr,
                                                paddr + PMD_FLAGS);

                pgtable_area = sme_populate_pgd(pgd, pgtable_area,
                                                paddr + decrypted_base,
                                                paddr + PMD_FLAGS);

                paddr += PMD_PAGE_SIZE;
        }

        /* Perform the encryption */
        sme_encrypt_execute(kernel_start, kernel_start + decrypted_base,
                            kernel_len, workarea_start, (unsigned long)pgd);

        /*
         * At this point we are running encrypted. Remove the mappings for
         * the decrypted areas - all that is needed for this is to remove
         * the PGD entry/entries.
         */
        sme_clear_pgd(pgd, kernel_start + decrypted_base,
                      kernel_end + decrypted_base);

        sme_clear_pgd(pgd, workarea_start + decrypted_base,
                      workarea_end + decrypted_base);

        /* Flush the TLB - no globals so cr3 is enough */
        native_write_cr3(__native_read_cr3());
}

void __init __nostackprotector sme_enable(struct boot_params *bp)
{
        const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
        unsigned int eax, ebx, ecx, edx;
        bool active_by_default;
        unsigned long me_mask;
        char buffer[16];
        u64 msr;

        /* Check for the SME support leaf */
        eax = 0x80000000;
        ecx = 0;
        native_cpuid(&eax, &ebx, &ecx, &edx);
        if (eax < 0x8000001f)
                return;

        /*
         * Check for the SME feature:
         *   CPUID Fn8000_001F[EAX] - Bit 0
         *     Secure Memory Encryption support
         *   CPUID Fn8000_001F[EBX] - Bits 5:0
         *     Pagetable bit position used to indicate encryption
         */
        eax = 0x8000001f;
        ecx = 0;
        native_cpuid(&eax, &ebx, &ecx, &edx);
        if (!(eax & 1))
                return;

        me_mask = 1UL << (ebx & 0x3f);

        /* Check if SME is enabled */
        msr = __rdmsr(MSR_K8_SYSCFG);
        if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
                return;

        /*
         * Fixups have not been applied to phys_base yet and we're running
         * identity mapped, so we must obtain the address to the SME command
         * line argument data using rip-relative addressing.
         */
        asm ("lea sme_cmdline_arg(%%rip), %0"
             : "=r" (cmdline_arg)
             : "p" (sme_cmdline_arg));
        asm ("lea sme_cmdline_on(%%rip), %0"
             : "=r" (cmdline_on)
             : "p" (sme_cmdline_on));
        asm ("lea sme_cmdline_off(%%rip), %0"
             : "=r" (cmdline_off)
             : "p" (sme_cmdline_off));

        if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
                active_by_default = true;
        else
                active_by_default = false;

        cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
                                     ((u64)bp->ext_cmd_line_ptr << 32));

        cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));

        if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
                sme_me_mask = me_mask;
        else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
                sme_me_mask = 0;
        else
                sme_me_mask = active_by_default ? me_mask : 0;
}

@@ -0,0 +1,149 @@
/*
 * AMD Memory Encryption Support
 *
 * Copyright (C) 2016 Advanced Micro Devices, Inc.
 *
 * Author: Tom Lendacky <thomas.lendacky@amd.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */

#include <linux/linkage.h>
#include <asm/pgtable.h>
#include <asm/page.h>
#include <asm/processor-flags.h>
#include <asm/msr-index.h>

        .text
        .code64
ENTRY(sme_encrypt_execute)

        /*
         * Entry parameters:
         *   RDI - virtual address for the encrypted kernel mapping
         *   RSI - virtual address for the decrypted kernel mapping
         *   RDX - length of kernel
         *   RCX - virtual address of the encryption workarea, including:
         *     - stack page (PAGE_SIZE)
         *     - encryption routine page (PAGE_SIZE)
         *     - intermediate copy buffer (PMD_PAGE_SIZE)
         *    R8 - physical address of the pagetables to use for encryption
         */

        push    %rbp
        movq    %rsp, %rbp              /* RBP now has original stack pointer */

        /* Set up a one page stack in the non-encrypted memory area */
        movq    %rcx, %rax              /* Workarea stack page */
        leaq    PAGE_SIZE(%rax), %rsp   /* Set new stack pointer */
        addq    $PAGE_SIZE, %rax        /* Workarea encryption routine */

        push    %r12
        movq    %rdi, %r10              /* Encrypted kernel */
        movq    %rsi, %r11              /* Decrypted kernel */
        movq    %rdx, %r12              /* Kernel length */

        /* Copy encryption routine into the workarea */
        movq    %rax, %rdi                              /* Workarea encryption routine */
        leaq    __enc_copy(%rip), %rsi                  /* Encryption routine */
        movq    $(.L__enc_copy_end - __enc_copy), %rcx  /* Encryption routine length */
        rep     movsb

        /* Setup registers for call */
        movq    %r10, %rdi              /* Encrypted kernel */
        movq    %r11, %rsi              /* Decrypted kernel */
        movq    %r8, %rdx               /* Pagetables used for encryption */
        movq    %r12, %rcx              /* Kernel length */
        movq    %rax, %r8               /* Workarea encryption routine */
        addq    $PAGE_SIZE, %r8         /* Workarea intermediate copy buffer */

        call    *%rax                   /* Call the encryption routine */

        pop     %r12

        movq    %rbp, %rsp              /* Restore original stack pointer */
        pop     %rbp

        ret
ENDPROC(sme_encrypt_execute)

ENTRY(__enc_copy)
/*
 * Routine used to encrypt kernel.
 *   This routine must be run outside of the kernel proper since
 *   the kernel will be encrypted during the process. So this
 *   routine is defined here and then copied to an area outside
 *   of the kernel where it will remain and run decrypted
 *   during execution.
 *
 *   On entry the registers must be:
 *     RDI - virtual address for the encrypted kernel mapping
 *     RSI - virtual address for the decrypted kernel mapping
 *     RDX - address of the pagetables to use for encryption
 *     RCX - length of kernel
 *      R8 - intermediate copy buffer
 *
 *     RAX - points to this routine
 *
 * The kernel will be encrypted by copying from the non-encrypted
 * kernel space to an intermediate buffer and then copying from the
 * intermediate buffer back to the encrypted kernel space. The physical
 * addresses of the two kernel space mappings are the same which
 * results in the kernel being encrypted "in place".
 */
        /* Enable the new page tables */
        mov     %rdx, %cr3

        /* Flush any global TLBs */
        mov     %cr4, %rdx
        andq    $~X86_CR4_PGE, %rdx
        mov     %rdx, %cr4
        orq     $X86_CR4_PGE, %rdx
        mov     %rdx, %cr4

        /* Set the PAT register PA5 entry to write-protect */
        push    %rcx
        movl    $MSR_IA32_CR_PAT, %ecx
        rdmsr
        push    %rdx                    /* Save original PAT value */
        andl    $0xffff00ff, %edx       /* Clear PA5 */
        orl     $0x00000500, %edx       /* Set PA5 to WP */
        wrmsr
        pop     %rdx                    /* RDX contains original PAT value */
        pop     %rcx

        movq    %rcx, %r9               /* Save kernel length */
        movq    %rdi, %r10              /* Save encrypted kernel address */
        movq    %rsi, %r11              /* Save decrypted kernel address */

        wbinvd                          /* Invalidate any cache entries */

        /* Copy/encrypt 2MB at a time */
1:
        movq    %r11, %rsi              /* Source - decrypted kernel */
        movq    %r8, %rdi               /* Dest - intermediate copy buffer */
        movq    $PMD_PAGE_SIZE, %rcx    /* 2MB length */
        rep     movsb

        movq    %r8, %rsi               /* Source - intermediate copy buffer */
        movq    %r10, %rdi              /* Dest - encrypted kernel */
        movq    $PMD_PAGE_SIZE, %rcx    /* 2MB length */
        rep     movsb

        addq    $PMD_PAGE_SIZE, %r11
        addq    $PMD_PAGE_SIZE, %r10
        subq    $PMD_PAGE_SIZE, %r9     /* Kernel length decrement */
        jnz     1b                      /* Kernel length not zero? */

        /* Restore PAT register */
        push    %rdx                    /* Save original PAT value */
        movl    $MSR_IA32_CR_PAT, %ecx
        rdmsr
        pop     %rdx                    /* Restore original PAT value */
        wrmsr

        ret
.L__enc_copy_end:
ENDPROC(__enc_copy)