diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 381ab9fed3e6..a6fe7368b26c 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -86,6 +86,13 @@ Description: The unit size is one block, now only support configuring in range of [1, 512]. +What: /sys/fs/f2fs//umount_discard_timeout +Date: January 2019 +Contact: "Jaegeuk Kim" +Description: + Set timeout to issue discard commands during umount. + Default: 5 secs + What: /sys/fs/f2fs//max_victim_search Date: January 2014 Contact: "Jaegeuk Kim" diff --git a/Documentation/device-mapper/dm-bow.txt b/Documentation/device-mapper/dm-bow.txt new file mode 100644 index 000000000000..e3fc4d22e0f4 --- /dev/null +++ b/Documentation/device-mapper/dm-bow.txt @@ -0,0 +1,99 @@ +dm_bow (backup on write) +======================== + +dm_bow is a device mapper driver that uses the free space on a device to back up +data that is overwritten. The changes can then be committed by a simple state +change, or rolled back by removing the dm_bow device and running a command line +utility over the underlying device. + +dm_bow has three states, set by writing ‘1’ or ‘2’ to /sys/block/dm-?/bow/state. +It is only possible to go from state 0 (initial state) to state 1, and then from +state 1 to state 2. + +State 0: dm_bow collects all trims to the device and assumes that these mark +free space on the overlying file system that can be safely used. Typically the +mount code would create the dm_bow device, mount the file system, call the +FITRIM ioctl on the file system, then switch to state 1. These trims are not +propagated to the underlying device. + +State 1: All writes to the device cause the underlying data to be backed up to +the free (trimmed) area as needed, in such a way that they can be restored. +However, the writes, with one exception, then happen exactly as they would +without dm_bow, so the device is always in a good final state.
The exception is +that sector 0 is used to keep a log of the latest changes, both to indicate that +we are in this state and to allow rollback. See below for details. If there +isn't enough free space, writes fail with -ENOSPC. + +State 2: The transition to state 2 triggers replacing the special sector 0 with +the normal sector 0, and the freeing of all state information. dm_bow then +becomes a pass-through driver, allowing the device to continue to be used with +minimal performance impact. + +Usage +===== +dm-bow takes one command line parameter, the name of the underlying device. + +dm-bow will typically be used in the following way. dm-bow will be loaded with a +suitable underlying device and the resultant device will be mounted. A file +system trim will be issued via the FITRIM ioctl, then the device will be +switched to state 1. The file system will now be used as normal. At some point, +the changes can either be committed by switching to state 2, or rolled back by +unmounting the file system, removing the dm-bow device and running the command +line utility. Note that rebooting the device will be equivalent to unmounting +and removing, but the command line utility must still be run. + +Details of operation in state 1 +=============================== + +dm_bow maintains a type for all sectors. A sector can be any of: + +SECTOR0 +SECTOR0_CURRENT +UNCHANGED +FREE +CHANGED +BACKUP + +SECTOR0 is the first sector on the device, and is used to hold the log of +changes. This is the one exception mentioned above. + +SECTOR0_CURRENT is a sector picked from the FREE sectors, and is where reads and +writes to the true sector zero are redirected. Note that, like any backup +sector, if the sector is written to directly, it must be moved again. + +UNCHANGED means that the sector has not been changed since we entered state 1. +Thus if it is written to or trimmed, the contents must first be backed up.
+ +FREE means that the sector was trimmed in state 0 and has not yet been written +to or used for backup. On being written to, a FREE sector is changed to CHANGED. + +CHANGED means that the sector has been modified, and can be further modified +without further backup. + +BACKUP means that this is a free sector being used as a backup. On being written +to, the contents must first be backed up again. + +All backup operations are logged to the first sector. The log sector has the +format: +-------------------------------------------------------- +| Magic | Count | Sequence | Log entry | Log entry | … +-------------------------------------------------------- + +Magic is a magic number. Count is the number of log entries. Sequence is 0 +initially. A log entry has the format: + +----------------------------------- +| Source | Dest | Size | Checksum | +----------------------------------- + +When SECTOR0 is full, the log sector is backed up and another empty log sector +is created with a sequence number one higher. The first entry in any log sector +with sequence > 0 must therefore record the backup of the previous log +sector. Note that sequence is not strictly needed, but is a useful sanity check +and potentially limits the time spent trying to restore a corrupted snapshot. + +On entering state 1, dm_bow has a list of free sectors. All other sectors are +unchanged. SECTOR0_CURRENT is selected from the free sectors and the contents of +sector 0 are copied there. Sector 0 is then backed up, which triggers the first +log entry to be written. + diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index fedbea893ee7..35c777b6e69b 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -126,6 +126,8 @@ disable_ext_identify Disable the extension list configured by mkfs, so f2fs does not aware of cold files such as media files. inline_xattr Enable the inline xattrs feature.
noinline_xattr Disable the inline xattrs feature. +inline_xattr_size=%u Support configuring inline xattr size; it depends on + the flexible inline xattr feature. inline_data Enable the inline data feature: New created small(<~3.4k) files can be written into inode block. inline_dentry Enable the inline dir feature: data in new created diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index d1aecf53badb..bbfeeb0813d3 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -5,7 +5,6 @@ How to get printk format specifiers right :Author: Randy Dunlap :Author: Andrew Murray - Integer types ============= @@ -45,6 +44,18 @@ return from vsnprintf. Raw pointer value SHOULD be printed with %p. The kernel supports the following extended format specifiers for pointer types: +Pointer Types +============= + +Pointers printed without a specifier extension (i.e. unadorned %p) are +hashed to give a unique identifier without leaking kernel addresses to user +space. On 64-bit machines the first 32 bits are zeroed. If you _really_ +want the address, see %px below. + +:: + + %p abcdef12 or 00000000abcdef12 + Symbols/Function Pointers ========================= @@ -85,18 +96,32 @@ Examples:: printk("Faulted at %pS\n", (void *)regs->ip); printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack); - Kernel Pointers =============== :: - %pK 0x01234567 or 0x0123456789abcdef + %pK 01234567 or 0123456789abcdef For printing kernel pointers which should be hidden from unprivileged users. The behaviour of ``%pK`` depends on the ``kptr_restrict sysctl`` - see Documentation/sysctl/kernel.txt for more details. +Unmodified Addresses +==================== + +:: + + %px 01234567 or 0123456789abcdef + +For printing pointers when you _really_ want to print the address. Please +consider whether or not you are leaking sensitive information about the +kernel layout in memory before printing pointers with %px. %px is +functionally equivalent to %lx.
%px is preferred to %lx because it is more +uniquely grep'able. If, in the future, we need to modify the way the Kernel +handles printing pointers it will be nice to be able to find the call +sites. + Struct Resources ================ diff --git a/Makefile b/Makefile index 9f7ef29640e0..9d70e5f502c6 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 108 +SUBLEVEL = 109 EXTRAVERSION = NAME = Petit Gorille diff --git a/arch/arm/configs/vendor/qcs405-perf_defconfig b/arch/arm/configs/vendor/qcs405-perf_defconfig index 3e9287287b86..e96de87f0bf4 100644 --- a/arch/arm/configs/vendor/qcs405-perf_defconfig +++ b/arch/arm/configs/vendor/qcs405-perf_defconfig @@ -159,7 +159,6 @@ CONFIG_KEYBOARD_GPIO=y # CONFIG_INPUT_MOUSE is not set CONFIG_INPUT_MISC=y CONFIG_INPUT_QPNP_POWER_ON=y -CONFIG_INPUT_KEYCHORD=y CONFIG_INPUT_UINPUT=y CONFIG_INPUT_GPIO=y # CONFIG_LEGACY_PTYS is not set diff --git a/arch/arm/configs/vendor/qcs405_defconfig b/arch/arm/configs/vendor/qcs405_defconfig index 73ffd70e25fe..f8fcf714bd37 100644 --- a/arch/arm/configs/vendor/qcs405_defconfig +++ b/arch/arm/configs/vendor/qcs405_defconfig @@ -272,7 +272,6 @@ CONFIG_TOUCHSCREEN_ATMEL_MXT=y CONFIG_INPUT_MISC=y CONFIG_INPUT_HBTP_INPUT=y CONFIG_INPUT_QPNP_POWER_ON=y -CONFIG_INPUT_KEYCHORD=y CONFIG_INPUT_UINPUT=y CONFIG_INPUT_GPIO=y # CONFIG_LEGACY_PTYS is not set diff --git a/arch/arm64/configs/cuttlefish_defconfig b/arch/arm64/configs/cuttlefish_defconfig index e6c3dcad412e..856f1f4991ed 100644 --- a/arch/arm64/configs/cuttlefish_defconfig +++ b/arch/arm64/configs/cuttlefish_defconfig @@ -217,6 +217,7 @@ CONFIG_DM_UEVENT=y CONFIG_DM_VERITY=y CONFIG_DM_VERITY_FEC=y CONFIG_DM_VERITY_AVB=y +CONFIG_DM_BOW=y CONFIG_NETDEVICES=y CONFIG_NETCONSOLE=y CONFIG_NETCONSOLE_DYNAMIC=y diff --git a/arch/arm64/configs/vendor/qcs405-perf_defconfig b/arch/arm64/configs/vendor/qcs405-perf_defconfig index cae1a485badd..09fa869ec194 100644 --- 
a/arch/arm64/configs/vendor/qcs405-perf_defconfig +++ b/arch/arm64/configs/vendor/qcs405-perf_defconfig @@ -272,7 +272,6 @@ CONFIG_TOUCHSCREEN_ATMEL_MXT=y CONFIG_INPUT_MISC=y CONFIG_INPUT_HBTP_INPUT=y CONFIG_INPUT_QPNP_POWER_ON=y -CONFIG_INPUT_KEYCHORD=y CONFIG_INPUT_UINPUT=y CONFIG_INPUT_GPIO=y # CONFIG_LEGACY_PTYS is not set diff --git a/arch/arm64/configs/vendor/qcs405_defconfig b/arch/arm64/configs/vendor/qcs405_defconfig index e1bf94de302a..b6ae861a73d9 100644 --- a/arch/arm64/configs/vendor/qcs405_defconfig +++ b/arch/arm64/configs/vendor/qcs405_defconfig @@ -277,7 +277,6 @@ CONFIG_TOUCHSCREEN_ATMEL_MXT=y CONFIG_INPUT_MISC=y CONFIG_INPUT_HBTP_INPUT=y CONFIG_INPUT_QPNP_POWER_ON=y -CONFIG_INPUT_KEYCHORD=y CONFIG_INPUT_UINPUT=y CONFIG_INPUT_GPIO=y # CONFIG_LEGACY_PTYS is not set diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h index e77672539e8e..e4456e450f94 100644 --- a/arch/mips/include/asm/jump_label.h +++ b/arch/mips/include/asm/jump_label.h @@ -21,15 +21,15 @@ #endif #ifdef CONFIG_CPU_MICROMIPS -#define NOP_INSN "nop32" +#define B_INSN "b32" #else -#define NOP_INSN "nop" +#define B_INSN "b" #endif static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("1:\t" NOP_INSN "\n\t" - "nop\n\t" + asm_volatile_goto("1:\t" B_INSN " 2f\n\t" + "2:\tnop\n\t" ".pushsection __jump_table, \"aw\"\n\t" WORD_INSN " 1b, %l[l_yes], %0\n\t" ".popsection\n\t" diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S index 971a504001c2..36f2e860ba3e 100644 --- a/arch/mips/kernel/vmlinux.lds.S +++ b/arch/mips/kernel/vmlinux.lds.S @@ -140,6 +140,13 @@ SECTIONS PERCPU_SECTION(1 << CONFIG_MIPS_L1_CACHE_SHIFT) #endif +#ifdef CONFIG_MIPS_ELF_APPENDED_DTB + .appended_dtb : AT(ADDR(.appended_dtb) - LOAD_OFFSET) { + *(.appended_dtb) + KEEP(*(.appended_dtb)) + } +#endif + #ifdef CONFIG_RELOCATABLE . 
= ALIGN(4); @@ -164,11 +171,6 @@ SECTIONS __appended_dtb = .; /* leave space for appended DTB */ . += 0x100000; -#elif defined(CONFIG_MIPS_ELF_APPENDED_DTB) - .appended_dtb : AT(ADDR(.appended_dtb) - LOAD_OFFSET) { - *(.appended_dtb) - KEEP(*(.appended_dtb)) - } #endif /* * Align to 64K in attempt to eliminate holes before the diff --git a/arch/mips/loongson64/lemote-2f/irq.c b/arch/mips/loongson64/lemote-2f/irq.c index 9e33e45aa17c..b213cecb8e3a 100644 --- a/arch/mips/loongson64/lemote-2f/irq.c +++ b/arch/mips/loongson64/lemote-2f/irq.c @@ -103,7 +103,7 @@ static struct irqaction ip6_irqaction = { static struct irqaction cascade_irqaction = { .handler = no_action, .name = "cascade", - .flags = IRQF_NO_THREAD, + .flags = IRQF_NO_THREAD | IRQF_NO_SUSPEND, }; void __init mach_init_irq(void) diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c index be3136f142a9..a8103a84b4ac 100644 --- a/arch/sparc/mm/fault_32.c +++ b/arch/sparc/mm/fault_32.c @@ -113,7 +113,7 @@ show_signal_msg(struct pt_regs *regs, int sig, int code, if (!printk_ratelimit()) return; - printk("%s%s[%d]: segfault at %lx ip %p (rpc %p) sp %p error %x", + printk("%s%s[%d]: segfault at %lx ip %px (rpc %px) sp %px error %x", task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG, tsk->comm, task_pid_nr(tsk), address, (void *)regs->pc, (void *)regs->u_regs[UREG_I7], diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c index 815c03d7a765..41363f46797b 100644 --- a/arch/sparc/mm/fault_64.c +++ b/arch/sparc/mm/fault_64.c @@ -154,7 +154,7 @@ show_signal_msg(struct pt_regs *regs, int sig, int code, if (!printk_ratelimit()) return; - printk("%s%s[%d]: segfault at %lx ip %p (rpc %p) sp %p error %x", + printk("%s%s[%d]: segfault at %lx ip %px (rpc %px) sp %px error %x", task_pid_nr(tsk) > 1 ? 
KERN_INFO : KERN_EMERG, tsk->comm, task_pid_nr(tsk), address, (void *)regs->tpc, (void *)regs->u_regs[UREG_I7], diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c index 4e6fcb32620f..428644175956 100644 --- a/arch/um/kernel/trap.c +++ b/arch/um/kernel/trap.c @@ -150,7 +150,7 @@ static void show_segv_info(struct uml_pt_regs *regs) if (!printk_ratelimit()) return; - printk("%s%s[%d]: segfault at %lx ip %p sp %p error %x", + printk("%s%s[%d]: segfault at %lx ip %px sp %px error %x", task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG, tsk->comm, task_pid_nr(tsk), FAULT_ADDRESS(*fi), (void *)UPT_IP(regs), (void *)UPT_SP(regs), diff --git a/arch/x86/configs/x86_64_cuttlefish_defconfig b/arch/x86/configs/x86_64_cuttlefish_defconfig index ec2686d8abc2..e101b10a4d72 100644 --- a/arch/x86/configs/x86_64_cuttlefish_defconfig +++ b/arch/x86/configs/x86_64_cuttlefish_defconfig @@ -230,6 +230,7 @@ CONFIG_DM_UEVENT=y CONFIG_DM_VERITY=y CONFIG_DM_VERITY_FEC=y CONFIG_DM_ANDROID_VERITY=y +CONFIG_DM_BOW=y CONFIG_NETDEVICES=y CONFIG_NETCONSOLE=y CONFIG_NETCONSOLE_DYNAMIC=y @@ -283,7 +284,6 @@ CONFIG_TABLET_USB_GTCO=y CONFIG_TABLET_USB_HANWANG=y CONFIG_TABLET_USB_KBTAB=y CONFIG_INPUT_MISC=y -CONFIG_INPUT_KEYCHORD=y CONFIG_INPUT_UINPUT=y CONFIG_INPUT_GPIO=y # CONFIG_SERIO_I8042 is not set diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h index 1f86e1b0a5cd..499578f7e6d7 100644 --- a/arch/x86/include/asm/unwind.h +++ b/arch/x86/include/asm/unwind.h @@ -23,6 +23,12 @@ struct unwind_state { #elif defined(CONFIG_UNWINDER_FRAME_POINTER) bool got_irq; unsigned long *bp, *orig_sp, ip; + /* + * If non-NULL: The current frame is incomplete and doesn't contain a + * valid BP. When looking for the next frame, use this instead of the + * non-existent saved BP. 
+ */ + unsigned long *next_bp; struct pt_regs *regs; #else unsigned long *sp; diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index b034826a0b3b..21be0193d9dc 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -287,7 +287,7 @@ recompute_jump(struct alt_instr *a, u8 *orig_insn, u8 *repl_insn, u8 *insnbuf) tgt_rip = next_rip + o_dspl; n_dspl = tgt_rip - orig_insn; - DPRINTK("target RIP: %p, new_displ: 0x%x", tgt_rip, n_dspl); + DPRINTK("target RIP: %px, new_displ: 0x%x", tgt_rip, n_dspl); if (tgt_rip - orig_insn >= 0) { if (n_dspl - 2 <= 127) @@ -344,7 +344,7 @@ static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *ins add_nops(instr + (a->instrlen - a->padlen), a->padlen); local_irq_restore(flags); - DUMP_BYTES(instr, a->instrlen, "%p: [%d:%d) optimized NOPs: ", + DUMP_BYTES(instr, a->instrlen, "%px: [%d:%d) optimized NOPs: ", instr, a->instrlen - a->padlen, a->padlen); } @@ -365,7 +365,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, u8 *instr, *replacement; u8 insnbuf[MAX_PATCH_LEN]; - DPRINTK("alt table %p -> %p", start, end); + DPRINTK("alt table %px -> %px", start, end); /* * The scan order should be from start to end. A later scanned * alternative code can overwrite previously scanned alternative code.
@@ -389,14 +389,14 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, continue; } - DPRINTK("feat: %d*32+%d, old: (%p, len: %d), repl: (%p, len: %d), pad: %d", + DPRINTK("feat: %d*32+%d, old: (%px, len: %d), repl: (%px, len: %d), pad: %d", a->cpuid >> 5, a->cpuid & 0x1f, instr, a->instrlen, replacement, a->replacementlen, a->padlen); - DUMP_BYTES(instr, a->instrlen, "%p: old_insn: ", instr); - DUMP_BYTES(replacement, a->replacementlen, "%p: rpl_insn: ", replacement); + DUMP_BYTES(instr, a->instrlen, "%px: old_insn: ", instr); + DUMP_BYTES(replacement, a->replacementlen, "%px: rpl_insn: ", replacement); memcpy(insnbuf, replacement, a->replacementlen); insnbuf_sz = a->replacementlen; @@ -422,7 +422,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, a->instrlen - a->replacementlen); insnbuf_sz += a->instrlen - a->replacementlen; } - DUMP_BYTES(insnbuf, insnbuf_sz, "%p: final_insn: ", instr); + DUMP_BYTES(insnbuf, insnbuf_sz, "%px: final_insn: ", instr); text_poke_early(instr, insnbuf, insnbuf_sz); } diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c index 3dc26f95d46e..9b9fd4826e7a 100644 --- a/arch/x86/kernel/unwind_frame.c +++ b/arch/x86/kernel/unwind_frame.c @@ -320,10 +320,14 @@ bool unwind_next_frame(struct unwind_state *state) } /* Get the next frame pointer: */ - if (state->regs) + if (state->next_bp) { + next_bp = state->next_bp; + state->next_bp = NULL; + } else if (state->regs) { next_bp = (unsigned long *)state->regs->bp; - else + } else { next_bp = (unsigned long *)READ_ONCE_TASK_STACK(state->task, *state->bp); + } /* Move to the next frame if it's safe: */ if (!update_stack_state(state, next_bp)) @@ -398,6 +402,21 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task, bp = get_frame_pointer(task, regs); + /* + * If we crash with IP==0, the last successfully executed instruction + * was probably an indirect function call with a NULL function pointer.
+ * That means that SP points into the middle of an incomplete frame: + * *SP is a return pointer, and *(SP-sizeof(unsigned long)) is where we + * would have written a frame pointer if we hadn't crashed. + * Pretend that the frame is complete and that BP points to it, but save + * the real BP so that we can use it when looking for the next frame. + */ + if (regs && regs->ip == 0 && + (unsigned long *)kernel_stack_pointer(regs) >= first_frame) { + state->next_bp = bp; + bp = ((unsigned long *)kernel_stack_pointer(regs)) - 1; + } + /* Initialize stack info and make sure the frame data is accessible: */ get_stack_info(bp, state->task, &state->stack_info, &state->stack_mask); @@ -410,7 +429,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task, */ while (!unwind_done(state) && (!on_stack(&state->stack_info, first_frame, sizeof(long)) || - state->bp < first_frame)) + (state->next_bp == NULL && state->bp < first_frame))) unwind_next_frame(state); } EXPORT_SYMBOL_GPL(__unwind_start); diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index be86a865087a..3bbb399f7ead 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -74,11 +74,28 @@ static struct orc_entry *orc_module_find(unsigned long ip) } #endif +/* + * If we crash with IP==0, the last successfully executed instruction + * was probably an indirect function call with a NULL function pointer, + * and we don't have unwind information for NULL. + * This hardcoded ORC entry for IP==0 allows us to unwind from a NULL function + * pointer into its parent and then continue normally from there. 
+ */ +static struct orc_entry null_orc_entry = { + .sp_offset = sizeof(long), + .sp_reg = ORC_REG_SP, + .bp_reg = ORC_REG_UNDEFINED, + .type = ORC_TYPE_CALL +}; + static struct orc_entry *orc_find(unsigned long ip) { if (!orc_init) return NULL; + if (ip == 0) + return &null_orc_entry; + /* For non-init vmlinux addresses, use the fast lookup table: */ if (ip >= LOOKUP_START_IP && ip < LOOKUP_STOP_IP) { unsigned int idx, start, stop; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 794c35c4ca73..99a141d500dc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -860,7 +860,7 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code, if (!printk_ratelimit()) return; - printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx", + printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx", task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG, tsk->comm, task_pid_nr(tsk), address, (void *)regs->ip, (void *)regs->sp, error_code); diff --git a/drivers/android/binder.c b/drivers/android/binder.c index c7478aeb9b10..64824dc8fc22 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -663,6 +663,26 @@ struct binder_transaction { spinlock_t lock; }; +/** + * struct binder_object - union of flat binder object types + * @hdr: generic object header + * @fbo: binder object (nodes and refs) + * @fdo: file descriptor object + * @bbo: binder buffer pointer + * @fdao: file descriptor array + * + * Used for type-independent object copies + */ +struct binder_object { + union { + struct binder_object_header hdr; + struct flat_binder_object fbo; + struct binder_fd_object fdo; + struct binder_buffer_object bbo; + struct binder_fd_array_object fdao; + }; +}; + /** * binder_proc_lock() - Acquire outer lock for given binder_proc * @proc: struct binder_proc to acquire @@ -2199,26 +2219,34 @@ static void binder_cleanup_transaction(struct binder_transaction *t, } /** - * binder_validate_object() - checks for a valid metadata object in a buffer. 
+ * binder_get_object() - gets object and checks for valid metadata + * @proc: binder_proc owning the buffer * @buffer: binder_buffer that we're parsing. - * @offset: offset in the buffer at which to validate an object. + * @offset: offset in the @buffer at which to validate an object. + * @object: struct binder_object to read into * * Return: If there's a valid metadata object at @offset in @buffer, the - * size of that object. Otherwise, it returns zero. + * size of that object. Otherwise, it returns zero. The object + * is read into the struct binder_object pointed to by @object. */ -static size_t binder_validate_object(struct binder_buffer *buffer, u64 offset) +static size_t binder_get_object(struct binder_proc *proc, + struct binder_buffer *buffer, + unsigned long offset, + struct binder_object *object) { - /* Check if we can read a header first */ + size_t read_size; struct binder_object_header *hdr; size_t object_size = 0; - if (buffer->data_size < sizeof(*hdr) || - offset > buffer->data_size - sizeof(*hdr) || + read_size = min_t(size_t, sizeof(*object), buffer->data_size - offset); + if (offset > buffer->data_size || read_size < sizeof(*hdr) || !IS_ALIGNED(offset, sizeof(u32))) return 0; + binder_alloc_copy_from_buffer(&proc->alloc, object, buffer, + offset, read_size); - /* Ok, now see if we can read a complete object. */ - hdr = (struct binder_object_header *)(buffer->data + offset); + /* Ok, now see if we read a complete object. */ + hdr = &object->hdr; switch (hdr->type) { case BINDER_TYPE_BINDER: case BINDER_TYPE_WEAK_BINDER: @@ -2247,10 +2275,13 @@ static size_t binder_validate_object(struct binder_buffer *buffer, u64 offset) /** * binder_validate_ptr() - validates binder_buffer_object in a binder_buffer. 
+ * @proc: binder_proc owning the buffer * @b: binder_buffer containing the object + * @object: struct binder_object to read into * @index: index in offset array at which the binder_buffer_object is * located - * @start: points to the start of the offset array + * @start_offset: points to the start of the offset array + * @object_offsetp: offset of @object read from @b * @num_valid: the number of valid offsets in the offset array * * Return: If @index is within the valid range of the offset array @@ -2261,34 +2292,46 @@ static size_t binder_validate_object(struct binder_buffer *buffer, u64 offset) * Note that the offset found in index @index itself is not * verified; this function assumes that @num_valid elements * from @start were previously verified to have valid offsets. + * If @object_offsetp is non-NULL, then the offset within + * @b is written to it. */ -static struct binder_buffer_object *binder_validate_ptr(struct binder_buffer *b, - binder_size_t index, - binder_size_t *start, - binder_size_t num_valid) +static struct binder_buffer_object *binder_validate_ptr( + struct binder_proc *proc, + struct binder_buffer *b, + struct binder_object *object, + binder_size_t index, + binder_size_t start_offset, + binder_size_t *object_offsetp, + binder_size_t num_valid) { - struct binder_buffer_object *buffer_obj; - binder_size_t *offp; + size_t object_size; + binder_size_t object_offset; + unsigned long buffer_offset; if (index >= num_valid) return NULL; - offp = start + index; - buffer_obj = (struct binder_buffer_object *)(b->data + *offp); - if (buffer_obj->hdr.type != BINDER_TYPE_PTR) + buffer_offset = start_offset + sizeof(binder_size_t) * index; + binder_alloc_copy_from_buffer(&proc->alloc, &object_offset, + b, buffer_offset, sizeof(object_offset)); + object_size = binder_get_object(proc, b, object_offset, object); + if (!object_size || object->hdr.type != BINDER_TYPE_PTR) return NULL; + if (object_offsetp) + *object_offsetp = object_offset; - return buffer_obj; + 
return &object->bbo; } /** * binder_validate_fixup() - validates pointer/fd fixups happen in order. + * @proc: binder_proc owning the buffer * @b: transaction buffer - * @objects_start start of objects buffer - * @buffer: binder_buffer_object in which to fix up - * @offset: start offset in @buffer to fix up - * @last_obj: last binder_buffer_object that we fixed up in - * @last_min_offset: minimum fixup offset in @last_obj + * @objects_start_offset: offset to start of objects buffer + * @buffer_obj_offset: offset to binder_buffer_object in which to fix up + * @fixup_offset: start offset in the object at @buffer_obj_offset to fix up + * @last_obj_offset: offset to last binder_buffer_object that we fixed up in + * @last_min_offset: minimum fixup offset in object at @last_obj_offset * * Return: %true if a fixup in buffer @buffer at offset @offset is * allowed. @@ -2319,63 +2362,83 @@ static struct binder_buffer_object *binder_validate_ptr(struct binder_buffer *b, * C (parent = A, offset = 16) * D (parent = B, offset = 0) // B is not A or any of A's parents */ -static bool binder_validate_fixup(struct binder_buffer *b, - binder_size_t *objects_start, - struct binder_buffer_object *buffer, +static bool binder_validate_fixup(struct binder_proc *proc, + struct binder_buffer *b, + binder_size_t objects_start_offset, + binder_size_t buffer_obj_offset, binder_size_t fixup_offset, - struct binder_buffer_object *last_obj, + binder_size_t last_obj_offset, binder_size_t last_min_offset) { - if (!last_obj) { + if (!last_obj_offset) { /* Nothing to fix up in */ return false; } - while (last_obj != buffer) { + while (last_obj_offset != buffer_obj_offset) { + unsigned long buffer_offset; + struct binder_object last_object; + struct binder_buffer_object *last_bbo; + size_t object_size = binder_get_object(proc, b, last_obj_offset, + &last_object); + if (object_size != sizeof(*last_bbo)) + return false; + + last_bbo = &last_object.bbo; /* * Safe to retrieve the parent of last_obj, since it * was already previously
verified by the driver. */ - if ((last_obj->flags & BINDER_BUFFER_FLAG_HAS_PARENT) == 0) + if ((last_bbo->flags & BINDER_BUFFER_FLAG_HAS_PARENT) == 0) return false; - last_min_offset = last_obj->parent_offset + sizeof(uintptr_t); - last_obj = (struct binder_buffer_object *) - (b->data + *(objects_start + last_obj->parent)); + last_min_offset = last_bbo->parent_offset + sizeof(uintptr_t); + buffer_offset = objects_start_offset + + sizeof(binder_size_t) * last_bbo->parent; + binder_alloc_copy_from_buffer(&proc->alloc, &last_obj_offset, + b, buffer_offset, + sizeof(last_obj_offset)); } return (fixup_offset >= last_min_offset); } static void binder_transaction_buffer_release(struct binder_proc *proc, struct binder_buffer *buffer, - binder_size_t *failed_at) + binder_size_t failed_at, + bool is_failure) { - binder_size_t *offp, *off_start, *off_end; int debug_id = buffer->debug_id; + binder_size_t off_start_offset, buffer_offset, off_end_offset; binder_debug(BINDER_DEBUG_TRANSACTION, - "%d buffer release %d, size %zd-%zd, failed at %pK\n", + "%d buffer release %d, size %zd-%zd, failed at %llx\n", proc->pid, buffer->debug_id, - buffer->data_size, buffer->offsets_size, failed_at); + buffer->data_size, buffer->offsets_size, + (unsigned long long)failed_at); if (buffer->target_node) binder_dec_node(buffer->target_node, 1, 0); - off_start = (binder_size_t *)(buffer->data + - ALIGN(buffer->data_size, sizeof(void *))); - if (failed_at) - off_end = failed_at; - else - off_end = (void *)off_start + buffer->offsets_size; - for (offp = off_start; offp < off_end; offp++) { + off_start_offset = ALIGN(buffer->data_size, sizeof(void *)); + off_end_offset = is_failure ?
failed_at : + off_start_offset + buffer->offsets_size; + for (buffer_offset = off_start_offset; buffer_offset < off_end_offset; + buffer_offset += sizeof(binder_size_t)) { struct binder_object_header *hdr; - size_t object_size = binder_validate_object(buffer, *offp); - + size_t object_size; + struct binder_object object; + binder_size_t object_offset; + + binder_alloc_copy_from_buffer(&proc->alloc, &object_offset, + buffer, buffer_offset, + sizeof(object_offset)); + object_size = binder_get_object(proc, buffer, + object_offset, &object); if (object_size == 0) { pr_err("transaction release %d bad object at offset %lld, size %zd\n", - debug_id, (u64)*offp, buffer->data_size); + debug_id, (u64)object_offset, buffer->data_size); continue; } - hdr = (struct binder_object_header *)(buffer->data + *offp); + hdr = &object.hdr; switch (hdr->type) { case BINDER_TYPE_BINDER: case BINDER_TYPE_WEAK_BINDER: { @@ -2433,28 +2496,25 @@ static void binder_transaction_buffer_release(struct binder_proc *proc, case BINDER_TYPE_FDA: { struct binder_fd_array_object *fda; struct binder_buffer_object *parent; - uintptr_t parent_buffer; - u32 *fd_array; + struct binder_object ptr_object; + binder_size_t fda_offset; size_t fd_index; binder_size_t fd_buf_size; + binder_size_t num_valid; + num_valid = (buffer_offset - off_start_offset) / + sizeof(binder_size_t); fda = to_binder_fd_array_object(hdr); - parent = binder_validate_ptr(buffer, fda->parent, - off_start, - offp - off_start); + parent = binder_validate_ptr(proc, buffer, &ptr_object, + fda->parent, + off_start_offset, + NULL, + num_valid); if (!parent) { pr_err("transaction release %d bad parent offset", debug_id); continue; } - /* - * Since the parent was already fixed up, convert it - * back to kernel address space to access it - */ - parent_buffer = parent->buffer - - binder_alloc_get_user_buffer_offset( - &proc->alloc); - fd_buf_size = sizeof(u32) * fda->num_fds; if (fda->num_fds >= SIZE_MAX / sizeof(u32)) { pr_err("transaction 
release %d invalid number of fds (%lld)\n", @@ -2468,9 +2528,29 @@ static void binder_transaction_buffer_release(struct binder_proc *proc, debug_id, (u64)fda->num_fds); continue; } - fd_array = (u32 *)(parent_buffer + (uintptr_t)fda->parent_offset); - for (fd_index = 0; fd_index < fda->num_fds; fd_index++) - task_close_fd(proc, fd_array[fd_index]); + /* + * the source data for binder_buffer_object is visible + * to user-space and the @buffer element is the user + * pointer to the buffer_object containing the fd_array. + * Convert the address to an offset relative to + * the base of the transaction buffer. + */ + fda_offset = + (parent->buffer - (uintptr_t)buffer->user_data) + + fda->parent_offset; + for (fd_index = 0; fd_index < fda->num_fds; + fd_index++) { + u32 fd; + binder_size_t offset = fda_offset + + fd_index * sizeof(fd); + + binder_alloc_copy_from_buffer(&proc->alloc, + &fd, + buffer, + offset, + sizeof(fd)); + task_close_fd(proc, fd); + } } break; default: pr_err("transaction release %d bad object type %x\n", @@ -2667,9 +2747,8 @@ static int binder_translate_fd_array(struct binder_fd_array_object *fda, struct binder_transaction *in_reply_to) { binder_size_t fdi, fd_buf_size, num_installed_fds; + binder_size_t fda_offset; int target_fd; - uintptr_t parent_buffer; - u32 *fd_array; struct binder_proc *proc = thread->proc; struct binder_proc *target_proc = t->to_proc; @@ -2687,23 +2766,33 @@ static int binder_translate_fd_array(struct binder_fd_array_object *fda, return -EINVAL; } /* - * Since the parent was already fixed up, convert it - * back to the kernel address space to access it + * the source data for binder_buffer_object is visible + * to user-space and the @buffer element is the user + * pointer to the buffer_object containing the fd_array. + * Convert the address to an offset relative to + * the base of the transaction buffer. 
*/ - parent_buffer = parent->buffer - - binder_alloc_get_user_buffer_offset(&target_proc->alloc); - fd_array = (u32 *)(parent_buffer + (uintptr_t)fda->parent_offset); - if (!IS_ALIGNED((unsigned long)fd_array, sizeof(u32))) { + fda_offset = (parent->buffer - (uintptr_t)t->buffer->user_data) + + fda->parent_offset; + if (!IS_ALIGNED((unsigned long)fda_offset, sizeof(u32))) { binder_user_error("%d:%d parent offset not aligned correctly.\n", proc->pid, thread->pid); return -EINVAL; } for (fdi = 0; fdi < fda->num_fds; fdi++) { - target_fd = binder_translate_fd(fd_array[fdi], t, thread, - in_reply_to); + u32 fd; + + binder_size_t offset = fda_offset + fdi * sizeof(fd); + + binder_alloc_copy_from_buffer(&target_proc->alloc, + &fd, t->buffer, + offset, sizeof(fd)); + target_fd = binder_translate_fd(fd, t, thread, in_reply_to); if (target_fd < 0) goto err_translate_fd_failed; - fd_array[fdi] = target_fd; + binder_alloc_copy_to_buffer(&target_proc->alloc, + t->buffer, offset, + &target_fd, sizeof(fd)); } return 0; @@ -2713,38 +2802,48 @@ err_translate_fd_failed: * installed so far. 
*/ num_installed_fds = fdi; - for (fdi = 0; fdi < num_installed_fds; fdi++) - task_close_fd(target_proc, fd_array[fdi]); + for (fdi = 0; fdi < num_installed_fds; fdi++) { + u32 fd; + binder_size_t offset = fda_offset + fdi * sizeof(fd); + binder_alloc_copy_from_buffer(&target_proc->alloc, + &fd, t->buffer, + offset, sizeof(fd)); + task_close_fd(target_proc, fd); + } return target_fd; } static int binder_fixup_parent(struct binder_transaction *t, struct binder_thread *thread, struct binder_buffer_object *bp, - binder_size_t *off_start, + binder_size_t off_start_offset, binder_size_t num_valid, - struct binder_buffer_object *last_fixup_obj, + binder_size_t last_fixup_obj_off, binder_size_t last_fixup_min_off) { struct binder_buffer_object *parent; - u8 *parent_buffer; struct binder_buffer *b = t->buffer; struct binder_proc *proc = thread->proc; struct binder_proc *target_proc = t->to_proc; + struct binder_object object; + binder_size_t buffer_offset; + binder_size_t parent_offset; if (!(bp->flags & BINDER_BUFFER_FLAG_HAS_PARENT)) return 0; - parent = binder_validate_ptr(b, bp->parent, off_start, num_valid); + parent = binder_validate_ptr(target_proc, b, &object, bp->parent, + off_start_offset, &parent_offset, + num_valid); if (!parent) { binder_user_error("%d:%d got transaction with invalid parent offset or type\n", proc->pid, thread->pid); return -EINVAL; } - if (!binder_validate_fixup(b, off_start, - parent, bp->parent_offset, - last_fixup_obj, + if (!binder_validate_fixup(target_proc, b, off_start_offset, + parent_offset, bp->parent_offset, + last_fixup_obj_off, last_fixup_min_off)) { binder_user_error("%d:%d got transaction with out-of-order buffer fixup\n", proc->pid, thread->pid); @@ -2758,10 +2857,10 @@ static int binder_fixup_parent(struct binder_transaction *t, proc->pid, thread->pid); return -EINVAL; } - parent_buffer = (u8 *)((uintptr_t)parent->buffer - - binder_alloc_get_user_buffer_offset( - &target_proc->alloc)); - *(binder_uintptr_t *)(parent_buffer + 
bp->parent_offset) = bp->buffer; + buffer_offset = bp->parent_offset + + (uintptr_t)parent->buffer - (uintptr_t)b->user_data; + binder_alloc_copy_to_buffer(&target_proc->alloc, b, buffer_offset, + &bp->buffer, sizeof(bp->buffer)); return 0; } @@ -2886,9 +2985,10 @@ static void binder_transaction(struct binder_proc *proc, int ret; struct binder_transaction *t; struct binder_work *tcomplete; - binder_size_t *offp, *off_end, *off_start; + binder_size_t buffer_offset = 0; + binder_size_t off_start_offset, off_end_offset; binder_size_t off_min; - u8 *sg_bufp, *sg_buf_end; + binder_size_t sg_buf_offset, sg_buf_end_offset; struct binder_proc *target_proc = NULL; struct binder_thread *target_thread = NULL; struct binder_node *target_node = NULL; @@ -2897,7 +2997,7 @@ static void binder_transaction(struct binder_proc *proc, uint32_t return_error = 0; uint32_t return_error_param = 0; uint32_t return_error_line = 0; - struct binder_buffer_object *last_fixup_obj = NULL; + binder_size_t last_fixup_obj_off = 0; binder_size_t last_fixup_min_off = 0; struct binder_context *context = proc->context; int t_debug_id = atomic_inc_return(&binder_last_id); @@ -3161,11 +3261,11 @@ static void binder_transaction(struct binder_proc *proc, ALIGN(tr->offsets_size, sizeof(void *)) + ALIGN(extra_buffers_size, sizeof(void *)) - ALIGN(secctx_sz, sizeof(u64)); - char *kptr = t->buffer->data + buf_offset; - t->security_ctx = (uintptr_t)kptr + - binder_alloc_get_user_buffer_offset(&target_proc->alloc); - memcpy(kptr, secctx, secctx_sz); + t->security_ctx = (uintptr_t)t->buffer->user_data + buf_offset; + binder_alloc_copy_to_buffer(&target_proc->alloc, + t->buffer, buf_offset, + secctx, secctx_sz); security_release_secctx(secctx, secctx_sz); secctx = NULL; } @@ -3173,12 +3273,13 @@ static void binder_transaction(struct binder_proc *proc, t->buffer->transaction = t; t->buffer->target_node = target_node; trace_binder_transaction_alloc_buf(t->buffer); - off_start = (binder_size_t *)(t->buffer->data + - 
ALIGN(tr->data_size, sizeof(void *))); - offp = off_start; - if (copy_from_user(t->buffer->data, (const void __user *)(uintptr_t) - tr->data.ptr.buffer, tr->data_size)) { + if (binder_alloc_copy_user_to_buffer( + &target_proc->alloc, + t->buffer, 0, + (const void __user *) + (uintptr_t)tr->data.ptr.buffer, + tr->data_size)) { binder_user_error("%d:%d got transaction with invalid data ptr\n", proc->pid, thread->pid); return_error = BR_FAILED_REPLY; @@ -3186,8 +3287,13 @@ static void binder_transaction(struct binder_proc *proc, return_error_line = __LINE__; goto err_copy_data_failed; } - if (copy_from_user(offp, (const void __user *)(uintptr_t) - tr->data.ptr.offsets, tr->offsets_size)) { + if (binder_alloc_copy_user_to_buffer( + &target_proc->alloc, + t->buffer, + ALIGN(tr->data_size, sizeof(void *)), + (const void __user *) + (uintptr_t)tr->data.ptr.offsets, + tr->offsets_size)) { binder_user_error("%d:%d got transaction with invalid offsets ptr\n", proc->pid, thread->pid); return_error = BR_FAILED_REPLY; @@ -3212,17 +3318,30 @@ static void binder_transaction(struct binder_proc *proc, return_error_line = __LINE__; goto err_bad_offset; } - off_end = (void *)off_start + tr->offsets_size; - sg_bufp = (u8 *)(PTR_ALIGN(off_end, sizeof(void *))); - sg_buf_end = sg_bufp + extra_buffers_size; + off_start_offset = ALIGN(tr->data_size, sizeof(void *)); + buffer_offset = off_start_offset; + off_end_offset = off_start_offset + tr->offsets_size; + sg_buf_offset = ALIGN(off_end_offset, sizeof(void *)); + sg_buf_end_offset = sg_buf_offset + extra_buffers_size; off_min = 0; - for (; offp < off_end; offp++) { + for (buffer_offset = off_start_offset; buffer_offset < off_end_offset; + buffer_offset += sizeof(binder_size_t)) { struct binder_object_header *hdr; - size_t object_size = binder_validate_object(t->buffer, *offp); - - if (object_size == 0 || *offp < off_min) { + size_t object_size; + struct binder_object object; + binder_size_t object_offset; + + 
binder_alloc_copy_from_buffer(&target_proc->alloc, + &object_offset, + t->buffer, + buffer_offset, + sizeof(object_offset)); + object_size = binder_get_object(target_proc, t->buffer, + object_offset, &object); + if (object_size == 0 || object_offset < off_min) { binder_user_error("%d:%d got transaction with invalid offset (%lld, min %lld max %lld) or object.\n", - proc->pid, thread->pid, (u64)*offp, + proc->pid, thread->pid, + (u64)object_offset, (u64)off_min, (u64)t->buffer->data_size); return_error = BR_FAILED_REPLY; @@ -3231,8 +3350,8 @@ static void binder_transaction(struct binder_proc *proc, goto err_bad_offset; } - hdr = (struct binder_object_header *)(t->buffer->data + *offp); - off_min = *offp + object_size; + hdr = &object.hdr; + off_min = object_offset + object_size; switch (hdr->type) { case BINDER_TYPE_BINDER: case BINDER_TYPE_WEAK_BINDER: { @@ -3246,6 +3365,9 @@ static void binder_transaction(struct binder_proc *proc, return_error_line = __LINE__; goto err_translate_failed; } + binder_alloc_copy_to_buffer(&target_proc->alloc, + t->buffer, object_offset, + fp, sizeof(*fp)); } break; case BINDER_TYPE_HANDLE: case BINDER_TYPE_WEAK_HANDLE: { @@ -3259,6 +3381,9 @@ static void binder_transaction(struct binder_proc *proc, return_error_line = __LINE__; goto err_translate_failed; } + binder_alloc_copy_to_buffer(&target_proc->alloc, + t->buffer, object_offset, + fp, sizeof(*fp)); } break; case BINDER_TYPE_FD: { @@ -3274,14 +3399,23 @@ static void binder_transaction(struct binder_proc *proc, } fp->pad_binder = 0; fp->fd = target_fd; + binder_alloc_copy_to_buffer(&target_proc->alloc, + t->buffer, object_offset, + fp, sizeof(*fp)); } break; case BINDER_TYPE_FDA: { + struct binder_object ptr_object; + binder_size_t parent_offset; struct binder_fd_array_object *fda = to_binder_fd_array_object(hdr); + size_t num_valid = (buffer_offset - off_start_offset) / + sizeof(binder_size_t); struct binder_buffer_object *parent = - binder_validate_ptr(t->buffer, fda->parent, -
off_start, - offp - off_start); + binder_validate_ptr(target_proc, t->buffer, + &ptr_object, fda->parent, + off_start_offset, + &parent_offset, + num_valid); if (!parent) { binder_user_error("%d:%d got transaction with invalid parent offset or type\n", proc->pid, thread->pid); @@ -3290,9 +3424,11 @@ static void binder_transaction(struct binder_proc *proc, return_error_line = __LINE__; goto err_bad_parent; } - if (!binder_validate_fixup(t->buffer, off_start, - parent, fda->parent_offset, - last_fixup_obj, + if (!binder_validate_fixup(target_proc, t->buffer, + off_start_offset, + parent_offset, + fda->parent_offset, + last_fixup_obj_off, last_fixup_min_off)) { binder_user_error("%d:%d got transaction with out-of-order buffer fixup\n", proc->pid, thread->pid); @@ -3309,14 +3445,15 @@ static void binder_transaction(struct binder_proc *proc, return_error_line = __LINE__; goto err_translate_failed; } - last_fixup_obj = parent; + last_fixup_obj_off = parent_offset; last_fixup_min_off = fda->parent_offset + sizeof(u32) * fda->num_fds; } break; case BINDER_TYPE_PTR: { struct binder_buffer_object *bp = to_binder_buffer_object(hdr); - size_t buf_left = sg_buf_end - sg_bufp; + size_t buf_left = sg_buf_end_offset - sg_buf_offset; + size_t num_valid; if (bp->length > buf_left) { binder_user_error("%d:%d got transaction with too large buffer\n", @@ -3326,9 +3463,13 @@ static void binder_transaction(struct binder_proc *proc, return_error_line = __LINE__; goto err_bad_offset; } - if (copy_from_user(sg_bufp, - (const void __user *)(uintptr_t) - bp->buffer, bp->length)) { + if (binder_alloc_copy_user_to_buffer( + &target_proc->alloc, + t->buffer, + sg_buf_offset, + (const void __user *) + (uintptr_t)bp->buffer, + bp->length)) { binder_user_error("%d:%d got transaction with invalid offsets ptr\n", proc->pid, thread->pid); return_error_param = -EFAULT; @@ -3337,14 +3478,16 @@ static void binder_transaction(struct binder_proc *proc, goto err_copy_data_failed; } /* Fixup buffer pointer 
to target proc address space */ - bp->buffer = (uintptr_t)sg_bufp + - binder_alloc_get_user_buffer_offset( - &target_proc->alloc); - sg_bufp += ALIGN(bp->length, sizeof(u64)); - - ret = binder_fixup_parent(t, thread, bp, off_start, - offp - off_start, - last_fixup_obj, + bp->buffer = (uintptr_t) + t->buffer->user_data + sg_buf_offset; + sg_buf_offset += ALIGN(bp->length, sizeof(u64)); + + num_valid = (buffer_offset - off_start_offset) / + sizeof(binder_size_t); + ret = binder_fixup_parent(t, thread, bp, + off_start_offset, + num_valid, + last_fixup_obj_off, last_fixup_min_off); if (ret < 0) { return_error = BR_FAILED_REPLY; @@ -3352,7 +3495,10 @@ static void binder_transaction(struct binder_proc *proc, return_error_line = __LINE__; goto err_translate_failed; } - last_fixup_obj = bp; + binder_alloc_copy_to_buffer(&target_proc->alloc, + t->buffer, object_offset, + bp, sizeof(*bp)); + last_fixup_obj_off = object_offset; last_fixup_min_off = 0; } break; default: @@ -3432,7 +3578,8 @@ err_bad_offset: err_bad_parent: err_copy_data_failed: trace_binder_transaction_failed_buffer_release(t->buffer); - binder_transaction_buffer_release(target_proc, t->buffer, offp); + binder_transaction_buffer_release(target_proc, t->buffer, + buffer_offset, true); if (target_node) binder_dec_node_tmpref(target_node); target_node = NULL; @@ -3711,7 +3858,7 @@ static int binder_thread_write(struct binder_proc *proc, binder_node_inner_unlock(buf_node); } trace_binder_transaction_buffer_release(buffer); - binder_transaction_buffer_release(proc, buffer, NULL); + binder_transaction_buffer_release(proc, buffer, 0, false); binder_alloc_free_buf(&proc->alloc, buffer); break; } @@ -4319,9 +4466,7 @@ retry: trd->data_size = t->buffer->data_size; trd->offsets_size = t->buffer->offsets_size; - trd->data.ptr.buffer = (binder_uintptr_t) - ((uintptr_t)t->buffer->data + - binder_alloc_get_user_buffer_offset(&proc->alloc)); + trd->data.ptr.buffer = (uintptr_t)t->buffer->user_data; trd->data.ptr.offsets =
trd->data.ptr.buffer + ALIGN(t->buffer->data_size, sizeof(void *)); @@ -5380,7 +5525,7 @@ static void print_binder_transaction_ilocked(struct seq_file *m, seq_printf(m, " node %d", buffer->target_node->debug_id); seq_printf(m, " size %zd:%zd data %pK\n", buffer->data_size, buffer->offsets_size, - buffer->data); + buffer->user_data); } static void print_binder_work_ilocked(struct seq_file *m, diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index e00d4d13810a..648d8f8e3d68 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -28,6 +28,8 @@ #include #include #include +#include +#include #include "binder_alloc.h" #include "binder_trace.h" @@ -65,9 +67,8 @@ static size_t binder_alloc_buffer_size(struct binder_alloc *alloc, struct binder_buffer *buffer) { if (list_is_last(&buffer->entry, &alloc->buffers)) - return (u8 *)alloc->buffer + - alloc->buffer_size - (u8 *)buffer->data; - return (u8 *)binder_buffer_next(buffer)->data - (u8 *)buffer->data; + return alloc->buffer + alloc->buffer_size - buffer->user_data; + return binder_buffer_next(buffer)->user_data - buffer->user_data; } static void binder_insert_free_buffer(struct binder_alloc *alloc, @@ -117,9 +118,9 @@ static void binder_insert_allocated_buffer_locked( buffer = rb_entry(parent, struct binder_buffer, rb_node); BUG_ON(buffer->free); - if (new_buffer->data < buffer->data) + if (new_buffer->user_data < buffer->user_data) p = &parent->rb_left; - else if (new_buffer->data > buffer->data) + else if (new_buffer->user_data > buffer->user_data) p = &parent->rb_right; else BUG(); @@ -134,17 +135,17 @@ static struct binder_buffer *binder_alloc_prepare_to_free_locked( { struct rb_node *n = alloc->allocated_buffers.rb_node; struct binder_buffer *buffer; - void *kern_ptr; + void __user *uptr; - kern_ptr = (void *)(user_ptr - alloc->user_buffer_offset); + uptr = (void __user *)user_ptr; while (n) { buffer = rb_entry(n, struct binder_buffer, rb_node); 
BUG_ON(buffer->free); - if (kern_ptr < buffer->data) + if (uptr < buffer->user_data) n = n->rb_left; - else if (kern_ptr > buffer->data) + else if (uptr > buffer->user_data) n = n->rb_right; else { /* @@ -184,9 +185,9 @@ struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc, } static int binder_update_page_range(struct binder_alloc *alloc, int allocate, - void *start, void *end) + void __user *start, void __user *end) { - void *page_addr; + void __user *page_addr; unsigned long user_page_addr; struct binder_lru_page *page; struct vm_area_struct *vma = NULL; @@ -260,18 +261,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, page->alloc = alloc; INIT_LIST_HEAD(&page->lru); - ret = map_kernel_range_noflush((unsigned long)page_addr, - PAGE_SIZE, PAGE_KERNEL, - &page->page_ptr); - flush_cache_vmap((unsigned long)page_addr, - (unsigned long)page_addr + PAGE_SIZE); - if (ret != 1) { - pr_err("%d: binder_alloc_buf failed to map page at %pK in kernel\n", - alloc->pid, page_addr); - goto err_map_kernel_failed; - } - user_page_addr = - (uintptr_t)page_addr + alloc->user_buffer_offset; + user_page_addr = (uintptr_t)page_addr; ret = vm_insert_page(vma, user_page_addr, page[0].page_ptr); if (ret) { pr_err("%d: binder_alloc_buf failed to map page at %lx in userspace\n", @@ -309,8 +299,6 @@ free_range: continue; err_vm_insert_page_failed: - unmap_kernel_range((unsigned long)page_addr, PAGE_SIZE); -err_map_kernel_failed: __free_page(page->page_ptr); page->page_ptr = NULL; err_alloc_page_failed: @@ -364,8 +352,8 @@ static struct binder_buffer *binder_alloc_new_buf_locked( struct binder_buffer *buffer; size_t buffer_size; struct rb_node *best_fit = NULL; - void *has_page_addr; - void *end_page_addr; + void __user *has_page_addr; + void __user *end_page_addr; size_t size, data_offsets_size; int ret; @@ -459,15 +447,15 @@ static struct binder_buffer *binder_alloc_new_buf_locked( "%d: binder_alloc_buf size %zd got buffer %pK size 
%zd\n", alloc->pid, size, buffer, buffer_size); - has_page_addr = - (void *)(((uintptr_t)buffer->data + buffer_size) & PAGE_MASK); + has_page_addr = (void __user *) + (((uintptr_t)buffer->user_data + buffer_size) & PAGE_MASK); WARN_ON(n && buffer_size != size); end_page_addr = - (void *)PAGE_ALIGN((uintptr_t)buffer->data + size); + (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data + size); if (end_page_addr > has_page_addr) end_page_addr = has_page_addr; - ret = binder_update_page_range(alloc, 1, - (void *)PAGE_ALIGN((uintptr_t)buffer->data), end_page_addr); + ret = binder_update_page_range(alloc, 1, (void __user *) + PAGE_ALIGN((uintptr_t)buffer->user_data), end_page_addr); if (ret) return ERR_PTR(ret); @@ -480,7 +468,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked( __func__, alloc->pid); goto err_alloc_buf_struct_failed; } - new_buffer->data = (u8 *)buffer->data + size; + new_buffer->user_data = (u8 __user *)buffer->user_data + size; list_add(&new_buffer->entry, &buffer->entry); new_buffer->free = 1; binder_insert_free_buffer(alloc, new_buffer); @@ -506,8 +494,8 @@ static struct binder_buffer *binder_alloc_new_buf_locked( return buffer; err_alloc_buf_struct_failed: - binder_update_page_range(alloc, 0, - (void *)PAGE_ALIGN((uintptr_t)buffer->data), + binder_update_page_range(alloc, 0, (void __user *) + PAGE_ALIGN((uintptr_t)buffer->user_data), end_page_addr); return ERR_PTR(-ENOMEM); } @@ -542,14 +530,15 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc, return buffer; } -static void *buffer_start_page(struct binder_buffer *buffer) +static void __user *buffer_start_page(struct binder_buffer *buffer) { - return (void *)((uintptr_t)buffer->data & PAGE_MASK); + return (void __user *)((uintptr_t)buffer->user_data & PAGE_MASK); } -static void *prev_buffer_end_page(struct binder_buffer *buffer) +static void __user *prev_buffer_end_page(struct binder_buffer *buffer) { - return (void *)(((uintptr_t)(buffer->data) - 1) & 
PAGE_MASK); + return (void __user *) + (((uintptr_t)(buffer->user_data) - 1) & PAGE_MASK); } static void binder_delete_free_buffer(struct binder_alloc *alloc, @@ -564,7 +553,8 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc, to_free = false; binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, "%d: merge free, buffer %pK share page with %pK\n", - alloc->pid, buffer->data, prev->data); + alloc->pid, buffer->user_data, + prev->user_data); } if (!list_is_last(&buffer->entry, &alloc->buffers)) { @@ -574,23 +564,24 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc, binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, "%d: merge free, buffer %pK share page with %pK\n", alloc->pid, - buffer->data, - next->data); + buffer->user_data, + next->user_data); } } - if (PAGE_ALIGNED(buffer->data)) { + if (PAGE_ALIGNED(buffer->user_data)) { binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, "%d: merge free, buffer start %pK is page aligned\n", - alloc->pid, buffer->data); + alloc->pid, buffer->user_data); to_free = false; } if (to_free) { binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, "%d: merge free, buffer %pK do not share page with %pK or %pK\n", - alloc->pid, buffer->data, - prev->data, next ? next->data : NULL); + alloc->pid, buffer->user_data, + prev->user_data, + next ? 
next->user_data : NULL); binder_update_page_range(alloc, 0, buffer_start_page(buffer), buffer_start_page(buffer) + PAGE_SIZE); } @@ -616,8 +607,8 @@ static void binder_free_buf_locked(struct binder_alloc *alloc, BUG_ON(buffer->free); BUG_ON(size > buffer_size); BUG_ON(buffer->transaction != NULL); - BUG_ON(buffer->data < alloc->buffer); - BUG_ON(buffer->data > alloc->buffer + alloc->buffer_size); + BUG_ON(buffer->user_data < alloc->buffer); + BUG_ON(buffer->user_data > alloc->buffer + alloc->buffer_size); if (buffer->async_transaction) { alloc->free_async_space += size + sizeof(struct binder_buffer); @@ -628,8 +619,9 @@ static void binder_free_buf_locked(struct binder_alloc *alloc, } binder_update_page_range(alloc, 0, - (void *)PAGE_ALIGN((uintptr_t)buffer->data), - (void *)(((uintptr_t)buffer->data + buffer_size) & PAGE_MASK)); + (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data), + (void __user *)(((uintptr_t) + buffer->user_data + buffer_size) & PAGE_MASK)); rb_erase(&buffer->rb_node, &alloc->allocated_buffers); buffer->free = 1; @@ -685,7 +677,6 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc, struct vm_area_struct *vma) { int ret; - struct vm_struct *area; const char *failure_string; struct binder_buffer *buffer; @@ -696,28 +687,9 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc, goto err_already_mapped; } - area = get_vm_area(vma->vm_end - vma->vm_start, VM_ALLOC); - if (area == NULL) { - ret = -ENOMEM; - failure_string = "get_vm_area"; - goto err_get_vm_area_failed; - } - alloc->buffer = area->addr; - alloc->user_buffer_offset = - vma->vm_start - (uintptr_t)alloc->buffer; + alloc->buffer = (void __user *)vma->vm_start; mutex_unlock(&binder_alloc_mmap_lock); -#ifdef CONFIG_CPU_CACHE_VIPT - if (cache_is_vipt_aliasing()) { - while (CACHE_COLOUR( - (vma->vm_start ^ (uint32_t)alloc->buffer))) { - pr_info("%s: %d %lx-%lx maps %pK bad alignment\n", - __func__, alloc->pid, vma->vm_start, - vma->vm_end, alloc->buffer); - vma->vm_start += 
PAGE_SIZE; - } - } -#endif alloc->pages = kzalloc(sizeof(alloc->pages[0]) * ((vma->vm_end - vma->vm_start) / PAGE_SIZE), GFP_KERNEL); @@ -735,7 +707,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc, goto err_alloc_buf_struct_failed; } - buffer->data = alloc->buffer; + buffer->user_data = alloc->buffer; list_add(&buffer->entry, &alloc->buffers); buffer->free = 1; binder_insert_free_buffer(alloc, buffer); @@ -750,9 +722,7 @@ err_alloc_buf_struct_failed: alloc->pages = NULL; err_alloc_pages_failed: mutex_lock(&binder_alloc_mmap_lock); - vfree(alloc->buffer); alloc->buffer = NULL; -err_get_vm_area_failed: err_already_mapped: mutex_unlock(&binder_alloc_mmap_lock); pr_err("%s: %d %lx-%lx %s failed %d\n", __func__, @@ -796,7 +766,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc) int i; for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) { - void *page_addr; + void __user *page_addr; bool on_lru; if (!alloc->pages[i].page_ptr) @@ -809,12 +779,10 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc) "%s: %d: page %d at %pK %s\n", __func__, alloc->pid, i, page_addr, on_lru ? "on lru" : "active"); - unmap_kernel_range((unsigned long)page_addr, PAGE_SIZE); __free_page(alloc->pages[i].page_ptr); page_count++; } kfree(alloc->pages); - vfree(alloc->buffer); } mutex_unlock(&alloc->mutex); if (alloc->vma_vm_mm) @@ -829,7 +797,7 @@ static void print_binder_buffer(struct seq_file *m, const char *prefix, struct binder_buffer *buffer) { seq_printf(m, "%s %d: %pK size %zd:%zd:%zd %s\n", - prefix, buffer->debug_id, buffer->data, + prefix, buffer->debug_id, buffer->user_data, buffer->data_size, buffer->offsets_size, buffer->extra_buffers_size, buffer->transaction ? 
"active" : "delivered"); @@ -963,9 +931,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item, if (vma) { trace_binder_unmap_user_start(alloc, index); - zap_page_range(vma, - page_addr + alloc->user_buffer_offset, - PAGE_SIZE); + zap_page_range(vma, page_addr, PAGE_SIZE); trace_binder_unmap_user_end(alloc, index); @@ -975,7 +941,6 @@ enum lru_status binder_alloc_free_page(struct list_head *item, trace_binder_unmap_kernel_start(alloc, index); - unmap_kernel_range(page_addr, PAGE_SIZE); __free_page(page->page_ptr); page->page_ptr = NULL; @@ -1042,3 +1007,173 @@ int binder_alloc_shrinker_init(void) } return ret; } + +/** + * check_buffer() - verify that buffer/offset is safe to access + * @alloc: binder_alloc for this proc + * @buffer: binder buffer to be accessed + * @offset: offset into @buffer data + * @bytes: bytes to access from offset + * + * Check that the @offset/@bytes are within the size of the given + * @buffer and that the buffer is currently active and not freeable. + * Offsets must also be multiples of sizeof(u32). The kernel is + * allowed to touch the buffer in two cases: + * + * 1) when the buffer is being created: + * (buffer->free == 0 && buffer->allow_user_free == 0) + * 2) when the buffer is being torn down: + * (buffer->free == 0 && buffer->transaction == NULL). 
+ * + * Return: true if the buffer is safe to access + */ +static inline bool check_buffer(struct binder_alloc *alloc, + struct binder_buffer *buffer, + binder_size_t offset, size_t bytes) +{ + size_t buffer_size = binder_alloc_buffer_size(alloc, buffer); + + return buffer_size >= bytes && + offset <= buffer_size - bytes && + IS_ALIGNED(offset, sizeof(u32)) && + !buffer->free && + (!buffer->allow_user_free || !buffer->transaction); +} + +/** + * binder_alloc_get_page() - get kernel pointer for given buffer offset + * @alloc: binder_alloc for this proc + * @buffer: binder buffer to be accessed + * @buffer_offset: offset into @buffer data + * @pgoffp: address to copy final page offset to + * + * Lookup the struct page corresponding to the address + * at @buffer_offset into @buffer->user_data. If @pgoffp is not + * NULL, the byte-offset into the page is written there. + * + * The caller is responsible to ensure that the offset points + * to a valid address within the @buffer and that @buffer is + * not freeable by the user. Since it can't be freed, we are + * guaranteed that the corresponding elements of @alloc->pages[] + * cannot change. + * + * Return: struct page + */ +static struct page *binder_alloc_get_page(struct binder_alloc *alloc, + struct binder_buffer *buffer, + binder_size_t buffer_offset, + pgoff_t *pgoffp) +{ + binder_size_t buffer_space_offset = buffer_offset + + (buffer->user_data - alloc->buffer); + pgoff_t pgoff = buffer_space_offset & ~PAGE_MASK; + size_t index = buffer_space_offset >> PAGE_SHIFT; + struct binder_lru_page *lru_page; + + lru_page = &alloc->pages[index]; + *pgoffp = pgoff; + return lru_page->page_ptr; +} + +/** + * binder_alloc_copy_user_to_buffer() - copy src user to tgt user + * @alloc: binder_alloc for this proc + * @buffer: binder buffer to be accessed + * @buffer_offset: offset into @buffer data + * @from: userspace pointer to source buffer + * @bytes: bytes to copy + * + * Copy bytes from source userspace to target buffer. 
+ * + * Return: bytes remaining to be copied + */ +unsigned long +binder_alloc_copy_user_to_buffer(struct binder_alloc *alloc, + struct binder_buffer *buffer, + binder_size_t buffer_offset, + const void __user *from, + size_t bytes) +{ + if (!check_buffer(alloc, buffer, buffer_offset, bytes)) + return bytes; + + while (bytes) { + unsigned long size; + unsigned long ret; + struct page *page; + pgoff_t pgoff; + void *kptr; + + page = binder_alloc_get_page(alloc, buffer, + buffer_offset, &pgoff); + size = min_t(size_t, bytes, PAGE_SIZE - pgoff); + kptr = kmap(page) + pgoff; + ret = copy_from_user(kptr, from, size); + kunmap(page); + if (ret) + return bytes - size + ret; + bytes -= size; + from += size; + buffer_offset += size; + } + return 0; +} + +static void binder_alloc_do_buffer_copy(struct binder_alloc *alloc, + bool to_buffer, + struct binder_buffer *buffer, + binder_size_t buffer_offset, + void *ptr, + size_t bytes) +{ + /* All copies must be 32-bit aligned and 32-bit size */ + BUG_ON(!check_buffer(alloc, buffer, buffer_offset, bytes)); + + while (bytes) { + unsigned long size; + struct page *page; + pgoff_t pgoff; + void *tmpptr; + void *base_ptr; + + page = binder_alloc_get_page(alloc, buffer, + buffer_offset, &pgoff); + size = min_t(size_t, bytes, PAGE_SIZE - pgoff); + base_ptr = kmap_atomic(page); + tmpptr = base_ptr + pgoff; + if (to_buffer) + memcpy(tmpptr, ptr, size); + else + memcpy(ptr, tmpptr, size); + /* + * kunmap_atomic() takes care of flushing the cache + * if this device has VIVT cache arch + */ + kunmap_atomic(base_ptr); + bytes -= size; + pgoff = 0; + ptr = ptr + size; + buffer_offset += size; + } +} + +void binder_alloc_copy_to_buffer(struct binder_alloc *alloc, + struct binder_buffer *buffer, + binder_size_t buffer_offset, + void *src, + size_t bytes) +{ + binder_alloc_do_buffer_copy(alloc, true, buffer, buffer_offset, + src, bytes); +} + +void binder_alloc_copy_from_buffer(struct binder_alloc *alloc, + void *dest, + struct binder_buffer 
*buffer, + binder_size_t buffer_offset, + size_t bytes) +{ + binder_alloc_do_buffer_copy(alloc, false, buffer, buffer_offset, + dest, bytes); +} + diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h index fb3238c74c8a..b60d161b7a7a 100644 --- a/drivers/android/binder_alloc.h +++ b/drivers/android/binder_alloc.h @@ -22,6 +22,7 @@ #include #include #include +#include extern struct list_lru binder_alloc_lru; struct binder_transaction; @@ -30,16 +31,16 @@ struct binder_transaction; * struct binder_buffer - buffer used for binder transactions * @entry: entry alloc->buffers * @rb_node: node for allocated_buffers/free_buffers rb trees - * @free: true if buffer is free - * @allow_user_free: describe the second member of struct blah, - * @async_transaction: describe the second member of struct blah, - * @debug_id: describe the second member of struct blah, - * @transaction: describe the second member of struct blah, - * @target_node: describe the second member of struct blah, - * @data_size: describe the second member of struct blah, - * @offsets_size: describe the second member of struct blah, - * @extra_buffers_size: describe the second member of struct blah, - * @data:i describe the second member of struct blah, + * @free: %true if buffer is free + * @allow_user_free: %true if user is allowed to free buffer + * @async_transaction: %true if buffer is in use for an async txn + * @debug_id: unique ID for debugging + * @transaction: pointer to associated struct binder_transaction + * @target_node: struct binder_node associated with this buffer + * @data_size: size of @transaction data + * @offsets_size: size of array of offsets + * @extra_buffers_size: size of space for other objects (like sg lists) + * @user_data: user pointer to base of buffer space * * Bookkeeping structure for binder transaction buffers */ @@ -58,7 +59,7 @@ struct binder_buffer { size_t data_size; size_t offsets_size; size_t extra_buffers_size; - void *data; + void __user 
*user_data; }; /** @@ -81,7 +82,6 @@ struct binder_lru_page { * (invariant after init) * @vma_vm_mm: copy of vma->vm_mm (invarient after mmap) * @buffer: base of per-proc address space mapped via mmap - * @user_buffer_offset: offset between user and kernel VAs for buffer * @buffers: list of all buffers for this proc * @free_buffers: rb tree of buffers available for allocation * sorted by size @@ -102,8 +102,7 @@ struct binder_alloc { struct mutex mutex; struct vm_area_struct *vma; struct mm_struct *vma_vm_mm; - void *buffer; - ptrdiff_t user_buffer_offset; + void __user *buffer; struct list_head buffers; struct rb_root free_buffers; struct rb_root allocated_buffers; @@ -162,26 +161,24 @@ binder_alloc_get_free_async_space(struct binder_alloc *alloc) return free_async_space; } -/** - * binder_alloc_get_user_buffer_offset() - get offset between kernel/user addrs - * @alloc: binder_alloc for this proc - * - * Return: the offset between kernel and user-space addresses to use for - * virtual address conversion - */ -static inline ptrdiff_t -binder_alloc_get_user_buffer_offset(struct binder_alloc *alloc) -{ - /* - * user_buffer_offset is constant if vma is set and - * undefined if vma is not set. It is possible to - * get here with !alloc->vma if the target process - * is dying while a transaction is being initiated. - * Returning the old value is ok in this case and - * the transaction will fail. 
- */ - return alloc->user_buffer_offset; -} +unsigned long +binder_alloc_copy_user_to_buffer(struct binder_alloc *alloc, + struct binder_buffer *buffer, + binder_size_t buffer_offset, + const void __user *from, + size_t bytes); + +void binder_alloc_copy_to_buffer(struct binder_alloc *alloc, + struct binder_buffer *buffer, + binder_size_t buffer_offset, + void *src, + size_t bytes); + +void binder_alloc_copy_from_buffer(struct binder_alloc *alloc, + void *dest, + struct binder_buffer *buffer, + binder_size_t buffer_offset, + size_t bytes); #endif /* _LINUX_BINDER_ALLOC_H */ diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c index 8bd7bcef967d..b72708918b06 100644 --- a/drivers/android/binder_alloc_selftest.c +++ b/drivers/android/binder_alloc_selftest.c @@ -102,11 +102,12 @@ static bool check_buffer_pages_allocated(struct binder_alloc *alloc, struct binder_buffer *buffer, size_t size) { - void *page_addr, *end; + void __user *page_addr; + void __user *end; int page_index; - end = (void *)PAGE_ALIGN((uintptr_t)buffer->data + size); - page_addr = buffer->data; + end = (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data + size); + page_addr = buffer->user_data; for (; page_addr < end; page_addr += PAGE_SIZE) { page_index = (page_addr - alloc->buffer) / PAGE_SIZE; if (!alloc->pages[page_index].page_ptr || diff --git a/drivers/android/binder_trace.h b/drivers/android/binder_trace.h index b11dffc521e8..7674231af8cb 100644 --- a/drivers/android/binder_trace.h +++ b/drivers/android/binder_trace.h @@ -296,7 +296,7 @@ DEFINE_EVENT(binder_buffer_class, binder_transaction_failed_buffer_release, TRACE_EVENT(binder_update_page_range, TP_PROTO(struct binder_alloc *alloc, bool allocate, - void *start, void *end), + void __user *start, void __user *end), TP_ARGS(alloc, allocate, start, end), TP_STRUCT__entry( __field(int, proc) diff --git a/drivers/bluetooth/hci_h4.c b/drivers/bluetooth/hci_h4.c index 3b82a87224a9..d428117c97c3 100644 
--- a/drivers/bluetooth/hci_h4.c +++ b/drivers/bluetooth/hci_h4.c @@ -174,6 +174,10 @@ struct sk_buff *h4_recv_buf(struct hci_dev *hdev, struct sk_buff *skb, struct hci_uart *hu = hci_get_drvdata(hdev); u8 alignment = hu->alignment ? hu->alignment : 1; + /* Check for error from previous call */ + if (IS_ERR(skb)) + skb = NULL; + while (count) { int i, len; diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c index 30bbe19b4b85..3b63a781f10f 100644 --- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -207,11 +207,11 @@ static void hci_uart_init_work(struct work_struct *work) err = hci_register_dev(hu->hdev); if (err < 0) { BT_ERR("Can't register HCI device"); + clear_bit(HCI_UART_PROTO_READY, &hu->flags); + hu->proto->close(hu); hdev = hu->hdev; hu->hdev = NULL; hci_free_dev(hdev); - clear_bit(HCI_UART_PROTO_READY, &hu->flags); - hu->proto->close(hu); return; } @@ -612,6 +612,7 @@ static void hci_uart_tty_receive(struct tty_struct *tty, const u8 *data, static int hci_uart_register_dev(struct hci_uart *hu) { struct hci_dev *hdev; + int err; BT_DBG(""); @@ -655,11 +656,22 @@ static int hci_uart_register_dev(struct hci_uart *hu) else hdev->dev_type = HCI_PRIMARY; + /* Only call open() for the protocol after hdev is fully initialized as + * open() (or a timer/workqueue it starts) may attempt to reference it. 
+ */ + err = hu->proto->open(hu); + if (err) { + hu->hdev = NULL; + hci_free_dev(hdev); + return err; + } + if (test_bit(HCI_UART_INIT_PENDING, &hu->hdev_flags)) return 0; if (hci_register_dev(hdev) < 0) { BT_ERR("Can't register HCI device"); + hu->proto->close(hu); hu->hdev = NULL; hci_free_dev(hdev); return -ENODEV; @@ -679,20 +691,14 @@ static int hci_uart_set_proto(struct hci_uart *hu, int id) if (!p) return -EPROTONOSUPPORT; - err = p->open(hu); - if (err) - return err; - hu->proto = p; - set_bit(HCI_UART_PROTO_READY, &hu->flags); err = hci_uart_register_dev(hu); if (err) { - clear_bit(HCI_UART_PROTO_READY, &hu->flags); - p->close(hu); return err; } + set_bit(HCI_UART_PROTO_READY, &hu->flags); return 0; } diff --git a/drivers/gpu/drm/drm_mode_object.c b/drivers/gpu/drm/drm_mode_object.c index 1055533792f3..5b692ce6a45d 100644 --- a/drivers/gpu/drm/drm_mode_object.c +++ b/drivers/gpu/drm/drm_mode_object.c @@ -432,12 +432,13 @@ static int set_property_atomic(struct drm_mode_object *obj, struct drm_modeset_acquire_ctx ctx; int ret; - drm_modeset_acquire_init(&ctx, 0); - state = drm_atomic_state_alloc(dev); if (!state) return -ENOMEM; + + drm_modeset_acquire_init(&ctx, 0); state->acquire_ctx = &ctx; + retry: if (prop == state->dev->mode_config.dpms_property) { if (obj->type != DRM_MODE_OBJECT_CONNECTOR) { diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c index d23a18aae476..3ba9b6ad0281 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c @@ -588,11 +588,9 @@ static int vmw_fb_set_par(struct fb_info *info) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC) }; - struct drm_display_mode *old_mode; struct drm_display_mode *mode; int ret; - old_mode = par->set_mode; mode = drm_mode_duplicate(vmw_priv->dev, &new_mode); if (!mode) { DRM_ERROR("Could not create new fb mode.\n"); @@ -603,11 +601,7 @@ static int vmw_fb_set_par(struct fb_info *info) mode->vdisplay = var->yres; 
vmw_guess_mode_timing(mode); - if (old_mode && drm_mode_equal(old_mode, mode)) { - drm_mode_destroy(vmw_priv->dev, mode); - mode = old_mode; - old_mode = NULL; - } else if (!vmw_kms_validate_mode_vram(vmw_priv, + if (!vmw_kms_validate_mode_vram(vmw_priv, mode->hdisplay * DIV_ROUND_UP(var->bits_per_pixel, 8), mode->vdisplay)) { @@ -677,8 +671,8 @@ static int vmw_fb_set_par(struct fb_info *info) schedule_delayed_work(&par->local_work, 0); out_unlock: - if (old_mode) - drm_mode_destroy(vmw_priv->dev, old_mode); + if (par->set_mode) + drm_mode_destroy(vmw_priv->dev, par->set_mode); par->set_mode = mode; drm_modeset_unlock_all(vmw_priv->dev); diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig index d853373162c7..e0c7a6f358df 100644 --- a/drivers/input/misc/Kconfig +++ b/drivers/input/misc/Kconfig @@ -396,17 +396,6 @@ config INPUT_ATI_REMOTE2 To compile this driver as a module, choose M here: the module will be called ati_remote2. -config INPUT_KEYCHORD - tristate "Key chord input driver support" - help - Say Y here if you want to enable the key chord driver - accessible at /dev/keychord. This driver can be used - for receiving notifications when client specified key - combinations are pressed. - - To compile this driver as a module, choose M here: the - module will be called keychord. 
- config INPUT_KEYSPAN_REMOTE tristate "Keyspan DMR USB remote control" depends on USB_ARCH_HAS_HCD diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile index 137aee4ee2e3..137da08f75a8 100644 --- a/drivers/input/misc/Makefile +++ b/drivers/input/misc/Makefile @@ -44,7 +44,6 @@ obj-$(CONFIG_INPUT_HBTP_INPUT) += hbtp_input.o obj-$(CONFIG_HP_SDC_RTC) += hp_sdc_rtc.o obj-$(CONFIG_INPUT_IMS_PCU) += ims-pcu.o obj-$(CONFIG_INPUT_IXP4XX_BEEPER) += ixp4xx-beeper.o -obj-$(CONFIG_INPUT_KEYCHORD) += keychord.o obj-$(CONFIG_INPUT_KEYSPAN_REMOTE) += keyspan_remote.o obj-$(CONFIG_INPUT_KXTJ9) += kxtj9.o obj-$(CONFIG_INPUT_M68K_BEEP) += m68kspkr.o diff --git a/drivers/input/misc/keychord.c b/drivers/input/misc/keychord.c deleted file mode 100644 index 791f285b0c13..000000000000 --- a/drivers/input/misc/keychord.c +++ /dev/null @@ -1,467 +0,0 @@ -/* - * drivers/input/misc/keychord.c - * - * Copyright (C) 2008 Google, Inc. - * Author: Mike Lockwood - * - * This software is licensed under the terms of the GNU General Public - * License version 2, as published by the Free Software Foundation, and - * may be copied, distributed, and modified under those terms. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * -*/ - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#define KEYCHORD_NAME "keychord" -#define BUFFER_SIZE 16 - -MODULE_AUTHOR("Mike Lockwood "); -MODULE_DESCRIPTION("Key chord input driver"); -MODULE_SUPPORTED_DEVICE("keychord"); -MODULE_LICENSE("GPL"); - -#define NEXT_KEYCHORD(kc) ((struct input_keychord *) \ - ((char *)kc + sizeof(struct input_keychord) + \ - kc->count * sizeof(kc->keycodes[0]))) - -struct keychord_device { - struct input_handler input_handler; - int registered; - - /* list of keychords to monitor */ - struct input_keychord *keychords; - int keychord_count; - - /* bitmask of keys contained in our keychords */ - unsigned long keybit[BITS_TO_LONGS(KEY_CNT)]; - /* current state of the keys */ - unsigned long keystate[BITS_TO_LONGS(KEY_CNT)]; - /* number of keys that are currently pressed */ - int key_down; - - /* second input_device_id is needed for null termination */ - struct input_device_id device_ids[2]; - - spinlock_t lock; - wait_queue_head_t waitq; - unsigned char head; - unsigned char tail; - __u16 buff[BUFFER_SIZE]; - /* Bit to serialize writes to this device */ -#define KEYCHORD_BUSY 0x01 - unsigned long flags; - wait_queue_head_t write_waitq; -}; - -static int check_keychord(struct keychord_device *kdev, - struct input_keychord *keychord) -{ - int i; - - if (keychord->count != kdev->key_down) - return 0; - - for (i = 0; i < keychord->count; i++) { - if (!test_bit(keychord->keycodes[i], kdev->keystate)) - return 0; - } - - /* we have a match */ - return 1; -} - -static void keychord_event(struct input_handle *handle, unsigned int type, - unsigned int code, int value) -{ - struct keychord_device *kdev = handle->private; - struct input_keychord *keychord; - unsigned long flags; - int i, got_chord = 0; - - if (type != EV_KEY || code >= KEY_MAX) - return; - - spin_lock_irqsave(&kdev->lock, flags); - /* do nothing if key state did not change */ - if (!test_bit(code, kdev->keystate) == !value) - 
goto done; - __change_bit(code, kdev->keystate); - if (value) - kdev->key_down++; - else - kdev->key_down--; - - /* don't notify on key up */ - if (!value) - goto done; - /* ignore this event if it is not one of the keys we are monitoring */ - if (!test_bit(code, kdev->keybit)) - goto done; - - keychord = kdev->keychords; - if (!keychord) - goto done; - - /* check to see if the keyboard state matches any keychords */ - for (i = 0; i < kdev->keychord_count; i++) { - if (check_keychord(kdev, keychord)) { - kdev->buff[kdev->head] = keychord->id; - kdev->head = (kdev->head + 1) % BUFFER_SIZE; - got_chord = 1; - break; - } - /* skip to next keychord */ - keychord = NEXT_KEYCHORD(keychord); - } - -done: - spin_unlock_irqrestore(&kdev->lock, flags); - - if (got_chord) { - pr_info("keychord: got keychord id %d. Any tasks: %d\n", - keychord->id, - !list_empty_careful(&kdev->waitq.head)); - wake_up_interruptible(&kdev->waitq); - } -} - -static int keychord_connect(struct input_handler *handler, - struct input_dev *dev, - const struct input_device_id *id) -{ - int i, ret; - struct input_handle *handle; - struct keychord_device *kdev = - container_of(handler, struct keychord_device, input_handler); - - /* - * ignore this input device if it does not contain any keycodes - * that we are monitoring - */ - for (i = 0; i < KEY_MAX; i++) { - if (test_bit(i, kdev->keybit) && test_bit(i, dev->keybit)) - break; - } - if (i == KEY_MAX) - return -ENODEV; - - handle = kzalloc(sizeof(*handle), GFP_KERNEL); - if (!handle) - return -ENOMEM; - - handle->dev = dev; - handle->handler = handler; - handle->name = KEYCHORD_NAME; - handle->private = kdev; - - ret = input_register_handle(handle); - if (ret) - goto err_input_register_handle; - - ret = input_open_device(handle); - if (ret) - goto err_input_open_device; - - pr_info("keychord: using input dev %s for fevent\n", dev->name); - return 0; - -err_input_open_device: - input_unregister_handle(handle); -err_input_register_handle: - 
kfree(handle); - return ret; -} - -static void keychord_disconnect(struct input_handle *handle) -{ - input_close_device(handle); - input_unregister_handle(handle); - kfree(handle); -} - -/* - * keychord_read is used to read keychord events from the driver - */ -static ssize_t keychord_read(struct file *file, char __user *buffer, - size_t count, loff_t *ppos) -{ - struct keychord_device *kdev = file->private_data; - __u16 id; - int retval; - unsigned long flags; - - if (count < sizeof(id)) - return -EINVAL; - count = sizeof(id); - - if (kdev->head == kdev->tail && (file->f_flags & O_NONBLOCK)) - return -EAGAIN; - - retval = wait_event_interruptible(kdev->waitq, - kdev->head != kdev->tail); - if (retval) - return retval; - - spin_lock_irqsave(&kdev->lock, flags); - /* pop a keychord ID off the queue */ - id = kdev->buff[kdev->tail]; - kdev->tail = (kdev->tail + 1) % BUFFER_SIZE; - spin_unlock_irqrestore(&kdev->lock, flags); - - if (copy_to_user(buffer, &id, count)) - return -EFAULT; - - return count; -} - -/* - * serializes writes on a device. can use mutex_lock_interruptible() - * for this particular use case as well - a matter of preference. 
- */ -static int -keychord_write_lock(struct keychord_device *kdev) -{ - int ret; - unsigned long flags; - - spin_lock_irqsave(&kdev->lock, flags); - while (kdev->flags & KEYCHORD_BUSY) { - spin_unlock_irqrestore(&kdev->lock, flags); - ret = wait_event_interruptible(kdev->write_waitq, - ((kdev->flags & KEYCHORD_BUSY) == 0)); - if (ret) - return ret; - spin_lock_irqsave(&kdev->lock, flags); - } - kdev->flags |= KEYCHORD_BUSY; - spin_unlock_irqrestore(&kdev->lock, flags); - return 0; -} - -static void -keychord_write_unlock(struct keychord_device *kdev) -{ - unsigned long flags; - - spin_lock_irqsave(&kdev->lock, flags); - kdev->flags &= ~KEYCHORD_BUSY; - spin_unlock_irqrestore(&kdev->lock, flags); - wake_up_interruptible(&kdev->write_waitq); -} - -/* - * keychord_write is used to configure the driver - */ -static ssize_t keychord_write(struct file *file, const char __user *buffer, - size_t count, loff_t *ppos) -{ - struct keychord_device *kdev = file->private_data; - struct input_keychord *keychords = 0; - struct input_keychord *keychord; - int ret, i, key; - unsigned long flags; - size_t resid = count; - size_t key_bytes; - - if (count < sizeof(struct input_keychord) || count > PAGE_SIZE) - return -EINVAL; - keychords = kzalloc(count, GFP_KERNEL); - if (!keychords) - return -ENOMEM; - - /* read list of keychords from userspace */ - if (copy_from_user(keychords, buffer, count)) { - kfree(keychords); - return -EFAULT; - } - - /* - * Serialize writes to this device to prevent various races. - * 1) writers racing here could do duplicate input_unregister_handler() - * calls, resulting in attempting to unlink a node from a list that - * does not exist. - * 2) writers racing here could do duplicate input_register_handler() calls - * below, resulting in a duplicate insertion of a node into the list. - * 3) a double kfree of keychords can occur (in the event that - * input_register_handler() fails below. 
- */ - ret = keychord_write_lock(kdev); - if (ret) { - kfree(keychords); - return ret; - } - - /* unregister handler before changing configuration */ - if (kdev->registered) { - input_unregister_handler(&kdev->input_handler); - kdev->registered = 0; - } - - spin_lock_irqsave(&kdev->lock, flags); - /* clear any existing configuration */ - kfree(kdev->keychords); - kdev->keychords = 0; - kdev->keychord_count = 0; - kdev->key_down = 0; - memset(kdev->keybit, 0, sizeof(kdev->keybit)); - memset(kdev->keystate, 0, sizeof(kdev->keystate)); - kdev->head = kdev->tail = 0; - - keychord = keychords; - - while (resid > 0) { - /* Is the entire keychord entry header present ? */ - if (resid < sizeof(struct input_keychord)) { - pr_err("keychord: Insufficient bytes present for header %zu\n", - resid); - goto err_unlock_return; - } - resid -= sizeof(struct input_keychord); - if (keychord->count <= 0) { - pr_err("keychord: invalid keycode count %d\n", - keychord->count); - goto err_unlock_return; - } - key_bytes = keychord->count * sizeof(keychord->keycodes[0]); - /* Do we have all the expected keycodes ? 
*/ - if (resid < key_bytes) { - pr_err("keychord: Insufficient bytes present for keycount %zu\n", - resid); - goto err_unlock_return; - } - resid -= key_bytes; - - if (keychord->version != KEYCHORD_VERSION) { - pr_err("keychord: unsupported version %d\n", - keychord->version); - goto err_unlock_return; - } - - /* keep track of the keys we are monitoring in keybit */ - for (i = 0; i < keychord->count; i++) { - key = keychord->keycodes[i]; - if (key < 0 || key >= KEY_CNT) { - pr_err("keychord: keycode %d out of range\n", - key); - goto err_unlock_return; - } - __set_bit(key, kdev->keybit); - } - - kdev->keychord_count++; - keychord = NEXT_KEYCHORD(keychord); - } - - kdev->keychords = keychords; - spin_unlock_irqrestore(&kdev->lock, flags); - - ret = input_register_handler(&kdev->input_handler); - if (ret) { - kfree(keychords); - kdev->keychords = 0; - keychord_write_unlock(kdev); - return ret; - } - kdev->registered = 1; - - keychord_write_unlock(kdev); - - return count; - -err_unlock_return: - spin_unlock_irqrestore(&kdev->lock, flags); - kfree(keychords); - keychord_write_unlock(kdev); - return -EINVAL; -} - -static unsigned int keychord_poll(struct file *file, poll_table *wait) -{ - struct keychord_device *kdev = file->private_data; - - poll_wait(file, &kdev->waitq, wait); - - if (kdev->head != kdev->tail) - return POLLIN | POLLRDNORM; - - return 0; -} - -static int keychord_open(struct inode *inode, struct file *file) -{ - struct keychord_device *kdev; - - kdev = kzalloc(sizeof(struct keychord_device), GFP_KERNEL); - if (!kdev) - return -ENOMEM; - - spin_lock_init(&kdev->lock); - init_waitqueue_head(&kdev->waitq); - init_waitqueue_head(&kdev->write_waitq); - - kdev->input_handler.event = keychord_event; - kdev->input_handler.connect = keychord_connect; - kdev->input_handler.disconnect = keychord_disconnect; - kdev->input_handler.name = KEYCHORD_NAME; - kdev->input_handler.id_table = kdev->device_ids; - - kdev->device_ids[0].flags = INPUT_DEVICE_ID_MATCH_EVBIT; - 
__set_bit(EV_KEY, kdev->device_ids[0].evbit); - - file->private_data = kdev; - - return 0; -} - -static int keychord_release(struct inode *inode, struct file *file) -{ - struct keychord_device *kdev = file->private_data; - - if (kdev->registered) - input_unregister_handler(&kdev->input_handler); - kfree(kdev->keychords); - kfree(kdev); - - return 0; -} - -static const struct file_operations keychord_fops = { - .owner = THIS_MODULE, - .open = keychord_open, - .release = keychord_release, - .read = keychord_read, - .write = keychord_write, - .poll = keychord_poll, -}; - -static struct miscdevice keychord_misc = { - .fops = &keychord_fops, - .name = KEYCHORD_NAME, - .minor = MISC_DYNAMIC_MINOR, -}; - -static int __init keychord_init(void) -{ - return misc_register(&keychord_misc); -} - -static void __exit keychord_exit(void) -{ - misc_deregister(&keychord_misc); -} - -module_init(keychord_init); -module_exit(keychord_exit); diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 78b97f31a1f2..bd339bfe0d15 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2548,7 +2548,12 @@ static int map_sg(struct device *dev, struct scatterlist *sglist, /* Everything is mapped - write the right values into s->dma_address */ for_each_sg(sglist, s, nelems, i) { - s->dma_address += address + s->offset; + /* + * Add in the remaining piece of the scatter-gather offset that + * was masked out when we were determining the physical address + * via (sg_phys(s) & PAGE_MASK) earlier. + */ + s->dma_address += address + (s->offset & ~PAGE_MASK); s->dma_length = s->length; } diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index 088ebca3e128..c805d628d04d 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -606,4 +606,17 @@ config DM_ANDROID_VERITY_AT_MOST_ONCE_DEFAULT_ENABLED any more after all the data blocks it covers have been verified anyway. If unsure, say N. 
+ +config DM_BOW + tristate "Backup block device" + depends on BLK_DEV_DM + select DM_BUFIO + ---help--- + This device-mapper target takes a device and keeps a log of all + changes using free blocks identified by issuing a trim command. + This can then be restored by running a command line utility, + or committed by simply replacing the target. + + If unsure, say N. + endif # MD diff --git a/drivers/md/Makefile b/drivers/md/Makefile index fb2e9a64378b..1a03ebd1cee7 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -65,6 +65,7 @@ obj-$(CONFIG_DM_LOG_WRITES) += dm-log-writes.o obj-$(CONFIG_DM_INTEGRITY) += dm-integrity.o obj-$(CONFIG_DM_ZONED) += dm-zoned.o obj-$(CONFIG_DM_ANDROID_VERITY) += dm-android-verity.o +obj-$(CONFIG_DM_BOW) += dm-bow.o ifeq ($(CONFIG_DM_UEVENT),y) dm-mod-objs += dm-uevent.o diff --git a/drivers/md/dm-bow.c b/drivers/md/dm-bow.c new file mode 100644 index 000000000000..b92da30a3d42 --- /dev/null +++ b/drivers/md/dm-bow.c @@ -0,0 +1,1233 @@ +/* + * Copyright (C) 2018 Google Limited. + * + * This file is released under the GPL. + */ + +#include "dm.h" +#include "dm-bufio.h" +#include "dm-core.h" + +#include +#include + +#define DM_MSG_PREFIX "bow" +#define SECTOR_SIZE 512 + +struct log_entry { + u64 source; + u64 dest; + u32 size; + u32 checksum; +} __packed; + +struct log_sector { + u32 magic; + u16 header_version; + u16 header_size; + u32 block_size; + u32 count; + u32 sequence; + sector_t sector0; + struct log_entry entries[]; +} __packed; + +/* + * MAGIC is BOW in ascii + */ +#define MAGIC 0x00574f42 +#define HEADER_VERSION 0x0100 + +/* + * A sorted set of ranges representing the state of the data on the device. + * Use an rb_tree for fast lookup of a given sector + * Consecutive ranges are always of different type - operations on this + * set must merge matching consecutive ranges. 
+ * + * Top range is always of type TOP + */ +struct bow_range { + struct rb_node node; + sector_t sector; + enum { + INVALID, /* Type not set */ + SECTOR0, /* First sector - holds log record */ + SECTOR0_CURRENT,/* Live contents of sector0 */ + UNCHANGED, /* Original contents */ + TRIMMED, /* Range has been trimmed */ + CHANGED, /* Range has been changed */ + BACKUP, /* Range is being used as a backup */ + TOP, /* Final range - sector is size of device */ + } type; + struct list_head trimmed_list; /* list of TRIMMED ranges */ +}; + +static const char * const readable_type[] = { + "Invalid", + "Sector0", + "Sector0_current", + "Unchanged", + "Free", + "Changed", + "Backup", + "Top", +}; + +enum state { + TRIM, + CHECKPOINT, + COMMITTED, +}; + +struct bow_context { + struct dm_dev *dev; + u32 block_size; + u32 block_shift; + struct workqueue_struct *workqueue; + struct dm_bufio_client *bufio; + struct mutex ranges_lock; /* Hold to access this struct and/or ranges */ + struct rb_root ranges; + struct dm_kobject_holder kobj_holder; /* for sysfs attributes */ + atomic_t state; /* One of the enum state values above */ + u64 trims_total; + struct log_sector *log_sector; + struct list_head trimmed_list; + bool forward_trims; +}; + +sector_t range_top(struct bow_range *br) +{ + return container_of(rb_next(&br->node), struct bow_range, node) + ->sector; +} + +u64 range_size(struct bow_range *br) +{ + return (range_top(br) - br->sector) * SECTOR_SIZE; +} + +static sector_t bvec_top(struct bvec_iter *bi_iter) +{ + return bi_iter->bi_sector + bi_iter->bi_size / SECTOR_SIZE; +} + +/* + * Find the first range that overlaps with bi_iter + * bi_iter is set to the size of the overlapping sub-range + */ +static struct bow_range *find_first_overlapping_range(struct rb_root *ranges, + struct bvec_iter *bi_iter) +{ + struct rb_node *node = ranges->rb_node; + struct bow_range *br; + + while (node) { + br = container_of(node, struct bow_range, node); + + if (br->sector <= 
bi_iter->bi_sector + && bi_iter->bi_sector < range_top(br)) + break; + + if (bi_iter->bi_sector < br->sector) + node = node->rb_left; + else + node = node->rb_right; + } + + WARN_ON(!node); + if (!node) + return NULL; + + if (range_top(br) - bi_iter->bi_sector + < bi_iter->bi_size >> SECTOR_SHIFT) + bi_iter->bi_size = (range_top(br) - bi_iter->bi_sector) + << SECTOR_SHIFT; + + return br; +} + +void add_before(struct rb_root *ranges, struct bow_range *new_br, + struct bow_range *existing) +{ + struct rb_node *parent = &(existing->node); + struct rb_node **link = &(parent->rb_left); + + while (*link) { + parent = *link; + link = &((*link)->rb_right); + } + + rb_link_node(&new_br->node, parent, link); + rb_insert_color(&new_br->node, ranges); +} + +/* + * Given a range br returned by find_first_overlapping_range, split br into a + * leading range, a range matching the bi_iter and a trailing range. + * Leading and trailing may end up size 0 and will then be deleted. The + * new range matching the bi_iter is then returned and should have its type + * and type specific fields populated. 
+ * If bi_iter runs off the end of the range, bi_iter is truncated accordingly + */ +static int split_range(struct bow_context *bc, struct bow_range **br, + struct bvec_iter *bi_iter) +{ + struct bow_range *new_br; + + if (bi_iter->bi_sector < (*br)->sector) { + WARN_ON(true); + return BLK_STS_IOERR; + } + + if (bi_iter->bi_sector > (*br)->sector) { + struct bow_range *leading_br = + kzalloc(sizeof(*leading_br), GFP_KERNEL); + + if (!leading_br) + return BLK_STS_RESOURCE; + + *leading_br = **br; + if (leading_br->type == TRIMMED) + list_add(&leading_br->trimmed_list, &bc->trimmed_list); + + add_before(&bc->ranges, leading_br, *br); + (*br)->sector = bi_iter->bi_sector; + } + + if (bvec_top(bi_iter) >= range_top(*br)) { + bi_iter->bi_size = (range_top(*br) - (*br)->sector) + * SECTOR_SIZE; + return BLK_STS_OK; + } + + /* new_br will be the beginning, existing br will be the tail */ + new_br = kzalloc(sizeof(*new_br), GFP_KERNEL); + if (!new_br) + return BLK_STS_RESOURCE; + + new_br->sector = (*br)->sector; + (*br)->sector = bvec_top(bi_iter); + add_before(&bc->ranges, new_br, *br); + *br = new_br; + + return BLK_STS_OK; +} + +/* + * Sets type of a range. 
May merge range into surrounding ranges + * Since br may be invalidated, always sets br to NULL to prevent + * usage after this is called + */ +static void set_type(struct bow_context *bc, struct bow_range **br, int type) +{ + struct bow_range *prev = container_of(rb_prev(&(*br)->node), + struct bow_range, node); + struct bow_range *next = container_of(rb_next(&(*br)->node), + struct bow_range, node); + + if ((*br)->type == TRIMMED) { + bc->trims_total -= range_size(*br); + list_del(&(*br)->trimmed_list); + } + + if (type == TRIMMED) { + bc->trims_total += range_size(*br); + list_add(&(*br)->trimmed_list, &bc->trimmed_list); + } + + (*br)->type = type; + + if (next->type == type) { + if (type == TRIMMED) + list_del(&next->trimmed_list); + rb_erase(&next->node, &bc->ranges); + kfree(next); + } + + if (prev->type == type) { + if (type == TRIMMED) + list_del(&(*br)->trimmed_list); + rb_erase(&(*br)->node, &bc->ranges); + kfree(*br); + } + + *br = NULL; +} + +static struct bow_range *find_free_range(struct bow_context *bc) +{ + if (list_empty(&bc->trimmed_list)) { + DMERR("Unable to find free space to back up to"); + return NULL; + } + + return list_first_entry(&bc->trimmed_list, struct bow_range, + trimmed_list); +} + +static sector_t sector_to_page(struct bow_context const *bc, sector_t sector) +{ + WARN_ON((sector & (((sector_t)1 << (bc->block_shift - SECTOR_SHIFT)) - 1)) + != 0); + return sector >> (bc->block_shift - SECTOR_SHIFT); +} + +static int copy_data(struct bow_context const *bc, + struct bow_range *source, struct bow_range *dest, + u32 *checksum) +{ + int i; + + if (range_size(source) != range_size(dest)) { + WARN_ON(1); + return BLK_STS_IOERR; + } + + if (checksum) + *checksum = sector_to_page(bc, source->sector); + + for (i = 0; i < range_size(source) >> bc->block_shift; ++i) { + struct dm_buffer *read_buffer, *write_buffer; + u8 *read, *write; + sector_t page = sector_to_page(bc, source->sector) + i; + + read = dm_bufio_read(bc->bufio, page, 
&read_buffer); + if (IS_ERR(read)) { + DMERR("Cannot read page %llu", + (unsigned long long)page); + return PTR_ERR(read); + } + + if (checksum) + *checksum = crc32(*checksum, read, bc->block_size); + + write = dm_bufio_new(bc->bufio, + sector_to_page(bc, dest->sector) + i, + &write_buffer); + if (IS_ERR(write)) { + DMERR("Cannot write sector"); + dm_bufio_release(read_buffer); + return PTR_ERR(write); + } + + memcpy(write, read, bc->block_size); + + dm_bufio_mark_buffer_dirty(write_buffer); + dm_bufio_release(write_buffer); + dm_bufio_release(read_buffer); + } + + dm_bufio_write_dirty_buffers(bc->bufio); + return BLK_STS_OK; +} + +/****** logging functions ******/ + +static int add_log_entry(struct bow_context *bc, sector_t source, sector_t dest, + unsigned int size, u32 checksum); + +static int backup_log_sector(struct bow_context *bc) +{ + struct bow_range *first_br, *free_br; + struct bvec_iter bi_iter; + u32 checksum = 0; + int ret; + + first_br = container_of(rb_first(&bc->ranges), struct bow_range, node); + + if (first_br->type != SECTOR0) { + WARN_ON(1); + return BLK_STS_IOERR; + } + + if (range_size(first_br) != bc->block_size) { + WARN_ON(1); + return BLK_STS_IOERR; + } + + free_br = find_free_range(bc); + /* No space left - return this error to userspace */ + if (!free_br) + return BLK_STS_NOSPC; + bi_iter.bi_sector = free_br->sector; + bi_iter.bi_size = bc->block_size; + ret = split_range(bc, &free_br, &bi_iter); + if (ret) + return ret; + if (bi_iter.bi_size != bc->block_size) { + WARN_ON(1); + return BLK_STS_IOERR; + } + + ret = copy_data(bc, first_br, free_br, &checksum); + if (ret) + return ret; + + bc->log_sector->count = 0; + bc->log_sector->sequence++; + ret = add_log_entry(bc, first_br->sector, free_br->sector, + range_size(first_br), checksum); + if (ret) + return ret; + + set_type(bc, &free_br, BACKUP); + return BLK_STS_OK; +} + +static int add_log_entry(struct bow_context *bc, sector_t source, sector_t dest, + unsigned int size, u32 checksum) 
+{ + struct dm_buffer *sector_buffer; + u8 *sector; + + if (sizeof(struct log_sector) + + sizeof(struct log_entry) * (bc->log_sector->count + 1) + > bc->block_size) { + int ret = backup_log_sector(bc); + + if (ret) + return ret; + } + + sector = dm_bufio_new(bc->bufio, 0, &sector_buffer); + if (IS_ERR(sector)) { + DMERR("Cannot write boot sector"); + dm_bufio_release(sector_buffer); + return BLK_STS_NOSPC; + } + + bc->log_sector->entries[bc->log_sector->count].source = source; + bc->log_sector->entries[bc->log_sector->count].dest = dest; + bc->log_sector->entries[bc->log_sector->count].size = size; + bc->log_sector->entries[bc->log_sector->count].checksum = checksum; + bc->log_sector->count++; + + memcpy(sector, bc->log_sector, bc->block_size); + dm_bufio_mark_buffer_dirty(sector_buffer); + dm_bufio_release(sector_buffer); + dm_bufio_write_dirty_buffers(bc->bufio); + return BLK_STS_OK; +} + +static int prepare_log(struct bow_context *bc) +{ + struct bow_range *free_br, *first_br; + struct bvec_iter bi_iter; + u32 checksum = 0; + int ret; + + /* Carve out first sector as log sector */ + first_br = container_of(rb_first(&bc->ranges), struct bow_range, node); + if (first_br->type != UNCHANGED) { + WARN_ON(1); + return BLK_STS_IOERR; + } + + if (range_size(first_br) < bc->block_size) { + WARN_ON(1); + return BLK_STS_IOERR; + } + bi_iter.bi_sector = 0; + bi_iter.bi_size = bc->block_size; + ret = split_range(bc, &first_br, &bi_iter); + if (ret) + return ret; + first_br->type = SECTOR0; + if (range_size(first_br) != bc->block_size) { + WARN_ON(1); + return BLK_STS_IOERR; + } + + /* Find free sector for active sector0 reads/writes */ + free_br = find_free_range(bc); + if (!free_br) + return BLK_STS_NOSPC; + bi_iter.bi_sector = free_br->sector; + bi_iter.bi_size = bc->block_size; + ret = split_range(bc, &free_br, &bi_iter); + if (ret) + return ret; + free_br->type = SECTOR0_CURRENT; + + /* Copy data */ + ret = copy_data(bc, first_br, free_br, NULL); + if (ret) + return ret; + +
bc->log_sector->sector0 = free_br->sector; + + /* Find free sector to back up original sector zero */ + free_br = find_free_range(bc); + if (!free_br) + return BLK_STS_NOSPC; + bi_iter.bi_sector = free_br->sector; + bi_iter.bi_size = bc->block_size; + ret = split_range(bc, &free_br, &bi_iter); + if (ret) + return ret; + + /* Back up */ + ret = copy_data(bc, first_br, free_br, &checksum); + if (ret) + return ret; + + /* + * Set up our replacement boot sector - it will get written when we + * add the first log entry, which we do immediately + */ + bc->log_sector->magic = MAGIC; + bc->log_sector->header_version = HEADER_VERSION; + bc->log_sector->header_size = sizeof(*bc->log_sector); + bc->log_sector->block_size = bc->block_size; + bc->log_sector->count = 0; + bc->log_sector->sequence = 0; + + /* Add log entry */ + ret = add_log_entry(bc, first_br->sector, free_br->sector, + range_size(first_br), checksum); + if (ret) + return ret; + + set_type(bc, &free_br, BACKUP); + return BLK_STS_OK; +} + +static struct bow_range *find_sector0_current(struct bow_context *bc) +{ + struct bvec_iter bi_iter; + + bi_iter.bi_sector = bc->log_sector->sector0; + bi_iter.bi_size = bc->block_size; + return find_first_overlapping_range(&bc->ranges, &bi_iter); +} + +/****** sysfs interface functions ******/ + +static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct bow_context *bc = container_of(kobj, struct bow_context, + kobj_holder.kobj); + + return scnprintf(buf, PAGE_SIZE, "%d\n", atomic_read(&bc->state)); +} + +static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct bow_context *bc = container_of(kobj, struct bow_context, + kobj_holder.kobj); + enum state state, original_state; + int ret; + + state = buf[0] - '0'; + if (state < TRIM || state > COMMITTED) { + DMERR("State value %d out of range", state); + return -EINVAL; + } + + mutex_lock(&bc->ranges_lock); + original_state 
= atomic_read(&bc->state); + if (state != original_state + 1) { + DMERR("Invalid state change from %d to %d", + original_state, state); + ret = -EINVAL; + goto bad; + } + + DMINFO("Switching to state %s", state == CHECKPOINT ? "Checkpoint" + : state == COMMITTED ? "Committed" : "Unknown"); + + if (state == CHECKPOINT) { + ret = prepare_log(bc); + if (ret) { + DMERR("Failed to switch to checkpoint state"); + goto bad; + } + } else if (state == COMMITTED) { + struct bow_range *br = find_sector0_current(bc); + struct bow_range *sector0_br = + container_of(rb_first(&bc->ranges), struct bow_range, + node); + + ret = copy_data(bc, br, sector0_br, 0); + if (ret) { + DMERR("Failed to switch to committed state"); + goto bad; + } + } + atomic_inc(&bc->state); + ret = count; + +bad: + mutex_unlock(&bc->ranges_lock); + return ret; +} + +static ssize_t free_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct bow_context *bc = container_of(kobj, struct bow_context, + kobj_holder.kobj); + u64 trims_total; + + mutex_lock(&bc->ranges_lock); + trims_total = bc->trims_total; + mutex_unlock(&bc->ranges_lock); + + return scnprintf(buf, PAGE_SIZE, "%llu\n", trims_total); +} + +static struct kobj_attribute attr_state = __ATTR_RW(state); +static struct kobj_attribute attr_free = __ATTR_RO(free); + +static struct attribute *bow_attrs[] = { + &attr_state.attr, + &attr_free.attr, + NULL +}; + +static struct kobj_type bow_ktype = { + .sysfs_ops = &kobj_sysfs_ops, + .default_attrs = bow_attrs, + .release = dm_kobject_release +}; + +/****** constructor/destructor ******/ + +static void dm_bow_dtr(struct dm_target *ti) +{ + struct bow_context *bc = (struct bow_context *) ti->private; + struct kobject *kobj; + + while (rb_first(&bc->ranges)) { + struct bow_range *br = container_of(rb_first(&bc->ranges), + struct bow_range, node); + + rb_erase(&br->node, &bc->ranges); + kfree(br); + } + if (bc->workqueue) + destroy_workqueue(bc->workqueue); + if (bc->bufio) + 
dm_bufio_client_destroy(bc->bufio); + + kobj = &bc->kobj_holder.kobj; + if (kobj->state_initialized) { + kobject_put(kobj); + wait_for_completion(dm_get_completion_from_kobject(kobj)); + } + + kfree(bc->log_sector); + kfree(bc); +} + +static int dm_bow_ctr(struct dm_target *ti, unsigned int argc, char **argv) +{ + struct bow_context *bc; + struct bow_range *br; + int ret; + struct mapped_device *md = dm_table_get_md(ti->table); + + if (argc != 1) { + ti->error = "Invalid argument count"; + return -EINVAL; + } + + bc = kzalloc(sizeof(*bc), GFP_KERNEL); + if (!bc) { + ti->error = "Cannot allocate bow context"; + return -ENOMEM; + } + + ti->num_flush_bios = 1; + ti->num_discard_bios = 1; + ti->num_write_same_bios = 1; + ti->private = bc; + + ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), + &bc->dev); + if (ret) { + ti->error = "Device lookup failed"; + goto bad; + } + + if (bc->dev->bdev->bd_queue->limits.max_discard_sectors == 0) { + bc->dev->bdev->bd_queue->limits.discard_granularity = 1 << 12; + bc->dev->bdev->bd_queue->limits.max_hw_discard_sectors = 1 << 15; + bc->dev->bdev->bd_queue->limits.max_discard_sectors = 1 << 15; + bc->forward_trims = false; + } else { + bc->forward_trims = true; + } + + bc->block_size = bc->dev->bdev->bd_queue->limits.logical_block_size; + bc->block_shift = ilog2(bc->block_size); + bc->log_sector = kzalloc(bc->block_size, GFP_KERNEL); + if (!bc->log_sector) { + ti->error = "Cannot allocate log sector"; + goto bad; + } + + init_completion(&bc->kobj_holder.completion); + ret = kobject_init_and_add(&bc->kobj_holder.kobj, &bow_ktype, + &disk_to_dev(dm_disk(md))->kobj, "%s", + "bow"); + if (ret) { + ti->error = "Cannot create sysfs node"; + goto bad; + } + + mutex_init(&bc->ranges_lock); + bc->ranges = RB_ROOT; + bc->bufio = dm_bufio_client_create(bc->dev->bdev, bc->block_size, 1, 0, + NULL, NULL); + if (IS_ERR(bc->bufio)) { + ti->error = "Cannot initialize dm-bufio"; + ret = PTR_ERR(bc->bufio); + bc->bufio = NULL; + goto 
bad; + } + + bc->workqueue = alloc_workqueue("dm-bow", + WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM + | WQ_UNBOUND, num_online_cpus()); + if (!bc->workqueue) { + ti->error = "Cannot allocate workqueue"; + ret = -ENOMEM; + goto bad; + } + + INIT_LIST_HEAD(&bc->trimmed_list); + + br = kzalloc(sizeof(*br), GFP_KERNEL); + if (!br) { + ti->error = "Cannot allocate ranges"; + ret = -ENOMEM; + goto bad; + } + + br->sector = ti->len; + br->type = TOP; + rb_link_node(&br->node, NULL, &bc->ranges.rb_node); + rb_insert_color(&br->node, &bc->ranges); + + br = kzalloc(sizeof(*br), GFP_KERNEL); + if (!br) { + ti->error = "Cannot allocate ranges"; + ret = -ENOMEM; + goto bad; + } + + br->sector = 0; + br->type = UNCHANGED; + rb_link_node(&br->node, bc->ranges.rb_node, + &bc->ranges.rb_node->rb_left); + rb_insert_color(&br->node, &bc->ranges); + + ti->discards_supported = true; + + return 0; + +bad: + dm_bow_dtr(ti); + return ret; +} + +/****** Handle writes ******/ + +static int prepare_unchanged_range(struct bow_context *bc, struct bow_range *br, + struct bvec_iter *bi_iter, + bool record_checksum) +{ + struct bow_range *backup_br; + struct bvec_iter backup_bi; + sector_t log_source, log_dest; + unsigned int log_size; + u32 checksum = 0; + int ret; + int original_type; + sector_t sector0; + + /* Find a free range */ + backup_br = find_free_range(bc); + if (!backup_br) + return BLK_STS_NOSPC; + + /* Carve out a backup range. This may be smaller than the br given */ + backup_bi.bi_sector = backup_br->sector; + backup_bi.bi_size = min(range_size(backup_br), (u64) bi_iter->bi_size); + ret = split_range(bc, &backup_br, &backup_bi); + if (ret) + return ret; + + /* + * Carve out a changed range. 
This will not be smaller than the backup + * br since the backup br is smaller than the source range and iterator + */ + bi_iter->bi_size = backup_bi.bi_size; + ret = split_range(bc, &br, bi_iter); + if (ret) + return ret; + if (range_size(br) != range_size(backup_br)) { + WARN_ON(1); + return BLK_STS_IOERR; + } + + + /* Copy data over */ + ret = copy_data(bc, br, backup_br, record_checksum ? &checksum : NULL); + if (ret) + return ret; + + /* Add an entry to the log */ + log_source = br->sector; + log_dest = backup_br->sector; + log_size = range_size(br); + + /* + * Set the types. Note that since set_type also amalgamates ranges + * we have to set both sectors to their final type before calling + * set_type on either + */ + original_type = br->type; + sector0 = backup_br->sector; + if (backup_br->type == TRIMMED) + list_del(&backup_br->trimmed_list); + backup_br->type = br->type == SECTOR0_CURRENT ? SECTOR0_CURRENT + : BACKUP; + br->type = CHANGED; + set_type(bc, &backup_br, backup_br->type); + + /* + * Add the log entry after marking the backup sector, since adding a log + * can cause another backup + */ + ret = add_log_entry(bc, log_source, log_dest, log_size, checksum); + if (ret) { + br->type = original_type; + return ret; + } + + /* Now it is safe to mark this backup successful */ + if (original_type == SECTOR0_CURRENT) + bc->log_sector->sector0 = sector0; + + set_type(bc, &br, br->type); + return ret; +} + +static int prepare_free_range(struct bow_context *bc, struct bow_range *br, + struct bvec_iter *bi_iter) +{ + int ret; + + ret = split_range(bc, &br, bi_iter); + if (ret) + return ret; + set_type(bc, &br, CHANGED); + return BLK_STS_OK; +} + +static int prepare_changed_range(struct bow_context *bc, struct bow_range *br, + struct bvec_iter *bi_iter) +{ + /* Nothing to do ... 
*/ + return BLK_STS_OK; +} + +static int prepare_one_range(struct bow_context *bc, + struct bvec_iter *bi_iter) +{ + struct bow_range *br = find_first_overlapping_range(&bc->ranges, + bi_iter); + switch (br->type) { + case CHANGED: + return prepare_changed_range(bc, br, bi_iter); + + case TRIMMED: + return prepare_free_range(bc, br, bi_iter); + + case UNCHANGED: + case BACKUP: + return prepare_unchanged_range(bc, br, bi_iter, true); + + /* + * We cannot track the checksum for the active sector0, since it + * may change at any point. + */ + case SECTOR0_CURRENT: + return prepare_unchanged_range(bc, br, bi_iter, false); + + case SECTOR0: /* Handled in the dm_bow_map */ + case TOP: /* Illegal - top is off the end of the device */ + default: + WARN_ON(1); + return BLK_STS_IOERR; + } +} + +struct write_work { + struct work_struct work; + struct bow_context *bc; + struct bio *bio; +}; + +static void bow_write(struct work_struct *work) +{ + struct write_work *ww = container_of(work, struct write_work, work); + struct bow_context *bc = ww->bc; + struct bio *bio = ww->bio; + struct bvec_iter bi_iter = bio->bi_iter; + int ret = BLK_STS_OK; + + kfree(ww); + + mutex_lock(&bc->ranges_lock); + do { + ret = prepare_one_range(bc, &bi_iter); + bi_iter.bi_sector += bi_iter.bi_size / SECTOR_SIZE; + bi_iter.bi_size = bio->bi_iter.bi_size + - (bi_iter.bi_sector - bio->bi_iter.bi_sector) + * SECTOR_SIZE; + } while (!ret && bi_iter.bi_size); + + mutex_unlock(&bc->ranges_lock); + + if (!ret) { + bio_set_dev(bio, bc->dev->bdev); + submit_bio(bio); + } else { + DMERR("Write failure with error %d", -ret); + bio->bi_status = ret; + bio_endio(bio); + } +} + +static int queue_write(struct bow_context *bc, struct bio *bio) +{ + struct write_work *ww = kmalloc(sizeof(*ww), GFP_NOIO | __GFP_NORETRY + | __GFP_NOMEMALLOC | __GFP_NOWARN); + if (!ww) { + DMERR("Failed to allocate write_work"); + return -ENOMEM; + } + + INIT_WORK(&ww->work, bow_write); + ww->bc = bc; + ww->bio = bio; + 
queue_work(bc->workqueue, &ww->work); + return DM_MAPIO_SUBMITTED; +} + +static int handle_sector0(struct bow_context *bc, struct bio *bio) +{ + int ret = DM_MAPIO_REMAPPED; + + if (bio->bi_iter.bi_size > bc->block_size) { + struct bio * split = bio_split(bio, + bc->block_size >> SECTOR_SHIFT, + GFP_NOIO, + fs_bio_set); + if (!split) { + DMERR("Failed to split bio"); + bio->bi_status = BLK_STS_RESOURCE; + bio_endio(bio); + return DM_MAPIO_SUBMITTED; + } + + bio_chain(split, bio); + split->bi_iter.bi_sector = bc->log_sector->sector0; + bio_set_dev(split, bc->dev->bdev); + submit_bio(split); + + if (bio_data_dir(bio) == WRITE) + ret = queue_write(bc, bio); + } else { + bio->bi_iter.bi_sector = bc->log_sector->sector0; + } + + return ret; +} + +static int add_trim(struct bow_context *bc, struct bio *bio) +{ + struct bow_range *br; + struct bvec_iter bi_iter = bio->bi_iter; + + DMDEBUG("add_trim: %llu, %u", + (unsigned long long)bio->bi_iter.bi_sector, + bio->bi_iter.bi_size); + + do { + br = find_first_overlapping_range(&bc->ranges, &bi_iter); + + switch (br->type) { + case UNCHANGED: + if (!split_range(bc, &br, &bi_iter)) + set_type(bc, &br, TRIMMED); + break; + + case TRIMMED: + /* Nothing to do */ + break; + + default: + /* No other case is legal in TRIM state */ + WARN_ON(true); + break; + } + + bi_iter.bi_sector += bi_iter.bi_size / SECTOR_SIZE; + bi_iter.bi_size = bio->bi_iter.bi_size + - (bi_iter.bi_sector - bio->bi_iter.bi_sector) + * SECTOR_SIZE; + + } while (bi_iter.bi_size); + + bio_endio(bio); + return DM_MAPIO_SUBMITTED; +} + +static int remove_trim(struct bow_context *bc, struct bio *bio) +{ + struct bow_range *br; + struct bvec_iter bi_iter = bio->bi_iter; + + DMDEBUG("remove_trim: %llu, %u", + (unsigned long long)bio->bi_iter.bi_sector, + bio->bi_iter.bi_size); + + do { + br = find_first_overlapping_range(&bc->ranges, &bi_iter); + + switch (br->type) { + case UNCHANGED: + /* Nothing to do */ + break; + + case TRIMMED: + if (!split_range(bc, &br, 
&bi_iter)) + set_type(bc, &br, UNCHANGED); + break; + + default: + /* No other case is legal in TRIM state */ + WARN_ON(true); + break; + } + + bi_iter.bi_sector += bi_iter.bi_size / SECTOR_SIZE; + bi_iter.bi_size = bio->bi_iter.bi_size + - (bi_iter.bi_sector - bio->bi_iter.bi_sector) + * SECTOR_SIZE; + + } while (bi_iter.bi_size); + + return DM_MAPIO_REMAPPED; +} + +int remap_unless_illegal_trim(struct bow_context *bc, struct bio *bio) +{ + if (!bc->forward_trims && bio_op(bio) == REQ_OP_DISCARD) { + bio->bi_status = BLK_STS_NOTSUPP; + bio_endio(bio); + return DM_MAPIO_SUBMITTED; + } else { + bio_set_dev(bio, bc->dev->bdev); + return DM_MAPIO_REMAPPED; + } +} + +/****** dm interface ******/ + +static int dm_bow_map(struct dm_target *ti, struct bio *bio) +{ + int ret = DM_MAPIO_REMAPPED; + struct bow_context *bc = ti->private; + + if (likely(bc->state.counter == COMMITTED)) + return remap_unless_illegal_trim(bc, bio); + + if (bio_data_dir(bio) == READ && bio->bi_iter.bi_sector != 0) + return remap_unless_illegal_trim(bc, bio); + + if (atomic_read(&bc->state) != COMMITTED) { + enum state state; + + mutex_lock(&bc->ranges_lock); + state = atomic_read(&bc->state); + if (state == TRIM) { + if (bio_op(bio) == REQ_OP_DISCARD) + ret = add_trim(bc, bio); + else if (bio_data_dir(bio) == WRITE) + ret = remove_trim(bc, bio); + else + /* pass-through */; + } else if (state == CHECKPOINT) { + if (bio->bi_iter.bi_sector == 0) + ret = handle_sector0(bc, bio); + else if (bio_data_dir(bio) == WRITE) + ret = queue_write(bc, bio); + else + /* pass-through */; + } else { + /* pass-through */ + } + mutex_unlock(&bc->ranges_lock); + } + + if (ret == DM_MAPIO_REMAPPED) + return remap_unless_illegal_trim(bc, bio); + + return ret; +} + +static void dm_bow_tablestatus(struct dm_target *ti, char *result, + unsigned int maxlen) +{ + char *end = result + maxlen; + struct bow_context *bc = ti->private; + struct rb_node *i; + int trimmed_list_length = 0; + int trimmed_range_count = 0; + struct 
bow_range *br; + + if (maxlen == 0) + return; + result[0] = 0; + + list_for_each_entry(br, &bc->trimmed_list, trimmed_list) + if (br->type == TRIMMED) { + ++trimmed_list_length; + } else { + scnprintf(result, end - result, + "ERROR: non-trimmed entry in trimmed_list"); + return; + } + + if (!rb_first(&bc->ranges)) { + scnprintf(result, end - result, "ERROR: Empty ranges"); + return; + } + + if (container_of(rb_first(&bc->ranges), struct bow_range, node) + ->sector) { + scnprintf(result, end - result, + "ERROR: First range does not start at sector 0"); + return; + } + + for (i = rb_first(&bc->ranges); i; i = rb_next(i)) { + struct bow_range *br = container_of(i, struct bow_range, node); + + result += scnprintf(result, end - result, "%s: %llu", + readable_type[br->type], + (unsigned long long)br->sector); + if (result >= end) + return; + + result += scnprintf(result, end - result, "\n"); + if (result >= end) + return; + + if (br->type == TRIMMED) + ++trimmed_range_count; + + if (br->type == TOP) { + if (br->sector != ti->len) { + scnprintf(result, end - result, + "\nERROR: Top sector is incorrect"); + } + + if (&br->node != rb_last(&bc->ranges)) { + scnprintf(result, end - result, + "\nERROR: Top sector is not last"); + } + + break; + } + + if (!rb_next(i)) { + scnprintf(result, end - result, + "\nERROR: Last range not of type TOP"); + return; + } + + if (br->sector > range_top(br)) { + scnprintf(result, end - result, + "\nERROR: sectors out of order"); + return; + } + } + + if (trimmed_range_count != trimmed_list_length) + scnprintf(result, end - result, + "\nERROR: not all trimmed ranges in trimmed list"); +} + +static void dm_bow_status(struct dm_target *ti, status_type_t type, + unsigned int status_flags, char *result, + unsigned int maxlen) +{ + switch (type) { + case STATUSTYPE_INFO: + if (maxlen) + result[0] = 0; + break; + + case STATUSTYPE_TABLE: + dm_bow_tablestatus(ti, result, maxlen); + break; + } +} + +int dm_bow_prepare_ioctl(struct dm_target *ti, 
struct block_device **bdev, + fmode_t *mode) +{ + struct bow_context *bc = ti->private; + struct dm_dev *dev = bc->dev; + + *bdev = dev->bdev; + /* Only pass ioctls through if the device sizes match exactly. */ + return ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT; +} + +static int dm_bow_iterate_devices(struct dm_target *ti, + iterate_devices_callout_fn fn, void *data) +{ + struct bow_context *bc = ti->private; + + return fn(ti, bc->dev, 0, ti->len, data); +} + +static struct target_type bow_target = { + .name = "bow", + .version = {1, 1, 1}, + .module = THIS_MODULE, + .ctr = dm_bow_ctr, + .dtr = dm_bow_dtr, + .map = dm_bow_map, + .status = dm_bow_status, + .prepare_ioctl = dm_bow_prepare_ioctl, + .iterate_devices = dm_bow_iterate_devices, +}; + +int __init dm_bow_init(void) +{ + int r = dm_register_target(&bow_target); + + if (r < 0) + DMERR("registering bow failed %d", r); + return r; +} + +void dm_bow_exit(void) +{ + dm_unregister_target(&bow_target); +} + +MODULE_LICENSE("GPL"); + +module_init(dm_bow_init); +module_exit(dm_bow_exit); diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c index 9f2a64cb691d..21102ea81307 100644 --- a/drivers/media/usb/uvc/uvc_ctrl.c +++ b/drivers/media/usb/uvc/uvc_ctrl.c @@ -1203,7 +1203,7 @@ static void uvc_ctrl_fill_event(struct uvc_video_chain *chain, __uvc_query_v4l2_ctrl(chain, ctrl, mapping, &v4l2_ctrl); - memset(ev->reserved, 0, sizeof(ev->reserved)); + memset(ev, 0, sizeof(*ev)); ev->type = V4L2_EVENT_CTRL; ev->id = v4l2_ctrl.id; ev->u.ctrl.value = value; diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c index 79350167f51b..57a7d17705c0 100644 --- a/drivers/media/v4l2-core/v4l2-ctrls.c +++ b/drivers/media/v4l2-core/v4l2-ctrls.c @@ -1248,7 +1248,7 @@ static u32 user_flags(const struct v4l2_ctrl *ctrl) static void fill_event(struct v4l2_event *ev, struct v4l2_ctrl *ctrl, u32 changes) { - memset(ev->reserved, 0, sizeof(ev->reserved)); + 
memset(ev, 0, sizeof(*ev)); ev->type = V4L2_EVENT_CTRL; ev->id = ctrl->id; ev->u.ctrl.changes = changes; diff --git a/drivers/mmc/host/pxamci.c b/drivers/mmc/host/pxamci.c index c763b404510f..3e139692fe8f 100644 --- a/drivers/mmc/host/pxamci.c +++ b/drivers/mmc/host/pxamci.c @@ -181,7 +181,7 @@ static void pxamci_dma_irq(void *param); static void pxamci_setup_data(struct pxamci_host *host, struct mmc_data *data) { struct dma_async_tx_descriptor *tx; - enum dma_data_direction direction; + enum dma_transfer_direction direction; struct dma_slave_config config; struct dma_chan *chan; unsigned int nob = data->blocks; diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c index 8cb47858eb00..ab8eb9cdfda0 100644 --- a/drivers/net/wireless/ath/ath10k/wmi.c +++ b/drivers/net/wireless/ath/ath10k/wmi.c @@ -4309,7 +4309,7 @@ static void ath10k_tpc_config_disp_tables(struct ath10k *ar, rate_code[i], type); snprintf(buff, sizeof(buff), "%8d ", tpc[j]); - strncat(tpc_value, buff, strlen(buff)); + strlcat(tpc_value, buff, sizeof(tpc_value)); } tpc_stats->tpc_table[type].pream_idx[i] = pream_idx; tpc_stats->tpc_table[type].rate_code[i] = rate_code[i]; diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c index 7c621877a939..abcbf0770358 100644 --- a/drivers/pci/dwc/pcie-designware-ep.c +++ b/drivers/pci/dwc/pcie-designware-ep.c @@ -35,8 +35,10 @@ static void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar) u32 reg; reg = PCI_BASE_ADDRESS_0 + (4 * bar); + dw_pcie_dbi_ro_wr_en(pci); dw_pcie_writel_dbi2(pci, reg, 0x0); dw_pcie_writel_dbi(pci, reg, 0x0); + dw_pcie_dbi_ro_wr_dis(pci); } static int dw_pcie_ep_write_header(struct pci_epc *epc, @@ -45,6 +47,7 @@ static int dw_pcie_ep_write_header(struct pci_epc *epc, struct dw_pcie_ep *ep = epc_get_drvdata(epc); struct dw_pcie *pci = to_dw_pcie_from_ep(ep); + dw_pcie_dbi_ro_wr_en(pci); dw_pcie_writew_dbi(pci, PCI_VENDOR_ID, hdr->vendorid); 
dw_pcie_writew_dbi(pci, PCI_DEVICE_ID, hdr->deviceid); dw_pcie_writeb_dbi(pci, PCI_REVISION_ID, hdr->revid); @@ -58,6 +61,7 @@ static int dw_pcie_ep_write_header(struct pci_epc *epc, dw_pcie_writew_dbi(pci, PCI_SUBSYSTEM_ID, hdr->subsys_id); dw_pcie_writeb_dbi(pci, PCI_INTERRUPT_PIN, hdr->interrupt_pin); + dw_pcie_dbi_ro_wr_dis(pci); return 0; } @@ -142,8 +146,10 @@ static int dw_pcie_ep_set_bar(struct pci_epc *epc, enum pci_barno bar, if (ret) return ret; + dw_pcie_dbi_ro_wr_en(pci); dw_pcie_writel_dbi2(pci, reg, size - 1); dw_pcie_writel_dbi(pci, reg, flags); + dw_pcie_dbi_ro_wr_dis(pci); return 0; } @@ -214,8 +220,12 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 encode_int) struct dw_pcie_ep *ep = epc_get_drvdata(epc); struct dw_pcie *pci = to_dw_pcie_from_ep(ep); - val = (encode_int << MSI_CAP_MMC_SHIFT); + val = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL); + val &= ~MSI_CAP_MMC_MASK; + val |= (encode_int << MSI_CAP_MMC_SHIFT) & MSI_CAP_MMC_MASK; + dw_pcie_dbi_ro_wr_en(pci); dw_pcie_writew_dbi(pci, MSI_MESSAGE_CONTROL, val); + dw_pcie_dbi_ro_wr_dis(pci); return 0; } diff --git a/drivers/pci/dwc/pcie-designware.h b/drivers/pci/dwc/pcie-designware.h index 3551dd607b90..5af29d125c7e 100644 --- a/drivers/pci/dwc/pcie-designware.h +++ b/drivers/pci/dwc/pcie-designware.h @@ -99,6 +99,7 @@ #define MSI_MESSAGE_CONTROL 0x52 #define MSI_CAP_MMC_SHIFT 1 +#define MSI_CAP_MMC_MASK (7 << MSI_CAP_MMC_SHIFT) #define MSI_CAP_MME_SHIFT 4 #define MSI_CAP_MSI_EN_MASK 0x1 #define MSI_CAP_MME_MASK (7 << MSI_CAP_MME_SHIFT) diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c index 42c2a1156325..cd7d4788b94d 100644 --- a/drivers/pci/endpoint/pci-epc-core.c +++ b/drivers/pci/endpoint/pci-epc-core.c @@ -18,7 +18,6 @@ */ #include -#include #include #include #include @@ -371,7 +370,6 @@ EXPORT_SYMBOL_GPL(pci_epc_write_header); int pci_epc_add_epf(struct pci_epc *epc, struct pci_epf *epf) { unsigned long flags; - struct device *dev = 
epc->dev.parent; if (epf->epc) return -EBUSY; @@ -383,12 +381,6 @@ int pci_epc_add_epf(struct pci_epc *epc, struct pci_epf *epf) return -EINVAL; epf->epc = epc; - if (dev->of_node) { - of_dma_configure(&epf->dev, dev->of_node); - } else { - dma_set_coherent_mask(&epf->dev, epc->dev.coherent_dma_mask); - epf->dev.dma_mask = epc->dev.dma_mask; - } spin_lock_irqsave(&epc->lock, flags); list_add_tail(&epf->list, &epc->pci_epf); @@ -503,9 +495,7 @@ __pci_epc_create(struct device *dev, const struct pci_epc_ops *ops, INIT_LIST_HEAD(&epc->pci_epf); device_initialize(&epc->dev); - dma_set_coherent_mask(&epc->dev, dev->coherent_dma_mask); epc->dev.class = pci_epc_class; - epc->dev.dma_mask = dev->dma_mask; epc->dev.parent = dev; epc->ops = ops; diff --git a/drivers/pci/endpoint/pci-epf-core.c b/drivers/pci/endpoint/pci-epf-core.c index ae1611a62808..95ccc4b8a0a2 100644 --- a/drivers/pci/endpoint/pci-epf-core.c +++ b/drivers/pci/endpoint/pci-epf-core.c @@ -99,7 +99,7 @@ EXPORT_SYMBOL_GPL(pci_epf_bind); */ void pci_epf_free_space(struct pci_epf *epf, void *addr, enum pci_barno bar) { - struct device *dev = &epf->dev; + struct device *dev = epf->epc->dev.parent; if (!addr) return; @@ -122,7 +122,7 @@ EXPORT_SYMBOL_GPL(pci_epf_free_space); void *pci_epf_alloc_space(struct pci_epf *epf, size_t size, enum pci_barno bar) { void *space; - struct device *dev = &epf->dev; + struct device *dev = epf->epc->dev.parent; dma_addr_t phys_addr; if (size < 128) diff --git a/drivers/power/supply/charger-manager.c b/drivers/power/supply/charger-manager.c index 6502fa7c2106..f60dfc213257 100644 --- a/drivers/power/supply/charger-manager.c +++ b/drivers/power/supply/charger-manager.c @@ -1212,7 +1212,6 @@ static int charger_extcon_init(struct charger_manager *cm, if (ret < 0) { pr_info("Cannot register extcon_dev for %s(cable: %s)\n", cable->extcon_name, cable->name); - ret = -EINVAL; } return ret; @@ -1629,7 +1628,7 @@ static int charger_manager_probe(struct platform_device *pdev) if 
(IS_ERR(desc)) { dev_err(&pdev->dev, "No platform data (desc) found\n"); - return -ENODEV; + return PTR_ERR(desc); } cm = devm_kzalloc(&pdev->dev, sizeof(*cm), GFP_KERNEL); diff --git a/drivers/rtc/rtc-lib.c b/drivers/rtc/rtc-lib.c index 1ae7da5cfc60..ad5bb21908e5 100644 --- a/drivers/rtc/rtc-lib.c +++ b/drivers/rtc/rtc-lib.c @@ -52,13 +52,11 @@ EXPORT_SYMBOL(rtc_year_days); */ void rtc_time64_to_tm(time64_t time, struct rtc_time *tm) { - unsigned int month, year; - unsigned long secs; + unsigned int month, year, secs; int days; /* time must be positive */ - days = div_s64(time, 86400); - secs = time - (unsigned int) days * 86400; + days = div_s64_rem(time, 86400, &secs); /* day of the week, 1970-01-01 was a Thursday */ tm->tm_wday = (days + 4) % 7; diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 53eb27731373..07c23bbd968c 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -96,6 +96,7 @@ static int client_reserve = 1; static char partition_name[96] = "UNKNOWN"; static unsigned int partition_number = -1; static LIST_HEAD(ibmvscsi_head); +static DEFINE_SPINLOCK(ibmvscsi_driver_lock); static struct scsi_transport_template *ibmvscsi_transport_template; @@ -2274,7 +2275,9 @@ static int ibmvscsi_probe(struct vio_dev *vdev, const struct vio_device_id *id) } dev_set_drvdata(&vdev->dev, hostdata); + spin_lock(&ibmvscsi_driver_lock); list_add_tail(&hostdata->host_list, &ibmvscsi_head); + spin_unlock(&ibmvscsi_driver_lock); return 0; add_srp_port_failed: @@ -2296,15 +2299,27 @@ static int ibmvscsi_probe(struct vio_dev *vdev, const struct vio_device_id *id) static int ibmvscsi_remove(struct vio_dev *vdev) { struct ibmvscsi_host_data *hostdata = dev_get_drvdata(&vdev->dev); - list_del(&hostdata->host_list); - unmap_persist_bufs(hostdata); + unsigned long flags; + + srp_remove_host(hostdata->host); + scsi_remove_host(hostdata->host); + + purge_requests(hostdata, DID_ERROR); + + 
spin_lock_irqsave(hostdata->host->host_lock, flags); release_event_pool(&hostdata->pool, hostdata); + spin_unlock_irqrestore(hostdata->host->host_lock, flags); + ibmvscsi_release_crq_queue(&hostdata->queue, hostdata, max_events); kthread_stop(hostdata->work_thread); - srp_remove_host(hostdata->host); - scsi_remove_host(hostdata->host); + unmap_persist_bufs(hostdata); + + spin_lock(&ibmvscsi_driver_lock); + list_del(&hostdata->host_list); + spin_unlock(&ibmvscsi_driver_lock); + scsi_host_put(hostdata->host); return 0; diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index cc6a751ba480..9f393871041a 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -3539,10 +3539,11 @@ static int ufshcd_comp_devman_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) u32 upiu_flags; int ret = 0; - if (hba->ufs_version == UFSHCI_VERSION_20) - lrbp->command_type = UTP_CMD_TYPE_UFS_STORAGE; - else + if ((hba->ufs_version == UFSHCI_VERSION_10) || + (hba->ufs_version == UFSHCI_VERSION_11)) lrbp->command_type = UTP_CMD_TYPE_DEV_MANAGE; + else + lrbp->command_type = UTP_CMD_TYPE_UFS_STORAGE; ret = ufshcd_prepare_req_desc_hdr(hba, lrbp, &upiu_flags, DMA_NONE); @@ -3567,10 +3568,11 @@ static int ufshcd_comp_scsi_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) u32 upiu_flags; int ret = 0; - if (hba->ufs_version == UFSHCI_VERSION_20) - lrbp->command_type = UTP_CMD_TYPE_UFS_STORAGE; - else + if ((hba->ufs_version == UFSHCI_VERSION_10) || + (hba->ufs_version == UFSHCI_VERSION_11)) lrbp->command_type = UTP_CMD_TYPE_SCSI; + else + lrbp->command_type = UTP_CMD_TYPE_UFS_STORAGE; if (likely(lrbp->cmd)) { ret = ufshcd_prepare_req_desc_hdr(hba, lrbp, diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c index 0fa7d2bd0e48..155153ecb894 100644 --- a/drivers/video/backlight/pwm_bl.c +++ b/drivers/video/backlight/pwm_bl.c @@ -54,10 +54,11 @@ static void pwm_backlight_power_on(struct pwm_bl_data *pb, int brightness) if (err < 0) 
dev_err(pb->dev, "failed to enable power supply\n"); + pwm_enable(pb->pwm); + if (pb->enable_gpio) gpiod_set_value_cansleep(pb->enable_gpio, 1); - pwm_enable(pb->pwm); pb->enabled = true; } @@ -66,12 +67,12 @@ static void pwm_backlight_power_off(struct pwm_bl_data *pb) if (!pb->enabled) return; - pwm_config(pb->pwm, 0, pb->period); - pwm_disable(pb->pwm); - if (pb->enable_gpio) gpiod_set_value_cansleep(pb->enable_gpio, 0); + pwm_config(pb->pwm, 0, pb->period); + pwm_disable(pb->pwm); + regulator_disable(pb->power_supply); pb->enabled = false; } diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 48143e32411c..1437f62d068c 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -387,7 +387,7 @@ static inline void ext4_update_inode_fsync_trans(handle_t *handle, { struct ext4_inode_info *ei = EXT4_I(inode); - if (ext4_handle_valid(handle)) { + if (ext4_handle_valid(handle) && !is_handle_aborted(handle)) { ei->i_sync_tid = handle->h_transaction->t_tid; if (datasync) ei->i_datasync_tid = handle->h_transaction->t_tid; diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 5cb9aa3ad249..1913c69498c1 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -123,7 +123,7 @@ ext4_unaligned_aio(struct inode *inode, struct iov_iter *from, loff_t pos) struct super_block *sb = inode->i_sb; int blockmask = sb->s_blocksize - 1; - if (pos >= i_size_read(inode)) + if (pos >= ALIGN(i_size_read(inode), sb->s_blocksize)) return 0; if ((pos | iov_iter_alignment(from)) & blockmask) diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c index bf7fa1507e81..9e96a0bd08d9 100644 --- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -1387,10 +1387,14 @@ end_range: partial->p + 1, partial2->p, (chain+n-1) - partial); - BUFFER_TRACE(partial->bh, "call brelse"); - brelse(partial->bh); - BUFFER_TRACE(partial2->bh, "call brelse"); - brelse(partial2->bh); + while (partial > chain) { + BUFFER_TRACE(partial->bh, "call brelse"); + brelse(partial->bh); + } + while (partial2 > chain2) { + 
BUFFER_TRACE(partial2->bh, "call brelse"); + brelse(partial2->bh); + } return 0; } diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index d27d3c4458e4..5029480e69f7 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -306,8 +306,9 @@ static int f2fs_write_meta_pages(struct address_space *mapping, goto skip_write; /* collect a number of dirty meta pages and write together */ - if (wbc->for_kupdate || - get_pages(sbi, F2FS_DIRTY_META) < nr_pages_to_skip(sbi, META)) + if (wbc->sync_mode != WB_SYNC_ALL && + get_pages(sbi, F2FS_DIRTY_META) < + nr_pages_to_skip(sbi, META)) goto skip_write; /* if locked failed, cp will flush dirty pages instead */ @@ -405,7 +406,7 @@ static int f2fs_set_meta_page_dirty(struct page *page) if (!PageDirty(page)) { __set_page_dirty_nobuffers(page); inc_page_count(F2FS_P_SB(page), F2FS_DIRTY_META); - SetPagePrivate(page); + f2fs_set_page_private(page, 0); f2fs_trace_pid(page); return 1; } @@ -956,7 +957,7 @@ void f2fs_update_dirty_page(struct inode *inode, struct page *page) inode_inc_dirty_pages(inode); spin_unlock(&sbi->inode_lock[type]); - SetPagePrivate(page); + f2fs_set_page_private(page, 0); f2fs_trace_pid(page); } @@ -1259,10 +1260,17 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc) else __clear_ckpt_flags(ckpt, CP_DISABLED_FLAG); + if (is_sbi_flag_set(sbi, SBI_CP_DISABLED_QUICK)) + __set_ckpt_flags(ckpt, CP_DISABLED_QUICK_FLAG); + else + __clear_ckpt_flags(ckpt, CP_DISABLED_QUICK_FLAG); + if (is_sbi_flag_set(sbi, SBI_QUOTA_SKIP_FLUSH)) __set_ckpt_flags(ckpt, CP_QUOTA_NEED_FSCK_FLAG); - else - __clear_ckpt_flags(ckpt, CP_QUOTA_NEED_FSCK_FLAG); + /* + * TODO: we count on fsck.f2fs to clear this flag until we figure out + * missing cases which clear it incorrectly. 
+ */ if (is_sbi_flag_set(sbi, SBI_QUOTA_NEED_REPAIR)) __set_ckpt_flags(ckpt, CP_QUOTA_NEED_FSCK_FLAG); diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 2b853cad0e02..39e70b1430b0 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -308,9 +308,10 @@ static inline void __submit_bio(struct f2fs_sb_info *sbi, for (; start < F2FS_IO_SIZE(sbi); start++) { struct page *page = mempool_alloc(sbi->write_io_dummy, - GFP_NOIO | __GFP_ZERO | __GFP_NOFAIL); + GFP_NOIO | __GFP_NOFAIL); f2fs_bug_on(sbi, !page); + zero_user_segment(page, 0, PAGE_SIZE); SetPagePrivate(page); set_page_private(page, (unsigned long)DUMMY_WRITTEN_PAGE); lock_page(page); @@ -1618,6 +1619,9 @@ static int f2fs_mpage_readpages(struct address_space *mapping, if (last_block > last_block_in_file) last_block = last_block_in_file; + /* just zeroing out page which is beyond EOF */ + if (block_in_file >= last_block) + goto zero_out; /* * Map blocks using the previous result first. */ @@ -1630,16 +1634,11 @@ static int f2fs_mpage_readpages(struct address_space *mapping, * Then do more f2fs_map_blocks() calls until we are * done with this page. 
*/ - map.m_flags = 0; - - if (block_in_file < last_block) { - map.m_lblk = block_in_file; - map.m_len = last_block - block_in_file; + map.m_lblk = block_in_file; + map.m_len = last_block - block_in_file; - if (f2fs_map_blocks(inode, &map, 0, - F2FS_GET_BLOCK_DEFAULT)) - goto set_error_page; - } + if (f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_DEFAULT)) + goto set_error_page; got_it: if ((map.m_flags & F2FS_MAP_MAPPED)) { block_nr = map.m_pblk + block_in_file - map.m_lblk; @@ -1654,6 +1653,7 @@ got_it: DATA_GENERIC)) goto set_error_page; } else { +zero_out: zero_user_segment(page, 0, PAGE_SIZE); if (!PageUptodate(page)) SetPageUptodate(page); @@ -1940,8 +1940,13 @@ got_it: if (fio->need_lock == LOCK_REQ) f2fs_unlock_op(fio->sbi); err = f2fs_inplace_write_data(fio); - if (err && PageWriteback(page)) - end_page_writeback(page); + if (err) { + if (f2fs_encrypted_file(inode)) + fscrypt_pullback_bio_page(&fio->encrypted_page, + true); + if (PageWriteback(page)) + end_page_writeback(page); + } trace_f2fs_do_write_data_page(fio->page, IPU); set_inode_flag(inode, FI_UPDATE_WRITE); return err; @@ -2392,7 +2397,8 @@ static void f2fs_write_failed(struct address_space *mapping, loff_t to) down_write(&F2FS_I(inode)->i_mmap_sem); truncate_pagecache(inode, i_size); - f2fs_truncate_blocks(inode, i_size, true, true); + if (!IS_NOQUOTA(inode)) + f2fs_truncate_blocks(inode, i_size, true); up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); @@ -2673,14 +2679,11 @@ static void f2fs_dio_submit_bio(struct bio *bio, struct inode *inode, { struct f2fs_private_dio *dio; bool write = (bio_op(bio) == REQ_OP_WRITE); - int err; dio = f2fs_kzalloc(F2FS_I_SB(inode), sizeof(struct f2fs_private_dio), GFP_NOFS); - if (!dio) { - err = -ENOMEM; + if (!dio) goto out; - } dio->inode = inode; dio->orig_end_io = bio->bi_end_io; @@ -2826,12 +2829,10 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset, clear_cold_data(page); - /* This is atomic written page, 
keep Private */ if (IS_ATOMIC_WRITTEN_PAGE(page)) return f2fs_drop_inmem_page(inode, page); - set_page_private(page, 0); - ClearPagePrivate(page); + f2fs_clear_page_private(page); } int f2fs_release_page(struct page *page, gfp_t wait) @@ -2845,8 +2846,7 @@ int f2fs_release_page(struct page *page, gfp_t wait) return 0; clear_cold_data(page); - set_page_private(page, 0); - ClearPagePrivate(page); + f2fs_clear_page_private(page); return 1; } @@ -2914,12 +2914,8 @@ int f2fs_migrate_page(struct address_space *mapping, return -EAGAIN; } - /* - * A reference is expected if PagePrivate set when move mapping, - * however F2FS breaks this for maintaining dirty page counts when - * truncating pages. So here adjusting the 'extra_count' make it work. - */ - extra_count = (atomic_written ? 1 : 0) - page_has_private(page); + /* one extra reference was held for atomic_write page */ + extra_count = atomic_written ? 1 : 0; rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, extra_count); if (rc != MIGRATEPAGE_SUCCESS) { @@ -2940,9 +2936,10 @@ int f2fs_migrate_page(struct address_space *mapping, get_page(newpage); } - if (PagePrivate(page)) - SetPagePrivate(newpage); - set_page_private(newpage, page_private(page)); + if (PagePrivate(page)) { + f2fs_set_page_private(newpage, page_private(page)); + f2fs_clear_page_private(page); + } if (mode != MIGRATE_SYNC_NO_COPY) migrate_page_copy(newpage, page); diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c index f05b37ef7182..d00ba9b711a4 100644 --- a/fs/f2fs/debug.c +++ b/fs/f2fs/debug.c @@ -522,30 +522,16 @@ void f2fs_destroy_stats(struct f2fs_sb_info *sbi) kvfree(si); } -int __init f2fs_create_root_stats(void) +void __init f2fs_create_root_stats(void) { - struct dentry *file; - f2fs_debugfs_root = debugfs_create_dir("f2fs", NULL); - if (!f2fs_debugfs_root) - return -ENOMEM; - file = debugfs_create_file("status", S_IRUGO, f2fs_debugfs_root, - NULL, &stat_fops); - if (!file) { - debugfs_remove(f2fs_debugfs_root); - 
f2fs_debugfs_root = NULL; - return -ENOMEM; - } - - return 0; + debugfs_create_file("status", S_IRUGO, f2fs_debugfs_root, NULL, + &stat_fops); } void f2fs_destroy_root_stats(void) { - if (!f2fs_debugfs_root) - return; - debugfs_remove_recursive(f2fs_debugfs_root); f2fs_debugfs_root = NULL; } diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index 7ff9e993008e..d3eafe9e2ed2 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -728,7 +728,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, !f2fs_truncate_hole(dir, page->index, page->index + 1)) { f2fs_clear_radix_tree_dirty_tag(page); clear_page_dirty_for_io(page); - ClearPagePrivate(page); + f2fs_clear_page_private(page); ClearPageUptodate(page); clear_cold_data(page); inode_dec_dirty_pages(dir); @@ -800,6 +800,10 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, if (de->name_len == 0) { bit_pos++; ctx->pos = start_pos + bit_pos; + printk_ratelimited( + "%s, invalid namelen(0), ino:%u, run fsck to fix.", + KERN_WARNING, le32_to_cpu(de->ino)); + set_sbi_flag(sbi, SBI_NEED_FSCK); continue; } @@ -810,7 +814,8 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, /* check memory boundary before moving forward */ bit_pos += GET_DENTRY_SLOTS(le16_to_cpu(de->name_len)); - if (unlikely(bit_pos > d->max)) { + if (unlikely(bit_pos > d->max || + le16_to_cpu(de->name_len) > F2FS_NAME_LEN)) { f2fs_msg(sbi->sb, KERN_WARNING, "%s: corrupted namelen=%d, run fsck to fix.", __func__, le16_to_cpu(de->name_len)); @@ -891,7 +896,7 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx) page_cache_sync_readahead(inode->i_mapping, ra, file, n, min(npages - n, (pgoff_t)MAX_DIR_RA_PAGES)); - dentry_page = f2fs_get_lock_data_page(inode, n, false); + dentry_page = f2fs_find_data_page(inode, n); if (IS_ERR(dentry_page)) { err = PTR_ERR(dentry_page); if (err == -ENOENT) { @@ -909,11 +914,11 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx) 
err = f2fs_fill_dentries(ctx, &d, n * NR_DENTRY_IN_BLOCK, &fstr); if (err) { - f2fs_put_page(dentry_page, 1); + f2fs_put_page(dentry_page, 0); break; } - f2fs_put_page(dentry_page, 1); + f2fs_put_page(dentry_page, 0); } out_free: fscrypt_fname_free_buffer(&fstr); diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c index 1cb0fcc67d2d..caf77fe8ac07 100644 --- a/fs/f2fs/extent_cache.c +++ b/fs/f2fs/extent_cache.c @@ -506,7 +506,7 @@ static void f2fs_update_extent_tree_range(struct inode *inode, unsigned int end = fofs + len; unsigned int pos = (unsigned int)fofs; bool updated = false; - bool leftmost; + bool leftmost = false; if (!et) return; diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 7ade08b5ed69..4e696b65ed9b 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -192,6 +192,8 @@ enum { #define DEF_CP_INTERVAL 60 /* 60 secs */ #define DEF_IDLE_INTERVAL 5 /* 5 secs */ #define DEF_DISABLE_INTERVAL 5 /* 5 secs */ +#define DEF_DISABLE_QUICK_INTERVAL 1 /* 1 secs */ +#define DEF_UMOUNT_DISCARD_TIMEOUT 5 /* 5 secs */ struct cp_control { int reason; @@ -255,7 +257,7 @@ struct discard_entry { /* max discard pend list number */ #define MAX_PLIST_NUM 512 #define plist_idx(blk_num) ((blk_num) >= MAX_PLIST_NUM ? 
\ - (MAX_PLIST_NUM - 1) : (blk_num - 1)) + (MAX_PLIST_NUM - 1) : ((blk_num) - 1)) enum { D_PREP, /* initial */ @@ -311,6 +313,7 @@ struct discard_policy { bool sync; /* submit discard with REQ_SYNC flag */ bool ordered; /* issue discard by lba order */ unsigned int granularity; /* discard granularity */ + int timeout; /* discard timeout for put_super */ }; struct discard_cmd_control { @@ -457,7 +460,6 @@ struct f2fs_flush_device { /* for inline stuff */ #define DEF_INLINE_RESERVED_SIZE 1 -#define DEF_MIN_INLINE_SIZE 1 static inline int get_extra_isize(struct inode *inode); static inline int get_inline_xattr_addrs(struct inode *inode); #define MAX_INLINE_DATA(inode) (sizeof(__le32) * \ @@ -1100,6 +1102,7 @@ enum { SBI_IS_SHUTDOWN, /* shutdown by ioctl */ SBI_IS_RECOVERED, /* recovered orphan/data */ SBI_CP_DISABLED, /* CP was disabled last mount */ + SBI_CP_DISABLED_QUICK, /* CP was disabled quickly */ SBI_QUOTA_NEED_FLUSH, /* need to flush quota info in CP */ SBI_QUOTA_SKIP_FLUSH, /* skip flushing quota in current CP */ SBI_QUOTA_NEED_REPAIR, /* quota file may be corrupted */ @@ -1111,6 +1114,7 @@ enum { DISCARD_TIME, GC_TIME, DISABLE_TIME, + UMOUNT_DISCARD_TIMEOUT, MAX_TIME, }; @@ -1239,8 +1243,6 @@ struct f2fs_sb_info { unsigned int nquota_files; /* # of quota sysfile */ - u32 s_next_generation; /* for NFS support */ - /* # of pages, see count_type */ atomic_t nr_pages[NR_COUNT_TYPE]; /* # of allocated blocks */ @@ -1800,13 +1802,12 @@ static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type) { atomic_inc(&sbi->nr_pages[count_type]); - if (count_type == F2FS_DIRTY_DATA || count_type == F2FS_INMEM_PAGES || - count_type == F2FS_WB_CP_DATA || count_type == F2FS_WB_DATA || - count_type == F2FS_RD_DATA || count_type == F2FS_RD_NODE || - count_type == F2FS_RD_META) - return; - - set_sbi_flag(sbi, SBI_IS_DIRTY); + if (count_type == F2FS_DIRTY_DENTS || + count_type == F2FS_DIRTY_NODES || + count_type == F2FS_DIRTY_META || + count_type == 
F2FS_DIRTY_QDATA || + count_type == F2FS_DIRTY_IMETA) + set_sbi_flag(sbi, SBI_IS_DIRTY); } static inline void inode_inc_dirty_pages(struct inode *inode) @@ -2158,10 +2159,17 @@ static inline bool is_idle(struct f2fs_sb_info *sbi, int type) get_pages(sbi, F2FS_RD_META) || get_pages(sbi, F2FS_WB_DATA) || get_pages(sbi, F2FS_WB_CP_DATA) || get_pages(sbi, F2FS_DIO_READ) || - get_pages(sbi, F2FS_DIO_WRITE) || - atomic_read(&SM_I(sbi)->dcc_info->queued_discard) || - atomic_read(&SM_I(sbi)->fcc_info->queued_flush)) + get_pages(sbi, F2FS_DIO_WRITE)) return false; + + if (SM_I(sbi) && SM_I(sbi)->dcc_info && + atomic_read(&SM_I(sbi)->dcc_info->queued_discard)) + return false; + + if (SM_I(sbi) && SM_I(sbi)->fcc_info && + atomic_read(&SM_I(sbi)->fcc_info->queued_flush)) + return false; + return f2fs_time_over(sbi, type); } @@ -2302,11 +2310,12 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr) #define F2FS_EXTENTS_FL 0x00080000 /* Inode uses extents */ #define F2FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */ #define F2FS_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */ +#define F2FS_NOCOW_FL 0x00800000 /* Do not cow file */ #define F2FS_INLINE_DATA_FL 0x10000000 /* Inode has inline data. 
*/ #define F2FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */ #define F2FS_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ -#define F2FS_FL_USER_VISIBLE 0x304BDFFF /* User visible flags */ +#define F2FS_FL_USER_VISIBLE 0x30CBDFFF /* User visible flags */ #define F2FS_FL_USER_MODIFIABLE 0x204BC0FF /* User modifiable flags */ /* Flags we can manipulate with through F2FS_IOC_FSSETXATTR */ @@ -2763,9 +2772,9 @@ static inline int get_inline_xattr_addrs(struct inode *inode) #define F2FS_OLD_ATTRIBUTE_SIZE (offsetof(struct f2fs_inode, i_addr)) #define F2FS_FITS_IN_INODE(f2fs_inode, extra_isize, field) \ - ((offsetof(typeof(*f2fs_inode), field) + \ + ((offsetof(typeof(*(f2fs_inode)), field) + \ sizeof((f2fs_inode)->field)) \ - <= (F2FS_OLD_ATTRIBUTE_SIZE + extra_isize)) \ + <= (F2FS_OLD_ATTRIBUTE_SIZE + (extra_isize))) \ static inline void f2fs_reset_iostat(struct f2fs_sb_info *sbi) { @@ -2794,8 +2803,8 @@ static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi, #define __is_large_section(sbi) ((sbi)->segs_per_sec > 1) -#define __is_meta_io(fio) (PAGE_TYPE_OF_BIO(fio->type) == META && \ - (!is_read_io(fio->op) || fio->is_meta)) +#define __is_meta_io(fio) (PAGE_TYPE_OF_BIO((fio)->type) == META && \ + (!is_read_io((fio)->op) || (fio)->is_meta)) bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type); @@ -2827,13 +2836,33 @@ static inline bool is_valid_data_blkaddr(struct f2fs_sb_info *sbi, return true; } +static inline void f2fs_set_page_private(struct page *page, + unsigned long data) +{ + if (PagePrivate(page)) + return; + + get_page(page); + SetPagePrivate(page); + set_page_private(page, data); +} + +static inline void f2fs_clear_page_private(struct page *page) +{ + if (!PagePrivate(page)) + return; + + set_page_private(page, 0); + ClearPagePrivate(page); + f2fs_put_page(page, 0); +} + /* * file.c */ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync); void f2fs_truncate_data_blocks(struct 
dnode_of_data *dn); -int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock, - bool buf_write); +int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock); int f2fs_truncate(struct inode *inode); int f2fs_getattr(const struct path *path, struct kstat *stat, u32 request_mask, unsigned int flags); @@ -3007,7 +3036,7 @@ void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr); bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr); void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi); void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi); -bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi); +bool f2fs_issue_discard_timeout(struct f2fs_sb_info *sbi); void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc); void f2fs_dirty_to_prefree(struct f2fs_sb_info *sbi); @@ -3329,7 +3358,7 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi) int f2fs_build_stats(struct f2fs_sb_info *sbi); void f2fs_destroy_stats(struct f2fs_sb_info *sbi); -int __init f2fs_create_root_stats(void); +void __init f2fs_create_root_stats(void); void f2fs_destroy_root_stats(void); #else #define stat_inc_cp_count(si) do { } while (0) @@ -3367,7 +3396,7 @@ void f2fs_destroy_root_stats(void); static inline int f2fs_build_stats(struct f2fs_sb_info *sbi) { return 0; } static inline void f2fs_destroy_stats(struct f2fs_sb_info *sbi) { } -static inline int __init f2fs_create_root_stats(void) { return 0; } +static inline void __init f2fs_create_root_stats(void) { } static inline void f2fs_destroy_root_stats(void) { } #endif @@ -3628,8 +3657,6 @@ extern void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate, #define f2fs_build_fault_attr(sbi, rate, type) do { } while (0) #endif -#endif - static inline bool is_journalled_quota(struct f2fs_sb_info *sbi) { #ifdef CONFIG_QUOTA @@ -3642,3 +3669,5 @@ static inline bool is_journalled_quota(struct f2fs_sb_info *sbi) #endif return false; } + +#endif diff 
--git a/fs/f2fs/file.c b/fs/f2fs/file.c index dfd26475c582..69db0d1c34a0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -589,8 +589,7 @@ truncate_out: return 0; } -int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock, - bool buf_write) +int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct dnode_of_data dn; @@ -598,7 +597,6 @@ int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock, int count = 0, err = 0; struct page *ipage; bool truncate_page = false; - int flag = buf_write ? F2FS_GET_BLOCK_PRE_AIO : F2FS_GET_BLOCK_PRE_DIO; trace_f2fs_truncate_blocks_enter(inode, from); @@ -608,7 +606,7 @@ int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock, goto free_partial; if (lock) - __do_map_lock(sbi, flag, true); + f2fs_lock_op(sbi); ipage = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(ipage)) { @@ -646,7 +644,7 @@ free_next: err = f2fs_truncate_inode_blocks(inode, free_from); out: if (lock) - __do_map_lock(sbi, flag, false); + f2fs_unlock_op(sbi); free_partial: /* lastly zero out the first data page */ if (!err) @@ -681,7 +679,7 @@ int f2fs_truncate(struct inode *inode) return err; } - err = f2fs_truncate_blocks(inode, i_size_read(inode), true, false); + err = f2fs_truncate_blocks(inode, i_size_read(inode), true); if (err) return err; @@ -768,7 +766,6 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr) { struct inode *inode = d_inode(dentry); int err; - bool size_changed = false; if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) return -EIO; @@ -843,8 +840,6 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr) down_write(&F2FS_I(inode)->i_sem); F2FS_I(inode)->last_disk_size = i_size_read(inode); up_write(&F2FS_I(inode)->i_sem); - - size_changed = true; } __setattr_copy(inode, attr); @@ -858,7 +853,7 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr) } /* file size may changed here */ - f2fs_mark_inode_dirty_sync(inode, 
size_changed); + f2fs_mark_inode_dirty_sync(inode, true); /* inode change will produce dirty node pages flushed by checkpoint */ f2fs_balance_fs(F2FS_I_SB(inode), true); @@ -1262,7 +1257,7 @@ static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len) new_size = i_size_read(inode) - len; truncate_pagecache(inode, new_size); - ret = f2fs_truncate_blocks(inode, new_size, true, false); + ret = f2fs_truncate_blocks(inode, new_size, true); up_write(&F2FS_I(inode)->i_mmap_sem); if (!ret) f2fs_i_size_write(inode, new_size); @@ -1447,7 +1442,7 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) f2fs_balance_fs(sbi, true); down_write(&F2FS_I(inode)->i_mmap_sem); - ret = f2fs_truncate_blocks(inode, i_size_read(inode), true, false); + ret = f2fs_truncate_blocks(inode, i_size_read(inode), true); up_write(&F2FS_I(inode)->i_mmap_sem); if (ret) return ret; @@ -1651,6 +1646,8 @@ static int f2fs_ioc_getflags(struct file *filp, unsigned long arg) flags |= F2FS_ENCRYPT_FL; if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode)) flags |= F2FS_INLINE_DATA_FL; + if (is_inode_flag_set(inode, FI_PIN_FILE)) + flags |= F2FS_NOCOW_FL; flags &= F2FS_FL_USER_VISIBLE; @@ -1970,11 +1967,11 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg) break; case F2FS_GOING_DOWN_NEED_FSCK: set_sbi_flag(sbi, SBI_NEED_FSCK); + set_sbi_flag(sbi, SBI_CP_DISABLED_QUICK); + set_sbi_flag(sbi, SBI_IS_DIRTY); /* do checkpoint only */ ret = f2fs_sync_fs(sb, 1); - if (ret) - goto out; - break; + goto out; default: ret = -EINVAL; goto out; @@ -1990,6 +1987,9 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg) out: if (in != F2FS_GOING_DOWN_FULLSYNC) mnt_drop_write_file(filp); + + trace_f2fs_shutdown(sbi, in, ret); + return ret; } @@ -2873,8 +2873,8 @@ static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg) __u32 pin; int ret = 0; - if (!inode_owner_or_capable(inode)) - return -EACCES; + if 
(!capable(CAP_SYS_ADMIN)) + return -EPERM; if (get_user(pin, (__u32 __user *)arg)) return -EFAULT; diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index b2e4ac3ea20a..9c9734348aba 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -315,7 +315,7 @@ process_inline: clear_inode_flag(inode, FI_INLINE_DATA); f2fs_put_page(ipage, 1); } else if (ri && (ri->i_inline & F2FS_INLINE_DATA)) { - if (f2fs_truncate_blocks(inode, 0, false, false)) + if (f2fs_truncate_blocks(inode, 0, false)) return false; goto process_inline; } @@ -487,7 +487,7 @@ static int f2fs_add_inline_entries(struct inode *dir, void *inline_dentry) return 0; punch_dentry_pages: truncate_inode_pages(&dir->i_data, 0); - f2fs_truncate_blocks(dir, 0, false, false); + f2fs_truncate_blocks(dir, 0, false); f2fs_remove_dirty_inode(dir); return err; } @@ -676,6 +676,12 @@ int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx, if (IS_ERR(ipage)) return PTR_ERR(ipage); + /* + * f2fs_readdir was protected by inode.i_rwsem, it is safe to access + * ipage without page's lock held. + */ + unlock_page(ipage); + inline_dentry = inline_data_addr(inode, ipage); make_dentry_ptr_inline(inode, &d, inline_dentry); @@ -684,7 +690,7 @@ int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx, if (!err) ctx->pos = d.max; - f2fs_put_page(ipage, 1); + f2fs_put_page(ipage, 0); return err < 0 ? 
err : 0; } diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index e523dc4d6335..01660a1a9c91 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -14,6 +14,7 @@ #include "f2fs.h" #include "node.h" #include "segment.h" +#include "xattr.h" #include @@ -248,6 +249,20 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page) return false; } + if (f2fs_has_extra_attr(inode) && + f2fs_sb_has_flexible_inline_xattr(sbi) && + f2fs_has_inline_xattr(inode) && + (!fi->i_inline_xattr_size || + fi->i_inline_xattr_size > MAX_INLINE_XATTR_SIZE)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx) has corrupted " + "i_inline_xattr_size: %d, max: %zu", + __func__, inode->i_ino, fi->i_inline_xattr_size, + MAX_INLINE_XATTR_SIZE); + return false; + } + if (F2FS_I(inode)->extent_tree) { struct extent_info *ei = &F2FS_I(inode)->extent_tree->largest; diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index 31601e433709..9e15023ead7b 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -50,7 +51,7 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) inode->i_blocks = 0; inode->i_mtime = inode->i_atime = inode->i_ctime = F2FS_I(inode)->i_crtime = current_time(inode); - inode->i_generation = sbi->s_next_generation++; + inode->i_generation = prandom_u32(); if (S_ISDIR(inode->i_mode)) F2FS_I(inode)->i_current_depth = 1; diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 5352d75bc8e0..e39c1a269de6 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1920,7 +1920,9 @@ static int f2fs_write_node_pages(struct address_space *mapping, f2fs_balance_fs_bg(sbi); /* collect a number of dirty node pages and write together */ - if (get_pages(sbi, F2FS_DIRTY_NODES) < nr_pages_to_skip(sbi, NODE)) + if (wbc->sync_mode != WB_SYNC_ALL && + get_pages(sbi, F2FS_DIRTY_NODES) < + nr_pages_to_skip(sbi, NODE)) goto skip_write; if (wbc->sync_mode == 
WB_SYNC_ALL) @@ -1959,7 +1961,7 @@ static int f2fs_set_node_page_dirty(struct page *page) if (!PageDirty(page)) { __set_page_dirty_nobuffers(page); inc_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES); - SetPagePrivate(page); + f2fs_set_page_private(page, 0); f2fs_trace_pid(page); return 1; } diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 0d425d460d70..e8adfc85ebba 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -191,8 +191,7 @@ void f2fs_register_inmem_page(struct inode *inode, struct page *page) f2fs_trace_pid(page); - set_page_private(page, (unsigned long)ATOMIC_WRITTEN_PAGE); - SetPagePrivate(page); + f2fs_set_page_private(page, (unsigned long)ATOMIC_WRITTEN_PAGE); new = f2fs_kmem_cache_alloc(inmem_entry_slab, GFP_NOFS); @@ -215,7 +214,8 @@ void f2fs_register_inmem_page(struct inode *inode, struct page *page) } static int __revoke_inmem_pages(struct inode *inode, - struct list_head *head, bool drop, bool recover) + struct list_head *head, bool drop, bool recover, + bool trylock) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct inmem_pages *cur, *tmp; @@ -227,7 +227,16 @@ static int __revoke_inmem_pages(struct inode *inode, if (drop) trace_f2fs_commit_inmem_page(page, INMEM_DROP); - lock_page(page); + if (trylock) { + /* + * to avoid deadlock in between page lock and + * inmem_lock. 
+ */ + if (!trylock_page(page)) + continue; + } else { + lock_page(page); + } f2fs_wait_on_page_writeback(page, DATA, true, true); @@ -268,8 +277,7 @@ next: ClearPageUptodate(page); clear_cold_data(page); } - set_page_private(page, 0); - ClearPagePrivate(page); + f2fs_clear_page_private(page); f2fs_put_page(page, 1); list_del(&cur->list); @@ -316,13 +324,19 @@ void f2fs_drop_inmem_pages(struct inode *inode) struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_inode_info *fi = F2FS_I(inode); - mutex_lock(&fi->inmem_lock); - __revoke_inmem_pages(inode, &fi->inmem_pages, true, false); - spin_lock(&sbi->inode_lock[ATOMIC_FILE]); - if (!list_empty(&fi->inmem_ilist)) - list_del_init(&fi->inmem_ilist); - spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); - mutex_unlock(&fi->inmem_lock); + while (!list_empty(&fi->inmem_pages)) { + mutex_lock(&fi->inmem_lock); + __revoke_inmem_pages(inode, &fi->inmem_pages, + true, false, true); + + if (list_empty(&fi->inmem_pages)) { + spin_lock(&sbi->inode_lock[ATOMIC_FILE]); + if (!list_empty(&fi->inmem_ilist)) + list_del_init(&fi->inmem_ilist); + spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); + } + mutex_unlock(&fi->inmem_lock); + } clear_inode_flag(inode, FI_ATOMIC_FILE); fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0; @@ -352,8 +366,7 @@ void f2fs_drop_inmem_page(struct inode *inode, struct page *page) kmem_cache_free(inmem_entry_slab, cur); ClearPageUptodate(page); - set_page_private(page, 0); - ClearPagePrivate(page); + f2fs_clear_page_private(page); f2fs_put_page(page, 0); trace_f2fs_commit_inmem_page(page, INMEM_INVALIDATE); @@ -427,12 +440,15 @@ retry: * recovery or rewrite & commit last transaction. For other * error number, revoking was done by filesystem itself. 
*/ - err = __revoke_inmem_pages(inode, &revoke_list, false, true); + err = __revoke_inmem_pages(inode, &revoke_list, + false, true, false); /* drop all uncommitted pages */ - __revoke_inmem_pages(inode, &fi->inmem_pages, true, false); + __revoke_inmem_pages(inode, &fi->inmem_pages, + true, false, false); } else { - __revoke_inmem_pages(inode, &revoke_list, false, false); + __revoke_inmem_pages(inode, &revoke_list, + false, false, false); } return err; @@ -540,9 +556,13 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi) static int __submit_flush_wait(struct f2fs_sb_info *sbi, struct block_device *bdev) { - struct bio *bio = f2fs_bio_alloc(sbi, 0, true); + struct bio *bio; int ret; + bio = f2fs_bio_alloc(sbi, 0, false); + if (!bio) + return -ENOMEM; + bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH; bio_set_dev(bio, bdev); ret = submit_bio_wait(bio); @@ -866,6 +886,9 @@ int f2fs_disable_cp_again(struct f2fs_sb_info *sbi) if (holes[DATA] > ovp || holes[NODE] > ovp) return -EAGAIN; + if (is_sbi_flag_set(sbi, SBI_CP_DISABLED_QUICK) && + dirty_segments(sbi) > overprovision_segments(sbi)) + return -EAGAIN; return 0; } @@ -1035,6 +1058,7 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, dpolicy->max_requests = DEF_MAX_DISCARD_REQUEST; dpolicy->io_aware_gran = MAX_PLIST_NUM; + dpolicy->timeout = 0; if (discard_type == DPOLICY_BG) { dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME; @@ -1057,6 +1081,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, } else if (discard_type == DPOLICY_UMOUNT) { dpolicy->max_requests = UINT_MAX; dpolicy->io_aware = false; + /* we need to issue all to keep CP_TRIMMED_FLAG */ + dpolicy->granularity = 1; } } @@ -1422,7 +1448,14 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi, int i, issued = 0; bool io_interrupted = false; + if (dpolicy->timeout != 0) + f2fs_update_time(sbi, dpolicy->timeout); + for (i = MAX_PLIST_NUM - 1; i >= 0; i--) { + if (dpolicy->timeout != 0 && + f2fs_time_over(sbi, 
dpolicy->timeout)) + break; + if (i + 1 < dpolicy->granularity) break; @@ -1609,7 +1642,7 @@ void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi) } /* This comes from f2fs_put_super */ -bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi) +bool f2fs_issue_discard_timeout(struct f2fs_sb_info *sbi) { struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info; struct discard_policy dpolicy; @@ -1617,6 +1650,7 @@ bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi) __init_discard_policy(sbi, &dpolicy, DPOLICY_UMOUNT, dcc->discard_granularity); + dpolicy.timeout = UMOUNT_DISCARD_TIMEOUT; __issue_discard_cmd(sbi, &dpolicy); dropped = __drop_discard_cmd(sbi); @@ -3162,10 +3196,10 @@ int f2fs_inplace_write_data(struct f2fs_io_info *fio) stat_inc_inplace_blocks(fio->sbi); err = f2fs_submit_page_bio(fio); - if (!err) + if (!err) { update_device_state(fio); - - f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE); + f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE); + } return err; } diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index a77f76f528b6..5c7ed0442d6e 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -865,7 +865,7 @@ static inline void wake_up_discard_thread(struct f2fs_sb_info *sbi, bool force) } } mutex_unlock(&dcc->cmd_lock); - if (!wakeup) + if (!wakeup || !is_idle(sbi, DISCARD_TIME)) return; wake_up: dcc->discard_wake = 1; diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 2fdc98457b9f..2284e116dffa 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -269,7 +269,7 @@ static int f2fs_set_qf_name(struct super_block *sb, int qtype, if (!qname) { f2fs_msg(sb, KERN_ERR, "Not enough memory for storing quotafile name"); - return -EINVAL; + return -ENOMEM; } if (F2FS_OPTION(sbi).s_qf_names[qtype]) { if (strcmp(F2FS_OPTION(sbi).s_qf_names[qtype], qname) == 0) @@ -586,7 +586,7 @@ static int parse_options(struct super_block *sb, char *options) case Opt_io_size_bits: if (args->from && match_int(args, &arg)) return -EINVAL; - if (arg > 
__ilog2_u32(BIO_MAX_PAGES)) { + if (arg <= 0 || arg > __ilog2_u32(BIO_MAX_PAGES)) { f2fs_msg(sb, KERN_WARNING, "Not support %d, larger than %d", 1 << arg, BIO_MAX_PAGES); @@ -821,6 +821,8 @@ static int parse_options(struct super_block *sb, char *options) } if (test_opt(sbi, INLINE_XATTR_SIZE)) { + int min_size, max_size; + if (!f2fs_sb_has_extra_attr(sbi) || !f2fs_sb_has_flexible_inline_xattr(sbi)) { f2fs_msg(sb, KERN_ERR, @@ -834,14 +836,15 @@ static int parse_options(struct super_block *sb, char *options) "set with inline_xattr option"); return -EINVAL; } - if (!F2FS_OPTION(sbi).inline_xattr_size || - F2FS_OPTION(sbi).inline_xattr_size >= - DEF_ADDRS_PER_INODE - - F2FS_TOTAL_EXTRA_ATTR_SIZE - - DEF_INLINE_RESERVED_SIZE - - DEF_MIN_INLINE_SIZE) { + + min_size = sizeof(struct f2fs_xattr_header) / sizeof(__le32); + max_size = MAX_INLINE_XATTR_SIZE; + + if (F2FS_OPTION(sbi).inline_xattr_size < min_size || + F2FS_OPTION(sbi).inline_xattr_size > max_size) { f2fs_msg(sb, KERN_ERR, - "inline xattr size is out of range"); + "inline xattr size is out of range: %d ~ %d", + min_size, max_size); return -EINVAL; } } @@ -915,6 +918,10 @@ static int f2fs_drop_inode(struct inode *inode) sb_start_intwrite(inode->i_sb); f2fs_i_size_write(inode, 0); + f2fs_submit_merged_write_cond(F2FS_I_SB(inode), + inode, NULL, 0, DATA); + truncate_inode_pages_final(inode->i_mapping); + if (F2FS_HAS_BLOCKS(inode)) f2fs_truncate(inode); @@ -1066,7 +1073,7 @@ static void f2fs_put_super(struct super_block *sb) } /* be sure to wait for any on-going discard commands */ - dropped = f2fs_wait_discard_bios(sbi); + dropped = f2fs_issue_discard_timeout(sbi); if ((f2fs_hw_support_discard(sbi) || f2fs_hw_should_discard(sbi)) && !sbi->discard_blks && !dropped) { @@ -1476,9 +1483,16 @@ static int f2fs_enable_quotas(struct super_block *sb); static int f2fs_disable_checkpoint(struct f2fs_sb_info *sbi) { + unsigned int s_flags = sbi->sb->s_flags; struct cp_control cpc; - int err; + int err = 0; + int ret; + if 
(s_flags & SB_RDONLY) { + f2fs_msg(sbi->sb, KERN_ERR, + "checkpoint=disable on readonly fs"); + return -EINVAL; + } sbi->sb->s_flags |= SB_ACTIVE; f2fs_update_time(sbi, DISABLE_TIME); @@ -1486,18 +1500,24 @@ static int f2fs_disable_checkpoint(struct f2fs_sb_info *sbi) while (!f2fs_time_over(sbi, DISABLE_TIME)) { mutex_lock(&sbi->gc_mutex); err = f2fs_gc(sbi, true, false, NULL_SEGNO); - if (err == -ENODATA) + if (err == -ENODATA) { + err = 0; break; + } if (err && err != -EAGAIN) - return err; + break; } - err = sync_filesystem(sbi->sb); - if (err) - return err; + ret = sync_filesystem(sbi->sb); + if (ret || err) { + err = ret ? ret: err; + goto restore_flag; + } - if (f2fs_disable_cp_again(sbi)) - return -EAGAIN; + if (f2fs_disable_cp_again(sbi)) { + err = -EAGAIN; + goto restore_flag; + } mutex_lock(&sbi->gc_mutex); cpc.reason = CP_PAUSE; @@ -1506,7 +1526,9 @@ static int f2fs_disable_checkpoint(struct f2fs_sb_info *sbi) sbi->unusable_block_count = 0; mutex_unlock(&sbi->gc_mutex); - return 0; +restore_flag: + sbi->sb->s_flags = s_flags; /* Restore MS_RDONLY status */ + return err; } static void f2fs_enable_checkpoint(struct f2fs_sb_info *sbi) @@ -2044,6 +2066,12 @@ void f2fs_quota_off_umount(struct super_block *sb) set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR); } } + /* + * In case of checkpoint=disable, we must flush quota blocks. + * This can cause NULL exception for node_inode in end_io, since + * put_super already dropped it. 
+ */ + sync_filesystem(sb); } static void f2fs_truncate_quota_inode_pages(struct super_block *sb) @@ -2725,6 +2753,8 @@ static void init_sb_info(struct f2fs_sb_info *sbi) sbi->interval_time[DISCARD_TIME] = DEF_IDLE_INTERVAL; sbi->interval_time[GC_TIME] = DEF_IDLE_INTERVAL; sbi->interval_time[DISABLE_TIME] = DEF_DISABLE_INTERVAL; + sbi->interval_time[UMOUNT_DISCARD_TIMEOUT] = + DEF_UMOUNT_DISCARD_TIMEOUT; clear_sbi_flag(sbi, SBI_NEED_FSCK); for (i = 0; i < NR_COUNT_TYPE; i++) @@ -3044,10 +3074,11 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) struct f2fs_super_block *raw_super; struct inode *root; int err; - bool retry = true, need_fsck = false; + bool skip_recovery = false, need_fsck = false; char *options = NULL; int recovery, i, valid_super_block; struct curseg_info *seg_i; + int retry_cnt = 1; try_onemore: err = -EINVAL; @@ -3119,7 +3150,6 @@ try_onemore: sb->s_maxbytes = sbi->max_file_blocks << le32_to_cpu(raw_super->log_blocksize); sb->s_max_links = F2FS_LINK_MAX; - get_random_bytes(&sbi->s_next_generation, sizeof(u32)); #ifdef CONFIG_QUOTA sb->dq_op = &f2fs_quota_operations; @@ -3222,6 +3252,10 @@ try_onemore: if (__is_set_ckpt_flags(F2FS_CKPT(sbi), CP_QUOTA_NEED_FSCK_FLAG)) set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR); + if (__is_set_ckpt_flags(F2FS_CKPT(sbi), CP_DISABLED_QUICK_FLAG)) { + set_sbi_flag(sbi, SBI_CP_DISABLED_QUICK); + sbi->interval_time[DISABLE_TIME] = DEF_DISABLE_QUICK_INTERVAL; + } /* Initialize device list */ err = f2fs_scan_devices(sbi); @@ -3309,7 +3343,7 @@ try_onemore: sb->s_root = d_make_root(root); /* allocate root dentry */ if (!sb->s_root) { err = -ENOMEM; - goto free_root_inode; + goto free_node_inode; } err = f2fs_register_sysfs(sbi); @@ -3331,7 +3365,7 @@ try_onemore: goto free_meta; if (unlikely(is_set_ckpt_flags(sbi, CP_DISABLED_FLAG))) - goto skip_recovery; + goto reset_checkpoint; /* recover fsynced data */ if (!test_opt(sbi, DISABLE_ROLL_FORWARD)) { @@ -3348,11 +3382,13 @@ try_onemore: if 
(need_fsck) set_sbi_flag(sbi, SBI_NEED_FSCK); - if (!retry) - goto skip_recovery; + if (skip_recovery) + goto reset_checkpoint; err = f2fs_recover_fsync_data(sbi, false); if (err < 0) { + if (err != -ENOMEM) + skip_recovery = true; need_fsck = true; f2fs_msg(sb, KERN_ERR, "Cannot recover all fsync data errno=%d", err); @@ -3368,14 +3404,14 @@ try_onemore: goto free_meta; } } -skip_recovery: +reset_checkpoint: /* f2fs_recover_fsync_data() cleared this already */ clear_sbi_flag(sbi, SBI_POR_DOING); if (test_opt(sbi, DISABLE_CHECKPOINT)) { err = f2fs_disable_checkpoint(sbi); if (err) - goto free_meta; + goto sync_free_meta; } else if (is_set_ckpt_flags(sbi, CP_DISABLED_FLAG)) { f2fs_enable_checkpoint(sbi); } @@ -3388,7 +3424,7 @@ skip_recovery: /* After POR, we can run background GC thread.*/ err = f2fs_start_gc_thread(sbi); if (err) - goto free_meta; + goto sync_free_meta; } kvfree(options); @@ -3408,8 +3444,14 @@ skip_recovery: cur_cp_version(F2FS_CKPT(sbi))); f2fs_update_time(sbi, CP_TIME); f2fs_update_time(sbi, REQ_TIME); + clear_sbi_flag(sbi, SBI_CP_DISABLED_QUICK); return 0; +sync_free_meta: + /* safe to flush all the data */ + sync_filesystem(sbi->sb); + retry_cnt = 0; + free_meta: #ifdef CONFIG_QUOTA f2fs_truncate_quota_inode_pages(sb); @@ -3423,6 +3465,8 @@ free_meta: * falls into an infinite loop in f2fs_sync_meta_pages(). 
*/ truncate_inode_pages_final(META_MAPPING(sbi)); + /* evict some inodes being cached by GC */ + evict_inodes(sb); f2fs_unregister_sysfs(sbi); free_root_inode: dput(sb->s_root); @@ -3466,8 +3510,8 @@ free_sbi: kvfree(sbi); /* give only one another chance */ - if (retry) { - retry = false; + if (retry_cnt > 0 && skip_recovery) { + retry_cnt--; shrink_dcache_sb(sb); goto try_onemore; } @@ -3568,9 +3612,7 @@ static int __init init_f2fs_fs(void) err = register_filesystem(&f2fs_fs_type); if (err) goto free_shrinker; - err = f2fs_create_root_stats(); - if (err) - goto free_filesystem; + f2fs_create_root_stats(); err = f2fs_init_post_read_processing(); if (err) goto free_root_stats; @@ -3578,7 +3620,6 @@ static int __init init_f2fs_fs(void) free_root_stats: f2fs_destroy_root_stats(); -free_filesystem: unregister_filesystem(&f2fs_fs_type); free_shrinker: unregister_shrinker(&f2fs_shrinker_info); diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index 948c1a211341..d7b47662a0aa 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -222,6 +222,8 @@ out: #ifdef CONFIG_F2FS_FAULT_INJECTION if (a->struct_type == FAULT_INFO_TYPE && t >= (1 << FAULT_MAX)) return -EINVAL; + if (a->struct_type == FAULT_INFO_RATE && t >= UINT_MAX) + return -EINVAL; #endif if (a->struct_type == RESERVED_BLOCKS) { spin_lock(&sbi->stat_lock); @@ -278,10 +280,16 @@ out: return count; } - *ui = t; - if (!strcmp(a->attr.name, "iostat_enable") && *ui == 0) - f2fs_reset_iostat(sbi); + if (!strcmp(a->attr.name, "iostat_enable")) { + sbi->iostat_enable = !!t; + if (!sbi->iostat_enable) + f2fs_reset_iostat(sbi); + return count; + } + + *ui = (unsigned int)t; + return count; } @@ -418,6 +426,8 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, idle_interval, interval_time[REQ_TIME]); F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, discard_idle_interval, interval_time[DISCARD_TIME]); F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_idle_interval, interval_time[GC_TIME]); +F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, + umount_discard_timeout, 
interval_time[UMOUNT_DISCARD_TIMEOUT]); F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_enable, iostat_enable); F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra); F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold); @@ -475,6 +485,7 @@ static struct attribute *f2fs_attrs[] = { ATTR_LIST(idle_interval), ATTR_LIST(discard_idle_interval), ATTR_LIST(gc_idle_interval), + ATTR_LIST(umount_discard_timeout), ATTR_LIST(iostat_enable), ATTR_LIST(readdir_ra), ATTR_LIST(gc_pin_file_thresh), diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c index 18d5ffbc5e8c..848a785abe25 100644 --- a/fs/f2fs/xattr.c +++ b/fs/f2fs/xattr.c @@ -224,11 +224,11 @@ static struct f2fs_xattr_entry *__find_inline_xattr(struct inode *inode, { struct f2fs_xattr_entry *entry; unsigned int inline_size = inline_xattr_size(inode); + void *max_addr = base_addr + inline_size; list_for_each_xattr(entry, base_addr) { - if ((void *)entry + sizeof(__u32) > base_addr + inline_size || - (void *)XATTR_NEXT_ENTRY(entry) + sizeof(__u32) > - base_addr + inline_size) { + if ((void *)entry + sizeof(__u32) > max_addr || + (void *)XATTR_NEXT_ENTRY(entry) > max_addr) { *last_addr = entry; return NULL; } @@ -239,6 +239,13 @@ static struct f2fs_xattr_entry *__find_inline_xattr(struct inode *inode, if (!memcmp(entry->e_name, name, len)) break; } + + /* inline xattr header or entry across max inline xattr size */ + if (IS_XATTR_LAST_ENTRY(entry) && + (void *)entry + sizeof(__u32) > max_addr) { + *last_addr = entry; + return NULL; + } return entry; } @@ -340,7 +347,7 @@ check: *base_addr = txattr_addr; return 0; out: - kzfree(txattr_addr); + kvfree(txattr_addr); return err; } @@ -383,7 +390,7 @@ static int read_all_xattrs(struct inode *inode, struct page *ipage, *base_addr = txattr_addr; return 0; fail: - kzfree(txattr_addr); + kvfree(txattr_addr); return err; } @@ -510,7 +517,7 @@ int f2fs_getxattr(struct inode *inode, int index, const char *name, } error = size; out: - kzfree(base_addr); + 
kvfree(base_addr); return error; } @@ -538,7 +545,7 @@ ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size) if (!handler || (handler->list && !handler->list(dentry))) continue; - prefix = handler->prefix ?: handler->name; + prefix = xattr_prefix(handler); prefix_len = strlen(prefix); size = prefix_len + entry->e_name_len + 1; if (buffer) { @@ -556,7 +563,7 @@ ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size) } error = buffer_size - rest; cleanup: - kzfree(base_addr); + kvfree(base_addr); return error; } @@ -687,7 +694,7 @@ static int __f2fs_setxattr(struct inode *inode, int index, if (!error && S_ISDIR(inode->i_mode)) set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_CP); exit: - kzfree(base_addr); + kvfree(base_addr); return error; } diff --git a/fs/f2fs/xattr.h b/fs/f2fs/xattr.h index 67db134da0f5..9172ee082ca8 100644 --- a/fs/f2fs/xattr.h +++ b/fs/f2fs/xattr.h @@ -78,6 +78,12 @@ struct f2fs_xattr_entry { sizeof(struct f2fs_xattr_header) - \ sizeof(struct f2fs_xattr_entry)) +#define MAX_INLINE_XATTR_SIZE \ + (DEF_ADDRS_PER_INODE - \ + F2FS_TOTAL_EXTRA_ATTR_SIZE / sizeof(__le32) - \ + DEF_INLINE_RESERVED_SIZE - \ + MIN_INLINE_DENTRY_SIZE / sizeof(__le32)) + /* * On-disk structure of f2fs_xattr * We use inline xattrs space + 1 block for xattr. diff --git a/fs/udf/truncate.c b/fs/udf/truncate.c index 42b8c57795cb..c6ce7503a329 100644 --- a/fs/udf/truncate.c +++ b/fs/udf/truncate.c @@ -260,6 +260,9 @@ void udf_truncate_extents(struct inode *inode) epos.block = eloc; epos.bh = udf_tread(sb, udf_get_lb_pblock(sb, &eloc, 0)); + /* Error reading indirect block? 
*/ + if (!epos.bh) + return; if (elen) indirect_ext_len = (elen + sb->s_blocksize - 1) >> diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h index d3b04f9589a9..c311cd13ea7d 100644 --- a/include/linux/ceph/libceph.h +++ b/include/linux/ceph/libceph.h @@ -291,6 +291,8 @@ extern void ceph_destroy_client(struct ceph_client *client); extern int __ceph_open_session(struct ceph_client *client, unsigned long started); extern int ceph_open_session(struct ceph_client *client); +int ceph_wait_for_latest_osdmap(struct ceph_client *client, + unsigned long timeout); /* pagevec.c */ extern void ceph_release_page_vector(struct page **pages, int num_pages); diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h index d7711048ef93..f5740423b002 100644 --- a/include/linux/f2fs_fs.h +++ b/include/linux/f2fs_fs.h @@ -116,6 +116,7 @@ struct f2fs_super_block { /* * For checkpoint */ +#define CP_DISABLED_QUICK_FLAG 0x00002000 #define CP_DISABLED_FLAG 0x00001000 #define CP_QUOTA_NEED_FSCK_FLAG 0x00000800 #define CP_LARGE_NAT_BITMAP_FLAG 0x00000400 @@ -186,7 +187,7 @@ struct f2fs_orphan_block { struct f2fs_extent { __le32 fofs; /* start file offset of the extent */ __le32 blk; /* start block address of the extent */ - __le32 len; /* lengh of the extent */ + __le32 len; /* length of the extent */ } __packed; #define F2FS_NAME_LEN 255 @@ -284,7 +285,7 @@ enum { struct node_footer { __le32 nid; /* node id */ - __le32 ino; /* inode nunmber */ + __le32 ino; /* inode number */ __le32 flag; /* include cold/fsync/dentry marks and offset */ __le64 cp_ver; /* checkpoint version */ __le32 next_blkaddr; /* next node page block address */ @@ -489,12 +490,12 @@ typedef __le32 f2fs_hash_t; /* * space utilization of regular dentry and inline dentry (w/o extra reservation) - * regular dentry inline dentry - * bitmap 1 * 27 = 27 1 * 23 = 23 - * reserved 1 * 3 = 3 1 * 7 = 7 - * dentry 11 * 214 = 2354 11 * 182 = 2002 - * filename 8 * 214 = 1712 8 * 182 = 1456 - * total 4096 3488 
+ * regular dentry inline dentry (def) inline dentry (min) + * bitmap 1 * 27 = 27 1 * 23 = 23 1 * 1 = 1 + * reserved 1 * 3 = 3 1 * 7 = 7 1 * 1 = 1 + * dentry 11 * 214 = 2354 11 * 182 = 2002 11 * 2 = 22 + * filename 8 * 214 = 1712 8 * 182 = 1456 8 * 2 = 16 + * total 4096 3488 40 * * Note: there are more reserved space in inline dentry than in regular * dentry, when converting inline dentry we should handle this carefully. @@ -506,12 +507,13 @@ typedef __le32 f2fs_hash_t; #define SIZE_OF_RESERVED (PAGE_SIZE - ((SIZE_OF_DIR_ENTRY + \ F2FS_SLOT_LEN) * \ NR_DENTRY_IN_BLOCK + SIZE_OF_DENTRY_BITMAP)) +#define MIN_INLINE_DENTRY_SIZE 40 /* just include '.' and '..' entries */ /* One directory entry slot representing F2FS_SLOT_LEN-sized file name */ struct f2fs_dir_entry { __le32 hash_code; /* hash code of file name */ __le32 ino; /* inode number */ - __le16 name_len; /* lengh of file name */ + __le16 name_len; /* length of file name */ __u8 file_type; /* file type */ } __packed; diff --git a/include/linux/keychord.h b/include/linux/keychord.h deleted file mode 100644 index 08cf5402102c..000000000000 --- a/include/linux/keychord.h +++ /dev/null @@ -1,23 +0,0 @@ -/* - * Key chord input driver - * - * Copyright (C) 2008 Google, Inc. - * Author: Mike Lockwood - * - * This software is licensed under the terms of the GNU General Public - * License version 2, as published by the Free Software Foundation, and - * may be copied, distributed, and modified under those terms. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * -*/ - -#ifndef __LINUX_KEYCHORD_H_ -#define __LINUX_KEYCHORD_H_ - -#include - -#endif /* __LINUX_KEYCHORD_H_ */ diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h index 06c87f9f720c..2ce6af3235fb 100644 --- a/include/trace/events/f2fs.h +++ b/include/trace/events/f2fs.h @@ -150,6 +150,17 @@ TRACE_DEFINE_ENUM(CP_TRIMMED); { CP_SPEC_LOG_NUM, "log type is 2" }, \ { CP_RECOVER_DIR, "dir needs recovery" }) +#define show_shutdown_mode(type) \ + __print_symbolic(type, \ + { F2FS_GOING_DOWN_FULLSYNC, "full sync" }, \ + { F2FS_GOING_DOWN_METASYNC, "meta sync" }, \ + { F2FS_GOING_DOWN_NOSYNC, "no sync" }, \ + { F2FS_GOING_DOWN_METAFLUSH, "meta flush" }, \ + { F2FS_GOING_DOWN_NEED_FSCK, "need fsck" }) + +struct f2fs_sb_info; +struct f2fs_io_info; +struct extent_info; struct victim_sel_policy; struct f2fs_map_blocks; @@ -534,6 +545,9 @@ TRACE_EVENT(f2fs_map_blocks, __field(block_t, m_lblk) __field(block_t, m_pblk) __field(unsigned int, m_len) + __field(unsigned int, m_flags) + __field(int, m_seg_type) + __field(bool, m_may_create) __field(int, ret) ), @@ -543,15 +557,22 @@ TRACE_EVENT(f2fs_map_blocks, __entry->m_lblk = map->m_lblk; __entry->m_pblk = map->m_pblk; __entry->m_len = map->m_len; + __entry->m_flags = map->m_flags; + __entry->m_seg_type = map->m_seg_type; + __entry->m_may_create = map->m_may_create; __entry->ret = ret; ), TP_printk("dev = (%d,%d), ino = %lu, file offset = %llu, " - "start blkaddr = 0x%llx, len = 0x%llx, err = %d", + "start blkaddr = 0x%llx, len = 0x%llx, flags = %u," + "seg_type = %d, may_create = %d, err = %d", show_dev_ino(__entry), (unsigned long long)__entry->m_lblk, (unsigned long long)__entry->m_pblk, (unsigned long long)__entry->m_len, + __entry->m_flags, + __entry->m_seg_type, + __entry->m_may_create, __entry->ret) ); @@ -1617,6 +1638,30 @@ DEFINE_EVENT(f2fs_sync_dirty_inodes, f2fs_sync_dirty_inodes_exit, TP_ARGS(sb, type, count) ); +TRACE_EVENT(f2fs_shutdown, + + TP_PROTO(struct f2fs_sb_info *sbi, unsigned int mode, 
int ret), + + TP_ARGS(sbi, mode, ret), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, mode) + __field(int, ret) + ), + + TP_fast_assign( + __entry->dev = sbi->sb->s_dev; + __entry->mode = mode; + __entry->ret = ret; + ), + + TP_printk("dev = (%d,%d), mode: %s, ret:%d", + show_dev(__entry->dev), + show_shutdown_mode(__entry->mode), + __entry->ret) +); + #endif /* _TRACE_F2FS_H */ /* This part must be outside protection */ diff --git a/include/uapi/linux/keychord.h b/include/uapi/linux/keychord.h deleted file mode 100644 index ea7cf4d27bbd..000000000000 --- a/include/uapi/linux/keychord.h +++ /dev/null @@ -1,52 +0,0 @@ -/* - * Key chord input driver - * - * Copyright (C) 2008 Google, Inc. - * Author: Mike Lockwood - * - * This software is licensed under the terms of the GNU General Public - * License version 2, as published by the Free Software Foundation, and - * may be copied, distributed, and modified under those terms. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * -*/ - -#ifndef _UAPI_LINUX_KEYCHORD_H_ -#define _UAPI_LINUX_KEYCHORD_H_ - -#include - -#define KEYCHORD_VERSION 1 - -/* - * One or more input_keychord structs are written to /dev/keychord - * at once to specify the list of keychords to monitor. - * Reading /dev/keychord returns the id of a keychord when the - * keychord combination is pressed. A keychord is signalled when - * all of the keys in the keycode list are in the pressed state. - * The order in which the keys are pressed does not matter. - * The keychord will not be signalled if keys not in the keycode - * list are pressed. - * Keychords will not be signalled on key release events. 
- */ -struct input_keychord { - /* should be KEYCHORD_VERSION */ - __u16 version; - /* - * client specified ID, returned from read() - * when this keychord is pressed. - */ - __u16 id; - - /* number of keycodes in this keychord */ - __u16 count; - - /* variable length array of keycodes */ - __u16 keycodes[]; -}; - -#endif /* _UAPI_LINUX_KEYCHORD_H_ */ diff --git a/kernel/futex.c b/kernel/futex.c index 22f83064abb3..f2fa48c6c476 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -3450,6 +3450,10 @@ int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi) { u32 uval, uninitialized_var(nval), mval; + /* Futex address must be 32bit aligned */ + if ((((unsigned long)uaddr) % sizeof(*uaddr)) != 0) + return -1; + retry: if (get_user(uval, uaddr)) return -1; diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index bf694c709b96..e57be7031cb3 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -3650,6 +3650,9 @@ __lock_set_class(struct lockdep_map *lock, const char *name, unsigned int depth; int i; + if (unlikely(!debug_locks)) + return 0; + depth = curr->lockdep_depth; /* * This function is about (re)setting the class of a held lock, diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index acb7d6455ea5..7b1136cf88ba 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -975,10 +975,9 @@ fail: stop_kthread: sugov_kthread_stop(sg_policy); - -free_sg_policy: mutex_unlock(&global_tunables_lock); +free_sg_policy: sugov_policy_free(sg_policy); disable_fast_switch: diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index be043027d554..e33470b5a032 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -611,7 +611,7 @@ static int probes_seq_show(struct seq_file *m, void *v) /* Don't print "0x (null)" when offset is 0 */ if (tu->offset) { - seq_printf(m, "0x%p", (void *)tu->offset); + seq_printf(m, "0x%px", (void 
*)tu->offset); } else { switch (sizeof(void *)) { case 4: diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c index db0b5aa071fc..036c96781ea8 100644 --- a/lib/int_sqrt.c +++ b/lib/int_sqrt.c @@ -23,6 +23,9 @@ unsigned long int_sqrt(unsigned long x) return x; m = 1UL << (BITS_PER_LONG - 2); + while (m > x) + m >>= 2; + while (m != 0) { b = y + m; y >>= 1; diff --git a/lib/test_printf.c b/lib/test_printf.c index 563f10e6876a..71ebfa43ad05 100644 --- a/lib/test_printf.c +++ b/lib/test_printf.c @@ -24,24 +24,6 @@ #define PAD_SIZE 16 #define FILL_CHAR '$' -#define PTR1 ((void*)0x01234567) -#define PTR2 ((void*)(long)(int)0xfedcba98) - -#if BITS_PER_LONG == 64 -#define PTR1_ZEROES "000000000" -#define PTR1_SPACES " " -#define PTR1_STR "1234567" -#define PTR2_STR "fffffffffedcba98" -#define PTR_WIDTH 16 -#else -#define PTR1_ZEROES "0" -#define PTR1_SPACES " " -#define PTR1_STR "1234567" -#define PTR2_STR "fedcba98" -#define PTR_WIDTH 8 -#endif -#define PTR_WIDTH_STR stringify(PTR_WIDTH) - static unsigned total_tests __initdata; static unsigned failed_tests __initdata; static char *test_buffer __initdata; @@ -217,30 +199,79 @@ test_string(void) test("a | | ", "%-3.s|%-3.0s|%-3.*s", "a", "b", 0, "c"); } +#define PLAIN_BUF_SIZE 64 /* leave some space so we don't oops */ + +#if BITS_PER_LONG == 64 + +#define PTR_WIDTH 16 +#define PTR ((void *)0xffff0123456789ab) +#define PTR_STR "ffff0123456789ab" +#define ZEROS "00000000" /* hex 32 zero bits */ + +static int __init +plain_format(void) +{ + char buf[PLAIN_BUF_SIZE]; + int nchars; + + nchars = snprintf(buf, PLAIN_BUF_SIZE, "%p", PTR); + + if (nchars != PTR_WIDTH || strncmp(buf, ZEROS, strlen(ZEROS)) != 0) + return -1; + + return 0; +} + +#else + +#define PTR_WIDTH 8 +#define PTR ((void *)0x456789ab) +#define PTR_STR "456789ab" + +static int __init +plain_format(void) +{ + /* Format is implicitly tested for 32 bit machines by plain_hash() */ + return 0; +} + +#endif /* BITS_PER_LONG == 64 */ + +static int __init +plain_hash(void) +{ 
+ char buf[PLAIN_BUF_SIZE]; + int nchars; + + nchars = snprintf(buf, PLAIN_BUF_SIZE, "%p", PTR); + + if (nchars != PTR_WIDTH || strncmp(buf, PTR_STR, PTR_WIDTH) == 0) + return -1; + + return 0; +} + +/* + * We can't use test() to test %p because we don't know what output to expect + * after an address is hashed. + */ static void __init plain(void) { - test(PTR1_ZEROES PTR1_STR " " PTR2_STR, "%p %p", PTR1, PTR2); - /* - * The field width is overloaded for some %p extensions to - * pass another piece of information. For plain pointers, the - * behaviour is slightly odd: One cannot pass either the 0 - * flag nor a precision to %p without gcc complaining, and if - * one explicitly gives a field width, the number is no longer - * zero-padded. - */ - test("|" PTR1_STR PTR1_SPACES " | " PTR1_SPACES PTR1_STR "|", - "|%-*p|%*p|", PTR_WIDTH+2, PTR1, PTR_WIDTH+2, PTR1); - test("|" PTR2_STR " | " PTR2_STR "|", - "|%-*p|%*p|", PTR_WIDTH+2, PTR2, PTR_WIDTH+2, PTR2); + int err; - /* - * Unrecognized %p extensions are treated as plain %p, but the - * alphanumeric suffix is ignored (that is, does not occur in - * the output.) - */ - test("|"PTR1_ZEROES PTR1_STR"|", "|%p0y|", PTR1); - test("|"PTR2_STR"|", "|%p0y|", PTR2); + err = plain_hash(); + if (err) { + pr_warn("plain 'p' does not appear to be hashed\n"); + failed_tests++; + return; + } + + err = plain_format(); + if (err) { + pr_warn("hashing plain 'p' has unexpected format\n"); + failed_tests++; + } } static void __init @@ -251,6 +282,7 @@ symbol_ptr(void) static void __init kernel_ptr(void) { + /* We can't test this without access to kptr_restrict. 
*/ } static void __init diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 4a990f3fd345..ac1f232152f0 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -33,6 +33,8 @@ #include #include #include +#include +#include #ifdef CONFIG_BLOCK #include #endif @@ -1343,6 +1345,59 @@ char *uuid_string(char *buf, char *end, const u8 *addr, return string(buf, end, uuid, spec); } +int kptr_restrict __read_mostly; + +static noinline_for_stack +char *restricted_pointer(char *buf, char *end, const void *ptr, + struct printf_spec spec) +{ + spec.base = 16; + spec.flags |= SMALL; + if (spec.field_width == -1) { + spec.field_width = 2 * sizeof(ptr); + spec.flags |= ZEROPAD; + } + + switch (kptr_restrict) { + case 0: + /* Always print %pK values */ + break; + case 1: { + const struct cred *cred; + + /* + * kptr_restrict==1 cannot be used in IRQ context + * because its test for CAP_SYSLOG would be meaningless. + */ + if (in_irq() || in_serving_softirq() || in_nmi()) + return string(buf, end, "pK-error", spec); + + /* + * Only print the real pointer value if the current + * process has CAP_SYSLOG and is running with the + * same credentials it started with. This is because + * access to files is checked at open() time, but %pK + * checks permission at read() time. We don't want to + * leak pointer values if a binary opens a file using + * %pK and then elevates privileges before reading it. 
+ */ + cred = current_cred(); + if (!has_capability_noaudit(current, CAP_SYSLOG) || + !uid_eq(cred->euid, cred->uid) || + !gid_eq(cred->egid, cred->gid)) + ptr = NULL; + break; + } + case 2: + default: + /* Always print 0's for %pK */ + ptr = NULL; + break; + } + + return number(buf, end, (unsigned long)ptr, spec); +} + static noinline_for_stack char *netdev_bits(char *buf, char *end, const void *addr, const char *fmt) { @@ -1588,7 +1643,86 @@ char *device_node_string(char *buf, char *end, struct device_node *dn, return widen_string(buf, buf - buf_start, end, spec); } -int kptr_restrict __read_mostly; +static noinline_for_stack +char *pointer_string(char *buf, char *end, const void *ptr, + struct printf_spec spec) +{ + spec.base = 16; + spec.flags |= SMALL; + if (spec.field_width == -1) { + spec.field_width = 2 * sizeof(ptr); + spec.flags |= ZEROPAD; + } + + return number(buf, end, (unsigned long int)ptr, spec); +} + +static bool have_filled_random_ptr_key __read_mostly; +static siphash_key_t ptr_key __read_mostly; + +static void fill_random_ptr_key(struct random_ready_callback *unused) +{ + get_random_bytes(&ptr_key, sizeof(ptr_key)); + /* + * have_filled_random_ptr_key==true is dependent on get_random_bytes(). + * ptr_to_id() needs to see have_filled_random_ptr_key==true + * after get_random_bytes() returns. + */ + smp_mb(); + WRITE_ONCE(have_filled_random_ptr_key, true); +} + +static struct random_ready_callback random_ready = { + .func = fill_random_ptr_key +}; + +static int __init initialize_ptr_random(void) +{ + int ret = add_random_ready_callback(&random_ready); + + if (!ret) { + return 0; + } else if (ret == -EALREADY) { + fill_random_ptr_key(&random_ready); + return 0; + } + + return ret; +} +early_initcall(initialize_ptr_random); + +/* Maps a pointer to a 32 bit unique identifier. 
*/ +static char *ptr_to_id(char *buf, char *end, void *ptr, struct printf_spec spec) +{ + unsigned long hashval; + const int default_width = 2 * sizeof(ptr); + + if (unlikely(!have_filled_random_ptr_key)) { + spec.field_width = default_width; + /* string length must be less than default_width */ + return string(buf, end, "(ptrval)", spec); + } + +#ifdef CONFIG_64BIT + hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key); + /* + * Mask off the first 32 bits, this makes explicit that we have + * modified the address (and 32 bits is plenty for a unique ID). + */ + hashval = hashval & 0xffffffff; +#else + hashval = (unsigned long)siphash_1u32((u32)ptr, &ptr_key); +#endif + + spec.flags |= SMALL; + if (spec.field_width == -1) { + spec.field_width = default_width; + spec.flags |= ZEROPAD; + } + spec.base = 16; + + return number(buf, end, hashval, spec); +} /* * Show a '%p' thing. A kernel extension is that the '%p' is followed @@ -1695,11 +1829,16 @@ int kptr_restrict __read_mostly; * c major compatible string * C full compatible string * + * - 'x' For printing the address. Equivalent to "%lx". + * * ** Please update also Documentation/printk-formats.txt when making changes ** * * Note: The difference between 'S' and 'F' is that on ia64 and ppc64 * function pointers are really function descriptors, which contain a * pointer to the real address. + * + * Note: The default behaviour (unadorned %p) is to hash the address, + * rendering it useful as a unique identifier. */ static noinline_for_stack char *pointer(const char *fmt, char *buf, char *end, void *ptr, @@ -1789,47 +1928,7 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr, return buf; } case 'K': - switch (kptr_restrict) { - case 0: - /* Always print %pK values */ - break; - case 1: { - const struct cred *cred; - - /* - * kptr_restrict==1 cannot be used in IRQ context - * because its test for CAP_SYSLOG would be meaningless. 
- */ - if (in_irq() || in_serving_softirq() || in_nmi()) { - if (spec.field_width == -1) - spec.field_width = default_width; - return string(buf, end, "pK-error", spec); - } - - /* - * Only print the real pointer value if the current - * process has CAP_SYSLOG and is running with the - * same credentials it started with. This is because - * access to files is checked at open() time, but %pK - * checks permission at read() time. We don't want to - * leak pointer values if a binary opens a file using - * %pK and then elevates privileges before reading it. - */ - cred = current_cred(); - if (!has_capability_noaudit(current, CAP_SYSLOG) || - !uid_eq(cred->euid, cred->uid) || - !gid_eq(cred->egid, cred->gid)) - ptr = NULL; - break; - } - case 2: - default: - /* Always print 0's for %pK */ - ptr = NULL; - break; - } - break; - + return restricted_pointer(buf, end, ptr, spec); case 'N': return netdev_bits(buf, end, ptr, fmt); case 'a': @@ -1854,15 +1953,12 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr, case 'F': return device_node_string(buf, end, ptr, spec, fmt + 1); } + case 'x': + return pointer_string(buf, end, ptr, spec); } - spec.flags |= SMALL; - if (spec.field_width == -1) { - spec.field_width = default_width; - spec.flags |= ZEROPAD; - } - spec.base = 16; - return number(buf, end, (unsigned long) ptr, spec); + /* default is to _not_ leak addresses, hash before printing */ + return ptr_to_id(buf, end, ptr, spec); } /* diff --git a/mm/debug.c b/mm/debug.c index c55abc893fdc..97609290dd51 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -50,7 +50,7 @@ void __dump_page(struct page *page, const char *reason) */ int mapcount = PageSlab(page) ? 
0 : page_mapcount(page); - pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx", + pr_emerg("page:%px count:%d mapcount:%d mapping:%px index:%#lx", page, page_ref_count(page), mapcount, page->mapping, page_to_pgoff(page)); if (PageCompound(page)) @@ -69,7 +69,7 @@ void __dump_page(struct page *page, const char *reason) #ifdef CONFIG_MEMCG if (page->mem_cgroup) - pr_alert("page->mem_cgroup:%p\n", page->mem_cgroup); + pr_alert("page->mem_cgroup:%px\n", page->mem_cgroup); #endif } @@ -84,10 +84,10 @@ EXPORT_SYMBOL(dump_page); void dump_vma(const struct vm_area_struct *vma) { - pr_emerg("vma %p start %p end %p\n" - "next %p prev %p mm %p\n" - "prot %lx anon_vma %p vm_ops %p\n" - "pgoff %lx file %p private_data %p\n" + pr_emerg("vma %px start %px end %px\n" + "next %px prev %px mm %px\n" + "prot %lx anon_vma %px vm_ops %px\n" + "pgoff %lx file %px private_data %px\n" "flags: %#lx(%pGv)\n", vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_next, vma->vm_prev, vma->vm_mm, @@ -100,27 +100,27 @@ EXPORT_SYMBOL(dump_vma); void dump_mm(const struct mm_struct *mm) { - pr_emerg("mm %p mmap %p seqnum %llu task_size %lu\n" + pr_emerg("mm %px mmap %px seqnum %llu task_size %lu\n" #ifdef CONFIG_MMU - "get_unmapped_area %p\n" + "get_unmapped_area %px\n" #endif "mmap_base %lu mmap_legacy_base %lu highest_vm_end %lu\n" - "pgd %p mm_users %d mm_count %d nr_ptes %lu nr_pmds %lu map_count %d\n" + "pgd %px mm_users %d mm_count %d nr_ptes %lu nr_pmds %lu map_count %d\n" "hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n" "pinned_vm %lx data_vm %lx exec_vm %lx stack_vm %lx\n" "start_code %lx end_code %lx start_data %lx end_data %lx\n" "start_brk %lx brk %lx start_stack %lx\n" "arg_start %lx arg_end %lx env_start %lx env_end %lx\n" - "binfmt %p flags %lx core_state %p\n" + "binfmt %px flags %lx core_state %px\n" #ifdef CONFIG_AIO - "ioctx_table %p\n" + "ioctx_table %px\n" #endif #ifdef CONFIG_MEMCG - "owner %p " + "owner %px " #endif - "exe_file %p\n" + "exe_file 
%px\n" #ifdef CONFIG_MMU_NOTIFIER - "mmu_notifier_mm %p\n" + "mmu_notifier_mm %px\n" #endif #ifdef CONFIG_NUMA_BALANCING "numa_next_scan %lu numa_scan_offset %lu numa_scan_seq %d\n" diff --git a/mm/filemap.c b/mm/filemap.c index a75c3601fb7b..432ced5b8578 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1443,7 +1443,7 @@ EXPORT_SYMBOL(find_lock_entry); * - FGP_CREAT: If page is not present then a new page is allocated using * @gfp_mask and added to the page cache and the VM's LRU * list. The page is returned locked and with an increased - * refcount. Otherwise, NULL is returned. + * refcount. * - FGP_FOR_MMAP: Similar to FGP_CREAT, only we want to allow the caller to do * its own locking dance if the page is already in cache, or unlock the page * before returning if we had to add the page to pagecache. @@ -1515,10 +1515,10 @@ no_page: } /* - * add_to_page_cache_lru lock's the page, and for mmap we expect - * a unlocked page. + * add_to_page_cache_lru locks the page, and for mmap we expect + * an unlocked page. */ - if (fgp_flags & FGP_FOR_MMAP) + if (page && (fgp_flags & FGP_FOR_MMAP)) unlock_page(page); } @@ -2301,6 +2301,7 @@ static struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf, struct file *fpin) { int flags = vmf->flags; + if (fpin) return fpin; @@ -2334,6 +2335,11 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page, if (trylock_page(page)) return 1; + /* + * NOTE! This will make us return with VM_FAULT_RETRY, but with + * the mmap_sem still held. That's how FAULT_FLAG_RETRY_NOWAIT + * is supposed to work. We have way too many special cases.. 
+	 */
 	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
 		return 0;
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 95bbdce2f129..75206991ece0 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -138,7 +138,7 @@ static void print_error_description(struct kasan_access_info *info)
 	pr_err("BUG: KASAN: %s in %pS\n",
 		bug_type, (void *)info->ip);
-	pr_err("%s of size %zu at addr %p by task %s/%d\n",
+	pr_err("%s of size %zu at addr %px by task %s/%d\n",
 		info->is_write ? "Write" : "Read", info->access_size,
 		info->access_addr, current->comm, task_pid_nr(current));
 }
@@ -210,7 +210,7 @@ static void describe_object_addr(struct kmem_cache *cache, void *object,
 	const char *rel_type;
 	int rel_bytes;
-	pr_err("The buggy address belongs to the object at %p\n"
+	pr_err("The buggy address belongs to the object at %px\n"
 	       " which belongs to the cache %s of size %d\n",
 		object, cache->name, cache->object_size);
@@ -229,7 +229,7 @@ static void describe_object_addr(struct kmem_cache *cache, void *object,
 	}
 	pr_err("The buggy address is located %d bytes %s of\n"
-	       " %d-byte region [%p, %p)\n",
+	       " %d-byte region [%px, %px)\n",
 		rel_bytes, rel_type, cache->object_size, (void *)object_addr,
 		(void *)(object_addr + cache->object_size));
 }
@@ -306,7 +306,7 @@ static void print_shadow_for_address(const void *addr)
 		char shadow_buf[SHADOW_BYTES_PER_ROW];
 		snprintf(buffer, sizeof(buffer),
-			(i == 0) ? ">%p: " : " %p: ", kaddr);
+			(i == 0) ? ">%px: " : " %px: ", kaddr);
 		/*
 		 * We should not pass a shadow pointer to generic
 		 * function, because generic functions may try to
diff --git a/mm/slab.c b/mm/slab.c
index b3f919328b0f..6613f670a605 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1589,11 +1589,8 @@ static void print_objinfo(struct kmem_cache *cachep, void *objp, int lines)
 		       *dbg_redzone2(cachep, objp));
 	}
-	if (cachep->flags & SLAB_STORE_USER) {
-		pr_err("Last user: [<%p>](%pSR)\n",
-		       *dbg_userword(cachep, objp),
-		       *dbg_userword(cachep, objp));
-	}
+	if (cachep->flags & SLAB_STORE_USER)
+		pr_err("Last user: (%pSR)\n", *dbg_userword(cachep, objp));
 	realobj = (char *)objp + obj_offset(cachep);
 	size = cachep->object_size;
 	for (i = 0; i < size && lines; i += 16, lines--) {
@@ -1626,7 +1623,7 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
 			/* Mismatch ! */
 			/* Print header */
 			if (lines == 0) {
-				pr_err("Slab corruption (%s): %s start=%p, len=%d\n",
+				pr_err("Slab corruption (%s): %s start=%px, len=%d\n",
 				       print_tainted(), cachep->name,
 				       realobj, size);
 				print_objinfo(cachep, objp, 0);
@@ -1655,13 +1652,13 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
 		if (objnr) {
 			objp = index_to_obj(cachep, page, objnr - 1);
 			realobj = (char *)objp + obj_offset(cachep);
-			pr_err("Prev obj: start=%p, len=%d\n", realobj, size);
+			pr_err("Prev obj: start=%px, len=%d\n", realobj, size);
 			print_objinfo(cachep, objp, 2);
 		}
 		if (objnr + 1 < cachep->num) {
 			objp = index_to_obj(cachep, page, objnr + 1);
 			realobj = (char *)objp + obj_offset(cachep);
-			pr_err("Next obj: start=%p, len=%d\n", realobj, size);
+			pr_err("Next obj: start=%px, len=%d\n", realobj, size);
 			print_objinfo(cachep, objp, 2);
 		}
 	}
@@ -2612,7 +2609,7 @@ static void slab_put_obj(struct kmem_cache *cachep,
 	/* Verify double free bug */
 	for (i = page->active; i < cachep->num; i++) {
 		if (get_free_obj(page, i) == objnr) {
-			pr_err("slab: double free detected in cache '%s', objp %p\n",
+			pr_err("slab: double free detected in cache '%s', objp %px\n",
 			       cachep->name, objp);
 			BUG();
 		}
@@ -2776,7 +2773,7 @@ static inline void verify_redzone_free(struct kmem_cache *cache, void *obj)
 	else
 		slab_error(cache, "memory outside object was overwritten");
-	pr_err("%p: redzone 1:0x%llx, redzone 2:0x%llx\n",
+	pr_err("%px: redzone 1:0x%llx, redzone 2:0x%llx\n",
 			obj, redzone1, redzone2);
 }
@@ -3082,7 +3079,7 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 	if (*dbg_redzone1(cachep, objp) != RED_INACTIVE ||
 	    *dbg_redzone2(cachep, objp) != RED_INACTIVE) {
 		slab_error(cachep, "double free, or memory outside object was overwritten");
-		pr_err("%p: redzone 1:0x%llx, redzone 2:0x%llx\n",
+		pr_err("%px: redzone 1:0x%llx, redzone 2:0x%llx\n",
 		       objp, *dbg_redzone1(cachep, objp),
 		       *dbg_redzone2(cachep, objp));
 	}
@@ -3095,7 +3092,7 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 		cachep->ctor(objp);
 	if (ARCH_SLAB_MINALIGN &&
 	    ((unsigned long)objp & (ARCH_SLAB_MINALIGN-1))) {
-		pr_err("0x%p: not aligned to ARCH_SLAB_MINALIGN=%d\n",
+		pr_err("0x%px: not aligned to ARCH_SLAB_MINALIGN=%d\n",
 		       objp, (int)ARCH_SLAB_MINALIGN);
 	}
 	return objp;
@@ -4293,7 +4290,7 @@ static void show_symbol(struct seq_file *m, unsigned long address)
 		return;
 	}
 #endif
-	seq_printf(m, "%p", (void *)address);
+	seq_printf(m, "%px", (void *)address);
 }
 static int leaks_show(struct seq_file *m, void *p)
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 055382301eb2..f450684ed85e 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -58,12 +58,11 @@ static noinline int check_stack_object(const void *obj, unsigned long len)
 	return GOOD_STACK;
 }
-static void report_usercopy(const void *ptr, unsigned long len,
-			    bool to_user, const char *type)
+static void report_usercopy(unsigned long len, bool to_user, const char *type)
 {
-	pr_emerg("kernel memory %s attempt detected %s %p (%s) (%lu bytes)\n",
+	pr_emerg("kernel memory %s attempt detected %s '%s' (%lu bytes)\n",
 		to_user ? "exposure" : "overwrite",
-		to_user ? "from" : "to", ptr, type ? : "unknown", len);
+		to_user ? "from" : "to", type ? : "unknown", len);
 	/*
 	 * For greater effect, it would be nice to do do_group_exit(),
 	 * but BUG() actually hooks all the lock-breaking and per-arch
@@ -267,6 +266,6 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 	return;
 report:
-	report_usercopy(ptr, n, to_user, err);
+	report_usercopy(n, to_user, err);
 }
 EXPORT_SYMBOL(__check_object_size);
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 65d734c165bd..4a05235929b9 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -826,8 +826,6 @@ static int hci_sock_release(struct socket *sock)
 	if (!sk)
 		return 0;
-	hdev = hci_pi(sk)->hdev;
-
 	switch (hci_pi(sk)->channel) {
 	case HCI_CHANNEL_MONITOR:
 		atomic_dec(&monitor_promisc);
@@ -849,6 +847,7 @@ static int hci_sock_release(struct socket *sock)
 	bt_sock_unlink(&hci_sk_list, sk);
+	hdev = hci_pi(sk)->hdev;
 	if (hdev) {
 		if (hci_pi(sk)->channel == HCI_CHANNEL_USER) {
 			/* When releasing a user channel exclusive access,
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 53392ac58b38..38b3309edba8 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -31,10 +31,6 @@
 /* needed for logical [in,out]-dev filtering */
 #include "../br_private.h"
-#define BUGPRINT(format, args...) printk("kernel msg: ebtables bug: please "\
-                                         "report to author: "format, ## args)
-/* #define BUGPRINT(format, args...) */
-
 /* Each cpu has its own set of counters, so there is no need for write_lock in
  * the softirq
  * For reading or updating the counters, the user context needs to
@@ -453,8 +449,6 @@ static int ebt_verify_pointers(const struct ebt_replace *repl,
 				/* we make userspace set this right,
 				 * so there is no misunderstanding
 				 */
-				BUGPRINT("EBT_ENTRY_OR_ENTRIES shouldn't be set "
-					 "in distinguisher\n");
 				return -EINVAL;
 			}
 			if (i != NF_BR_NUMHOOKS)
@@ -472,18 +466,14 @@ static int ebt_verify_pointers(const struct ebt_replace *repl,
 			offset += e->next_offset;
 		}
 	}
-	if (offset != limit) {
-		BUGPRINT("entries_size too small\n");
+	if (offset != limit)
 		return -EINVAL;
-	}
 	/* check if all valid hooks have a chain */
 	for (i = 0; i < NF_BR_NUMHOOKS; i++) {
 		if (!newinfo->hook_entry[i] &&
-		   (valid_hooks & (1 << i))) {
-			BUGPRINT("Valid hook without chain\n");
+		   (valid_hooks & (1 << i)))
 			return -EINVAL;
-		}
 	}
 	return 0;
 }
@@ -510,26 +500,20 @@ ebt_check_entry_size_and_hooks(const struct ebt_entry *e,
 		/* this checks if the previous chain has as many entries
 		 * as it said it has
 		 */
-		if (*n != *cnt) {
-			BUGPRINT("nentries does not equal the nr of entries "
-				 "in the chain\n");
+		if (*n != *cnt)
 			return -EINVAL;
-		}
+
 		if (((struct ebt_entries *)e)->policy != EBT_DROP &&
 		   ((struct ebt_entries *)e)->policy != EBT_ACCEPT) {
 			/* only RETURN from udc */
 			if (i != NF_BR_NUMHOOKS ||
-			   ((struct ebt_entries *)e)->policy != EBT_RETURN) {
-				BUGPRINT("bad policy\n");
+			   ((struct ebt_entries *)e)->policy != EBT_RETURN)
 				return -EINVAL;
-			}
 		}
 		if (i == NF_BR_NUMHOOKS) /* it's a user defined chain */
 			(*udc_cnt)++;
-		if (((struct ebt_entries *)e)->counter_offset != *totalcnt) {
-			BUGPRINT("counter_offset != totalcnt");
+		if (((struct ebt_entries *)e)->counter_offset != *totalcnt)
 			return -EINVAL;
-		}
 		*n = ((struct ebt_entries *)e)->nentries;
 		*cnt = 0;
 		return 0;
@@ -537,15 +521,13 @@ ebt_check_entry_size_and_hooks(const struct ebt_entry *e,
 	/* a plain old entry, heh */
 	if (sizeof(struct ebt_entry) > e->watchers_offset ||
 	   e->watchers_offset > e->target_offset ||
-	   e->target_offset >= e->next_offset) {
-		BUGPRINT("entry offsets not in right order\n");
+	   e->target_offset >= e->next_offset)
 		return -EINVAL;
-	}
+
 	/* this is not checked anywhere else */
-	if (e->next_offset - e->target_offset < sizeof(struct ebt_entry_target)) {
-		BUGPRINT("target size too small\n");
+	if (e->next_offset - e->target_offset < sizeof(struct ebt_entry_target))
 		return -EINVAL;
-	}
+
 	(*cnt)++;
 	(*totalcnt)++;
 	return 0;
@@ -665,18 +647,15 @@ ebt_check_entry(struct ebt_entry *e, struct net *net,
 	if (e->bitmask == 0)
 		return 0;
-	if (e->bitmask & ~EBT_F_MASK) {
-		BUGPRINT("Unknown flag for bitmask\n");
+	if (e->bitmask & ~EBT_F_MASK)
 		return -EINVAL;
-	}
-	if (e->invflags & ~EBT_INV_MASK) {
-		BUGPRINT("Unknown flag for inv bitmask\n");
+
+	if (e->invflags & ~EBT_INV_MASK)
 		return -EINVAL;
-	}
-	if ((e->bitmask & EBT_NOPROTO) && (e->bitmask & EBT_802_3)) {
-		BUGPRINT("NOPROTO & 802_3 not allowed\n");
+
+	if ((e->bitmask & EBT_NOPROTO) && (e->bitmask & EBT_802_3))
 		return -EINVAL;
-	}
+
 	/* what hook do we belong to? */
 	for (i = 0; i < NF_BR_NUMHOOKS; i++) {
 		if (!newinfo->hook_entry[i])
@@ -735,13 +714,11 @@ ebt_check_entry(struct ebt_entry *e, struct net *net,
 	t->u.target = target;
 	if (t->u.target == &ebt_standard_target) {
 		if (gap < sizeof(struct ebt_standard_target)) {
-			BUGPRINT("Standard target size too big\n");
 			ret = -EFAULT;
 			goto cleanup_watchers;
 		}
 		if (((struct ebt_standard_target *)t)->verdict <
 		   -NUM_STANDARD_TARGETS) {
-			BUGPRINT("Invalid standard target\n");
 			ret = -EFAULT;
 			goto cleanup_watchers;
 		}
@@ -801,10 +778,9 @@ static int check_chainloops(const struct ebt_entries *chain, struct ebt_cl_stack
 		if (strcmp(t->u.name, EBT_STANDARD_TARGET))
 			goto letscontinue;
 		if (e->target_offset + sizeof(struct ebt_standard_target) >
-		   e->next_offset) {
-			BUGPRINT("Standard target size too big\n");
+		   e->next_offset)
 			return -1;
-		}
+
 		verdict = ((struct ebt_standard_target *)t)->verdict;
 		if (verdict >= 0) { /* jump to another chain */
 			struct ebt_entries *hlp2 =
@@ -813,14 +789,12 @@ static int check_chainloops(const struct ebt_entries *chain, struct ebt_cl_stack
 				if (hlp2 == cl_s[i].cs.chaininfo)
 					break;
 			/* bad destination or loop */
-			if (i == udc_cnt) {
-				BUGPRINT("bad destination\n");
+			if (i == udc_cnt)
 				return -1;
-			}
-			if (cl_s[i].cs.n) {
-				BUGPRINT("loop\n");
+
+			if (cl_s[i].cs.n)
 				return -1;
-			}
+
 			if (cl_s[i].hookmask & (1 << hooknr))
 				goto letscontinue;
 			/* this can't be 0, so the loop test is correct */
@@ -853,24 +827,21 @@ static int translate_table(struct net *net, const char *name,
 	i = 0;
 	while (i < NF_BR_NUMHOOKS && !newinfo->hook_entry[i])
 		i++;
-	if (i == NF_BR_NUMHOOKS) {
-		BUGPRINT("No valid hooks specified\n");
+	if (i == NF_BR_NUMHOOKS)
 		return -EINVAL;
-	}
-	if (newinfo->hook_entry[i] != (struct ebt_entries *)newinfo->entries) {
-		BUGPRINT("Chains don't start at beginning\n");
+
+	if (newinfo->hook_entry[i] != (struct ebt_entries *)newinfo->entries)
 		return -EINVAL;
-	}
+
 	/* make sure chains are ordered after each other in same order
 	 * as their corresponding hooks
 	 */
 	for (j = i + 1; j < NF_BR_NUMHOOKS; j++) {
 		if (!newinfo->hook_entry[j])
 			continue;
-		if (newinfo->hook_entry[j] <= newinfo->hook_entry[i]) {
-			BUGPRINT("Hook order must be followed\n");
+		if (newinfo->hook_entry[j] <= newinfo->hook_entry[i])
 			return -EINVAL;
-		}
+
 		i = j;
 	}
@@ -888,15 +859,11 @@ static int translate_table(struct net *net, const char *name,
 	if (ret != 0)
 		return ret;
-	if (i != j) {
-		BUGPRINT("nentries does not equal the nr of entries in the "
-			 "(last) chain\n");
+	if (i != j)
 		return -EINVAL;
-	}
-	if (k != newinfo->nentries) {
-		BUGPRINT("Total nentries is wrong\n");
+
+	if (k != newinfo->nentries)
 		return -EINVAL;
-	}
 	/* get the location of the udc, put them in an array
 	 * while we're at it, allocate the chainstack
@@ -929,7 +896,6 @@ static int translate_table(struct net *net, const char *name,
 		   ebt_get_udc_positions, newinfo, &i, cl_s);
 		/* sanity check */
 		if (i != udc_cnt) {
-			BUGPRINT("i != udc_cnt\n");
 			vfree(cl_s);
 			return -EFAULT;
 		}
@@ -1030,7 +996,6 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
 		goto free_unlock;
 	if (repl->num_counters && repl->num_counters != t->private->nentries) {
-		BUGPRINT("Wrong nr. of counters requested\n");
 		ret = -EINVAL;
 		goto free_unlock;
 	}
@@ -1115,15 +1080,12 @@ static int do_replace(struct net *net, const void __user *user,
 	if (copy_from_user(&tmp, user, sizeof(tmp)) != 0)
 		return -EFAULT;
-	if (len != sizeof(tmp) + tmp.entries_size) {
-		BUGPRINT("Wrong len argument\n");
+	if (len != sizeof(tmp) + tmp.entries_size)
 		return -EINVAL;
-	}
-	if (tmp.entries_size == 0) {
-		BUGPRINT("Entries_size never zero\n");
+	if (tmp.entries_size == 0)
 		return -EINVAL;
-	}
+
 	/* overflow check */
 	if (tmp.nentries >= ((INT_MAX - sizeof(struct ebt_table_info)) /
 			NR_CPUS - SMP_CACHE_BYTES) / sizeof(struct ebt_counter))
@@ -1150,7 +1112,6 @@ static int do_replace(struct net *net, const void __user *user,
 	}
 	if (copy_from_user(
 	    newinfo->entries, tmp.entries, tmp.entries_size) != 0) {
-		BUGPRINT("Couldn't copy entries from userspace\n");
 		ret = -EFAULT;
 		goto free_entries;
 	}
@@ -1197,10 +1158,8 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table,
 	if (input_table == NULL || (repl = input_table->table) == NULL ||
 	    repl->entries == NULL || repl->entries_size == 0 ||
-	    repl->counters != NULL || input_table->private != NULL) {
-		BUGPRINT("Bad table data for ebt_register_table!!!\n");
+	    repl->counters != NULL || input_table->private != NULL)
 		return -EINVAL;
-	}
 	/* Don't add one table to multiple lists. */
 	table = kmemdup(input_table, sizeof(struct ebt_table), GFP_KERNEL);
@@ -1238,13 +1197,10 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table,
 				((char *)repl->hook_entry[i] - repl->entries);
 	}
 	ret = translate_table(net, repl->name, newinfo);
-	if (ret != 0) {
-		BUGPRINT("Translate_table failed\n");
+	if (ret != 0)
 		goto free_chainstack;
-	}
 	if (table->check && table->check(newinfo, table->valid_hooks)) {
-		BUGPRINT("The table doesn't like its own initial data, lol\n");
 		ret = -EINVAL;
 		goto free_chainstack;
 	}
@@ -1255,7 +1211,6 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table,
 	list_for_each_entry(t, &net->xt.tables[NFPROTO_BRIDGE], list) {
 		if (strcmp(t->name, table->name) == 0) {
 			ret = -EEXIST;
-			BUGPRINT("Table name already exists\n");
 			goto free_unlock;
 		}
 	}
@@ -1327,7 +1282,6 @@ static int do_update_counters(struct net *net, const char *name,
 		goto free_tmp;
 	if (num_counters != t->private->nentries) {
-		BUGPRINT("Wrong nr of counters\n");
 		ret = -EINVAL;
 		goto unlock_mutex;
 	}
@@ -1452,10 +1406,8 @@ static int copy_counters_to_user(struct ebt_table *t,
 	if (num_counters == 0)
 		return 0;
-	if (num_counters != nentries) {
-		BUGPRINT("Num_counters wrong\n");
+	if (num_counters != nentries)
 		return -EINVAL;
-	}
 	counterstmp = vmalloc(nentries * sizeof(*counterstmp));
 	if (!counterstmp)
@@ -1501,15 +1453,11 @@ static int copy_everything_to_user(struct ebt_table *t, void __user *user,
 	   (tmp.num_counters ? nentries * sizeof(struct ebt_counter) : 0))
 		return -EINVAL;
-	if (tmp.nentries != nentries) {
-		BUGPRINT("Nentries wrong\n");
+	if (tmp.nentries != nentries)
 		return -EINVAL;
-	}
-	if (tmp.entries_size != entries_size) {
-		BUGPRINT("Wrong size\n");
+	if (tmp.entries_size != entries_size)
 		return -EINVAL;
-	}
 	ret = copy_counters_to_user(t, oldcounters, tmp.counters,
 					tmp.num_counters, nentries);
@@ -1581,7 +1529,6 @@ static int do_ebt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 		}
 		mutex_unlock(&ebt_mutex);
 		if (copy_to_user(user, &tmp, *len) != 0) {
-			BUGPRINT("c2u Didn't work\n");
 			ret = -EFAULT;
 			break;
 		}
diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
index cdb5b693a135..1001377ae428 100644
--- a/net/ceph/ceph_common.c
+++ b/net/ceph/ceph_common.c
@@ -720,7 +720,6 @@ int __ceph_open_session(struct ceph_client *client, unsigned long started)
 }
 EXPORT_SYMBOL(__ceph_open_session);
-
 int ceph_open_session(struct ceph_client *client)
 {
 	int ret;
@@ -736,6 +735,23 @@ int ceph_open_session(struct ceph_client *client)
 }
 EXPORT_SYMBOL(ceph_open_session);
+int ceph_wait_for_latest_osdmap(struct ceph_client *client,
+				unsigned long timeout)
+{
+	u64 newest_epoch;
+	int ret;
+
+	ret = ceph_monc_get_version(&client->monc, "osdmap", &newest_epoch);
+	if (ret)
+		return ret;
+
+	if (client->osdc.osdmap->epoch >= newest_epoch)
+		return 0;
+
+	ceph_osdc_maybe_request_map(&client->osdc);
+	return ceph_monc_wait_osdmap(&client->monc, newest_epoch, timeout);
+}
+EXPORT_SYMBOL(ceph_wait_for_latest_osdmap);
 static int __init init_ceph_lib(void)
 {
diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c
index f14498a7eaec..daca0af59942 100644
--- a/net/ceph/mon_client.c
+++ b/net/ceph/mon_client.c
@@ -922,6 +922,15 @@ int ceph_monc_blacklist_add(struct ceph_mon_client *monc,
 	mutex_unlock(&monc->mutex);
 	ret = wait_generic_request(req);
+	if (!ret)
+		/*
+		 * Make sure we have the osdmap that includes the blacklist
+		 * entry. This is needed to ensure that the OSDs pick up the
+		 * new blacklist before processing any future requests from
+		 * this client.
+		 */
+		ret = ceph_wait_for_latest_osdmap(monc->client, 0);
+
 out:
 	put_generic_request(req);
 	return ret;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e91f123faae1..1791d67c3a0b 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -121,20 +121,6 @@
 #endif
 #include 
-#ifdef CONFIG_ANDROID_PARANOID_NETWORK
-#include 
-
-static inline int current_has_network(void)
-{
-	return in_egroup_p(AID_INET) || capable(CAP_NET_RAW);
-}
-#else
-static inline int current_has_network(void)
-{
-	return 1;
-}
-#endif
-
 int sysctl_reserved_port_bind __read_mostly = 1;
 /* The inetsw table contains everything that inet_create needs to
@@ -270,9 +256,6 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 	if (protocol < 0 || protocol >= IPPROTO_MAX)
 		return -EINVAL;
-	if (!current_has_network())
-		return -EACCES;
-
 	sock->state = SS_UNCONNECTED;
 	/* Look for the requested type/protocol pair. */
 lookup_protocol:
 	err = -ESOCKTNOSUPPORT;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 70b392d2bb07..fda5fae57b83 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -66,20 +66,6 @@
 #include 
 #include 
-#ifdef CONFIG_ANDROID_PARANOID_NETWORK
-#include 
-
-static inline int current_has_network(void)
-{
-	return in_egroup_p(AID_INET) || capable(CAP_NET_RAW);
-}
-#else
-static inline int current_has_network(void)
-{
-	return 1;
-}
-#endif
-
 #include "ip6_offload.h"
 MODULE_AUTHOR("Cast of dozens");
@@ -136,9 +122,6 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	if (protocol < 0 || protocol >= IPPROTO_MAX)
 		return -EINVAL;
-	if (!current_has_network())
-		return -EACCES;
-
 	/* Look for the requested type/protocol pair. */
 lookup_protocol:
 	err = -ESOCKTNOSUPPORT;
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 0888e6038d22..a98c04b4603d 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -6002,7 +6002,7 @@ sub process {
 			for (my $count = $linenr; $count <= $lc; $count++) {
 				my $fmt = get_quoted_string($lines[$count - 1], raw_line($count, 0));
 				$fmt =~ s/%%//g;
-				if ($fmt =~ /(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGNO]).)/) {
+				if ($fmt =~ /(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGNOx]).)/) {
 					$bad_extension = $1;
 					last;
 				}
diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
index 8a027973f2ad..e3f3351da480 100644
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -2900,6 +2900,7 @@ static void hda_call_codec_resume(struct hda_codec *codec)
 		hda_jackpoll_work(&codec->jackpoll_work.work);
 	else
 		snd_hda_jack_report_sync(codec);
+	codec->core.dev.power.power_state = PMSG_ON;
 	atomic_dec(&codec->core.in_pm);
 }
@@ -2932,10 +2933,62 @@ static int hda_codec_runtime_resume(struct device *dev)
 }
 #endif /* CONFIG_PM */
+#ifdef CONFIG_PM_SLEEP
+static int hda_codec_force_resume(struct device *dev)
+{
+	int ret;
+
+	/* The get/put pair below enforces the runtime resume even if the
+	 * device hasn't been used at suspend time. This trick is needed to
+	 * update the jack state change during the sleep.
+	 */
+	pm_runtime_get_noresume(dev);
+	ret = pm_runtime_force_resume(dev);
+	pm_runtime_put(dev);
+	return ret;
+}
+
+static int hda_codec_pm_suspend(struct device *dev)
+{
+	dev->power.power_state = PMSG_SUSPEND;
+	return pm_runtime_force_suspend(dev);
+}
+
+static int hda_codec_pm_resume(struct device *dev)
+{
+	dev->power.power_state = PMSG_RESUME;
+	return hda_codec_force_resume(dev);
+}
+
+static int hda_codec_pm_freeze(struct device *dev)
+{
+	dev->power.power_state = PMSG_FREEZE;
+	return pm_runtime_force_suspend(dev);
+}
+
+static int hda_codec_pm_thaw(struct device *dev)
+{
+	dev->power.power_state = PMSG_THAW;
+	return hda_codec_force_resume(dev);
+}
+
+static int hda_codec_pm_restore(struct device *dev)
+{
+	dev->power.power_state = PMSG_RESTORE;
+	return hda_codec_force_resume(dev);
+}
+#endif /* CONFIG_PM_SLEEP */
+
 /* referred in hda_bind.c */
 const struct dev_pm_ops hda_codec_driver_pm = {
-	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
-				pm_runtime_force_resume)
+#ifdef CONFIG_PM_SLEEP
+	.suspend = hda_codec_pm_suspend,
+	.resume = hda_codec_pm_resume,
+	.freeze = hda_codec_pm_freeze,
+	.thaw = hda_codec_pm_thaw,
+	.poweroff = hda_codec_pm_suspend,
+	.restore = hda_codec_pm_restore,
+#endif /* CONFIG_PM_SLEEP */
 	SET_RUNTIME_PM_OPS(hda_codec_runtime_suspend, hda_codec_runtime_resume,
 			   NULL)
 };
diff --git a/sound/x86/intel_hdmi_audio.c b/sound/x86/intel_hdmi_audio.c
index 8b7abbd69116..88fe5eb4516f 100644
--- a/sound/x86/intel_hdmi_audio.c
+++ b/sound/x86/intel_hdmi_audio.c
@@ -1887,7 +1887,6 @@ static int hdmi_lpe_audio_probe(struct platform_device *pdev)
 	pm_runtime_use_autosuspend(&pdev->dev);
 	pm_runtime_mark_last_busy(&pdev->dev);
-	pm_runtime_set_active(&pdev->dev);
 	dev_dbg(&pdev->dev, "%s: handle pending notification\n", __func__);
 	for_each_port(card_ctx, port) {
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e128d1c71c30..3ff025b64527 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2132,9 +2132,10 @@ static void cleanup(struct objtool_file *file)
 	elf_close(file->elf);
 }
+static struct objtool_file file;
+
 int check(const char *_objname, bool orc)
 {
-	struct objtool_file file;
 	int ret, warnings = 0;
 	objname = _objname;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 68786bb7790e..6670e12a2bb3 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -169,8 +169,10 @@ static struct map *kernel_get_module_map(const char *module)
 	if (module && strchr(module, '/'))
 		return dso__new_map(module);
-	if (!module)
-		module = "kernel";
+	if (!module) {
+		pos = machine__kernel_map(host_machine);
+		return map__get(pos);
+	}
 	for (pos = maps__first(maps); pos; pos = map__next(pos)) {
 		/* short_name is "[module]" */