Commit e89c4a9
btrfs: allocate scrub workqueues outside of locks
I got the following lockdep splat while testing:
======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc7-00172-g021118712e59 #932 Not tainted
------------------------------------------------------
btrfs/229626 is trying to acquire lock:
ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450
but task is already holding lock:
ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #7 (&fs_info->scrub_lock){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_scrub_dev+0x11c/0x630
btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
btrfs_ioctl+0x2799/0x30a0
ksys_ioctl+0x83/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_run_dev_stats+0x49/0x480
commit_cowonly_roots+0xb5/0x2a0
btrfs_commit_transaction+0x516/0xa60
sync_filesystem+0x6b/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0xe/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x29/0x60
cleanup_mnt+0xb8/0x140
task_work_run+0x6d/0xb0
__prepare_exit_to_usermode+0x1cc/0x1e0
do_syscall_64+0x5c/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_commit_transaction+0x4bb/0xa60
sync_filesystem+0x6b/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0xe/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x29/0x60
cleanup_mnt+0xb8/0x140
task_work_run+0x6d/0xb0
__prepare_exit_to_usermode+0x1cc/0x1e0
do_syscall_64+0x5c/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_record_root_in_trans+0x43/0x70
start_transaction+0xd1/0x5d0
btrfs_dirty_inode+0x42/0xd0
touch_atime+0xa1/0xd0
btrfs_file_mmap+0x3f/0x60
mmap_region+0x3a4/0x640
do_mmap+0x376/0x580
vm_mmap_pgoff+0xd5/0x120
ksys_mmap_pgoff+0x193/0x230
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #3 (&mm->mmap_lock#2){++++}-{3:3}:
__might_fault+0x68/0x90
_copy_to_user+0x1e/0x80
perf_read+0x141/0x2c0
vfs_read+0xad/0x1b0
ksys_read+0x5f/0xe0
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #2 (&cpuctx_mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
perf_event_init_cpu+0x88/0x150
perf_event_init+0x1db/0x20b
start_kernel+0x3ae/0x53c
secondary_startup_64+0xa4/0xb0
-> #1 (pmus_lock){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
perf_event_init_cpu+0x4f/0x150
cpuhp_invoke_callback+0xb1/0x900
_cpu_up.constprop.26+0x9f/0x130
cpu_up+0x7b/0xc0
bringup_nonboot_cpus+0x4f/0x60
smp_init+0x26/0x71
kernel_init_freeable+0x110/0x258
kernel_init+0xa/0x103
ret_from_fork+0x1f/0x30
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
__lock_acquire+0x1272/0x2310
lock_acquire+0x9e/0x360
cpus_read_lock+0x39/0xb0
alloc_workqueue+0x378/0x450
__btrfs_alloc_workqueue+0x15d/0x200
btrfs_alloc_workqueue+0x51/0x160
scrub_workers_get+0x5a/0x170
btrfs_scrub_dev+0x18c/0x630
btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
btrfs_ioctl+0x2799/0x30a0
ksys_ioctl+0x83/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Chain exists of:
cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_info->scrub_lock);
lock(&fs_devs->device_list_mutex);
lock(&fs_info->scrub_lock);
lock(cpu_hotplug_lock);
*** DEADLOCK ***
2 locks held by btrfs/229626:
#0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630
#1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630
stack backtrace:
CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 #932
Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
Call Trace:
dump_stack+0x78/0xa0
check_noncircular+0x165/0x180
__lock_acquire+0x1272/0x2310
lock_acquire+0x9e/0x360
? alloc_workqueue+0x378/0x450
cpus_read_lock+0x39/0xb0
? alloc_workqueue+0x378/0x450
alloc_workqueue+0x378/0x450
? rcu_read_lock_sched_held+0x52/0x80
__btrfs_alloc_workqueue+0x15d/0x200
btrfs_alloc_workqueue+0x51/0x160
scrub_workers_get+0x5a/0x170
btrfs_scrub_dev+0x18c/0x630
? start_transaction+0xd1/0x5d0
btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
btrfs_ioctl+0x2799/0x30a0
? do_sigaction+0x102/0x250
? lockdep_hardirqs_on_prepare+0xca/0x160
? _raw_spin_unlock_irq+0x24/0x30
? trace_hardirqs_on+0x1c/0xe0
? _raw_spin_unlock_irq+0x24/0x30
? do_sigaction+0x102/0x250
? ksys_ioctl+0x83/0xc0
ksys_ioctl+0x83/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This happens because we're allocating the scrub workqueues under the
scrub and device list mutex, which brings in a whole host of other
dependencies.
Because the work queue allocation is done with GFP_KERNEL, it can
trigger reclaim, which can lead to a transaction commit, which in turns
needs the device_list_mutex, it can lead to a deadlock. A different
problem for which this fix is a solution.
Fix this by moving the actual allocation outside of the
scrub lock, and then only take the lock once we're ready to actually
assign them to the fs_info. We'll now have to cleanup the workqueues in
a few more places, so I've added a helper to do the refcount dance to
safely free the workqueues.
CC: [email protected] # 5.4+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>1 parent a48b73e commit e89c4a9
1 file changed
+70
-52
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3716 | 3716 | | |
3717 | 3717 | | |
3718 | 3718 | | |
| 3719 | + | |
| 3720 | + | |
| 3721 | + | |
| 3722 | + | |
| 3723 | + | |
| 3724 | + | |
| 3725 | + | |
| 3726 | + | |
| 3727 | + | |
| 3728 | + | |
| 3729 | + | |
| 3730 | + | |
| 3731 | + | |
| 3732 | + | |
| 3733 | + | |
| 3734 | + | |
| 3735 | + | |
| 3736 | + | |
| 3737 | + | |
| 3738 | + | |
| 3739 | + | |
| 3740 | + | |
| 3741 | + | |
3719 | 3742 | | |
3720 | 3743 | | |
3721 | 3744 | | |
3722 | 3745 | | |
3723 | 3746 | | |
3724 | 3747 | | |
| 3748 | + | |
| 3749 | + | |
| 3750 | + | |
3725 | 3751 | | |
3726 | 3752 | | |
| 3753 | + | |
3727 | 3754 | | |
3728 | | - | |
| 3755 | + | |
| 3756 | + | |
3729 | 3757 | | |
3730 | | - | |
3731 | | - | |
3732 | | - | |
3733 | | - | |
3734 | | - | |
3735 | | - | |
3736 | | - | |
3737 | | - | |
3738 | | - | |
3739 | | - | |
3740 | | - | |
3741 | | - | |
3742 | | - | |
| 3758 | + | |
| 3759 | + | |
| 3760 | + | |
| 3761 | + | |
3743 | 3762 | | |
3744 | | - | |
3745 | | - | |
3746 | | - | |
| 3763 | + | |
3747 | 3764 | | |
3748 | | - | |
3749 | | - | |
| 3765 | + | |
| 3766 | + | |
3750 | 3767 | | |
| 3768 | + | |
| 3769 | + | |
| 3770 | + | |
| 3771 | + | |
| 3772 | + | |
| 3773 | + | |
| 3774 | + | |
| 3775 | + | |
| 3776 | + | |
| 3777 | + | |
| 3778 | + | |
| 3779 | + | |
| 3780 | + | |
3751 | 3781 | | |
3752 | | - | |
3753 | | - | |
| 3782 | + | |
| 3783 | + | |
3754 | 3784 | | |
3755 | | - | |
| 3785 | + | |
| 3786 | + | |
| 3787 | + | |
3756 | 3788 | | |
| 3789 | + | |
| 3790 | + | |
3757 | 3791 | | |
3758 | | - | |
| 3792 | + | |
3759 | 3793 | | |
3760 | | - | |
| 3794 | + | |
3761 | 3795 | | |
3762 | | - | |
| 3796 | + | |
3763 | 3797 | | |
3764 | 3798 | | |
3765 | 3799 | | |
| |||
3770 | 3804 | | |
3771 | 3805 | | |
3772 | 3806 | | |
3773 | | - | |
3774 | | - | |
3775 | | - | |
3776 | 3807 | | |
3777 | 3808 | | |
3778 | 3809 | | |
| |||
3819 | 3850 | | |
3820 | 3851 | | |
3821 | 3852 | | |
| 3853 | + | |
| 3854 | + | |
| 3855 | + | |
| 3856 | + | |
3822 | 3857 | | |
3823 | 3858 | | |
3824 | 3859 | | |
3825 | 3860 | | |
3826 | 3861 | | |
3827 | 3862 | | |
3828 | | - | |
| 3863 | + | |
3829 | 3864 | | |
3830 | 3865 | | |
3831 | 3866 | | |
| |||
3834 | 3869 | | |
3835 | 3870 | | |
3836 | 3871 | | |
3837 | | - | |
| 3872 | + | |
3838 | 3873 | | |
3839 | 3874 | | |
3840 | 3875 | | |
| |||
3843 | 3878 | | |
3844 | 3879 | | |
3845 | 3880 | | |
3846 | | - | |
| 3881 | + | |
3847 | 3882 | | |
3848 | 3883 | | |
3849 | 3884 | | |
| |||
3854 | 3889 | | |
3855 | 3890 | | |
3856 | 3891 | | |
3857 | | - | |
| 3892 | + | |
3858 | 3893 | | |
3859 | 3894 | | |
3860 | 3895 | | |
3861 | | - | |
3862 | | - | |
3863 | | - | |
3864 | | - | |
3865 | | - | |
3866 | | - | |
3867 | | - | |
3868 | 3896 | | |
3869 | 3897 | | |
3870 | 3898 | | |
| |||
3917 | 3945 | | |
3918 | 3946 | | |
3919 | 3947 | | |
3920 | | - | |
3921 | | - | |
3922 | | - | |
3923 | | - | |
3924 | | - | |
3925 | | - | |
3926 | | - | |
3927 | | - | |
3928 | | - | |
3929 | 3948 | | |
3930 | 3949 | | |
3931 | | - | |
3932 | | - | |
3933 | | - | |
| 3950 | + | |
3934 | 3951 | | |
3935 | 3952 | | |
3936 | 3953 | | |
3937 | | - | |
| 3954 | + | |
| 3955 | + | |
3938 | 3956 | | |
3939 | 3957 | | |
3940 | 3958 | | |
| |||
0 commit comments