Many thanks to Srivatsa Vaddagiri for the helpful discussion and for spotting
the bug in my previous attempt.
work->func() (and thus flush_workqueue()) must not use workqueue_mutex,
this leads to deadlock when CPU_DEAD does kthread_stop(). However without
this mutex held we can't detect CPU_DEAD in progress, which can move pending
works to another CPU while the dead one is not on cpu_online_map.
Change flush_workqueue() to use for_each_possible_cpu(). This means that
flush_cpu_workqueue() may hit CPU which is already dead. However in that
case
means that CPU_DEAD in progress, it will do kthread_stop() + take_over_work()
so we can proceed and insert a barrier. We hold cwq->lock, so we are safe.
Also, add migrate_sequence incremented by take_over_work() under cwq->lock.
If take_over_work() happened before we checked this CPU, we should see the
new value after spin_unlock().
Further possible changes:
remove CPU_DEAD handling (along with take_over_work, migrate_sequence)
from workqueue.c. CPU_DEAD just sets cwq->please_exit_after_flush flag.
CPU_UP_PREPARE->create_workqueue_thread() clears this flag, and creates
the new thread if cwq->thread == NULL.
This way the workqueue/cpu-hotplug interaction is almost zero, workqueue_mutex
just protects "workqueues" list, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE go away.