Sleeping in Atomic Warnings

TL;DR: Smatch has a warning, “warn: sleeping in atomic context”. People sometimes want to know why they don’t see these warnings when they run Smatch. The check requires the cross function database. Generating the cross function database is really time consuming and probably not worth it. Instead, try to reproduce the warning at runtime by enabling the CONFIG_DEBUG_ATOMIC_SLEEP=y option.

Boring explanation follows:

Kernel locks can be divided into two classes: locks which can sleep, such as mutexes, and locks which don’t sleep, such as spinlocks. Sleeping in this context means calling schedule() to run a different task. So if you try to take a mutex and a different task is already holding it, your task will go to sleep and let other stuff run. Periodically, your task will wake up and check whether the lock is available. On the other hand, if you take a spinlock which is already held, your task will just keep testing over and over to see if the lock is available yet and it won’t let anything else run.
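
To make the distinction concrete, here is a minimal sketch contrasting the two styles (the lock names are made up for illustration):

#include <linux/mutex.h>
#include <linux/spinlock.h>

static DEFINE_MUTEX(my_mutex);
static DEFINE_SPINLOCK(my_lock);

void sleeping_lock_example(void)
{
	mutex_lock(&my_mutex);		/* may call schedule() and let another task run */
	/* ... */
	mutex_unlock(&my_mutex);
}

void spinning_lock_example(void)
{
	spin_lock(&my_lock);		/* busy waits: keeps testing until the lock is free */
	/* ... */
	spin_unlock(&my_lock);
}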

One kind of bug is where you take a spinlock and then sleep. Spinlocks are supposed to be fast, and holding one is not the time for going to sleep on the job. Sleeping will slow everything down. The name of this bug is “sleeping in atomic”.
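
A tiny sketch of the bug pattern (the lock and the msleep() call are just placeholders):

#include <linux/delay.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

void buggy(void)
{
	spin_lock(&my_lock);	/* atomic context: we must not sleep here */
	msleep(10);		/* sleeps while holding the spinlock -> the bug */
	spin_unlock(&my_lock);
}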

There is also a kind of deadlock associated with this bug. To be honest, this was much more of an issue in olden times before everyone had 8 cores in their phone. Imagine you have only 1 CPU and you’re holding a spinlock, then you go to sleep, then another process wakes up and it also wants the spinlock. The second process will spin forever waiting for the first, sleeping process to give up the lock. You will have to hold down the power button on your computer for 30 seconds and all your work will be lost.

These days, with multiple cores, I think maybe it’s less likely the two processes will end up on the same core. Or maybe the first sleeping process can be woken up on a different core? I’m not an expert on schedulers so I don’t know. But I feel like we used to hit these deadlocks more often back in the day.

Regardless, sleeping in atomic is a bug. Places which sleep call the might_sleep() function, which checks whether we are allowed to sleep and prints a stack trace if we are not. This checking is a bit slow, so it’s not enabled by default, but you can turn it on with CONFIG_DEBUG_ATOMIC_SLEEP=y.
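
For example, a function which can sleep typically announces that with might_sleep() near the top, roughly like this (a simplified sketch, not actual kernel code):

#include <linux/kernel.h>

void my_sleeping_helper(void)
{
	/* Prints a stack trace if we are in atomic context and
	 * CONFIG_DEBUG_ATOMIC_SLEEP=y is enabled. */
	might_sleep();

	/* ... code which may call schedule() ... */
}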

There is also a Smatch check for this bug. There are actually a few categories of code which are not allowed to sleep: code that’s in hard IRQ context, places where IRQs are disabled, and places where the preempt count is non-zero. (To make it simpler than it is, pretend that the preempt count is just the number of spinlocks you are holding.) Smatch only tracks the preempt count. It turns out that tracking when we’re in hard IRQ context is more difficult than I had assumed. Checking for when IRQs are disabled should work the same as tracking the preempt count, but I have not done this work yet (as I write this in May 2024 there is some pressure to add this code, so I probably will).

Tracking the preempt count works like this.

The first part is check_preempt_info.c. Every return is classified as either no change, preempt enabled, preempt disabled, or too complicated. Ignoring the “too complicated” returns is really the magic here. If we take a spinlock and then drop it, or we take two spinlocks, that’s all too complicated and Smatch ignores it. This cuts down on the false positives, particularly the complicated false positives. Some of these complicated functions get added back through manual annotations.
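
For illustration, here is roughly how I would expect the returns to be classified (the functions below are made up):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

void lock_it(void)		/* return classified as "preempt disabled" */
{
	spin_lock(&my_lock);
}

void unlock_it(void)		/* return classified as "preempt enabled" */
{
	spin_unlock(&my_lock);
}

void lock_and_unlock(void)	/* takes and drops the lock: "too complicated", ignored */
{
	spin_lock(&my_lock);
	spin_unlock(&my_lock);
}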

Then there is a separate module, check_preempt.c, that tracks whether a function is called with a non-zero preempt count. (To keep it simple, we say that the preempt count is at most 1 when the function is called.) Then it uses the information from the return states to increment or decrement the preempt count. The preempt count can go negative and that’s fine. When we call a function while our preempt count is greater than zero, that gets recorded in the caller_info table.
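
Roughly, the bookkeeping inside a caller looks like this (hypothetical locks and helper, just to show the counting):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(lock_a);
static DEFINE_SPINLOCK(lock_b);

static void do_something(void);	/* hypothetical helper */

void example(void)
{
	spin_lock(&lock_a);	/* preempt count: 0 -> 1 */
	do_something();		/* recorded in caller_info: called with count > 0 */
	spin_unlock(&lock_a);	/* preempt count: 1 -> 0 */
	spin_unlock(&lock_b);	/* 0 -> -1: negative is fine; presumably our
				 * caller took lock_b */
}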

The next module, check_sleep_info.c, tracks when functions sleep. One sleeping function is mutex_lock(), but we don’t want the error to be printed inside mutex_lock(), we want the warning to show up in the function that calls mutex_lock(). So we track the functions which *always* sleep on every path. That way we can move the error message into a more useful location, closer to the lock.
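
For example (made-up helpers), only the first function below would be marked as sleeping:

#include <linux/mutex.h>
#include <linux/types.h>

static DEFINE_MUTEX(my_mutex);

void always_sleeps(void)		/* sleeps on every path -> marked as sleeping */
{
	mutex_lock(&my_mutex);
	mutex_unlock(&my_mutex);
}

void sometimes_sleeps(bool can_sleep)	/* only sleeps on some paths -> not marked */
{
	if (!can_sleep)
		return;
	mutex_lock(&my_mutex);
	mutex_unlock(&my_mutex);
}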

The final module, check_scheduling_in_atomic.c, hooks into the sleep tracking, and whenever we call a sleeping function it checks whether we have a positive preempt count and prints a warning if we do.
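
Putting it all together, a sketch of where the warning lands (hypothetical functions again):

#include <linux/mutex.h>
#include <linux/spinlock.h>

static DEFINE_MUTEX(my_mutex);
static DEFINE_SPINLOCK(my_lock);

static void helper(void)
{
	mutex_lock(&my_mutex);	/* always sleeps -> helper() is marked as sleeping */
	mutex_unlock(&my_mutex);
}

void caller(void)
{
	spin_lock(&my_lock);	/* preempt count becomes 1 */
	helper();		/* Smatch warns here: sleeping in atomic context */
	spin_unlock(&my_lock);
}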

Then to review the warnings, use smdb.py to see where the function is called with preempt disabled.

drivers/usb/dwc3/gadget.c:2666 dwc3_gadget_soft_disconnect() warn: sleeping in atomic context

$ smdb.py preempt dwc3_gadget_soft_disconnect
dwc3_suspend_common() <- disables preempt
-> dwc3_gadget_suspend()
   -> dwc3_gadget_soft_disconnect()

The code looks like this:

drivers/usb/dwc3/core.c dwc3_suspend_common()

2296 spin_lock_irqsave(&dwc->lock, flags);
2297 dwc3_gadget_suspend(dwc);
2298 spin_unlock_irqrestore(&dwc->lock, flags);

drivers/usb/dwc3/gadget.c dwc3_gadget_suspend()

4708 ret = dwc3_gadget_soft_disconnect(dwc);

drivers/usb/dwc3/gadget.c dwc3_gadget_soft_disconnect()

2663 if (dwc->ep0state != EP0_SETUP_PHASE) {
2664         reinit_completion(&dwc->ep0_in_setup);
2665
2666         ret = wait_for_completion_timeout(...

The dwc3_suspend_common() function takes a spinlock, and then in dwc3_gadget_soft_disconnect() we call wait_for_completion_timeout(), which sleeps.

The if statement on line 2663 in dwc3_gadget_soft_disconnect() means that it will not sleep when we are in EP0_SETUP_PHASE. Perhaps the code path from dwc3_suspend_common() is only called during the setup phase? In this example, there are three functions in the call tree. Smatch loses a bit of context every time we cross a function boundary, so these sleeping in atomic warnings tend to have false positives.

You have to look at several functions, so these warnings are complicated to review manually as well. The patch which introduced the bug could have been anywhere in the call tree, so it’s easy to blame the wrong person. These warnings are a giant pain in the behind for me.

If the spinlock and the sleep are in different functions then you need to build the cross function database. It’s not hard to do that, but it takes a long time (around 5 hours depending on the system?). The command is smatch_scripts/build_kernel_data.sh.

The way the Smatch cross function database works is that if we’re holding a spinlock when we call a function, that gets added to the caller_info table. But when we rebuild the database a second time, we know about more functions which are called under spinlock. So we have to rebuild it over and over, and each rebuild pushes the information one function deeper into the call tree. We have to keep rebuilding it until the spinlock information and the sleep information meet. In the above example, we’d have to rebuild the database three times.

I rebuild my cross function database every night based on linux-next. Linux-next is a fast moving target, so sometimes the data is slightly inconsistent: Smatch thinks we’re holding a spinlock five functions away, but we’re not. It is annoying, but I’m used to dealing with it. In that situation there would be no “<- disables preempt” marker in the smdb.py output.

People sometimes want to verify that their fix silences the Smatch warning. Hopefully, this blog provides enough information to know the answer without going through the drudgery of actually reproducing the warning.
