Recently Smatch has been very slow and I don’t know why. Basically I start Smatch before I leave work. If it finishes doing a database rebuild and a test run on my allmodconfig kernel by they time I start work in the morning then I am happy. But recently it has been taking 20 hours to run so I am unhappy.
That is often how it goes. Each new feature causes a small slow down. Eventually it get too slow and I have to implement a speed up. Then add more features and then a speed up etc.
The speed ups come in different flavors. Once the solution was to cache results from the DB. Another time the slowdown was caused by recursion so Smatch was saving that list->next->next->next->next is a valid pointer. Each time I rebuilt the database it added another ->next. Sometimes I just add limits, “I want to know which data a function sets but if it sets more than 1000 struct members then give up.” Or, just ignore kernel tracing data. Or, just ignore spinlock internals.
Another things which happens, is that the number of sm_states can grow too large. (An sm_state contains the history of a variable and a smatch state). What happens normally is that there is a define like foo_debug_output() which has a giant if statement. The smatch_implied.c module parses each if statement and tears apart each sm_state to create new ones with added history. There are limits to try prevent this where Smatch will stop trying to handle implications after a while. But maybe if this causes a problem we will want to make a special case for the debug macro. There are lots of options we could consider.
Smatch parsing will always be best effort and imperfect.
The first thing to do is pass the –time option. It says that the slowest file to parse is lpfc_sli.c which takes 53 minutes.
drivers/scsi/lpfc/lpfc_sli.c:22503 (null)() time: 3139
The slowest function in that file is lpfc_sli4_hba_setup() which takes 4 minutes.
drivers/scsi/lpfc/lpfc_sli.c:8881 lpfc_sli4_hba_setup() func_time: 244
The way to debug this is to insert an #include “/path/to/smatch/check_debug.h” and put a __smatch_debug_on(); at the start of the lpfc_sli4_hba_setup() function. Hit CTRL-C part way through and whatever is on the screen is probably the problem.
I added a __smatch_cur_stree(); at the end of the function. This showed that there were thousands of PARAM_SET states.
One way to figure out where all the PARAM_SET states are being generated is to put a __smatch_debug_check(“param_set”); at the start of the function and a __smatch_debug_off(); at the end. Or even a __smatch_exit(); at the end so Smatch stops completely.
Then search each function down the call tree using “smdb return_states lpfc_fdmi_change_check” until you find which function is setting a thousand PARAM_SET states. The answer in this case was that a giant union was getting memcpy()’ed over on certain paths but not on others. I considered changing how giant unions were handled but instead I changed how memcpy()s which only happened on “certain paths” were handled.
https://github.com/error27/smatch/commit/55aae9c3de7a19e1aa1ad072889429040b70aba9
Now lpfc_sli.c takes 18 minutes to parse. It’s still too long, but at least it’s better than 53 minutes.
drivers/scsi/lpfc/lpfc_sli.c:22517 (null)() time: 1056
I was expecting that this would make the database smaller, but it doesn’t. The database has some self limiting features where if a function returns too much data then we give up. But that also kind of means that if there is space then the database will fill up until it hits the limit. Or maybe this issue was local to the lpfc driver so it didn’t affect the overall build time.
Obviously when you’re debugging, then you try to think about what you changed recently and how it could have caused the breakage. I recently re-wrote how user data is handled. But I can’t see any evidence that’s the problem.
When I look through the list of files which take a long time to parse, they’re almost all giant files which have always taken a long time. One that stands out to me as new is kernel/events/core.c which is taking 14 minutes to parse. The function which takes the longest to parse is perf_ustack_task_size() which takes 105 seconds.
kernel/events/core.c:41 perf_ustack_task_size() func_time: 105
It’s interesting and suspicious that the line number is wrong. The function is very short.
kernel/events/core.c
6840 static u64 perf_ustack_task_size(struct pt_regs *regs)
6841 {
6842 unsigned long addr = perf_user_stack_pointer(regs);
6843
6844 if (!addr || addr >= TASK_SIZE)
6845 return 0;
6846
6847 return TASK_SIZE - addr;
6848 }
At first I suspected maybe the TASK_SIZE macro expanded to something much larger, but when I did a “make kernel/events/core.i” and looked at the pre-processed .i file, the TASK_SIZE macro expands to just “(test_ti_thread_flag(((struct thread_info *)get_current()), 29) ? ((get_current()->personality & ADDR_LIMIT_3GB) ? 0xc0000000 : 0xFFFFe000) : task_size_max())” which is very simple and small.
Let’s add some annotations:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f79fd8b87f75..dcd3d0d2c7db 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6837,14 +6837,24 @@ static void perf_sample_regs_intr(struct perf_regs *regs_intr,
* precisely, but there's no way to get it safely under interrupt,
* so using TASK_SIZE as limit.
*/
+#include "/home/dcarpenter/smatch/check_debug.h"
static u64 perf_ustack_task_size(struct pt_regs *regs)
{
- unsigned long addr = perf_user_stack_pointer(regs);
+ unsigned long addr;
+ __smatch_timer_start();
+ addr = perf_user_stack_pointer(regs);
+ __smatch_timer_stop();
+
+ __smatch_timer_start();
if (!addr || addr >= TASK_SIZE)
return 0;
+ __smatch_timer_stop();
+ __smatch_timer_start();
return TASK_SIZE - addr;
+ __smatch_timer_stop();
+ __smatch_exit();
}
static u16
kernel/events/core.c:6847 perf_ustack_task_size() timer: 6 msec
kernel/events/core.c:6852 perf_ustack_task_size() timer: 25655 msec
kernel/events/core.c:6856 perf_ustack_task_size() warn: ignoring unreachable code.
kernel/events/core.c:6856 perf_ustack_task_size() timer: 81368 msec
It takes 81 seconds to parse that return statement??? Let’s change the annotations a bit:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f79fd8b87f75..3aa5f553689e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6837,6 +6837,7 @@ static void perf_sample_regs_intr(struct perf_regs *regs_intr,
* precisely, but there's no way to get it safely under interrupt,
* so using TASK_SIZE as limit.
*/
+#include "/home/dcarpenter/smatch/check_debug.h"
static u64 perf_ustack_task_size(struct pt_regs *regs)
{
unsigned long addr = perf_user_stack_pointer(regs);
@@ -6844,7 +6845,12 @@ static u64 perf_ustack_task_size(struct pt_regs *regs)
if (!addr || addr >= TASK_SIZE)
return 0;
+ __smatch_timer_start();
+ __smatch_debug_on();
return TASK_SIZE - addr;
+ __smatch_debug_off();
+ __smatch_timer_stop();
+ __smatch_exit();
}
static u16
Now my screen is just flooded with 93160 lines of this nonsense:
kernel/events/core.c:41 get_current() select value from type_value where type = '(struct fxregs_state)->mxcsr';
Ugh… Why is doing that select 93k times? There is probably a better tool to answer this question but I am a caveman and here is my method. I grep the Smatch source to find the “select value from type_value” line and then I introduce a NULL dereference.
diff --git a/smatch_type_val.c b/smatch_type_val.c
index bc84347a6fb1..858076a0287a 100644
--- a/smatch_type_val.c
+++ b/smatch_type_val.c
@@ -109,6 +109,11 @@ int get_db_type_rl(struct expression *expr, struct range_list **rl)
cached_results[res_idx].expr = expr;
cached_results[res_idx].rl = NULL;
+ if (option_debug) {
+ int *null = NULL;
+ *null = 1;
+ }
+
run_sql(get_vals, &db_vals,
"select value from type_value where type = '%s';", member);
free_string(member);
Then I rerun Smatch using valgrind. The command is “kchecker –valgrind kernel/events/core.c”. This took forever, so I created a option to just to run Smatch on one function. “kchecker –valgrind –function=perf_ustack_task_size kernel/events/core.c”. And then I needed to delete the “if (!addr || addr >= TASK_SIZE)” statement as well. Valgrind prints the following stack trace.
get_db_type_rl (smatch_type_val.c:114)
var_to_absolute_rl (smatch_math.c:1085)
handle_variable (smatch_math.c:1122)
get_rl_sval (smatch_math.c:1608)
get_rl_helper (smatch_math.c:1667)
custom_get_absolute_rl (smatch_math.c:1863)
get_user_rl (smatch_kernel_user_data.c:935)
match_assign (check_truncated_user_comparison.c:52)
pass_expr_to_client (smatch_hooks.c:135)
__pass_to_client (smatch_hooks.c:163)
parse_assignment (smatch_flow.c:588)
__split_expr (smatch_flow.c:695)
I see a get_user_rl() in there, but I still don’t think it’s related to the user_rl re-write. I also see parse_assignment() and in my code that function is slightly different than the published code but if anything it should be faster. I can’t see what changed. And when I think about this logically, the code isn’t doing anything wrong, it’s just slow.
I would add caching but that won’t work because it’s looking about thousands of different types. And also the get_db_type_rl() already does caching.
Another thing I think is that the get_user_rl() function could return early based on if we have set any states to user data. A couple weeks ago this wouldn’t have worked because of how skb->data was handled but after the user data re-write perhaps it’s possible. In the end, I will have to add some kind of short cut to say that get_current() doesn’t return user data without doing 93 thousands database queries.
But also I think the all the code in smatch_struct_assignment.c needs to be re-thought. It seemed like such a cool API when I wrote it but I have been realizing more and more that it’s just a massive waste of time and handling side effects is complicated.
So there will probably need to be several fixes to speed up the performance. But now that I know where the issue is, I can start to address it.
Leave a comment