The Cross Function DB

Smatch saves a variety of information in the cross function DB. For example, it saves that ‘(struct foo)->bar’ is in units of type byte or that it holds the values 32,48. But the most interesting data is in the caller_info and return_states tables, which hold information about how functions are called and what they return.

The cross function DB is optional. Building it is simple but takes a long time (hours and hours). For the Linux kernel, run smatch_scripts/build_kernel_data.sh. That creates a smatch_db.sqlite file.

Imagine you have a call tree foo() which calls bar() which calls baz(). Then the first time you build the cross function database, it will record the values that foo() passes to bar(). The next time you rebuild the database, it can use that information to be more accurate about what bar() passes to baz(). So each time you rebuild the database, it gets more accurate. Probably after five rebuilds the database is basically populated. I download the latest linux-next and rebuild my database every day. Eventually my database gets to around 35GB before it stops growing.
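
To make the rebuild cycle concrete, here is a minimal sketch of such a call tree. The function bodies and the argument values are invented for illustration; only the foo() -> bar() -> baz() shape comes from the description above.

```c
/* Hypothetical call tree: foo() -> bar() -> baz(). */

/* Leaf function. What it receives depends on what bar() was given. */
int baz(int len)
{
	return len * 2;
}

/* Middle function: passes its own parameter, plus 4, on to baz(). */
int bar(int size)
{
	return baz(size + 4);
}

/* Top of the tree: calls bar() with the constants 10 and 20. */
int foo(void)
{
	return bar(10) + bar(20);
}
```

Roughly speaking, the first build can see the constants 10 and 20 at the call sites in foo() and record them for bar(). Nothing is recorded about size yet when bar() itself is parsed, so what reaches baz() is still unknown. On the next rebuild the recorded values for size are available, so Smatch can narrow down what baz() receives (14 and 24 in this sketch).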

Besides the on-disk database, Smatch uses an in-memory database for handling inline functions. The code is the same, so inline functions are handled transparently.

Each time a function is called, that is recorded in the DB. Some functions like kmalloc() are called too often to record every call in the DB, so that information mostly gets deleted. [Edit: Actually kmalloc() is a static inline so it is not deleted.] There are a lot of limits like this where, if there is too much information, Smatch tries to filter and delete stuff. The database is a best effort thing.

For return statements consider the following two functions:

int my_function1(int param)
{
	if (param != VALID)
		return -EINVAL;
	frob(); frob(); frob();
	
	return 0;
}

int my_function2(int param)
{
	int ret = 0;
	
	if (param == VALID) {
		frob(); frob(); frob();
	} else {
		ret = -EINVAL;
	}
	
	return ret;
}

Those two functions are functionally identical, so we want the database to record the same information for both. So when Smatch encounters a “return ret;”, it tries to split it apart in a meaningful way and record it as two returns instead of one.
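
Here is a sketch of why the split matters to a caller. The value of VALID, the frob_calls counter and the caller() function are all made up for this example; the comments describe what a split return lets a checker conclude, not Smatch's actual output.

```c
#include <errno.h>

#define VALID 1		/* hypothetical value, VALID is not defined in the post */

int frob_calls;		/* stand-in side effect so that frob() is observable */

void frob(void)
{
	frob_calls++;
}

int my_function2(int param)
{
	int ret = 0;

	if (param == VALID) {
		frob(); frob(); frob();
	} else {
		ret = -EINVAL;
	}

	return ret;
}

/* Hypothetical caller that branches on the return value. */
int caller(int param)
{
	int ret;

	frob_calls = 0;
	ret = my_function2(param);
	if (ret)
		return ret;	/* failure return: ret is -EINVAL, frob() never ran */

	/* success return: ret is 0, so param was VALID and frob() ran three times */
	return frob_calls;
}
```

If “return ret;” were stored as a single return, all the database could say is that ret is 0 or -EINVAL and that frob() may or may not have run. Stored as two returns, each return value is tied to its own state, which is the same information my_function1() gives directly.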

Smatch does some tricky stuff to make functions easier to parse. The simplest thing is that Smatch creates fake return statements at the end of void functions. Another thing that Smatch does is create fake assignments for certain parameters and return statements. Don’t worry about this for now.
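
As a rough picture of the fake return statement, the two functions below behave identically. The names and the resets counter are hypothetical, and the second function only mirrors the idea; it is not a dump of what Smatch generates internally.

```c
int resets;	/* hypothetical counter, only here to give the functions an effect */

/* As written by the programmer: execution just falls off the end. */
void reset_counter_as_written(int *counter)
{
	if (!counter)
		return;
	*counter = 0;
	resets++;
}

/* Conceptually what gets parsed: an explicit return closes the function. */
void reset_counter_as_parsed(int *counter)
{
	if (!counter)
		return;
	*counter = 0;
	resets++;
	return;		/* the "fake" return statement */
}
```

With the extra statement in place, every path out of the function ends in a return, so presumably the same code that handles ordinary returns can handle void functions too.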

Use the smatch_data/db/smdb.py script to explore the cross function database.

# how m88e1318_led_blink_set() is called
$ smdb.py m88e1318_led_blink_set

# what m88e1318_led_blink_set() returns
$ smdb.py return_states m88e1318_led_blink_set

There is a lot of information in the DB about where function pointers are called, which functions implement the function pointers, and where struct members are set. There is also an option to print call trees where preempt is disabled.
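
For readers less used to kernel-style ops tables, this sketch shows the three kinds of places that sentence refers to. The struct, the function names and the delay value are invented for illustration and are not taken from any real driver.

```c
/* Hypothetical ops table with a single function pointer member. */
struct led_ops {
	int (*blink_set)(int index, unsigned long delay_on);
};

/* A function that implements the blink_set pointer. */
int demo_led_blink_set(int index, unsigned long delay_on)
{
	return (index < 0 || delay_on == 0) ? -1 : 0;
}

/* Where the struct member is set. */
struct led_ops demo_ops = {
	.blink_set = demo_led_blink_set,
};

/* Where the function pointer is called. */
int led_blink(struct led_ops *ops, int index)
{
	return ops->blink_set(index, 500);
}
```

When reading code, the call in led_blink() gives no hint about which function actually runs. Being able to look up the implementations and the assignment sites is what makes the DB handy for navigation.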

The Smatch cross function DB is obviously useful for generating Smatch warnings, but it’s also just really useful for reading kernel code. Explore. 🙂 Have fun!
