Writing a check for zero IRQ error codes

In the earliest git release of the Linux kernel the platform_get_irq() function used to return zero on error. It’s hard for me to know why this is. I do think that there is a sense where unsigned int is “cleaner” for this purpose. From a practical perspective negative error codes are standard. Also are we more likely to need BIT(31) or IRQ number zero? In retrospect, probably we should have made it return a signed int.

Eventually in 2006, we changed platform_get_irq() to return -ENXIO on error in commit 305b3228f9ff ([PATCH] driver core: platform_get_irq*(): return -ENXIO on error). The reason for this change was there were new architectures created where zero was valid IRQ number. I have asked people about this and they say those architectures have been removed and zero is no longer a valid IRQ number and will never be again.

Let’s just take a moment to look at that commit. This is an artifact from a very different time in kernel development when we didn’t know what we were doing. The commit JUST BROKE EVERY SINGLE DRIVER!!! In those days we used to break things and then fix them before the kernel was released to the public. These days we might still occasionally break things, but only by mistake and not deliberately.

Also what a headache for backporting! The code would compile without complaining either way but in old kernels the correct check was “if (!irq) {” and in the new kernels the check was “if (irq < 0) {“. Developers were confused so it became quite common to write “if (irq <= 0) {” which is not correct but it works for everyone except for the people who have that weird 2006 architecture where IRQ zero is valid. These days no one backports to kernels from 2006 and everyone should check “if (irq < 0) {“.

Unfortunately, although platform_get_irq() and platform_get_irq_byname() were updated a number of other IRQ functions were not. For example, msi_get_virq() still returns zero. So we are left with a confusing legacy mix of returns.

This causes a number of bugs. 1) People don’t know whether to check for zero or negative. 2) Developers forget to set the error code. 3) Signedness bugs. Let’s discuss these in reverse order.

A lot of kernel code declares irq numbers unsigned int. For example, request_irq() does this. So for a long time, until we turned on the -Wtype-limits warning, checking “if (u32_irq < 0) {” was a very common bug. Earlier I mentioned that a common format is “if (irq <= 0) {” and the -Wtype-limits warning will not catch a signedness bug in this format. The Smatch check for unsigned error codes will detect this bug.

In the kernel the normal way to write error handling code is:

ret = frob();
if (ret)
        goto free_something;

But dealing with zero returns is more complicated.

irq = returns_zero_failure();
if (!irq) {
        ret = -EINVAL;
        goto free_something;
}

irq = returns_zero_or_negative_on_failure();
if (irq <= 0) {
        ret = irq ?: -EINVAL;
        goto free_something;
}

The “ret = ” step is just another thing to forget to do. The mixed case is even more complicated and easy to get wrong. These are common bugs. Smatch will detect certain cases of this bug but the situation could be improved.

Anyway, the problem I want to address is where people check for negatives instead of zero. Let’s register a table, and a return_implies_param_key_exact() hook. The only real information that we need from the table is the function names. It feels wrong to add all the unnecessary information but I did it in case I ever create a generic way to register a full table as a one liner.


static struct ref_func_info zero_table[] = {
        { "irq_of_parse_and_map", ZERO_ERROR, -1, "$", &int_zero, &int_zero, NULL },
        { "msi_get_virq", ZERO_ERROR, -1, "$", &int_zero, &int_zero, NULL },
};

static void returns_zero_error(struct expression *expr, const char *name, struct symbol *sym, void *data)
{
        set_state(my_id, name, sym, &fail);
}

Let’s make this check work across functions as well. Add ZERO_ERROR to the list of db types in smatch.h. We can re-use the returns_zero_error() function to handle selects. The only thing we need to add is a insert function:

static void return_info_callback(int return_id, char *return_ranges, struct expression *returned_expr, int param, const char *printed_name, struct sm_state *sm)
{
        if (param != -1 ||
            strcmp(printed_name, "$") != 0)
                return;
        if (!returned_expr || 
            implied_not_equal(returned_expr, 0))
                return;
        if (!slist_has_state(sm->possible, &fail))
                return;

        sql_insert_return_states(return_id,
                return_ranges, ZERO_ERROR,
                param, printed_name, "");
}

At this point, it occurred to me to warn about places which return a mix of zero and error codes for error. And I thought, I could just add an if statement to the function above:

if (holds_kernel_error_codes(returned_expr))
        sm_warning("mixed zero and negative");

Then I thought, maybe the insert hooks are only run when we need to insert something because we’re using –info or because the function is inlined? Testing shows that it does seem to work, but let’s do this properly and make a separate hook.

static void match_return(struct expression *expr)
{
        char *name;

        if (!expr || implied_not_equal(expr, 0))
                return;
        if (!expr_has_possible_state(my_id,
                        expr, &fail))
                return;
        if (!holds_kernel_error_codes(expr))
                return;

        name = expr_to_str(expr);
        sm_warning("both zero and negative are errors '%s'", name);
        free_string(name);
}

The only thing left is to make a warning hook. Okay checks are “if (irq == 0) {“, “if (irq != 0) {” and “if (irq <= 0) {“. The comparisons that are wrong are “if (if < 0) {” and “if (irq >= 0) {” but the “>= 0” comparison can be either signed or unsigned. The “< 0” could be unsigned too, but that already generates a warning so we don’t need to check for it.

static void match_condition(struct expression *expr)
{
        if (expr->type != EXPR_COMPARE ||
            !expr_is_zero(expr->right))
                return;
        if (expr->op != '<' &&
            expr->op != SPECIAL_GTE &&
            expr->op != SPECIAL_UNSIGNED_GTE)
                return;

        if (!expr_has_possible_state(my_id,
                        expr->left, &fail))
                return;

        sm_warning("zero is also failure");
}

Here is the check_kernel_irq_of_parse_and_map() function that ties it all together:

void check_kernel_irq_of_parse_and_map(int id)
{
        int i;

        if (option_project != PROJ_KERNEL)
                return;

        my_id = id;

        for (i = 0; i < ARRAY_SIZE(zero_table); 
             i++) {
                struct ref_func_info *info = &zero_table[i];
                return_implies_param_key_exact(
                        info->name,
                        *info->implies_start,
                        *info->implies_end,
                        returns_zero_error,
                        info->param,
                        info->key, info);
        }

        add_modification_hook(my_id,
                              &set_undefined);
        add_hook(&match_condition, CONDITION_HOOK);
        add_hook(&match_return, RETURN_HOOK);

        add_return_info_callback(my_id, 
                        return_info_callback);
        select_return_param_key(ZERO_ERROR, 
                        &returns_zero_error);
}

Thanks for reading to the end. I will post again after I get some results from this test.

One response to “Writing a check for zero IRQ error codes”

  1. I had thought that I would write a follow up article about this. But it just basically worked okay. It wasn’t that interesting. It found 2 bugs and 1 false positive.

    Like

Leave a comment

Design a site like this with WordPress.com
Get started