Linked Lists in the Linux Kernel

Posted on 2015-05-17 · Category: Programming

Data Structure

The Linux kernel implements linked lists in an unusual way. Normally, a list node structure contains the data it carries, as with the my_item member of the struct below:

struct my_list {
    void *my_item;
    struct my_list *next;
    struct my_list *prev;
};

The kernel instead defines only a minimal struct, list_head, which is a node in a circular, doubly linked list:


struct list_head {
    struct list_head *next, *prev;
};

To use list_head, embed it directly as a member of your own struct:

struct my_cool_list {
    struct list_head list; /* kernel's list structure */
    int my_cool_data;
    void *my_cool_void;
};

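It looks as if the list were contained in the data it links together. The advantage is that you do not need to define a separate list structure (and separate list code) for every data type: you simply "attach" a data type to the list by embedding a list_head in it, and the same list operations then work for all such types.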

Usage

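The kernel's linked-list implementation lives in include/linux/list.h, and a simplified version that can be used outside the kernel is available here. Below is a usage example (source file: test_list.c):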

#include <stdio.h>
#include <stdlib.h>
#include "list.h"

struct kool_list {
    int to;
    struct list_head list;
    int from;
};

int main(int argc, char **argv)
{
    struct kool_list *tmp;
    struct list_head *pos, *q;
    unsigned int i;

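    struct kool_list mylist;
    INIT_LIST_HEAD(&mylist.list); // initialize the list head

    // add elements to mylist
    for (i = 5; i != 0; --i) {
        // allocate a new object and initialize it from user input
        tmp = malloc(sizeof(*tmp));
        if (tmp == NULL) {
            perror("malloc");
            return 1;
        }
        printf("enter to and from: ");
        if (scanf("%d %d", &tmp->to, &tmp->from) != 2) {
            free(tmp); // bad input: discard this object and stop reading
            break;
        }

        // add tmp to mylist; list_add_tail() would append it at the end instead
        list_add(&(tmp->list), &(mylist.list));
    }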
    printf("\n");
    
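    // list_for_each() is a macro that expands to a for loop
    // arg 1: the loop cursor, pointing to the current object's list member
    // arg 2: pointer to the list head
    printf("traversing the list using list_for_each():\n");
    list_for_each(pos, &mylist.list) {
        // As noted above, pos (like pos->next) points to a list member,
        // not to the to and from fields we care about. list_entry computes
        // the address of the enclosing struct kool_list from the address of
        // its list member, which lets us reach to and from. This mechanism
        // is explained in detail later in this article.
        tmp = list_entry(pos, struct kool_list, list);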
        printf("to = %d from = %d\n", tmp->to, tmp->from);
    }
    printf("\n");
    
    // list_for_each_entry() is more convenient: it yields the object directly
    printf("traversing the list using list_for_each_entry()\n");
    list_for_each_entry(tmp, &mylist.list, list)
        printf("to = %d from = %d\n", tmp->to, tmp->from);
    printf("\n");
    
    // To delete or move objects while traversing, use the safer
    // list_for_each_safe(), which saves the next node in q before the body runs
    printf("deleting the list using list_for_each_safe(): \n");
    list_for_each_safe(pos, q, &mylist.list) {
        tmp = list_entry(pos, struct kool_list, list);
        printf("free item to = %d from = %d\n", tmp->to, tmp->from);
        list_del(pos);
        free(tmp);
    }

    return 0;
}

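Put test_list.c and list.h in the same directory, then compile and run. The code above shows only the basic operations; to learn about the rest, read the source of list.h.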

How It Works

The heart of this implementation is the list_entry macro:

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

The macro call in the example above expands to:

((struct kool_list *)((char *)(pos) - (unsigned long)(&((struct kool_list *)0)->list)))

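Given pos, the address of a struct list_head, list_entry computes the address of the struct kool_list that contains this list member. To do so it first needs the position (memory offset) of the member list within the struct; subtracting that offset from pos then gives the address of the struct.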

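The remaining question is how list_entry obtains this offset. Suppose we have a struct foo_bar; the following expression computes the offset of its member boo within the struct: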

(unsigned long)(&((struct foo_bar *)0)->boo)

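Address 0 is cast to a pointer to the type we want, struct foo_bar, and then the address of the member we care about is taken. Because the struct is treated as starting at address 0, that address is exactly the member's offset within the struct. We already know the member's absolute address in a real struct instance (pos), so subtracting the offset gives the address of the instance. The following test program (source file: compute_offset.c) verifies this: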

#include <stdio.h>
#include <stdlib.h>

struct foobar {
    unsigned int foo;
    char bar;
    char boo;
};

int main(int argc, char** argv) 
{
    struct foobar tmp;

    printf("address of &tmp is= %p\n\n", (void *) &tmp);
    printf("address of tmp->foo= %p \t offset of tmp->foo= %lu\n",
            (void *) &tmp.foo, (unsigned long) &((struct foobar *)0)->foo);
    printf("address of tmp->bar= %p \t offset of tmp->bar= %lu\n",
            (void *) &tmp.bar, (unsigned long) &((struct foobar *)0)->bar);
    printf("address of tmp->boo= %p \t offset of tmp->boo= %lu\n\n",
            (void *) &tmp.boo, (unsigned long) &((struct foobar *)0)->boo);

    // %p expects a void *, so each computed struct pointer is cast once more
    printf("computed address of &tmp using:\n");
    printf("address and offset of tmp->foo= %p\n",
           (void *) (struct foobar *) (((char *) &tmp.foo) - ((unsigned long) &((struct foobar *)0)->foo)));
    printf("address and offset of tmp->bar= %p\n",
           (void *) (struct foobar *) (((char *) &tmp.bar) - ((unsigned long) &((struct foobar *)0)->bar)));
    printf("address and offset of tmp->boo= %p\n",
           (void *) (struct foobar *) (((char *) &tmp.boo) - ((unsigned long) &((struct foobar *)0)->boo)));

    return 0;
}

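Compiling and running it produces output like the following (the exact addresses depend on the system and will usually differ between runs):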

address of &tmp is= 0x7ffcf635c9b0

address of tmp->foo= 0x7ffcf635c9b0      offset of tmp->foo= 0
address of tmp->bar= 0x7ffcf635c9b4      offset of tmp->bar= 4
address of tmp->boo= 0x7ffcf635c9b5      offset of tmp->boo= 5

computed address of &tmp using:
address and offset of tmp->foo= 0x7ffcf635c9b0
address and offset of tmp->bar= 0x7ffcf635c9b0
address and offset of tmp->boo= 0x7ffcf635c9b0

As the example shows, the address of the struct variable can be computed from any of its members, so the list member can be placed anywhere in the struct.

Notes

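This article is compiled from the two articles below. Their original links are no longer reachable, so links to their historical snapshots on archive.org are given instead: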