libsvm代码阅读(3)：关于Cache类的分析

浏览数：30 / 时间：2015年06月20日

下面来分析Cache类的源码，该类位于svm.cpp中。这个类的主要功能是：负责运算所涉及的内存管理，包括申请、释放等。

简单来说：这个Cache类，首先通过Cache构造函数申请一块空间，这块空间的大小是：L个head_t大小的空间。然后get_data函数保证结构head_t中至少有len个float的内存，并且将可以使用的内存块的指针放在data指针中；而swap_index函数则是用于交换head[i]和head[j]。

Cache类的定义如下：

[cpp] view plain copy

class Cache
{
public:
Cache(int l,long int size);
~Cache();
// request data [0,len)
// return some position p where [p,len) need to be filled
// (p >= len if nothing needs to be filled)
int get_data(const int index, Qfloat **data, int len);
void swap_index(int i, int j);
private:
int l;
long int size;//所指定的全部内存，据说用Mb做单位
//结构head_t用来记录所申请内存的指针，并记录长度，而且通过双向的指针，形成链表，增加寻址的速度
struct head_t
{
head_t *prev, *next; // a circular list是一个双向链表，非循环链表
Qfloat *data;
int len; // data[0,len) is cached in this entry
};
head_t *head;//变量指针，该指针记录程序中所申请的内存。
head_t lru_head;//双向链表的表头
void lru_delete(head_t *h);//从双向链表中删去某个元素的链接，一般是删去当前所指向的元素
void lru_insert(head_t *h);//在链表后面插入一个新的链接
};

主要包含：

两个int变量，分别是l和size，l是样本个数，size是指定的全部内存；
一个构造函数Cache，该函数根据样本数l申请L个head_t的空间；
一个析构函数～Cache，不用多说；
一个双向链表的结构head_t；
一个get_data函数，具体下文再说；
一个swap_index函数，交换两个head_t的值；
一个双向链表的删除函数lru_delete，一个插入函数lru_insert；
一个变量指针head，该指针指向程序中所申请的内存。

关于上面的结构体head_t，它是一个双向链表方便前后内存的访问，在文献LIBSVM：A Libray for SVM中如下说法：

构造函数的代码如下：

[cpp] view plain copy

Cache::Cache(int l_,long int size_):l(l_),size(size_)
{
//calloc函数的功能与malloc函数的功能相似，都是从堆分配内存该函数与malloc函数的一个显著不同时
//是，calloc函数得到的内存空间是经过初始化的，其内容全为0。
head = (head_t *)calloc(l,sizeof(head_t)); // initialized to 0
size /= sizeof(Qfloat);//先将原来byte数目转化为float的数目。
size -= l * sizeof(head_t) / sizeof(Qfloat);//扣除掉L个head_t的内存数目
size = max(size, 2 * (long int) l); // cache must be large enough for two columns
lru_head.next = lru_head.prev = &lru_head;
}

该函数根据实参：样本数L，申请L个head_t的空间，初始化为0；size的处理是：先将输入的size值（byte单位）转化为float的数目，然后再减去L个head_t所占的空间；其中lru_head因为尚没有head_t中申请到内存，故双向链表指向自己。

析构函数的代码如下：

[cpp] view plain copy

Cache::~Cache()
{
for(head_t *h = lru_head.next; h != &lru_head; h=h->next)
free(h->data);
free(head);
}

这个很简单，不用多说。

双向链表的删去函数代码：

[cpp] view plain copy

void Cache::lru_delete(head_t *h)
{
// delete from current location
h->prev->next = h->next;
h->next->prev = h->prev;
}

该函数用于后面swap_index函数断开双向链表的实现。只是断开链接，没有删去数据。

双向链表的插入函数代码：

[cpp] view plain copy

void Cache::lru_insert(head_t *h)
{
// insert to last position
h->next = &lru_head;
h->prev = lru_head.prev;
h->prev->next = h;
h->next->prev = h;
}

该函数用于后面swap_index函数恢复前面断开连接的两个数据的实现。

get_data函数的代码：

[cpp] view plain copy

int Cache::get_data(const int index, Qfloat **data, int len)
{
head_t *h = &head[index];
if(h->len) lru_delete(h);
int more = len - h->len;
<span style="white-space:pre"> </span>//因为shrink后h->len可能变小，那么more>0，现在要做的就是重新filled，即把内存变成len那么大（主要是通过realloc实现的）
if(more > 0)
{
// free old space，这一步一般不会运行
while(size < more) //size为所指定的全部内存
{
head_t *old = lru_head.next;
lru_delete(old);
free(old->data);
size += old->len;
old->data = 0;
old->len = 0;
}
// allocate new space
h->data = (Qfloat *)realloc(h->data,sizeof(Qfloat)*len);//把内存扩大到len
size -= more;
swap(h->len,len);
}
lru_insert(h);
*data = h->data;
return len;
}

该函数主要的就是每个head[i]具有len大小的内存空间。关于realloc函数是扩大内存用的。

swap_index函数的代码：

[cpp] view plain copy

void Cache::swap_index(int i, int j)
{
if(i==j) return;
if(head[i].len) lru_delete(&head[i]);//断开双向链表
if(head[j].len) lru_delete(&head[j]);//断开双向链表
swap(head[i].data,head[j].data);
swap(head[i].len,head[j].len);
if(head[i].len) lru_insert(&head[i]);
if(head[j].len) lru_insert(&head[j]);
if(i>j) swap(i,j);
for(head_t *h = lru_head.next; h!=&lru_head; h=h->next)
{
if(h->len > i)
{
if(h->len > j)
swap(h->data[i],h->data[j]);
else
{
// give up
lru_delete(h);
free(h->data);
size += h->len;
h->data = 0;
h->len = 0;
}
}
}
}

这个函数就是交换head_t[i]和head_t[j]的内容。首先将head_t[i]和head_t[j]从双向链表中脱离出去，然后交换它们的内容，最后重新把它们恢复到链表中。

完整代码如下：

[cpp] view plain copy

//
// Kernel Cache
//
// l is the number of total data items
// size is the cache size limit in bytes
//
//类Cache负责运算所涉及的内存的管理，包括申请、释放等
class Cache
{
public:
Cache(int l,long int size);
~Cache();
// request data [0,len)
// return some position p where [p,len) need to be filled
// (p >= len if nothing needs to be filled)
int get_data(const int index, Qfloat **data, int len);
void swap_index(int i, int j);
private:
int l;
long int size;//所指定的全部内存，据说用Mb做单位
//结构head_t用来记录所申请内存的指针，并记录长度，而且通过双向的指针，形成链表，增加寻址的速度
struct head_t
{
head_t *prev, *next; // a circular list是一个双向链表，非循环链表
Qfloat *data;
int len; // data[0,len) is cached in this entry
};
head_t *head;//变量指针，该指针记录程序中所申请的内存。
head_t lru_head;//双向链表的表头
void lru_delete(head_t *h);//从双向链表中删去某个元素的链接，一般是删去当前所指向的元素
void lru_insert(head_t *h);//在链表后面插入一个新的链接
};
Cache::Cache(int l_,long int size_):l(l_),size(size_)
{
//calloc函数的功能与malloc函数的功能相似，都是从堆分配内存该函数与malloc函数的一个显著不同时
//是，calloc函数得到的内存空间是经过初始化的，其内容全为0。
head = (head_t *)calloc(l,sizeof(head_t)); // initialized to 0
size /= sizeof(Qfloat);//先将原来byte数目转化为float的数目。
size -= l * sizeof(head_t) / sizeof(Qfloat);//扣除掉L个head_t的内存数目
size = max(size, 2 * (long int) l); // cache must be large enough for two columns
lru_head.next = lru_head.prev = &lru_head;
}
Cache::~Cache()
{
for(head_t *h = lru_head.next; h != &lru_head; h=h->next)
free(h->data);
free(head);
}
void Cache::lru_delete(head_t *h)
{
// delete from current location
h->prev->next = h->next;
h->next->prev = h->prev;
}
void Cache::lru_insert(head_t *h)
{
// insert to last position
h->next = &lru_head;
h->prev = lru_head.prev;
h->prev->next = h;
h->next->prev = h;
}
//该函数保证head_t[index]中至少有len个float的内存，并且将可以使用的内存块的指针放在data指针中，返回值为申请到的内存
int Cache::get_data(const int index, Qfloat **data, int len)
{
head_t *h = &head[index];
if(h->len) lru_delete(h);
int more = len - h->len;
if(more > 0)
{
// free old space
while(size < more) //size为所指定的全部内存
{
head_t *old = lru_head.next;
lru_delete(old);
free(old->data);
size += old->len;
old->data = 0;
old->len = 0;
}
// allocate new space
h->data = (Qfloat *)realloc(h->data,sizeof(Qfloat)*len);
size -= more;
swap(h->len,len);
}
lru_insert(h);
*data = h->data;
return len;
}
void Cache::swap_index(int i, int j)
{
if(i==j) return;
if(head[i].len) lru_delete(&head[i]);//断开双向链表
if(head[j].len) lru_delete(&head[j]);//断开双向链表
swap(head[i].data,head[j].data);
swap(head[i].len,head[j].len);
if(head[i].len) lru_insert(&head[i]);
if(head[j].len) lru_insert(&head[j]);
if(i>j) swap(i,j);
for(head_t *h = lru_head.next; h!=&lru_head; h=h->next)
{
if(h->len > i)
{
if(h->len > j)
swap(h->data[i],h->data[j]);
else
{
// give up
lru_delete(h);
free(h->data);
size += h->len;
h->data = 0;
h->len = 0;
}
}
}
}