Softlockup Analysis



weijitao · Published 2016-11-08 20:35:32

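When running tests on Linux machines, softlockup errors are reported fairly often, and they can end in a system panic. Softlockup is a kernel mechanism for detecting lockups. In 2.6.x kernels it is implemented in softlockup.c; in the newer 3.x kernels it is implemented in watchdog.c.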

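Softlockup detects the case where the kernel goes a long time without scheduling. It works by starting a FIFO kernel thread with priority MAX_RT_PRIO - 1, and that thread refreshes a timestamp every time it runs. If the timestamp has not been updated for longer than the configured threshold, a softlockup error is reported.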
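The timestamp-and-threshold idea above can be modeled in user space. This is a minimal sketch, not the kernel's code: the struct and function names are made up for illustration, and time is passed in as plain seconds instead of being read from a clock.

```c
/* Hypothetical user-space model; the kernel keeps this state per CPU. */
struct soft_watchdog {
	unsigned long touch_ts;	/* seconds: last time the watchdog thread ran */
	unsigned long thresh;	/* seconds: allowed gap before reporting */
};

/* Called by the high-priority thread whenever it gets to run. */
void wd_touch(struct soft_watchdog *wd, unsigned long now)
{
	wd->touch_ts = now;
}

/* Returns how long the timestamp has been stale, or 0 if within thresh. */
unsigned long wd_stalled_for(const struct soft_watchdog *wd, unsigned long now)
{
	if (now > wd->touch_ts + wd->thresh)
		return now - wd->touch_ts;
	return 0;
}
```

With a threshold of 20 and a last touch at t = 100, wd_stalled_for returns 0 up to and including t = 120 and returns 21 at t = 121.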

The rest of this article walks through the code to show how softlockup is implemented. During boot, the call chain is:

lockup_detector_init → watchdog_enable_all_cpus

watchdog_enable_all_cpus creates one kernel thread on every CPU, as described by watchdog_threads.

static struct smp_hotplug_thread watchdog_threads = {
	.store			= &softlockup_watchdog,
	.thread_should_run	= watchdog_should_run,
	.thread_fn		= watchdog,		/* thread body */
	.thread_comm		= "watchdog/%u",	/* %u is the CPU number */
	.setup			= watchdog_enable,	/* called when the thread is created */
	.cleanup		= watchdog_cleanup,
	.park			= watchdog_disable,
	.unpark			= watchdog_enable,
};

The kernel thread is named watchdog/%u, so the per-CPU threads are watchdog/0, watchdog/1, and so on. Its thread function is watchdog. When the thread is created, watchdog_enable is called:

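static void watchdog_enable(unsigned int cpu)
{
	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);

	/* set up the per-CPU hrtimer and its callback */
	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	hrtimer->function = watchdog_timer_fn;

	/* start the timer, pinned to this CPU */
	hrtimer_start(hrtimer, ns_to_ktime(sample_period), HRTIMER_MODE_REL_PINNED);

	/* make this thread SCHED_FIFO at the top priority, then set the timestamp */
	watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);
	__touch_watchdog();
}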

First, it fetches the current CPU's per-CPU variable watchdog_hrtimer and initializes that hrtimer. The hrtimer's handler is set to watchdog_timer_fn, and the hrtimer is then started.

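Next, it sets the thread's scheduling policy and priority. The kernel thread is a real-time FIFO thread with priority MAX_RT_PRIO - 1, which is the highest priority, so that when it is woken it can preempt any lower-priority thread or process.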

Finally, __touch_watchdog refreshes the timestamp stored in the per-CPU variable watchdog_touch_ts.
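For comparison, the scheduling setup has a rough user-space counterpart. This sketch is not what watchdog_set_prio does internally; it only uses the POSIX calls sched_get_priority_max and sched_setscheduler, and the two function names are invented for illustration.

```c
#include <sched.h>

/* Highest SCHED_FIFO priority that user space may request. */
int fifo_top_priority(void)
{
	return sched_get_priority_max(SCHED_FIFO);
}

/*
 * Try to make the calling thread SCHED_FIFO at the top priority.
 * Returns 0 on success, or -1 with errno set; without the required
 * privilege this usually fails with EPERM.
 */
int become_fifo_top(void)
{
	struct sched_param sp = { .sched_priority = fifo_top_priority() };

	return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

A thread running at this priority is chosen ahead of every normal task on its CPU, which is the property the watchdog thread relies on.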

The main job of watchdog, the thread function of watchdog/%u, is to call __touch_watchdog to refresh the timestamp. The thread is woken from watchdog_timer_fn.
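How often the thread is woken depends on sample_period, whose computation is not shown in this article. The sketch below is an assumption based on kernel/watchdog.c from kernels of the same era, where the soft lockup threshold is twice watchdog_thresh and the timer fires five times per threshold. The helper names are illustrative, and the relation should be checked against the kernel version in use.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Assumed relation: soft lockup threshold (seconds) = 2 * watchdog_thresh. */
unsigned int softlockup_thresh_secs(unsigned int watchdog_thresh)
{
	return watchdog_thresh * 2;
}

/* Assumed relation: hrtimer period (ns) = threshold / 5. */
uint64_t sample_period_ns(unsigned int watchdog_thresh)
{
	return softlockup_thresh_secs(watchdog_thresh) * (NSEC_PER_SEC / 5);
}
```

Under these assumptions, a watchdog_thresh of 10 seconds gives a 20 second soft lockup threshold and a timer period of 4 seconds, so the thread gets several chances to refresh the timestamp before an error is reported.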


Now let's look at the hrtimer handler, watchdog_timer_fn:

static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
	unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;

	/* wake this CPU's watchdog thread so it can refresh the timestamp */
	wake_up_process(__this_cpu_read(softlockup_watchdog));

	/* re-arm the timer for the next sample period */
	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

	if (touch_ts == 0) {
		if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
			__this_cpu_write(softlockup_touch_sync, false);
			sched_clock_tick();
		}
		__touch_watchdog();
		return HRTIMER_RESTART;
	}


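First, it reads the per-CPU variable watchdog_touch_ts into touch_ts, which is the timestamp of the last refresh. The variable sysctl_softlockup_all_cpu_backtrace is set by the user, either with the sysctl command or through the /proc/sys/kernel/softlockup_all_cpu_backtrace interface.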

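Next, it calls wake_up_process to wake the watchdog thread on this CPU. Once the watchdog thread gets to run, it refreshes the timestamp. If preemption has been disabled on that CPU, the watchdog thread cannot be scheduled even though it has been woken, so the timestamp is not updated.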
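The interaction between the timer and the watchdog thread can be illustrated with a small deterministic simulation. This is a sketch under simplifying assumptions: it is a user-space model, every name in it is hypothetical, time is counted in whole seconds, and the special handling of touch_ts == 0 in the kernel code is left out.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU. */
struct cpu_model {
	unsigned long now;	/* current time in seconds */
	unsigned long touch_ts;	/* last time the watchdog thread ran */
	bool can_schedule;	/* false while the CPU never reschedules */
};

/*
 * One timer tick. The timer callback always runs, but the woken
 * watchdog thread only runs, and so only refreshes touch_ts, when
 * the CPU is able to schedule it.
 */
void timer_tick(struct cpu_model *cpu, unsigned long period)
{
	cpu->now += period;
	if (cpu->can_schedule)
		cpu->touch_ts = cpu->now;
}

/* Number of ticks until touch_ts is older than thresh, or -1 if never. */
int ticks_until_stale(struct cpu_model *cpu, unsigned long period,
		      unsigned long thresh, int max_ticks)
{
	for (int i = 1; i <= max_ticks; i++) {
		timer_tick(cpu, period);
		if (cpu->now - cpu->touch_ts > thresh)
			return i;
	}
	return -1;
}
```

With a 4 second period and a 20 second threshold, a CPU that can schedule never goes stale, while a CPU that cannot schedule is flagged on the sixth tick, 24 seconds after the last refresh.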