Docker, Orphan Processes, and Zombie Processes

liukuan73 · published 2017-09-20 18:23:31

Source: https://yq.aliyun.com/articles/61894

Preface

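On Unix/Linux systems, a child process is normally created by its parent through fork. The child's termination is asynchronous with respect to the parent's execution, so the parent cannot predict when the child will end. Once a process has finished its work and terminated, its parent needs to call wait() or waitpid() to collect the child's termination status.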
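
As a minimal sketch of the status collection described above: the parent forks a child, blocks in waitpid() until the child terminates, and decodes the exit code. The helper name `run_child_and_wait` is made up for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then collect
 * its termination status in the parent.
 * Returns the child's exit code, or -1 on error or abnormal termination. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                    /* child: terminate immediately */

    int status;
    if (waitpid(pid, &status, 0) < 0)   /* parent: block until the child ends */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_child_and_wait(7)` from `main` should return 7, since the parent reads back exactly the status the child passed to `_exit`.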

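Orphan processes

If a parent exits before its child, the child becomes an orphan process. Orphans are adopted by the init process (PID 1), which then collects their status (wait/waitpid) when they terminate.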

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
int main()
{
    pid_t pid;
    // Create a child process
    pid = fork();
    // fork failed
    if (pid < 0)
    {
        perror("fork error");
        exit(1);
    }
    // Child process
    if (pid == 0)
    {
        printf("I'm child process, pid:%d  ppid:%d\n", getpid(), getppid());
        // Sleep 3s so that the parent exits first
        sleep(3);
        // Print the child's PID and its (new) parent PID
        printf("I'm child process, pid:%d  ppid:%d\n", getpid(), getppid());
        printf("child process is exited.\n");
    }
    // Parent process
    else
    {
        printf("I'm father process, pid:%d  ppid:%d\n", getpid(), getppid());
        // Sleep 1s so that the child can print its IDs first
        sleep(1);
        printf("father process is  exited.\n");
    }
    return 0;
}

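Running the program shows that after the parent exits, the child's parent PID (ppid) becomes 1: it has been adopted by the init process.

Zombie processes

If a child exits and its parent does not call wait or waitpid to retrieve the child's status, the child's process descriptor stays in the system. Such a process is called a zombie process.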

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>

int main()
{
    pid_t pid;
    pid = fork();
    if (pid < 0)
    {
        perror("fork error");
        exit(1);
    }
    else if (pid == 0)
    {
        printf("I am child process - %d.I am exiting.\n", getpid());
        exit(0);
    }
    printf("I am father process- %d.I will sleep three seconds\n", getpid());
    // Wait for the child to exit first (without calling wait)
    sleep(3);
    // Print process information; the child shows up in state Z
    system("ps -o pid,ppid,state,command");
    printf("father process - %d is exiting.\n", getpid());
    return 0;
}

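In the output of the run, the child (pid=2158) has become a zombie process.

Why zombie processes are harmful

When a process exits, the kernel releases all of its resources, including open files and memory. It still keeps a small amount of information (PID, exit status, run time, and so on) until the parent retrieves it through wait/waitpid. If the parent never calls wait/waitpid, that information is never released and the PID stays occupied. The number of PIDs a system can use is limited, so a large number of zombies can prevent the system from creating new processes.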
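
The retained entry can be observed directly on Linux. The sketch below assumes /proc is mounted; `proc_state` and `observe_zombie` are names invented for this example. It reads the state letter from /proc/<pid>/stat while the exited child has not been waited for, then checks that the entry disappears once waitpid() is called.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read the one-letter state from /proc/<pid>/stat (Linux only).
 * Returns '?' if the entry cannot be read or parsed. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Layout is "pid (comm) state ..."; search from the end because
     * comm itself may contain ')'. */
    char *p = strrchr(buf, ')');
    return (p && p[1] == ' ' && p[2] != '\0') ? p[2] : '?';
}

/* Hypothetical demo: fork a child that exits at once, poll its state for up
 * to about one second without waiting for it, then reap it.
 * Returns the last state seen before the wait; *gone is set to 1 if the
 * /proc entry has disappeared after waitpid(). */
char observe_zombie(int *gone)
{
    struct timespec pause = {0, 10 * 1000 * 1000};   /* 10 ms */
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(0);

    char state = '?';
    for (int i = 0; i < 100; i++) {
        state = proc_state(pid);
        if (state == 'Z')
            break;
        nanosleep(&pause, NULL);
    }
    waitpid(pid, NULL, 0);                 /* releases the retained entry */
    *gone = (proc_state(pid) == '?');
    return state;
}
```

The state letter `Z` is the same marker that `ps -o state` prints for a zombie.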

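Orphan processes in Docker

Processes running in a Docker container usually have no init process. If you enter a container and run ps, you will see that the process with PID 1 is not init but the container's main process. So if an orphan process appears inside the container, who adopts it?

Here is the Linux kernel code that picks the new parent for orphaned processes: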

/*
 * When we die, we re-parent all our children, and try to:
 * 1. give them to another thread in our thread group, if such a member exists
 * 2. give it to the first ancestor process which prctl'd itself as a
 *    child_subreaper for its children (like a service manager)
 * 3. give it to the init process (PID 1) in our pid namespace
 */
static struct task_struct *find_new_reaper(struct task_struct *father,
                       struct task_struct *child_reaper)
{
    struct task_struct *thread, *reaper;

    thread = find_alive_thread(father);
    if (thread)
        return thread;

    if (father->signal->has_child_subreaper) {
        /*
         * Find the first ->is_child_subreaper ancestor in our pid_ns.
         * We start from father to ensure we can not look into another
         * namespace, this is safe because all its threads are dead.
         */
        for (reaper = father;
             !same_thread_group(reaper, child_reaper);
             reaper = reaper->real_parent) {
            /* call_usermodehelper() descendants need this check */
            if (reaper == &init_task)
                break;
            if (!reaper->signal->is_child_subreaper)
                continue;
            thread = find_alive_thread(reaper);
            if (thread)
                return thread;
        }
    }

    return child_reaper;
}

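In other words, the kernel looks for a new parent in this order:
1. another live thread in the same thread group;
2. the nearest ancestor in the process tree that is marked as a child_subreaper and is still running;
3. the process with PID 1 in that PID namespace.

For child_subreaper, see the description of the PR_SET_CHILD_SUBREAPER option of prctl. When a process is marked as a child subreaper, all of its children and their descendants are marked as having a subreaper, and the process takes on the role of init(1), adopting the orphans of its process tree: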

 PR_SET_CHILD_SUBREAPER (since Linux 3.4)
          If arg2 is nonzero, set the "child subreaper" attribute of the
          calling process; if arg2 is zero, unset the attribute.

          When a process is marked as a child subreaper, all of the
          children that it creates, and their descendants, will be
          marked as having a subreaper.  In effect, a subreaper fulfills
          the role of init(1) for its descendant processes.  Upon
          termination of a process that is orphaned (i.e., its immediate
          parent has already terminated) and marked as having a
          subreaper, the nearest still living ancestor subreaper will
          receive a SIGCHLD signal and will be able to wait(2) on the
          process to discover its termination status.
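
The behaviour quoted above can be tried out with a small experiment. This is only a sketch, assuming Linux 3.4 or later; the function name `orphan_adopted_by_subreaper` is invented here. The caller marks itself as a subreaper, creates a grandchild whose direct parent exits immediately, and lets the grandchild report its new parent PID through a pipe.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo. Returns 1 if the orphaned grandchild was re-parented
 * to the calling process, 0 if it went elsewhere, -1 on error.
 * Note: the subreaper attribute stays set on the caller afterwards. */
int orphan_adopted_by_subreaper(void)
{
    int fds[2];
    pid_t new_parent = -1;

    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0 || pipe(fds) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pid_t me = getpid();
        if (fork() == 0) {
            /* Grandchild: wait until the intermediate parent is gone,
             * then report who the new parent is. */
            struct timespec pause = {0, 10 * 1000 * 1000};   /* 10 ms */
            close(fds[0]);
            while (getppid() == me)
                nanosleep(&pause, NULL);
            pid_t p = getppid();
            if (write(fds[1], &p, sizeof p) != (ssize_t)sizeof p)
                _exit(1);
            _exit(0);
        }
        _exit(0);   /* intermediate parent exits, orphaning the grandchild */
    }

    close(fds[1]);
    waitpid(child, NULL, 0);                       /* reap the intermediate */
    ssize_t n = read(fds[0], &new_parent, sizeof new_parent);
    close(fds[0]);
    while (wait(NULL) > 0)
        ;                                          /* reap the adopted orphan */
    if (n != (ssize_t)sizeof new_parent)
        return -1;
    return new_parent == getpid();
}
```

Without the prctl call, the same grandchild would instead be handed to PID 1 of the PID namespace (or to an outer subreaper, if one exists).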

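Docker process tree

Starting with version 1.11, the architecture of the Docker daemon changed considerably. What used to be a single module was split into four independent ones: engine, containerd, runC, and containerd-shim. Container lifecycle management is handed to containerd, which in turn uses runC to run containers.
The architectural change also changed the process tree of a running container. Run a simple Docker image and inspect the process tree with ps xf -o pid,ppid,stat,args; the change in the daemon's architecture is visible in the tree: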
```
docker run -d --name ubuntu ubuntu:14.04 sleep 1000
```

[Figure: process tree with Docker 1.11 and later]
[Figure: process tree before Docker 1.11]

Producing an orphan process in Docker

Prepare two files, parent.sh and child.sh:

#parent.sh
bash ./child.sh

#child.sh
while true
do
    sleep 10
done

Start the container. The sleep process is the container's first process, with PID 1:

docker run -d -v `pwd`/parent.sh:/root/test/parent.sh -v `pwd`/child.sh:/root/test/child.sh --name test ubuntu:14.04 sleep 10000

Enter the container and run parent.sh:

# Enter the container
docker exec -it test /bin/bash
# Change to the script directory
cd /root/test
# Run the parent.sh script
bash ./parent.sh

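Inside the container, ps xf -o pid,ppid,stat,args shows the process tree. sleep is the container's start command, so its PID is 1. According to the previous section on how Linux re-parents orphans, when no other eligible process exists, this process becomes the adopter of orphaned processes.
Next, kill the process running parent.sh with kill -9. The process running child.sh then becomes an orphan. How does the Docker container handle its adoption? Versions before Docker 1.11 and versions from 1.11 onward behave differently.
First, the process tree inside the container before Docker 1.11 (screenshot in the original post): the parent of the process running child.sh has become 1, the sleep process.
Then the process tree inside the container with Docker 1.11 and later (screenshot in the original post): the parent of the child.sh process has become 0, so it sits at the same level as sleep. Who adopted this orphan?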

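To find out, look at the process tree on the host by running ps xf -o pid,ppid,stat,args there.

The host's process tree shows that the child.sh process was adopted by the docker-containerd-shim process. Given the kernel's re-parenting rules described above, docker-containerd-shim must be marked as a child_subreaper, which lets it adopt every orphan in the process tree rooted at it. In the docker/containerd source, the start function of containerd-shim sets child_subreaper through osutils.SetSubreaper: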
```
func start(log *os.File) error {
    // start handling signals as soon as possible so that things are properly reaped
    // or if runtime exits before we hit the handler
    signals := make(chan os.Signal, 2048)
    signal.Notify(signals)
    // set the shim as the subreaper for all orphaned processes created by the container
    if err := osutils.SetSubreaper(1); err != nil {
        return err
    }
    ...
}
```

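Conclusion

Before Docker 1.11, orphan processes are adopted by the process with PID 1 inside the container. From 1.11 onward, they are adopted by the docker-containerd-shim process.

Zombie processes in Docker

The concept and cause of zombie processes were covered above: a child exits, its parent does not call wait or waitpid to retrieve its status, and the child's process descriptor stays in the system. This section only discusses whether Docker's handling of orphan processes can lead to zombies, a problem for which early Docker versions were criticized.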

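Before 1.11

Before 1.11, orphans are adopted by the process with PID 1 inside the container. In the orphan experiment above, PID 1 is the sleep process, and sleep never calls wait/waitpid when a child exits. Killing the child.sh process therefore produces zombie processes (screenshot in the original post).
The screenshot shows that both the process running child.sh and a sleep process have become zombies. That sleep process (sleep 10) is a child of child.sh. When child.sh exits, it becomes an orphan and is adopted by the PID 1 sleep process. When it finishes, PID 1 does not call wait/waitpid, so it becomes a zombie as well.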
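
For contrast, here is a rough sketch of what a PID 1 that does reap could look like. It is an illustration only, not code from Docker or from any existing init; the name `run_as_init` is made up. It starts one main child and then keeps calling wait(), which also returns orphans that were re-parented to it. That adoption only happens when the program actually runs as PID 1 of the PID namespace or as a subreaper. A production init would also have to forward signals, which is left out here.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical minimal init: start argv as the main child, then reap every
 * child (including adopted orphans) until the main child terminates.
 * Returns the main child's exit code, 128 + signal number if it was killed
 * by a signal, or -1 on error. */
int run_as_init(char *const argv[])
{
    pid_t main_child = fork();
    if (main_child < 0)
        return -1;
    if (main_child == 0) {
        execvp(argv[0], argv);
        _exit(127);                  /* exec failed */
    }

    for (;;) {
        int status;
        pid_t pid = wait(&status);   /* blocks; returns any terminated child */
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (pid == main_child) {
            if (WIFEXITED(status))
                return WEXITSTATUS(status);
            if (WIFSIGNALED(status))
                return 128 + WTERMSIG(status);
        }
        /* Any other pid was an adopted orphan; reaping it is all it needs. */
    }
}
```

With `char *cmd[] = {"sh", "-c", "exit 3", NULL};`, a call to `run_as_init(cmd)` returns 3.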

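1.11 and later

From 1.11 onward, orphans are adopted by the docker-containerd-shim process. If docker-containerd-shim calls wait/waitpid when a child exits, no zombie is produced; otherwise one is. Repeating the same steps and killing the process running child.sh gives the following result.

Both child.sh and sleep (the child of child.sh) exit normally and no longer appear in the process tree, so no zombie is produced. docker-containerd-shim therefore does call wait/waitpid when a child exits. Here is how the source handles it: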

func start(log *os.File) error {
    ...
    switch s {
        case syscall.SIGCHLD:
            exits, _ := osutils.Reap(false)
            ...
    }
    ...
}

The start function handles the child-exit signal (SIGCHLD) by calling osutils.Reap(false), and osutils.Reap in turn calls syscall.Wait4:

func Reap(wait bool) (exits []Exit, err error) {
    ...

    for {
        pid, err := syscall.Wait4(-1, &ws, flag, &rus)
        if err != nil {
            if err == syscall.ECHILD {
                return exits, nil
            }
            return exits, err
        }

        ...
    }
}
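
The same reaping loop can be written in C with waitpid(). The sketch below is an analogue, not a translation of the elided parts of the Go code: it assumes the non-blocking mode corresponds to the WNOHANG flag, and the function names are invented for the example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Demo helper (hypothetical): fork n children that exit immediately. */
void spawn_exiting_children(int n)
{
    for (int i = 0; i < n; i++) {
        if (fork() == 0)
            _exit(0);
    }
}

/* Reap every child that has already exited, without blocking.
 * Assumption: WNOHANG plays the role of the non-blocking flag.
 * Returns the number of children reaped, or -1 on an unexpected error. */
int reap_exited_children(void)
{
    int reaped = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;                /* one zombie collected, look for more */
            continue;
        }
        if (pid == 0)
            return reaped;           /* children exist, none has exited yet */
        if (errno == ECHILD)
            return reaped;           /* no children left at all */
        if (errno == EINTR)
            continue;
        return -1;
    }
}
```

Looping until ECHILD matters because several children may exit before a single SIGCHLD is handled, so one signal can stand for more than one zombie.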

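Conclusion

Before Docker 1.11, whether an orphan can become a zombie depends on whether the process with PID 1 inside the container calls wait/waitpid when its children exit. With Docker 1.11 and later, orphan processes do not become zombies.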
