目录

一、环境描述

二、pod失败状态

三、整体解决方案

四、补充一下Pod状态解释


一、环境描述

  1. 系统环境:CentOS Linux release 7.9.2009 (Core)
  2. 系统内核:Linux k8s-master01 5.4.153-1.el7.elrepo.x86_64 #1 SMP Tue Oct 12 08:16:11 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
  3. k8s版本:Kubernetes v1.20.14
  4. docker:Docker version 20.10.12
  5. 集群有外网, docker已经对接好网络镜像仓库。

二、pod失败状态

     1.拉取k8s网上的镜像失败后的状态

[root@k8s-master01 ~]# kubectl get po -n monitoring 
NAME                                   READY   STATUS              RESTARTS    AGE
kube-state-metrics-5d497ddb45-lvtxv     2/3   ImagePullBackOff      2          16h

     2.拉取本地的镜像失败后的状态

[root@k8s-master01 ~]# kubectl get po -n monitoring 
NAME                                   READY   STATUS              RESTARTS   AGE
prometheus-adapter-5c976b9ddb-chn6v    0/1     ErrImageNeverPull   0          171m
prometheus-adapter-5c976b9ddb-gbn8l    0/1     ErrImageNeverPull   0          171m
prometheus-adapter-7858d4ddfd-55lnq    0/1     ErrImageNeverPull   0          176m

    3.整体状态

[root@k8s-master01 ~]# kubectl get po -n monitoring 
NAME                                   READY   STATUS              RESTARTS   AGE
alertmanager-main-0                    2/2     Running             2          20h
alertmanager-main-1                    2/2     Running             16         20h
alertmanager-main-2                    2/2     Running             3          20h
blackbox-exporter-69894767d5-vbwz8     3/3     Running             3          20h
grafana-564b8d866c-gmgvr               1/1     Running             1          20h
kube-state-metrics-5d497ddb45-lvtxv    2/3     ImagePullBackOff    2          16h
node-exporter-4qsvn                    2/2     Running             2          20h
node-exporter-9d8bw                    2/2     Running             16         20h
node-exporter-ffs7q                    2/2     Running             2          20h
node-exporter-qjdv9                    2/2     Running             4          20h
node-exporter-zkkrr                    2/2     Running             2          20h
prometheus-adapter-5c976b9ddb-chn6v    0/1     ErrImageNeverPull   0          171m
prometheus-adapter-5c976b9ddb-gbn8l    0/1     ErrImageNeverPull   0          171m
prometheus-adapter-7858d4ddfd-55lnq    0/1     ErrImageNeverPull   0          176m
prometheus-k8s-0                       2/2     Running             2          20h
prometheus-k8s-1                       2/2     Running             4          20h
prometheus-operator-545bcb5949-r6h6z   2/2     Running             3          20h

三、整体解决方案

第一步:使用如下命令查询pod详细报错内容,以下面报错为例拉取网络镜像失败,提示拉取失败 “  Failed to pull image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)”。

[root@k8s-master01 manifests]#  kubectl get po -n monitoring 
....
prometheus-adapter-6cf5d8bfcf-4jbhq    0/1     ImagePullBackOff   0          3m34s
prometheus-adapter-6cf5d8bfcf-bf4wt    0/1     ImagePullBackOff   0          3m34s
....

[root@k8s-master01 manifests]# kubectl  describe po prometheus-adapter-6cf5d8bfcf-4jbhq -n monitoring 

..........上面内容省略.................

Events:
  Type     Reason          Age                From               Message
  ----     ------          ----               ----               -------
  Normal   Scheduled       70s                default-scheduler  Successfully assigned monitoring/prometheus-adapter-6cf5d8bfcf-4jbhq to k8s-master02
  Normal   SandboxChanged  48s                kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling         30s (x2 over 64s)  kubelet            Pulling image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1"
  Warning  Failed          14s (x2 over 49s)  kubelet            Failed to pull image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed          14s (x2 over 49s)  kubelet            Error: ErrImagePull
  Normal   BackOff         2s (x4 over 45s)   kubelet            Back-off pulling image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1"
  Warning  Failed          2s (x4 over 45s)   kubelet            Error: ImagePullBackOff

第二步:由上述报错信息知道了pod需要获取 prometheus-adapter:v0.9.1 的镜像文件。那么我们就需要通过docker 获取 prometheus-adapter 有那些镜像。

[root@k8s-master01 manifests]# docker search prometheus-adapter
NAME                                                   DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
willdockerhub/prometheus-adapter                       sync from k8s.gcr.io/prometheus-adapter/prom…   0                    
selina5288/prometheus-adapter                          sync k8s.gcr.io/prometheus-adapter/prometheu…   0                    
v5cn/prometheus-adapter                                sync k8s.gcr.io/prometheus-adapter/prometheu…   0                    
lbbi/prometheus-adapter                                k8s.gcr.io                                      0                    
boy530/prometheus-adapter                                                                              0                    
forging2012/prometheus-adapter                                                                         0                    
antidebug/prometheus-adapter                                                                           0             

###下面的省略###

第三步:拉取合适的镜像到本地,在此我拉取的是 “ lbbi/prometheus-adapter ”。

[root@k8s-master01 manifests]# docker pull lbbi/prometheus-adapter:v0.9.1
v0.9.1: Pulling from lbbi/prometheus-adapter
Digest: sha256:d025d1a109234c28b4a97f5d35d759943124be8885a5bce22a91363025304e9d
Status: Image is up to date for lbbi/prometheus-adapter:v0.9.1
docker.io/lbbi/prometheus-adapter:v0.9.1

 注意!

我这里需要在后面加上版本,不然会出现找不到 lbbi/prometheus-adapter 。

根据实际情况来,有的镜像需要在后面加版本有的不需要加版本。 

[root@k8s-master01 manifests]# docker pull lbbi/prometheus-adapter
Using default tag: latest
Error response from daemon: manifest for lbbi/prometheus-adapter:latest not found: manifest unknown: manifest unknown

第四步:给下载下来的镜像打标签。(可以选择不打标签,但是第五步的yaml配置文件修改时要写对镜像标签,否则就会出现找不到本地镜像的报错,pod 状态为 “ ErrImageNeverPull ”)

[root@k8s-master01 manifests]# docker images    ## 此处为已经打好标签的镜像。
REPOSITORY                                                        TAG       IMAGE ID       CREATED         SIZE
bitnami/kube-state-metrics                                        latest    3b5c65a36550   13 hours ago    122MB
grafana/grafana                                                   8.4.3     8d0b77430ee9   11 days ago     278MB
quay.io/prometheus-operator/prometheus-config-reloader            v0.54.1   18cec637a88f   2 weeks ago     12.2MB
lbbi/kube-state-metrics                                           v2.4.1    74a618d7bae7   2 weeks ago     38.7MB
k8s.gcr.io/kube-state-metrics                                     v2.4.1    74a618d7bae7   2 weeks ago     38.7MB
quay.io/prometheus/node-exporter                                  v1.3.1    1dbe0e931976   3 months ago    20.9MB
lbbi/prometheus-adapter                                           v0.9.1    179df2737843   4 months ago    68.2MB
k8s.gcr.io/prometheus-adapter                                     v0.9.1    179df2737843   4 months ago    68.2MB
quay.io/prometheus/alertmanager                                   v0.23.0   ba2b418f427c   6 months ago    57.5MB
quay.io/brancz/kube-rbac-proxy                                    v0.11.0   29589495df8d   7 months ago    46.6MB

                                               ## 打标签命令
[root@k8s-master01 manifests]# docker tag docker.io/lbbi/kube-state-metrics:v2.4.1 k8s.gcr.io/kube-state-metrics:v2.4.1

[root@k8s-master01 manifests]# docker tag docker.io/lbbi/prometheus-adapter:v0.9.1 k8s.gcr.io/prometheus-adapter:v0.9.1

 注意!

k8s集群内的所以设备都需要拉取一样的镜像,并且打上一样的标签,因为k8s在没有做亲和力的情况下,部署pod是在集群内随机部署的。若没有全部拉取镜像,也会出现找不到本地镜像的问题,pod状态是 “ErrImageNeverPull”。

第五步:为了方便k8s找到本地的镜像文件,需要修改pod对应的yaml配置文件。

***  找到需要镜像的yaml文件           ***
[root@k8s-master01 manifests]# vim prometheusAdapter-deployment.yaml

***  找到拉取镜像失败的部分  “修改前”  ***

        image: k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1
        name: prometheus-adapter

***              修改后               ***

      #  image: k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1
        image: k8s.gcr.io/prometheus-adapter:v0.9.1   #  容器镜像标签,写错拉取本地镜像失败。
        imagePullPolicy: Never                        #  imagePullPolicy: Always 总是网络拉取镜像, 是k8s默认的拉取方式。
                                                      #  imagePullPolicy: Never 从不远程拉取镜像,只读取本地镜像。
                                                      #  imagePullPolicy: IfNotPresent 优先拉取本地镜像。
        name: prometheus-adapter 
        ports:
        - containerPort: 6443

这是拉取本地镜像失败的状态,显示找不到本地镜像。

[root@k8s-master01 ~]# kubectl describe  po prometheus-adapter-7858d4ddfd-55lnq  -n monitoring

.......
............
..................
Events:
  Type     Reason                  Age                    From     Message
  ----     ------                  ----                   ----     -------
  Warning  Failed                  106m (x325 over 176m)  kubelet  Error: ErrImageNeverPull
  Warning  ErrImageNeverPull       101m (x348 over 176m)  kubelet  Container image "lbbi/prometheus-adapter" is not present with pull policy of Never
  Warning  FailedCreatePodSandBox  95m                    kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "743108347acf9031625da151edef7c323c894657048f1f6c14655baa64a0b9ce" network for pod "prometheus-adapter-7858d4ddfd-55lnq": networkPlugin cni failed to set up pod "prometheus-adapter-7858d4ddfd-55lnq_monitoring" network: failed to find plugin "calico-ipam" in path [/opt/cni/bin], failed to clean up sandbox container "743108347acf9031625da151edef7c323c894657048f1f6c14655baa64a0b9ce" network for pod "prometheus-adapter-7858d4ddfd-55lnq": networkPlugin cni failed to teardown pod "prometheus-adapter-7858d4ddfd-55lnq_monitoring" network: failed to find plugin "calico" in path [/opt/cni/bin]]
  Normal   SandboxChanged          95m (x2 over 95m)      kubelet  Pod sandbox changed, it will be killed and re-created.
  Warning  ErrImageNeverPull       75m (x94 over 95m)     kubelet  Container image "lbbi/prometheus-adapter" is not present with pull policy of Never
  Warning  Failed                  65m (x139 over 95m)    kubelet  Error: ErrImageNeverPull
  Normal   SandboxChanged          26m (x4 over 28m)      kubelet  Pod sandbox changed, it will be killed and re-created.
  Warning  ErrImageNeverPull       25m (x10 over 27m)     kubelet  Container image "lbbi/prometheus-adapter" is not present with pull policy of Never
  Warning  Failed                  3m2s (x114 over 27m)   kubelet  Error: ErrImageNeverPull

 第五步:重新加载pod 的yaml 文件,并查询pod状态。

[root@k8s-master01 manifests]# kubectl replace -f prometheusAdapter-deployment.yaml
deployment.apps/prometheus-adapter replaced
[root@k8s-master01 manifests]# kubectl get po -n monitoring 
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   2          22h
alertmanager-main-1                    2/2     Running   16         22h
alertmanager-main-2                    2/2     Running   3          22h
blackbox-exporter-69894767d5-vbwz8     3/3     Running   3          22h
grafana-564b8d866c-gmgvr               1/1     Running   1          22h
kube-state-metrics-6db4889fd4-hcmfn    3/3     Running   0          97m
node-exporter-4qsvn                    2/2     Running   2          22h
node-exporter-9d8bw                    2/2     Running   16         22h
node-exporter-ffs7q                    2/2     Running   2          22h
node-exporter-qjdv9                    2/2     Running   4          22h
node-exporter-zkkrr                    2/2     Running   2          22h
prometheus-adapter-b8d4f7465-t7cxq     1/1     Running   0          108s
prometheus-adapter-b8d4f7465-tlf42     1/1     Running   0          101m
prometheus-k8s-0                       2/2     Running   2          22h
prometheus-k8s-1                       2/2     Running   4          22h
prometheus-operator-545bcb5949-r6h6z   2/2     Running   3          22h

四、补充一下Pod状态解释

CrashLoopBackOff                                容器退出,kubelet正在将它重启
InvalidImageName                                无法解析镜像名称
ImageInspectError                                无法校验镜像
ErrImageNeverPul                                策略禁止拉取镜像
ImagePullBackOff                                 正在重试拉取
RegistryUnavailable                              连接不到镜像中心
ErrImagePull                                         通用的拉取镜像出错
CreateContainerConfigError                 不能创建kubelet使用的容器配置
CreateContainerError                           创建容器失败
m.internalLifecycle.PreStartContainer  执行hook报错
RunContainerError                                启动容器失败
PostStartHookError                               执行hook报错
ContainersNotInitialized                        容器没有初始化完毕
ContainersNotReady                            容器没有准备完毕
ContainerCreating                                容器创建中
PodInitializing                                       pod 初始化中
DockerDaemonNotReady                    docker还没有完全启动
NetworkPluginNotReady                      网络插件还没有完全启动
Evicted                                                 即驱赶的意思,意思是当节点出现异常时,kubernetes将有相应的机制驱赶该节点上的Pod。 多见于资源不足时导致的驱赶。

 

 

Logo

旨在为数千万中国开发者提供一个无缝且高效的云端环境,以支持学习、使用和贡献开源项目。

更多推荐