Troubleshooting: Pods are not created after creating a Deployment
2022-06-27 12:42:00 【51CTO】
Problem description
While deploying the Ingress service on a Kubernetes cluster installed from binaries, I used kubectl apply with a YAML file to create the corresponding Deployment resources, but no Pods were ever created.
[[email protected] Install]# kubectl apply -f ingress-nginx-controller.yaml
deployment.apps/default-http-backend created
service/default-http-backend created
serviceaccount/nginx-ingress-serviceaccount created
clusterrole.rbac.authorization.k8s.io/nginx-ingress-clusterrole created
role.rbac.authorization.k8s.io/nginx-ingress-role created
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-clusterrole-nisa-binding created
deployment.apps/nginx-ingress-controller created


Troubleshooting process
Check the Deployment details
[[email protected] Install]# kubectl -n kube-system describe deployments.apps nginx-ingress-controller

No useful information was found there.
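At this stage it can also help to look at the ReplicaSet and the namespace events: in this failure mode the Deployment usually exists but no ReplicaSet or Pods ever appear, and the event list stays empty. A small sketch, assuming the resources were created in kube-system as in the describe command above:

kubectl -n kube-system get deploy,rs,pods -o wide

# recent events in the namespace, newest last
kubectl -n kube-system get events --sort-by=.metadata.creationTimestamp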
Check the status of the kube-controller-manager service
[[email protected] Install]# systemctl status kube-controller-manager.service
......
Jun 25 18:54:54 two-master kube-controller-manager[759]: E0625 18:54:54.563462     759 leaderelection.go:325] error retrieving resource lo...nager)

This shows that the kube-controller-manager service is indeed unhealthy.
- Continue examining the detailed error messages from the kube-controller-manager service
[[email protected] Install]# systemctl status kube-controller-manager.service > kube-controller.log
[[email protected] Install]# vim + kube-controller.log    # dump the status to a log file so the full lines are easier to read
known reason (get leases.coordination.k8s.io kube-controller-manager)
Jun 25 19:10:25 two-master kube-controller-manager[759]: E0625 19:10:25.198986     759 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: the server rejected our request for an unknown reason (get leases.coordination.k8s.io kube-controller-manager)

The meaningful error message:
Jun 25 19:10:25 two-master kube-controller-manager[759]: E0625 19:10:25.198986     759 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: the server rejected our request for an unknown reason (get leases.coordination.k8s.io kube-controller-manager)
In plain terms: retrieving the resource lock kube-system/kube-controller-manager failed because the server rejected the request for an unknown reason (a GET on the leases.coordination.k8s.io object kube-controller-manager).
Check etcd cluster health and alarms
Searching the error message on Baidu suggested an etcd cluster problem, so I checked whether my etcd was healthy.
My Kubernetes cluster has only one master and two nodes.
- Check whether etcd is healthy
[[email protected] Install]# etcdctl endpoint health --endpoints=https://192.168.2.70:2379 \
> --write-out=table \
> --cacert=/etc/kubernetes/pki/etcd/ca.pem \
> --cert=/etc/kubernetes/pki/etcd/etcd.pem \
> --key=/etc/kubernetes/pki/etcd/etcd-key.pem
+----------------------------+--------+------------+-------+
|          ENDPOINT          | HEALTH |    TOOK    | ERROR |
+----------------------------+--------+------------+-------+
| https://192.168.2.70:2379  |   true | 4.965087ms |       |
+----------------------------+--------+------------+-------+
# healthy, no errors
- Check etcd alarms
[[email protected] Install]# etcdctl --endpoints 192.168.2.70:2379 alarm list \
> --cacert=/etc/kubernetes/pki/etcd/ca.pem \
> --cert=/etc/kubernetes/pki/etcd/etcd.pem \
> --key=/etc/kubernetes/pki/etcd/etcd-key.pem

No alarms were reported, so etcd is healthy and the root cause has still not been found.
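For completeness, the etcd member status can also be dumped as a table, reusing the same certificate flags as above; this is just an optional extra check:

etcdctl endpoint status --endpoints=https://192.168.2.70:2379 \
  --write-out=table \
  --cacert=/etc/kubernetes/pki/etcd/ca.pem \
  --cert=/etc/kubernetes/pki/etcd/etcd.pem \
  --key=/etc/kubernetes/pki/etcd/etcd-key.pem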
Restart the kube-controller-manager service
Time for the classic move: try a restart.

[[email protected] ~]# systemctl restart kube-controller-manager.service
[[email protected] ~]# systemctl status kube-controller-manager.service

The service is still unhealthy.
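Because systemctl status truncates long lines, following the unit's journal gives the complete messages; a minimal example:

# follow the controller-manager logs without truncation
journalctl -u kube-controller-manager.service -f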
Inspect the resource lock
A Lease is a lightweight resource lock that replaces the older ConfigMap- and Endpoints-based locks. Running kubectl get lease kube-controller-manager -n kube-system -o yaml shows the following YAML.
$ kubectl get lease kube-controller-manager -n kube-system -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2022-06-08T07:52:17Z"
  managedFields:
  - apiVersion: coordination.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:acquireTime: {}
        f:holderIdentity: {}
        f:leaseDurationSeconds: {}
        f:leaseTransitions: {}
        f:renewTime: {}
    manager: kube-controller-manager
    operation: Update
    time: "2022-06-08T07:52:17Z"
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "977951"
  uid: 758e5b3d-422f-4254-9839-3581f532b7e5
spec:
  acquireTime: "2022-06-24T02:08:11.905250Z"
  holderIdentity: two-master_f1deccfa-7a21-4b6c-97b6-611eaaff083c
  leaseDurationSeconds: 15
  leaseTransitions: 7
  renewTime: "2022-06-24T03:01:34.576989Z"
This resource records which instance currently holds the lock, when the lease was last renewed, when the lock was acquired, and so on. Leader election is governed by the following client-side parameters (a sketch of the matching kube-controller-manager flags follows this list):
- LeaseDuration: how long the lock is held, i.e. the duration of one term; during this period other LeaderElector clients cannot take over leadership, even if the current leader has stopped working properly.
- RenewDeadline: the deadline for renewing the lock; it applies only to the leader and bounds how long the leader keeps retrying to refresh the lease and extend its term before giving up leadership. RenewDeadline must be smaller than LeaseDuration.
- RetryPeriod: the retry interval used by every LeaderElector client when it attempts to acquire or renew leadership.
- Callbacks: functions invoked whenever the client becomes the leader or loses leadership.
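On the kube-controller-manager these parameters surface as command-line flags (the upstream defaults are 15s, 10s and 2s, with leases as the default lock type). A quick, hedged way to check what the running instance actually uses on a binary install:

# show the leader-election flags of the running controller-manager process
ps -ef | grep [k]ube-controller-manager | tr ' ' '\n' | grep leader-elect

# flags to look for (names only; the values are whatever this cluster was started with):
#   --leader-elect, --leader-elect-lease-duration, --leader-elect-renew-deadline,
#   --leader-elect-retry-period, --leader-elect-resource-lock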
Request the resource lock directly
[[email protected] ~]# curl -X GET https://192.168.2.70:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager -k
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401            # the request is rejected with 401 Unauthorized
}[[email protected] ~]#
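The 401 above is simply what an anonymous request gets. To check whether the Lease object itself can be read with proper credentials, an authenticated request can be tried. This is only a sketch: the client certificate paths below are assumptions and depend on how this cluster's PKI is laid out.

# hypothetical admin certificate paths; adjust to the actual PKI layout
curl --cacert /etc/kubernetes/pki/ca.pem \
     --cert /etc/kubernetes/pki/admin.pem \
     --key /etc/kubernetes/pki/admin-key.pem \
     https://192.168.2.70:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager

# or let kubectl handle authentication and hit the same API path
kubectl get --raw /apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager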
Update:
Later investigation found that an extra kube-apiserver related parameter had been added to the /etc/kubernetes/kube-controller-manager.conf file. After deleting that parameter and restarting the service, everything returned to normal.
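A quick way to confirm the fix, reusing the commands from earlier (kube-system matches the namespace used in the describe command above; adjust if the ingress resources live elsewhere):

systemctl restart kube-controller-manager.service
systemctl status kube-controller-manager.service --no-pager

# the lease should now be renewed every few seconds
kubectl -n kube-system get lease kube-controller-manager -o yaml | grep renewTime

# and the controllers should start creating the missing ReplicaSets and Pods
kubectl -n kube-system get deploy,rs,pods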
