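Background

A customer added a new node to their Kubernetes cluster. After applications were deployed to the new node, they became intermittently unavailable and users received HTTP 503 responses. No related events were reported, and the host status looked normal.

Troubleshooting

The new node was the first suspect. Neither the system log /var/log/messages nor dmesg showed any related errors, but the kubelet log contained the entries shown below.

The Kubernetes cluster was installed with RKE, so the kubelet log can be viewed directly on the node by running docker logs -f --tail=30 kubelet: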
E0602 03:18:27.766726    1301 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: an error on the server ("") has prevented the request from succeeding (get leases.coordination.k8s.io k8s-node-dev-6)
E0602 03:18:34.847254    1301 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: an error on the server ("") has prevented the request from succeeding (get csidrivers.storage.k8s.io)
I0602 03:18:39.176996    1301 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
E0602 03:18:43.771023    1301 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: an error on the server ("") has prevented the request from succeeding (get leases.coordination.k8s.io k8s-node-dev-6)
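
When the kubelet log is noisy, it can help to keep only the error lines. The following is a minimal sketch, assuming the klog line format seen above, where the first character is the severity (I, W, E or F) followed by the month and day. The function name `kubelet_errors` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: keep only error (E) and fatal (F) lines
# from klog-formatted input read on stdin.
kubelet_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Usage on an RKE node, where the kubelet runs as a Docker container
# named "kubelet" (docker logs writes container stderr to stderr,
# so it is merged into stdout here):
#   docker logs --tail=200 kubelet 2>&1 | kubelet_errors
```

Applied to the excerpt above, this would drop the `I0602` streamwatcher line and keep the three `E0602` lines.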


Background

A project's database server raised a disk alert: disk usage was close to 100%.

$ df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/vda1      ext4       99G   60G   35G  64% /
devtmpfs       devtmpfs   32G     0   32G   0% /dev
tmpfs          tmpfs      32G   69M   32G   1% /dev/shm
tmpfs          tmpfs      32G  7.8M   32G   1% /run
tmpfs          tmpfs      32G     0   32G   0% /sys/fs/cgroup
/dev/vdc1      ext4       99G   94G   52M 100% /u01/oracle
/dev/vdb4      ext3       99G   30G   64G  32% /soa
tmpfs          tmpfs     6.3G     0  6.3G   0% /run/user/1002
tmpfs          tmpfs     6.3G   52K  6.3G   1% /run/user/1003
tmpfs          tmpfs     6.3G     0  6.3G   0% /run/user/1005

As shown, /dev/vdc1 (mounted at /u01/oracle) has reached 100% usage.
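
On a host with many mounts, the full filesystem can be picked out of the `df` output automatically. This is a rough sketch, assuming mount points contain no spaces, so that the usage percentage is always the second-to-last column. The function name `full_filesystems` and the default threshold of 90 are illustrative choices, not something from the original incident.

```shell
# Hypothetical helper: read df output on stdin and print the mount point
# and usage of every filesystem at or above the given threshold (default 90).
# Assumes mount points contain no spaces.
full_filesystems() {
  awk -v limit="${1:-90}" '
    NR > 1 {                      # skip the header line
      use = $(NF - 1)             # "Use%" column, e.g. "100%"
      sub(/%/, "", use)           # strip the percent sign
      if (use + 0 >= limit) print $NF, use "%"
    }'
}

# Usage:
#   df -Th | full_filesystems 90
```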
