Using Alibaba Cloud virtual-kubelet-autoscaler to Burst onto ECI

1. Background


To improve resource utilization on ECS nodes, applications are deployed to ECS nodes first; only when the ECS nodes run out of resources are pods scheduled onto ECI for temporary elastic scaling.

2. Deploying the vk-autoscale Component


The component can be deployed directly from the Alibaba Cloud console. See: https://help.aliyun.com/document_detail/131561.html?spm=5176.12818093.nav-right.11.19b616d0Ffk0Ce

3. Functional Testing of the vk-autoscale Component


Alibaba Cloud's documentation already gives a basic demonstration of the component's functionality:
https://help.aliyun.com/document_detail/131590.html?spm=a2c4g.11186623.2.10.dd52e08elvMl5t


1) We will run the following tests for our own scenario. First, give the ECS and vk nodes the same label:
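The labeling step itself isn't shown above; a minimal sketch, assuming the node names seen in the listings below and the `vk-stress-test` label used by the test Deployment (requires a live cluster to run):

```shell
# Assumed node names from this article; adjust to your cluster.
# Apply the same label to the ECS node and the virtual-kubelet node,
# so the Deployment's nodeSelector can match either of them.
kubectl label node cn-hangzhou.10.200.0.119 vk-stress-test=1 --overwrite
kubectl label node virtual-kubelet vk-stress-test=1 --overwrite
```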

2) Verify that pods are scheduled onto the ECI node when the ECS node runs out of resources.

// Define a test Deployment, scale the replica count up step by step, and watch where the pods land.

cat deployment_nginx.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        vk-stress-test: "1"
      containers:
      - name: nginx
        image: nginx:1.15.4
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 1
            memory: 1024Mi
          requests:
            cpu: 1
            memory: 1024Mi
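With `requests.cpu: 1` per replica, the spillover point is simple arithmetic: replicas fit on the ECS node until its allocatable CPU minus system overhead is used up. A back-of-envelope sketch with assumed numbers (the node sizes in this article were not published):

```shell
# All numbers are assumptions for illustration, not measured values.
allocatable_m=4000   # node allocatable CPU in millicores (assumed 4-core ECS node)
system_m=800         # CPU already requested by system pods (assumed)
per_pod_m=1000       # requests.cpu: 1 from the manifest above
# Replicas that fit on the ECS node before spillover to ECI:
echo $(( (allocatable_m - system_m) / per_pod_m ))
```

Under these assumptions three replicas fit, which would match the behavior observed below: the fourth replica pends and is patched onto the vk node.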
The scheduling result at this point:

kubectl get pods -o wide | grep nginx-deployment

nginx-deployment-78f66667b4-c29kp 1/1 Running 0 33m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-pbrcp 1/1 Running 0 35m 172.26.20.122 cn-hangzhou.10.200.0.119

kubectl describe pods nginx-deployment-78f66667b4-c29kp


Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled default-scheduler Successfully assigned kube-system/nginx-deployment-78f66667b4-c29kp to cn-hangzhou.10.200.0.119
Normal Pulled 33m kubelet, cn-hangzhou.10.200.0.119 Container image "nginx:1.15.4" already present on machine
Normal Created 33m kubelet, cn-hangzhou.10.200.0.119 Created container nginx
Normal Started 33m kubelet, cn-hangzhou.10.200.0.119 Started container nginx

// Both pods were scheduled directly onto the ECS node.
// Next, scale up step by step.

kubectl scale deploy nginx-deployment --replicas=3

deployment.extensions/nginx-deployment scaled

kubectl get pods -o wide | grep nginx-deployment

nginx-deployment-78f66667b4-c29kp 1/1 Running 0 34m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 1/1 Running 0 7s 172.26.20.2 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-pbrcp 1/1 Running 0 37m 172.26.20.122 cn-hangzhou.10.200.0.119

// Continue scaling up. When the replica count reaches 4, the ECS node runs out of resources and scheduling spills over to the ECI node.
kubectl scale deploy nginx-deployment --replicas=4
deployment.extensions/nginx-deployment scaled
[root@benchmark-10-200-0-119 ~]# kubectl get pods -o wide | grep nginx-deployment
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 35m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 1/1 Running 0 29s 172.26.20.2 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-jddkm 0/1 Pending 0 4s virtual-kubelet
nginx-deployment-78f66667b4-pbrcp 1/1 Running 0 37m 172.26.20.122 cn-hangzhou.10.200.0.119

kubectl get pods -o wide | grep nginx-deployment

nginx-deployment-78f66667b4-c29kp 1/1 Running 0 35m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 1/1 Running 0 79s 172.26.20.2 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-jddkm 1/1 Running 0 54s 10.224.1.74 virtual-kubelet
nginx-deployment-78f66667b4-pbrcp 1/1 Running 0 38m 172.26.20.122 cn-hangzhou.10.200.0.119

// The scheduling events below show that pods go to the ECS node first, and onto ECI once resources run out.

kubectl describe pods nginx-deployment-78f66667b4-jddkm


Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling default-scheduler 0/5 nodes are available: 1 Insufficient cpu, 1 node(s) had taints that the pod didn’t tolerate, 3 node(s) didn’t match node selector.
Normal Scheduled default-scheduler Successfully assigned kube-system/nginx-deployment-78f66667b4-jddkm to virtual-kubelet
Normal AutoInstanceTypeMatch 68s eci-provider, eci The most-matched instanceType for current eci instance is 1.0-2.0Gi
Normal MultiZoneRecommendations 68s eci-provider, eci It is recommended to use the multi-zone creation function to avoid the risk of stockout. More info: https://help.aliyun.com/document_detail/157290.html
Normal SuccessfulMountVolume 50s kubelet, eci MountVolume.SetUp succeeded for volume "default-token-w85gj"
Normal Pulling 49s kubelet, eci pulling image "nginx:1.15.4"
Normal Pulled 42s kubelet, eci Successfully pulled image "nginx:1.15.4"
Normal Created 42s kubelet, eci Created container
Normal Started 42s kubelet, eci Started container
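To see at a glance how replicas are split between ECS and ECI, the NODE column of `kubectl get pods -o wide` can be tallied. A sketch using the sample listing above (fed in as a heredoc here; on a live cluster you would pipe `kubectl get pods -o wide | grep nginx-deployment` into the same awk):

```shell
# Count pods per node: the node name is the last field of each line.
awk '{count[$NF]++} END {for (n in count) print n, count[n]}' <<'EOF'
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 35m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 1/1 Running 0 79s 172.26.20.2 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-jddkm 1/1 Running 0 54s 10.224.1.74 virtual-kubelet
nginx-deployment-78f66667b4-pbrcp 1/1 Running 0 38m 172.26.20.122 cn-hangzhou.10.200.0.119
EOF
```

This prints one line per node (order unspecified): 3 pods on the ECS node and 1 on virtual-kubelet.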
3) Verify whether scale-down preferentially removes the pod running on the ECI node.

kubectl get pods -o wide | grep nginx-deployment

nginx-deployment-78f66667b4-c29kp 1/1 Running 0 38m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 1/1 Running 0 3m57s 172.26.20.2 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-jddkm 1/1 Running 0 3m32s 10.224.1.74 virtual-kubelet
nginx-deployment-78f66667b4-pbrcp 1/1 Running 0 41m 172.26.20.122 cn-hangzhou.10.200.0.119

// Because the ECI pod was scheduled last, first delete one of the ECS pods so that the most recently created pod sits on an ECS node.

kubectl delete pods nginx-deployment-78f66667b4-pbrcp

pod "nginx-deployment-78f66667b4-pbrcp" deleted
[root@benchmark-10-200-0-119 ~]# kubectl get pods -o wide | grep nginx-deployment
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 40m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-d549d 0/1 ContainerCreating 0 4s cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 1/1 Running 0 6m7s 172.26.20.2 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-jddkm 1/1 Running 0 5m42s 10.224.1.74 virtual-kubelet
[root@benchmark-10-200-0-119 ~]# kubectl get pods -o wide | grep nginx-deployment
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 40m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-d549d 1/1 Running 0 6s 172.26.20.3 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 1/1 Running 0 6m9s 172.26.20.2 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-jddkm 1/1 Running 0 5m44s 10.224.1.74 virtual-kubelet

// Now scale down. As the listings show, the scale-down does not preferentially remove the pod on the ECI node.

kubectl scale deploy nginx-deployment --replicas=3

deployment.extensions/nginx-deployment scaled
[root@benchmark-10-200-0-119 ~]# kubectl get pods -o wide | grep nginx-deployment
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 41m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-d549d 0/1 Terminating 0 22s cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 1/1 Running 0 6m25s 172.26.20.2 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-jddkm 1/1 Running 0 6m 10.224.1.74 virtual-kubelet

kubectl scale deploy nginx-deployment --replicas=1

deployment.extensions/nginx-deployment scaled
[root@benchmark-10-200-0-119 ~]# kubectl get pods -o wide | grep nginx-deployment
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 49m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dsj55 0/1 Terminating 0 14m cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-jddkm 0/1 Terminating 0 14m 10.224.1.74 virtual-kubelet
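The order seen above is consistent with the ReplicaSet controller's scale-down ranking: other things being equal (all pods Ready, similar node spread), the most recently created pods are deleted first, regardless of whether they run on ECS or ECI. A toy sketch of that tie-break, using approximate ages (in seconds) taken from the listing before the scale-down:

```shell
# Simplified sketch: when all candidate pods are Ready, the newest pod
# (smallest age) is the first scale-down victim. Sort ages ascending
# and take the first line.
printf '%s\n' \
  "nginx-deployment-78f66667b4-c29kp 2460" \
  "nginx-deployment-78f66667b4-d549d 22" \
  "nginx-deployment-78f66667b4-dsj55 385" \
  "nginx-deployment-78f66667b4-jddkm 360" \
| sort -k2,2n | head -1 | cut -d' ' -f1
# → nginx-deployment-78f66667b4-d549d, matching the pod terminated first above
```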
4) Verify that vk-autoscale does not conflict with node auto scaling.

Node auto scaling is enabled in the cluster for the vk-stress-test label. First remove the vk-stress-test label from the vk node, then scale up the pods and watch whether the auto scaling group kicks in:

The pod placement at this point is as follows:

kubectl get pods -o wide | grep nginx-deployment

nginx-deployment-78f66667b4-7t2gs 1/1 Running 0 6m51s 172.26.20.14 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 5h43m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-pjwvp 1/1 Running 0 6m51s 172.26.20.13 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-rcrp6 1/1 Running 0 6m44s 172.26.28.130 cn-hangzhou.10.200.0.13
// Node 10.200.0.13 was automatically scaled out, and the pending pods were scheduled onto it.

// Next, put the vk-stress-test label back on the vk node and continue scaling up.

kubectl scale deploy nginx-deployment --replicas=8

deployment.extensions/nginx-deployment scaled
[root@benchmark-10-200-0-119 K8S]# kubectl get pods -o wide | grep nginx-deployment
nginx-deployment-78f66667b4-7t2gs 1/1 Running 0 6m57s 172.26.20.14 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-9qgvw 0/1 Pending 0 1s virtual-kubelet
nginx-deployment-78f66667b4-bfwgq 0/1 Pending 0 1s virtual-kubelet
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 5h43m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dvkxc 1/1 Running 0 56s 10.224.1.86 virtual-kubelet
nginx-deployment-78f66667b4-kfkb7 0/1 Pending 0 1s virtual-kubelet
nginx-deployment-78f66667b4-pjwvp 1/1 Running 0 6m57s 172.26.20.13 cn-hangzhou.10.200.0.119

kubectl get pods -o wide | grep nginx-deployment

nginx-deployment-78f66667b4-7t2gs 1/1 Running 0 7m20s 172.26.20.14 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-9qgvw 0/1 ContainerCreating 0 24s 10.224.1.88 virtual-kubelet
nginx-deployment-78f66667b4-bfwgq 0/1 ContainerCreating 0 24s 10.224.1.87 virtual-kubelet
nginx-deployment-78f66667b4-c29kp 1/1 Running 0 5h44m 172.26.20.124 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-dvkxc 1/1 Running 0 79s 10.224.1.86 virtual-kubelet
nginx-deployment-78f66667b4-kfkb7 0/1 ContainerCreating 0 24s 10.224.1.89 virtual-kubelet
nginx-deployment-78f66667b4-pjwvp 1/1 Running 0 7m20s 172.26.20.13 cn-hangzhou.10.200.0.119
nginx-deployment-78f66667b4-rcrp6 1/1 Running 0 7m13s 172.26.28.130 cn-hangzhou.10.200.0.13

// The new pods were scheduled onto the vk node, and this time the auto scaling group did not scale out. This comes down to cluster-autoscaler's polling interval: the pending pods just happened to be patched by vk-autoscaler first. In fact both components watch for pending pods and can be triggered by them; logically the two are independent. Alibaba Cloud made them compatible in version 1.20 and later.
Test conclusion:
The vk-autoscale component schedules pods onto ECS nodes first; when the ECS nodes under the application's label run out of resources, pods are scheduled onto ECI via the matching vk node, providing elastic capacity on ECI.

Disclaimer: the views in this article do not represent the position of this site. Permalink: https://eyangzhen.com/235136.html
