Concept #
Rolling updates are common in software and systems. What distinguishes a rolling update from a traditional update is that, beyond performing the update itself, it typically also provides the ability to query rollout progress, view rollout history and, most importantly, roll back. Put simply, it gives a system or piece of software the ability to actively downgrade itself.
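In Kubernetes these capabilities map directly onto the kubectl rollout subcommands; for example, assuming a Deployment named nginx-deployment:
$ kubectl rollout status deployment/nginx-deployment   # query rollout progress
$ kubectl rollout history deployment/nginx-deployment  # view rollout history
$ kubectl rollout undo deployment/nginx-deployment     # roll back to the previous revision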
Deployment Rolling Update #
A Deployment supports two update strategies:
- RollingUpdate
- Recreate
Of the two, RollingUpdate (the default) is by far the more common. Reading the code at pkg/controller/deployment/deployment_controller.go:648, we can see the logic each strategy maps to:
func (dc *DeploymentController) syncDeployment(key string) error {
...
switch d.Spec.Strategy.Type {
case apps.RecreateDeploymentStrategyType:
return dc.rolloutRecreate(d, rsList, podMap)
case apps.RollingUpdateDeploymentStrategyType:
return dc.rolloutRolling(d, rsList)
}
...
}
Based on d.Spec.Strategy.Type, when the update strategy is RollingUpdate, dc.rolloutRolling() is invoked. Its logic is as follows:
func (dc *DeploymentController) rolloutRolling(d *apps.Deployment, rsList []*apps.ReplicaSet) error {
// 1. Get all ReplicaSets; create newRS if it does not exist yet
newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(d, rsList, true)
if err != nil {
return err
}
allRSs := append(oldRSs, newRS)
// 2. Scale up newRS
scaledUp, err := dc.reconcileNewReplicaSet(allRSs, newRS, d)
if err != nil {
return err
}
if scaledUp {
// Update DeploymentStatus
return dc.syncRolloutStatus(allRSs, newRS, d)
}
// 3. Scale down oldRSs
scaledDown, err := dc.reconcileOldReplicaSets(allRSs, controller.FilterActiveReplicaSets(oldRSs), newRS, d)
if err != nil {
return err
}
if scaledDown {
// Update DeploymentStatus
return dc.syncRolloutStatus(allRSs, newRS, d)
}
// 4. Clean up stale ReplicaSets
if deploymentutil.DeploymentComplete(d, &d.Status) {
if err := dc.cleanupDeployment(oldRSs, d); err != nil {
return err
}
}
// 5. Sync the deployment status
return dc.syncRolloutStatus(allRSs, newRS, d)
}
Rolling Update Overview #
The five key steps in the code above can be summarized as follows:
- Call getAllReplicaSetsAndSyncRevision() to get all ReplicaSets, creating newRS if it does not exist yet;
- Call reconcileNewReplicaSet() to decide whether newRS needs to be scaled up. If it does, update the Deployment's status and add a condition of type Progressing, indicating that the deployment is in the middle of an update, then return immediately;
- Call reconcileOldReplicaSets() to decide whether the oldRSs need to be scaled down. If they do, delete maxScaledDown of the pods belonging to the oldRSs, update the Deployment's status, add the corresponding condition, and return. This guarantees that Pods of both the old and new versions exist throughout the rolling update;
- If neither applies, the rolling update has most likely finished; check whether deployment.Status has reached the desired state and clean up the oldRSs according to deployment.Spec.RevisionHistoryLimit;
- Finally, sync the deployment status so that it matches the desired state.
From these steps we can see that the rolling update process falls into the following three phases:
graph LR
    start(start) --> condition1{newRS need scale up ?}
    condition1 -- NO --> condition2{oldRS need scale down ?}
    condition2 -- NO --> x3(3. sync deployment status)
    condition1 -- YES --> x1(1. newRS scale up)
    x1 --> stop(end)
    condition2 -- YES --> x2(2. oldRS scale down)
    x2 --> stop
    x3 --> stop
newRS scale up #
Reading the code at pkg/controller/deployment/rolling.go:68, the details are as follows:
func (dc *DeploymentController) reconcileNewReplicaSet(allRSs []*apps.ReplicaSet, newRS *apps.ReplicaSet, deployment *apps.Deployment) (bool, error) {
// 1. Check whether the replica count has already reached the desired value
if *(newRS.Spec.Replicas) == *(deployment.Spec.Replicas) {
// Scaling not required.
return false, nil
}
// 2. Check whether a scale down is needed
if *(newRS.Spec.Replicas) > *(deployment.Spec.Replicas) {
// Scale down.
scaled, _, err := dc.scaleReplicaSetAndRecordEvent(newRS, *(deployment.Spec.Replicas), deployment)
return scaled, err
}
// 3. Calculate the replica count newRS should have
newReplicasCount, err := deploymentutil.NewRSNewReplicas(deployment, allRSs, newRS)
if err != nil {
return false, err
}
// 4. If scaling is needed, update the rs annotations and rs.Spec.Replicas
scaled, _, err := dc.scaleReplicaSetAndRecordEvent(newRS, newReplicasCount, deployment)
return scaled, err
}
From the source above, the main logic of reconcileNewReplicaSet() is:
- Compare newRS.Spec.Replicas with deployment.Spec.Replicas; if they are equal, return immediately, since the desired state has already been reached;
- If newRS.Spec.Replicas > deployment.Spec.Replicas, the newRS replica count exceeds the desired value, so call dc.scaleReplicaSetAndRecordEvent() to scale it down;
- Otherwise newRS.Spec.Replicas < deployment.Spec.Replicas, so call deploymentutil.NewRSNewReplicas() to compute the replica count newRS should have, obeying the maxSurge and maxUnavailable constraints;
- Call dc.scaleReplicaSetAndRecordEvent() to update the newRS object, setting rs.Spec.Replicas, rs.Annotations[DesiredReplicasAnnotation], and rs.Annotations[MaxReplicasAnnotation] (see the inspection example below).
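These two annotations are visible on the ReplicaSet managed by the controller; for instance, with a placeholder ReplicaSet name:
$ kubectl get rs <replicaset-name> -o jsonpath='{.metadata.annotations}'
The output includes deployment.kubernetes.io/desired-replicas and deployment.kubernetes.io/max-replicas.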
Computing the newRS replica count is the first step of the core rolling update process; see the source at pkg/controller/deployment/util/deployment_util.go:816:
func NewRSNewReplicas(deployment *apps.Deployment, allRSs []*apps.ReplicaSet, newRS *apps.ReplicaSet) (int32, error) {
switch deployment.Spec.Strategy.Type {
case apps.RollingUpdateDeploymentStrategyType:
// 1. Calculate maxSurge, rounding up
maxSurge, err := intstrutil.GetValueFromIntOrPercent(deployment.Spec.Strategy.RollingUpdate.MaxSurge, int(*(deployment.Spec.Replicas)), true)
if err != nil {
return 0, err
}
// 2. Sum rs.Spec.Replicas across all RSes to get currentPodCount
currentPodCount := GetReplicaCountForReplicaSets(allRSs)
maxTotalPods := *(deployment.Spec.Replicas) + int32(maxSurge)
if currentPodCount >= maxTotalPods {
// Cannot scale up.
return *(newRS.Spec.Replicas), nil
}
// 3. Calculate scaleUpCount; the result must not exceed the desired replica count
scaleUpCount := maxTotalPods - currentPodCount
scaleUpCount = int32(integer.IntMin(int(scaleUpCount), int(*(deployment.Spec.Replicas)-*(newRS.Spec.Replicas))))
return *(newRS.Spec.Replicas) + scaleUpCount, nil
case apps.RecreateDeploymentStrategyType:
return *(deployment.Spec.Replicas), nil
default:
return 0, fmt.Errorf("deployment type %v isn't supported", deployment.Spec.Strategy.Type)
}
}
So the main logic of NewRSNewReplicas() is:
- Check the update strategy;
- Calculate maxSurge;
- Calculate currentPodCount from allRSs;
- Finally, calculate scaleUpCount (a worked example of this arithmetic follows below).
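As a worked example, the following self-contained sketch replays the RollingUpdate branch above for a hypothetical deployment with 10 desired replicas and maxSurge = 3 (the helper name and numbers are illustrative, not the controller's own code):
package main

import "fmt"

// newRSReplicas mirrors the RollingUpdate branch of NewRSNewReplicas for
// illustration: total pods are capped at desired+maxSurge, and newRS is
// never pushed above the desired replica count.
func newRSReplicas(desired, maxSurge, currentPodCount, newRSReplicasNow int32) int32 {
    maxTotalPods := desired + maxSurge
    if currentPodCount >= maxTotalPods {
        return newRSReplicasNow // cannot scale up any further
    }
    scaleUpCount := maxTotalPods - currentPodCount
    if remaining := desired - newRSReplicasNow; scaleUpCount > remaining {
        scaleUpCount = remaining
    }
    return newRSReplicasNow + scaleUpCount
}

func main() {
    // First sync loop: oldRS still has 10 pods, newRS has 0.
    fmt.Println(newRSReplicas(10, 3, 10, 0)) // 3
    // Later: oldRS has been scaled to 8, newRS is already at 3.
    fmt.Println(newRSReplicas(10, 3, 11, 3)) // 5
}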
oldRS scale down #
In the same way, for scaling down the oldRSs, read the source in pkg/controller/deployment/rolling.go:
func (dc *DeploymentController) reconcileOldReplicaSets(allRSs []*apps.ReplicaSet, oldRSs []*apps.ReplicaSet, newRS *apps.ReplicaSet, deployment *apps.Deployment) (bool, error) {
// 1. Calculate oldPodsCount
oldPodsCount := deploymentutil.GetReplicaCountForReplicaSets(oldRSs)
if oldPodsCount == 0 {
// Can't scale down further
return false, nil
}
// 2. Calculate maxUnavailable
allPodsCount := deploymentutil.GetReplicaCountForReplicaSets(allRSs)
klog.V(4).Infof("New replica set %s/%s has %d available pods.", newRS.Namespace, newRS.Name, newRS.Status.AvailableReplicas)
maxUnavailable := deploymentutil.MaxUnavailable(*deployment)
// 3. Calculate maxScaledDown
minAvailable := *(deployment.Spec.Replicas) - maxUnavailable
newRSUnavailablePodCount := *(newRS.Spec.Replicas) - newRS.Status.AvailableReplicas
maxScaledDown := allPodsCount - minAvailable - newRSUnavailablePodCount
if maxScaledDown <= 0 {
return false, nil
}
// 4. Clean up unhealthy ReplicaSets
oldRSs, cleanupCount, err := dc.cleanupUnhealthyReplicas(oldRSs, deployment, maxScaledDown)
if err != nil {
return false, nil
}
klog.V(4).Infof("Cleaned up unhealthy replicas from old RSes by %d", cleanupCount)
// 5. Scale down the old ReplicaSets
allRSs = append(oldRSs, newRS)
scaledDownCount, err := dc.scaleDownOldReplicaSetsForRollingUpdate(allRSs, oldRSs, deployment)
if err != nil {
return false, nil
}
klog.V(4).Infof("Scaled down old RSes of deployment %s by %d", deployment.Name, scaledDownCount)
totalScaledDown := cleanupCount + scaledDownCount
return totalScaledDown > 0, nil
}
From the code above, the main logic of reconcileOldReplicaSets() is:
- Get oldPodsCount and allPodsCount from oldRSs and allRSs;
- Calculate the deployment's maxUnavailable, minAvailable, newRSUnavailablePodCount, and maxScaledDown values; when maxSurge and maxUnavailable are given as percentages, maxSurge is rounded up while maxUnavailable is rounded down (see the sketch after this list);
- Clean up unhealthy ReplicaSets;
- Calculate scaleDownCount for the oldRSs;
- Finally, scale down the oldRSs.
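The round-up versus round-down asymmetry comes from the roundUp flag passed to intstrutil.GetValueFromIntOrPercent, the same helper quoted in NewRSNewReplicas above. A minimal sketch, assuming the k8s.io/apimachinery intstr package:
package main

import (
    "fmt"

    intstrutil "k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
    percent := intstrutil.FromString("25%")
    replicas := 10

    // maxSurge is resolved with roundUp=true: ceil(10 * 0.25) = 3.
    maxSurge, _ := intstrutil.GetValueFromIntOrPercent(&percent, replicas, true)
    // maxUnavailable is resolved with roundUp=false: floor(10 * 0.25) = 2.
    maxUnavailable, _ := intstrutil.GetValueFromIntOrPercent(&percent, replicas, false)

    fmt.Println(maxSurge, maxUnavailable) // 3 2
}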
Rolling Update Summary #
From the code above, the rolling update works by repeatedly calling reconcileNewReplicaSet() to scale newRS up and reconcileOldReplicaSets() to scale the oldRSs down until the desired state is reached, strictly respecting the maxSurge and maxUnavailable constraints throughout the upgrade.
Whether scaling up or scaling down, the work goes through scaleReplicaSetAndRecordEvent(), which in turn calls scaleReplicaSet(); in both cases scaling boils down to updating rs.Annotations and rs.Spec.Replicas.
The overall flow is shown in the diagram below:
graph LR
    op1(newRS scale up) --> op3(dc.scaleReplicaSetAndRecordEvent)
    op2(oldRS scale down) --> op3
    op3 --> op4(dc.scaleReplicaSet)
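Conceptually, the final write is the same in both directions: set rs.Spec.Replicas to the new value and refresh the desired-replicas / max-replicas annotations, then persist the ReplicaSet. A rough sketch with simplified local types, not the controller's actual code:
package main

import (
    "fmt"
    "strconv"
)

// Simplified stand-in for the ReplicaSet fields the scale path touches.
type replicaSet struct {
    Replicas    int32
    Annotations map[string]string
}

// scaleReplicaSet mimics the essence of dc.scaleReplicaSet: whichever path
// called it, it rewrites Spec.Replicas plus the desired/max replica
// annotations (the real controller then updates the object through the
// API server and records a scaling event).
func scaleReplicaSet(rs *replicaSet, newReplicas, desired, maxSurge int32) {
    rs.Replicas = newReplicas
    rs.Annotations["deployment.kubernetes.io/desired-replicas"] = strconv.Itoa(int(desired))
    rs.Annotations["deployment.kubernetes.io/max-replicas"] = strconv.Itoa(int(desired + maxSurge))
}

func main() {
    rs := &replicaSet{Annotations: map[string]string{}}
    scaleReplicaSet(rs, 3, 10, 3)
    fmt.Println(rs.Replicas, rs.Annotations)
}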
Rolling Update Example #
- Create a deployment with replicas = 10
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 10
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.18.0
ports:
- containerPort: 80
After the 10 Pods have been created successfully:
$ kubectl get rs
NAME DESIRED CURRENT READY AGE
nginx-deployment-67dfd6c8f9 10 10 10 70s
- Update the image of nginx-deployment; by default this uses the rolling update strategy
$ kubectl set image deploy/nginx-deployment nginx=nginx:1.19.1
Because the deployment does not set .spec.strategy explicitly, the defaults of maxSurge=25% and maxUnavailable=25% apply. From the source we can tell the controller will compute maxSurge=3, maxUnavailable=2, and maxTotalPods=13 for this deployment, as follows:
// round up: maxSurge = 10 * 0.25 = 2.5 -> 3
maxSurge = replicas * deployment.spec.strategy.rollingUpdate.maxSurge
// round down: maxUnavailable = 10 * 0.25 = 2.5 -> 2
maxUnavailable = replicas * deployment.spec.strategy.rollingUpdate.maxUnavailable
// maxTotalPods = 10 + 3 = 13
maxTotalPods = replicas + maxSurge
As the code above describes, the update first creates the newRS and then sets its replicas; the value is computed in NewRSNewReplicas(), which yields 3 at this point. The newRS annotations are then updated and events are recorded, and this syncLoop is done.
By the next syncLoop, the total replicas across all ReplicaSets has reached the maximum of 10 + 3 = 13, so the oldRS now needs to be scaled down.
The scale-down count is derived from the following formulas:
// 13 = 10 + 3
allPodsCount := deploymentutil.GetReplicaCountForReplicaSets(allRSs)
// 8 = 10 - 2
minAvailable := *(deployment.Spec.Replicas) - maxUnavailable
// ???
newRSUnavailablePodCount := *(newRS.Spec.Replicas) - newRS.Status.AvailableReplicas
// 13 - 8 - ???
maxScaledDown := allPodsCount - minAvailable - newRSUnavailablePodCount
Here allPodsCount = 13 and minAvailable = 8, while newRSUnavailablePodCount is not yet known but falls in the range [0, 3]. Suppose the 3 newRS pods are still in the ContainerCreating state, so newRSUnavailablePodCount = 3. By the formulas above, maxScaledDown = 2, so the oldRS must scale down by 2 pods and its replicas is set to 8, completing this syncLoop. In the next syncLoop, the scale-up path computes scaleUpCount = 13 - 8 - 3 = 2, so newRS increases its replicas by 2. This cycle repeats until newRS has scaled up to 10 and the oldRS has scaled down to 0.
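To make the back-and-forth concrete, here is a toy simulation of the alternating decisions, under the simplifying assumption that pods requested in the previous sync loop are still starting while earlier ones are already available (real pod timing varies, so this only illustrates how maxTotalPods and minAvailable bound each step):
package main

import "fmt"

func min32(a, b int32) int32 {
    if a < b {
        return a
    }
    return b
}

func main() {
    const desired, maxSurge, maxUnavailable int32 = 10, 3, 2
    maxTotalPods := desired + maxSurge       // 13
    minAvailable := desired - maxUnavailable // 8

    newSpec, oldSpec := int32(0), int32(10)
    prevNewSpec := newSpec // newSpec as of the previous sync loop

    for i := 1; newSpec != desired || oldSpec != 0; i++ {
        newAvailable := prevNewSpec // pods from the last scale-up assumed not ready yet
        prevNewSpec = newSpec

        if total := newSpec + oldSpec; total < maxTotalPods && newSpec < desired {
            // reconcileNewReplicaSet: scale up, capped at desired+maxSurge.
            newSpec += min32(maxTotalPods-total, desired-newSpec)
        } else {
            // reconcileOldReplicaSets: scale down while keeping minAvailable pods.
            maxScaledDown := newSpec + oldSpec - minAvailable - (newSpec - newAvailable)
            if maxScaledDown > 0 {
                oldSpec -= min32(maxScaledDown, oldSpec)
            }
        }
        fmt.Printf("sync %d: newRS=%d oldRS=%d\n", i, newSpec, oldSpec)
    }
    // The output walks from newRS=3/oldRS=10 down to newRS=10/oldRS=0,
    // matching the hand calculation above for the first steps.
}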
For the example above, you can watch the process with kubectl get rs -w; sample output:
$ kubectl get rs -w
NAME DESIRED CURRENT READY AGE
nginx-deployment-5bbdfb5879 10 10 5 3s
nginx-deployment-67dfd6c8f9 3 3 3 4m47s
nginx-deployment-5bbdfb5879 10 10 6 3s
nginx-deployment-67dfd6c8f9 2 3 3 4m47s
nginx-deployment-67dfd6c8f9 2 2 2 4m47s
nginx-deployment-5bbdfb5879 10 10 7 4s
nginx-deployment-67dfd6c8f9 1 2 2 4m48s
nginx-deployment-67dfd6c8f9 1 1 1 4m48s
nginx-deployment-5bbdfb5879 10 10 8 4s
nginx-deployment-67dfd6c8f9 0 1 1 4m48s
nginx-deployment-67dfd6c8f9 0 0 0 4m48s
nginx-deployment-5bbdfb5879 10 10 9 5s
nginx-deployment-5bbdfb5879 10 10 10 6s