How to Prevent Accidental Deletions in K8s

Hi everyone, I'm Qiao Ke (乔克), an ops engineer who loves to tinker, and a cloud-native enthusiast so ugly I wake myself up at night.


Author: 乔克
WeChat official account: 运维开发故事
Blog: https://jokerbai.com

Preventing Accidental Deletions in K8s

When I first started in ops, I always felt the rm command carried a kind of fatal magic: I knew it was dangerous, yet I'd still hit Enter in a daze. It wasn't until I got into K8s that kubectl delete taught me something harsher than rm -rf: the cluster's "silent delete".

A fellow engineer once called me in tears at 3 a.m.: a slip of the hand ran kubectl delete ns prod and removed the production namespace. The services and data volumes inside it were wiped in an instant, and the monitoring wall turned solid red. An incident of that magnitude is not something an apology can fix.

Today I'm laying out the complete anti-accidental-deletion scheme our team has battle-tested in K8s, to eliminate "fat-finger disasters" at the root.

1. Why is kubectl delete deadlier than rm?

Linux's rm can at least be mitigated with a trash bin or data-recovery tools, but deletion in K8s seems tailor-made for disaster:

  • Silent execution, no confirmation: deleting a Pod is like deleting a log file; type the command and it runs, with no "are you sure?" prompt
  • Cascading deletes take everything: delete a namespace, and the Deployments, StatefulSets, PVCs, and other resources inside it go down with it, data included
  • No undo, no way back: K8s has no built-in recycle bin; once a delete runs, nothing short of a backup can save you

Worse, there are too many ways to trigger a delete: a manual kubectl mistake, a bad Helm config during deployment, a CI/CD script missing a guard, even a slip while poking the API with Postman. Anything that can reach the API Server can become the fuse that blows up the cluster.

Safety can't rely on "people being careful"; it has to rely on "the system stopping you". That is the hard-earned lesson of every ops engineer.

2. The idea: a two-ring "authorize-then-validate" mechanism

Our goal was clear: block mistakes without burdening normal operations. We ended up with a two-ring design of client-side authorization plus server-side validation:

  1. kubectl-safe-delete plugin: a client-side tool that replaces native kubectl delete. After the user confirms (or passes -y), it automatically adds a "delete authorization" annotation to the resource
  2. Validating Admission Webhook: a protective gate inside the cluster that only allows annotated resources to be deleted

The core idea: "-y doesn't bypass safety, it starts the authorization flow". We keep the native workflow while closing the safety gap.

3. Implementation: from deployment to verification

Step 1: Deploy the webhook (server-side protection)

First, stand up the gate so every high-risk delete goes through validation. This breaks down into 4 small tasks:

1. Write the webhook service (main.go)

The core logic listens for delete requests and checks whether the target resource carries the authorization annotation, blocking the request otherwise. The code:

// main.go
package main

import (
 "context"
 "encoding/json"
 "fmt"
 "log"
 "net/http"
 "os"
 "time"

 admissionv1 "k8s.io/api/admission/v1"
 metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 "k8s.io/client-go/kubernetes"
 "k8s.io/client-go/rest"
)

const forceDeleteAnnotation = "safe-delete/force"

var clientset *kubernetes.Clientset

func main() {
 config, err := rest.InClusterConfig()
 if err != nil {
  log.Fatal("Failed to create in-cluster config:", err)
 }
 clientset, err = kubernetes.NewForConfig(config)
 if err != nil {
  log.Fatal("Failed to create clientset:", err)
 }

 http.HandleFunc("/validate", handleDelete)
 http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
  w.WriteHeader(http.StatusOK)
 })
 port := os.Getenv("PORT")
 if port == "" {
  port = "8443"
 }
 log.Printf("Starting delete-protection webhook on :%s", port)
 log.Fatal(http.ListenAndServeTLS(":"+port, "/certs/cert.pem", "/certs/key.pem", nil))
}

func handleDelete(w http.ResponseWriter, r *http.Request) {
 var review admissionv1.AdmissionReview
 if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
  log.Printf("AdmissionReview decode error: %v", err)
  http.Error(w, err.Error(), http.StatusBadRequest)
  return
 }
 if review.Request == nil {
  http.Error(w, "empty admission request", http.StatusBadRequest)
  return
 }

 resp := &admissionv1.AdmissionResponse{
  Allowed: true,
  UID:     review.Request.UID,
 }

 req := review.Request
 log.Printf(
  "Admission request: op=%s kind=%s ns=%s name=%s uid=%s",
  req.Operation, req.Kind.Kind, req.Namespace, req.Name, req.UID,
 )
 if req.Operation == admissionv1.Delete {
  highRiskKinds := map[string]bool{
   "Namespace":             true,
   "Deployment":            true,
   "StatefulSet":           true,
   "DaemonSet":             true,
   "PersistentVolume":      true,
   "PersistentVolumeClaim": true,
  }

  if highRiskKinds[req.Kind.Kind] {
   log.Printf("High-risk delete detected: kind=%s ns=%s name=%s uid=%s", req.Kind.Kind, req.Namespace, req.Name, req.UID)
   annFromReq, annOK := getAnnotationsFromAdmission(req)
   if annOK {
    log.Printf("Annotations from request: kind=%s ns=%s name=%s count=%d", req.Kind.Kind, req.Namespace, req.Name, len(annFromReq))
   }
   annotations, err := func() (map[string]string, error) {
    if annOK {
     return annFromReq, nil
    }
    return getResourceAnnotations(req)
   }()
   if err != nil || annotations[forceDeleteAnnotation] != "true" {
    if err == nil {
     retries := 10
     for i := 0; i < retries; i++ {
      time.Sleep(200 * time.Millisecond)
      a2, err2 := getResourceAnnotations(req)
      if err2 != nil {
       err = err2
       break
      }
      if a2[forceDeleteAnnotation] == "true" {
       annotations = a2
       break
      }
     }
    }
    if err != nil {
     log.Printf(
      "Delete blocked: annotations fetch error for kind=%s ns=%s name=%s uid=%s err=%v",
      req.Kind.Kind, req.Namespace, req.Name, req.UID, err,
     )
    } else {
     log.Printf(
      "Delete blocked: missing '%s' annotation for kind=%s ns=%s name=%s uid=%s",
      forceDeleteAnnotation, req.Kind.Kind, req.Namespace, req.Name, req.UID,
     )
    }
    resp.Allowed = false
    resp.Result = &metav1.Status{
     Message: fmt.Sprintf(
      "❌ Delete blocked by policy.\n"+
       "Use 'kubectl safe-delete %s/%s' to perform an authorized delete.",
      req.Kind.Kind, req.Name,
     ),
     Code: 403,
    }
   } else {
    log.Printf(
     "Delete allowed: '%s' annotation present for kind=%s ns=%s name=%s uid=%s",
     forceDeleteAnnotation, req.Kind.Kind, req.Namespace, req.Name, req.UID,
    )
   }
  } else {
   log.Printf("Non-high-risk delete: kind=%s ns=%s name=%s uid=%s allowed", req.Kind.Kind, req.Namespace, req.Name, req.UID)
  }
 } else {
  log.Printf("Non-delete operation: op=%s kind=%s ns=%s name=%s allowed", req.Operation, req.Kind.Kind, req.Namespace, req.Name)
 }

 w.Header().Set("Content-Type", "application/json")
 // admission.k8s.io/v1 requires apiVersion/kind on the response object
 out := admissionv1.AdmissionReview{
  TypeMeta: metav1.TypeMeta{APIVersion: "admission.k8s.io/v1", Kind: "AdmissionReview"},
  Response: resp,
 }
 json.NewEncoder(w).Encode(out)
 log.Printf("Admission response: allowed=%t uid=%s kind=%s ns=%s name=%s", resp.Allowed, resp.UID, req.Kind.Kind, req.Namespace, req.Name)
}

func getResourceAnnotations(req *admissionv1.AdmissionRequest) (map[string]string, error) {
 ctx := context.Background()
 log.Printf("Fetching annotations for kind=%s ns=%s name=%s", req.Kind.Kind, req.Namespace, req.Name)
 switch req.Kind.Kind {
 case "Namespace":
  ns, err := clientset.CoreV1().Namespaces().Get(ctx, req.Name, metav1.GetOptions{})
  if err != nil {
   log.Printf("Namespace annotations fetch error: name=%s err=%v", req.Name, err)
   return nil, err
  }
  log.Printf("Namespace annotations fetched: name=%s count=%d", req.Name, len(ns.Annotations))
  return ns.Annotations, nil
 case "Deployment":
  obj, err := clientset.AppsV1().Deployments(req.Namespace).Get(ctx, req.Name, metav1.GetOptions{})
  if err != nil {
   log.Printf("Deployment annotations fetch error: ns=%s name=%s err=%v", req.Namespace, req.Name, err)
   return nil, err
  }
  log.Printf("Deployment annotations fetched: ns=%s name=%s count=%d", req.Namespace, req.Name, len(obj.Annotations))
  return obj.Annotations, nil
 case "StatefulSet":
  obj, err := clientset.AppsV1().StatefulSets(req.Namespace).Get(ctx, req.Name, metav1.GetOptions{})
  if err != nil {
   log.Printf("StatefulSet annotations fetch error: ns=%s name=%s err=%v", req.Namespace, req.Name, err)
   return nil, err
  }
  log.Printf("StatefulSet annotations fetched: ns=%s name=%s count=%d", req.Namespace, req.Name, len(obj.Annotations))
  return obj.Annotations, nil
 case "DaemonSet":
  obj, err := clientset.AppsV1().DaemonSets(req.Namespace).Get(ctx, req.Name, metav1.GetOptions{})
  if err != nil {
   log.Printf("DaemonSet annotations fetch error: ns=%s name=%s err=%v", req.Namespace, req.Name, err)
   return nil, err
  }
  log.Printf("DaemonSet annotations fetched: ns=%s name=%s count=%d", req.Namespace, req.Name, len(obj.Annotations))
  return obj.Annotations, nil
 case "PersistentVolume":
  obj, err := clientset.CoreV1().PersistentVolumes().Get(ctx, req.Name, metav1.GetOptions{})
  if err != nil {
   log.Printf("PersistentVolume annotations fetch error: name=%s err=%v", req.Name, err)
   return nil, err
  }
  log.Printf("PersistentVolume annotations fetched: name=%s count=%d", req.Name, len(obj.Annotations))
  return obj.Annotations, nil
 case "PersistentVolumeClaim":
  obj, err := clientset.CoreV1().PersistentVolumeClaims(req.Namespace).Get(ctx, req.Name, metav1.GetOptions{})
  if err != nil {
   log.Printf("PersistentVolumeClaim annotations fetch error: ns=%s name=%s err=%v", req.Namespace, req.Name, err)
   return nil, err
  }
  log.Printf("PersistentVolumeClaim annotations fetched: ns=%s name=%s count=%d", req.Namespace, req.Name, len(obj.Annotations))
  return obj.Annotations, nil
 default:
  type withMeta struct {
   Metadata metav1.ObjectMeta `json:"metadata"`
  }
   gvr := kindToGVR(req.Kind.Kind)
   // core-group resources live under /api/v1, everything else under /apis
   url := fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s/%s",
    gvr.Group, gvr.Version, req.Namespace, gvr.Resource, req.Name)
   if gvr.Group == "" {
    url = fmt.Sprintf("/api/%s/namespaces/%s/%s/%s",
     gvr.Version, req.Namespace, gvr.Resource, req.Name)
   }
  raw, err := clientset.RESTClient().Get().AbsPath(url).Do(ctx).Raw()
  if err != nil {
   log.Printf(
    "Annotations REST fetch error: kind=%s ns=%s name=%s url=%s err=%v",
    req.Kind.Kind, req.Namespace, req.Name, url, err,
   )
   return nil, err
  }
  var obj withMeta
  if err := json.Unmarshal(raw, &obj); err != nil {
   log.Printf("Annotations unmarshal error: kind=%s ns=%s name=%s err=%v", req.Kind.Kind, req.Namespace, req.Name, err)
   return nil, err
  }
  log.Printf("Annotations fetched: kind=%s ns=%s name=%s count=%d", req.Kind.Kind, req.Namespace, req.Name, len(obj.Metadata.Annotations))
  return obj.Metadata.Annotations, nil
 }
}

func kindToGVR(kind string) metav1.GroupVersionResource {
 switch kind {
 case "Deployment", "StatefulSet", "DaemonSet":
  return metav1.GroupVersionResource{Group: "apps", Version: "v1", Resource: kindToLower(kind) + "s"}
 case "PersistentVolumeClaim":
  return metav1.GroupVersionResource{Group: "", Version: "v1", Resource: "persistentvolumeclaims"}
 default:
  return metav1.GroupVersionResource{Group: "", Version: "v1", Resource: kindToLower(kind) + "s"}
 }
}

func kindToLower(kind string) string {
 m := map[string]string{
  "Namespace":             "namespace",
  "Deployment":            "deployment",
  "StatefulSet":           "statefulset",
  "DaemonSet":             "daemonset",
  "PersistentVolume":      "persistentvolume",
  "PersistentVolumeClaim": "persistentvolumeclaim",
 }
 if s, ok := m[kind]; ok {
  return s
 }
 return kind
}
func getAnnotationsFromAdmission(req *admissionv1.AdmissionRequest) (map[string]string, bool) {
 if req == nil || len(req.OldObject.Raw) == 0 {
  return nil, false
 }
 var obj struct {
  Metadata metav1.ObjectMeta `json:"metadata"`
 }
 if err := json.Unmarshal(req.OldObject.Raw, &obj); err != nil {
  log.Printf("Annotations unmarshal from request error: kind=%s ns=%s name=%s err=%v", req.Kind.Kind, req.Namespace, req.Name, err)
  return nil, false
 }
 return obj.Metadata.Annotations, true
}

2. Build the Docker image (Dockerfile)

A multi-stage build keeps the image small, and CA certificates are installed so the HTTPS service works correctly:

FROM golang:1.24.0 AS builder
# Optional Go module proxy mirror; remove if you don't need it
ENV GOPROXY=https://goproxy.cn,direct
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go mod tidy && go build -o webhook main.go

FROM ubuntu:22.04
ENV TZ=Asia/Shanghai
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && rm -rf /var/lib/apt/lists/*
WORKDIR /root/
COPY --from=builder /app/webhook .
EXPOSE 8443
CMD ["./webhook"]

Tip: you can generate the certificate by hand with openssl; in production, use cert-manager for automatic issuance and renewal to avoid expiry incidents.

3. Deploy to the K8s cluster (webhook-deployment.yaml)

Create the Deployment, Service, and RBAC resources so the webhook runs reliably and has permission to read resource annotations:

# webhook-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: safe-delete-webhook
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: safe-delete-webhook
  template:
    metadata:
      labels:
        app: safe-delete-webhook
    spec:
      serviceAccountName: safe-delete-webhook
      containers:
      - name: webhook
        image: your-image-hub/kubernetes-safe-delete-webhook:v0.0.4
        ports:
        - containerPort: 8443
        volumeMounts:
        - name: certs
          mountPath: /certs
          readOnly: true
        livenessProbe:
          httpGet:
            scheme: HTTPS
            path: /healthz
            port: 8443
          initialDelaySeconds: 5
        readinessProbe:
          httpGet:
            scheme: HTTPS
            path: /healthz
            port: 8443
          initialDelaySeconds: 5
      volumes:
      - name: certs
        secret:
          secretName: safe-delete-webhook-tls
---
apiVersion: v1
kind: Service
metadata:
  name: safe-delete-webhook
  namespace: kube-system
spec:
  selector:
    app: safe-delete-webhook
  ports:
  - port: 443
    targetPort: 8443
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: safe-delete-webhook
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: safe-delete-webhook-reader
rules:
- apiGroups: ["", "apps"]
  resources: ["namespaces", "deployments", "statefulsets", "daemonsets", "persistentvolumes", "persistentvolumeclaims"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: safe-delete-webhook-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: safe-delete-webhook-reader
subjects:
- kind: ServiceAccount
  name: safe-delete-webhook
  namespace: kube-system

4. Generate certificates and configure the webhook

The webhook must be served over HTTPS, so we need to generate a certificate and create a ValidatingWebhookConfiguration:

1. Generate a self-signed certificate (use cert-manager in production)

#!/bin/bash

set -e

usage() {
    cat <<EOF
Generate a certificate suitable for use with a webhook service.

This script uses the k8s CertificateSigningRequest API to generate a
certificate signed by the k8s CA suitable for use with webhook
services. This requires permissions to create and approve CSRs. See
https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster for a
detailed explanation and additional instructions.

The server key/cert and k8s CA cert are stored in a k8s secret.

usage: ${0} [OPTIONS]

The following flags are required.

       --service          Service name of webhook.
       --namespace        Namespace where webhook service and secret reside.
       --secret           Secret name for CA certificate and server certificate/key pair.
EOF
    exit 1
}

while [[ $# -gt 0 ]]; do
    case ${1} in
        --service)
            service="$2"
            shift
            ;;
        --secret)
            secret="$2"
            shift
            ;;
        --namespace)
            namespace="$2"
            shift
            ;;
        *)
            usage
            ;;
    esac
    shift
done

[ -z "${service:-}" ] && service=safe-delete-webhook
[ -z "${secret:-}" ] && secret=safe-delete-webhook-tls
[ -z "${namespace:-}" ] && namespace=kube-system

if [ ! -x "$(command -v openssl)" ]; then
    echo "openssl not found"
    exit 1
fi

csrName=${service}.${namespace}
tmpdir=$(mktemp -d)
echo "creating certs in tmpdir ${tmpdir} "

cat <<EOF >> ${tmpdir}/csr.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = ${service}
DNS.2 = ${service}.${namespace}
DNS.3 = ${service}.${namespace}.svc
EOF

openssl genrsa -out ${tmpdir}/key.pem 2048
openssl req -new -key ${tmpdir}/key.pem -subj "/CN=${service}.${namespace}.svc" -out ${tmpdir}/server.csr -config ${tmpdir}/csr.conf

# clean-up any previously created CSR for our service. Ignore errors if not present.
kubectl delete csr ${csrName} 2>/dev/null || true

# create  server cert/key CSR and  send to k8s API
cat <<EOF | kubectl create -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ${csrName}
spec:
  groups:
  - system:authenticated
  request: $(cat ${tmpdir}/server.csr | base64 | tr -d '\n')
  # certificates.k8s.io/v1 requires an explicit signerName; no built-in signer
  # issues webhook serving certs, so this placeholder must be replaced with a
  # signer available in your cluster, or skip this script and use cert-manager.
  signerName: example.com/webhook-serving
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF

# verify CSR has been created
while true; do
    if kubectl get csr ${csrName} >/dev/null 2>&1; then
        break
    fi
    sleep 1
done

# approve and fetch the signed certificate
kubectl certificate approve ${csrName}
# verify certificate has been signed
for x in $(seq 10); do
    serverCert=$(kubectl get csr ${csrName} -o jsonpath='{.status.certificate}')
    if [[ ${serverCert} != '' ]]; then
        break
    fi
    sleep 1
done
if [[ ${serverCert} == '' ]]; then
    echo "ERROR: After approving csr ${csrName}, the signed certificate did not appear on the resource. Giving up after 10 attempts." >&2
    exit 1
fi
echo ${serverCert} | openssl base64 -d -A -out ${tmpdir}/cert.pem


# create the secret with CA cert and server cert/key
kubectl create secret generic ${secret} \
        --from-file=key.pem=${tmpdir}/key.pem \
        --from-file=cert.pem=${tmpdir}/cert.pem \
        --dry-run=client -o yaml |
    kubectl -n ${namespace} apply -f -

2. Create the webhook configuration

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: safe-delete-webhook
webhooks:
  - name: safe-delete-webhook.kube-system.svc.cluster.local
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        namespace: kube-system
        name: safe-delete-webhook
        path: "/validate"
      caBundle: "${CA_BUNDLE}"
    rules:
      - operations: ["DELETE"]
        apiGroups: ["*"]
        apiVersions: ["*"]
        resources: ["namespaces", "deployments", "statefulsets", "daemonsets", "persistentvolumes", "persistentvolumeclaims"]
    failurePolicy: Ignore

Note: with failurePolicy: Ignore, deletes pass unchecked whenever the webhook itself is down; Fail is stricter, but a webhook outage would then block every matching delete.

3. Update the CA_BUNDLE variable in the webhook config
#!/bin/bash

ROOT=$(cd $(dirname $0)/../../; pwd)

set -o errexit
set -o nounset
set -o pipefail

export CA_BUNDLE=$(kubectl config view --raw -o json | jq -r '.clusters[0].cluster."certificate-authority-data"' | tr -d '"')

if command -v envsubst >/dev/null 2>&1; then
    envsubst
else
    sed -e "s|\${CA_BUNDLE}|${CA_BUNDLE}|g"
fi
4. Run the deployment

#!/bin/bash

sh create-cert.sh
cat ./validate-webhook.yaml | sh ./patch-webhook-ca.sh > ./webhook.yaml

echo "Creating k8s admission deployment"
kubectl apply -f webhook-deployment.yaml
kubectl apply -f webhook.yaml

Step 2: Install the kubectl plugin (client-side experience)

With the webhook deployed, normal deletes are blocked, so the client plugin is needed to perform the "authorization". It adds the annotation automatically; no manual steps required.

1. Create the plugin script (kubectl-safe-delete)

 #!/bin/bash
 # kubectl-safe-delete plugin
 # Usage: kubectl safe-delete <resource>/<name> [-n ns] [-y]

 set -euo pipefail

 # High-risk resources (keep in sync with the webhook config)
 declare -A HIGH_RISK_MAP
 HIGH_RISK_MAP=(
   ["namespace"]=1 ["ns"]=1
   ["deployment"]=1 ["deploy"]=1
   ["statefulset"]=1 ["sts"]=1
   ["daemonset"]=1 ["ds"]=1
   ["persistentvolume"]=1 ["pv"]=1
   ["persistentvolumeclaim"]=1 ["pvc"]=1
 )

 SKIP_CONFIRM=false
 RESOURCE=""
 NAMESPACE=""

 # Parse command-line arguments
 ARGS=()
 while [[ $# -gt 0 ]]; do
   case "$1" in
     -y|--yes)
       SKIP_CONFIRM=true   # not forwarded: kubectl delete has no -y flag
       ;;
     -n|--namespace)
       NAMESPACE="$2"
       ARGS+=("$1" "$2")
       shift
       ;;
     */*)
       RESOURCE="$1"
       ARGS+=("$1")
       ;;
     *)
       ARGS+=("$1")
       ;;
   esac
   shift
 done
   
 # Authorization logic for high-risk resources
 if [[ -n "$RESOURCE" ]]; then
   TYPE="${RESOURCE%%/*}"  # resource type (e.g. deploy)
   NAME="${RESOURCE#*/}"   # resource name (e.g. web)

   if [[ -n "${HIGH_RISK_MAP[$TYPE]:-}" ]]; then
     # Namespaces are cluster-scoped, so no -n flag
     if [[ "$TYPE" == "namespace" ]] || [[ "$TYPE" == "ns" ]]; then
       NS_FLAG=""
     else
       # Prefer the ns from the command line, then the current context, then default
       NS="${NAMESPACE:-$(kubectl config view --minify -o jsonpath='{..namespace}' 2>/dev/null || echo 'default')}"
       NS_FLAG="-n $NS"
     fi

     # Confirmation step (skipped with -y)
     if [[ "$SKIP_CONFIRM" == false ]]; then
       echo "⚠️  High-risk delete detected: $RESOURCE"
       read -p "Confirm delete? This will authorize and execute the operation. (y/N): " -n 1 -r
       echo
       if [[ ! $REPLY =~ ^[Yy]$ ]]; then
         echo "❌ Operation cancelled."
         exit 1
       fi
     fi

     # Automatically add the authorization annotation
     echo "🔒 Adding delete-authorization annotation: safe-delete/force=true"
     if [[ "$TYPE" == "namespace" ]] || [[ "$TYPE" == "ns" ]]; then
       kubectl annotate namespace "$NAME" safe-delete/force=true --overwrite
     else
       kubectl annotate "$TYPE" "$NAME" $NS_FLAG safe-delete/force=true --overwrite
     fi
   fi
 fi

 # Run the real delete
 exec kubectl delete "${ARGS[@]}"

2. Install the plugin

Drop the script into a directory on your PATH and make it executable:

 sudo install -m 755 kubectl-safe-delete /usr/local/bin/  

# Verify the plugin is recognized

 kubectl plugin list

If the output mentions "safe-delete", the plugin has been picked up.

Step 3: Verify the behavior (4 real-world scenarios)

With everything deployed, we verify the protection with 4 scenarios to make sure it blocks what should be blocked and allows what should be allowed.

Scenario 1: Interactive delete (most common)

Without -y, the plugin asks for confirmation, then authorizes and deletes:

 $ kubectl safe-delete deploy/web  
 ⚠️ High-risk delete detected: deploy/web  
 Confirm delete? This will authorize and execute the operation. (y/N): y  
 🔒 Adding delete-authorization annotation: safe-delete/force=true  
 deployment "web" deleted

Scenario 2: Skip confirmation (for automation)

With -y, the delete is authorized immediately, which suits CI/CD integration (note the <type>/<name> form, which is what triggers the authorization step):

 $ kubectl safe-delete ns/test -y  
 🔒 Adding delete-authorization annotation: safe-delete/force=true  
 namespace "test" deleted

Scenario 3: Bypassing the plugin (blocked)

Using native kubectl delete directly, the webhook blocks the request and suggests the correct command:

$ kubectl delete deploy/web  
Error from server: admission webhook "safe-delete-webhook.kube-system.svc" denied the request:  
❌ Delete blocked by policy.  
Use 'kubectl safe-delete deployment/web' to perform an authorized delete.

Scenario 4: Deleting low-risk resources (no friction)

For low-risk resources such as Pods, the plugin does not interfere, matching native behavior:

$ kubectl safe-delete pod/nginx-xxx  
pod "nginx-xxx" deleted

4. Summary: simple but effective protection

The whole solution is a plugin script of under a hundred lines plus roughly three hundred lines of webhook code, and it delivers:

  • ✅ Native workflow preserved: users just add -y to delete; no new commands to learn
  • ✅ Double protection with no blind spots: client-side authorization plus server-side validation; bypassing either ring still won't delete
  • ✅ Traceable operations: the authorization annotation leaves an audit trail
  • ✅ Low intrusiveness: low-risk resources are untouched; only core resources are guarded

One last thought: the heart of ops safety isn't "forbidding operations" but "standardizing" them. A good protection scheme makes safety a habit, not a burden. Finally, a small ask: if you'd like more quality original articles, follow our WeChat official account 「运维开发故事」.

If this article helped you, please like, share, and forward it. Your support motivates me to keep producing higher-quality articles. Thank you!

You can also "star" my official account so new articles are pushed to you as soon as they're published and you never miss an update.

I'm 乔克, part of the 「运维开发故事」 team, a frontline ops workhorse and cloud-native practitioner. Here you'll find not just hardcore technical material but also our thoughts and reflections on technology. Follow our account, and let's grow together!

Disclaimer: from 运维开发故事; the views are the creator's own. Link: https://eyangzhen.com/4348.html
