注意: 一些读者可能想直接去 Quick Start.

如果正在使用 Kubebuilder v1, 请查看 Kubebuilder v1

英文原版: book.kubebuilder.io

哪些人适合看这个文档

Kubernetes 的使用者

Kubernetes 的使用者将通过学习 API 是如何设计和实现的,获得对 Kubernetes 更深入的了解。 本书将教读者如何开发自己的 Kubernetes API 以及实现 Kubernetes API 的核心原理。

包括:

  • 如何构造 Kubernetes API 和 Resources
  • 如何进行 API 版本控制
  • 如何实现故障自愈
  • 如何实现垃圾回收和 Finalizers
  • 如何创建声明式和命令式 API
  • 如何创建 Level-Based API 和 Edge-Based API
  • 如何创建 Resources 和 Subresources

Kubernetes API extension developers

API extension developers 将学习实现标准的 Kubernetes API 的原理和概念,以及用于快速构建 API 的工具和库。本书涵盖了开发人员通常会遇到的陷阱和误解。

包括:

  • 如何用一个 reconciliation 方法处理多个 events
  • 如何定期执行 reconciliation 方法
  • 将来会有
    • 何时使用 lister cache 与 live lookups(实时查找)
    • 如何实现垃圾回收和 Finalizers
    • 如何使用 Declarative Validation 和 Webhook Validation
    • 如何实现 API 版本控制

相关资源

快速开始

这个 Quick Start 指南包括:

环境准备

  • go version v1.13+.
  • docker version 17.03+.
  • kubectl version v1.11.3+.
  • kustomize v3.1.0+
  • Access to a Kubernetes v1.11.3+ cluster.

安装

安装 kubebuilder:

os=$(go env GOOS)
arch=$(go env GOARCH)

# download kubebuilder and extract it to tmp
curl -L https://go.kubebuilder.io/dl/2.2.0/${os}/${arch} | tar -xz -C /tmp/

# move to a long-term location and put it on your path
# (you'll need to set the KUBEBUILDER_ASSETS env var if you put it somewhere else)
sudo mv /tmp/kubebuilder_2.2.0_${os}_${arch} /usr/local/kubebuilder
export PATH=$PATH:/usr/local/kubebuilder/bin

创建一个 Project

创建一个目录,然后运行 init 命令以初始化新项目

mkdir $GOPATH/src/example
cd $GOPATH/src/example
kubebuilder init --domain my.domain

创建一个 API

运行以下命令创建一个新的 API(group/version) --> webapp/v1, 并在此 API 上创建一个新的 Kind(CRD) --> Guestbook:

kubebuilder create api --group webapp --version v1 --kind Guestbook

备注: 对于如何设计 API 和如何实现业务逻辑,可以参考 Designing an API 和 What’s in a Controller.

查看示例 `(api/v1/guestbook_types.go)`

// GuestbookSpec defines the desired state of Guestbook
type GuestbookSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Quantity of instances
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=10
	Size int32 `json:"size"`

	// Name of the ConfigMap for GuestbookSpec's configuration
	// +kubebuilder:validation:MaxLength=15
	// +kubebuilder:validation:MinLength=1
	ConfigMapName string `json:"configMapName"`

	// +kubebuilder:validation:Enum=Phone;Address;Name
	Type string `json:"alias,omitempty"`
}

// GuestbookStatus defines the observed state of Guestbook
type GuestbookStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// PodName of the active Guestbook node.
	Active string `json:"active"`

	// PodNames of the standby Guestbook nodes.
	Standby []string `json:"standby"`
}

type Guestbook struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   GuestbookSpec   `json:"spec,omitempty"`
	Status GuestbookStatus `json:"status,omitempty"`
}

在本地运行 operator

你需要一个 Kubernetes 测试集群来进行测试,你可以使用 KIND 来启动一个本地或者远程的测试集群

在集群中安装 CRDs

make install

运行 controller (这个命令不会在后台运行 controller,所以建议新建一个 terminal 运行以下命令)

make run

安装 Custom Resources 实例

创建 Project 时,对于 Create Resource [y/n] 的提示,如果你选择了 y,将会在 config/samples/ 目录中为你的 CRD 创建一个 CR 示例文件 xxx.yaml(如果你修改了 API 定义文件 api/v1/guestbook_types.go,请务必同步修改此文件):

kubectl apply -f config/samples/

在集群上运行 operator

Build 并 Push 镜像 IMG:

make docker-build docker-push IMG=<some-registry>/<project-name>:tag

使用指定镜像IMG将 operator 部署到集群中:

make deploy IMG=<some-registry>/<project-name>:tag

卸载 CRDs

从集群中删除 CRDs:

make uninstall

下一步

现在,请继续学习 CronJob tutorial 教程,通过开发演示示例项目更好地了解其工作原理。

教程:构建 CronJob

很多教程都是从一些人为设置好的程序开始,或者只是一个够你了解基础知识的小应用,这些都不能让你深入了解更复杂的东西。相反,本教程将带您了解 Kubebuilder 几乎全部的复杂性:从简单开始,然后逐步发展为功能齐全的项目。

现在让我们假设 Kubernetes 中实现的 CronJob Controller 并不能满足我们的需求,我们希望使用 Kubebuilder 重写它。

CronJob controller 会控制 kubernetes 集群上的 job 每隔一段时间运行一次,它是基于 Job controller 实现的,Job controller 的 job 只会执行任务一次。

通过重写 CronJob controller,我们可以更好地了解如何与不属于我们项目的资源类型(例如内置的 Job)进行交互。

创建我们的项目

如快速入门中所述,我们需要创建一个新项目。请确保已经安装 Kubebuilder,然后执行如下命令创建一个新项目:

# we'll use a domain of tutorial.kubebuilder.io,
# so all API groups will be <group>.tutorial.kubebuilder.io.
kubebuilder init --domain tutorial.kubebuilder.io

现在我们已经有了一个项目,下面让我们看一下到目前为止 Kubebuilder 为我们搭建的脚手架...

基本项目中有什么?

在创建新项目时,Kubebuilder 为我们提供了一些基本的模板。

基础设施模板

首先,会创建一些用于构建项目的 basic infrastructure:

go.mod: 一个与项目匹配,包含最基本依赖关系的 go module 文件
module tutorial.kubebuilder.io/project

go 1.13

require (
	github.com/go-logr/logr v0.1.0
	github.com/robfig/cron v1.2.0
	k8s.io/api v0.17.2
	k8s.io/apimachinery v0.17.2
	k8s.io/client-go v0.17.2
	sigs.k8s.io/controller-runtime v0.5.0
)
Makefile: 用于构建和部署 controller

# Image URL to use all building/pushing image targets
IMG ?= controller:latest
# Produce CRDs that work back to Kubernetes 1.11 (no version conversion)
CRD_OPTIONS ?= "crd:trivialVersions=true"

all: manager

# Run tests
test: generate fmt vet manifests
	go test ./api/... ./controllers/... -coverprofile cover.out

# Build manager binary
manager: generate fmt vet
	go build -o bin/manager main.go

# Run against the configured Kubernetes cluster in ~/.kube/config
run: generate fmt vet
	go run ./main.go

# Install CRDs into a cluster
install: manifests
	kubectl apply -f config/crd/bases

# Deploy controller in the configured Kubernetes cluster in ~/.kube/config
deploy: manifests
	kubectl apply -f config/crd/bases
	kustomize build config/default | kubectl apply -f -

# Generate manifests e.g. CRD, RBAC etc.
manifests: controller-gen
	$(CONTROLLER_GEN) $(CRD_OPTIONS) rbac:roleName=manager-role webhook paths="./api/...;./controllers/..." output:crd:artifacts:config=config/crd/bases

# Run go fmt against code
fmt:
	go fmt ./...

# Run go vet against code
vet:
	go vet ./...

# Generate code
generate: controller-gen
	$(CONTROLLER_GEN) object:headerFile=./hack/boilerplate.go.txt paths=./api/...

# Build the docker image
docker-build: test
	docker build . -t ${IMG}
	@echo "updating kustomize image patch file for manager resource"
	sed -i'' -e 's@image: .*@image: '"${IMG}"'@' ./config/default/manager_image_patch.yaml

# Push the docker image
docker-push:
	docker push ${IMG}

# find or download controller-gen
# download controller-gen if necessary
controller-gen:
ifeq (, $(shell which controller-gen))
	go get sigs.k8s.io/controller-tools/cmd/controller-gen@v0.2.0-rc.0
CONTROLLER_GEN=$(shell go env GOPATH)/bin/controller-gen
else
CONTROLLER_GEN=$(shell which controller-gen)
endif
PROJECT: 用于创建新组件的 Kubebuilder 元数据
version: "2"
domain: tutorial.kubebuilder.io
repo: tutorial.kubebuilder.io/project

启动配置项模板

我们能在 config/ 目录下找到运行 operator 所需的所有配置文件。现在它只包含启动 controller 所需要的 Kustomize YAML 配置文件;后续我们编写 operator 时,这个目录还会包含 CustomResourceDefinitions(CRD)、RBAC 和 Webhook 等相关的配置文件。

config/default 包含 Kustomize base 文件,用于以标准配置启动 controller。

config/目录下每个目录都包含不同的配置:

  • config/manager: 包含在 k8s 集群中以 pod 形式运行 controller 的 YAML 配置文件

  • config/rbac: 包含运行 controller 所需最小权限的配置文件

程序入口

最后,但同样重要的是,Kubebuilder 创建了我们项目的程序入口:main.go。下面让我们看看这个文件...

每个程序都需要一个 main 入口

emptymain-cn.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

我们的 main.go 文件最开始会 import 一些基础的 package。 例如:

  • 比较核心的库: controller-runtime
  • controller-runtime 的默认日志库:Zap(稍后会详细介绍)
package main

import (
	"flag"
	"fmt"
	"os"

	"k8s.io/apimachinery/pkg/runtime"
	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
	// +kubebuilder:scaffold:imports
)

每组 controller 都需要一个 Scheme, Scheme 会提供 Kinds 与 Go types 之间的映射关系(现在你只需要记住这一点)。 在编写 API 定义时,我们将会进一步讨论 Kinds。

var (
	scheme   = runtime.NewScheme()
	setupLog = ctrl.Log.WithName("setup")
)

func init() {

	// +kubebuilder:scaffold:scheme
}

此时,我们 main.go 的功能相对来说比较简单:

  1. 为 metrics 绑定一些基本的 flags。

  2. 实例化一个 manager,用于跟踪我们运行的所有 controllers, 并设置 shared caches 和可以连接到 API server 的 k8s clients 实例,并将 Scheme 配置传入 manager。

  3. 运行我们的 manager, 而 manager 又运行所有的 controllers 和 webhook。 manager 会一直处于运行状态,直到收到正常关闭信号为止。 这样,当我们的 operator 运行在 Kubernetes 上时,我们可以通过优雅的方式终止这个 Pod。

虽然目前我们还没有什么可以运行,但是请记住 +kubebuilder:scaffold:builder 注释的位置 -- 很快那里就会变得有趣。

func main() {
	var metricsAddr string
	flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseDevMode(true)))

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme, MetricsBindAddress: metricsAddr})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}

请注意:下面代码如果指定了 Namespace 字段, controllers 仍然可以 watch cluster 级别的资源(例如Node), 但是对于 namespace 级别的资源,cache 将仅可以缓存指定 namespace 中的资源。

	mgr, err = ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:             scheme,
		Namespace:          namespace,
		MetricsBindAddress: metricsAddr,
	})

上面的示例将 operator 获取资源的范围限制在了单个 namespace。在这种情况下,建议将默认的 ClusterRole 和 ClusterRoleBinding 分别替换为 Role 和 RoleBinding,将授权限制在这个 namespace 内。有关更多信息,请参见如何使用 RBAC Authorization。

除此之外,你也可以使用 MultiNamespacedCacheBuilder 来监视一组特定的 namespaces:

	var namespaces []string // List of Namespaces

	mgr, err = ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:             scheme,
		NewCache:           cache.MultiNamespacedCacheBuilder(namespaces),
		MetricsBindAddress: fmt.Sprintf("%s:%d", metricsHost, metricsPort),
	})

有关更多信息,请参见 MultiNamespacedCacheBuilder

	// +kubebuilder:scaffold:builder

	setupLog.Info("starting manager")
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}

有了这些,我们就可以继续创建我们的API!

Groups、Versions 和 Kinds 之间的关系

在开始编写 API 之前,让我们讨论一些关键术语

当描述 Kubernetes 中的 API 时,我们经常用到的四个术语是:groups、versions、kinds 和 resources。

Groups 和 Versions

Kubernetes 中的 API Group 只是相关功能的集合, 每个 group 包含一个或者多个 versions, 这样的关系可以让我们随着时间的推移,通过创建不同的 versions 来更改 API 的工作方式。

Kinds 和 Resources

Kinds

每个 API group-version 包含一个或者多个 API 类型,我们称之为 Kinds。虽然同一个 Kind 在不同的 version 之间表现形式可能不同,但是每种形式都必须能够存储其他形式的全部数据,也就是说它们之间必须互相兼容(我们可以把数据存到 fields 或者 annotations 中),这样当使用老版本的 API group-version 时就不会造成数据丢失或损坏。有关更多信息,请参见 Kubernetes API 指南。

Resources

你应该听别人提到过 Resources。Resource 是 Kind 在 API 中的标识,通常情况下 Kind 和 Resource 是一一对应的,但是有时候相同的 Kind 可能对应多个 Resources,比如 Scale Kind 就对应着很多 Resources:deployments/scale、replicasets/scale 等。但是在 CRD 中,每个 Kind 只会对应一种 Resource。

请注意,Resource 始终是小写形式,并且通常情况下是 Kind 的小写形式。 具体对应关系可以查看 resource type

那么以上类型在框架中是如何定义的呢?

当我们在一个特定的 group-version 中使用某个 Kind 时,我们称它为 GroupVersionKind,简称 GVK;同样地,resources 对应的就是 GVR。稍后我们将看到,每个 GVK 都对应一个 root Go type(比如:Deployment 就关联着 K8s 源码里 k8s.io/api/apps/v1 package 中的 Deployment struct)。
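
用一小段 Go 代码示意一下 GVK 和 GVR 长什么样(仅作示意,这里的 group 取的是本教程稍后会用到的 batch.tutorial.kubebuilder.io):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// GVK:标识某个 group-version 中的一种 Kind
	gvk := schema.GroupVersionKind{
		Group:   "batch.tutorial.kubebuilder.io",
		Version: "v1",
		Kind:    "CronJob",
	}

	// GVR:对应的 resource,注意始终是小写(通常是 Kind 的小写复数形式)
	gvr := schema.GroupVersionResource{
		Group:    "batch.tutorial.kubebuilder.io",
		Version:  "v1",
		Resource: "cronjobs",
	}

	fmt.Println(gvk.String())
	fmt.Println(gvr.String())
}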

现在我们已经熟悉了一些关键术语,那么我们可以开始创建我们的 API 了!

额, 但是 Scheme 是什么东东?

Scheme 提供了 GVK 与对应 Go types(struct) 之间的映射(请不要和 godocs 中的 Scheme 混淆)

也就是说给定 Go type 就可以获取到它的 GVK,给定 GVK 可以获取到它的 Go type

例如,让我们假设 tutorial.kubebuilder.io/api/v1/cronjob_types.go 中的 CronJob 结构体在 batch.tutorial.kubebuilder.io/v1 API group 中(也就是说,假设这个 API group 有一种 Kind:CronJob,并且已经注册到 Api Server 中)

那么我们可以从 Api Server 获取到数据并反序列化至 &CronJob{}中,那么结果会是如下格式:

{
    "kind": "CronJob",
    "apiVersion": "batch.tutorial.kubebuilder.io/v1",
    ...
}

ps:这里翻译的不好,请继续向后看,慢慢就理解了 :)
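
关于 GVK 与 Go type 之间的双向映射,可以用下面这段小示例来理解(示意:假设我们已经像后面章节那样,通过生成的 AddToScheme 把 CronJob 注册进了 Scheme):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime"

	batchv1 "tutorial.kubebuilder.io/project/api/v1"
)

func main() {
	scheme := runtime.NewScheme()
	_ = batchv1.AddToScheme(scheme)

	// 给定 Go type,查出它注册时对应的 GVK
	gvks, _, err := scheme.ObjectKinds(&batchv1.CronJob{})
	if err != nil {
		panic(err)
	}
	fmt.Println(gvks[0]) // batch.tutorial.kubebuilder.io/v1, Kind=CronJob

	// 反过来,给定 GVK,让 Scheme 构造出对应的空对象
	obj, err := scheme.New(gvks[0])
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", obj) // *v1.CronJob
}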

创建一个 API

创建一个新的 Kind (你还记得上一章的内容, 对吗?)和相应的 controller, 我们可以使用 kubebuilder create api 命令:

kubebuilder create api --group batch --version v1 --kind CronJob

当我们第一次执行这个命令时,它会为我们创建一个新的 group-version 目录。

在这里,api/v1/ 这个目录会被创建,对应的 group-version 是 batch.tutorial.kubebuilder.io/v1(还记得我们一开始设置的 --domain 参数么?)

它也会创建 api/v1/cronjob_types.go 文件并添加 CronJob Kind。每次我们运行这个命令但指定不同的 Kind 时,它都会为我们创建一个新的 xxx_types.go 文件。

先让我们看看我们得到了什么开箱即用的东西,然后我们就开始填写它。

emptyapi-cn.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

最开始我们导入了 meta/v1 库,该库通常不会直接使用,但是它包含了 Kubernetes 所有 kind 的 metadata 结构体类型。

package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

下一步,我们为我们的 Kind 定义 Spec(CronJobSpec)和 Status(CronJobStatus)的类型。Kubernetes 的功能是将期望状态(Spec)与集群实际状态(其他对象的 Status)以及外部状态进行协调,然后记录下观察到的状态(Status)。因此,每个有功能的对象都包含 Spec 和 Status。但是像 ConfigMap 之类的一些类型不遵循此模式,因为它们不需要维护状态,不过大多数类型都需要。

// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.

// CronJobSpec defines the desired state of CronJob
type CronJobSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file
}

// CronJobStatus defines the observed state of CronJob
type CronJobStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file
}

然后,我们定义了 Kinds 对应的结构体类型,CronJobCronJobListCronJob 是我们的 root type,用来描述 CronJob Kind。和所有 Kubernetes 对象一样, 它包含 TypeMeta (用来定义 API version 和 Kind) 和 ObjectMeta (用来定义 name、namespace 和 labels等一些字段)

CronJobList 包含了一个 CronJob 的切片,它是用来批量操作 Kind 的,比如 LIST 操作

通常,我们不会修改它们 -- 所有的修改都是在 Spec 和 Status 上进行的。

+kubebuilder:object:root 注释称为标记(marker)。稍后我们还会看到更多它们,它们提供了一些元数据,来告诉 controller-tools(我们的代码和 YAML 生成器)一些额外的信息。这个标记告诉 object 生成器这个类型表示一种 Kind(root type),然后 object 生成器会为我们生成 runtime.Object 接口的实现,这是所有表示 Kind 的类型都必须实现的接口。

// +kubebuilder:object:root=true

// CronJob is the Schema for the cronjobs API
type CronJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CronJobSpec   `json:"spec,omitempty"`
	Status CronJobStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// CronJobList contains a list of CronJob
type CronJobList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []CronJob `json:"items"`
}

最后,我们将 Kinds 注册到 API group 中。这使我们可以将此 API group 中的 Kind 添加到任何 Scheme 中。

func init() {
	SchemeBuilder.Register(&CronJob{}, &CronJobList{})
}

现在,我们已经认识了基础的结构体,下一步我们开始填写它们!

设计一个 API

在 Kubernetes 中,我们有一些设计 API 的规范。 比如:所有可序列化字段必须是camelCase(驼峰) 格式,所以我们使用 JSON 的 tags 去指定它们,当一个字段可以为空时,我们会用 omitempty tag 去标记它。

字段类型大多是原始数据类型。Numbers 类型比较特殊:出于兼容性的考虑,numbers 只接受三种形式:整型只能使用 int32 和 int64 声明,小数只能使用 resource.Quantity 声明。

我的天, Quantity 又是什么鬼

Quantity 是十进制数字的一种特殊表示法,具有明确且固定的表示形式,使它们在不同机器之间更易于移植。在 Kubernetes 中为 Pod 指定资源 requests 和 limits 时,你可能已经见过它们。

从概念上讲,它们的工作方式类似于浮点数:它们具有有效位数,基数和指数。 它们的可序列化和人类可读格式使用整数和后缀来指定值,这与我们描述计算机存储的方式非常相似。

例如,2m 表示十进制的 0.002,2Ki 表示十进制的 2048,而 2K 表示十进制的 2000。如果要表示小数,可以换用一个能让我们使用整数的后缀:2.5 就是 2500m。

Quantity 支持两种基数:10 和 2(分别称为十进制和二进制)。十进制基数用“常规”的 SI 后缀表示(例如 M 和 K),而二进制基数用 “mebi” 表示法指定(例如 Mi 和 Ki),可以类比 megabytes 与 mebibytes 的区别。
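
用 apimachinery 的 resource 包可以直观地验证这些写法(仅作示意):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	fmt.Println(resource.MustParse("2m").MilliValue())  // 2(毫),即十进制的 0.002
	fmt.Println(resource.MustParse("2Ki").Value())      // 2048
	fmt.Println(resource.MustParse("2K").Value())       // 2000
	fmt.Println(resource.MustParse("2.5").MilliValue()) // 2500,等同于 2500m
}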

我们还使用了另一种特殊类型:metav1.Time。它的功能与 time.Time 完全相同,只是序列化格式在 Kubernetes 中是统一、固定的。
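
下面用一个纯属虚构的小例子把上面这些约定串起来(示意:camelCase 的 json tag、omitempty、用指针区分“未设置”与零值、resource.Quantity 和 metav1.Time):

package v1

import (
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ExampleSpec 仅用于演示序列化约定,并不属于本教程的 API。
type ExampleSpec struct {
	// 可序列化字段使用 camelCase 的 json tag
	Replicas int32 `json:"replicas"`

	// 可选字段加 omitempty,并用指针区分 "未设置" 和 "显式的 0"
	// +optional
	TimeoutSeconds *int64 `json:"timeoutSeconds,omitempty"`

	// 小数使用 resource.Quantity 表示,例如 "500m"、"2Gi"
	// +optional
	MemoryLimit *resource.Quantity `json:"memoryLimit,omitempty"`

	// 时间使用 metav1.Time,以获得稳定、统一的序列化格式
	// +optional
	LastRun *metav1.Time `json:"lastRun,omitempty"`
}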

先让我们看一下 CronJob 应该是什么样子的!

project/api/v1/cronjob_types.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Imports
package v1

import (
	batchv1beta1 "k8s.io/api/batch/v1beta1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.

首先,让我们看一下 spec,如我们之前说的,spec 表明期望的状态,所以有关 controller 的任何 “inputs” 都在这里声明。

一个基本的 CronJob 需要以下功能:

  • 一个 schedule(调度器) -- (CronJob 中的 “Cron”)
  • 用来运行 Job 的模板 -- (CronJob 中的 “Job”)

为了让用户体验更好,我们也需要一些额外的功能:

  • 一个截止时间(StartingDeadlineSeconds), 如果错过了这个截止时间,Job 将会等到下一个调度时间点再被调度。
  • 如果多个 Job 同时启动要怎么做(ConcurrencyPolicy)(等待?停掉最老的一个?还是同时运行?)
  • 一个暂停(Suspend)功能,以防止 Job 在运行过程中出现什么错误。
  • 限制历史 Job 的数量(SuccessfulJobsHistoryLimit

请记住,因为 job 不会读取自己的状态,所以我们需要用一些方式去跟踪一个 CronJob 是否已经运行过 Job, 我们可以用至少一个 old job 来完成这个功能。

我们会用几个标记 (//+comment) 去定义一些额外的数据,这些标记在 controller-tools 生成 CRD manifest 时会被使用。 稍后我们还将看到,controller-tools 还会使用 GoDoc 来为字段生成描述信息。

// CronJobSpec defines the desired state of CronJob
type CronJobSpec struct {
	// +kubebuilder:validation:MinLength=0

	// The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
	Schedule string `json:"schedule"`

	// +kubebuilder:validation:Minimum=0

	// Optional deadline in seconds for starting the job if it misses scheduled
	// time for any reason.  Missed jobs executions will be counted as failed ones.
	// +optional
	StartingDeadlineSeconds *int64 `json:"startingDeadlineSeconds,omitempty"`

	// Specifies how to treat concurrent executions of a Job.
	// Valid values are:
	// - "Allow" (default): allows CronJobs to run concurrently;
	// - "Forbid": forbids concurrent runs, skipping next run if previous run hasn't finished yet;
	// - "Replace": cancels currently running job and replaces it with a new one
	// +optional
	ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`

	// This flag tells the controller to suspend subsequent executions, it does
	// not apply to already started executions.  Defaults to false.
	// +optional
	Suspend *bool `json:"suspend,omitempty"`

	// Specifies the job that will be created when executing a CronJob.
	JobTemplate batchv1beta1.JobTemplateSpec `json:"jobTemplate"`

	// +kubebuilder:validation:Minimum=0

	// The number of successful finished jobs to retain.
	// This is a pointer to distinguish between explicit zero and not specified.
	// +optional
	SuccessfulJobsHistoryLimit *int32 `json:"successfulJobsHistoryLimit,omitempty"`

	// +kubebuilder:validation:Minimum=0

	// The number of failed finished jobs to retain.
	// This is a pointer to distinguish between explicit zero and not specified.
	// +optional
	FailedJobsHistoryLimit *int32 `json:"failedJobsHistoryLimit,omitempty"`
}

我们自定义了一个类型(ConcurrencyPolicy)来保存我们的并发策略。 它实际上只是一个字符串,但是这个类型名称提供了额外的说明, 我们还将验证附加到类型上,而不是字段上,从而使验证更易于重用。 // +kubebuilder:validation:Enum=Allow;Forbid;Replace 表示这个 ConcurrencyPolicy 只接受 AllowForbidReplace 这三个值。

// ConcurrencyPolicy describes how the job will be handled.
// Only one of the following concurrent policies may be specified.
// If none of the following policies is specified, the default one
// is AllowConcurrent.
// +kubebuilder:validation:Enum=Allow;Forbid;Replace
type ConcurrencyPolicy string

const (
	// AllowConcurrent allows CronJobs to run concurrently.
	AllowConcurrent ConcurrencyPolicy = "Allow"

	// ForbidConcurrent forbids concurrent runs, skipping next run if previous
	// hasn't finished yet.
	ForbidConcurrent ConcurrencyPolicy = "Forbid"

	// ReplaceConcurrent cancels currently running job and replaces it with a new one.
	ReplaceConcurrent ConcurrencyPolicy = "Replace"
)

下面,让我们设计 status,它用来存储观察到的状态。它包含我们希望用户或其他 controllers 能够获取的所有信息

我们将保留一份正在运行的 Job 列表,以及最后一次成功运行 Job 的时间。 注意,如上所述,我们使用 metav1.Time 而不是 time.Time 来获得稳定的序列化。

// CronJobStatus defines the observed state of CronJob
type CronJobStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// A list of pointers to currently running jobs.
	// +optional
	Active []corev1.ObjectReference `json:"active,omitempty"`

	// Information when was the last time the job was successfully scheduled.
	// +optional
	LastScheduleTime *metav1.Time `json:"lastScheduleTime,omitempty"`
}

最后只剩下 CronJob 结构体了。如之前所说,我们不需要修改该结构体的任何内容,但是我们希望操作该 Kind 能像操作 Kubernetes 内置资源一样,所以需要增加一个 marker:+kubebuilder:subresource:status,来声明 status subresource。关于 subresource 的更多信息可以参考 k8s 文档。

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// CronJob is the Schema for the cronjobs API
type CronJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CronJobSpec   `json:"spec,omitempty"`
	Status CronJobStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// CronJobList contains a list of CronJob
type CronJobList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []CronJob `json:"items"`
}

func init() {
	SchemeBuilder.Register(&CronJob{}, &CronJobList{})
}

现在,我们已经有了一个 API,然后我们需要编写一个 controller 来实现具体功能。

简要说明: 其他文件是干什么的?

api/v1/目录下除 cronjob_types.go 外还有另外两个文件:groupversion_info.gozz_generated.deepcopy.go

这两个文件都不需要编辑(前者保持不变,后者是自动生成的), 但是知道其中有什么是非常有用的。

groupversion_info.go

groupversion_info.go 包含和 group-version 有关的元数据:

project/api/v1/groupversion_info.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

首先,我们有一些 package-level 的标记,+kubebuilder:object:generate=true 表示该程序包中有 Kubernetes 对象, +groupName=batch.tutorial.kubebuilder.io 表示该程序包的 Api group 是 batch.tutorial.kubebuilder.ioobject 生成器使用前者,而 CRD 生成器使用后者, 来为由此包创建的 CRD 生成正确的元数据。

// Package v1 contains API Schema definitions for the batch v1 API group
// +kubebuilder:object:generate=true
// +groupName=batch.tutorial.kubebuilder.io
package v1

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/scheme"
)

然后,我们定义一些全局变量帮助我们建立 Scheme。 由于我们需要在 controller 中使用此程序包中的所有 types, 因此需要一种方便的方法(或约定)可以将所有 types 添加到其他 Scheme 中。 SchemeBuilder 让我们做这件事情变的容易。

var (
	// GroupVersion is group version used to register these objects
	GroupVersion = schema.GroupVersion{Group: "batch.tutorial.kubebuilder.io", Version: "v1"}

	// SchemeBuilder is used to add go types to the GroupVersionKind scheme
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

	// AddToScheme adds the types in this group-version to the given scheme.
	AddToScheme = SchemeBuilder.AddToScheme
)

zz_generated.deepcopy.go

zz_generated.deepcopy.go 包含前面提到的、由 +kubebuilder:object:root 标记自动生成的 runtime.Object 接口的实现。

runtime.Object interface 的核心是一个 deep-copy 方法:DeepCopyObject

controller-tools 中的 object 生成器也为每个 root type(CronJob)和它的 sub-types(CronJobList、CronJobSpec、CronJobStatus)都生成了两个方法:DeepCopy 和 DeepCopyInto。
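
生成的方法大致如下(仅摘录作示意,具体内容以项目中生成的 zz_generated.deepcopy.go 为准):

// DeepCopyInto 将 in 的内容深拷贝到 out 中
func (in *CronJob) DeepCopyInto(out *CronJob) {
	*out = *in
	out.TypeMeta = in.TypeMeta
	in.ObjectMeta.DeepCopyInto(&out.ObjectMeta)
	in.Spec.DeepCopyInto(&out.Spec)
	in.Status.DeepCopyInto(&out.Status)
}

// DeepCopy 创建并返回一个新的 CronJob 深拷贝
func (in *CronJob) DeepCopy() *CronJob {
	if in == nil {
		return nil
	}
	out := new(CronJob)
	in.DeepCopyInto(out)
	return out
}

// DeepCopyObject 实现 runtime.Object 接口,返回 runtime.Object 形式的深拷贝
func (in *CronJob) DeepCopyObject() runtime.Object {
	if c := in.DeepCopy(); c != nil {
		return c
	}
	return nil
}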

controller 中有什么?

Controllers 是 operator 和 Kubernetes 的核心组件。

controller 的职责是确保实际的状态(包括群集状态,以及潜在的外部状态,例如正在运行的 Kubelet 容器和云提供商的 loadbalancers)与给定 object 期望的状态相匹配。 每个 controller 专注于一个 root Kind,但也可以与其他 Kinds 进行交互。

这种努力达到期望状态的过程,我们称之为 reconciling(调和,使……一致)。

在 controller-runtime 库中,实现某个 Kind 的 reconciling 逻辑的组件称为 Reconciler。Reconciler 接收一个对象的名称,并返回是否需要重试(例如:发生错误时,或是像 HorizontalPodAutoscaler 这样需要周期性执行的 controllers)。
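
下面用一小段示意代码(其中 needRetrySoon、runsPeriodically 是为演示虚构的辅助函数)说明 Reconcile 的返回值如何告诉 controller-runtime 要不要重试、什么时候重试:

package controllers

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// 示意:Reconcile 返回值的几种常见用法。
func exampleReconcileResult(err error) (ctrl.Result, error) {
	if err != nil {
		// 返回 error:controller-runtime 会记录错误并带退避地重新入队
		return ctrl.Result{}, err
	}
	if needRetrySoon() {
		// 立即重新入队,再处理一次
		return ctrl.Result{Requeue: true}, nil
	}
	if runsPeriodically() {
		// 一段时间后重新入队,适合像 HorizontalPodAutoscaler 这样周期性工作的 controller
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	}
	// 成功且无需重试:直到这个 object 再次发生变化才会被重新调用
	return ctrl.Result{}, nil
}

func needRetrySoon() bool { return false }

func runsPeriodically() bool { return false }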

emptycontroller.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

首先,我们会 import 一些标准库。和以前一样,我们需要 controller-runtime 库,client 包以及我们定义的有关 API 类型的软件包。

package controllers

import (
	"context"

	"github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	batchv1 "tutorial.kubebuilder.io/project/api/v1"
)

接下来,kubebuilder 为我们搭建了一个基本的 reconciler 结构体。 几乎每个 reconciler 都需要记录日志,并且需要能够获取对象,因此这个结构体是开箱即用的。

// CronJobReconciler reconciles a CronJob object
type CronJobReconciler struct {
	client.Client
	Log    logr.Logger
	Scheme *runtime.Scheme
}

大多数 controllers 最终都会运行在 k8s 集群上,因此它们需要 RBAC 权限, 我们使用 controller-tools RBAC markers 指定了这些权限。 这是运行所需的最小权限。 随着我们添加更多功能,我们将会重新定义这些权限。

// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs/status,verbs=get;update;patch

Reconcile 方法对某个单一的 object 执行 reconciling 动作。我们的 Request 中只有一个 name,但 client 可以通过这个 name 从 cache 中获取到对应的 object。

我们返回的 result 为空,且 error 为 nil, 这表明 controller-runtime 已经成功 reconciled 了这个 object,无需进行任何重试,直到这个 object 被更改。

大多数 controller 都需要一个 logging handle 和一个 context,因此我们在这里进行了设置。

context 是用于允许取消请求或者用于跟踪之类的事情。 这是所有 client 方法的第一个参数。 Background context 只是一个 basic context,没有任何额外的数据或时间限制。

logging handle 用于记录日志。controller-runtime 通过 logr 库使用结构化日志记录。稍后我们将看到,日志记录是通过把 key-value 附加到固定的消息上来工作的。我们可以在 Reconcile 方法的顶部提前附加一些 key-value,以便把这个 reconciler 中的所有日志都关联起来。

func (r *CronJobReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	_ = context.Background()
	_ = r.Log.WithValues("cronjob", req.NamespacedName)

	// your logic here

	return ctrl.Result{}, nil
}

最后,我们将此 reconciler 加到 manager,以便在启动 manager 时启动 reconciler。

现在,我们仅记录了 reconciler 在 CronJob 上的动作。稍后,我们将使用它来标记我们关心的 objects。

func (r *CronJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&batchv1.CronJob{}).
		Complete(r)
}

现在,我们已经看到了 controller 的基本结构,下一步让我们来填写 CronJob 的逻辑

实现一个 controller

我们的 CronJob controller 的基本逻辑是:

  1. 加载 CronJob

  2. 列出所有 active jobs,并更新状态

  3. 根据历史记录清理 old jobs

  4. 检查 Job 是否已被 suspended(如果被 suspended,请不要执行任何操作)

  5. 获取到下一次要 schedule 的 Job

  6. 运行新的 Job, 确定新 Job 没有超过 deadline 时间,且不会被我们 concurrency 规则 block

  7. 如果 Job 正在运行或者它应该下次运行,请重新排队

project/controllers/cronjob_controller.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

我们从一些 import 开始,正如你看到的,我们使用的 package 比脚手架帮我们生成的多,在使用它们时,我们会逐一讨论。

package controllers

import (
	"context"
	"fmt"
	"sort"
	"time"

	"github.com/go-logr/logr"
	"github.com/robfig/cron"
	kbatch "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	ref "k8s.io/client-go/tools/reference"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	batch "tutorial.kubebuilder.io/project/api/v1"
)

接下来,我们需要一个 Clock 字段,它帮我们在测试中伪装计时。

// CronJobReconciler reconciles a CronJob object
type CronJobReconciler struct {
	client.Client
	Log    logr.Logger
	Scheme *runtime.Scheme
	Clock
}
Clock

我们模拟时钟,以便在测试时更容易跳转, realClock 只是调用了 time.Now 函数。

type realClock struct{}

func (_ realClock) Now() time.Time { return time.Now() }

// Clock 知道如何获取当前时间。
// 它可以用来在测试时伪造时间。
type Clock interface {
	Now() time.Time
}

请注意,我们需要更多的 RBAC 权限 -- 由于我们现在正在创建和管理 Job,因此我们需要这些权限, 这意味着要添加更多 markers,所以我们增加了最下面两行。

// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=batch,resources=jobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch,resources=jobs/status,verbs=get

现在, 我们到了 controller 的核心部分 -- 实现 reconciler 的逻辑

var (
	scheduledTimeAnnotation = "batch.tutorial.kubebuilder.io/scheduled-at"
)

func (r *CronJobReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	ctx := context.Background()
	log := r.Log.WithValues("cronjob", req.NamespacedName)

1: 根据名称加载 CronJob

我们使用 client 获取 CronJob。 所有的 client 方法都将 context(用来取消请求)作为其第一个参数, 并将所讨论的 object 作为其最后一个参数。 Get 方法有点特殊, 因为它使用 NamespacedName 作为中间参数(大多数没有中间参数,如下所示)。

最后,许多 client 方法也采用可变参数选项(也就是 “...”)。

	var cronJob batch.CronJob
	if err := r.Get(ctx, req.NamespacedName, &cronJob); err != nil {
		log.Error(err, "unable to fetch CronJob")
		// we'll ignore not-found errors, since they can't be fixed by an immediate
		// requeue (we'll need to wait for a new notification), and we can get them
		// on deleted requests.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

2: 列出所有 active jobs,并更新状态

要完全更新我们的状态,我们需要列出此 namespace 中属于此 CronJob 的所有 Job。 与 Get 方法类似,我们可以使用 List 方法列出 Job。 注意,我们使用可变参数选项来设置 client.InNamespaceclient.MatchingFields

	var childJobs kbatch.JobList
	if err := r.List(ctx, &childJobs, client.InNamespace(req.Namespace), client.MatchingFields{jobOwnerKey: req.Name}); err != nil {
		log.Error(err, "unable to list child Jobs")
		return ctrl.Result{}, err
	}

当得到所有的 Job 后,我们把 Job 的状态分为 active、successful和 failed, 并跟踪他们最近的运行情况,以便将其记录在 status 中。 请记住,status 应该可以从整体的状态重新构造, 因此从 root object 的状态读取信息通常不是一个好主意。 相反,您应该在每次运行时重新构建它。 这就是我们在这里要做的。

我们可以使用 status conditions 来检查一个 Job 是否已经“完成”,以及它是成功还是失败。我们将把这部分逻辑放在匿名函数中,以使代码更整洁。

	// 查找状态为 active 的 Jobs
	var activeJobs []*kbatch.Job
	var successfulJobs []*kbatch.Job
	var failedJobs []*kbatch.Job
	var mostRecentTime *time.Time // 找到最后一次运行 Job,以便我们更新状态
isJobFinished

如果一个 Job 的 “Complete” 或 “Failed” condition 被标记为 true,我们就认为它已经 “finished”。Status conditions 使我们可以向 objects 添加可扩展的状态信息,其他人和 controller 可以通过检查这些信息来确定 Job 的完成情况和健康状况。

	isJobFinished := func(job *kbatch.Job) (bool, kbatch.JobConditionType) {
		for _, c := range job.Status.Conditions {
			if (c.Type == kbatch.JobComplete || c.Type == kbatch.JobFailed) && c.Status == corev1.ConditionTrue {
				return true, c.Type
			}
		}

		return false, ""
	}
getScheduledTimeForJob

我们将使用匿名函数从创建 Job 时添加的 annotation 中获取到 Job 计划执行的时间。

	getScheduledTimeForJob := func(job *kbatch.Job) (*time.Time, error) {
		timeRaw := job.Annotations[scheduledTimeAnnotation]
		if len(timeRaw) == 0 {
			return nil, nil
		}

		timeParsed, err := time.Parse(time.RFC3339, timeRaw)
		if err != nil {
			return nil, err
		}
		return &timeParsed, nil
	}
	// 根据 Job 的状态将 Job 放到不同的切片中, 并获得最近一个 Job
	for i, job := range childJobs.Items {
		_, finishedType := isJobFinished(&job)
		switch finishedType {
		case "": // ongoing
			activeJobs = append(activeJobs, &childJobs.Items[i])
		case kbatch.JobFailed:
			failedJobs = append(failedJobs, &childJobs.Items[i])
		case kbatch.JobComplete:
			successfulJobs = append(successfulJobs, &childJobs.Items[i])
		}

		// 把运行时间存在 annotation,以便我们重新获取他们
		scheduledTimeForJob, err := getScheduledTimeForJob(&job)
		if err != nil {
			log.Error(err, "unable to parse schedule time for child job", "job", &job)
			continue
		}
		// 获取最后一个 Job
		if scheduledTimeForJob != nil {
			if mostRecentTime == nil {
				mostRecentTime = scheduledTimeForJob
			} else if mostRecentTime.Before(*scheduledTimeForJob) {
				mostRecentTime = scheduledTimeForJob
			}
		}
	}

	if mostRecentTime != nil {
		cronJob.Status.LastScheduleTime = &metav1.Time{Time: *mostRecentTime}
	} else {
		cronJob.Status.LastScheduleTime = nil
	}
	cronJob.Status.Active = nil
	for _, activeJob := range activeJobs {
		jobRef, err := ref.GetReference(r.Scheme, activeJob)
		if err != nil {
			log.Error(err, "unable to make reference to active job", "job", activeJob)
			continue
		}
		cronJob.Status.Active = append(cronJob.Status.Active, *jobRef)
	}

在这里,我们将以略高的日志记录级别记录观察到的 Job 数量,以进行调试。 请注意,我们是使用固定消息在键值对中附加额外的信息,而不是使用字符串格式。 这样可以更轻松地过滤和查询日志行。

	log.V(1).Info("job count", "active jobs", len(activeJobs), "successful jobs", len(successfulJobs), "failed jobs", len(failedJobs))

我们根据我们得到的时间更新 CRD 的状态。 和以前一样,我们使用 client。 为了更新 subresource 的状态,我们将使用 client 的 Status().Update 方法

subresource status 会忽略对 spec 的更改, 因此它与其他任何 update 发生冲突的可能性较小, 并且它可以具有单独的权限。

	if err := r.Status().Update(ctx, &cronJob); err != nil {
		log.Error(err, "unable to update CronJob status")
		return ctrl.Result{}, err
	}

一旦我们正确 update 了我们的 status,我们可以确保整体的状态是符合我们在 spec 中指定的。

3: 根据历史记录清理过期 jobs

首先,我们会尝试清理过期的 jobs,以免留下太多闲杂事。

	// 注意:这里是尽量删除,如果删除失败,我们不会为了删除让它们重新排队
	if cronJob.Spec.FailedJobsHistoryLimit != nil {
		// 把失败的 Job 按时间排序
		sort.Slice(failedJobs, func(i, j int) bool {
			if failedJobs[i].Status.StartTime == nil {
				return failedJobs[j].Status.StartTime != nil
			}
			return failedJobs[i].Status.StartTime.Before(failedJobs[j].Status.StartTime)
		})
		// 如果 failedJob 超出 FailedJobsHistoryLimit 就删掉
		for i, job := range failedJobs {
			if int32(i) >= int32(len(failedJobs))-*cronJob.Spec.FailedJobsHistoryLimit {
				break
			}
			if err := r.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)); client.IgnoreNotFound(err) != nil {
				log.Error(err, "unable to delete old failed job", "job", job)
			} else {
				log.V(0).Info("deleted old failed job", "job", job)
			}
		}
	}
	// 如果 successfulJob 超出 SuccessfulJobsHistoryLimit 就删掉
	if cronJob.Spec.SuccessfulJobsHistoryLimit != nil {
		sort.Slice(successfulJobs, func(i, j int) bool {
			if successfulJobs[i].Status.StartTime == nil {
				return successfulJobs[j].Status.StartTime != nil
			}
			return successfulJobs[i].Status.StartTime.Before(successfulJobs[j].Status.StartTime)
		})
		for i, job := range successfulJobs {
			if int32(i) >= int32(len(successfulJobs))-*cronJob.Spec.SuccessfulJobsHistoryLimit {
				break
			}
			if err := r.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)); (err) != nil {
				log.Error(err, "unable to delete old successful job", "job", job)
			} else {
				log.V(0).Info("deleted old successful job", "job", job)
			}
		}
	}

4: 检查 Job 是否已被 suspended

如果这个 object 已被 suspended,我们就不再运行任何新的 Job,直接 return。这在运行中的 Job 出了问题、需要暂停排查时非常有用。

	if cronJob.Spec.Suspend != nil && *cronJob.Spec.Suspend {
		log.V(1).Info("cronjob suspended, skipping")
		return ctrl.Result{}, nil
	}

5: 获取到下一次要 schedule 的 Job

如果 Job 没有被暂停,则需要计算下一次 schedule 的 Job,以及是否有尚未处理的 Job。

getNextSchedule

我们将使用好用的 cron 库来计算下一次 scheduled 时间。我们会从上一次运行的时间开始计算下一次运行的时间;如果找不到上一次运行的时间,就从这个 CronJob 的创建时间开始算。

如果错过的运行次数太多,并且我们没有设置任何 deadline,我们就直接放弃,以免在 controller 重启或卡住时引发问题。

否则,我们将返回错过的运行时间(其中只使用最近的一次)和下一次要运行的时间,以便让我们知道何时该重新进行 reconcile。

	getNextSchedule := func(cronJob *batch.CronJob, now time.Time) (lastMissed time.Time, next time.Time, err error) {
		sched, err := cron.ParseStandard(cronJob.Spec.Schedule)
		if err != nil {
			return time.Time{}, time.Time{}, fmt.Errorf("Unparseable schedule %q: %v", cronJob.Spec.Schedule, err)
		}

		// for optimization purposes, cheat a bit and start from our last observed run time
		// we could reconstitute this here, but there's not much point, since we've
		// just updated it.
		var earliestTime time.Time
		if cronJob.Status.LastScheduleTime != nil {
			earliestTime = cronJob.Status.LastScheduleTime.Time
		} else {
			earliestTime = cronJob.ObjectMeta.CreationTimestamp.Time
		}
		if cronJob.Spec.StartingDeadlineSeconds != nil {
			// controller is not going to schedule anything below this point
			schedulingDeadline := now.Add(-time.Second * time.Duration(*cronJob.Spec.StartingDeadlineSeconds))

			if schedulingDeadline.After(earliestTime) {
				earliestTime = schedulingDeadline
			}
		}
		if earliestTime.After(now) {
			return time.Time{}, sched.Next(now), nil
		}

		starts := 0
		for t := sched.Next(earliestTime); !t.After(now); t = sched.Next(t) {
			lastMissed = t
			// An object might miss several starts. For example, if
			// controller gets wedged on Friday at 5:01pm when everyone has
			// gone home, and someone comes in on Tuesday AM and discovers
			// the problem and restarts the controller, then all the hourly
			// jobs, more than 80 of them for one hourly scheduledJob, should
			// all start running with no further intervention (if the scheduledJob
			// allows concurrency and late starts).
			//
			// However, if there is a bug somewhere, or incorrect clock
			// on controller's server or apiservers (for setting creationTimestamp)
			// then there could be so many missed start times (it could be off
			// by decades or more), that it would eat up all the CPU and memory
			// of this controller. In that case, we want to not try to list
			// all the missed start times.
			starts++
			if starts > 100 {
				// We can't get the most recent times so just return an empty slice
				return time.Time{}, time.Time{}, fmt.Errorf("Too many missed start times (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.")
			}
		}
		return lastMissed, sched.Next(now), nil
	}
	// figure out the next times that we need to create
	// jobs at (or anything we missed).
	missedRun, nextRun, err := getNextSchedule(&cronJob, r.Now())
	if err != nil {
		log.Error(err, "unable to figure out CronJob schedule")
		// we don't really care about requeuing until we get an update that
		// fixes the schedule, so don't return an error
		return ctrl.Result{}, nil
	}

我们把“直到下一次运行时间再重新入队”的请求存到 scheduledResult 变量中,然后再确定是否真的需要运行一个 Job。

	scheduledResult := ctrl.Result{RequeueAfter: nextRun.Sub(r.Now())} // save this so we can re-use it elsewhere
	log = log.WithValues("now", r.Now(), "next run", nextRun)

6: 运行新的 Job, 确定新 Job 没有超过 deadline 时间,且不会被我们 concurrency 规则 block

如果我们错过了一次运行,且还没超过启动它的 deadline,我们就需要运行一个 Job。

	if missedRun.IsZero() {
		log.V(1).Info("no upcoming scheduled times, sleeping until next")
		return scheduledResult, nil
	}

	// 确保没有超过 StartingDeadlineSeconds 时间
	log = log.WithValues("current run", missedRun)
	tooLate := false
	if cronJob.Spec.StartingDeadlineSeconds != nil {
		tooLate = missedRun.Add(time.Duration(*cronJob.Spec.StartingDeadlineSeconds) * time.Second).Before(r.Now())
	}
	if tooLate {
		log.V(1).Info("missed starting deadline for last run, sleeping till next")
		// TODO(directxman12): events
		return scheduledResult, nil
	}

如果我们不得不运行一个 Job,那么需要等现有 Job 完成之后,替换现有 Job 或添加一个新 Job。 如果我们的信息由于 cache 的延迟而过时,那么我们将在获得 up-to-date 信息时重新排队。

	// 判断如何运行此 Job -- 并发策略可能会禁止我们同时运行多个 Job
	if cronJob.Spec.ConcurrencyPolicy == batch.ForbidConcurrent && len(activeJobs) > 0 {
		log.V(1).Info("concurrency policy blocks concurrent runs, skipping", "num active", len(activeJobs))
		return scheduledResult, nil
	}

	// ...或者希望我们替换一个 Job...
	if cronJob.Spec.ConcurrencyPolicy == batch.ReplaceConcurrent {
		for _, activeJob := range activeJobs {
			// 我们不关心 Job 是否已经被删除
			if err := r.Delete(ctx, activeJob, client.PropagationPolicy(metav1.DeletePropagationBackground)); client.IgnoreNotFound(err) != nil {
				log.Error(err, "unable to delete active job", "job", activeJob)
				return ctrl.Result{}, err
			}
		}
	}

一旦弄清楚如何处理现有 Job,我们便会真正创建所需的 Job。

constructJobForCronJob

我们需要基于 CronJob 的 template 构建 Job, 我们将从 template 复制 spec,然后复制一些基本的字段。

然后,我们将设置 “scheduled time” annotation, 以便我们可以在每次 reconcile 时重新构造我们的 LastScheduleTime 字段。

最后,我们需要设置 owner reference。 这使 Kubernetes 垃圾收集器在删除 CronJob 时清理 Job, 并允许 controller-runtime 找出给定 Job 被更改(added,deleted,completes)时需要协调哪个 CronJob。

	constructJobForCronJob := func(cronJob *batch.CronJob, scheduledTime time.Time) (*kbatch.Job, error) {
		// We want job names for a given nominal start time to have a deterministic name to avoid the same job being created twice
		// 这里防止 Job 名称冲突
		name := fmt.Sprintf("%s-%d", cronJob.Name, scheduledTime.Unix())

		job := &kbatch.Job{
			ObjectMeta: metav1.ObjectMeta{
				Labels:      make(map[string]string),
				Annotations: make(map[string]string),
				Name:        name,
				Namespace:   cronJob.Namespace,
			},
			Spec: *cronJob.Spec.JobTemplate.Spec.DeepCopy(),
		}
		for k, v := range cronJob.Spec.JobTemplate.Annotations {
			job.Annotations[k] = v
		}
		job.Annotations[scheduledTimeAnnotation] = scheduledTime.Format(time.RFC3339)
		for k, v := range cronJob.Spec.JobTemplate.Labels {
			job.Labels[k] = v
		}
		if err := ctrl.SetControllerReference(cronJob, job, r.Scheme); err != nil {
			return nil, err
		}

		return job, nil
	}
	// actually make the job...
	job, err := constructJobForCronJob(&cronJob, missedRun)
	if err != nil {
		log.Error(err, "unable to construct job from template")
		// don't bother requeuing until we get a change to the spec
		return scheduledResult, nil
	}

	// 在 k8s 集群上启动一个 Job Resource
	if err := r.Create(ctx, job); err != nil {
		log.Error(err, "unable to create Job for CronJob", "job", job)
		return ctrl.Result{}, err
	}

	log.V(1).Info("created Job for CronJob run", "job", job)

7: 如果 Job 正在运行或者它应该下次运行,请重新排队

最后,我们返回上面准备好的结果(scheduledResult),表示我们希望在下一次运行时间点重新入队。这只是一个最长等待时间(maximum deadline):如果在此期间发生了其他变化,例如我们的 Job 启动或完成,或者我们收到了修改事件,我们可能需要更早地再次 reconcile。

	// 当 Job 运行的时候把下次需要运行的 Job object 放到队列中,并更新状态
	return scheduledResult, nil
}

Setup

最后,我们将更新我们的设置。 为了使我们的 reconciler 可以按其 owner 快速查找 Jobs,我们需要一个 index。 我们声明一个 index key,以后可以与 client 一起使用它作为伪字段名称,然后描述如何从 Job 对象中提取 index key。 索引器将自动为我们处理 namespace,因此,如果 Job 有一个 CronJob owner,我们只需提取所有者名称。

此外,我们会告诉 manager 这个 controller 拥有(owns)一些 Job,以便在 Job 发生更改、被删除等情况时,自动对其所属的 CronJob 调用 Reconcile。

var (
	jobOwnerKey = ".metadata.controller"
	apiGVStr    = batch.GroupVersion.String()
)

func (r *CronJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
	// set up a real clock, since we're not in a test
	if r.Clock == nil {
		r.Clock = realClock{}
	}

	if err := mgr.GetFieldIndexer().IndexField(&kbatch.Job{}, jobOwnerKey, func(rawObj runtime.Object) []string {
		// grab the job object, extract the owner...
		job := rawObj.(*kbatch.Job)
		owner := metav1.GetControllerOf(job)
		if owner == nil {
			return nil
		}
		// ...make sure it's a CronJob...
		if owner.APIVersion != apiGVStr || owner.Kind != "CronJob" {
			return nil
		}

		// ...and if so, return it
		return []string{owner.Name}
	}); err != nil {
		return err
	}

	return ctrl.NewControllerManagedBy(mgr).
		For(&batch.CronJob{}).
		Owns(&kbatch.Job{}).
		Complete(r)
}

很明显,我们现在有了一个可以运行的 controller。让我们针对集群进行测试,如果没有任何问题,就把它部署到集群中!

main 文件修改了什么?

还记得最开始我们说要回到 main.go 文件么? 让我们看看 main.go 新增了什么。

project/main.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Imports
package main

import (
	"flag"
	"os"

	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"

	batchv1 "tutorial.kubebuilder.io/project/api/v1"
	"tutorial.kubebuilder.io/project/controllers"
	// +kubebuilder:scaffold:imports
)

首先要注意的是 kubebuilder 已将新 API group 的 package(batchv1)添加到我们的 scheme 中了。 这意味着我们可以在 controller 中使用这些对象了。

如果要使用任何其他 CRD,则必须以相同的方式添加其 scheme。 诸如 Job 之类的内置类型的 scheme 是由 clientgoscheme 添加的。

var (
	scheme   = runtime.NewScheme()
	setupLog = ctrl.Log.WithName("setup")
)

func init() {
	_ = clientgoscheme.AddToScheme(scheme)

	_ = batchv1.AddToScheme(scheme)
	// +kubebuilder:scaffold:scheme
}

另一处更改是 kubebuilder 添加了一段代码,调用我们的 CronJob controller 的 SetupWithManager 方法。

func main() {
old stuff
	var metricsAddr string
	var enableLeaderElection bool
	flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
	flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
		"Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.")
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseDevMode(true)))

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:             scheme,
		MetricsBindAddress: metricsAddr,
		LeaderElection:     enableLeaderElection,
		Port:               9443,
	})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}
	if err = (&controllers.CronJobReconciler{
		Client: mgr.GetClient(),
		Log:    ctrl.Log.WithName("controllers").WithName("Captain"),
		Scheme: mgr.GetScheme(),
	}).SetupWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "Captain")
		os.Exit(1)
	}

我们还会为我们的 type 设置 webhook,只需要将它们添加到 manager 中即可。由于我们可能想单独运行 webhook,或者在本地测试 controller 时不运行它们,因此我们把是否启动 webhook 放到了环境变量里。

如不需要启动 webhook,只需要设置 ENABLE_WEBHOOKS = false 即可。

	if os.Getenv("ENABLE_WEBHOOKS") != "false" {
		if err = (&batchv1.CronJob{}).SetupWebhookWithManager(mgr); err != nil {
			setupLog.Error(err, "unable to create webhook", "webhook", "Captain")
			os.Exit(1)
		}
	}
	// +kubebuilder:scaffold:builder
old stuff
	setupLog.Info("starting manager")
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}

现在,我们可以接着实现我们的 webhook 了

实现 defaulting webhooks 和 validating webhooks

如果你想为你的 CRD 实现 admission webhooks,你只需要实现 Defaulter 和 (或) Validator 接口即可。

其余的东西 Kubebuilder 会为你实现,比如:

  1. 创建一个 webhook server
  2. 确保这个 server 被添加到 manager 中
  3. 为你的 webhooks 创建一个 handlers
  4. 将每个 handler 以 path 形式注册到你的 server 中

首先,让我们为 CRD(CronJob)创建 webhooks,我们需要用到 --defaulting--programmatic-validation 参数(因为我们的测试项目将使用 defaulting webhooks 和 validating webhooks):

kubebuilder create webhook --group batch --version v1 --kind CronJob --defaulting --programmatic-validation

这将创建 Webhook 功能相关的方法,并在 main.go 中注册 Webhook 到你的 manager 中。

project/api/v1/cronjob_webhook.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Go imports
package v1

import (
	"github.com/robfig/cron"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	validationutils "k8s.io/apimachinery/pkg/util/validation"
	"k8s.io/apimachinery/pkg/util/validation/field"
	ctrl "sigs.k8s.io/controller-runtime"
	logf "sigs.k8s.io/controller-runtime/pkg/runtime/log"
	"sigs.k8s.io/controller-runtime/pkg/webhook"
)

下一步,为我们的 webhooks 创建一个 logger。

var cronjoblog = logf.Log.WithName("cronjob-resource")

然后, 我们通过 manager 构建 webhook。

func (r *CronJob) SetupWebhookWithManager(mgr ctrl.Manager) error {
	return ctrl.NewWebhookManagedBy(mgr).
		For(r).
		Complete()
}

注意,下面我们使用 kubebuilder 的 marker 语句生成 webhook manifests(配置 webhook 的 yaml 文件), 这个 marker 会生成一个 mutating Webhook 的 manifests。

关于每个参数的解释都可以在这里找到。

// +kubebuilder:webhook:path=/mutate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=true,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=mcronjob.kb.io

我们使用 webhook.Defaulter 接口将 webhook 的默认值设置为我们的 CRD, 这将自动生成一个 Webhook(defaulting webhooks),并调用它的 Default 方法。

Default 方法用来修改接收到的内容,为其设置默认值。

var _ webhook.Defaulter = &CronJob{}

// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *CronJob) Default() {
	cronjoblog.Info("default", "name", r.Name)

	if r.Spec.ConcurrencyPolicy == "" {
		r.Spec.ConcurrencyPolicy = AllowConcurrent
	}
	if r.Spec.Suspend == nil {
		r.Spec.Suspend = new(bool)
	}
	if r.Spec.SuccessfulJobsHistoryLimit == nil {
		r.Spec.SuccessfulJobsHistoryLimit = new(int32)
		*r.Spec.SuccessfulJobsHistoryLimit = 3
	}
	if r.Spec.FailedJobsHistoryLimit == nil {
		r.Spec.FailedJobsHistoryLimit = new(int32)
		*r.Spec.FailedJobsHistoryLimit = 1
	}
}

这个 marker 负责生成一个 validating webhook manifest。

// TODO(user): change verbs to "verbs=create;update;delete" if you want to enable deletion validation.
// +kubebuilder:webhook:verbs=create;update,path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,versions=v1,name=vcronjob.kb.io

我们可以通过声明式验证(declarative validation)来验证我们的 CRD, 通常情况下,声明式验证就足够了,但是有时更高级的用例需要进行复杂的验证。

例如,我们将在下面看到,我们会用 webhook 验证来检查 cron schedule(类似 * * * * * 这种写法)的格式是否正确,而不是去编写一长串复杂的正则表达式用声明式验证来实现。

如果实现了 webhook.Validator 接口,将会自动生成一个 Webhook(validating webhooks) 来调用我们验证方法。

ValidateCreateValidateUpdateValidateDelete 方法分别在创建,更新和删除 resrouces 时验证它们收到的信息。 我们将 ValidateCreateValidateUpdate 方法分开,因为某些字段可能是固定不变的, 他们只能在 ValidateCreate 方法中被调用,这样会提高一些安全性, ValidateDeleteValidateUpdate 方法也被分开,以便在进行删除操作时进行单独的验证。 但是在这里,我们在 ValidateDelete 中什么也没有做, 只是对 ValidateCreateValidateUpdate 使用同一个方法进行了验证, 因为我们不需要在删除时验证任何内容。

var _ webhook.Validator = &CronJob{}

// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *CronJob) ValidateCreate() error {
	cronjoblog.Info("validate create", "name", r.Name)

	return r.validateCronJob()
}

// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type
func (r *CronJob) ValidateUpdate(old runtime.Object) error {
	cronjoblog.Info("validate update", "name", r.Name)

	return r.validateCronJob()
}

// ValidateDelete implements webhook.Validator so a webhook will be registered for the type
func (r *CronJob) ValidateDelete() error {
	cronjoblog.Info("validate delete", "name", r.Name)

	// TODO(user): fill in your validation logic upon object deletion.
	return nil
}

验证 CronJob 的 name 和 spec 字段

func (r *CronJob) validateCronJob() error {
	var allErrs field.ErrorList
	if err := r.validateCronJobName(); err != nil {
		allErrs = append(allErrs, err)
	}
	if err := r.validateCronJobSpec(); err != nil {
		allErrs = append(allErrs, err)
	}
	if len(allErrs) == 0 {
		return nil
	}

	return apierrors.NewInvalid(
		schema.GroupKind{Group: "batch.tutorial.kubebuilder.io", Kind: "CronJob"},
		r.Name, allErrs)
}

一些字段是用 OpenAPI schema 方式进行验证, 你可以在 设计API 中找到有关 kubebuilder 的验证 markers(注释前缀为//+kubebuilder:validation)。 你也可以通过运行 controller-gen crd -w 或 在这里 找到所有关于使用 markers 验证的格式信息。

func (r *CronJob) validateCronJobSpec() *field.Error {
	// The field helpers from the kubernetes API machinery help us return nicely
	// structured validation errors.
	return validateScheduleFormat(
		r.Spec.Schedule,
		field.NewPath("spec").Child("schedule"))
}

我们在这里验证 cron schedule 的格式

func validateScheduleFormat(schedule string, fldPath *field.Path) *field.Error {
	if _, err := cron.ParseStandard(schedule); err != nil {
		return field.Invalid(fldPath, schedule, err.Error())
	}
	return nil
}
Validate object name

验证字符串字段的长度可以以声明方式完成。

但是 ObjectMeta.Name 字段是在 apimachinery 库下的 package 中定义的, 因此我们无法以声明性的方式对其进行验证。

func (r *CronJob) validateCronJobName() *field.Error {
	if len(r.ObjectMeta.Name) > validationutils.DNS1035LabelMaxLength-11 {
		// The job name length is 63 character like all Kubernetes objects
		// (which must fit in a DNS subdomain). The cronjob controller appends
		// a 11-character suffix to the cronjob (`-$TIMESTAMP`) when creating
		// a job. The job name length limit is 63 characters. Therefore cronjob
		// names must have length <= 63-11=52. If we don't validate this here,
		// then job creation will fail later.
		return field.Invalid(field.NewPath("metadata").Child("name"), r.Name, "must be no more than 52 characters")
	}
	return nil
}

运行和部署 controller

为了测试,我们可以在本地运行 controller。像快速入门中说的,在运行 controller 之前,我们需要先安装 CRD。

以下命令会通过 controller-tools 更新我们的 YAML manifests 文件:

make install

现在,我们已经安装了 CRD,可以针对集群运行 controller 了。本地运行时,controller 使用的是和我们连接 k8s 集群相同的 kubeconfig 凭证,因此现在不必担心 RBAC 权限问题。

在另一个终端中,运行:

make run ENABLE_WEBHOOKS=false

您应该会从 controller 中看到有关启动的日志,但是它目前还没有做任何事。

此时,我们需要创建一个 CronJob 进行测试。让我们写个简单的例子放到 config/samples/batch_v1_cronjob.yaml 中,并运行该例子:

apiVersion: batch.tutorial.kubebuilder.io/v1
kind: CronJob
metadata:
  name: cronjob-sample
spec:
  schedule: "*/1 * * * *"
  startingDeadlineSeconds: 60
  concurrencyPolicy: Allow # explicitly specify, but Allow is also default.
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
kubectl create -f config/samples/batch_v1_cronjob.yaml

执行下面的命令,你应该会看到一连串的信息。如果你使用 -w 参数,应该会看到你的 cronjob 正在运行,并且正在更新状态:

kubectl get cronjob.batch.tutorial.kubebuilder.io -o yaml
kubectl get job

现在, 我们已经可以在集群中运行 CronJob 了。 停掉 make run 命令,然后运行

make docker-build docker-push IMG=<some-registry>/<project-name>:tag
make deploy IMG=<some-registry>/<project-name>:tag

此时 Webhook 并不能正常运行,如果想让 Webhook 正常运行,请参考运行 Webhook 的相关章节。

注意

  1. 为了解决镜像拉不到的问题,可以将 Dockerfile 中的 FROM gcr.io/distroless/static:nonroot 修改为 FROM golang:1.13 或任意可正常拉取到的镜像。将 /config/default/manager_auth_proxy_patch.yaml 文件中的 gcr.io/kubebuilder/kube-rbac-proxy:v0.4.1 改为 xuejipeng/learn:kube-rbac-proxy-v0.4.1

  2. 为了加速构建,我们可以把 Makefile 中 docker-build 后面的 test 去掉

如果像之前一样再次 get cronjobs,我们应该可以看到 controller 又正常工作了!

部署 cert manager

我们建议使用 cert manager 为 Webhook 服务器配置证书, 如果使用其他方法,将证书放在正确的位置,它们也会起作用。

您可以根据 the cert manager documentation 安装 cert manager。

Cert manager 还具有一个名为 CA injector 的组件, 该组件负责将 CA bundle 注入到 Mutating | ValidatingWebhookConfiguration 中。

为此,您需要在 Mutating | ValidatingWebhookConfiguration 对象中使用 key 为 certmanager.k8s.io/inject-ca-from 的 annotation, annotation 的 value 应该以 <certificate-namespace>/<certificate-name> 的格式指向现有证书 CR 实例。

这是带有 Mutating | ValidatingWebhookConfiguration 对象的 kustomize 文件补丁。

# This patch add annotation to admission webhook config and
# the variables $(CERTIFICATE_NAMESPACE) and $(CERTIFICATE_NAME) will be substituted by kustomize.
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: mutating-webhook-configuration
  annotations:
    certmanager.k8s.io/inject-ca-from: $(CERTIFICATE_NAMESPACE)/$(CERTIFICATE_NAME)
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: validating-webhook-configuration
  annotations:
    certmanager.k8s.io/inject-ca-from: $(CERTIFICATE_NAMESPACE)/$(CERTIFICATE_NAME)

部署 Admission Webhooks

Kind Cluster

建议使用 kind 集群开发 Webhook,以加快迭代速度。 为什么?

  • 你可以在1分钟内在本地启动多节点群集。
  • 你可以在几秒钟内将其拆除。
  • 你无需将 images 推送到远程镜像仓库。

Cert Manager

你需要 按此 安装 cert manager bundle。你只需要安装就好,对于证书的申请 kubebuilder 会帮你做。

Build your image

运行以下命令以在本地生成 image。

make docker-build

如果你使用 kind 创建的群集,则无需将 image 推送到远程镜像仓库。你可以将本地的 image 直接加载到 kind 创建的群集:

kind load docker-image your-image-name:your-tag

部署 Webhooks

您可以通过 kustomize 启动 webhook 和 cert manager 配置,将 config/default/kustomization.yaml 改成如下所示:

# Adds namespace to all resources.
namespace: project-system

# Value of this field is prepended to the
# names of all resources, e.g. a deployment named
# "wordpress" becomes "alices-wordpress".
# Note that it should also match with the prefix (text before '-') of the namespace
# field above.
namePrefix: project-

# Labels to add to all resources and selectors.
#commonLabels:
#  someName: someValue

bases:
- ../crd
- ../rbac
- ../manager
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
- ../webhook
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus

patchesStrategicMerge:
  # Protect the /metrics endpoint by putting it behind auth.
  # If you want your controller-manager to expose the /metrics
  # endpoint w/o any authn/z, please comment the following line.
- manager_auth_proxy_patch.yaml

# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
- manager_webhook_patch.yaml

# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'.
# Uncomment 'CERTMANAGER' sections in crd/kustomization.yaml to enable the CA injection in the admission webhooks.
# 'CERTMANAGER' needs to be enabled to use ca injection
- webhookcainjection_patch.yaml

# the following config is for teaching kustomize how to do var substitution
vars:
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
- name: CERTIFICATE_NAMESPACE # namespace of the certificate CR
  objref:
    kind: Certificate
    group: cert-manager.io
    version: v1alpha2
    name: serving-cert # this name should match the one in certificate.yaml
  fieldref:
    fieldpath: metadata.namespace
- name: CERTIFICATE_NAME
  objref:
    kind: Certificate
    group: cert-manager.io
    version: v1alpha2
    name: serving-cert # this name should match the one in certificate.yaml
- name: SERVICE_NAMESPACE # namespace of the service
  objref:
    kind: Service
    version: v1
    name: webhook-service
  fieldref:
    fieldpath: metadata.namespace
- name: SERVICE_NAME
  objref:
    kind: Service
    version: v1
    name: webhook-service

Now you can deploy them to your cluster by running:

make docker-build docker-push IMG=<some-registry>/<project-name>:tag
make deploy IMG=<some-registry>/<project-name>:tag

Wait a short while until the webhook pod comes up and the certificates are provisioned. This usually completes within 1 minute.

Now you can create a valid CronJob to test your webhooks; the creation should complete successfully.

kubectl create -f config/samples/batch_v1_cronjob.yaml

You can also try creating an invalid CronJob (for example, with a malformed cron schedule field); you should see the creation fail with a validation error.

Epilogue

By this point, we have a fairly complete implementation of the CronJob controller that makes use of most of KubeBuilder's features.

If you want more, head to the Multi-Version tutorial to learn how to add new API versions to a project.

Additionally, you can try the following steps on your own -- we'll have a tutorial for them soon:

  • Write unit/integration tests using envtest (a minimal setup sketch follows this list)
  • Add extra fields to the kubectl get output

[envtest]: https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest
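For the first item, here is a minimal, hedged sketch of what bootstrapping envtest in a Go test might look like; the path and test structure are illustrative, and the real scaffolding generates a fuller Ginkgo-based suite_test.go for you.

package controllers

import (
	"path/filepath"
	"testing"

	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

func TestAPIs(t *testing.T) {
	// envtest starts a local etcd and kube-apiserver (from KUBEBUILDER_ASSETS)
	// and installs our CRDs, so controllers can be exercised without a cluster.
	testEnv := &envtest.Environment{
		CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
	}

	cfg, err := testEnv.Start()
	if err != nil {
		t.Fatal(err)
	}
	defer testEnv.Stop()

	_ = cfg // use cfg to construct a client or manager for the actual tests
}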

Tutorial: Multi-Version API

Most projects start out with an alpha API that changes release to release. However, eventually, most projects will need to move to a more stable API. Once your API is stable though, you can’t make breaking changes to it. That’s where API versions come into play.

Let’s make some changes to the CronJob API spec and make sure all the different versions are supported by our CronJob project.

If you haven’t already, make sure you’ve gone through the base CronJob Tutorial.

Next, let’s figure out what changes we want to make...

Changing things up

A fairly common change in a Kubernetes API is to take some data that used to be unstructured or stored in some special string format, and change it to structured data. Our schedule field fits the bill quite nicely for this -- right now, in v1, our schedules look like

schedule: "*/1 * * * *"

That’s a pretty textbook example of a special string format (it’s also pretty unreadable unless you’re a Unix sysadmin).

Let’s make it a bit more structured. According to our CronJob code, we support “standard” Cron format.

In Kubernetes, all versions must be safely round-tripable through each other. This means that if we convert from version 1 to version 2, and then back to version 1, we must not lose information. Thus, any change we make to our API must be compatible with whatever we supported in v1, and we also need to make sure anything we add in v2 is supported in v1. In some cases, this means we need to add new fields to v1, but in our case, we won’t have to, since we’re not adding new functionality.

Keeping all that in mind, let’s convert our example above to be slightly more structured:

schedule:
  minute: */1

Now, at least, we’ve got labels for each of our fields, but we can still easily support all the different syntax for each field.

We’ll need a new API version for this change. Let’s call it v2:

kubebuilder create api --group batch --version v2 --kind CronJob

Now, let’s copy over our existing types, and make the change:

project/api/v2/cronjob_types.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Since we’re in a v2 package, controller-gen will assume this is for the v2 version automatically. We could override that with the +versionName marker.

package v2
Imports
import (
	batchv1beta1 "k8s.io/api/batch/v1beta1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.

We’ll leave our spec largely unchanged, except to change the schedule field to a new type.

// CronJobSpec defines the desired state of CronJob
type CronJobSpec struct {
	// The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
	Schedule CronSchedule `json:"schedule"`
The rest of Spec
	// +kubebuilder:validation:Minimum=0

	// Optional deadline in seconds for starting the job if it misses scheduled
	// time for any reason.  Missed jobs executions will be counted as failed ones.
	// +optional
	StartingDeadlineSeconds *int64 `json:"startingDeadlineSeconds,omitempty"`

	// Specifies how to treat concurrent executions of a Job.
	// Valid values are:
	// - "Allow" (default): allows CronJobs to run concurrently;
	// - "Forbid": forbids concurrent runs, skipping next run if previous run hasn't finished yet;
	// - "Replace": cancels currently running job and replaces it with a new one
	// +optional
	ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`

	// This flag tells the controller to suspend subsequent executions, it does
	// not apply to already started executions.  Defaults to false.
	// +optional
	Suspend *bool `json:"suspend,omitempty"`

	// Specifies the job that will be created when executing a CronJob.
	JobTemplate batchv1beta1.JobTemplateSpec `json:"jobTemplate"`

	// +kubebuilder:validation:Minimum=0

	// The number of successful finished jobs to retain.
	// This is a pointer to distinguish between explicit zero and not specified.
	// +optional
	SuccessfulJobsHistoryLimit *int32 `json:"successfulJobsHistoryLimit,omitempty"`

	// +kubebuilder:validation:Minimum=0

	// The number of failed finished jobs to retain.
	// This is a pointer to distinguish between explicit zero and not specified.
	// +optional
	FailedJobsHistoryLimit *int32 `json:"failedJobsHistoryLimit,omitempty"`
}

Next, we’ll need to define a type to hold our schedule. Based on our proposed YAML above, it’ll have a field for each corresponding Cron “field”.

// describes a Cron schedule.
type CronSchedule struct {
	// specifies the minute during which the job executes.
	// +optional
	Minute *CronField `json:"minute,omitempty"`
	// specifies the hour during which the job executes.
	// +optional
	Hour *CronField `json:"hour,omitempty"`
	// specifies the day of the month during which the job executes.
	// +optional
	DayOfMonth *CronField `json:"dayOfMonth,omitempty"`
	// specifies the month during which the job executes.
	// +optional
	Month *CronField `json:"month,omitempty"`
	// specifies the day of the week during which the job executes.
	// +optional
	DayOfWeek *CronField `json:"dayOfWeek,omitempty"`
}

Finally, we’ll define a wrapper type to represent a field. We could attach additional validation to this field, but for now we’ll just use it for documentation purposes.

// represents a Cron field specifier.
type CronField string
Other Types

All the other types will stay the same as before.

// ConcurrencyPolicy describes how the job will be handled.
// Only one of the following concurrent policies may be specified.
// If none of the following policies is specified, the default one
// is AllowConcurrent.
// +kubebuilder:validation:Enum=Allow;Forbid;Replace
type ConcurrencyPolicy string

const (
	// AllowConcurrent allows CronJobs to run concurrently.
	AllowConcurrent ConcurrencyPolicy = "Allow"

	// ForbidConcurrent forbids concurrent runs, skipping next run if previous
	// hasn't finished yet.
	ForbidConcurrent ConcurrencyPolicy = "Forbid"

	// ReplaceConcurrent cancels currently running job and replaces it with a new one.
	ReplaceConcurrent ConcurrencyPolicy = "Replace"
)

// CronJobStatus defines the observed state of CronJob
type CronJobStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// A list of pointers to currently running jobs.
	// +optional
	Active []corev1.ObjectReference `json:"active,omitempty"`

	// Information when was the last time the job was successfully scheduled.
	// +optional
	LastScheduleTime *metav1.Time `json:"lastScheduleTime,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// CronJob is the Schema for the cronjobs API
type CronJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CronJobSpec   `json:"spec,omitempty"`
	Status CronJobStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// CronJobList contains a list of CronJob
type CronJobList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []CronJob `json:"items"`
}

func init() {
	SchemeBuilder.Register(&CronJob{}, &CronJobList{})
}

Storage Versions

project/api/v1/cronjob_types.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

package v1
Imports
import (
	batchv1beta1 "k8s.io/api/batch/v1beta1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.
old stuff
// CronJobSpec defines the desired state of CronJob
type CronJobSpec struct {
	// +kubebuilder:validation:MinLength=0

	// The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
	Schedule string `json:"schedule"`

	// +kubebuilder:validation:Minimum=0

	// Optional deadline in seconds for starting the job if it misses scheduled
	// time for any reason.  Missed jobs executions will be counted as failed ones.
	// +optional
	StartingDeadlineSeconds *int64 `json:"startingDeadlineSeconds,omitempty"`

	// Specifies how to treat concurrent executions of a Job.
	// Valid values are:
	// - "Allow" (default): allows CronJobs to run concurrently;
	// - "Forbid": forbids concurrent runs, skipping next run if previous run hasn't finished yet;
	// - "Replace": cancels currently running job and replaces it with a new one
	// +optional
	ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`

	// This flag tells the controller to suspend subsequent executions, it does
	// not apply to already started executions.  Defaults to false.
	// +optional
	Suspend *bool `json:"suspend,omitempty"`

	// Specifies the job that will be created when executing a CronJob.
	JobTemplate batchv1beta1.JobTemplateSpec `json:"jobTemplate"`

	// +kubebuilder:validation:Minimum=0

	// The number of successful finished jobs to retain.
	// This is a pointer to distinguish between explicit zero and not specified.
	// +optional
	SuccessfulJobsHistoryLimit *int32 `json:"successfulJobsHistoryLimit,omitempty"`

	// +kubebuilder:validation:Minimum=0

	// The number of failed finished jobs to retain.
	// This is a pointer to distinguish between explicit zero and not specified.
	// +optional
	FailedJobsHistoryLimit *int32 `json:"failedJobsHistoryLimit,omitempty"`
}

// ConcurrencyPolicy describes how the job will be handled.
// Only one of the following concurrent policies may be specified.
// If none of the following policies is specified, the default one
// is AllowConcurrent.
// +kubebuilder:validation:Enum=Allow;Forbid;Replace
type ConcurrencyPolicy string

const (
	// AllowConcurrent allows CronJobs to run concurrently.
	AllowConcurrent ConcurrencyPolicy = "Allow"

	// ForbidConcurrent forbids concurrent runs, skipping next run if previous
	// hasn't finished yet.
	ForbidConcurrent ConcurrencyPolicy = "Forbid"

	// ReplaceConcurrent cancels currently running job and replaces it with a new one.
	ReplaceConcurrent ConcurrencyPolicy = "Replace"
)

// CronJobStatus defines the observed state of CronJob
type CronJobStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// A list of pointers to currently running jobs.
	// +optional
	Active []corev1.ObjectReference `json:"active,omitempty"`

	// Information when was the last time the job was successfully scheduled.
	// +optional
	LastScheduleTime *metav1.Time `json:"lastScheduleTime,omitempty"`
}

Since we’ll have more than one version, we’ll need to mark a storage version. This is the version that the Kubernetes API server uses to store our data. We’ll choose the v1 version for our project.

We’ll use the +kubebuilder:storageversion marker to do this.

Note that multiple versions may exist in storage if they were written before the storage version changes -- changing the storage version only affects how objects are created/updated after the change.

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:storageversion

// CronJob is the Schema for the cronjobs API
type CronJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CronJobSpec   `json:"spec,omitempty"`
	Status CronJobStatus `json:"status,omitempty"`
}
old stuff
// +kubebuilder:object:root=true

// CronJobList contains a list of CronJob
type CronJobList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []CronJob `json:"items"`
}

func init() {
	SchemeBuilder.Register(&CronJob{}, &CronJobList{})
}

Now that we’ve got our types in place, we’ll need to set up conversion...

Hubs, spokes, and other wheel metaphors

Since we now have two different versions, and users can request either version, we’ll have to define a way to convert between our versions. For CRDs, this is done using a webhook, similar to the defaulting and validating webhooks we defined in the base tutorial. Like before, controller-runtime will help us wire together the nitty-gritty bits; we just have to implement the actual conversion.

Before we do that, though, we’ll need to understand how controller-runtime thinks about versions. Namely:

Complete graphs are insufficiently nautical

A simple approach to defining conversion might be to define conversion functions to convert between each of our versions. Then, whenever we need to convert, we’d look up the appropriate function, and call it to run the conversion.

This works fine when we just have two versions, but what if we had 4 types? 8 types? That’d be a lot of conversion functions.
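To put numbers on that: with n versions and pairwise conversion, every ordered pair needs its own function, so you'd write n×(n−1) of them (12 for 4 versions, 56 for 8). With a hub, each non-hub version only needs conversion to and from the hub, i.e. 2×(n−1) functions (6 for 4 versions, 14 for 8).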

Instead, controller-runtime models conversion in terms of a “hub and spoke” model -- we mark one version as the “hub”, and all other versions just define conversion to and from the hub:

(diagram: the fully connected conversion graph becomes a hub-and-spoke graph)

Then, if we have to convert between two non-hub versions, we first convert to the hub version, and then to our desired version.

This cuts down on the number of conversion functions that we have to define, and is modeled off of what Kubernetes does internally.
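For reference, controller-runtime expresses this model with two small interfaces in sigs.k8s.io/controller-runtime/pkg/conversion; the sketch below is a condensed rendition of their shape, so check the package godoc for the authoritative definitions.

package conversion

import "k8s.io/apimachinery/pkg/runtime"

// Hub marks a type as the hub of the conversion graph.
type Hub interface {
	runtime.Object
	Hub()
}

// Convertible is implemented by each spoke version, which only needs to know
// how to convert itself to and from the hub.
type Convertible interface {
	runtime.Object
	ConvertTo(dst Hub) error
	ConvertFrom(src Hub) error
}

Our v1 CronJob will act as the hub and v2 as a spoke, as we implement later in this chapter.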

What does that have to do with Webhooks?

When API clients, like kubectl or your controller, request a particular version of your resource, the Kubernetes API server needs to return a result that’s of that version. However, that version might not match the version stored by the API server.

In that case, the API server needs to know how to convert between the desired version and the stored version. Since the conversions aren’t built in for CRDs, the Kubernetes API server calls out to a webhook to do the conversion instead. For KubeBuilder, this webhook is implemented by controller-runtime, and performs the hub-and-spoke conversions that we discussed above.

Now that we have the model for conversion down pat, we can actually implement our conversions.

Implementing conversion

With our model for conversion in place, it’s time to actually implement the conversion functions. We’ll put them in a file called cronjob_conversion.go next to our cronjob_types.go file, to avoid cluttering up our main types file with extra functions.

Hub...

First, we’ll implement the hub. We’ll choose the v1 version as the hub:

project/api/v1/cronjob_conversion.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

package v1

Implementing the hub method is pretty easy -- we just have to add an empty method called Hub() to serve as a marker. We could also just put this inline in our cronjob_types.go file.

// Hub marks this type as a conversion hub.
func (*CronJob) Hub() {}

... and Spokes

Then, we’ll implement our spoke, the v2 version:

project/api/v2/cronjob_conversion.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

package v2
Imports

For imports, we’ll need the controller-runtime conversion package, plus the API version for our hub type (v1), and finally some of the standard packages.

import (
	"fmt"
	"strings"

	"sigs.k8s.io/controller-runtime/pkg/conversion"

	"tutorial.kubebuilder.io/project/api/v1"
)

Our “spoke” versions need to implement the Convertible interface. Namely, they’ll need ConvertTo and ConvertFrom methods to convert to/from the hub version.

ConvertTo is expected to modify its argument to contain the converted object. Most of the conversion is straightforward copying, except for converting our changed field.

// ConvertTo converts this CronJob to the Hub version (v1).
func (src *CronJob) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*v1.CronJob)

	sched := src.Spec.Schedule
	scheduleParts := []string{"*", "*", "*", "*", "*"}
	if sched.Minute != nil {
		scheduleParts[0] = string(*sched.Minute)
	}
	if sched.Hour != nil {
		scheduleParts[1] = string(*sched.Hour)
	}
	if sched.DayOfMonth != nil {
		scheduleParts[2] = string(*sched.DayOfMonth)
	}
	if sched.Month != nil {
		scheduleParts[3] = string(*sched.Month)
	}
	if sched.DayOfWeek != nil {
		scheduleParts[4] = string(*sched.DayOfWeek)
	}
	dst.Spec.Schedule = strings.Join(scheduleParts, " ")
rote conversion

The rest of the conversion is pretty rote.

	// ObjectMeta
	dst.ObjectMeta = src.ObjectMeta

	// Spec
	dst.Spec.StartingDeadlineSeconds = src.Spec.StartingDeadlineSeconds
	dst.Spec.ConcurrencyPolicy = v1.ConcurrencyPolicy(src.Spec.ConcurrencyPolicy)
	dst.Spec.Suspend = src.Spec.Suspend
	dst.Spec.JobTemplate = src.Spec.JobTemplate
	dst.Spec.SuccessfulJobsHistoryLimit = src.Spec.SuccessfulJobsHistoryLimit
	dst.Spec.FailedJobsHistoryLimit = src.Spec.FailedJobsHistoryLimit

	// Status
	dst.Status.Active = src.Status.Active
	dst.Status.LastScheduleTime = src.Status.LastScheduleTime
	return nil
}

ConvertFrom is expected to modify its receiver to contain the converted object. Most of the conversion is straightforward copying, except for converting our changed field.

// ConvertFrom converts from the Hub version (v1) to this version.
func (dst *CronJob) ConvertFrom(srcRaw conversion.Hub) error {
	src := srcRaw.(*v1.CronJob)

	schedParts := strings.Split(src.Spec.Schedule, " ")
	if len(schedParts) != 5 {
		return fmt.Errorf("invalid schedule: not a standard 5-field schedule")
	}
	partIfNeeded := func(raw string) *CronField {
		if raw == "*" {
			return nil
		}
		part := CronField(raw)
		return &part
	}
	dst.Spec.Schedule.Minute = partIfNeeded(schedParts[0])
	dst.Spec.Schedule.Hour = partIfNeeded(schedParts[1])
	dst.Spec.Schedule.DayOfMonth = partIfNeeded(schedParts[2])
	dst.Spec.Schedule.Month = partIfNeeded(schedParts[3])
	dst.Spec.Schedule.DayOfWeek = partIfNeeded(schedParts[4])
rote conversion

The rest of the conversion is pretty rote.

	// ObjectMeta
	dst.ObjectMeta = src.ObjectMeta

	// Spec
	dst.Spec.StartingDeadlineSeconds = src.Spec.StartingDeadlineSeconds
	dst.Spec.ConcurrencyPolicy = ConcurrencyPolicy(src.Spec.ConcurrencyPolicy)
	dst.Spec.Suspend = src.Spec.Suspend
	dst.Spec.JobTemplate = src.Spec.JobTemplate
	dst.Spec.SuccessfulJobsHistoryLimit = src.Spec.SuccessfulJobsHistoryLimit
	dst.Spec.FailedJobsHistoryLimit = src.Spec.FailedJobsHistoryLimit

	// Status
	dst.Status.Active = src.Status.Active
	dst.Status.LastScheduleTime = src.Status.LastScheduleTime
	return nil
}

Now that we’ve got our conversions in place, all that we need to do is wire up our main to serve the webhook!

Setting up the webhooks

Our conversion is in place, so all that’s left is to tell controller-runtime about our conversion.

Normally, we’d run

kubebuilder create webhook --group batch --version v1 --kind CronJob --conversion

to scaffold out the webhook setup. However, we’ve already got webhook setup, from when we built our defaulting and validating webhooks!

Webhook setup...

project/api/v1/cronjob_webhook.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Go imports
package v1

import (
	"github.com/robfig/cron"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	validationutils "k8s.io/apimachinery/pkg/util/validation"
	"k8s.io/apimachinery/pkg/util/validation/field"
	ctrl "sigs.k8s.io/controller-runtime"
	logf "sigs.k8s.io/controller-runtime/pkg/runtime/log"
	"sigs.k8s.io/controller-runtime/pkg/webhook"
)
var cronjoblog = logf.Log.WithName("cronjob-resource")

This setup doubles as the setup for our conversion webhooks: as long as our types implement the Hub and Convertible interfaces, a conversion webhook will be registered.

func (r *CronJob) SetupWebhookWithManager(mgr ctrl.Manager) error {
	return ctrl.NewWebhookManagedBy(mgr).
		For(r).
		Complete()
}
Existing Defaulting and Validation
var _ webhook.Defaulter = &CronJob{}

// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *CronJob) Default() {
	cronjoblog.Info("default", "name", r.Name)

	if r.Spec.ConcurrencyPolicy == "" {
		r.Spec.ConcurrencyPolicy = AllowConcurrent
	}
	if r.Spec.Suspend == nil {
		r.Spec.Suspend = new(bool)
	}
	if r.Spec.SuccessfulJobsHistoryLimit == nil {
		r.Spec.SuccessfulJobsHistoryLimit = new(int32)
		*r.Spec.SuccessfulJobsHistoryLimit = 3
	}
	if r.Spec.FailedJobsHistoryLimit == nil {
		r.Spec.FailedJobsHistoryLimit = new(int32)
		*r.Spec.FailedJobsHistoryLimit = 1
	}
}

// TODO(user): change verbs to "verbs=create;update;delete" if you want to enable deletion validation.
// +kubebuilder:webhook:verbs=create;update,path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,versions=v1,name=vcronjob.kb.io

var _ webhook.Validator = &CronJob{}

// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *CronJob) ValidateCreate() error {
	cronjoblog.Info("validate create", "name", r.Name)

	return r.validateCronJob()
}

// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type
func (r *CronJob) ValidateUpdate(old runtime.Object) error {
	cronjoblog.Info("validate update", "name", r.Name)

	return r.validateCronJob()
}

// ValidateDelete implements webhook.Validator so a webhook will be registered for the type
func (r *CronJob) ValidateDelete() error {
	cronjoblog.Info("validate delete", "name", r.Name)

	// TODO(user): fill in your validation logic upon object deletion.
	return nil
}

func (r *CronJob) validateCronJob() error {
	var allErrs field.ErrorList
	if err := r.validateCronJobName(); err != nil {
		allErrs = append(allErrs, err)
	}
	if err := r.validateCronJobSpec(); err != nil {
		allErrs = append(allErrs, err)
	}
	if len(allErrs) == 0 {
		return nil
	}

	return apierrors.NewInvalid(
		schema.GroupKind{Group: "batch.tutorial.kubebuilder.io", Kind: "CronJob"},
		r.Name, allErrs)
}

func (r *CronJob) validateCronJobSpec() *field.Error {
	// The field helpers from the kubernetes API machinery help us return nicely
	// structured validation errors.
	return validateScheduleFormat(
		r.Spec.Schedule,
		field.NewPath("spec").Child("schedule"))
}

func validateScheduleFormat(schedule string, fldPath *field.Path) *field.Error {
	if _, err := cron.ParseStandard(schedule); err != nil {
		return field.Invalid(fldPath, schedule, err.Error())
	}
	return nil
}

func (r *CronJob) validateCronJobName() *field.Error {
	if len(r.ObjectMeta.Name) > validationutils.DNS1035LabelMaxLength-11 {
		// The job name length is 63 characters like all Kubernetes objects
		// (which must fit in a DNS subdomain). The cronjob controller appends
		// an 11-character suffix to the cronjob (`-$TIMESTAMP`) when creating
		// a job, so cronjob names must have length <= 63-11=52. If we don't
		// validate this here, then job creation will fail later.
		return field.Invalid(field.NewPath("metadata").Child("name"), r.Name, "must be no more than 52 characters")
	}
	return nil
}

...and main.go

Similarly, our existing main file is sufficient:

project/main.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Imports
package main

import (
	"flag"
	"os"

	kbatchv1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"

	batchv1 "tutorial.kubebuilder.io/project/api/v1"
	batchv2 "tutorial.kubebuilder.io/project/api/v2"
	"tutorial.kubebuilder.io/project/controllers"
	// +kubebuilder:scaffold:imports
)
existing setup
var (
	scheme   = runtime.NewScheme()
	setupLog = ctrl.Log.WithName("setup")
)

func init() {
	_ = clientgoscheme.AddToScheme(scheme)

	_ = kbatchv1.AddToScheme(scheme) // we've added this ourselves
	_ = batchv1.AddToScheme(scheme)
	_ = batchv2.AddToScheme(scheme)
	// +kubebuilder:scaffold:scheme
}
func main() {
existing setup
	var metricsAddr string
	var enableLeaderElection bool
	flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
	flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
		"Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.")
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseDevMode(true)))

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:             scheme,
		MetricsBindAddress: metricsAddr,
		LeaderElection:     enableLeaderElection,
	})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}

	if err = (&controllers.CronJobReconciler{
		Client: mgr.GetClient(),
		Log:    ctrl.Log.WithName("controllers").WithName("Captain"),
		Scheme: mgr.GetScheme(), // we've added this ourselves
	}).SetupWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "Captain")
		os.Exit(1)
	}

Our existing call to SetupWebhookWithManager registers our conversion webhooks with the manager, too.

	if err = (&batchv1.CronJob{}).SetupWebhookWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create webhook", "webhook", "Captain")
		os.Exit(1)
	}
	// +kubebuilder:scaffold:builder
existing setup
	setupLog.Info("starting manager")
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}

Everything’s set up and ready to go! All that’s left now is to test out our webhooks.

Deployment and Testing

Before we can test out our conversion, we’ll need to enable the conversion webhook in our CRD:

Kubebuilder generates Kubernetes manifests under the config directory with webhook bits disabled. To enable them, we need to:

  • Enable patches/webhook_in_<kind>.yaml and patches/cainjection_in_<kind>.yaml in config/crd/kustomization.yaml file.

  • Enable ../certmanager and ../webhook directories under the bases section in config/default/kustomization.yaml file.

  • Enable manager_webhook_patch.yaml under the patches section in config/default/kustomization.yaml file.

  • Enable all the vars under the CERTMANAGER section in config/default/kustomization.yaml file.

Additionally, we’ll need to set the CRD_OPTIONS variable to just "crd", removing the trivialVersions option (this ensures that we actually generate validation for each version, instead of telling Kubernetes that they’re the same):

CRD_OPTIONS ?= "crd"

Now we have all our code changes and manifests in place, so let’s deploy it to the cluster and test it out.

You’ll need cert-manager installed (version 0.9.0+) unless you have some other certificate management solution. The Kubebuilder team has tested the instructions in this tutorial with the 0.9.0-alpha.0 release.

Once all our ducks are in a row with certificates, we can run make install deploy (as normal) to deploy all the bits (CRD, controller-manager deployment) onto the cluster.

Testing

Once all of the bits are up and running on the cluster with conversion enabled, we can test out our conversion by requesting different versions.

We’ll make a v2 version based on our v1 version (put it under config/samples):

apiVersion: batch.tutorial.kubebuilder.io/v2
kind: CronJob
metadata:
  name: cronjob-sample
spec:
  schedule:
    minute: "*/1"
  startingDeadlineSeconds: 60
  concurrencyPolicy: Allow # explicitly specify, but Allow is also default.
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

Then, we can create it on the cluster:

kubectl apply -f config/samples/batch_v2_cronjob.yaml

If we’ve done everything correctly, it should create successfully, and we should be able to fetch it using both the v2 resource

kubectl get cronjobs.v2.batch.tutorial.kubebuilder.io -o yaml
apiVersion: batch.tutorial.kubebuilder.io/v2
kind: CronJob
metadata:
  name: cronjob-sample
spec:
  schedule:
    minute: "*/1"
  startingDeadlineSeconds: 60
  concurrencyPolicy: Allow # explicitly specify, but Allow is also default.
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

and the v1 resource

kubectl get cronjobs.v1.batch.tutorial.kubebuilder.io -o yaml
apiVersion: batch.tutorial.kubebuilder.io/v1
kind: CronJob
metadata:
  name: cronjob-sample
spec:
  schedule: "*/1 * * * *"
  startingDeadlineSeconds: 60
  concurrencyPolicy: Allow # explicitly specify, but Allow is also default.
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

Both should be filled out, and look equivalent to our v2 and v1 samples, respectively. Notice that each has a different API version.

Finally, if we wait a bit, we should notice that our CronJob continues to reconcile, even though our controller is written against our v1 API version.

Troubleshooting

steps for troubleshooting

Migrations

Migrating between project structures in KubeBuilder generally involves a bit of manual work.

This section details what’s required to migrate between different versions of KubeBuilder scaffolding, as well as to more complex project layout structures.

Kubebuilder v1 vs v2

This document covers all breaking changes when migrating from v1 to v2.

The details of all changes (breaking or otherwise) can be found in controller-runtime, controller-tools and kubebuilder release notes.

Common changes

A v2 project uses Go modules, but kubebuilder will continue to support dep until Go 1.13 is out.

controller-runtime

  • Client.List now uses functional options (List(ctx, list, ...option)) instead of List(ctx, ListOptions, list); a short before/after sketch follows this list.

  • Client.DeleteAllOf was added to the Client interface.

  • Metrics are on by default now.

  • A number of packages under pkg/runtime have been moved, with their old locations deprecated. The old locations will be removed before controller-runtime v1.0.0. See the godocs for more information.
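To illustrate the first item, here is a rough before/after sketch of a List call. The old signature is shown as a comment and may differ slightly between old releases; the helper and variable names are purely illustrative.

package controllers

import (
	"context"

	kbatch "k8s.io/api/batch/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listChildJobs illustrates the newer List call; c is any controller-runtime client.
func listChildJobs(ctx context.Context, c client.Client, namespace string) (kbatch.JobList, error) {
	var childJobs kbatch.JobList

	// Old controller-runtime (Kubebuilder v1 era): the options struct came before the list.
	// err := c.List(ctx, &client.ListOptions{Namespace: namespace}, &childJobs)

	// New controller-runtime (Kubebuilder v2 era): the list object comes first,
	// followed by functional options such as client.InNamespace.
	err := c.List(ctx, &childJobs, client.InNamespace(namespace))
	return childJobs, err
}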

Webhook-related

  • Automatic certificate generation for webhooks has been removed, and webhooks will no longer self-register. Use controller-tools to generate a webhook configuration. If you need certificate generation, we recommend using cert-manager. Kubebuilder v2 will scaffold out cert manager configs for you to use -- see the Webhook Tutorial for more details.

  • The builder package now has separate builders for controllers and webhooks, which facilitates choosing which to run.

controller-tools

The generator framework has been rewritten in v2. It still works the same as before in many cases, but be aware that there are some breaking changes. Please check marker documentation for more details.

Kubebuilder

  • Kubebuilder v2 introduces a simplified project layout. You can find the design doc here.

  • In v1, the manager is deployed as a StatefulSet, while it’s deployed as a Deployment in v2.

  • The kubebuilder create webhook command was added to scaffold mutating/validating/conversion webhooks. It replaces the kubebuilder alpha webhook command.

  • v2 uses distroless/static instead of Ubuntu as base image. This reduces image size and attack surface.

  • v2 requires kustomize v3.1.0+.

Migration from v1 to v2

Make sure you understand the differences between Kubebuilder v1 and v2 before continuing

Please ensure you have followed the installation guide to install the required components.

The recommended way to migrate a v1 project is to create a new v2 project and copy over the API and the reconciliation code. The conversion will end up with a project that looks like a native v2 project. However, in some cases, it’s possible to do an in-place upgrade (i.e. reuse the v1 project layout, upgrading controller-runtime and controller-tools).

Let’s take the example v1 project and migrate it to Kubebuilder v2. At the end, we should have something that looks like the example v2 project.

Preparation

We’ll need to figure out what the group, version, kind and domain are.

Let’s take a look at our current v1 project structure:

pkg/
├── apis
│   ├── addtoscheme_batch_v1.go
│   ├── apis.go
│   └── batch
│       ├── group.go
│       └── v1
│           ├── cronjob_types.go
│           ├── cronjob_types_test.go
│           ├── doc.go
│           ├── register.go
│           ├── v1_suite_test.go
│           └── zz_generated.deepcopy.go
├── controller
└── webhook

All of our API information is stored in pkg/apis/batch, so we can look there to find what we need to know.

In cronjob_types.go, we can find

type CronJob struct {...}

In register.go, we can find

SchemeGroupVersion = schema.GroupVersion{Group: "batch.tutorial.kubebuilder.io", Version: "v1"}

Putting that together, we get CronJob as the kind, and batch.tutorial.kubebuilder.io/v1 as the group-version

Initialize a v2 Project

Now, we need to initialize a v2 project. Before we do that, though, we’ll need to initialize a new Go module if we’re not on the GOPATH:

go mod init tutorial.kubebuilder.io/project

Then, we can finish initializing the project with kubebuilder:

kubebuilder init --domain tutorial.kubebuilder.io

Migrate APIs and Controllers

Next, we’ll re-scaffold out the API types and controllers. Since we want both, we’ll say yes to both the API and controller prompts when asked what parts we want to scaffold:

kubebuilder create api --group batch --version v1 --kind CronJob

If you’re using multiple groups, some manual work is required to migrate. Please follow this for more details.

Migrate the APIs

Now, let’s copy the API definition from pkg/apis/batch/v1/cronjob_types.go to api/v1/cronjob_types.go. We only need to copy the implementation of the Spec and Status fields.

We can replace the +k8s:deepcopy-gen:interfaces=... marker (which is deprecated in kubebuilder) with +kubebuilder:object:root=true.

We don’t need the following markers any more (they’re not used anymore, and are relics from much older versions of KubeBuilder):

// +genclient
// +k8s:openapi-gen=true

Our API types should look like the following:

// +kubebuilder:object:root=true

// CronJob is the Schema for the cronjobs API
type CronJob struct {...}

// +kubebuilder:object:root=true

// CronJobList contains a list of CronJob
type CronJobList struct {...}

Migrate the Controllers

Now, let’s migrate the controller reconciler code from pkg/controller/cronjob/cronjob_controller.go to controllers/cronjob_controller.go.

We’ll need to copy the following (a rough sketch of the resulting v2 controller skeleton follows this list):

  • the fields from the ReconcileCronJob struct to CronJobReconciler
  • the contents of the Reconcile function
  • the rbac related markers to the new file.
  • the code under func add(mgr manager.Manager, r reconcile.Reconciler) error to func SetupWithManager
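Putting those pieces together, the v2-style target looks roughly like the sketch below. It assumes the CronJob types and module path from this tutorial; the RBAC marker shown is an abbreviated example, not necessarily the full set your project needs.

package controllers

import (
	"context"

	"github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	batchv1 "tutorial.kubebuilder.io/project/api/v1"
)

// CronJobReconciler receives the fields that used to live on ReconcileCronJob.
type CronJobReconciler struct {
	client.Client
	Log    logr.Logger
	Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete

// Reconcile receives the body of the old Reconcile function.
func (r *CronJobReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	ctx := context.Background()
	log := r.Log.WithValues("cronjob", req.NamespacedName)

	// ... logic copied from pkg/controller/cronjob/cronjob_controller.go ...
	_ = ctx
	_ = log

	return ctrl.Result{}, nil
}

// SetupWithManager receives the watch wiring that used to live in func add(...).
func (r *CronJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&batchv1.CronJob{}).
		Complete(r)
}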

Migrate the Webhooks

If you don’t have a webhook, you can skip this section.

Webhooks for Core Types and External CRDs

If you are using webhooks for Kubernetes core types (e.g. Pods), or for an external CRD that is not owned by you, you can refer to the controller-runtime example for builtin types and do something similar. Kubebuilder doesn’t scaffold much for these cases, but you can use the library in controller-runtime.

Scaffold Webhooks for our CRDs

Now let’s scaffold the webhooks for our CRD (CronJob). We’ll need to run the following command with the --defaulting and --programmatic-validation flags (since our test project uses defaulting and validating webhooks):

kubebuilder create webhook --group batch --version v1 --kind CronJob --defaulting --programmatic-validation

Depending on how many CRDs need webhooks, we may need to run the above command multiple times with different Group-Version-Kinds.

Now, we’ll need to copy the logic for each webhook. For validating webhooks, we can copy the contents from func validatingCronJobFn in pkg/default_server/cronjob/validating/cronjob_create_handler.go to func ValidateCreate in api/v1/cronjob_webhook.go and then the same for update.

Similarly, we’ll copy from func mutatingCronJobFn to func Default.

Webhook Markers

When scaffolding webhooks, Kubebuilder v2 adds the following markers:

// These are v2 markers

// This is for the mutating webhook
// +kubebuilder:webhook:path=/mutate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=true,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=mcronjob.kb.io

...

// This is for the validating webhook
// +kubebuilder:webhook:path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=vcronjob.kb.io

The default verbs are verbs=create;update. We need to ensure verbs matches what we need. For example, if we only want to validate creation, then we would change it to verbs=create.

We also need to ensure failure-policy is still the same.

Markers like the following are no longer needed (since they deal with self-deploying certificate configuration, which was removed in v2):

// v1 markers
// +kubebuilder:webhook:port=9876,cert-dir=/tmp/cert
// +kubebuilder:webhook:service=test-system:webhook-service,selector=app:webhook-server
// +kubebuilder:webhook:secret=test-system:webhook-server-secret
// +kubebuilder:webhook:mutating-webhook-config-name=test-mutating-webhook-cfg
// +kubebuilder:webhook:validating-webhook-config-name=test-validating-webhook-cfg

In v1, a single webhook marker may be split into multiple ones in the same paragraph. In v2, each webhook must be represented by a single marker.

Others

If there are any manual updates in main.go in v1, we need to port the changes to the new main.go. We’ll also need to ensure all of the needed schemes have been registered.

If there are additional manifests added under the config directory, port them as well.

Change the image name in the Makefile if needed.

Verification

Finally, we can run make and make docker-build to ensure things are working fine.

Single Group to Multi-Group

While KubeBuilder v2 will not scaffold out a project structure compatible with multiple API groups in the same repository by default, it’s possible to modify the default project structure to support it.

Let’s migrate the CronJob example.

Generally, we use the prefix for the API group as the directory name. We can check api/v1/groupversion_info.go to find that out:

// +groupName=batch.tutorial.kubebuilder.io
package v1

Then, we’ll rename api to apis to be more clear, and we’ll move our existing APIs into a new subdirectory, “batch”:

mkdir apis/batch
mv api/* apis/batch
# After ensuring that all was moved successfully remove the old directory `api/`
rm -rf api/ 

After moving the APIs to a new directory, the same needs to be applied to the controllers:

mkdir controllers/batch
mv controllers/* controllers/batch/

Next, we’ll need to update all the references to the old package name. For CronJob, that’ll be main.go and controllers/batch/cronjob_controller.go.

If you’ve added additional files to your project, you’ll need to track down imports there as well.
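For example, an import of the old package path would change roughly like this (module path taken from this tutorial's project):

// before the move:
import batchv1 "tutorial.kubebuilder.io/project/api/v1"

// after the move to the multi-group layout:
import batchv1 "tutorial.kubebuilder.io/project/apis/batch/v1"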

Finally, we’ll run the command which enables the multi-group layout in the project:

kubebuilder edit --multigroup=true

When kubebuilder edit --multigroup=true is executed, it adds a new line to PROJECT that marks this as a multi-group project:

version: "2"
domain: tutorial.kubebuilder.io
repo: tutorial.kubebuilder.io/project
multigroup: true

Note that this option only indicates to KubeBuilder that this is a multi-group project.

If the project is not new, any APIs implemented before the switch will remain in the previous structure. Notice that in a multi-group project the Kind API’s files are created under apis/<group>/<version> instead of api/<version>, and the controllers are created under controllers/<group> instead of controllers. That is why we moved the previously generated APIs and controllers with the commands in the previous steps; remember to update the references afterwards.
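After the move, the layout for our single batch group should look roughly like this (file names abbreviated, and assuming the usual generated files are present):

apis/
└── batch
    └── v1
        ├── cronjob_types.go
        ├── groupversion_info.go
        └── zz_generated.deepcopy.go
controllers/
└── batch
    └── cronjob_controller.go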

The CronJob tutorial explains each of these changes in more detail (in the context of how they’re generated by KubeBuilder for single-group projects).

Reference

Generating CRDs

KubeBuilder uses a tool called controller-gen to generate utility code and Kubernetes object YAML, like CustomResourceDefinitions.

To do this, it makes use of special “marker comments” (comments that start with // +) to indicate additional information about fields, types, and packages. In the case of CRDs, these are generally pulled from your _types.go files. For more information on markers, see the marker reference docs.

KubeBuilder provides a make target to run controller-gen and generate CRDs: make manifests.

When you run make manifests, you should see CRDs generated under the config/crd/bases directory. make manifests can generate a number of other artifacts as well -- see the marker reference docs for more details.
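For instance, for the CronJob project in this book, make manifests writes the CRD to config/crd/bases/batch.tutorial.kubebuilder.io_cronjobs.yaml (assuming controller-gen's default <group>_<plural>.yaml naming).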

Validation

CRDs support declarative validation using an OpenAPI v3 schema in the validation section.

In general, validation markers may be attached to fields or to types. If you’re defining complex validation, if you need to re-use validation, or if you need to validate slice elements, it’s often best to define a new type to describe your validation.

For example:

type ToySpec struct {
	// +kubebuilder:validation:MaxLength=15
	// +kubebuilder:validation:MinLength=1
	Name string `json:"name,omitempty"`

	// +kubebuilder:validation:MaxItems=500
	// +kubebuilder:validation:MinItems=1
	// +kubebuilder:validation:UniqueItems=true
	Knights []string `json:"knights,omitempty"`

	Alias   Alias   `json:"alias,omitempty"`
	Rank    Rank    `json:"rank"`
}

// +kubebuilder:validation:Enum=Lion;Wolf;Dragon
type Alias string

// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=3
// +kubebuilder:validation:ExclusiveMaximum=false
type Rank int32

Additional Printer Columns

Starting with Kubernetes 1.11, kubectl get can ask the server what columns to display. For CRDs, this can be used to provide useful, type-specific information with kubectl get, similar to the information provided for built-in types.

The information that gets displayed can be controlled with the additionalPrinterColumns field on your CRD, which is controlled by the +kubebuilder:printcolumn marker on the Go type for your CRD.

For instance, in the following example, we add fields to display information about the knights, rank, and alias fields from the validation example:

// +kubebuilder:printcolumn:name="Alias",type=string,JSONPath=`.spec.alias`
// +kubebuilder:printcolumn:name="Rank",type=integer,JSONPath=`.spec.rank`
// +kubebuilder:printcolumn:name="Bravely Run Away",type=boolean,JSONPath=`.spec.knights[?(@ == "Sir Robin")]`,description="when danger rears its ugly head, he bravely turned his tail and fled",priority=10
type Toy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ToySpec   `json:"spec,omitempty"`
	Status ToyStatus `json:"status,omitempty"`
}

Subresources

CRDs can choose to implement the /status and /scale subresources as of Kubernetes 1.13.

It’s generally recommended that you make use of the /status subresource on all resources that have a status field.

Both subresources have a corresponding marker.

Status

The status subresource is enabled via +kubebuilder:subresource:status. When enabled, updates to the main resource will not change status. Similarly, updates to the status subresource cannot change anything but the status field (a controller-side sketch follows the example below).

For example:

// +kubebuilder:subresource:status
type Toy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ToySpec   `json:"spec,omitempty"`
	Status ToyStatus `json:"status,omitempty"`
}
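With the status subresource enabled, a controller typically writes status through the status writer rather than a normal update. A minimal sketch, assuming a hypothetical ToyReconciler that embeds a controller-runtime client, a placeholder module path, and a made-up Happy status field:

package controllers

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"

	webappv1 "my.domain/example/api/v1" // hypothetical module path for the Toy API
)

// ToyReconciler is a minimal reconciler that embeds the controller-runtime client.
type ToyReconciler struct {
	client.Client
}

func (r *ToyReconciler) markHappy(ctx context.Context, toy *webappv1.Toy) error {
	// Writes go through the /status subresource; spec or metadata changes made
	// to this object are not persisted by this call.
	toy.Status.Happy = true // hypothetical status field, for illustration only
	return r.Status().Update(ctx, toy)
}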

Scale

The scale subresource is enabled via +kubebuilder:subresource:scale. When enabled, users will be able to use kubectl scale with your resource. If the selectorpath argument points to the string form of a label selector, the HorizontalPodAutoscaler will be able to autoscale your resource.

For example:

type CustomSetSpec struct {
	Replicas *int32 `json:"replicas"`
}

type CustomSetStatus struct {
	Replicas int32  `json:"replicas"`
	Selector string `json:"selector"` // this must be the string form of the selector
}


// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.replicas,selectorpath=.status.selector
type CustomSet struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CustomSetSpec   `json:"spec,omitempty"`
	Status CustomSetStatus `json:"status,omitempty"`
}
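Once these markers are in place and the CRD is regenerated, scaling should work with the usual command, for example kubectl scale --replicas=3 customset/customset-sample (resource and object names here are illustrative).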

Multiple Versions

As of Kubernetes 1.13, you can have multiple versions of your Kind defined in your CRD, and use a webhook to convert between them.

For more details on this process, see the multiversion tutorial.

By default, KubeBuilder disables generating different validation for different versions of the Kind in your CRD, to be compatible with older Kubernetes versions.

You’ll need to enable this by switching the line in your makefile that says CRD_OPTIONS ?= "crd:trivialVersions=true" to CRD_OPTIONS ?= "crd".

Then, you can use the +kubebuilder:storageversion marker to indicate the GVK that should be used to store data by the API server.

Under the hood

KubeBuilder scaffolds out make rules to run controller-gen. The rules will automatically install controller-gen if it’s not on your path, using go get with Go modules.

You can also run controller-gen directly, if you want to see what it’s doing.

Each controller-gen “generator” is controlled by an option to controller-gen, using the same syntax as markers. For instance, to generate CRDs with “trivial versions” (no version conversion webhooks), we call controller-gen crd:trivialVersions=true paths=./api/....

controller-gen also supports different output “rules” to control how and where output goes. Notice the manifests make rule (condensed slightly to only generate CRDs):

# Generate manifests for CRDs
manifests: controller-gen
	$(CONTROLLER_GEN) crd:trivialVersions=true paths="./..." output:crd:artifacts:config=config/crd/bases

It uses the output:crd:artifacts output rule to indicate that CRD-related config (non-code) artifacts should end up in config/crd/bases instead of config/crd.

To see all the options for controller-gen, run

$ controller-gen -h

or, for more details:

$ controller-gen -hhh

Using Finalizers

Finalizers allow controllers to implement asynchronous pre-delete hooks. For example, if you create an external resource (such as a storage bucket) for each object of your API type, and you want to delete the associated external resource when the object is deleted from Kubernetes, you can use a finalizer to do that.

You can read more about the finalizers in the Kubernetes reference docs. The section below demonstrates how to register and trigger pre-delete hooks in the Reconcile method of a controller.

The key point to note is that a finalizer causes “delete” on the object to become an “update” to set deletion timestamp. Presence of deletion timestamp on the object indicates that it is being deleted. Otherwise, without finalizers, a delete shows up as a reconcile where the object is missing from the cache.

Highlights:

  • If the object is not being deleted and does not have the finalizer registered, then add the finalizer and update the object in Kubernetes.
  • If the object is being deleted and the finalizer is still present in the finalizers list, then execute the pre-delete logic, remove the finalizer, and update the object.
  • Ensure that the pre-delete logic is idempotent.
../../cronjob-tutorial/testdata/finalizer_example.go
Apache License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Imports

First, we start out with some standard imports. As before, we need the core controller-runtime library, as well as the client package, and the package for our API types.

package controllers

import (
	"context"

	"github.com/go-logr/logr"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	batchv1 "tutorial.kubebuilder.io/project/api/v1"
)

The code snippet below shows skeleton code for implementing a finalizer.

func (r *CronJobReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	ctx := context.Background()
	log := r.Log.WithValues("cronjob", req.NamespacedName)

	cronJob := &batchv1.CronJob{}
	if err := r.Get(ctx, req.NamespacedName, cronJob); err != nil {
		log.Error(err, "unable to fetch CronJob")
		// we'll ignore not-found errors, since they can't be fixed by an immediate
		// requeue (we'll need to wait for a new notification), and we can get them
		// on deleted requests.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// name of our custom finalizer
	myFinalizerName := "storage.finalizers.tutorial.kubebuilder.io"

	// examine DeletionTimestamp to determine if object is under deletion
	if cronJob.ObjectMeta.DeletionTimestamp.IsZero() {
		// The object is not being deleted, so if it does not have our finalizer,
		// then let's add the finalizer and update the object. This is equivalent
		// to registering our finalizer.
		if !containsString(cronJob.ObjectMeta.Finalizers, myFinalizerName) {
			cronJob.ObjectMeta.Finalizers = append(cronJob.ObjectMeta.Finalizers, myFinalizerName)
			if err := r.Update(context.Background(), cronJob); err != nil {
				return ctrl.Result{}, err
			}
		}
	} else {
		// The object is being deleted
		if containsString(cronJob.ObjectMeta.Finalizers, myFinalizerName) {
			// our finalizer is present, so let's handle any external dependency
			if err := r.deleteExternalResources(cronJob); err != nil {
				// if fail to delete the external dependency here, return with error
				// so that it can be retried
				return ctrl.Result{}, err
			}

			// remove our finalizer from the list and update it.
			cronJob.ObjectMeta.Finalizers = removeString(cronJob.ObjectMeta.Finalizers, myFinalizerName)
			if err := r.Update(context.Background(), cronJob); err != nil {
				return ctrl.Result{}, err
			}
		}

		// Stop reconciliation as the item is being deleted
		return ctrl.Result{}, nil
	}

	// Your reconcile logic

	return ctrl.Result{}, nil
}

func (r *CronJobReconciler) deleteExternalResources(cronJob *batchv1.CronJob) error {
	//
	// delete any external resources associated with the cronJob
	//
	// Ensure that delete implementation is idempotent and safe to invoke
	// multiple times for the same object.
	//
	return nil
}

// Helper functions to check and remove string from a slice of strings.
func containsString(slice []string, s string) bool {
	for _, item := range slice {
		if item == s {
			return true
		}
	}
	return false
}

func removeString(slice []string, s string) (result []string) {
	for _, item := range slice {
		if item == s {
			continue
		}
		result = append(result, item)
	}
	return
}

Kind Cluster

This only covers the basics of using a kind cluster. You can find more details in the kind documentation.

Installation

You can follow this to install kind.

Create a Cluster

You can simply create a kind cluster by

kind create cluster

To customize your cluster, you can provide additional configuration. For example, the following is a sample kind configuration.

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker

Using the configuration above, running the following command will give you a k8s 1.14.2 cluster with 1 control-plane node and 3 worker nodes.

kind create cluster --config hack/kind-config.yaml --image=kindest/node:v1.14.2

You can use the --image flag to specify the cluster version you want, e.g. --image=kindest/node:v1.13.6; the supported versions are listed here

Cheatsheet

  • Load a local docker image into the kind cluster
kind load docker-image your-image-name:your-tag
  • Point kubectl to the kind cluster
kind export kubeconfig
  • Delete a kind cluster
kind delete cluster

Webhook

Webhooks are requests for information sent in a blocking fashion. A web application implementing webhooks will send an HTTP request to another application when a certain event happens.

In the kubernetes world, there are 3 kinds of webhooks: admission webhook, authorization webhook and CRD conversion webhook.

In controller-runtime libraries, we support admission webhooks and CRD conversion webhooks.

Kubernetes supports these dynamic admission webhooks as of version 1.9 (when the feature entered beta).

Kubernetes supports the conversion webhooks as of version 1.15 (when the feature entered beta).

Admission Webhooks

Admission webhooks are HTTP callbacks that receive admission requests, process them and return admission responses.

Kubernetes provides the following types of admission webhooks:

  • Mutating Admission Webhook: These can mutate the object while it’s being created or updated, before it gets stored. It can be used to default fields in resource requests, e.g. fields in a Deployment that are not specified by the user. It can be used to inject sidecar containers.

  • Validating Admission Webhook: These can validate the object while it’s being created or updated, before it gets stored. It allows more complex validation than pure schema-based validation. e.g. cross-field validation and pod image whitelisting.

The apiserver by default doesn’t authenticate itself to the webhooks. However, if you want to authenticate the clients, you can configure the apiserver to use basic auth, bearer token, or a cert to authenticate itself to the webhooks. You can find detailed steps here.

Admission Webhook for Core Types

It is very easy to build admission webhooks for CRDs, which has been covered in the CronJob tutorial. Given that kubebuilder doesn’t support webhook scaffolding for core types, you have to use the library from controller-runtime to handle it. There is an example in controller-runtime.

It is suggested to use kubebuilder to initialize a project, and then you can follow the steps below to add admission webhooks for core types.

Implement Your Handler

You need to have your handler implement the admission.Handler interface.

import (
	"context"
	"encoding/json"
	"net/http"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

type podAnnotator struct {
	Client  client.Client
	decoder *admission.Decoder
}

func (a *podAnnotator) Handle(ctx context.Context, req admission.Request) admission.Response {
	pod := &corev1.Pod{}
	err := a.decoder.Decode(req, pod)
	if err != nil {
		return admission.Errored(http.StatusBadRequest, err)
	}

	// mutate the fields in pod

	marshaledPod, err := json.Marshal(pod)
	if err != nil {
		return admission.Errored(http.StatusInternalServerError, err)
	}
	return admission.PatchResponseFromRaw(req.Object.Raw, marshaledPod)
}

If you need a client, just pass in the client at struct construction time.

If you add the InjectDecoder method for your handler, a decoder will be injected for you.

func (a *podAnnotator) InjectDecoder(d *admission.Decoder) error {
	a.decoder = d
	return nil
}

Note: in order to have controller-gen generate the webhook configuration for you, you need to add markers. For example, // +kubebuilder:webhook:path=/mutate-v1-pod,mutating=true,failurePolicy=fail,groups="",resources=pods,verbs=create;update,versions=v1,name=mpod.kb.io

Update main.go

Now you need to register your handler in the webhook server.

mgr.GetWebhookServer().Register("/mutate-v1-pod", &webhook.Admission{Handler: &podAnnotator{Client: mgr.GetClient()}})

You need to ensure that the path here matches the path in the marker.
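Putting the pieces together, here is a minimal sketch of a main.go that serves the handler above. It assumes a kubebuilder-scaffolded project and that podAnnotator lives in the same package; the manager setup mirrors, but is not copied from, the generated main.go.

package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/webhook"
)

func main() {
	setupLog := ctrl.Log.WithName("setup")

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}

	// The registration path must match the path in the +kubebuilder:webhook marker.
	mgr.GetWebhookServer().Register("/mutate-v1-pod", &webhook.Admission{
		Handler: &podAnnotator{Client: mgr.GetClient()},
	})

	setupLog.Info("starting manager")
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}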

Deploy

Deploying it is just like deploying a webhook server for CRD. You need to

  1. provision the serving certificate
  2. deploy the server

You can follow the tutorial.

Markers for Config/Code Generation

KubeBuilder makes use of a tool called controller-gen for generating utility code and Kubernetes YAML. This code and config generation is controlled by the presence of special “marker comments” in Go code.

Markers are single-line comments that start with a plus, followed by a marker name, optionally followed by some marker specific configuration:

// +kubebuilder:validation:Optional
// +kubebuilder:validation:MaxItems=2
// +kubebuilder:printcolumn:JSONPath=".status.replicas",name=Replicas,type=string

See each subsection for information about different types of code and YAML generation.

Generating Code & Artifacts in KubeBuilder

KubeBuilder projects have two make targets that make use of controller-gen:

  • make manifests generates Kubernetes object YAML, like CustomResourceDefinitions, WebhookConfigurations, and RBAC roles.
  • make generate generates code, like runtime.Object/DeepCopy implementations.

See Generating CRDs for a comprehensive overview.

Marker Syntax

Exact syntax is described in the godocs for controller-tools.

In general, markers may either be:

  • Empty (+kubebuilder:validation:Optional): empty markers are like boolean flags on the command line -- just specifying them enables some behavior.

  • Anonymous (+kubebuilder:validation:MaxItems=2): anonymous markers take a single value as their argument.

  • Multi-option (+kubebuilder:printcolumn:JSONPath=".status.replicas",name=Replicas,type=string): multi-option markers take one or more named arguments. The first argument is separated from the name by a colon, and latter arguments are comma-separated. Order of arguments doesn’t matter. Some arguments may be optional.

Marker arguments may be strings, ints, bools, slices, or maps thereof. Strings, ints, and bools follow their Go syntax:

// +kubebuilder:validation:ExclusiveMaximum=false
// +kubebuilder:validation:Format="date-time"
// +kubebuilder:validation:Maximum=42

For convenience, in simple cases the quotes may be omitted from strings, although this is not encouraged for anything other than single-word strings:

// +kubebuilder:validation:Type=string

Slices may be specified either by surrounding them with curly braces and separating with commas:

// +kubebuilder:webhooks:Enum={"crackers, Gromit, we forgot the crackers!","not even wensleydale?"}

or, in simple cases, by separating with semicolons:

// +kubebuilder:validation:Enum=Wallace;Gromit;Chicken

Maps are specified with string keys and values of any type (effectively map[string]interface{}). A map is surrounded by curly braces ({}), each key and value is separated by a colon (:), and each key-value pair is separated by a comma:

// +kubebuilder:validation:Default={magic: {numero: 42, stringified: forty-two}}
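To make the syntax concrete, here is a small, hedged sketch showing where empty, anonymous, and multi-option markers typically sit in an API package; every name below is illustrative, not taken from a real project.

// Package v1 sketches how marker comments attach to Go declarations.
//
// Empty marker at package level: fields in this package default to optional.
// +kubebuilder:validation:Optional
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// CronJobSpec shows field-level markers.
type CronJobSpec struct {
	// Anonymous marker: a single argument follows the marker name.
	// +kubebuilder:validation:MaxLength=100
	Schedule string `json:"schedule,omitempty"`

	// Several markers may stack on one field.
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=10
	Replicas int32 `json:"replicas,omitempty"`
}

// Multi-option markers attach to the type they precede.
// +kubebuilder:object:root=true
// +kubebuilder:printcolumn:JSONPath=".spec.schedule",name=Schedule,type=string
type CronJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec CronJobSpec `json:"spec,omitempty"`
}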

CRD Generation

These markers describe how to construct a custom resource definition from a series of Go types and packages. Generation of the actual validation schema is described by the validation markers.

See Generating CRDs for examples.

groupName (string)
  specifies the API group name for this package.

kubebuilder:printcolumn
  adds a column to "kubectl get" output for this CRD.
  • JSONPath (string): specifies the jsonpath expression used to extract the value of the column.
  • description (string): specifies the help/description for this column.
  • format (string): specifies the format of the column. It may be any OpenAPI data format corresponding to the type, listed at https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md#data-types.
  • name (string): specifies the name of the column.
  • priority (int): indicates how important it is that this column be displayed. Lower priority (higher numbered) columns will be hidden if the terminal width is too small.
  • type (string): indicates the type of the column. It may be any OpenAPI data type listed at https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md#data-types.

kubebuilder:resource
  configures naming and scope for a CRD.
  • categories (string): specifies which group aliases this resource is part of. Group aliases are used to work with groups of resources at once. The most common one is "all", which covers about a third of the base resources in Kubernetes and is generally used for "user-facing" resources.
  • path (string): specifies the plural "resource" for this CRD. It generally corresponds to a plural, lower-cased version of the Kind. See https://book.kubebuilder.io/cronjob-tutorial/gvks.html.
  • scope (string): overrides the scope of the CRD (cluster vs namespaced). Scope defaults to "namespaced". Cluster-scoped ("cluster") resources don't exist in namespaces.
  • shortName (string): specifies aliases for this CRD. Short names are often used when people have to work with your resource over and over again. For instance, "rs" for "replicaset" or "crd" for customresourcedefinition.
  • singular (string): overrides the singular form of your resource. The singular form is otherwise defaulted off the plural (path).

kubebuilder:skip
  don't consider this package as an API version.

kubebuilder:skipversion
  removes the particular version of the CRD from the CRD's spec. This is useful if you need to skip generating and listing version entries for "internal" resource versions, which typically exist if using the Kubernetes upstream conversion-gen tool.

kubebuilder:storageversion
  marks this version as the "storage version" for the CRD for conversion. When conversion is enabled for a CRD (i.e. it's not a trivial-versions/single-version CRD), one version is set as the "storage version" to be stored in etcd. Attempting to store any other version will result in conversion to the storage version via a conversion webhook.

kubebuilder:subresource:scale
  enables the "/scale" subresource on a CRD.
  • selectorpath (string): specifies the jsonpath to the pod label selector field for the scale's status. The selector field must be the string form (serialized form) of a selector. Setting a pod label selector is necessary for your type to work with the HorizontalPodAutoscaler.
  • specpath (string): specifies the jsonpath to the replicas field for the scale's spec.
  • statuspath (string): specifies the jsonpath to the replicas field for the scale's status.

kubebuilder:subresource:status
  enables the "/status" subresource on a CRD.

versionName (string)
  overrides the API group version for this package (defaults to the package name).
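Putting several of these together, here is a hedged sketch of CRD-generation markers on an API type; the CronJob type and the "cj" short name are assumptions for illustration.

package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type CronJobSpec struct {
	Schedule string `json:"schedule,omitempty"`
}

type CronJobStatus struct {
	Active string `json:"active,omitempty"`
}

// Generate the CRD as a namespaced resource named "cronjobs" (short name "cj",
// part of the "all" category), with a status subresource, this version as the
// storage version, and an extra column in "kubectl get" output.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:storageversion
// +kubebuilder:resource:path=cronjobs,shortName=cj,scope=Namespaced,categories=all
// +kubebuilder:printcolumn:JSONPath=".status.active",name=Active,type=string
type CronJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CronJobSpec   `json:"spec,omitempty"`
	Status CronJobStatus `json:"status,omitempty"`
}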

CRD Validation

These markers modify how the CRD validation schema is produced for the types and fields they modify. Each corresponds roughly to an OpenAPI/JSON schema option.

See Generating CRDs for examples.

kubebuilder:default (any)
  sets the default value for this field. A default value will be accepted as any value valid for the field. Formatting for common types includes: boolean: true, string: Cluster, numerical: 1.24, array: {1,2}, object: {policy: "delete"}. Defaults should be defined in pruned form, and only best-effort validation will be performed. Full validation of a default requires submission of the containing CRD to an apiserver.

kubebuilder:validation:EmbeddedResource
  marks a field as an embedded resource with apiVersion, kind and metadata fields. An embedded resource is a value that has apiVersion, kind and metadata fields. They are validated implicitly according to the semantics of the currently running apiserver. It is not necessary to add any additional schema for these fields, yet it is possible. This can be combined with PreserveUnknownFields.

kubebuilder:validation:Enum (any)
  specifies that this (scalar) field is restricted to the *exact* values specified here.

kubebuilder:validation:ExclusiveMaximum (bool)
  indicates that the maximum is "up to" but not including that value.

kubebuilder:validation:ExclusiveMinimum (bool)
  indicates that the minimum is "up to" but not including that value.

kubebuilder:validation:Format (string)
  specifies additional "complex" formatting for this field. For example, a date-time field would be marked as "type: string" and "format: date-time".

kubebuilder:validation:MaxItems (int)
  specifies the maximum length for this list.

kubebuilder:validation:MaxLength (int)
  specifies the maximum length for this string.

kubebuilder:validation:Maximum (int)
  specifies the maximum numeric value that this field can have.

kubebuilder:validation:MinItems (int)
  specifies the minimum length for this list.

kubebuilder:validation:MinLength (int)
  specifies the minimum length for this string.

kubebuilder:validation:Minimum (int)
  specifies the minimum numeric value that this field can have.

kubebuilder:validation:MultipleOf (int)
  specifies that this field must have a numeric value that's a multiple of this one.

kubebuilder:validation:Optional
  on a field: specifies that this field is optional, if fields are required by default.
  on a package: specifies that all fields in this package are optional by default.

kubebuilder:validation:Pattern (string)
  specifies that this string must match the given regular expression.

kubebuilder:validation:Required
  on a field: specifies that this field is required, if fields are optional by default.
  on a package: specifies that all fields in this package are required by default.

kubebuilder:validation:Type (string)
  overrides the type for this field (which defaults to the equivalent of the Go type). This generally must be paired with custom serialization. For example, the metav1.Time field would be marked as "type: string" and "format: date-time".

kubebuilder:validation:UniqueItems (bool)
  specifies that all items in this list must be unique.

kubebuilder:validation:XEmbeddedResource
  marks a field as an embedded resource with apiVersion, kind and metadata fields. An embedded resource is a value that has apiVersion, kind and metadata fields. They are validated implicitly according to the semantics of the currently running apiserver. It is not necessary to add any additional schema for these fields, yet it is possible. This can be combined with PreserveUnknownFields.

nullable
  marks this field as allowing the "null" value. This is often not necessary, but may be helpful with custom serialization.

optional
  specifies that this field is optional, if fields are required by default.
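As a hedged illustration of how several of these validation markers combine on spec fields (the Bookshelf type and its field names are made up):

package v1

// BookshelfSpec is an illustrative type; the markers are the point here.
type BookshelfSpec struct {
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=100
	// +kubebuilder:validation:ExclusiveMaximum=false
	Capacity int32 `json:"capacity"`

	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=63
	// +kubebuilder:validation:Pattern="^[a-z0-9-]+$"
	Shelf string `json:"shelf"`

	// +kubebuilder:validation:Enum=Fiction;Nonfiction;Reference
	// +kubebuilder:default=Fiction
	Category string `json:"category,omitempty"`

	// +kubebuilder:validation:MinItems=1
	// +kubebuilder:validation:MaxItems=10
	Tags []string `json:"tags,omitempty"`
}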

CRD Processing

These markers help control how the Kubernetes API server processes API requests involving your custom resources.

See Generating CRDs for examples.

kubebuilder:pruning:PreserveUnknownFields
  stops the apiserver from pruning fields which are not specified. By default the apiserver drops unknown fields from the request payload during the decoding step. This marker stops the API server from doing so. It affects fields recursively, but switches back to normal pruning behaviour if nested properties or additionalProperties are specified in the schema. This can either be true or undefined. False is forbidden.

kubebuilder:validation:XPreserveUnknownFields
  stops the apiserver from pruning fields which are not specified. By default the apiserver drops unknown fields from the request payload during the decoding step. This marker stops the API server from doing so. It affects fields recursively, but switches back to normal pruning behaviour if nested properties or additionalProperties are specified in the schema. This can either be true or undefined. False is forbidden.

Webhook

These markers describe how webhook configuration is generated. Use these to keep the description of your webhooks close to the code that implements them.

kubebuilder:webhook
  specifies how a webhook should be served. It specifies only the details that are intrinsic to the application serving it (e.g. the resources it can handle, or the path it serves on).
  • failurePolicy (string): specifies what should happen if the API server cannot reach the webhook. It may be either "ignore" (to skip the webhook and continue on) or "fail" (to reject the object in question).
  • groups (string): specifies the API groups that this webhook receives requests for.
  • mutating (bool): marks this as a mutating webhook (it's validating only if false). Mutating webhooks are allowed to change the object in their response, and are called before all validating webhooks. Mutating webhooks may choose to reject an object, similarly to a validating webhook.
  • name (string): indicates the name of this webhook configuration.
  • path (string): specifies the path that the API server should connect to this webhook on.
  • resources (string): specifies the API resources that this webhook receives requests for.
  • verbs (string): specifies the Kubernetes API verbs that this webhook receives requests for. Only modification-like verbs may be specified. May be "create", "update", "delete", "connect", or "*" (for all).
  • versions (string): specifies the API versions that this webhook receives requests for.
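For instance, a validating-webhook marker typically sits next to the code that sets the webhook up. In this hedged sketch the path, group, resource, and webhook name are placeholders that must match what your handler actually serves, and the CronJob type is assumed to exist elsewhere in the package.

package v1

import ctrl "sigs.k8s.io/controller-runtime"

// +kubebuilder:webhook:path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=vcronjob.kb.io

// SetupWebhookWithManager registers the (assumed) CronJob type's webhook with
// the manager; controller-gen only reads the marker above.
func (r *CronJob) SetupWebhookWithManager(mgr ctrl.Manager) error {
	return ctrl.NewWebhookManagedBy(mgr).
		For(r).
		Complete()
}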

Object/DeepCopy

These markers control when DeepCopy and runtime.Object implementation methods are generated.

k8s:deepcopy-gen (raw)
  on a package: enables or disables object interface & deepcopy implementation generation for this package.
  on a type: overrides enabling or disabling deepcopy generation for this type.

k8s:deepcopy-gen:interfaces (string)
  enables object interface implementation generation for this type.

kubebuilder:object:generate (bool)
  on a package: enables or disables object interface & deepcopy implementation generation for this package.
  on a type: overrides enabling or disabling deepcopy generation for this type.

kubebuilder:object:root (bool)
  enables object interface implementation generation for this type.
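A small, hedged sketch of how these markers are typically combined (the Bookshelf name is made up): the package-level marker turns deepcopy generation on for the whole package, and the type-level marker additionally asks for a runtime.Object implementation.

// Package-level marker: generate DeepCopy methods for types in this package.
// +kubebuilder:object:generate=true
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Type-level marker: also generate the runtime.Object (DeepCopyObject)
// implementation so Bookshelf can be used as a top-level API object.
// +kubebuilder:object:root=true
type Bookshelf struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
}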

RBAC

These markers cause an RBAC ClusterRole to be generated. This allows you to describe the permissions that your controller requires alongside the code that makes use of those permissions.

kubebuilder:rbac
  specifies an RBAC rule to allow access to some resources or non-resource URLs.
  • groups (string): specifies the API groups that this rule encompasses.
  • namespace (string): specifies the scope of the Rule. If not set, the Rule belongs to the generated ClusterRole. If set, the Rule belongs to a Role, whose namespace is specified by this field.
  • resources (string): specifies the API resources that this rule encompasses.
  • urls (string): specifies the non-resource URLs that this rule encompasses.
  • verbs (string): specifies the (lowercase) Kubernetes API verbs that this rule encompasses.
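These markers usually sit directly above the Reconcile method that needs the permissions. A hedged sketch, assuming the tutorial-style controller package; the group, resources, and reconciler name are illustrative placeholders:

// Grant the controller access to its own CRD, the CRD's status subresource,
// and the Pods it watches; adjust groups/resources to your own API.
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs/status,verbs=get;update;patch
// +kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch
func (r *CronJobReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	// reconcile logic elided
	return ctrl.Result{}, nil
}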

controller-gen CLI

KubeBuilder makes use of a tool called controller-gen for generating utility code and Kubernetes YAML. This code and config generation is controlled by the presence of special “marker comments” in Go code.

controller-gen is built out of different “generators” (which specify what to generate) and “output rules” (which specify how and where to write the results).

Both are configured through command line options specified in marker format.

For instance,

controller-gen paths=./... crd:trivialVersions=true rbac:roleName=controller-perms output:crd:artifacts:config=config/crd/bases

generates CRDs and RBAC, and specifically stores the generated CRD YAML in config/crd/bases. For the RBAC, it uses the default output rules (config/rbac). It considers every package in the current directory tree (as per the normal rules of the go ... wildcard).

Generators

Each different generator is configured through a CLI option. Multiple generators may be used in a single invocation of controller-gen.

webhook
  generates (partial) {Mutating,Validating}WebhookConfiguration objects.

schemapatch
  patches existing CRDs with new schemata. For legacy (v1beta1) single-version CRDs, it will simply replace the global schema. For legacy (v1beta1) multi-version CRDs, and any v1 CRDs, it will replace schemata of existing versions and clear the schema from any versions not specified in the Go code. It will not add new versions, or remove old ones. For legacy multi-version CRDs with identical schemata, it will take care of lifting the per-version schema up to the global schema. It will generate output for each "CRD Version" (API version of the CRD type itself, e.g. apiextensions/v1beta1 and apiextensions/v1) available.
  • manifests (string): contains the CustomResourceDefinition YAML files.
  • maxDescLen (int): specifies the maximum description length for fields in the CRD's OpenAPI schema. 0 indicates drop the description for all fields completely. n indicates limit the description to at most n characters and truncate the description to the closest sentence boundary if it exceeds n characters.

rbac
  generates ClusterRole objects.
  • roleName (string): sets the name of the generated ClusterRole.

object
  generates code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
  • headerFile (string): specifies the header text (e.g. license) to prepend to generated files.
  • year (string): specifies the year to substitute for " YEAR" in the header file.

crd
  generates CustomResourceDefinition objects.
  • crdVersions (string): specifies the target API versions of the CRD type itself to generate. Defaults to v1beta1. The first version listed will be assumed to be the "default" version and will not get a version suffix in the output filename. You'll need to use "v1" to get support for features like defaulting, along with an API server that supports it (Kubernetes 1.16+).
  • maxDescLen (int): specifies the maximum description length for fields in the CRD's OpenAPI schema. 0 indicates drop the description for all fields completely. n indicates limit the description to at most n characters and truncate the description to the closest sentence boundary if it exceeds n characters.
  • preserveUnknownFields (bool)
  • trivialVersions (bool): indicates that we should produce a single-version CRD. Single "trivial-version" CRDs are compatible with older (pre 1.13) Kubernetes API servers. The storage version's schema will be used as the CRD's schema. Only works with the v1beta1 CRD version.

Output Rules

Output rules configure how a given generator outputs its results. There is always one global “fallback” output rule (specified as output:<rule>), plus per-generator overrides (specified as output:<generator>:<rule>).

For brevity, the per-generator output rules (output:<generator>:<rule>) are omitted below. They are equivalent to the global fallback options listed here.

output:artifacts
  outputs artifacts to different locations, depending on whether they're package-associated or not. Non-package-associated artifacts are output to the Config directory, while package-associated ones are output to their package's source files' directory, unless an alternate path is specified in Code.
  • code (string): overrides the directory in which to write new code (defaults to where the existing code lives).
  • config (string): points to the directory to which to write configuration.

output:dir (string)
  outputs each artifact to the given directory, regardless of whether it's package-associated or not.

output:none
  skips outputting anything.

output:stdout
  outputs everything to standard-out, with no separation. Generally useful for single-artifact outputs.

Other Options

paths (string)
  represents paths and go-style path patterns to use as package roots.

Artifacts

Kubebuilder publishes test binaries and container images in addition to the main binary releases.

Test Binaries

You can find all of the test binaries at https://go.kubebuilder.io/test-tools. You can find individual test binaries at https://go.kubebuilder.io/test-tools/${version}/${os}/${arch}.

Container Images

You can find all container images for your os at https://go.kubebuilder.io/images/${os} or at gcr.io/kubebuilder/thirdparty-${os}. You can find individual container images at https://go.kubebuilder.io/images/${os}/${version} or at gcr.io/kubebuilder/thirdparty-${os}:${version}.

Writing controller tests

Testing Kubernetes controllers is a big subject, and the boilerplate testing files generated for you by kubebuilder are fairly minimal.

Writing and Running Integration Tests documents steps to consider when writing integration tests for your controllers, and available options for configuring your test control plane using envtest.

Until more documentation has been written, your best bet to get started is to look at some existing examples, such as:

  • Azure Databricks Operator: see their fully fleshed-out suite_test.go as well as any *_test.go file in that directory like this one.

The basic approach is that, in your generated suite_test.go file, you will create a local Kubernetes API server, instantiate and run your controllers, and then write additional *_test.go files to test it using Ginkgo.
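For example, a hedged sketch of what one such test might look like, assuming the k8sClient variable created in the generated suite_test.go and the CronJob API from the tutorial:

package controllers

import (
	"context"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"

	batchv1 "tutorial.kubebuilder.io/project/api/v1"
)

var _ = Describe("CronJob controller", func() {
	It("reports NotFound for a CronJob that does not exist", func() {
		ctx := context.Background()
		key := types.NamespacedName{Name: "does-not-exist", Namespace: "default"}

		// k8sClient talks to the envtest API server started in suite_test.go.
		err := k8sClient.Get(ctx, key, &batchv1.CronJob{})
		Expect(apierrors.IsNotFound(err)).To(BeTrue())
	})
})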

Using envtest in integration tests

controller-runtime offers envtest (godoc), a package that helps write integration tests for your controllers by setting up and starting an instance of etcd and the Kubernetes API server, without kubelet, controller-manager or other components.

Using envtest in integration tests follows the general flow of:

import "sigs.k8s.io/controller-runtime/pkg/envtest"

// specify testEnv configuration
testEnv = &envtest.Environment{
	CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
}

// start testEnv
cfg, err = testEnv.Start()

// write test logic

// stop testEnv
err = testEnv.Stop()

kubebuilder does the boilerplate setup and teardown of testEnv for you, in the ginkgo test suite that it generates under the /controllers directory.

Logs from the test runs are prefixed with test-env.

Configuring your test control plane

You can use environment variables and/or flags to specify the api-server and etcd setup within your integration tests.

Environment Variables

  • USE_EXISTING_CLUSTER (boolean): Instead of setting up a local control plane, point to the control plane of an existing cluster.
  • KUBEBUILDER_ASSETS (path to directory): Point integration tests to a directory containing all binaries (api-server, etcd and kubectl).
  • TEST_ASSET_KUBE_APISERVER, TEST_ASSET_ETCD, TEST_ASSET_KUBECTL (paths to, respectively, api-server, etcd and kubectl binaries): Similar to KUBEBUILDER_ASSETS, but more granular. Point integration tests to use binaries other than the default ones. These environment variables can also be used to ensure specific tests run with expected versions of these binaries.
  • KUBEBUILDER_CONTROLPLANE_START_TIMEOUT and KUBEBUILDER_CONTROLPLANE_STOP_TIMEOUT (durations in a format supported by time.ParseDuration): Specify timeouts different from the default for the test control plane to (respectively) start and stop; any test run that exceeds them will fail.
  • KUBEBUILDER_ATTACH_CONTROL_PLANE_OUTPUT (boolean): Set to true to attach the control plane's stdout and stderr to os.Stdout and os.Stderr. This can be useful when debugging test failures, as output will include output from the control plane.

Flags

Here’s an example of modifying the flags with which to start the API server in your integration tests, compared to the default values in envtest.DefaultKubeAPIServerFlags:

customApiServerFlags := []string{
	"--secure-port=6884",
	"--admission-control=MutatingAdmissionWebhook",
}

apiServerFlags := append([]string(nil), envtest.DefaultKubeAPIServerFlags...)
apiServerFlags = append(apiServerFlags, customApiServerFlags...)

testEnv = &envtest.Environment{
	CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
	KubeAPIServerFlags: apiServerFlags,
}

Testing considerations

Unless you’re using an existing cluster, keep in mind that no built-in controllers are running in the test context. In some ways, the test control plane will behave differently from “real” clusters, and that might have an impact on how you write tests. One common example is garbage collection; because there are no controllers monitoring built-in resources, objects do not get deleted, even if an OwnerReference is set up.

To test that the deletion lifecycle works, test the ownership instead of asserting on existence. For example:

expectedOwnerReference := v1.OwnerReference{
	Kind:       "MyCoolCustomResource",
	APIVersion: "my.api.example.com/v1beta1",
	UID:        "d9607e19-f88f-11e6-a518-42010a800195",
	Name:       "userSpecifiedResourceName",
}
Expect(deployment.ObjectMeta.OwnerReferences).To(ContainElement(expectedOwnerReference))

Metrics

By default, controller-runtime builds a global prometheus registry and publishes a collection of performance metrics for each controller.

Protecting the Metrics

These metrics are protected by kube-rbac-proxy by default if using kubebuilder. Kubebuilder v2.2.0+ scaffolds a ClusterRole which can be found at config/rbac/auth_proxy_client_clusterrole.yaml.

You will need to grant permissions to your Prometheus server so that it can scrape the protected metrics. To achieve that, you can create a clusterRoleBinding to bind the clusterRole to the service account that your Prometheus server uses.

You can run the following kubectl command to create it. If using kubebuilder, <project-prefix> is the namePrefix field in config/default/kustomization.yaml.

kubectl create clusterrolebinding metrics --clusterrole=<project-prefix>-metrics-reader --serviceaccount=<namespace>:<service-account-name>

Exporting Metrics for Prometheus

Follow the steps below to export the metrics using the Prometheus Operator:

  1. Install Prometheus and the Prometheus Operator. We recommend using kube-prometheus in production if you don’t have your own monitoring system. If you are just experimenting, you can install only Prometheus and the Prometheus Operator.
  2. Uncomment the line - ../prometheus in the config/default/kustomization.yaml. It creates the ServiceMonitor resource which enables exporting the metrics.
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
- ../prometheus

Note that, when you install your project in the cluster, it will create the ServiceMonitor to export the metrics. To check the ServiceMonitor, run kubectl get ServiceMonitor -n <project>-system. See an example:

$ kubectl get ServiceMonitor -n monitor-system 
NAME                                         AGE
monitor-controller-manager-metrics-monitor   2m8s

Also, notice that the metrics are exported by default through port 8443, so you can check them in the Prometheus dashboard. To verify this, search for the metrics exported from the namespace where the project is running, e.g. {namespace="<project>-system"}. See an example:

(Screenshot: Prometheus dashboard showing the metrics filtered by the project's namespace.)

Publishing Additional Metrics

If you wish to publish additional metrics from your controllers, this can be easily achieved by using the global registry from controller-runtime/pkg/metrics.

One way to achieve this is to declare your collectors as global variables and then register them using init().

For example:

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    goobers = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "goobers_total",
            Help: "Number of goobers proccessed",
        },
    )
    gooberFailures = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "goober_failures_total",
            Help: "Number of failed goobers",
        },
    )
)

func init() {
    // Register custom metrics with the global prometheus registry
    metrics.Registry.MustRegister(goobers, gooberFailures)
}

You may then record metrics to those collectors from any part of your reconcile loop, and those metrics will be available for Prometheus or other OpenMetrics-compatible systems to scrape.
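For instance, incrementing those counters from a reconcile loop might look like the following sketch; the CronJobReconciler and the processGoober helper are hypothetical and stand in for your own controller logic.

func (r *CronJobReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	// processGoober is a hypothetical helper that does the actual work.
	if err := r.processGoober(req); err != nil {
		// count the failure, then return the error so the request is retried
		gooberFailures.Inc()
		return ctrl.Result{}, err
	}
	// count the successfully processed goober
	goobers.Inc()
	return ctrl.Result{}, nil
}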

TODO

If you’re seeing this page, it’s probably because something’s not done in the book yet. Go see if anyone else has found this or bug the maintainers.