Scaling Kyverno
Kyverno supports scaling in multiple dimensions, both vertical and horizontal. It is important to understand when to scale, how to scale, and what effects that scaling will have on its operation. See the sections below to understand these topics better.
Because Kyverno is an admission controller with many capabilities and due to the variability with respect to environment type, size, and composition of Kubernetes clusters, the amount of processing performed by Kyverno can vary greatly. Sizing a Kyverno installation based solely upon Node or Pod count is often not appropriate to accurately predict the amount of resources it will require.
For example, a large production cluster hosting 60,000 Pods yet with no Kyverno policies installed which match on Pod has no bearing on the resources required by Kyverno. Because webhooks are dynamically managed by Kyverno according to the policies installed in the cluster, having no policies which match on Pod results in no information about Pods being sent by the API server to Kyverno and, therefore, a reduced processing load.

However, any policy which matches on a wildcard ("*") will force Kyverno to process every operation (CREATE, UPDATE, DELETE, and CONNECT) on every resource in the cluster. Even if the policy logic itself is simple, a single policy written in such a manner and installed in a large cluster can and will have a significant impact on the resources required by Kyverno.
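To make this concrete, the sketch below shows the kind of overly broad match that triggers this behavior. The policy name and validation logic are hypothetical and exist only to illustrate the wildcard ("*") entry in the match block; they are not taken from the test policies used in this document.

```yaml
# Hypothetical example of a policy whose wildcard match causes the API server
# to send every operation on every resource to Kyverno. Avoid this pattern in
# large clusters unless it is truly required.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label   # hypothetical policy name
spec:
  validationFailureAction: Audit
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - "*"        # wildcard match: every kind, every operation
      validate:
        message: "The label `team` is required."
        pattern:
          metadata:
            labels:
              team: "?*"
```

Scoping the match to the specific kinds the rule actually needs (for example, Pod) keeps the webhook configuration narrow and limits the load sent to Kyverno.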
Vertical Scale
Vertical scaling refers to increasing the resources allocated to existing Pods, which amounts to setting resource requests and limits.
We recommend conducting tests in your own environment to determine real-world utilization in order to best set resource requests and limits, but as a best practice we also recommend not setting CPU limits.
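As a starting point, the following sketch shows one way this might be expressed as Helm values, assuming the layout of the official Kyverno Helm chart. The figures mirror the requests and limits used in the admission controller tests below and are illustrative rather than prescriptive.

```yaml
# Illustrative Helm values (assumes the official Kyverno Helm chart layout).
# Replace the figures with values measured in your own environment.
admissionController:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      # No CPU limit is set, per the best practice above.
      memory: 384Mi
```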
Horizontal Scale
Horizontal scaling refers to increasing the number of replicas of a given controller. Kyverno supports multiple replicas for each of its controllers, but the effect of multiple replicas is handled differently according to the controller. See the high availability section for more details.
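As an example, a minimal sketch of increasing replica counts through Helm values might look like the following, again assuming the layout of the official Kyverno Helm chart. Three admission controller replicas matches the configuration used in the tests below; the other replica counts are examples only.

```yaml
# Illustrative Helm values for horizontal scaling (assumes the official
# Kyverno Helm chart layout). Admission controller replicas all serve webhook
# traffic; the other controllers use leader election, so only one replica is
# active at a time and the extras provide failover.
admissionController:
  replicas: 3
backgroundController:
  replicas: 2
reportsController:
  replicas: 2
```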
Scale Testing
The following tables show Kyverno performance test results for the admission and reports controllers. The admission controller table shows the resource consumption (memory and CPU) and latency as a result of increased AdmissionReviews per Second (ARPS) and how this is influenced by the queries per second (QPS) and burst settings.
The reports controller table shows the impact on policy report count and size, including the various intermediary resources. Also shown are the resource consumption figures at a scale of up to 100,000 Pods.
In both tables, the testing was performed using K3d on an Ubuntu 20.04 system with an AMD EPYC 7502P 32-core processor @ 2.5GHz (max 3.35GHz) and 256GB of RAM.
For additional specifics on these tests along with a set of instructions which can be used to reproduce the environment, see the developer documentation here.
Admission Controller
replicas | # policies | Rule Type | Mode | Subject | memory request / limit | cpu request | ARPS | Latency (avg, ms) | Memory (max) | CPU (max) | admission reports | bgscan reports | policy reports | reports controller memory (max) | reports controller CPU (max) | # nodes | # pods | QPS/Burst |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 17 | Validate | Enforce | Pods | 128 Mi / 384Mi | 100m | 14.92 | 44 | 150.60Mi | 2.16 | 1000 | 1368 | 88 | 604.49Mi | 8.51 | 300 | 1k | 15/15 |
3 | 17 | Validate | Enforce | Pods | 128 Mi / 384Mi | 100m | 43.47 | 32 | 169Mi | 5.55 | 5000 | 5369 | 164 | 781.25Mi | 8.22 | 300 | 5k | 50/50 |
3 | 17 | Validate | Enforce | Pods | 128 Mi / 384Mi | 100m | 81.97 | 78 | 215.64Mi | 10.37 | 5000 | 5369 | 164 | 702.15Mi | 4 | 300 | 5k | 100/100 |
3 | 17 | Validate | Enforce | Pods | 128 Mi / 512Mi | 100m | 83.88 | 129 | 267.29Mi | 8.75 | 4552 | 4907 | 146 | 598.70Mi | 7.88 | 300 | 4552/5000 | 150/150 |
3 | 17 | Validate | Enforce | Pods | 128 Mi / 512Mi | 100m | 108.7 | 151 | 243.10Mi | 15.34 | 2139 | 2630 | 124 | 375.98Mi | 7.51 | 300 | 2262/5000 | 200/200 |
Reports Controller
# validate policies | # pods | memory request / limit | memory (max) | cpu request | CPU (max) | periodic scan interval / workers | total etcd size | policyreports count | admission reports count | background reports count | QPS/burst | # nodes | admission controller (memory request/limit) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
17 PSS policies | 1000 | 64Mi / 4Gi | 240504832=229.36Mi | 100m | 6.28 | 30 mins / 2 | 43.54Mi | 88 | 1000 | 1369 | 5/10 | 300 | 128Mi/384Mi |
17 PSS policies | 5000 | 64Mi / 4Gi | 823582720=785.43Mi | 100m | 8 | 30 mins / 2 | 145.33Mi | 164 | 5000 | 5369 | 50/50 | 300 | 128Mi/384Mi |
17 PSS policies | 10000 | 64Mi / 4Gi | 1381728256=1.32Gi | 100m | 8.51 | 30 mins / 2 | 251.48Mi | 258 | 10000 | 10369 | 50/50 | 300 | 128Mi/384Mi |
17 PSS policies | 10000 | 64Mi / 4Gi | 1700921344=1.62Gi | 100m | 8.44 | 1h / 2 | 251.48Mi | 258 | 10000 | 10369 | 50/50 | 300 | 128Mi/384Mi |
17 PSS policies | 19924 / 20000 | 64Mi / 4Gi | 2693844992=2.51Gi | 100m | 9.62 | 1h / 2 | 470.42Mi | 448 | 19885 | 20289 | 50/50 | 300 | 128Mi/384Mi |
17 PSS policies | 100940 | 64Mi / 20Gi | 6866862080=6.40Gi | 100m | 5.55 | 1h / 2 | | 1356 | 100587 | 11441 | 50/50 | 1000 | 128Mi/384Mi (OOM) |
17 PSS policies | 53456 | 64Mi / 10Gi | 1.89Gi | 100m | 8.12 | 1h / 2 | | 1077 | 52893 | 22742 | 50/50 | 500 | 128Mi/1Gi |
17 PSS policies | 53457 | 64Mi / 10Gi | 2.84Gi | 100m | 7.39 | 2h / 2 | | 1077 | 52893 | 33303 | 50/50 | 500 | 128Mi/1Gi |
17 PSS policies | 53457 | 64Mi / 10Gi | 2.55Gi | 100m | 7.66 | 3h / 2 | 1.10Gi | 1077 | 52893 | 35520 | 50/50 | 500 | 128Mi/1Gi |
17 PSS policies | 83716 | 64Mi / 10Gi | | 100m | | 3h / 2 | | 1510/1305 | 82868 | 33768 | 50/50 | 800 | 128Mi/1Gi |
17 PSS policies | 80856 | 64Mi / 10Gi | 2.20Gi | 100m | 19.13 | 2h / 10 | 2.24Gi | 1573 | n/a | 80891 | 50/50 | 818 | 128Mi/384Mi |
17 PSS policies | 100392 | 64Mi / 10Gi | 4.83Gi | 100m | 23.14 | 2h / 10 | 2.38Gi | 1873 | 100033 | 73728 | 50/50 | 960 | 128Mi/512Mi |
AdmissionReview Reference
API requests, operations, and activities which match corresponding Kyverno rules result in an AdmissionReview request being sent to admission controllers like Kyverno. The number and frequency of these requests may vary greatly depending on the amount and type of activity in the cluster. The following table gives a sense of the minimum number of AdmissionReview requests that may result from common operations. These figures are minimums only; in practice the final count will almost certainly be greater, varying with factors such as finalizers and other controllers in the cluster.
Operation | Resource | Config | ARs |
---|---|---|---|
CREATE | Pod | | 1 |
DELETE | Pod | | 3 |
CREATE | Deployment | replicas=1 | 3 |
UPDATE | Deployment | Change image | 8 |
DELETE | Deployment | replicas=1 | 7 |
CREATE | Deployment | replicas=2 | 4 |
UPDATE | Deployment | Change image | 13 |
DELETE | Deployment | replicas=2 | 10 |
CREATE | Job | restartPolicy=Never, backoffLimit=4 | 3 |
DELETE | Job | | 4 |
CREATE | CronJob | schedule="*/1 * * * *" | 4 (3 per invocation) |
DELETE | CronJob | schedule="*/1 * * * *", 2 completed | 9 |
CREATE | ConfigMap | | 1 |
UPDATE | ConfigMap | | 1 |
DELETE | ConfigMap | | 1 |
These figures were captured using K3d v5.4.9 on Kubernetes v1.26.2 and Kyverno 1.10.0-alpha.2 with a 3-replica admission controller. When testing against KinD, there may be one fewer DELETE AdmissionReview for Pod-related operations.