Infrastructure for KGR runners
Infrastructure-runner overview
Runner name/tag prefix | Platform |
---|---|
kgr1 | Kubernetes (Tanzu Kubernetes Grid) |
kgr2 | Docker in VM |
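Jobs select a runner through its tags. Below is a minimal sketch of a `.gitlab-ci.yml` job targeting a KGR1 runner; the concrete tag name `kgr1-standard` is an assumption derived from the tag prefix above and may differ from the tags actually configured.

```yaml
# Minimal sketch: select a KGR1 (Kubernetes) runner via its tag.
# "kgr1-standard" is an assumed example; only the "kgr1" prefix is
# given in the table above, so check the actually available tags.
build-on-kgr1:
  stage: build
  tags:
    - kgr1-standard
  script:
    - echo "Running on a KGR1 Kubernetes runner"
```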
KGR1
KGR1 runs on Tanzu Kubernetes Grid ("Kubernetes on VMs").
Our Kubernetes cluster is set up as follows:
Nodes | Flavor | # | Cores/Memory | / | $BUILD_DIR |
---|---|---|---|---|---|
large | best-effort-2xlarge | 1 | 8/64 | 160 | 80 |
medium | best-effort-large | 8 | 4/16 | 80 | 40 |
control plane | | 3 | | | |
Explanation:
- Nodes: group of nodes
- Flavor: VM class (determines the size of the node)
- #: how many nodes of this type are available
- /: mounted disk size on root for jobs. The storage is shared across jobs and processes on the whole node, so the value is purely informational.
- $BUILD_DIR: mounted disk size for repository directories (build directory). The storage is shared across jobs on the whole node, so the value is purely informational.
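Because the disk values are shared and purely informational, a job can inspect the space actually available at runtime. A minimal sketch, assuming the $BUILD_DIR mount backs the standard checkout location (`$CI_PROJECT_DIR`) and using the assumed `kgr1-standard` tag:

```yaml
# Minimal sketch: show the disk space actually available to a job.
# Assumes $BUILD_DIR backs the checkout location ($CI_PROJECT_DIR);
# the tag "kgr1-standard" is an assumed example.
check-disk:
  tags:
    - kgr1-standard
  script:
    - df -h /                   # root mount, shared by the whole node
    - df -h "$CI_PROJECT_DIR"   # build directory mount
```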
NOTE 🗒️: The large node is twice the size of a medium node (only the RAM is four times as much, for technical reasons).
WARNING ⚠️: The flavours are best-effort: the underlying implementation tries its best to keep up with the specs, but the actually available resources may well be lower.
WARNING ⚠️: No workload runs on the control plane nodes.
NOTE 🗒️:
- The experimental runner runs on a standard or a test cluster.
- Memory is given in GiB.
Illustration of KGR1 clusters
The diagram illustrates what the cluster setup looks like (the release cluster is bigger!).
The relation between node sizes and runner sizes
Node and runner sizes are tightly connected. Here is the connection between the medium node and the standard runner, expressed as rules for the request (guaranteed values for jobs, the lower value) and for the limit (upper bound for jobs, the upper value*); a pipeline-level sketch of the overwrite mechanism follows the list.
* Request and limit:
  * The request is set so that 3 runners (1 main and 1 helper container each) can run on a large node.
  * The limit is set so that 2 runners (1 main, 1 helper and 1 service container each) can run on a large node.
* Overwrite values (advanced users can set these in the pipeline specification):
  * The limit overwrite is set so that 2 runners (1 main and 1 helper container each) can run on a large node.
  * The request overwrite is set just below the limit overwrite.
* Service container overwrite:
  * Service and helper container overwrite values are simply set higher than the standard values.
* Storage:
  * Ephemeral storage follows these rules as well, though less strictly.
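The overwrite values mentioned above are applied through the resource overwrite variables of the GitLab Kubernetes executor in the pipeline specification. The sketch below only illustrates the mechanism: the numbers are placeholders, not the real KGR1 caps, and overwrites are only honoured up to the maximums configured on the runner.

```yaml
# Sketch of per-job resource overwrites (Kubernetes executor).
# The values are illustrative placeholders, NOT the actual KGR1 caps;
# overwrites only take effect up to the runner's configured maximums.
heavy-job:
  tags:
    - kgr1-standard                              # assumed tag
  variables:
    KUBERNETES_CPU_REQUEST: "2"
    KUBERNETES_CPU_LIMIT: "3"
    KUBERNETES_MEMORY_REQUEST: "4Gi"
    KUBERNETES_MEMORY_LIMIT: "6Gi"
    KUBERNETES_EPHEMERAL_STORAGE_REQUEST: "10Gi"
    KUBERNETES_EPHEMERAL_STORAGE_LIMIT: "20Gi"
  script:
    - ./run_heavy_build.sh                       # hypothetical build step
```

Analogous overwrite variables exist for the helper and service containers (e.g. KUBERNETES_HELPER_MEMORY_LIMIT, KUBERNETES_SERVICE_CPU_LIMIT).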
The diagram illustrates the size relations between the runner request/limit values and the node size.
NOTE 🗒️:
- There is a bit of reserve on each node even with three pods at request values.
- Other workloads, such as monitoring, will also be running, so the real deployment situation will differ.
- Because the nodes are best-effort, the actual node size might differ.
KGR2
- Runs in Docker, and Docker runs in a bwCloud VM (if bwCloud has a complete outage, this runner is down as well)
- Flavor: m2.large.hugedisk
- All VM resources are allocated to one job, because concurrency is set to one
- Some resources are allocated to Docker and the host OS of the VM, but these won't take as much (not more than)
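Since the whole VM serves one job at a time, KGR2 fits disk-heavy jobs well. A minimal sketch of targeting it; the tag name `kgr2` is an assumption based on the tag prefix above:

```yaml
# Minimal sketch: run a disk-heavy job on the KGR2 Docker-in-VM runner.
# The tag "kgr2" is an assumed example derived from the tag prefix above.
disk-heavy-job:
  tags:
    - kgr2
  script:
    - df -h .                     # the hugedisk flavor provides large local storage
    - ./build_large_artifacts.sh  # hypothetical build step
```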