Infrastructure for KGR runners
Infrastructure-runner overview
| Runner name/tag prefix | Runner executor type - Platform |
|---|---|
| kgr1 | Kubernetes - Tanzu Kubernetes Grid by Broadcom (VMware) |
| kgr2 | Docker - Docker in bwCloud VM |
| kgr3 | Kubernetes - microK8s running on bwCloud |
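Jobs are routed to one of these runner groups through GitLab CI tags. A minimal sketch follows; the tag name `kgr1-standard` is an assumption for illustration, the real tags start with one of the prefixes above:

```yaml
# .gitlab-ci.yml -- sketch of selecting a runner group by tag.
# "kgr1-standard" is an assumed example tag; use a tag actually
# advertised by the runner you want (it starts with kgr1, kgr2 or kgr3).
build-job:
  tags:
    - kgr1-standard   # route the job to a kgr1 (Tanzu Kubernetes) runner
  script:
    - echo "running on a kgr1 runner"
```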
KGR1
Our Kubernetes cluster is built with the following specification:
| Nodes | Node count | CPU/RAM | Disk - / | Disk - /builds/ |
|---|---|---|---|---|
| large+ | 1 | 8/64 | 160 | 80 |
| large | 4 | 8/32 | 160 | 80 |
| control plane | 3 | | | |
WARNING ⚠️: The nodes are provisioned as best-effort: the Tanzu implementation will try to meet these specifications, but the actual resources may be lower.
NOTE 🗒️: No CI workload runs on the control-plane nodes.
WARNING ⚠️: The experimental runner runs on a standard or a test cluster.
Parameter explanation ℹ️
- Nodes: group of nodes
- Node count: how many nodes of this type are available
- CPU/RAM: amount of resources; memory is in GiB
- Disk - /: size in GiB of the disk mounted at root for jobs. The storage is shared across jobs and processes on the whole node, so the value is purely informational
- Disk - /builds/: size in GiB of the disk mounted for repository directories (the build directory). The storage is shared across jobs on the whole node, so the value is purely informational
The relation between node sizes and runner sizes
Node sizes and runner sizes are tightly connected. The connection between a medium node and a standard runner is expressed through rules for the request (guaranteed values for a job, the lower value) and for the limit (maximum values for a job, the upper value):
- Request and limit:
  - Design node: a node with 4 CPU cores and 16 GB of memory
  - The request is chosen so that 3 runners (1 main and 1 helper container each) can run on a design node
  - The limit is chosen so that 2 runners (1 main, 1 helper and 1 service container each) can run on a design node
- Overwrite values (advanced users can set these in the pipeline specification; see the sketch below):
  - The limit overwrite is set so that 2 runners (1 main and 1 helper container each) can run on a large node
  - The request overwrite is set just under the limit overwrite
- Service container overwrite:
  - Service and helper container overwrite values are simply set higher than the standard values
- Storage:
  - Ephemeral storage follows these rules as well, although less strictly
The diagram illustrates the size relations between the runner request and limit values and the node size. It shows the design size of the node (a virtual unit used to plan the size of the runners). The standard runners are designed so that two of them fit inside a design node while using the maximum possible resources for the main and helper containers, while the request is set so that even when using a service, main and helper container, three runners fit inside one design node.
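As a minimal sketch of how an advanced user could set such overwrite values in the pipeline specification, the standard GitLab Kubernetes-executor overwrite variables can be placed in a job's `variables:` block. The numeric values and the tag name below are illustrative assumptions; the overwrites must stay within the maximums configured for these runners:

```yaml
# .gitlab-ci.yml -- sketch of per-job resource overwrites (Kubernetes executor).
# All numeric values are illustrative only; they must not exceed the overwrite
# maximums configured by the runner administrators.
heavy-job:
  tags:
    - kgr1-standard            # assumed example tag, see the overview table above
  variables:
    KUBERNETES_CPU_REQUEST: "2"
    KUBERNETES_CPU_LIMIT: "4"
    KUBERNETES_MEMORY_REQUEST: "6Gi"
    KUBERNETES_MEMORY_LIMIT: "12Gi"
    # service and helper containers have their own overwrite variables
    KUBERNETES_SERVICE_CPU_LIMIT: "1"
    KUBERNETES_SERVICE_MEMORY_LIMIT: "2Gi"
    KUBERNETES_HELPER_CPU_LIMIT: "500m"
    KUBERNETES_HELPER_MEMORY_LIMIT: "1Gi"
    # ephemeral storage follows the same pattern, less strictly
    KUBERNETES_EPHEMERAL_STORAGE_REQUEST: "10Gi"
    KUBERNETES_EPHEMERAL_STORAGE_LIMIT: "20Gi"
  script:
    - echo "job with overwritten resource requests and limits"
```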
WARNING ⚠️: Resource limitations other than the runner workload
- There is a bit of reserve on each node even with three pods at request values.
- Other workloads, such as monitoring, will be running at the same time, so the real deployment situation will differ.
- Because the nodes are best-effort, the actual node size might differ.
KGR2
NOTE 🗒️: Some resources are taken up by Docker and the VM's host OS, but these do not consume much.
| Runner and its VMs | Node count | CPU/RAM | Disk - / |
|---|---|---|---|
| kgr2-instance-hugedisk | | | |
| medium | 1 | 2/8 | 128 |
| kgr2-instance-standard | | | |
| large | 1 | 16/32 | 128 |
Parameter explanation ℹ️
- Runner and its VMs: grouping of runners and their VMs
- Node count: how many nodes of this type are available
- CPU/RAM: amount of resources; memory is in GiB
- Disk - /: size in GiB of the disk mounted at root for jobs. The storage is shared across jobs and processes on the whole node, so the value is purely informational
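A job can be pinned to one of the two kgr2 instances via its tag. A sketch, assuming the tag matches the instance name in the table above:

```yaml
# .gitlab-ci.yml -- sketch of pinning a job to a specific kgr2 Docker runner.
# The tag "kgr2-instance-hugedisk" is assumed to match the runner name above;
# check the tags actually advertised by the runner before relying on it.
docker-job:
  tags:
    - kgr2-instance-hugedisk   # Docker executor runner in a bwCloud VM
  script:
    - echo "running on the kgr2 hugedisk instance"
```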
KGR3
NOTE 🗒️: Some resources are taken up by Docker and the VM's host OS, but these do not consume much (not more than)
| Nodes | Node count | CPU/RAM | Disk - / |
|---|---|---|---|
| large | 1 | 16/32 | 128 |
Parameter explanation ℹ️
- Nodes: group of nodes
- Node count: how many nodes of this type are available
- CPU/RAM: amount of resources; memory is in GiB
- Disk - /: size in GiB of the disk mounted at root for jobs. The storage is shared across jobs and processes on the whole node, so the value is purely informational