Hello everyone, I’m roc, from the Tencent Kubernetes Engine (TKE) team. Today I will introduce a new Kubernetes feature in v1.17 that I have been involved in: topology-aware service routing.
This article is translated from my Chinese blog post, which received a great response in China and was reposted by many well-known Chinese container technology media accounts.
- Topological domain: A certain type of “place” in the cluster, such as a node, rack, zone, or region.
- endpoint: An ip:port of a Kubernetes service, usually the ip:port of a pod.
- service: A Kubernetes Service resource associated with a set of endpoints. Traffic sent to the service is forwarded to its associated endpoints.
Topology-aware service routing is also simply called Service Topology. This feature was originally proposed and designed by Jun Du (@m1093782566). Why design this feature? Imagine that the Kubernetes cluster nodes are distributed in different places and the endpoints of a service are spread across those nodes. The traditional forwarding strategy load-balances across all endpoints, usually forwarding to each of them with equal probability, so traffic to the service may be scattered across all of these places. Although the forwarding is load-balanced, if the endpoints are far away, the network path has high latency, which hurts network performance and in some cases may even incur additional traffic costs. If the service could forward traffic to nearby endpoints, would that reduce network latency and improve network performance? Yes! And this is exactly the purpose and significance of this feature.
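To make the contrast between equal-probability forwarding and nearby forwarding concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how Kubernetes or kube-proxy implements the feature. The `Endpoint` class, the function names, and the `node` and `zone` topology keys are all hypothetical, and falling back to every endpoint when nothing nearby exists is a choice made for this sketch.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical model of an endpoint and where it lives."""
    address: str                                   # "ip:port"
    topology: dict = field(default_factory=dict)   # e.g. {"node": "n1", "zone": "z1"}


def filter_by_topology(client_topology, endpoints, preferred_keys):
    """Return the endpoints that share a topology domain with the client.

    preferred_keys is ordered from most to least preferred (for example
    ["node", "zone"]). The first key that yields at least one matching
    endpoint wins. If no key matches, this sketch falls back to all
    endpoints (an assumption, not necessarily the real behaviour).
    """
    for key in preferred_keys:
        domain = client_topology.get(key)
        if domain is None:
            continue
        nearby = [ep for ep in endpoints if ep.topology.get(key) == domain]
        if nearby:
            return nearby
    return list(endpoints)


def pick_endpoint(client_topology, endpoints, preferred_keys, rng=random):
    """Pick one endpoint with equal probability among the nearby candidates."""
    candidates = filter_by_topology(client_topology, endpoints, preferred_keys)
    return rng.choice(candidates)


if __name__ == "__main__":
    endpoints = [
        Endpoint("10.0.0.1:80", {"node": "n1", "zone": "z1"}),
        Endpoint("10.0.1.1:80", {"node": "n2", "zone": "z1"}),
        Endpoint("10.0.2.1:80", {"node": "n3", "zone": "z2"}),
    ]
    client = {"node": "n9", "zone": "z1"}
    # No endpoint is on node n9, so the zone-level match (z1) is used.
    print(pick_endpoint(client, endpoints, ["node", "zone"]).address)
```

With the traditional strategy, the client in zone `z1` would reach the endpoint in `z2` about one time in three. With the filter above, it only ever reaches the two endpoints in its own zone.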
Forwarding to the nearest endpoints is actually a kind of network affinity: the service tends to forward to the endpoints closer to where the traffic comes from. Prior to this feature, there were already some other affinity designs and implementations in terms of scheduling and storage:
- Node Affinity: Allows Pods to be scheduled to Nodes that meet certain expectations, such as limiting scheduling to a certain Availability Zone, or requiring nodes to support GPUs. This is considered scheduling affinity, and the scheduling results depend on node attributes.
- Pod affinity and anti-affinity: Allows Pods to be scheduled depending on other Pods, e.g. letting a group of Pods be scheduled to nodes in the same topology domain, or dispersed to nodes in different topology domains. This can also be considered scheduling affinity, and the scheduling result depends on other Pods.
- Volume Topology-aware Scheduling: Allows Pods to be scheduled only to nodes that match the topology domain of the storage to which they are bound. This is considered affinity between scheduling and storage. The scheduling result depends on the topology domain of the storage.
- Local Persistent Volume: Lets a Pod use a local data volume, such as a high-performance SSD, which is useful in some scenarios that require high IOPS and low latency. It also guarantees that the Pod is always scheduled to the same node, so the data will not be lost. This can also be considered affinity between scheduling and storage. The scheduling result depends on the node where the storage is located.
- Topology-Aware Volume Dynamic Provisioning: The Pod is scheduled first, and then the storage is created according to the topology domain of the node that the Pod has been scheduled onto. This can be considered affinity between storage and scheduling, and the creation of storage depends on the scheduling result.
However, Kubernetes currently does not have an affinity capability on the network side. The new topology-aware service routing feature fills this gap: it enables a service to forward traffic to nearby endpoints instead of forwarding to all endpoints with equal probability.