An abstract way to expose an application running on a set of Pods as a network service.
With Kubernetes you don’t need to modify your application to use an unfamiliar service discovery mechanism. Kubernetes gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them.
Kubernetes Pods are mortal. They are born and when they die, they are not resurrected. If you use a Deployment to run your app, it can create and destroy Pods dynamically.
Each Pod gets its own IP address. However, in a Deployment, the set of Pods running at one moment in time could be different from the set of Pods running that application a moment later.
This leads to a problem: if some set of Pods (call them “backends”) provides functionality to other Pods (call them “frontends”) inside your cluster, how do the frontends find out and keep track of which IP address to connect to, so that the frontend can use the backend part of the workload?
In Kubernetes, a Service is an abstraction which defines a logical set of Pods and a policy by which to access them (sometimes this pattern is called a micro-service). The set of Pods targeted by a Service is usually determined by a selector (see below for why you might want a Service without a selector).
For example, consider a stateless image-processing backend which is running with 3 replicas. Those replicas are fungible—frontends do not care which backend they use. While the actual Pods that compose the backend set may change, the frontend clients should not need to be aware of that, nor should they need to keep track of the set of backends themselves.
The Service abstraction enables this decoupling.
If you’re able to use Kubernetes APIs for service discovery in your application, you can query the API server for Endpoints, which are updated whenever the set of Pods in a Service changes.
For non-native applications, Kubernetes offers ways to place a network port or load balancer in between your application and the backend Pods.
A Service in Kubernetes is a REST object, similar to a Pod. Like all of the
REST objects, you can POST a Service definition to the API server to create
a new instance.
For example, suppose you have a set of Pods that each listen on TCP port 9376
and carry a label app=MyApp:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376

This specification creates a new Service object named “my-service”, which
targets TCP port 9376 on any Pod with the app=MyApp label.
Kubernetes assigns this Service an IP address (sometimes called the “cluster IP”), which is used by the Service proxies (see Virtual IPs and service proxies below).
The controller for the Service selector continuously scans for Pods that match its selector, and then POSTs any updates to an Endpoint object also named “my-service”.
Note: A Service can map any incoming port to a targetPort. By default and for
convenience, the targetPort is set to the same value as the port field.
Port definitions in Pods have names, and you can reference these names in the
targetPort attribute of a Service. This works even if there is a mixture
of Pods in the Service using a single configured name, with the same network
protocol available via different port numbers.
This offers a lot of flexibility for deploying and evolving your Services.
For example, you can change the port numbers that Pods expose in the next
version of your backend software, without breaking clients.
The default protocol for Services is TCP; you can also use any other supported protocol.
As many Services need to expose more than one port, Kubernetes supports multiple
port definitions on a Service object.
Each port definition can have the same
protocol, or a different one.
Services most commonly abstract access to Kubernetes Pods, but they can also abstract other kinds of backends. For example:
In any of these scenarios you can define a Service without a Pod selector. For example:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
Because this Service has no selector, the corresponding Endpoints object is not created automatically. You can map the Service to the network address and port where it’s running by adding an Endpoints object manually:

apiVersion: v1
kind: Endpoints
metadata:
  name: my-service
subsets:
  - addresses:
      - ip: 192.0.2.42
    ports:
      - port: 9376
The endpoint IPs must not be: loopback (127.0.0.0/8 for IPv4, ::1/128 for IPv6), or link-local (169.254.0.0/16 and 224.0.0.0/24 for IPv4, fe80::/64 for IPv6).
Endpoint IP addresses cannot be the cluster IPs of other Kubernetes Services, because kube-proxy doesn’t support virtual IPs as a destination.
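The address restrictions above can be checked before you write an Endpoints object by hand. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `is_allowed_endpoint_ip` is hypothetical, and the list of ranges is an assumption based on the loopback and link-local blocks named in the text. It does not check whether an address is another Service's cluster IP.

```python
import ipaddress

# Ranges that an endpoint IP must not fall in (assumed from the text above).
FORBIDDEN_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),     # IPv4 loopback
    ipaddress.ip_network("::1/128"),         # IPv6 loopback
    ipaddress.ip_network("169.254.0.0/16"),  # IPv4 link-local
    ipaddress.ip_network("224.0.0.0/24"),    # IPv4 link-local multicast
    ipaddress.ip_network("fe80::/64"),       # IPv6 link-local
]


def is_allowed_endpoint_ip(value: str) -> bool:
    """Return True if the address is outside every forbidden range."""
    ip = ipaddress.ip_address(value)
    return not any(
        ip.version == net.version and ip in net for net in FORBIDDEN_NETWORKS
    )
```

For instance, `is_allowed_endpoint_ip("192.0.2.42")` accepts the address used in the example Endpoints object, while `is_allowed_endpoint_ip("127.0.0.1")` rejects a loopback address.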
Accessing a Service without a selector works the same as if it had a selector.
In the example above, traffic is routed to the single endpoint defined in
the Endpoints object: 192.0.2.42:9376 (TCP).
An ExternalName Service is a special case of Service that does not have selectors and uses DNS names instead. For more information, see the ExternalName section later in this document.
Every node in a Kubernetes cluster runs a kube-proxy. kube-proxy is
responsible for implementing a form of virtual IP for
Services of type other than ExternalName.
A question that pops up every now and then is why Kubernetes relies on proxying to forward inbound traffic to backends. What about other approaches? For example, would it be possible to configure DNS records that have multiple A values (or AAAA for IPv6), and rely on round-robin name resolution?
There are a few reasons for using proxying for Services:
Since Kubernetes v1.0 you have been able to use the userspace proxy mode. Kubernetes v1.1 added iptables mode proxying, and in Kubernetes v1.2 the iptables mode for kube-proxy became the default. Kubernetes v1.8 added ipvs proxy mode.
In userspace mode, kube-proxy watches the Kubernetes master for the addition and
removal of Service and Endpoint objects. For each Service it opens a
port (randomly chosen) on the local node. Any connections to this “proxy port”
are proxied to one of the Service’s backend Pods (as reported via
Endpoints). kube-proxy takes the SessionAffinity setting of the Service into
account when deciding which backend Pod to use.
Lastly, the user-space proxy installs iptables rules which capture traffic to
the Service’s clusterIP (which is virtual) and port. The rules
redirect that traffic to the proxy port, which proxies it to the backend Pod.
By default, kube-proxy in userspace mode chooses a backend via a round-robin algorithm.
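Round-robin selection can be pictured with a toy model. The sketch below is not kube-proxy's code; the class name `RoundRobinBalancer` and the backend addresses are hypothetical, and it only shows the idea of handing out backends in a fixed rotation.

```python
import itertools


class RoundRobinBalancer:
    """Toy model of round-robin backend selection (not kube-proxy's code)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        # itertools.cycle repeats the backends in order, forever.
        self._cycle = itertools.cycle(backends)

    def pick(self):
        """Return the next backend in the rotation."""
        return next(self._cycle)
```

With three backends, four successive calls to `pick()` return the first, second and third backend, and then the first again.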
In iptables mode, kube-proxy watches the Kubernetes control plane for the addition and
removal of Service and Endpoint objects. For each Service, it installs
iptables rules, which capture traffic to the Service’s clusterIP and port,
and redirect that traffic to one of the Service’s
backend sets. For each Endpoint object, it installs iptables rules which
select a backend Pod.
By default, kube-proxy in iptables mode chooses a backend at random.
Using iptables to handle traffic has a lower system overhead, because traffic is handled by Linux netfilter without the need to switch between userspace and the kernel space. This approach is also likely to be more reliable.
If kube-proxy is running in iptables mode and the first Pod that’s selected does not respond, the connection fails. This is different from userspace mode: in that scenario, kube-proxy would detect that the connection to the first Pod had failed and would automatically retry with a different backend Pod.
You can use Pod readiness probes to verify that backend Pods are working OK, so that kube-proxy in iptables mode only sees backends that test out as healthy. Doing this means you avoid having traffic sent via kube-proxy to a Pod that’s known to have failed.
In ipvs mode, kube-proxy watches Kubernetes Services and Endpoints,
calls the netlink interface to create IPVS rules accordingly, and synchronizes
IPVS rules with Kubernetes Services and Endpoints periodically.
This control loop ensures that IPVS status matches the desired state.
When accessing a Service, IPVS directs traffic to one of the backend Pods.
The IPVS proxy mode is based on a netfilter hook function that is similar to iptables mode, but it uses a hash table as the underlying data structure and works in kernel space. That means kube-proxy in IPVS mode redirects traffic with a lower latency than kube-proxy in iptables mode, with much better performance when synchronising proxy rules. Compared to the other proxy modes, IPVS mode also supports a higher throughput of network traffic.
IPVS provides more options for balancing traffic to backend Pods; these are:
rr: round-robin
lc: least connection (smallest number of open connections)
dh: destination hashing
sh: source hashing
sed: shortest expected delay
nq: never queue
To run kube-proxy in IPVS mode, you must make the IPVS Linux kernel modules available on the node before starting kube-proxy.
When kube-proxy starts in IPVS proxy mode, it verifies whether IPVS kernel modules are available. If the IPVS kernel modules are not detected, then kube-proxy falls back to running in iptables proxy mode.
In these proxy models, the traffic bound for the Service’s IP:Port is proxied to an appropriate backend without the clients knowing anything about Kubernetes or Services or Pods.
If you want to make sure that connections from a particular client
are passed to the same Pod each time, you can select session affinity based
on the client’s IP address by setting
service.spec.sessionAffinity to “ClientIP”
(the default is “None”).
You can also set the maximum session sticky time by setting
service.spec.sessionAffinityConfig.clientIP.timeoutSeconds appropriately
(the default value is 10800, which works out to be 3 hours).
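The effect of ClientIP session affinity with a sticky timeout can be sketched as a small in-memory table. This is a toy model under stated assumptions, not how kube-proxy implements it: the class name `ClientIPAffinity` is hypothetical, new clients are assigned backends in rotation, and the sticky timer is assumed to restart on every connection.

```python
import itertools


class ClientIPAffinity:
    """Toy model: keep sending a client IP to the same backend until it expires."""

    def __init__(self, backends, timeout_seconds=10800):
        self._next_backend = itertools.cycle(list(backends))
        self._timeout = timeout_seconds
        self._sticky = {}  # client IP -> (backend, time of last connection)

    def pick(self, client_ip, now):
        """Choose a backend for a connection arriving at time `now` (seconds)."""
        entry = self._sticky.get(client_ip)
        if entry is not None and now - entry[1] <= self._timeout:
            backend = entry[0]  # affinity still valid: reuse the backend
        else:
            backend = next(self._next_backend)  # new or expired client
        # Assumption: each connection restarts the sticky timer.
        self._sticky[client_ip] = (backend, now)
        return backend
```

A client that reconnects within the timeout reaches the same backend; once the timeout has passed, it is treated like a new client.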
For some Services, you need to expose more than one port. Kubernetes lets you configure multiple port definitions on a Service object. When using multiple ports for a Service, you must give all of your ports names so that these are unambiguous. For example:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 9376
    - name: https
      protocol: TCP
      port: 443
      targetPort: 9377
As with Kubernetes names in general, names for ports must only contain lowercase alphanumeric characters and
-. Port names must also start and end with an alphanumeric character.
For example, the name web is valid, but a name that contains an underscore or starts with a hyphen is not.
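The naming rule for ports can be expressed as a regular expression. The sketch below encodes only the rule as stated in the text (lowercase alphanumerics and `-`, starting and ending with an alphanumeric character); the function name `is_valid_port_name` is hypothetical, and the API server may enforce further constraints (such as a length limit) that are not modelled here.

```python
import re

# Lowercase alphanumerics and '-', beginning and ending with an alphanumeric.
_PORT_NAME = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_port_name(name: str) -> bool:
    """Check a port name against the rule described in the text."""
    return _PORT_NAME.fullmatch(name) is not None
```

Under this rule `web` and `123-abc` pass, while `123_abc` (underscore) and `-web` (leading hyphen) do not.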
You can specify your own cluster IP address as part of a Service creation
request. To do this, set the
.spec.clusterIP field. This is useful, for example, if you
already have an existing DNS entry that you wish to reuse, or legacy systems
that are configured for a specific IP address and are difficult to re-configure.
The IP address that you choose must be a valid IPv4 or IPv6 address from within the
service-cluster-ip-range CIDR range that is configured for the API server.
If you try to create a Service with an invalid clusterIP address value, the API
server will return a 422 HTTP status code to indicate that there’s a problem.
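Before submitting a Service with a hand-picked clusterIP, you can check locally that the address lies inside the configured range. This is a minimal sketch: the function name `cluster_ip_in_range` is hypothetical, and the CIDR `10.96.0.0/12` used in the usage note is only an assumed example value for service-cluster-ip-range, so substitute the range your API server is configured with.

```python
import ipaddress


def cluster_ip_in_range(cluster_ip: str, service_cidr: str) -> bool:
    """Return True if cluster_ip is a valid address inside service_cidr."""
    try:
        ip = ipaddress.ip_address(cluster_ip)
    except ValueError:
        return False  # not a valid IPv4 or IPv6 address at all
    network = ipaddress.ip_network(service_cidr)
    return ip.version == network.version and ip in network
```

For example, `cluster_ip_in_range("10.96.0.10", "10.96.0.0/12")` is true, whereas an address outside the range, or a string that is not an IP address, yields false.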
Kubernetes supports two primary modes of finding a Service: environment variables and DNS.
When a Pod is run on a Node, the kubelet adds a set of environment variables
for each active Service. It supports both Docker links
compatible variables and simpler {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT variables,
where the Service name is upper-cased and dashes are converted to underscores.
For example, the Service "redis-master" which exposes TCP port 6379 and has been
allocated cluster IP address 10.0.0.11, produces the following environment
variables:

REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=10.0.0.11
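The naming pattern of these variables can be reproduced with a few lines of string handling. The sketch below is an illustration derived from the redis-master listing above, for a Service with a single port; the function name `service_env_vars` is hypothetical and this is not the kubelet's own code.

```python
def service_env_vars(name, cluster_ip, port, protocol="tcp"):
    """Build the environment variables for a single-port Service."""
    # Upper-case the Service name and turn dashes into underscores.
    prefix = name.upper().replace("-", "_")
    proto = protocol.lower()
    url = f"{proto}://{cluster_ip}:{port}"
    port_prefix = f"{prefix}_PORT_{port}_{proto.upper()}"
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
        f"{prefix}_PORT": url,
        port_prefix: url,
        f"{port_prefix}_PROTO": proto,
        f"{port_prefix}_PORT": str(port),
        f"{port_prefix}_ADDR": cluster_ip,
    }
```

Calling `service_env_vars("redis-master", "10.0.0.11", 6379)` yields the seven variables from the listing.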
When you have a Pod that needs to access a Service, and you are using the environment variable method to publish the port and cluster IP to the client Pods, you must create the Service before the client Pods come into existence. Otherwise, those client Pods won’t have their environment variables populated.
If you only use DNS to discover the cluster IP for a Service, you don’t need to worry about this ordering issue.
You can (and almost always should) set up a DNS service for your Kubernetes cluster using an add-on.
A cluster-aware DNS server, such as CoreDNS, watches the Kubernetes API for new Services and creates a set of DNS records for each one. If DNS has been enabled throughout your cluster then all Pods should automatically be able to resolve Services by their DNS name.
For example, if you have a Service called "my-service" in a Kubernetes
Namespace "my-ns", the control plane and the DNS Service acting together
create a DNS record for "my-service.my-ns". Pods in the "my-ns" Namespace
should be able to find it by simply doing a name lookup for my-service
("my-service.my-ns" would also work).
Pods in other Namespaces must qualify the name as
my-service.my-ns. These names
will resolve to the cluster IP assigned for the Service.
Kubernetes also supports DNS SRV (Service) records for named ports. If the
"my-service.my-ns" Service has a port named
"http" with protocol set to
TCP, you can do a DNS SRV query for
_http._tcp.my-service.my-ns to discover
the port number for
"http", as well as the IP address.
The Kubernetes DNS server is the only way to access ExternalName Services.
You can find more information about
ExternalName resolution in
DNS Pods and Services.
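The DNS naming conventions described in this section can be summarised in code. The sketch below only builds the names a client would look up and performs no DNS queries; the function names `service_dns_names` and `srv_query_name` are hypothetical.

```python
def service_dns_names(service, namespace, client_namespace):
    """Names a Pod in client_namespace can use to reach the Service."""
    qualified = f"{service}.{namespace}"
    if client_namespace == namespace:
        # Pods in the same Namespace may use the short name as well.
        return [service, qualified]
    return [qualified]


def srv_query_name(port_name, protocol, service, namespace):
    """Name to use in a DNS SRV query for a named port."""
    return f"_{port_name}._{protocol.lower()}.{service}.{namespace}"
```

For the example in the text, `srv_query_name("http", "TCP", "my-service", "my-ns")` produces `_http._tcp.my-service.my-ns`.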
Sometimes you don’t need load-balancing and a single Service IP. In
this case, you can create what are termed “headless” Services, by explicitly
specifying "None" for the cluster IP (.spec.clusterIP).
You can use a headless Service to interface with other service discovery mechanisms, without being tied to Kubernetes’ implementation.
For headless Services, a cluster IP is not allocated, kube-proxy does not handle
these Services, and there is no load balancing or proxying done by the platform
for them. How DNS is automatically configured depends on whether the Service has
selectors defined.
For headless Services that define selectors, the endpoints controller creates
Endpoints records in the API, and modifies the DNS configuration to return
records (addresses) that point directly to the
Pods backing the Service.
For headless Services that do not define selectors, the endpoints controller does
not create Endpoints records. However, the DNS system looks for and configures
either CNAME records for ExternalName-type Services, or A records for any
Endpoints that share a name with the Service, for all other types.
For some parts of your application (for example, frontends) you may want to expose a Service onto an external IP address, that’s outside of your cluster.
ServiceTypes allow you to specify what kind of Service you want.
The default is ClusterIP.
Type values and their behaviors are:
ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.
NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You’ll be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.
Note: You need CoreDNS version 1.7 or higher to use the ExternalName type.
You can also use Ingress to expose your Service. Ingress is not a Service type, but it acts as the entry point for your cluster. It lets you consolidate your routing rules into a single resource as it can expose multiple services under the same IP address.
If you set the type field to NodePort, the Kubernetes control plane
allocates a port from a range specified by the
--service-node-port-range flag (default: 30000-32767).
Each node proxies that port (the same port number on every Node) into your Service.
Your Service reports the allocated port in its .spec.ports[*].nodePort field.
If you want to specify particular IP(s) to proxy the port, you can set the
--nodeport-addresses flag in kube-proxy to particular IP block(s); this is supported since Kubernetes v1.10.
This flag takes a comma-delimited list of IP blocks (e.g. 10.0.0.0/8, 192.0.2.0/25) to specify IP address ranges that kube-proxy should consider as local to this node.
For example, if you start kube-proxy with the
--nodeport-addresses=127.0.0.0/8 flag, kube-proxy only selects the loopback interface for NodePort Services. The default for
--nodeport-addresses is an empty list. This means that kube-proxy should consider all available network interfaces for NodePort. (That’s also compatible with earlier Kubernetes releases).
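The filtering behaviour of --nodeport-addresses can be modelled in a few lines. This is a sketch of the semantics described above, not kube-proxy's implementation; the function name `nodeport_listen_addresses` and the sample node addresses are hypothetical.

```python
import ipaddress


def nodeport_listen_addresses(node_ips, nodeport_addresses):
    """Select the node IPs that fall inside the configured IP blocks.

    An empty nodeport_addresses list means "consider every address".
    """
    if not nodeport_addresses:
        return list(node_ips)
    blocks = [ipaddress.ip_network(block.strip()) for block in nodeport_addresses]
    selected = []
    for raw in node_ips:
        ip = ipaddress.ip_address(raw)
        if any(ip.version == b.version and ip in b for b in blocks):
            selected.append(raw)
    return selected
```

Given node addresses `127.0.0.1`, `10.0.0.5` and `192.0.2.200`, the block list `["127.0.0.0/8"]` keeps only the loopback address, and an empty list keeps all three.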
If you want a specific port number, you can specify a value in the nodePort
field. The control plane will either allocate you that port or report that
the API transaction failed.
This means that you need to take care about possible port collisions yourself.
You also have to use a valid port number, one that’s inside the range configured
for NodePort use.
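A quick local check that a chosen port lies inside the NodePort range might look like the following sketch. The function names are hypothetical, and the default range string is the default value of --service-node-port-range quoted above; a cluster may be configured differently. The check says nothing about whether the port is already taken by another Service.

```python
def parse_port_range(spec: str):
    """Parse a 'low-high' port range such as '30000-32767'."""
    low, high = (int(part) for part in spec.split("-"))
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {spec!r}")
    return low, high


def node_port_in_range(node_port: int, spec: str = "30000-32767") -> bool:
    """Return True if node_port lies inside the configured NodePort range."""
    low, high = parse_port_range(spec)
    return low <= node_port <= high
```

With the default range, `node_port_in_range(30080)` is true and `node_port_in_range(80)` is false.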
Using a NodePort gives you the freedom to set up your own load balancing solution, to configure environments that are not fully supported by Kubernetes, or even to just expose one or more nodes’ IPs directly.
Note that this Service is visible as <NodeIP>:spec.ports[*].nodePort and
.spec.clusterIP:spec.ports[*].port. (If the
--nodeport-addresses flag in kube-proxy is set, <NodeIP> would be filtered NodeIP(s).)
On cloud providers which support external load balancers, setting the
type field to LoadBalancer provisions a load balancer for your Service.
The actual creation of the load balancer happens asynchronously, and
information about the provisioned balancer is published in the Service’s
.status.loadBalancer field. For example:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  clusterIP: 10.0.171.239
  loadBalancerIP: 220.127.116.11
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
      - ip: 18.104.22.168
Traffic from the external load balancer is directed at the backend Pods. The cloud provider decides how it is load balanced.
Some cloud providers allow you to specify the
loadBalancerIP. In those cases, the load balancer is created
with the user-specified
loadBalancerIP. If the
loadBalancerIP field is not specified,
the load balancer is set up with an ephemeral IP address. If you specify a
loadBalancerIP but your cloud provider does not support the feature, the
loadBalancerIP field that you
set is ignored.
Note: If you’re using SCTP, see the caveat below about the LoadBalancer Service type.
On Azure, if you want to use a user-specified public type
loadBalancerIP, you first need to create a static type public IP address resource. This public IP address resource should be in the same resource group as the other automatically created resources of the cluster.
Specify the assigned IP address as loadBalancerIP. Ensure that you have updated the securityGroupName in the cloud provider configuration file. For information about troubleshooting
CreatingLoadBalancerFailed permission issues, see Use a static IP address with the Azure Kubernetes Service (AKS) load balancer or CreatingLoadBalancerFailed on AKS cluster with advanced networking.
In a mixed environment it is sometimes necessary to route traffic from Services inside the same (virtual) network address block.
In a split-horizon DNS environment you would need two Services to be able to route both external and internal traffic to your endpoints.
You can achieve this by adding one of the following annotations to a Service. The annotation to add depends on the cloud Service provider you’re using.

GCP:

[...]
metadata:
  name: my-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
[...]

Use cloud.google.com/load-balancer-type: "internal" for masters with version 1.7.0 to 1.7.3.
For more information, see the docs.

AWS:

[...]
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
[...]

Azure:

[...]
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
[...]

OpenStack:

[...]
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/openstack-internal-load-balancer: "true"
[...]

Baidu Cloud:

[...]
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/cce-load-balancer-internal-vpc: "true"
[...]
For partial TLS / SSL support on clusters running on AWS, you can add three
annotations to a LoadBalancer Service:

metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012
The first specifies the ARN of the certificate to use. It can be either a certificate from a third party issuer that was uploaded to IAM or one created within AWS Certificate Manager.
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: (https|http|ssl|tcp)
The second annotation specifies which protocol a Pod speaks. For HTTPS and SSL, the ELB expects the Pod to authenticate itself over the encrypted connection, using a certificate.
HTTP and HTTPS select layer 7 proxying: the ELB terminates
the connection with the user, parses headers, and injects the X-Forwarded-For
header with the user’s IP address (Pods only see the IP address of the
ELB at the other end of its connection) when forwarding requests.
TCP and SSL select layer 4 proxying: the ELB forwards traffic without modifying the headers.
In a mixed-use environment where some ports are secured and others are left unencrypted, you can use the following annotations:
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443,8443"

In the above example, if the Service contained three ports, 80, 443, and
8443, then 443 and 8443 would use the SSL certificate, but 80 would just
be proxied HTTP.
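The way the ssl-ports annotation splits a Service's ports into secured and unsecured ones can be sketched as follows. The function name `ports_using_certificate` is hypothetical, and the sketch assumes the annotation value is a comma-separated list of port numbers, as in the example above; other forms of the value are not handled.

```python
def ports_using_certificate(service_ports, ssl_ports_annotation):
    """Map each Service port to True if it is listed in the ssl-ports value."""
    secured = {
        int(item) for item in ssl_ports_annotation.split(",") if item.strip()
    }
    return {port: port in secured for port in service_ports}
```

For ports 80, 443 and 8443 with the annotation value `"443,8443"`, only 443 and 8443 are marked as using the certificate.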
From Kubernetes v1.9 onwards you can use predefined AWS SSL policies with HTTPS or SSL listeners for your Services.
To see which policies are available for use, you can use the
aws command line tool:
aws elb describe-load-balancer-policies --query 'PolicyDescriptions.PolicyName'
You can then specify any one of those policies using the
"service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy"
annotation; for example:

metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: "ELBSecurityPolicy-TLS-1-2-2017-01"
To enable PROXY protocol support for clusters running on AWS, you can use the following service annotation:
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
Since version 1.3.0, the use of this annotation applies to all ports proxied by the ELB and cannot be configured otherwise.
There are several annotations to manage access logs for ELB Services on AWS.
The annotation service.beta.kubernetes.io/aws-load-balancer-access-log-enabled
controls whether access logs are enabled.
The annotation service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval
controls the interval in minutes for publishing the access logs. You can specify
an interval of either 5 or 60 minutes.
The annotation service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name
controls the name of the Amazon S3 bucket where load balancer access logs are
stored.
The annotation service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix
specifies the logical hierarchy you created for your Amazon S3 bucket.

metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
    # Specifies whether access logs are enabled for the load balancer
    service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: "60"
    # The interval for publishing the access logs. You can specify an interval of either 5 or 60 (minutes).
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "my-bucket"
    # The name of the Amazon S3 bucket where the access logs are stored
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "my-bucket-prefix/prod"
    # The logical hierarchy you created for your Amazon S3 bucket, for example `my-bucket-prefix/prod`
Connection draining for Classic ELBs can be managed with the annotation
service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled set
to the value of "true". The annotation
service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout can
also be used to set the maximum time, in seconds, to keep the existing connections open before deregistering the instances.

metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout: "60"
There are other annotations to manage Classic Elastic Load Balancers that are described below.
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
    # The time, in seconds, that the connection is allowed to be idle (no data has been sent over the connection) before it is closed by the load balancer

    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    # Specifies whether cross-zone load balancing is enabled for the load balancer

    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "environment=prod,owner=devops"
    # A comma-separated list of key-value pairs which will be recorded as
    # additional tags in the ELB.

    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: ""
    # The number of successive successful health checks required for a backend to
    # be considered healthy for traffic. Defaults to 2, must be between 2 and 10

    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "3"
    # The number of unsuccessful health checks required for a backend to be
    # considered unhealthy for traffic. Defaults to 6, must be between 2 and 10

    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "20"
    # The approximate interval, in seconds, between health checks of an
    # individual instance. Defaults to 10, must be between 5 and 300

    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "5"
    # The amount of time, in seconds, during which no response means a failed
    # health check. This value must be less than the service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval
    # value. Defaults to 5, must be between 2 and 60

    service.beta.kubernetes.io/aws-load-balancer-extra-security-groups: "sg-53fae93f,sg-42efd82e"
    # A list of additional security groups to be added to the ELB
To use a Network Load Balancer on AWS, use the annotation
service.beta.kubernetes.io/aws-load-balancer-type with the value set to
nlb.

metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
Note: NLB only works with certain instance classes; see the AWS documentation on Elastic Load Balancing for a list of supported instance types.
Unlike Classic Elastic Load Balancers, Network Load Balancers (NLBs) forward the
client’s IP address through to the node. If a Service’s .spec.externalTrafficPolicy
is set to
Cluster, the client’s IP address is not propagated to the end
Pods. If it is set to Local, the client IP address is
propagated to the end Pods, but this could result in uneven distribution of
traffic. Nodes without any Pods for a particular LoadBalancer Service will fail
the NLB Target Group’s health check on the auto-assigned
.spec.healthCheckNodePort and not receive any traffic.
In order to achieve even traffic, either use a DaemonSet, or specify a Pod anti-affinity so that the Pods are not placed on the same node.
You can also use NLB Services with the internal load balancer annotation.
In order for client traffic to reach instances behind an NLB, the Node security groups are modified with the following IP rules:
|Health Check||TCP||NodePort(s) (
In order to limit which client IPs can access the Network Load Balancer,
specify loadBalancerSourceRanges:

spec:
  loadBalancerSourceRanges:
    - "22.214.171.124/16"

Note: If .spec.loadBalancerSourceRanges is not set, Kubernetes allows traffic from
0.0.0.0/0 to the Node Security Group(s). If nodes have public IP addresses, be aware that non-NLB traffic can also reach all instances in those modified security groups.
Services of type ExternalName map a Service to a DNS name, not to a typical selector such as
my-service or cassandra. You specify these Services with the
spec.externalName parameter.
This Service definition, for example, maps the
my-service Service in the
prod namespace to
my.database.example.com:

apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: prod
spec:
  type: ExternalName
  externalName: my.database.example.com
Note: ExternalName accepts an IPv4 address string, but treats it as a DNS name comprised of digits, not as an IP address. ExternalNames that resemble IPv4 addresses are not resolved by CoreDNS or ingress-nginx because ExternalName is intended to specify a canonical DNS name. To hardcode an IP address, consider using headless Services.
When looking up the host
my-service.prod.svc.cluster.local, the cluster DNS Service
returns a CNAME record with the value
my.database.example.com. Accessing
my-service works in the same way as other Services but with the crucial
difference that redirection happens at the DNS level rather than via proxying or
forwarding. Should you later decide to move your database into your cluster, you
can start its Pods, add appropriate selectors or endpoints, and change the
Service’s type.
If there are external IPs that route to one or more cluster nodes, Kubernetes Services can be exposed on those
externalIPs. Traffic that ingresses into the cluster with the external IP (as destination IP), on the Service port,
will be routed to one of the Service endpoints.
externalIPs are not managed by Kubernetes and are the responsibility
of the cluster administrator.
In the Service spec,
externalIPs can be specified along with any of the ServiceTypes.
In the example below, “my-service” can be accessed by clients on “126.96.36.199:80” (externalIP:port).

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 9376
  externalIPs:
    - 126.96.36.199
Using the userspace proxy for VIPs works at small to medium scale, but will not scale to very large clusters with thousands of Services. The original design proposal for portals has more details on this.
Using the userspace proxy obscures the source IP address of a packet accessing a Service. This makes some kinds of network filtering (firewalling) impossible. The iptables proxy mode does not obscure in-cluster source IPs, but it does still impact clients coming through a load balancer or node-port.
The Type field is designed as nested functionality: each level adds to the
previous. This is not strictly required on all cloud providers (e.g. Google Compute Engine does
not need to allocate a NodePort to make LoadBalancer work, but AWS does),
but the current API requires it.
The previous information should be sufficient for many people who just want to use Services. However, there is a lot going on behind the scenes that may be worth understanding.
One of the primary philosophies of Kubernetes is that you should not be exposed to situations that could cause your actions to fail through no fault of your own. For the design of the Service resource, this means not making you choose your own port number if that choice might collide with someone else’s choice. That is an isolation failure.
In order to allow you to choose a port number for your Services, we must ensure that no two Services can collide. Kubernetes does that by allocating each Service its own IP address.
To ensure each Service receives a unique IP, an internal allocator atomically updates a global allocation map in etcd prior to creating each Service. The map object must exist in the registry for Services to get IP address assignments, otherwise creations will fail with a message indicating an IP address could not be allocated.
In the control plane, a background controller is responsible for creating that map (needed to support migrating from older versions of Kubernetes that used in-memory locking). Kubernetes also uses controllers to check for invalid assignments (e.g. due to administrator intervention) and to clean up allocated IP addresses that are no longer used by any Services.
Unlike Pod IP addresses, which route to a fixed destination, Service IPs are not answered by a single host. Instead, kube-proxy uses iptables (packet processing logic in Linux) to define virtual IP addresses (VIPs) which are transparently redirected as needed. When clients connect to the VIP, their traffic is automatically transported to an appropriate endpoint. The environment variables and DNS for Services are populated in terms of the Service’s virtual IP address (and port).
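From a client Pod's point of view, this means the address it reads from its environment is the virtual IP, not a Pod IP. A minimal sketch, assuming the {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT naming convention for the injected variables; the helper service_address and the values in fake_env are hypothetical:

```python
import os


def service_address(service_name, environ=os.environ):
    """Build (host, port) from the environment variables injected for a Service."""
    # Service names are upper-cased and dashes become underscores.
    prefix = service_name.upper().replace("-", "_")
    host = environ[f"{prefix}_SERVICE_HOST"]
    port = int(environ[f"{prefix}_SERVICE_PORT"])
    return host, port


# Hypothetical environment, as a Pod might see it for a Service named "backend".
fake_env = {"BACKEND_SERVICE_HOST": "10.0.0.1", "BACKEND_SERVICE_PORT": "1234"}
print(service_address("backend", fake_env))
```

The client connects to whatever this returns and never learns which Pod ends up answering.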
kube-proxy supports three proxy modes—userspace, iptables and IPVS—which each operate slightly differently.
As an example, consider the image processing application described above. When the backend Service is created, the Kubernetes master assigns a virtual IP address, for example 10.0.0.1. Assuming the Service port is 1234, the Service is observed by all of the kube-proxy instances in the cluster. When a proxy sees a new Service, it opens a new random port, establishes an iptables redirect from the virtual IP address to this new port, and starts accepting connections on it.
When a client connects to the Service’s virtual IP address, the iptables rule kicks in, and redirects the packets to the proxy’s own port. The “Service proxy” chooses a backend, and starts proxying traffic from the client to the backend.
This means that Service owners can choose any port they want without risk of collision. Clients can simply connect to an IP and port, without being aware of which Pods they are actually accessing.
Again, consider the image processing application described above. When the backend Service is created, the Kubernetes control plane assigns a virtual IP address, for example 10.0.0.1. Assuming the Service port is 1234, the Service is observed by all of the kube-proxy instances in the cluster. When a proxy sees a new Service, it installs a series of iptables rules which redirect from the virtual IP address to per-Service rules. The per-Service rules link to per-Endpoint rules which redirect traffic (using destination NAT) to the backends.
When a client connects to the Service’s virtual IP address the iptables rule kicks in. A backend is chosen (either based on session affinity or randomly) and packets are redirected to the backend. Unlike the userspace proxy, packets are never copied to userspace, the kube-proxy does not have to be running for the virtual IP address to work, and Nodes see traffic arriving from the unaltered client IP address.
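The selection step (session affinity if configured, otherwise random) can be modelled in a few lines. This is only an illustration of the decision logic: in iptables mode the choice is made by kernel rules, not by Python code, and the function choose_backend, the affinity table, and the addresses below are all hypothetical.

```python
import random


def choose_backend(client_ip, endpoints, affinity=None, rng=random):
    """Pick an endpoint: sticky per client IP when an affinity table is given, else random."""
    # Reuse the remembered backend only if it is still a live endpoint.
    if affinity is not None and affinity.get(client_ip) in endpoints:
        return affinity[client_ip]
    backend = rng.choice(endpoints)
    if affinity is not None:
        affinity[client_ip] = backend   # remember the choice for this client
    return backend


# Hypothetical Pod addresses backing the Service at 10.0.0.1:1234.
endpoints = ["10.244.1.5:9376", "10.244.2.7:9376", "10.244.3.9:9376"]
sessions = {}
first = choose_backend("10.244.9.20", endpoints, sessions)
again = choose_backend("10.244.9.20", endpoints, sessions)
print(first == again)
```

Without an affinity table every connection is an independent random pick; with one, a client keeps its backend for as long as that endpoint exists.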
This same basic flow executes when traffic comes in through a node-port or through a load-balancer, though in those cases the client IP does get altered.
iptables operations slow down dramatically in large-scale clusters, for example those with 10,000 Services. IPVS is designed for load balancing and is based on in-kernel hash tables, so an IPVS-based kube-proxy can deliver consistent performance even with a large number of Services. An IPVS-based kube-proxy also offers more sophisticated load balancing algorithms (least conns, locality, weighted, persistence).
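The difference between walking a rule list and consulting a hash table can be shown with a rough model. It illustrates the data-structure argument only and says nothing about actual kernel behaviour; the function linear_lookup and the generated addresses are hypothetical.

```python
def linear_lookup(rules, vip):
    """Scan an ordered rule list; return (backends, number of rules examined)."""
    for examined, (rule_vip, backends) in enumerate(rules, start=1):
        if rule_vip == vip:
            return backends, examined
    return None, len(rules)


# 10,000 hypothetical Services, each with one made-up backend address.
services = [
    (f"10.0.{i // 256}.{i % 256}", [f"10.244.0.{i % 250 + 1}"])
    for i in range(10_000)
]
table = dict(services)   # hash table keyed by virtual IP

last_vip = services[-1][0]
backends, examined = linear_lookup(services, last_vip)
print(examined)                      # rules examined by the sequential scan
print(table[last_vip] == backends)   # the hash table finds the same entry in one lookup
```

In this model the sequential scan does work proportional to the number of Services in the worst case, while the hash lookup stays roughly constant.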
Service is a top-level resource in the Kubernetes REST API. You can find more details about the API object at: Service API object.
You can use TCP for any kind of Service, and it’s the default network protocol.
You can use UDP for most Services. For type=LoadBalancer Services, UDP support depends on the cloud provider offering this facility.
If your cloud provider supports it, you can use a Service in LoadBalancer mode to set up external HTTP / HTTPS reverse proxying, forwarded to the Endpoints of the Service.
Note: You can also use Ingress in place of Service to expose HTTP / HTTPS Services.
If your cloud provider supports it (e.g., AWS), you can use a Service in LoadBalancer mode to configure a load balancer outside of Kubernetes itself, that will forward connections prefixed with PROXY protocol.
The load balancer will send an initial series of octets describing the incoming connection, similar to this example:
PROXY TCP4 192.0.2.202 10.0.42.7 12345 7\r\n
followed by the data from the client.
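A backend that receives such connections has to strip this header before handling the client data. A minimal sketch of parsing the text form shown above; the function parse_proxy_v1 is hypothetical, it handles only the six-field TCP4/TCP6 line, and a real server should use a vetted implementation:

```python
def parse_proxy_v1(data: bytes):
    """Split a text PROXY protocol header from the client payload that follows it."""
    header, sep, payload = data.partition(b"\r\n")
    if not sep:
        raise ValueError("header is not terminated by CRLF")
    parts = header.decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY":
        raise ValueError("not a six-field PROXY header")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    info = {
        "family": family,
        "source": (src_ip, int(src_port)),
        "destination": (dst_ip, int(dst_port)),
    }
    return info, payload


# The header from the example above, followed by made-up client data.
raw = b"PROXY TCP4 192.0.2.202 10.0.42.7 12345 7\r\nhello"
info, payload = parse_proxy_v1(raw)
print(info["source"], info["destination"], payload)
```

The source address recovered here is the original client's, which the backend would otherwise not see behind the load balancer.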
Kubernetes supports SCTP as a protocol value in Service, Endpoint, NetworkPolicy and Pod definitions as an alpha feature. To enable this feature, the cluster administrator needs to enable the SCTPSupport feature gate on the apiserver, for example, --feature-gates=SCTPSupport=true.
When the feature gate is enabled, you can set the protocol field of a Service, Endpoint, NetworkPolicy or Pod to SCTP. Kubernetes sets up the network accordingly for the SCTP associations, just like it does for TCP connections.
The support of multihomed SCTP associations requires that the CNI plugin can support the assignment of multiple interfaces and IP addresses to a Pod.
NAT for multihomed SCTP associations requires special logic in the corresponding kernel modules.
Warning: You can only create a Service with type LoadBalancer plus protocol SCTP if the cloud provider’s load balancer implementation supports SCTP as a protocol. Otherwise, the Service creation request is rejected. The current set of cloud load balancer providers (Azure, AWS, CloudStack, GCE, OpenStack) all lack support for SCTP.
Warning: SCTP is not supported on Windows-based nodes.
Warning: The kube-proxy does not support the management of SCTP associations when it is in userspace mode.
In the future, the proxy policy for Services can become more nuanced than simple round-robin balancing, for example master-elected or sharded. We also envision that some Services will have “real” load balancers, in which case the virtual IP address will simply transport the packets there.
The Kubernetes project intends to improve support for L7 (HTTP) Services.
The Kubernetes project intends to have more flexible ingress modes for Services which encompass the current ClusterIP, NodePort, and LoadBalancer modes and more.