This was my talk at QConNY 2018, part of the container orchestration track, about how Chick-fil-A uses Kubernetes at the Edge in our restaurants and how we have engineered solutions to problems that are unique to our scale.
2. What to expect from the session
• Intro
• How is CFA using K8s?
• What does our architecture look like?
• How are we engineering around K8s for our business?
• Q&A
4. AT PEAK HOUR
1 sandwich every 16 seconds
1 box of nuggets every 25 seconds
1 order of waffle fries every 14 seconds
1 car through the drive thru every 22 seconds
267 total transactions
12. Engineering Around K8s
• How we build and repair bare metal clusters
• SRE Lessons Learned
• How we deploy applications to thousands of clusters
13. Challenges of Bare Metal K8s clustering at scale
• Goal: #code2prod
• Simple enough for a non-technologist to install
• Manageable remotely
• Automated device discovery and self-clustering
• Self healing & HA
14. How we Bare Metal Cluster K8s at scale
Highlander Hooves Up
TOOLS
Sherlock Fleet RKE Image
PROCESS
15. Bootstrapping Clusters
• Highlander
– Node coordination and clustering leader election using UDP
– Execute clustering (RKE)
– Swap KubeDNS for CoreDNS
– Base OAuth identity negotiation
– Controller Pods (control plane activity/Istio)
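The slides do not describe Highlander's election protocol, so the following is only a minimal sketch of one way UDP-based leader election can work. It assumes every node broadcasts an announcement, collects the announcements it hears, and applies the same deterministic rule (earliest boot time wins, ties broken by node ID). The port number, message fields, and function names are all hypothetical.

```python
import json
import socket
from typing import Iterable, Tuple

ELECTION_PORT = 50000  # hypothetical port, not taken from the talk


def encode_announcement(node_id: str, boot_time: float) -> bytes:
    """Serialize the message a node would broadcast to its peers."""
    return json.dumps({"node_id": node_id, "boot_time": boot_time}).encode("utf-8")


def decode_announcement(payload: bytes) -> Tuple[str, float]:
    """Parse a received datagram back into (node_id, boot_time)."""
    msg = json.loads(payload.decode("utf-8"))
    return msg["node_id"], float(msg["boot_time"])


def broadcast_announcement(node_id: str, boot_time: float, port: int = ELECTION_PORT) -> None:
    """Send one announcement to the local broadcast address over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(encode_announcement(node_id, boot_time), ("255.255.255.255", port))
    finally:
        sock.close()


def elect_leader(announcements: Iterable[Tuple[str, float]]) -> str:
    """Pick the leader: earliest boot_time, ties broken by smallest node_id.

    Every node that has heard the same set of announcements reaches the
    same answer, regardless of the order in which datagrams arrived.
    """
    seen = list(announcements)
    if not seen:
        raise ValueError("no announcements received")
    node_id, _ = min(seen, key=lambda a: (a[1], a[0]))
    return node_id
```

In this sketch the elected node would be the one that goes on to run the clustering step (RKE in the talk); how Highlander actually hands off to RKE is not covered by the slides.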
16. Initializing Clusters
What we considered
• Kops = love it, no bare metal
• Kubespray = slow + brittle
• kubeadm = maybe in the future
• RKE = fairly simple, works for us
Future State?
• Stick w/ RKE, kubeadm, or roll our own to meet our needs
17. Resetting Cluster State
• Requirement: Need to be able to re-image remotely
• Solution: Overlay FS + HAMS
– Manages wiping clusters and restoring to base
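To illustrate why an overlay filesystem makes a remote reset cheap, here is a small in-memory model, not the real kernel overlayfs and not HAMS itself (whose internals the slides do not show). It assumes a read-only base layer plus a writable upper layer; discarding the upper layer restores the base image. The class name and paths are hypothetical.

```python
class OverlayModel:
    """Toy model of overlay semantics: reads fall through upper -> lower."""

    def __init__(self, base: dict):
        self._lower = dict(base)   # read-only base image, never modified
        self._upper = {}           # writable layer holding all changes
        self._whiteouts = set()    # base paths hidden by a delete

    def read(self, path: str) -> str:
        if path in self._whiteouts:
            raise FileNotFoundError(path)
        if path in self._upper:
            return self._upper[path]
        if path in self._lower:
            return self._lower[path]
        raise FileNotFoundError(path)

    def write(self, path: str, data: str) -> None:
        # Writes only ever touch the upper layer.
        self._whiteouts.discard(path)
        self._upper[path] = data

    def delete(self, path: str) -> None:
        self.read(path)  # raises FileNotFoundError if the path is not visible
        self._upper.pop(path, None)
        if path in self._lower:
            self._whiteouts.add(path)  # hide the base copy without changing it

    def reset(self) -> None:
        """Wipe all local state; the untouched base layer becomes visible again."""
        self._upper.clear()
        self._whiteouts.clear()
```

Because the base layer is never written to, a reset in this model is just throwing away the upper layer, which is the property that makes remote re-imaging practical.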
18. Hooves Up
• Self-healing AWS SSM Registration
• Free even for non-AWS deployments
• Able to do remote commands and patch reporting/management
19. Lessons learned
• Use K8s feature set and don’t reinvent the wheel
• MVP. MVP. MVP.
• Ensure aggregated and searchable logging
• Deep health checks are a must --> Use /healthz
• Every service needs “/metrics” endpoint
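As a rough sketch of the last two lessons, the following shows a service exposing both endpoints using only the Python standard library. It assumes a "deep" health check means actually exercising each dependency rather than returning a static 200, and that metrics are served in the plain-text style that Prometheus scrapes. The check names, counter names, and helper functions are hypothetical.

```python
from http.server import BaseHTTPRequestHandler


def deep_health(checks: dict) -> tuple:
    """Run every dependency check; any failure or exception means unhealthy."""
    failures = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    if failures:
        return 503, "failing: " + ", ".join(sorted(failures))
    return 200, "ok"


def render_metrics(counters: dict) -> str:
    """Render counters as one 'name value' line each, sorted by name."""
    return "".join(f"{name} {value}\n" for name, value in sorted(counters.items()))


def route(path: str, checks: dict, counters: dict) -> tuple:
    """Map a request path to (status, content_type, body)."""
    if path == "/healthz":
        status, body = deep_health(checks)
        return status, "text/plain", body
    if path == "/metrics":
        return 200, "text/plain; version=0.0.4", render_metrics(counters)
    return 404, "text/plain", "not found"


def make_handler(checks: dict, counters: dict):
    """Build a request handler class bound to this service's checks and counters."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, ctype, body = route(self.path, checks, counters)
            data = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)
    return Handler
```

To serve it, pass the handler to `http.server.HTTPServer`, for example `HTTPServer(("0.0.0.0", 8080), make_handler(checks, counters)).serve_forever()`. A Kubernetes liveness or readiness probe can then point at `/healthz`.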
20. How do we deploy to our restaurants?
• Large number of deployment targets
• Complex success/fail criteria
• Array of application types
• What approaches did we consider?
kubectl
21. Introducing Fleet
• Design Goals
– Simple to use / reason about
– Use declarative approach
– Support for variety of deployment models (canary, blue/green)
– Rollout over flexible time period
– Sane rollback behaviors
– Leverage standard k8s API
– Full visibility
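The design goals of a canary model, a rollout spread over a time period, and sane rollback can be sketched as a small planning function. This is not Fleet's actual algorithm, which the slides do not show. It assumes a canary wave goes first, the remaining clusters are split into equal-sized waves spaced evenly over the window, and the rollout halts when the failure rate crosses a threshold. All names and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def plan_rollout(
    clusters: Sequence[str],
    canary_count: int,
    waves: int,
    total_minutes: int,
) -> List[Tuple[int, List[str]]]:
    """Return (start_minute, clusters) pairs: canary first, then even waves."""
    if canary_count < 0 or waves < 1 or total_minutes < 0:
        raise ValueError("canary_count and total_minutes must be >= 0, waves >= 1")
    ordered = list(clusters)
    canary, rest = ordered[:canary_count], ordered[canary_count:]
    batches = [canary] if canary else []
    if rest:
        size = math.ceil(len(rest) / waves)
        batches.extend(rest[i:i + size] for i in range(0, len(rest), size))
    # Spread the wave start times evenly across the rollout window.
    gaps = max(len(batches) - 1, 1)
    return [(round(i * total_minutes / gaps), batch) for i, batch in enumerate(batches)]


def should_halt(failed: int, attempted: int, max_failure_rate: float) -> bool:
    """Stop (and consider rolling back) once too many targets have failed."""
    return attempted > 0 and failed / attempted > max_failure_rate
```

For example, seven clusters with one canary, three waves, and a 90-minute window produce four batches starting at minutes 0, 30, 60, and 90.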
22. Fleet Ecosystem Components
• Fleet Client
– Git webhook, REST call, CLI
• Fleet Server API
– Code generation for deployment, service, ingress files
– Git management for cluster repositories
– Deployment status tracking
• Atlas
– Repository of deploy-ready, k8s compliant application files
• Vessel
– Deployed on cluster, git pull, kubectl apply, report status
• Dashboards
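The Vessel bullet describes a pull-based loop: git pull, kubectl apply, report status. Below is a minimal sketch of one iteration of such a loop, not Vessel's real code. It assumes the cluster's repository is already cloned locally and that status is returned as a dictionary; how the real agent reports to the Fleet Server is not shown in the slides. The function name, directory arguments, and status fields are hypothetical; `git -C <dir> pull --ff-only` and `kubectl apply -f <dir>` are standard CLI invocations.

```python
import subprocess


def sync_once(repo_dir: str, manifest_dir: str, run=subprocess.run) -> dict:
    """Pull the cluster repo, apply its manifests, and return a status record.

    `run` is injectable so the loop can be exercised without git or kubectl.
    """
    steps = [
        ["git", "-C", repo_dir, "pull", "--ff-only"],
        ["kubectl", "apply", "-f", manifest_dir],
    ]
    for cmd in steps:
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Stop at the first failing step and report which tool failed.
            return {"status": "failed", "step": cmd[0], "detail": result.stderr.strip()}
    return {"status": "applied", "step": None, "detail": ""}
```

An agent built on this would call `sync_once` on a timer or on a webhook trigger and forward the returned record to whatever tracks deployment status. Because each cluster pulls its own desired state, the central server never needs inbound access to the restaurant network.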
29. Where you can find us
www.linkedin.com/in/brian-chambers
www.linkedin.com/in/calebrhurd
@brianchambers21
@calebrhurd
https://medium.com/@cfatechblog
https://github.com/chick-fil-a