💾 Archived View for dioskouroi.xyz › thread › 29388176 captured on 2021-11-30 at 20:18:30. Gemini links have been rewritten to link to archived content


-=-=-=-=-=-=-

Karpenter by AWS – Kubernetes node autoscaling

Author: babelfish

Score: 29

Comments: 2

Date: 2021-11-30 02:31:07

Web Link

________________________________________________________________________________

epelesis wrote at 2021-11-30 05:42:51:

I've had my eye on this autoscaler for a while now, has anyone had experience running it in production with a large rate of node churn?

I think combining the scheduler and the autoscaler into a single entity makes a lot of sense, both from a latency/performance perspective and a data-sharing perspective. We have some complicated rules on when nodes are eligible to be terminated that are tightly coupled to the lifecycle of certain pods in the system, and as it stands we have to implement all of that logic under the assumption that the scheduler may (re)schedule pods concurrently while we are performing terminations. For this kind of application, where pods and nodes are tightly coupled, having all of that logic in a single place is pretty attractive to me.
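
(For context on coupling termination to pod lifecycle: early Karpenter versions expose a pod annotation that blocks voluntary node termination while the pod is running. A minimal sketch, assuming the `karpenter.sh/do-not-evict` annotation and a hypothetical pod name:)

```yaml
# Hypothetical pod that must not be interrupted mid-run.
# While it is scheduled on a node, Karpenter will not
# voluntarily deprovision that node.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker        # illustrative name, not from the thread
  annotations:
    karpenter.sh/do-not-evict: "true"
spec:
  containers:
    - name: worker
      image: example.com/batch-worker:latest  # placeholder image
```

This moves the "is this node safe to terminate?" decision into the autoscaler itself, which is the consolidation the comment above is describing.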

Another really significant (and unexpected, to me) scaling limit for us has been rate limiting by the cloud provider... I believe that since AWS leads development of Karpenter, they have also optimized a lot of the API calls Karpenter makes to use each tenant's AWS API quotas more efficiently, and this is one of the main reasons Karpenter caught my eye in the first place.

dilyevsky wrote at 2021-11-30 09:36:33:

Been running it in production on aws and gcp for years now (self hosted). Imo whatever seconds you’ll shave off by skipping the scheduler loop will be dwarfed by nodes not being available / slow to come up, which happens often and for which there’s no SLA