1 upvotes, 0 direct replies (showing 0)
View submission: Evolving Reddit’s ML Model Deployment and Serving Architecture
Hi, nice post!
At Booking.com we have a model deployment setup that sits somewhere in between your legacy and new systems: it supports both models deployed together and models deployed in isolation. We didn't move to full isolation, in part because that would multiply our resource requirements. There is a resource overhead per pod, which matters especially for small models that aren't used that frequently, and with hundreds of models this becomes huge.
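To make the concern concrete, here's a back-of-envelope sketch of how per-pod overhead scales under full isolation. All numbers are illustrative assumptions on my part, not actual Booking.com figures:

```python
# Back-of-envelope estimate of the resources paid purely for pod isolation.
# Every number below is an assumption for illustration only.

def isolation_overhead(num_models, replicas_per_model,
                       overhead_mb_per_pod, overhead_cpu_per_pod):
    """Return (extra memory in MB, extra CPU cores) spent on per-pod overhead."""
    pods = num_models * replicas_per_model
    return pods * overhead_mb_per_pod, pods * overhead_cpu_per_pod

mem_mb, cpu = isolation_overhead(
    num_models=300,           # "hundreds of models"
    replicas_per_model=2,     # minimal redundancy per model
    overhead_mb_per_pod=500,  # runtime/framework baseline per pod
    overhead_cpu_per_pod=0.1, # per-pod reservation (agents, probes, etc.)
)
print(f"~{mem_mb / 1024:.0f} GiB memory and ~{cpu:.0f} CPU cores of pure overhead")
```

Even with modest per-pod numbers, that's hundreds of GiB reserved before a single model serves a request, which is why the question below about total footprint matters.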
Was that the case for you? How do the total CPU cores and memory required by the new system compare with the old one?
Cheers!