Speculative Decoding with vLLM using Gemma
Improving LLM inferences with speculative decoding using Gemma
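As a rough sketch of what this looks like in practice, the snippet below pairs a larger Gemma target model with a smaller Gemma draft model in vLLM: the draft model proposes a few tokens cheaply, and the target model verifies them in one forward pass, accepting the prefix that matches its own distribution. The checkpoint names, the draft length, and the `speculative_model` / `num_speculative_tokens` arguments are assumptions based on vLLM releases around 0.4–0.6; newer releases expose the same idea through a `speculative_config` argument instead.

```python
from vllm import LLM, SamplingParams

# Minimal speculative decoding sketch (assumed vLLM ~0.4-0.6 API):
# - target model: Gemma 7B instruction-tuned checkpoint (assumed name)
# - draft model: Gemma 2B checkpoint that proposes candidate tokens,
#   which the target model verifies in a single forward pass
llm = LLM(
    model="google/gemma-7b-it",
    speculative_model="google/gemma-2b-it",
    num_speculative_tokens=5,  # tokens drafted per verification step (assumed value)
)

sampling = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(
    ["Explain speculative decoding in one short paragraph."],
    sampling,
)
print(outputs[0].outputs[0].text)
```

Because accepted draft tokens never change the target model's output distribution, the speed-up comes purely from verifying several tokens per target-model forward pass rather than generating one at a time.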