Speculative Decoding with vLLM
Improving LLM inference with speculative decoding
Product-focused machine learning engineer. Talks and writes about Machine Learning, MLOps, and Natural Language Processing. I share thoughts, wins, and failures.
Determining bottlenecks in your deep learning model can be crucial in reducing your model latency
Receiving Google Open Source Peer Bonus Award 2022
A collection of useful links with information about the inner workings of TFServing
Reinforcement Learning from Human Feedback (RLHF) is the concept that powers recent models like ChatGPT
A collection of useful links with information about model performance profiling
I find myself revisiting highly interesting Twitter threads. Here is a list of the most interesting threads sorted by topic ...