Speculative Decoding with vLLM using Gemma
Improving LLM inferences with speculative decoding using Gemma
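As a rough sketch of what this looks like in practice, the snippet below pairs a larger Gemma target model with a smaller Gemma draft model in vLLM: the draft model proposes a few tokens cheaply, and the target model verifies them in one forward pass, accepting the prefix that matches its own distribution. The checkpoint names, the draft length, and the `speculative_model` / `num_speculative_tokens` arguments are assumptions based on vLLM releases around 0.4–0.6; newer releases expose the same idea through a `speculative_config` argument instead.

```python
from vllm import LLM, SamplingParams

# Minimal speculative decoding sketch (assumed vLLM ~0.4-0.6 API):
# - target model: Gemma 7B instruction-tuned checkpoint (assumed name)
# - draft model: Gemma 2B checkpoint that proposes candidate tokens,
#   which the target model verifies in a single forward pass
llm = LLM(
    model="google/gemma-7b-it",
    speculative_model="google/gemma-2b-it",
    num_speculative_tokens=5,  # tokens drafted per verification step (assumed value)
)

sampling = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(
    ["Explain speculative decoding in one short paragraph."],
    sampling,
)
print(outputs[0].outputs[0].text)
```

Because accepted draft tokens never change the target model's output distribution, the speed-up comes purely from verifying several tokens per target-model forward pass rather than generating one at a time.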