About me

I am a Research Engineer at Google DeepMind. My research focuses on real-time interactive agents: AI agents that are situated with the user, react to streaming audio-visual inputs, and proactively help the user achieve their goals. I am fortunate to have worked closely with Matt Botvinick, Greg Wayne, Dilan Gorur, and many others on this.

Here are some related research questions that I’m interested in: How can we build a foundation model that understands and operates in real time? How can we make RLHF work for real-time interactions? How can we work around memory constraints for long-term video understanding and for a more personalized agent? What enables an agent to learn on the fly?

Previously, I worked on offline RL and RL for commercial cooling systems with Cosmin Paduraru and Daniel Mankowitz. I was also very interested in style transfer for both images and audio, back when GANs, VAEs, and cycle-consistency methods were popular. For audio, I collaborated with Jesse Engel and RJ Skerry-Ryan. I briefly worked on commercial applications of AlphaFold in drug discovery. Before DeepMind, I was part of the Natural Language Understanding team of Google Assistant. For my undergrad and Master’s, I worked with Doug Downey on word-vector-based NLP research.

Outside of work, I enjoy reading (my book recommendations), kitesurfing, playing guitar, binge-watching shows, and learning new languages. My name in non-Latin scripts: 李嘉銘 · リ　ジェリー. I used to be addicted to video games.

News

Nov 2023: I am applying to PhD programs!

Jerry Li
