Raghavv Goel

I am currently a Senior Deep Learning Researcher at Qualcomm AI Research, where I am part of the Efficient LLM team led by Mingu Lee and Chris Lott. Our research centers on lossless inference acceleration methods, efficient caching, and the design of efficient architectures for language modeling. Previously, I worked briefly with the Compiler Optimization team under the guidance of Will Zeng and Chris Lott, focusing on designing optimizations for running deep networks on non-GPU devices.

I hold an MS in Robotics Research from Carnegie Mellon University (CMU), where my research focused on control theory, computer vision, and reinforcement learning with applications in surgical robotics. I had the privilege of conducting my research under the mentorship of Professor Howie Choset and Professor John Galeotti, along with doctors at UPMC.

During my undergraduate studies at IIIT Delhi, I collaborated with Dr. Sayan Basu Roy and Dr. P. B. Sujit on projects involving adaptive control, parametric uncertainty, and multi-agent systems. I was honored with the ECE department's gold medals for best academic performance and all-round excellence. I still collaborate with Dr. Sayan for fun!

Additionally, I participated in CMU's RISS 2019 summer program, where I worked under the guidance of Professor Katia Sycara on multi-agent task allocation problems.

Email / CV / Google Scholar / GitHub

Research

I am keen on exploring the application of control theory (focusing on stability and optimality guarantees) and reinforcement learning (data-driven methods) to enhance our understanding of large language models. While my current research is dedicated to LLMs, I maintain a strong interest in robotics and stay updated on the latest advancements in the field.

Please feel free to reach out if you are interested in collaborating.

Efficient LLMs
On Speculative Decoding for Multimodal Large Language Models
M Gagrani*, R Goel*, W Jeon, J Park, M Lee^, C Lott^
Spotlight (top-4 papers)
CVPR Workshop on Efficient Large Vision Models (eVLM), 2024
Paper

This paper explores the application of speculative decoding to improve the inference efficiency of multimodal large language models (MLLMs), specifically the LLaVA 7B model. The key contribution is demonstrating that a language-only model can serve as an effective draft model for speculative decoding, achieving significant speedups without the need for image tokens.

Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs
R Goel, M Gagrani, W Jeon, J Park, M Lee, C Lott
ICLR Workshop on Understanding of Foundational Models, 2024
Paper

This paper proposes a framework for training draft models directly aligned with chat-fine-tuned large language models (LLMs). The key contribution is the introduction of the Llama 2 Chat Drafter 115M, which achieves up to 2.4× speed-up in inference relative to autoregressive decoding, using a novel Total Variation Distance++ (TVD++) loss for improved alignment.

Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement
W Jeon, M Gagrani, R Goel, J Park, M Lee, C Lott
ICLR Workshop on LLM Agents, 2024
Paper

This paper presents Recursive Speculative Decoding (RSD), a novel tree-based method that samples draft tokens without replacement to maximize diversity and efficiency. The key contribution is the empirical demonstration that RSD outperforms baseline methods in both fixed draft sequence length and fixed computational budget scenarios, significantly accelerating LLM inference.

Robotics, Control Theory and Multi-Agent Systems
Composite Adaptive Control for Time-varying Systems with Dual Adaptation
R Goel, SB Roy
IEEE Transactions on Automatic Control (TAC), 2025
Paper

This paper introduces a novel control architecture that employs a dual adaptation scheme to handle dynamical systems with time-varying uncertain parameters. Key contributions include the integration of projection and $\sigma$-modification algorithms to achieve global tracking error stability, and the use of a less restrictive initial excitation (IE) condition instead of the traditional persistence of excitation (PE) requirement for parameter estimation.

Motion-aware Needle Segmentation in Ultrasound Images
R Goel, C Morales*, M Singh*, A Dubrawski, J Galeotti, H Choset
International Symposium on Biomedical Imaging (ISBI), 2024
CVPR Workshop (medical vision), 2024
Paper

This paper presents a novel approach that combines classical Kalman Filter techniques with data-driven learning to improve needle segmentation in 2D ultrasound images. The key contributions include a framework compatible with encoder-decoder architectures, superior performance with a 15% reduction in pixel-wise needle tip error and an 8% reduction in length error, and the implementation of a learnable filter for non-linear needle motion.

Autonomous Ultrasound Scanning using Bayesian Optimization and Hybrid Force Control
R Goel*, Abhimanyu*, K Patel, J Galeotti, H Choset
International Conference on Robotics and Automation (ICRA), 2022
Paper

This paper proposes a robotic ultrasound system that leverages Bayesian Optimization (BO) and hybrid force control to autonomously scan regions for high-quality diagnostic images. Key contributions include the use of Gaussian processes to estimate a quality map based on expert demonstrations, and the integration of deep convolutional neural networks for real-time image quality feedback, achieving high accuracy in probe positioning and force application.

Closed-Loop Reference Model Based Distributed MRAC Using Cooperative Initial Excitation and Distributed Reference Input Estimation
R Goel*, T Garg, SB Roy
IEEE Transactions on Control of Network Systems (TCNS), 2022
Paper

This paper introduces a novel distributed model reference adaptive control (DMRAC) framework for multi-agent systems. Key contributions include the use of a closed-loop reference model (CRM) to enhance transient performance and the implementation of cooperative initial excitation (IE) for improved parameter estimation without the need for persistent excitation (PE) conditions.

Closed-loop reference model based distributed model reference adaptive control for multi-agent systems
R Goel*, SB Roy
IEEE Control Systems Letters (L-CSS), 2021
American Control Conference (ACC), 2021  
Paper

This paper presents a distributed control framework that integrates closed-loop reference models (CRM) to enhance the transient performance of multi-agent systems. Key contributions include the use of cooperative initial excitation (IE) for improved parameter estimation without relying on persistent excitation (PE) conditions, and the implementation of distributed reference input estimation to ensure robust and adaptive control across the network.

Leader and predator based swarm steering for multiple tasks
R Goel, J Lewis, MA Goodrich, PB Sujit
International Conference on Systems, Man, and Cybernetics (SMC), 2019
Paper / video

This paper explores the use of leaders and predators to influence robotic swarms in performing various tasks. Key contributions include the analysis of different swarm models (shepherding, Couzin's, and physicomimetic) using Monte Carlo simulations, and the demonstration that predator-based swarm splitting and steering significantly outperform other methods, even with large numbers of agents.

Dynamic Task Allocation Using Multi-Agent Mobile Robots
Raghavv Goel, Sha Yi, Jaskaran Singh Grover, Katia Sycara
RISS Journal, 2019  
poster / video

We propose solving the task allocation problem for heterogeneous agents using a mixed-integer linear program with collision-avoidance and communication-breakage constraints.

The website style is from here