Pavlo Molchanov

Pavlo Molchanov is a Distinguished Research Scientist and Team Manager at NVIDIA Research. Since 2023, he has been leading the Deep Learning Efficiency Team at NVIDIA Research. He obtained a PhD from Tampere University of Technology, Finland, in 2014 under the supervision of Karen Eguiazarian. During his studies, he received the Nokia Foundation Scholarship, GETA Graduate School grant, Best Paper Award, and Young Researcher Award at EuRAD. Recently, he has focused on efficiency in LLMs and multi-modal models: compression, NAS-like acceleration, novel architectures, and adaptive/conditional inference.

His past research has led to several NVIDIA product integrations: hand, body, and facial keypoint estimation and recognition in DriveIX, Broadcast, Omniverse, and Maxine; efficient vision backbones in TAO; compression techniques in TAO, NVIDIA AV, and TRT Model Optimization; and small in-game LLMs called Minitron.

We are always on the lookout for promising interns and full-time candidates in the area of LLM and VLM efficiency. Feel free to reach out to me for more details. I am also interested in connecting with individuals who share similar research interests.

Featured Publications

X-VILA: Cross-Modality Alignment for Large Language Model

Cross Video/Image/Language/Audio language foundation model.

Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

DoRA: Weight-Decomposed Low-Rank Adaptation

PEFT technique for efficient LLM/VLM adaptation. Significantly better than LoRA, supported in HF.

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen

AM-RADIO: Reduce All Domains Into One

Best vision foundation model, obtained by distilling multiple models such as CLIP, DINOv2, and SAM.

Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov

VILA: On Pre-training for Visual Language Models

Vision language foundation model. Multiple findings on how to train a better model.

Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han

FasterViT: Fast Vision Transformers with Hierarchical Attention

Vision transformer architecture with new hierarchical attention optimized for throughput and high-resolution images.

A. Hatamizadeh, G. Heinrich, H. Yin, A. Tao, J. Alvarez, J. Kautz, P. Molchanov

Heterogeneous Continual Learning

Introducing a new problem of continual learning with architecture progression.

D. Madaan, H. Yin, W. Byeon, J. Kautz, P. Molchanov

Recurrence without Recurrence: Stable Video Landmark Detection with Deep Equilibrium Models

Application of Deep Equilibrium Models to landmark estimation. Train on single isolated images, apply to videos to reduce jitter. New video landmarks dataset.

P. Micaelli, A. Vahdat, H. Yin, J. Kautz, P. Molchanov

Global context vision transformers

New vision transformer architecture optimized for number of parameters and FLOPs.

A. Hatamizadeh, H. Yin, J. Kautz, P. Molchanov

LANA: Latency Aware Network Acceleration

Fast NAS-like technique for trained model compression. Pretrains possible layer candidates via local distillation, then does NAS via integer linear programming.

P. Molchanov, J. Hall, H. Yin, J. Kautz, N. Fusi, A. Vahdat

Structural pruning via latency-saliency knapsack

Structural pruning formulated as a knapsack problem: keep the most salient neurons while meeting a latency budget.

M. Shen, H. Yin, P. Molchanov, L. Mao, J. Liu, J. Alvarez
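
The knapsack view named in the title above can be illustrated with a small sketch. This is a plain 0/1 knapsack written for illustration only, not the solver from the paper: the function name `select_under_latency_budget`, the integer latency costs, and the example numbers are all hypothetical.

```python
from typing import List, Tuple


def select_under_latency_budget(
    saliency: List[float], latency: List[int], budget: int
) -> Tuple[float, List[int]]:
    """Pick items maximizing total saliency with total latency <= budget.

    Illustrative 0/1 knapsack via dynamic programming. Latency costs are
    assumed to be non-negative integers (e.g. microseconds, rounded).
    """
    # best[b] = (best total saliency, chosen item indices) with latency <= b
    best: List[Tuple[float, List[int]]] = [(0.0, []) for _ in range(budget + 1)]
    for i in range(len(saliency)):
        # Iterate budgets downward so each item is used at most once.
        for b in range(budget, latency[i] - 1, -1):
            candidate = best[b - latency[i]][0] + saliency[i]
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - latency[i]][1] + [i])
    return best[budget]


# Hypothetical per-group saliency scores and latency costs.
total, kept = select_under_latency_budget(
    saliency=[6.0, 3.0, 2.5, 1.0], latency=[4, 2, 2, 1], budget=5
)
print(total, kept)
```

With these made-up numbers the sketch keeps items 0 and 3, which together use the full latency budget of 5.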

AViT: Adaptive Tokens for Efficient Vision Transformer

Transformer with adaptive inference where simpler images are classified faster. Tokens are automatically halted at various depths once they become irrelevant. Learned via a differentiable loss inspired by ACT.

H. Yin, A. Vahdat, J. Alvarez, A. Mallya, J. Kautz, P. Molchanov
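
As a rough illustration of token halting in the spirit of ACT, the sketch below stops a token once its accumulated halting score reaches a threshold. The cumulative-threshold rule, the function name `halting_depths`, the `eps` parameter, and the example scores are assumptions made for illustration; the sketch covers only the inference-time stopping rule, not the training loss.

```python
from typing import List


def halting_depths(halting_scores: List[List[float]], eps: float = 0.01) -> List[int]:
    """Return, per token, the number of layers that process it.

    halting_scores[l][t] is an assumed halting score in [0, 1] for token t
    at layer l. A token stops once its cumulative score reaches 1 - eps;
    tokens that never reach the threshold run through all layers.
    """
    num_layers = len(halting_scores)
    num_tokens = len(halting_scores[0])
    cumulative = [0.0] * num_tokens
    depth = [num_layers] * num_tokens
    active = [True] * num_tokens
    for layer_index, layer_scores in enumerate(halting_scores):
        for t in range(num_tokens):
            if not active[t]:
                continue  # halted tokens are skipped by later layers
            cumulative[t] += layer_scores[t]
            if cumulative[t] >= 1.0 - eps:
                active[t] = False
                depth[t] = layer_index + 1
    return depth


# Three layers, three tokens, hypothetical scores.
scores = [
    [0.5, 0.1, 1.0],
    [0.5, 0.1, 0.0],
    [0.5, 0.1, 0.0],
]
print(halting_depths(scores))
```

In this toy example the third token halts after one layer, the first after two, and the second is processed by all three layers.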

Global Vision Transformer Pruning with Hessian-Aware Saliency

Global pruning of vision transformer networks. New parameter redistribution rule for ViT. 2x latency reduction with minor accuracy loss. More than 1.4% accuracy gain when pruned toward a smaller model.

H. Yang, H. Yin, M. Shen, P. Molchanov, H. Li, J. Kautz

Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion

Statistics stored in batch normalization layers contain information about the training data. Via iterative optimization, we recover images from the training distribution and use them for various applications.

H. Yin, P. Molchanov, J. M. Alvarez, Z. Li, A. Mallya, D. Hoiem, N. Jha, J. Kautz

Publications

Full list with filter

X-VILA: Cross-Modality Alignment for Large Language Model. Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin (2024). In Arxiv.

PDF Cite

DoRA: Weight-Decomposed Low-Rank Adaptation. Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen (2024). ICML 2024.

PDF Cite Code Project Video ICML2024(Oral)

AM-RADIO: Reduce All Domains Into One. Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov (2023). In Arxiv.

PDF Cite Code CVPR2024

VILA: On Pre-training for Visual Language Models. Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han (2023). In Arxiv.

PDF Cite CVPR2024

FasterViT: Fast Vision Transformers with Hierarchical Attention. A. Hatamizadeh, G. Heinrich, H. Yin, A. Tao, J. Alvarez, J. Kautz, P. Molchanov (2023). In Arxiv.

PDF Cite Code ICLR2024

Heterogeneous Continual Learning. D. Madaan, H. Yin, W. Byeon, J. Kautz, P. Molchanov (2023). In CVPR 2023.

PDF Cite Code Video CVPR2023 Highlight

Recurrence without Recurrence: Stable Video Landmark Detection with Deep Equilibrium Models. P. Micaelli, A. Vahdat, H. Yin, J. Kautz, P. Molchanov (2023). In CVPR 2023.

PDF Cite Code Video CVPR2023

Global context vision transformers. A. Hatamizadeh, H. Yin, J. Kautz, P. Molchanov (2023). In ICML 2023.

PDF Cite Code ICML2023

LANA: Latency Aware Network Acceleration. P. Molchanov, J. Hall, H. Yin, J. Kautz, N. Fusi, A. Vahdat (2022). In ECCV 2022.

PDF Cite Video ECCV2022

Structural pruning via latency-saliency knapsack. M. Shen, H. Yin, P. Molchanov, L. Mao, J. Liu, J. Alvarez (2022). In NeurIPS 2022.

PDF Cite Code Video NeurIPS2022

See all publications