An-Chieh Cheng 鄭安傑
a8cheng at ucsd dot edu

I am a PhD student at the University of California, San Diego, advised by Prof. Xiaolong Wang. During my PhD, I interned at NVIDIA and Adobe, and my research has been supported by the Qualcomm Innovation Fellowship. Before my PhD, I earned my Master’s and Bachelor’s degrees in computer science from National Tsing Hua University.

I'm interested in building multimodal foundation models capable of general spatial understanding and actionable intelligence.

Google Scholar  /  Curriculum Vitæ  /  Github  /  LinkedIn  /  Twitter

News
Publications and preprints  /  Full List

Long-Horizon Manipulation via Trace-Conditioned VLA Planning
Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu
Preprint, 2026
Long-horizon manipulation via a task-management VLM with visual trace conditioning.

Grounded 3D-Aware Spatial Vision-Language Modeling
An-Chieh Cheng, Yang Fu, Yatai Ji, Ligeng Zhu, Guanqi Zhan, Zhuoyang Zhang, Zhaojing Yang, Song Han, Yao Lu, Pavlo Molchanov, Vidya Nariyambut Murali, Jan Kautz, Xiaolong Wang, Hongxu Yin, Sifei Liu
CVPR, 2026
A unified VLM for spatial reasoning and 3D grounding, using visual chain-of-thought (Thinking with Regions).

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Hanrong Ye et al.
ICLR, 2026
NVIDIA's state-of-the-art 9B Omni-Modal LLMs.

3D Aware Region Prompted Vision Language Model
An-Chieh Cheng, Yang Fu, Yukang Chen, Zhijian Liu, Xiaolong Li, Subhashree Radhakrishnan, Song Han, Yao Lu, Jan Kautz, Pavlo Molchanov, Hongxu Yin✝︎, Xiaolong Wang✝︎, Sifei Liu✝︎
ICLR, 2026
Region-level spatial reasoning for both single-view and multi-view inputs.

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Ruihan Yang*, Qinxi Yu*, Yecheng Wu, Rui Yan, Borui Li, An-Chieh Cheng, Xueyan Zou, Yunhao Fang, Xuxin Cheng, Ri-Zhao Qiu, Hongxu Yin, Sifei Liu, Song Han, Yao Lu, Xiaolong Wang
Preprint, 2025
A robust generalist model for dexterous manipulation that learns from diverse egocentric human manipulation videos.

NaVILA: Legged Robot Vision-Language-Action Model for Navigation
An-Chieh Cheng*, Yandong Ji*, Zhaojing Yang*, Zaitian Gongye, Xueyan Zou, Jan Kautz, Erdem Bıyık, Hongxu Yin✝︎, Sifei Liu✝︎, Xiaolong Wang✝︎
RSS, 2025
A two-level framework that combines VLAs with locomotion skills for navigation. The VLA is adapted from a VLM and learns from human touring videos.

NVILA: Efficient Frontier Visual Language Models
Zhijian Liu et al.
CVPR, 2025
A family of frontier VLMs designed for efficient training and inference.

SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models
An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang, Jan Kautz, Xiaolong Wang, Sifei Liu
NeurIPS, 2024
A powerful region-level VLM adept at 3D spatial reasoning.
✨ Demoed at GTC 2025 as part of Agentic AI for Physical Operations!

TUVF: Learning Generalizable Texture UV Radiance Fields
An-Chieh Cheng, Xueting Li, Sifei Liu✝︎, Xiaolong Wang✝︎
ICLR, 2024
Learning generalizable texture UV radiance fields for shapes.

Autoregressive 3D Shape Generation via Canonical Mapping
An-Chieh Cheng*, Xueting Li*, Sifei Liu, Min Sun, Ming-Hsuan Yang
ECCV, 2022
We decompose point clouds into meaningful shape sequences, then encode these sequences with a transformer for generation.

Learning 3D Dense Correspondence via Canonical Point Autoencoder
An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu
NeurIPS, 2021
Unsupervised learning of dense 3D correspondence.


Template from this awesome website.