Yuanbo Xiangli

I am a postdoctoral researcher at Cornell University, advised by Prof. Noah Snavely. Previously, I obtained my Ph.D. at the Multimedia Lab, Information Engineering, CUHK, supervised by Prof. Dahua Lin.

Email / CV / Google Scholar / GitHub

My research interests lie in 3D computer vision and deep generative modeling. Currently, I am working on photorealistic and efficient city scene reconstruction, manipulation, and generation based on multi-source data, including satellite imagery, oblique photography, street view panoramas, and urban planning information.

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Kai Zhang*, Sai Bi*, Hao Tan*, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu
arXiv preprint
project page / paper

We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU. Our model features a very simple transformer-based architecture: we patchify the posed input images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode the final per-pixel Gaussian parameters directly from these tokens for differentiable rendering.

GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction
Mulin Yu*, Tao Lu*, Linning Xu, Lihan Jiang, Yuanbo Xiangli ✉️, Bo Dai
arXiv preprint
project page / paper

We propose GSDF, a dual-branch system that improves rendering and reconstruction at the same time by leveraging mutual geometry regularization and guidance between Gaussian primitives and neural surfaces.

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Tao Lu*, Mulin Yu*, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai
CVPR, 2024
project page / paper

We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians and predicts their attributes on the fly based on viewing direction and distance within the view frustum.

AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation
Yuanbo Xiangli*, Linning Xu*, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin
ICCV, 2023
project page / paper

We propose a novel neural scene representation that learns a set of object-aware ground feature planes, from which an asset library storing template feature patches can be constructed in an unsupervised manner. The representation enables flexible and intuitive 3D scene editing at the instance, category, and scene levels.

MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond
Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Dahua Lin, Bo Dai
ICCV, 2023
project page / paper

We build a large-scale, comprehensive, and high-quality synthetic dataset for city-scale neural rendering research. Leveraging the Unreal Engine 5 City Sample project, we developed a pipeline to easily collect aerial and street city views with ground-truth camera poses, as well as a series of additional data modalities. Our pipeline also offers flexible control over environmental factors such as light, weather, and human and car crowds, supporting the needs of various tasks covering city-scale neural rendering and beyond.

Grid-guided Neural Radiance Fields for Large Urban Scenes
Linning Xu*, Yuanbo Xiangli*, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin
CVPR, 2023
project page / paper

This work targets the modeling of large urban regions and operates on real-world data sources. We use grid features to profile the scene and a lightweight NeRF to pick up details. The two-branch model produces photorealistic results at high rendering speed.

OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images
Weijia Li, Yawen Lai, Linning Xu, Yuanbo Xiangli, Jinhua Yu, Conghui He, Guisong Xia, Dahua Lin
CVPR, 2023
project page / paper

A new dataset containing multi-view satellite images and street-level panoramas, comprising over 100K pixel-wise annotated images that are well aligned and collected from 25K geo-locations.

BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering
Yuanbo Xiangli*, Linning Xu*, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, Dahua Lin
ECCV, 2022
project page / paper

An attempt to bring NeRF to potentially city-scale scenes, which requires rendering drastically varied observations (in level of detail and spatial coverage) at multiple scales.

BlockPlanner: City Block Generation with Vectorized Graph Representation
Linning Xu*, Yuanbo Xiangli*, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, Dahua Lin
ICCV, 2021
project page / paper

We use a graph-based VAE to automatically learn from a large amount of vectorized public urban planning data, enabling fast generation of batches of diverse and valid city block templates.

Real or Not Real, that is the Question
Yuanbo Xiangli*, Yubin Deng*, Bo Dai*, Chen Change Loy, Dahua Lin
ICLR, 2020 Spotlight
code / video / paper / zhihu

The proposed realness distribution provides stronger guidance to the generator and encourages it to learn more diverse outputs. It enables the simplest GAN structure to synthesize high-resolution portraits for the first time, with affordable computational overhead.

Nowhere to Hide: Cross-modal Identity Leakage between Biometrics and Devices
Chris Xiaoxuan Lu, Yang Li, Yuanbo Xiangli ✉️, Zhengxiong Li
WWW, 2020 Oral
code / HTML / paper

We explore the feasibility of compound identity leakage across cyber-physical spaces and reveal that co-located smart device IDs (e.g., smartphone MAC addresses) and physical biometrics (e.g., facial/vocal samples) are side channels to each other. An attacker can comprehensively profile victims along multiple dimensions with nearly zero analysis effort.

Autonomous Learning of Speaker Identity and WiFi Geofence from Noisy Sensor Data
Chris Xiaoxuan Lu, Yuanbo Xiangli, Peijun Zhao, Changhao Chen, Niki Trigoni, Andrew Markham
IEEE Internet Things J., 2019

iSCAN: automatic speaker adaptation via iterative cross-modality association
Yuanbo Xiangli, Chris Xiaoxuan Lu, Peijun Zhao, Changhao Chen, Andrew Markham
UbiComp/ISWC Adjunct, 2019

The proposed framework leverages the abundant side-channel information provided by the ubiquitous IoT environment in modern life, enabling the construction of an in-domain speaker recognition model with zero human enrollment.

The website template was borrowed from Jon Barron. Thanks for the generosity :)