Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Tong

SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

Nov 29, 2023

Mutian Xu, Xingyilang Yin, Lingteng Qiu, Yang Liu, Xin Tong, Xiaoguang Han

Figure 1 for SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

Figure 2 for SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

Figure 3 for SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

Figure 4 for SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

Abstract:We introduce SAMPro3D for zero-shot 3D indoor scene segmentation. Given the 3D point cloud and multiple posed 2D frames of 3D scenes, our approach segments 3D scenes by applying the pretrained Segment Anything Model (SAM) to 2D frames. Our key idea involves locating 3D points in scenes as natural 3D prompts to align their projected pixel prompts across frames, ensuring frame-consistency in both pixel prompts and their SAM-predicted masks. Moreover, we suggest filtering out low-quality 3D prompts based on feedback from all 2D frames, for enhancing segmentation quality. We also propose to consolidate different 3D prompts if they are segmenting the same object, bringing a more comprehensive segmentation. Notably, our method does not require any additional training on domain-specific data, enabling us to preserve the zero-shot power of SAM. Extensive qualitative and quantitative results show that our method consistently achieves higher quality and more diverse segmentation than previous zero-shot or fully supervised approaches, and in many cases even surpasses human-level annotations. The project page can be accessed at https://mutianxu.github.io/sampro3d/.

* Project page: https://mutianxu.github.io/sampro3d/

Via

Access Paper or Ask Questions

A Real-time Method for Inserting Virtual Objects into Neural Radiance Fields

Oct 09, 2023

Keyang Ye, Hongzhi Wu, Xin Tong, Kun Zhou

Figure 1 for A Real-time Method for Inserting Virtual Objects into Neural Radiance Fields

Figure 2 for A Real-time Method for Inserting Virtual Objects into Neural Radiance Fields

Figure 3 for A Real-time Method for Inserting Virtual Objects into Neural Radiance Fields

Figure 4 for A Real-time Method for Inserting Virtual Objects into Neural Radiance Fields

Abstract:We present the first real-time method for inserting a rigid virtual object into a neural radiance field, which produces realistic lighting and shadowing effects, as well as allows interactive manipulation of the object. By exploiting the rich information about lighting and geometry in a NeRF, our method overcomes several challenges of object insertion in augmented reality. For lighting estimation, we produce accurate, robust and 3D spatially-varying incident lighting that combines the near-field lighting from NeRF and an environment lighting to account for sources not covered by the NeRF. For occlusion, we blend the rendered virtual object with the background scene using an opacity map integrated from the NeRF. For shadows, with a precomputed field of spherical signed distance field, we query the visibility term for any point around the virtual object, and cast soft, detailed shadows onto 3D surfaces. Compared with state-of-the-art techniques, our approach can insert virtual object into scenes with superior fidelity, and has a great potential to be further applied to augmented reality systems.

Via

Access Paper or Ask Questions

"I'm Not Confident in Debiasing AI Systems Since I Know Too Little": Teaching AI Creators About Gender Bias Through Hands-on Tutorials

Sep 15, 2023

Kyrie Zhixuan Zhou, Jiaxun Cao, Xiaowen Yuan, Daniel E. Weissglass, Zachary Kilhoffer, Madelyn Rose Sanfilippo, Xin Tong

Figure 1 for "I'm Not Confident in Debiasing AI Systems Since I Know Too Little": Teaching AI Creators About Gender Bias Through Hands-on Tutorials

Figure 2 for "I'm Not Confident in Debiasing AI Systems Since I Know Too Little": Teaching AI Creators About Gender Bias Through Hands-on Tutorials

Figure 3 for "I'm Not Confident in Debiasing AI Systems Since I Know Too Little": Teaching AI Creators About Gender Bias Through Hands-on Tutorials

Figure 4 for "I'm Not Confident in Debiasing AI Systems Since I Know Too Little": Teaching AI Creators About Gender Bias Through Hands-on Tutorials

Abstract:Gender bias is rampant in AI systems, causing bad user experience, injustices, and mental harm to women. School curricula fail to educate AI creators on this topic, leaving them unprepared to mitigate gender bias in AI. In this paper, we designed hands-on tutorials to raise AI creators' awareness of gender bias in AI and enhance their knowledge of sources of gender bias and debiasing techniques. The tutorials were evaluated with 18 AI creators, including AI researchers, AI industrial practitioners (i.e., developers and product managers), and students who had learned AI. Their improved awareness and knowledge demonstrated the effectiveness of our tutorials, which have the potential to complement the insufficient AI gender bias education in CS/AI courses. Based on the findings, we synthesize design implications and a rubric to guide future research, education, and design efforts.

Via

Access Paper or Ask Questions

Adaptive conformal classification with noisy labels

Sep 10, 2023

Matteo Sesia, Y. X. Rachel Wang, Xin Tong

Figure 1 for Adaptive conformal classification with noisy labels

Figure 2 for Adaptive conformal classification with noisy labels

Figure 3 for Adaptive conformal classification with noisy labels

Figure 4 for Adaptive conformal classification with noisy labels

Abstract:This paper develops novel conformal prediction methods for classification tasks that can automatically adapt to random label contamination in the calibration sample, enabling more informative prediction sets with stronger coverage guarantees compared to state-of-the-art approaches. This is made possible by a precise theoretical characterization of the effective coverage inflation (or deflation) suffered by standard conformal inferences in the presence of label contamination, which is then made actionable through new calibration algorithms. Our solution is flexible and can leverage different modeling assumptions about the label contamination process, while requiring no knowledge about the data distribution or the inner workings of the machine-learning classifier. The advantages of the proposed methods are demonstrated through extensive simulations and an application to object classification with the CIFAR-10H image data set.

* 35 pages (98 pages including references and appendices)

Via

Access Paper or Ask Questions

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Sep 05, 2023

Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong

Figure 1 for AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Figure 2 for AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Figure 3 for AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Figure 4 for AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Abstract:Previous animatable 3D-aware GANs for human generation have primarily focused on either the human head or full body. However, head-only videos are relatively uncommon in real life, and full body generation typically does not deal with facial expression control and still has challenges in generating high-quality results. Towards applicable video avatars, we present an animatable 3D-aware GAN that generates portrait images with controllable facial expression, head pose, and shoulder movements. It is a generative model trained on unstructured 2D image collections without using 3D or video data. For the new task, we base our method on the generative radiance manifold representation and equip it with learnable facial and head-shoulder deformations. A dual-camera rendering and adversarial learning scheme is proposed to improve the quality of the generated faces, which is critical for portrait images. A pose deformation processing network is developed to generate plausible deformations for challenging regions such as long hair. Experiments show that our method, trained on unstructured 2D images, can generate diverse and high-quality 3D portraits with desired control over different properties.

* SIGGRAPH Asia 2023. Project Page: https://yuewuhkust.github.io/AniPortraitGAN/

Via

Access Paper or Ask Questions

Relighting Neural Radiance Fields with Shadow and Highlight Hints

Aug 25, 2023

Chong Zeng, Guojun Chen, Yue Dong, Pieter Peers, Hongzhi Wu, Xin Tong

Figure 1 for Relighting Neural Radiance Fields with Shadow and Highlight Hints

Figure 2 for Relighting Neural Radiance Fields with Shadow and Highlight Hints

Figure 3 for Relighting Neural Radiance Fields with Shadow and Highlight Hints

Figure 4 for Relighting Neural Radiance Fields with Shadow and Highlight Hints

Abstract:This paper presents a novel neural implicit radiance representation for free viewpoint relighting from a small set of unstructured photographs of an object lit by a moving point light source different from the view position. We express the shape as a signed distance function modeled by a multi layer perceptron. In contrast to prior relightable implicit neural representations, we do not disentangle the different reflectance components, but model both the local and global reflectance at each point by a second multi layer perceptron that, in addition, to density features, the current position, the normal (from the signed distace function), view direction, and light position, also takes shadow and highlight hints to aid the network in modeling the corresponding high frequency light transport effects. These hints are provided as a suggestion, and we leave it up to the network to decide how to incorporate these in the final relit result. We demonstrate and validate our neural implicit representation on synthetic and real scenes exhibiting a wide variety of shapes, material properties, and global illumination light transport.

* ACM SIGGRAPH 2023 Conference Proceedings
* Accepted to SIGGRAPH 2023. Author's version. Project page: https://nrhints.github.io/

Via

Access Paper or Ask Questions

Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

Jun 26, 2023

Rui Cao, Yilin Luo, Jinhua Xu, Xiaofei Luo, Ku Geng, Yousuf Aborahama, Manxiu Cui, Samuel Davis, Shuai Na, Xin Tong(+8 more)

Figure 1 for Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

Figure 2 for Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

Figure 3 for Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

Figure 4 for Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

Abstract:Photoacoustic computed tomography (PACT) is emerging as a new technique for functional brain imaging, primarily due to its capabilities in label-free hemodynamic imaging. Despite its potential, the transcranial application of PACT has encountered hurdles, such as acoustic attenuations and distortions by the skull and limited light penetration through the skull. To overcome these challenges, we have engineered a PACT system that features a densely packed hemispherical ultrasonic transducer array with 3072 channels, operating at a central frequency of 1 MHz. This system allows for single-shot 3D imaging at a rate equal to the laser repetition rate, such as 20 Hz. We have achieved a single-shot light penetration depth of approximately 9 cm in chicken breast tissue utilizing a 750 nm laser (withstanding 3295-fold light attenuation and still retaining an SNR of 74) and successfully performed transcranial imaging through an ex vivo human skull using a 1064 nm laser. Moreover, we have proven the capacity of our system to perform single-shot 3D PACT imaging in both tissue phantoms and human subjects. These results suggest that our PACT system is poised to unlock potential for real-time, in vivo transcranial functional imaging in humans.

Via

Access Paper or Ask Questions

Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

May 09, 2023

Xin-Yang Zheng, Hao Pan, Peng-Shuai Wang, Xin Tong, Yang Liu, Heung-Yeung Shum

Abstract:Although the recent rapid evolution of 3D generative neural networks greatly improves 3D shape generation, it is still not convenient for ordinary users to create 3D shapes and control the local geometry of generated shapes. To address these challenges, we propose a diffusion-based 3D generation framework -- locally attentional SDF diffusion, to model plausible 3D shapes, via 2D sketch image input. Our method is built on a two-stage diffusion model. The first stage, named occupancy-diffusion, aims to generate a low-resolution occupancy field to approximate the shape shell. The second stage, named SDF-diffusion, synthesizes a high-resolution signed distance field within the occupied voxels determined by the first stage to extract fine geometry. Our model is empowered by a novel view-aware local attention mechanism for image-conditioned shape generation, which takes advantage of 2D image patch features to guide 3D voxel feature learning, greatly improving local controllability and model generalizability. Through extensive experiments in sketch-conditioned and category-conditioned 3D shape generation tasks, we validate and demonstrate the ability of our method to provide plausible and diverse 3D shapes, as well as its superior controllability and generalizability over existing work. Our code and trained models are available at https://zhengxinyang.github.io/projects/LAS-Diffusion.html

* ACM Transactions on Graphics (SIGGRAPH), 42, 4 (August 2023), 13 pages
* Accepted to SIGGRAPH 2023 (Journal version)

Via

Access Paper or Ask Questions

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

Apr 24, 2023

Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo

Abstract:Pretrained backbones with fine-tuning have been widely adopted in 2D vision and natural language processing tasks and demonstrated significant advantages to task-specific networks. In this paper, we present a pretrained 3D backbone, named Swin3D, which first outperforms all state-of-the-art methods in downstream 3D indoor scene understanding tasks. Our backbone network is based on a 3D Swin transformer and carefully designed to efficiently conduct self-attention on sparse voxels with linear memory complexity and capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on a synthetic Structured3D dataset that is 10 times larger than the ScanNet dataset and fine-tuned the pretrained model in various downstream real-world indoor scene understanding tasks. The results demonstrate that our model pretrained on the synthetic dataset not only exhibits good generality in both downstream segmentation and detection on real 3D point datasets, but also surpasses the state-of-the-art methods on downstream tasks after fine-tuning with +2.3 mIoU and +2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation, +2.1 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, +8.1 mAP@0.5 on S3DIS detection. Our method demonstrates the great potential of pretrained 3D backbones with fine-tuning for 3D understanding tasks. The code and models are available at https://github.com/microsoft/Swin3D .

* Project page: https://yukichiii.github.io/project/swin3D/swin3D.html

Via

Access Paper or Ask Questions

3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

Apr 14, 2023

Siming Yan, Yuqi Yang, Yuxiao Guo, Hao Pan, Peng-shuai Wang, Xin Tong, Yang Liu, Qixing Huang

Figure 1 for 3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

Figure 2 for 3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

Figure 3 for 3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

Figure 4 for 3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

Abstract:Masked autoencoders (MAE) have recently been introduced to 3D self-supervised pretraining for point clouds due to their great success in NLP and computer vision. Unlike MAEs used in the image domain, where the pretext task is to restore features at the masked pixels, such as colors, the existing 3D MAE works reconstruct the missing geometry only, i.e, the location of the masked points. In contrast to previous studies, we advocate that point location recovery is inessential and restoring intrinsic point features is much superior. To this end, we propose to ignore point position reconstruction and recover high-order features at masked points including surface normals and surface variations, through a novel attention-based decoder which is independent of the encoder design. We validate the effectiveness of our pretext task and decoder design using different encoder structures for 3D training and demonstrate the advantages of our pretrained networks on various point cloud analysis tasks.

* 11 pages, 4 figures

Via

Access Paper or Ask Questions