Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

"Information": models, code, and papers

Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting

Jan 27, 2024
Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang

We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian splatting and position-based dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to Gaussian shader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. For more information, please visit our project page at \url{https://amysteriouscat.github.io/GaussianSplashing/}.

Via

Access Paper or Ask Questions

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution

Dec 28, 2023
Ying Wang, Tim G. J. Rudner, Andrew Gordon Wilson

Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability. To improve the interpretability of vision-language models such as CLIP, we propose a multi-modal information bottleneck (M2IB) approach that learns latent representations that compress irrelevant information while preserving relevant visual and textual features. We demonstrate how M2IB can be applied to attribution analysis of vision-language pretrained models, increasing attribution accuracy and improving the interpretability of such models when applied to safety-critical domains such as healthcare. Crucially, unlike commonly used unimodal attribution methods, M2IB does not require ground truth labels, making it possible to audit representations of vision-language pretrained models when multiple modalities but no ground-truth data is available. Using CLIP as an example, we demonstrate the effectiveness of M2IB attribution and show that it outperforms gradient-based, perturbation-based, and attention-based attribution methods both qualitatively and quantitatively.

* Published in Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

Via

Access Paper or Ask Questions

Efficient Tuning and Inference for Large Language Models on Textual Graphs

Jan 28, 2024
Yun Zhu, Yaoke Wang, Haizhou Shi, Siliang Tang

Rich textual and topological information of textual graphs need to be modeled in real-world applications such as webpages, e-commerce, and academic articles. Practitioners have been long following the path of adopting a shallow text encoder and a subsequent graph neural network (GNN) to solve this problem. In light of recent advancements in large language models (LLMs), it is apparent that integrating LLMs for enhanced textual encoding can substantially improve the performance of textual graphs. Nevertheless, the efficiency of these methods poses a significant challenge. In this paper, we propose ENGINE, a parameter- and memory-efficient fine-tuning method for textual graphs with an LLM encoder. The key insight is to combine the LLMs and GNNs through a tunable side structure, which significantly reduces the training complexity without impairing the joint model's capacity. Extensive experiments on textual graphs demonstrate our method's effectiveness by achieving the best model performance, meanwhile having the lowest training cost compared to previous methods. Moreover, we introduce two variants with caching and dynamic early exit to further enhance training and inference speed. Specifically, caching accelerates ENGINE's training by 12x, and dynamic early exit achieves up to 5x faster inference with a negligible performance drop (at maximum 1.17% relevant drop across 7 datasets).

Via

Access Paper or Ask Questions

S2TPVFormer: Spatio-Temporal Tri-Perspective View for temporally coherent 3D Semantic Occupancy Prediction

Jan 24, 2024
Sathira Silva, Savindu Bhashitha Wannigama, Roshan Ragel, Gihan Jayatilaka

Holistic understanding and reasoning in 3D scenes play a vital role in the success of autonomous driving systems. The evolution of 3D semantic occupancy prediction as a pretraining task for autonomous driving and robotic downstream tasks captures finer 3D details compared to methods like 3D detection. Existing approaches predominantly focus on spatial cues, often overlooking temporal cues. Query-based methods tend to converge on computationally intensive Voxel representation for encoding 3D scene information. This study introduces S2TPVFormer, an extension of TPVFormer, utilizing a spatiotemporal transformer architecture for coherent 3D semantic occupancy prediction. Emphasizing the importance of spatiotemporal cues in 3D scene perception, particularly in 3D semantic occupancy prediction, our work explores the less-explored realm of temporal cues. Leveraging Tri-Perspective View (TPV) representation, our spatiotemporal encoder generates temporally rich embeddings, improving prediction coherence while maintaining computational efficiency. To achieve this, we propose a novel Temporal Cross-View Hybrid Attention (TCVHA) mechanism, facilitating effective spatiotemporal information exchange across TPV views. Experimental evaluations on the nuScenes dataset demonstrate a substantial 3.1% improvement in mean Intersection over Union (mIoU) for 3D Semantic Occupancy compared to TPVFormer, confirming the effectiveness of the proposed S2TPVFormer in enhancing 3D scene perception.

Via

Access Paper or Ask Questions

Scale-free vision-based aerial control of a ground formation with hybrid topology

Jan 24, 2024
Miguel Aranda, Youcef Mezouar, Gonzalo López-Nicolás, Carlos Sagüés

We present a novel vision-based control method to make a group of ground mobile robots achieve a specified formation shape with unspecified size. Our approach uses multiple aerial control units equipped with downward-facing cameras, each observing a partial subset of the multirobot team. The units compute the control commands from the ground robots' image projections, using neither calibration nor scene scale information, and transmit them to the robots. The control strategy relies on the calculation of image similarity transformations, and we show it to be asymptotically stable if the overlaps between the subsets of controlled robots satisfy certain conditions. The presence of the supervisory units, which coordinate their motions to guarantee a correct control performance, gives rise to a hybrid system topology. All in all, the proposed system provides relevant practical advantages in simplicity and flexibility. Within the problem of controlling a team shape, our contribution lies in addressing several simultaneous challenges: the controller needs only partial information of the robotic group, does not use distance measurements or global reference frames, is designed for unicycle agents, and can accommodate topology changes. We present illustrative simulation results.

* M. Aranda, Y. Mezouar, G. L\'opez-Nicol\'as and C. Sag\"u\'es, "Scale-Free Vision-Based Aerial Control of a Ground Formation With Hybrid Topology," in IEEE Transactions on Control Systems Technology, vol. 27, no. 4, pp. 1703-1711, July 2019
* This is the accepted version an already published manuscript. See journal reference for details

Via

Access Paper or Ask Questions

Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection

Jan 24, 2024
Lucas Lange, Nils Wenzlitschke, Erhard Rahm

Smartwatch health sensor data is increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprises sensitive personal information and is resource-intensive to acquire for research purposes. In response to this challenge, we introduce the privacy-aware synthetization of multi-sensor smartwatch health readings related to moments of stress. Our method involves the generation of synthetic sequence data through Generative Adversarial Networks (GANs), coupled with the implementation of Differential Privacy (DP) safeguards for protecting patient information during model training. To ensure the integrity of our synthetic data, we employ a range of quality assessments and monitor the plausibility between synthetic and original data. To test the usefulness, we create private machine learning models on a commonly used, albeit small, stress detection dataset, exploring strategies for enhancing the existing data foundation with our synthetic data. Through our GAN-based augmentation methods, we observe improvements in model performance, both in non-private (0.45% F1) and private (11.90-15.48% F1) training scenarios. We underline the potential of differentially private synthetic data in optimizing utility-privacy trade-offs, especially with limited availability of real training samples.

Via

Access Paper or Ask Questions

Large receptive field strategy and important feature extraction strategy in 3D object detection

Jan 22, 2024
Leichao Cui, Xiuxian Li, Min Meng

The enhancement of 3D object detection is pivotal for precise environmental perception and improved task execution capabilities in autonomous driving. LiDAR point clouds, offering accurate depth information, serve as a crucial information for this purpose. Our study focuses on key challenges in 3D target detection. To tackle the challenge of expanding the receptive field of a 3D convolutional kernel, we introduce the Dynamic Feature Fusion Module (DFFM). This module achieves adaptive expansion of the 3D convolutional kernel's receptive field, balancing the expansion with acceptable computational loads. This innovation reduces operations, expands the receptive field, and allows the model to dynamically adjust to different object requirements. Simultaneously, we identify redundant information in 3D features. Employing the Feature Selection Module (FSM) quantitatively evaluates and eliminates non-important features, achieving the separation of output box fitting and feature extraction. This innovation enables the detector to focus on critical features, resulting in model compression, reduced computational burden, and minimized candidate frame interference. Extensive experiments confirm that both DFFM and FSM not only enhance current benchmarks, particularly in small target detection, but also accelerate network performance. Importantly, these modules exhibit effective complementarity.

Via

Access Paper or Ask Questions

ADA-GNN: Atom-Distance-Angle Graph Neural Network for Crystal Material Property Prediction

Jan 22, 2024
Jiao Huang, Qianli Xing, Jinglong Ji, Bo Yang

Property prediction is a fundamental task in crystal material research. To model atoms and structures, structures represented as graphs are widely used and graph learning-based methods have achieved significant progress. Bond angles and bond distances are two key structural information that greatly influence crystal properties. However, most of the existing works only consider bond distances and overlook bond angles. The main challenge lies in the time cost of handling bond angles, which leads to a significant increase in inference time. To solve this issue, we first propose a crystal structure modeling based on dual scale neighbor partitioning mechanism, which uses a larger scale cutoff for edge neighbors and a smaller scale cutoff for angle neighbors. Then, we propose a novel Atom-Distance-Angle Graph Neural Network (ADA-GNN) for property prediction tasks, which can process node information and structural information separately. The accuracy of predictions and inference time are improved with the dual scale modeling and the specially designed architecture of ADA-GNN. The experimental results validate that our approach achieves state-of-the-art results in two large-scale material benchmark datasets on property prediction tasks.

Via

Access Paper or Ask Questions

Inter-Domain Mixup for Semi-Supervised Domain Adaptation

Jan 21, 2024
Jichang Li, Guanbin Li, Yizhou Yu

Semi-supervised domain adaptation (SSDA) aims to bridge source and target domain distributions, with a small number of target labels available, achieving better classification performance than unsupervised domain adaptation (UDA). However, existing SSDA work fails to make full use of label information from both source and target domains for feature alignment across domains, resulting in label mismatch in the label space during model testing. This paper presents a novel SSDA approach, Inter-domain Mixup with Neighborhood Expansion (IDMNE), to tackle this issue. Firstly, we introduce a cross-domain feature alignment strategy, Inter-domain Mixup, that incorporates label information into model adaptation. Specifically, we employ sample-level and manifold-level data mixing to generate compatible training samples. These newly established samples, combined with reliable and actual label information, display diversity and compatibility across domains, while such extra supervision thus facilitates cross-domain feature alignment and mitigates label mismatch. Additionally, we utilize Neighborhood Expansion to leverage high-confidence pseudo-labeled samples in the target domain, diversifying the label information of the target domain and thereby further increasing the performance of the adaptation model. Accordingly, the proposed approach outperforms existing state-of-the-art methods, achieving significant accuracy improvements on popular SSDA benchmarks, including DomainNet, Office-Home, and Office-31.

* Publisted to Elsevier PR2024, available at https://www.sciencedirect.com/science/article/pii/S0031320323007203?via%3Dihub

Via

Access Paper or Ask Questions

Large Language Models for Generative Information Extraction: A Survey

Dec 29, 2023
Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Enhong Chen

Information extraction (IE) aims to extract structural knowledge (such as entities, relations, and events) from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation, allowing for generalization across various domains and tasks. As a result, numerous works have been proposed to harness abilities of LLMs and offer viable solutions for IE tasks based on a generative paradigm. To conduct a comprehensive systematic review and exploration of LLM efforts for IE tasks, in this study, we survey the most recent advancements in this field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and learning paradigms, then we empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs. Based on thorough review conducted, we identify several insights in technique and promising research directions that deserve further exploration in future studies. We maintain a public repository and consistently update related resources at: \url{https://github.com/quqxui/Awesome-LLM4IE-Papers}.

Via

Access Paper or Ask Questions