Wei Wang

School of Physics and Astronomy, Shanghai Jiao Tong University, State Key Laboratory of Dark Matter Physics, Shanghai Jiao Tong University, Tsung-Dao Lee Institute, Shanghai Jiao Tong University

Physics-Informed Regularization for Domain-Agnostic Dynamical System Modeling

Oct 08, 2024

Zijie Huang, Wanjia Zhao, Jingdong Gao, Ziniu Hu, Xiao Luo, Yadi Cao, Yuanzhou Chen, Yizhou Sun, Wei Wang

Abstract: Learning complex physical dynamics purely from data is challenging because the learned model must satisfy the intrinsic properties of the system. Incorporating physics-informed priors, such as in Hamiltonian Neural Networks (HNNs), achieves high-precision modeling for energy-conservative systems. However, real-world systems often deviate from strict energy conservation and follow different physical priors. To address this, we present a framework that achieves high-precision modeling for a wide range of dynamical systems from the numerical aspect, by enforcing Time-Reversal Symmetry (TRS) via a novel regularization term. It helps preserve energies for conservative systems while serving as a strong inductive bias for non-conservative, reversible systems. While TRS is a domain-specific physical prior, we present the first theoretical proof that TRS loss can universally improve modeling accuracy by minimizing higher-order Taylor terms in ODE integration, which is numerically beneficial to various systems regardless of their properties, even for irreversible systems. By integrating the TRS loss within neural ordinary differential equation models, the proposed model TREAT demonstrates superior performance on diverse physical systems. It achieves a significant 11.5% MSE improvement in a challenging chaotic triple-pendulum scenario, underscoring TREAT's broad applicability and effectiveness.

* Accepted to The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
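
The time-reversal idea in the abstract above can be illustrated with a small numerical sketch. It is not the paper's TREAT model or its exact loss: the hand-written vector fields `oscillator` and `damped`, the explicit Euler integrator, and the function `trs_loss` are all assumptions made here for illustration. The sketch rolls a system forward, restarts from the momentum-flipped final state, and measures how far the flipped second rollout is from the first one traversed backwards.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]  # (position q, momentum p)


def euler_rollout(f: Callable[[State], State], x0: State, h: float, n: int) -> List[State]:
    """Integrate dx/dt = f(x) with n explicit Euler steps of size h."""
    traj = [x0]
    for _ in range(n):
        q, p = traj[-1]
        dq, dp = f((q, p))
        traj.append((q + h * dq, p + h * dp))
    return traj


def reverse(x: State) -> State:
    """Time-reversal operator R: keep the position, flip the momentum."""
    q, p = x
    return (q, -p)


def trs_loss(f: Callable[[State], State], x0: State, h: float = 0.01, n: int = 100) -> float:
    """Mean squared gap between the forward rollout and the reversed rollout.

    Hypothetical helper: for a time-reversal-symmetric system integrated
    exactly, R(backward[k]) would equal forward[n - k] for every k.
    """
    forward = euler_rollout(f, x0, h, n)
    backward = euler_rollout(f, reverse(forward[-1]), h, n)
    total = 0.0
    for k in range(n + 1):
        fq, fp = forward[n - k]
        bq, bp = reverse(backward[k])
        total += (fq - bq) ** 2 + (fp - bp) ** 2
    return total / (n + 1)


def oscillator(x: State) -> State:
    """Undamped harmonic oscillator (conservative and reversible)."""
    q, p = x
    return (p, -q)


def damped(x: State) -> State:
    """Oscillator with linear friction (irreversible); 0.5 is an arbitrary damping."""
    q, p = x
    return (p, -q - 0.5 * p)


if __name__ == "__main__":
    print("undamped:", trs_loss(oscillator, (1.0, 0.0)))
    print("damped:  ", trs_loss(damped, (1.0, 0.0)))
```

In this toy setting the undamped oscillator yields a small discrepancy that comes only from the Euler integration error, while the damped system yields a much larger one. In a learned model, a term of this kind would be added to the data-fitting loss with a weighting coefficient, which is again an assumption about usage rather than a description of the paper's training code.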

Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering

Oct 07, 2024

Zimu Wang, Lei Xia, Wei Wang, Xinya Du

Abstract: As an essential task in information extraction (IE), Event-Event Causal Relation Extraction (ECRE) aims to identify and classify the causal relationships between event mentions in natural language texts. However, existing research on ECRE has highlighted two critical challenges: the lack of document-level modeling and causal hallucinations. In this paper, we propose a Knowledge-guided binary Question Answering (KnowQA) method with event structures for ECRE, consisting of two stages: Event Structure Construction and Binary Question Answering. We conduct extensive experiments under both zero-shot and fine-tuning settings with large language models (LLMs) on the MECI and MAVEN-ERE datasets. Experimental results demonstrate the usefulness of event structures for document-level ECRE and the effectiveness of KnowQA, which achieves state-of-the-art results on the MECI dataset. We observe not only the effectiveness but also the high generalizability and low inconsistency of our method, particularly with complete event structures after fine-tuning the models.

* Accepted at Findings of EMNLP 2024. Camera-ready version

Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter

Sep 28, 2024

Wei Wang, Chenyang Li, Zhaoxi Chen, Wenyu Zhang, Zetao Wang, Xi Guo, Jian Guan, Gang Li

Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a sleep-related breathing disorder associated with significant morbidity and mortality worldwide. The gold standard for OSAHS diagnosis, polysomnography (PSG), faces challenges in popularization due to its high cost and complexity. Recently, radar has shown potential in detecting sleep apnea-hypopnea events (SAE) with the advantages of low cost and non-contact monitoring. However, existing studies, especially those using deep learning, employ a segment-based classification approach for SAE detection, which makes estimating the number of events difficult. Additionally, radar-based SAE detection is susceptible to interference from body movements and the environment. Oxygen saturation (SpO2) can offer valuable information about OSAHS, but it also has certain limitations and cannot be used alone for diagnosis. In this study, we propose a method, called ROSA, that uses a millimeter-wave radar and a pulse oximeter to detect SAE. It fuses information from both sensors and directly predicts the temporal localization of SAE. Experimental results demonstrate a high degree of consistency (ICC=0.9864) between the AHI from ROSA and from PSG. This study presents an effective method with a low-load device for the diagnosis of OSAHS.
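
The link between event localization and the apnea-hypopnea index (AHI, events per hour of sleep) can be sketched as follows. This is not the ROSA method: the helper names `events_from_segments` and `apnea_hypopnea_index`, the 10-second segment length, and the rule of merging consecutive positive segments into one event are assumptions chosen to show why per-segment labels need an extra counting step before an AHI can be computed.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, float]  # (start time in seconds, end time in seconds)


def events_from_segments(labels: Sequence[int], seg_len_s: float) -> List[Event]:
    """Merge runs of consecutive positive segments into single events.

    Hypothetical post-processing for a segment-based classifier: each label
    covers seg_len_s seconds, and a run of 1s is treated as one event.
    """
    events: List[Event] = []
    start = None
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i * seg_len_s          # a new run of positives begins
        elif not lab and start is not None:
            events.append((start, i * seg_len_s))  # the run just ended
            start = None
    if start is not None:                  # run that reaches the end of the recording
        events.append((start, len(labels) * seg_len_s))
    return events


def apnea_hypopnea_index(events: Sequence[Event], total_sleep_time_s: float) -> float:
    """AHI = number of apnea/hypopnea events per hour of sleep."""
    if total_sleep_time_s <= 0:
        raise ValueError("total sleep time must be positive")
    return len(events) / (total_sleep_time_s / 3600.0)


if __name__ == "__main__":
    segs = [0, 1, 1, 0, 1, 0]
    evs = events_from_segments(segs, seg_len_s=10.0)
    print(evs, apnea_hypopnea_index(evs, total_sleep_time_s=3600.0))
```

A model that outputs event start and end times directly can skip the merging step, so the event count no longer depends on the chosen segment length.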

Deep Learning-based Automated Diagnosis of Obstructive Sleep Apnea and Sleep Stage Classification in Children Using Millimeter-wave Radar and Pulse Oximeter

Sep 28, 2024

Wei Wang, Ruobing Song, Yunxiao Wu, Li Zheng, Wenyu Zhang, Zhaoxi Chen, Gang Li, Zhifei Xu

Abstract: Study Objectives: To evaluate the agreement between the millimeter-wave radar-based device and polysomnography (PSG) in the diagnosis of obstructive sleep apnea (OSA) and the classification of sleep stages in children. Methods: 281 children, aged 1 to 18 years, who underwent sleep monitoring between September and November 2023 at the Sleep Center of Beijing Children's Hospital, Capital Medical University, were recruited for the study. All enrolled children underwent sleep monitoring by PSG and the millimeter-wave radar-based device, QSA600, simultaneously. QSA600 recordings were automatically analyzed using a deep learning model, while the PSG data were manually scored. Results: The Obstructive Apnea-Hypopnea Index (OAHI) obtained from QSA600 and PSG demonstrates a high level of agreement with an intraclass correlation coefficient of 0.945 (95% CI: 0.93 to 0.96). Bland-Altman analysis indicates that the mean difference of OAHI between QSA600 and PSG is -0.10 events/h (95% CI: -11.15 to 10.96). The deep learning model evaluated through cross-validation showed good sensitivity (81.8%, 84.3% and 89.7%) and specificity (90.5%, 95.3% and 97.1%) values for diagnosing children with OAHI>1, OAHI>5 and OAHI>10. The area under the receiver operating characteristic curve is 0.923, 0.955 and 0.988, respectively. For sleep stage classification, the model achieved Kappa coefficients of 0.854, 0.781, and 0.734, with corresponding overall accuracies of 95.0%, 84.8%, and 79.7% for Wake-sleep classification, Wake-REM-Light-Deep classification, and Wake-REM-N1-N2-N3 classification, respectively. Conclusions: QSA600 has demonstrated high agreement with PSG in diagnosing OSA and performing sleep staging in children. The device is portable, low-load and suitable for follow-up and long-term pediatric sleep assessment.
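
The agreement statistics named in the abstract above can be sketched on toy data. The numbers below are invented for illustration and are not from the study; the function names are hypothetical. The sketch assumes the conventional Bland-Altman construction, in which the interval reported next to the mean difference is the mean plus or minus 1.96 sample standard deviations of the paired differences. Whether the abstract's interval was computed exactly this way is not stated in the source.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(device: Sequence[float], reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [d - r for d, r in zip(device, reference)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd


def sensitivity_specificity(
    device: Sequence[float], reference: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Treat index > threshold as a positive diagnosis; reference is the ground truth."""
    tp = fp = tn = fn = 0
    for d, r in zip(device, reference):
        pred, truth = d > threshold, r > threshold
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and truth:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Toy OAHI values in events/h (made up for this example).
    device = [0.5, 2.0, 6.0, 12.0]
    psg = [0.8, 1.5, 4.0, 11.0]
    print(bland_altman(device, psg))
    print(sensitivity_specificity(device, psg, threshold=5.0))
```

Repeating `sensitivity_specificity` at thresholds 1, 5 and 10 would mirror the three cut-offs listed in the abstract.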

Dark Miner: Defend against unsafe generation for text-to-image diffusion models

Sep 26, 2024

Zheling Meng, Bo Peng, Xiaochuan Jin, Yue Jiang, Jing Dong, Wei Wang, Tieniu Tan

Abstract: Text-to-image diffusion models have been shown to generate unsafe content, such as violent, sexual, and shocking images, because of unfiltered large-scale training data, necessitating the erasure of unsafe concepts. Most existing methods focus on modifying the generation probabilities conditioned on the texts containing unsafe descriptions. However, they fail to guarantee safe generation for texts unseen in the training phase, especially for prompts from adversarial attacks. In this paper, we re-analyze the erasure task and point out that existing methods cannot guarantee the minimization of the total probabilities of unsafe generation. To tackle this problem, we propose Dark Miner. It entails a recurring three-stage process that comprises mining, verifying, and circumventing. It greedily mines embeddings with maximum generation probabilities of unsafe concepts and reduces unsafe generation more effectively. In the experiments, we evaluate its performance on two inappropriate concepts, two objects, and two styles. Compared with 6 previous state-of-the-art methods, our method achieves better erasure and defense results in most cases, especially under 4 state-of-the-art attacks, while preserving the model's native generation capability. Our code will be available on GitHub.

Schrodinger's Memory: Large Language Models

Sep 17, 2024

Wei Wang, Qing Li

Abstract: Memory is the foundation of all human activities; without memory, it would be nearly impossible for people to perform any task in daily life. With the development of Large Language Models (LLMs), their language capabilities are becoming increasingly comparable to those of humans. But do LLMs have memory? Based on current performance, LLMs do appear to exhibit memory. So, what is the underlying mechanism of this memory? Previous research has lacked a deep exploration of LLMs' memory capabilities and the underlying theory. In this paper, we use the Universal Approximation Theorem (UAT) to explain the memory mechanism in LLMs. We also conduct experiments to verify the memory capabilities of various LLMs, proposing a new method to assess their abilities based on this memory ability. We argue that LLM memory operates like Schrödinger's memory, meaning that it only becomes observable when a specific memory is queried. We can only determine if the model retains a memory based on its output in response to the query; otherwise, it remains indeterminate. Finally, we expand on this concept by comparing the memory capabilities of the human brain and LLMs, highlighting the similarities and differences in their operational mechanisms.

AllWeatherNet:Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions

Sep 03, 2024

Chenghao Qian, Mahdi Rezaei, Saeed Anwar, Wenjing Li, Tanveer Hussain, Mohsen Azarmi, Wei Wang

Abstract: Adverse conditions like snow, rain, nighttime, and fog pose challenges for autonomous driving perception systems. Existing methods have limited effectiveness in improving essential computer vision tasks, such as semantic segmentation, and often focus on only one specific condition, such as removing rain or translating nighttime images into daytime ones. To address these limitations, we propose a method to improve the visual quality and clarity degraded by such adverse conditions. Our method, AllWeather-Net, utilizes a novel hierarchical architecture to enhance images across all adverse conditions. This architecture incorporates information at three semantic levels: scene, object, and texture, by discriminating patches at each level. Furthermore, we introduce a Scaled Illumination-aware Attention Mechanism (SIAM) that guides the learning towards road elements critical for autonomous driving perception. SIAM exhibits robustness, remaining unaffected by changes in weather conditions or environmental scenes. AllWeather-Net effectively transforms images into normal weather and daytime scenes, demonstrating superior image enhancement results and subsequently enhancing the performance of semantic segmentation, with up to a 5.3% improvement in mIoU in the trained domain. We also show our model's generalization ability by applying it to unseen domains without re-training, achieving up to 3.9% mIoU improvement. Code can be accessed at: https://github.com/Jumponthemoon/AllWeatherNet.
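
The mIoU metric cited in the abstract above is the mean, over classes, of intersection over union between predicted and ground-truth labels. The following is a minimal sketch on flat label lists; the function name `mean_iou` and the choice to skip classes that appear in neither prediction nor ground truth are assumptions, and benchmark toolkits may handle absent classes or ignore labels differently.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes for per-pixel labels."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    print(mean_iou(pred, target, num_classes=2))
```

Comparing this value for a segmentation model run on the original images and on the enhanced images gives the kind of mIoU difference the abstract reports.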

PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View

Aug 29, 2024

Zichen Yu, Quanli Liu, Wei Wang, Liyong Zhang, Xiaoguang Zhao

Abstract: Recently, LSS-based multi-view 3D object detection has provided an economical and deployment-friendly solution for autonomous driving. However, all the existing LSS-based methods transform multi-view image features into a Cartesian Bird's-Eye-View (BEV) representation, which does not take into account the non-uniform image information distribution and hardly exploits the view symmetry. In this paper, in order to adapt to the image information distribution and preserve the view symmetry by regular convolution, we propose to employ the polar BEV representation to substitute the Cartesian BEV representation. To achieve this, we elaborately tailor three modules: a polar view transformer to generate the polar BEV representation, a polar temporal fusion module for fusing historical polar BEV features and a polar detection head to predict the polar-parameterized representation of the object. In addition, we design a 2D auxiliary detection head and a spatial attention enhancement module to improve the quality of feature extraction in perspective view and BEV, respectively. Finally, we integrate the above improvements into a novel multi-view 3D object detector, PolarBEVDet. Experiments on nuScenes show that PolarBEVDet achieves superior performance. The code is available at https://github.com/Yzichen/PolarBEVDet.git.

* 11 pages, 6 figures
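
The difference between a Cartesian and a polar BEV grid comes down to how a ground-plane point is assigned to a cell. The sketch below is not the paper's polar view transformer: the function name `polar_bev_cell` and the grid parameters (`r_max`, `n_r`, `n_theta`) are hypothetical values chosen for illustration. It maps an ego-centred point to a (range bin, azimuth bin) pair.

```python
import math
from typing import Optional, Tuple


def polar_bev_cell(
    x: float,
    y: float,
    r_max: float = 50.0,   # assumed maximum range in metres
    n_r: int = 50,         # assumed number of range bins
    n_theta: int = 360,    # assumed number of azimuth bins
) -> Optional[Tuple[int, int]]:
    """Map an ego-centred point (x, y) to a polar grid cell, or None if out of range."""
    r = math.hypot(x, y)
    if r >= r_max:
        return None
    theta = math.atan2(y, x)                   # angle in [-pi, pi]
    r_bin = int(r / (r_max / n_r))             # uniform bins along range
    theta_bin = int((theta + math.pi) / (2 * math.pi) * n_theta) % n_theta
    return r_bin, theta_bin


if __name__ == "__main__":
    print(polar_bev_cell(10.5, 0.1))
    print(polar_bev_cell(60.0, 0.0))
```

With uniform azimuth bins, cells near the ego vehicle cover less ground area than distant ones, and a rotation of the scene about the ego vehicle becomes a shift along the azimuth axis of the grid.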

GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content

Aug 29, 2024

Lebin Zhou, Kun Han, Nam Ling, Wei Wang, Wei Jiang

Abstract: Image restoration methods like super-resolution and image synthesis have been successfully used in commercial cloud gaming products like NVIDIA's DLSS. However, restoration over gaming content is not well studied by the general public. The discrepancy is mainly caused by the lack of ground-truth gaming training data that match the test cases. Due to the unique characteristics of gaming content, the common approach of generating pseudo training data by degrading the original HR images results in inferior restoration performance. In this work, we develop GameIR, a large-scale high-quality computer-synthesized ground-truth dataset to fill in the blanks, targeting two different applications. The first is super-resolution with deferred rendering, to support the gaming solution of rendering and transferring LR images only and restoring HR images on the client side. We provide 19,200 LR-HR paired ground-truth frames coming from 640 videos rendered at 720p and 1440p for this task. The second is novel view synthesis (NVS), to support the multiview gaming solution of rendering and transferring part of the multiview frames and generating the remaining frames on the client side. This task has 57,600 HR frames from 960 videos of 160 scenes with 6 camera views. In addition to the RGB frames, the GBuffers during the deferred rendering stage are also provided, which can be used to help restoration. Furthermore, we evaluate several SOTA super-resolution algorithms and NeRF-based NVS algorithms over our dataset, which demonstrates the effectiveness of our ground-truth GameIR data in improving restoration performance for gaming content. Also, we test the method of incorporating the GBuffers as additional input information for helping super-resolution and NVS. We release our dataset and models to the general public to facilitate research on restoration methods over gaming content.

BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems

Aug 28, 2024

Wei Wang, Dan Zhang, Tao Feng, Boyan Wang, Jie Tang

Abstract: Large Language Models (LLMs) are becoming increasingly powerful and capable of handling complex tasks, e.g., building single agents and multi-agent systems. Compared to single agents, multi-agent systems have higher requirements for the collaboration capabilities of language models. Many benchmarks have been proposed to evaluate their collaborative abilities. However, these benchmarks lack fine-grained evaluations of LLM collaborative capabilities. Additionally, multi-agent collaborative and competitive scenarios are ignored in existing work. To address these two problems, we propose a benchmark, called BattleAgentBench, which defines seven sub-stages of three varying difficulty levels and conducts a fine-grained evaluation of language models in terms of single-agent scenario navigation capabilities, paired-agent task execution abilities, and multi-agent collaboration and competition capabilities. We conducted extensive evaluations on four leading closed-source and seven open-source models. Experimental results indicate that API-based models perform excellently on simple tasks, but small open-source models struggle with them. Regarding difficult tasks that require collaborative and competitive abilities, although API-based models have demonstrated some collaborative capabilities, there is still enormous room for improvement.
