Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Li

Tsinghua University, Beijing, China

DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

Dec 25, 2023

Li Xiang, Junbo Yin, Wei Li, Cheng-Zhong Xu, Ruigang Yang, Jianbing Shen

Figure 1 for DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

Figure 2 for DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

Figure 3 for DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

Figure 4 for DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

Abstract:Vehicle-to-Everything (V2X) collaborative perception has recently gained significant attention due to its capability to enhance scene understanding by integrating information from various agents, e.g., vehicles, and infrastructure. However, current works often treat the information from each agent equally, ignoring the inherent domain gap caused by the utilization of different LiDAR sensors of each agent, thus leading to suboptimal performance. In this paper, we propose DI-V2X, that aims to learn Domain-Invariant representations through a new distillation framework to mitigate the domain discrepancy in the context of V2X 3D object detection. DI-V2X comprises three essential components: a domain-mixing instance augmentation (DMA) module, a progressive domain-invariant distillation (PDD) module, and a domain-adaptive fusion (DAF) module. Specifically, DMA builds a domain-mixing 3D instance bank for the teacher and student models during training, resulting in aligned data representation. Next, PDD encourages the student models from different domains to gradually learn a domain-invariant feature representation towards the teacher, where the overlapping regions between agents are employed as guidance to facilitate the distillation process. Furthermore, DAF closes the domain gap between the students by incorporating calibration-aware domain-adaptive attention. Extensive experiments on the challenging DAIR-V2X and V2XSet benchmark datasets demonstrate DI-V2X achieves remarkable performance, outperforming all the previous V2X models. Code is available at https://github.com/Serenos/DI-V2X

* aaai2024

Via

Access Paper or Ask Questions

CGS-Mask: Making Time Series Predictions Intuitive for All

Dec 20, 2023

Feng Lu, Wei Li, Yifei Sun, Cheng Song, Yufei Ren, Albert Y. Zomaya

Figure 1 for CGS-Mask: Making Time Series Predictions Intuitive for All

Figure 2 for CGS-Mask: Making Time Series Predictions Intuitive for All

Figure 3 for CGS-Mask: Making Time Series Predictions Intuitive for All

Figure 4 for CGS-Mask: Making Time Series Predictions Intuitive for All

Abstract:Artificial intelligence (AI) has immense potential in time series prediction, but most explainable tools have limited capabilities in providing a systematic understanding of important features over time. These tools typically rely on evaluating a single time point, overlook the time ordering of inputs, and neglect the time-sensitive nature of time series applications. These factors make it difficult for users, particularly those without domain knowledge, to comprehend AI model decisions and obtain meaningful explanations. We propose CGS-Mask, a post-hoc and model-agnostic cellular genetic strip mask-based saliency approach to address these challenges. CGS-Mask uses consecutive time steps as a cohesive entity to evaluate the impact of features on the final prediction, providing binary and sustained feature importance scores over time. Our algorithm optimizes the mask population iteratively to obtain the optimal mask in a reasonable time. We evaluated CGS-Mask on synthetic and real-world datasets, and it outperformed state-of-the-art methods in elucidating the importance of features over time. According to our pilot user study via a questionnaire survey, CGS-Mask is the most effective approach in presenting easily understandable time series prediction results, enabling users to comprehend the decision-making process of AI models with ease.

Via

Access Paper or Ask Questions

Domain Similarity-Perceived Label Assignment for Domain Generalized Underwater Object Detection

Dec 20, 2023

Xisheng Li, Wei Li, Pinhao Song, Mingjun Zhang, Jie Zhou

Abstract:The inherent characteristics and light fluctuations of water bodies give rise to the huge difference between different layers and regions in underwater environments. When the test set is collected in a different marine area from the training set, the issue of domain shift emerges, significantly compromising the model's ability to generalize. The Domain Adversarial Learning (DAL) training strategy has been previously utilized to tackle such challenges. However, DAL heavily depends on manually one-hot domain labels, which implies no difference among the samples in the same domain. Such an assumption results in the instability of DAL. This paper introduces the concept of Domain Similarity-Perceived Label Assignment (DSP). The domain label for each image is regarded as its similarity to the specified domains. Through domain-specific data augmentation techniques, we achieved state-of-the-art results on the underwater cross-domain object detection benchmark S-UODAC2020. Furthermore, we validated the effectiveness of our method in the Cityscapes dataset.

* 10 pages,7 figures

Via

Access Paper or Ask Questions

UINav: A maker of UI automation agents

Dec 15, 2023

Wei Li, Fu-Lin Hsu, Will Bishop, Folawiyo Campbell-Ajala, Oriana Riva, Max Lin

Figure 1 for UINav: A maker of UI automation agents

Figure 2 for UINav: A maker of UI automation agents

Figure 3 for UINav: A maker of UI automation agents

Figure 4 for UINav: A maker of UI automation agents

Abstract:An automation system that can execute natural language instructions by driving the user interface (UI) of an application can benefit users, especially when situationally or permanently impaired. Traditional automation systems (manual scripting, programming by demonstration tools, etc.) do not produce generalizable models that can tolerate changes in the UI or task workflow. Machine-learned automation agents generalize better, but either work only in simple, hand-crafted applications or rely on large pre-trained models, which may be too computationally expensive to run on mobile devices. In this paper, we propose \emph{UINav}, a demonstration-based agent maker system. UINav agents are lightweight enough to run on mobile devices, yet they achieve high success rates with a modest number of task demonstrations. To minimize the number of task demonstrations, UINav includes a referee model that allows users to receive immediate feedback on tasks where the agent is failing to best guide efforts to collect additional demonstrations. Further, UINav adopts macro actions to reduce an agent's state space, and augments human demonstrations to increase the diversity of training data. Our evaluation demonstrates that with an average of 10 demonstrations per task UINav can achieve an accuracy of 70\% or higher, and that with enough demonstrations it can achieve near-perfect success rates on 40+ different tasks.

Via

Access Paper or Ask Questions

CBQ: Cross-Block Quantization for Large Language Models

Dec 13, 2023

Xin Ding, Xiaoyu Liu, Yun Zhang, Zhijun Tu, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin(+1 more)

Figure 1 for CBQ: Cross-Block Quantization for Large Language Models

Figure 2 for CBQ: Cross-Block Quantization for Large Language Models

Figure 3 for CBQ: Cross-Block Quantization for Large Language Models

Figure 4 for CBQ: Cross-Block Quantization for Large Language Models

Abstract:Post-training quantization (PTQ) has driven attention to producing efficient large language models (LLMs) with ultra-low costs. Since hand-craft quantization parameters lead to low performance in low-bit quantization, recent methods optimize the quantization parameters through block-wise reconstruction between the floating-point and quantized models. However, these methods suffer from two challenges: accumulated errors from independent one-by-one block quantization and reconstruction difficulties from extreme weight and activation outliers. To address these two challenges, we propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. To reduce error accumulation, we introduce a cross-block dependency with the aid of a homologous reconstruction scheme to build the long-range dependency between adjacent multi-blocks with overlapping. To reduce reconstruction difficulty, we design a coarse-to-fine pre-processing (CFP) to truncate weight outliers and dynamically scale activation outliers before optimization, and an adaptive rounding scheme, called LoRA-Rounding, with two low-rank learnable matrixes to further rectify weight quantization errors. Extensive experiments demonstrate that: (1) CBQ pushes both activation and weight quantization to low-bit settings W4A4, W4A8, and W2A16. (2) CBQ achieves better performance than the existing state-of-the-art methods on various LLMs and benchmark datasets.

Via

Access Paper or Ask Questions

Brain Computer Interface Technology for Future Battlefield

Dec 13, 2023

Guodong Xiong, Xinyan Ma, Wei Li, Jiaqi Cao, Jian Zhong, Yicong Su

Abstract:With the development of artificial intelligence and unmanned equipment, human-machine hybrid formations will be the main focus in future combat formations. With the development of big data and various situational awareness technologies, while enhancing the breadth and depth of information, decision-making has also become more complex. The operation mode of existing unmanned equipment often requires complex manual input, which is not conducive to the battlefield environment. How to reduce the cognitive load of information exchange between soldiers and various unmanned equipment is an important issue in future intelligent warfare. This paper proposes a brain computer interface communication system for soldier combat, which takes into account the characteristics of soldier combat scenarios in design. The stimulation paradigm is combined with helmets, portable computers, and firearms, and brain computer interface technology is used to achieve fast, barrier free, and hands-free communication between humans and machines. Intelligent algorithms are combined to assist decision-making in fully perceiving and fusing situational information on the battlefield, and a large amount of data is processed quickly, understanding and integrating a large amount of data from human and machine networks, achieving real-time perception of battlefield information, making intelligent decisions, and achieving the effect of direct control of drone swarms and other equipment by the human brain to assist in soldier scenarios.

* 4 pages, 1 figure

Via

Access Paper or Ask Questions

GenDet: Towards Good Generalizations for AI-Generated Image Detection

Dec 12, 2023

Mingjian Zhu, Hanting Chen, Mouxiao Huang, Wei Li, Hailin Hu, Jie Hu, Yunhe Wang

Figure 1 for GenDet: Towards Good Generalizations for AI-Generated Image Detection

Figure 2 for GenDet: Towards Good Generalizations for AI-Generated Image Detection

Figure 3 for GenDet: Towards Good Generalizations for AI-Generated Image Detection

Figure 4 for GenDet: Towards Good Generalizations for AI-Generated Image Detection

Abstract:The misuse of AI imagery can have harmful societal effects, prompting the creation of detectors to combat issues like the spread of fake news. Existing methods can effectively detect images generated by seen generators, but it is challenging to detect those generated by unseen generators. They do not concentrate on amplifying the output discrepancy when detectors process real versus fake images. This results in a close output distribution of real and fake samples, increasing classification difficulty in detecting unseen generators. This paper addresses the unseen-generator detection problem by considering this task from the perspective of anomaly detection and proposes an adversarial teacher-student discrepancy-aware framework. Our method encourages smaller output discrepancies between the student and the teacher models for real images while aiming for larger discrepancies for fake images. We employ adversarial learning to train a feature augmenter, which promotes smaller discrepancies between teacher and student networks when the inputs are fake images. Our method has achieved state-of-the-art on public benchmarks, and the visualization results show that a large output discrepancy is maintained when faced with various types of generators.

Via

Access Paper or Ask Questions

Unsupervised learning of site percolation based on shuffled configurations

Nov 20, 2023

Dian Xu, Shanshan Wang, Feng Gao, Wei Li, Jianmin Shen

Figure 1 for Unsupervised learning of site percolation based on shuffled configurations

Figure 2 for Unsupervised learning of site percolation based on shuffled configurations

Figure 3 for Unsupervised learning of site percolation based on shuffled configurations

Figure 4 for Unsupervised learning of site percolation based on shuffled configurations

Abstract:In the field of statistical physics, machine learning has gained significant popularity and has achieved remarkable results in recent studies on phase transitions.In this paper, we apply Principal Component Analysis (PCA) and Autoencoder(AE) based on Unsupervised learning to study the various configurations of the percolation model in equilibrium phase transition. In certain phase transition models, such as the DP model in non-equilibrium phase transitions, the order parameter is particle density. However, in some other phase transition models, such as the percolation model, it is not. This study involved randomizing and selecting percolation graphs to be used as input for a neural network, and analyzed the obtained results, indicating that the outputs of the single latent variable of AE and the first principal component of PCA are signals related to particle density.

* 10 pages,8 figures,41 references

Via

Access Paper or Ask Questions

FireMatch: A Semi-Supervised Video Fire Detection Network Based on Consistency and Distribution Alignment

Nov 09, 2023

Qinghua Lin, Zuoyong Li, Kun Zeng, Haoyi Fan, Wei Li, Xiaoguang Zhou

Figure 1 for FireMatch: A Semi-Supervised Video Fire Detection Network Based on Consistency and Distribution Alignment

Figure 2 for FireMatch: A Semi-Supervised Video Fire Detection Network Based on Consistency and Distribution Alignment

Figure 3 for FireMatch: A Semi-Supervised Video Fire Detection Network Based on Consistency and Distribution Alignment

Figure 4 for FireMatch: A Semi-Supervised Video Fire Detection Network Based on Consistency and Distribution Alignment

Abstract:Deep learning techniques have greatly enhanced the performance of fire detection in videos. However, video-based fire detection models heavily rely on labeled data, and the process of data labeling is particularly costly and time-consuming, especially when dealing with videos. Considering the limited quantity of labeled video data, we propose a semi-supervised fire detection model called FireMatch, which is based on consistency regularization and adversarial distribution alignment. Specifically, we first combine consistency regularization with pseudo-label. For unlabeled data, we design video data augmentation to obtain corresponding weakly augmented and strongly augmented samples. The proposed model predicts weakly augmented samples and retains pseudo-label above a threshold, while training on strongly augmented samples to predict these pseudo-labels for learning more robust feature representations. Secondly, we generate video cross-set augmented samples by adversarial distribution alignment to expand the training data and alleviate the decline in classification performance caused by insufficient labeled data. Finally, we introduce a fairness loss to help the model produce diverse predictions for input samples, thereby addressing the issue of high confidence with the non-fire class in fire classification scenarios. The FireMatch achieved an accuracy of 76.92% and 91.81% on two real-world fire datasets, respectively. The experimental results demonstrate that the proposed method outperforms the current state-of-the-art semi-supervised classification methods.

Via

Access Paper or Ask Questions

An invariant feature extraction for multi-modal images matching

Nov 06, 2023

Chenzhong Gao, Wei Li

Abstract:This paper aims at providing an effective multi-modal images invariant feature extraction and matching algorithm for the application of multi-source data analysis. Focusing on the differences and correlation of multi-modal images, a feature-based matching algorithm is implemented. The key technologies include phase congruency (PC) and Shi-Tomasi feature point for keypoints detection, LogGabor filter and a weighted partial main orientation map (WPMOM) for feature extraction, and a multi-scale process to deal with scale differences and optimize matching results. The experimental results on practical data from multiple sources prove that the algorithm has effective performances on multi-modal images, which achieves accurate spatial alignment, showing practical application value and good generalization.

Via

Access Paper or Ask Questions