Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lei Li

HyperAI Team, Xiaomi Corporation

When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach

Jan 24, 2025

Qian Chen, Lei Li, Qian Li, Jianghua Wu, Akang Wang, Ruoyu Sun, Xiaodong Luo, Tsung-Hui Chang, Qingjiang Shi

Abstract:A common characteristic in integer linear programs (ILPs) is symmetry, allowing variables to be permuted without altering the underlying problem structure. Recently, GNNs have emerged as a promising approach for solving ILPs. However, a significant challenge arises when applying GNNs to ILPs with symmetry: classic GNN architectures struggle to differentiate between symmetric variables, which limits their predictive accuracy. In this work, we investigate the properties of permutation equivariance and invariance in GNNs, particularly in relation to the inherent symmetry of ILP formulations. We reveal that the interaction between these two factors contributes to the difficulty of distinguishing between symmetric variables. To address this challenge, we explore the potential of feature augmentation and propose several guiding principles for constructing augmented features. Building on these principles, we develop an orbit-based augmentation scheme that first groups symmetric variables and then samples augmented features for each group from a discrete uniform distribution. Empirical results demonstrate that our proposed approach significantly enhances both training efficiency and predictive performance.

Via

Access Paper or Ask Questions

Reassessing the Role of Chain-of-Thought in Sentiment Analysis: Insights and Limitations

Jan 15, 2025

Kaiyuan Zheng, Qinghua Zhao, Lei Li

Figure 1 for Reassessing the Role of Chain-of-Thought in Sentiment Analysis: Insights and Limitations

Figure 2 for Reassessing the Role of Chain-of-Thought in Sentiment Analysis: Insights and Limitations

Figure 3 for Reassessing the Role of Chain-of-Thought in Sentiment Analysis: Insights and Limitations

Figure 4 for Reassessing the Role of Chain-of-Thought in Sentiment Analysis: Insights and Limitations

Abstract:The relationship between language and thought remains an unresolved philosophical issue. Existing viewpoints can be broadly categorized into two schools: one asserting their independence, and another arguing that language constrains thought. In the context of large language models, this debate raises a crucial question: Does a language model's grasp of semantic meaning depend on thought processes? To explore this issue, we investigate whether reasoning techniques can facilitate semantic understanding. Specifically, we conceptualize thought as reasoning, employ chain-of-thought prompting as a reasoning technique, and examine its impact on sentiment analysis tasks. The experiments show that chain-of-thought has a minimal impact on sentiment analysis tasks. Both the standard and chain-of-thought prompts focus on aspect terms rather than sentiment in the generated content. Furthermore, counterfactual experiments reveal that the model's handling of sentiment tasks primarily depends on information from demonstrations. The experimental results support the first viewpoint.

Via

Access Paper or Ask Questions

Contrast-Free Myocardial Scar Segmentation in Cine MRI using Motion and Texture Fusion

Jan 09, 2025

Guang Yang, Jingkun Chen, Xicheng Sheng, Shan Yang, Xiahai Zhuang, Betty Raman, Lei Li, Vicente Grau

Abstract:Late gadolinium enhancement MRI (LGE MRI) is the gold standard for the detection of myocardial scars for post myocardial infarction (MI). LGE MRI requires the injection of a contrast agent, which carries potential side effects and increases scanning time and patient discomfort. To address these issues, we propose a novel framework that combines cardiac motion observed in cine MRI with image texture information to segment the myocardium and scar tissue in the left ventricle. Cardiac motion tracking can be formulated as a full cardiac image cycle registration problem, which can be solved via deep neural networks. Experimental results prove that the proposed method can achieve scar segmentation based on non-contrasted cine images with comparable accuracy to LGE MRI. This demonstrates its potential as an alternative to contrast-enhanced techniques for scar detection.

* 5 pages, 2figs, 2tables

Via

Access Paper or Ask Questions

Addressing Domain Shift via Imbalance-Aware Domain Adaptation in Embryo Development Assessment

Jan 09, 2025

Lei Li, Xinglin Zhang, Jun Liang, Tao Chen

Abstract:Deep learning models in medical imaging face dual challenges: domain shift, where models perform poorly when deployed in settings different from their training environment, and class imbalance, where certain disease conditions are naturally underrepresented. We present Imbalance-Aware Domain Adaptation (IADA), a novel framework that simultaneously tackles both challenges through three key components: (1) adaptive feature learning with class-specific attention mechanisms, (2) balanced domain alignment with dynamic weighting, and (3) adaptive threshold optimization. Our theoretical analysis establishes convergence guarantees and complexity bounds. Through extensive experiments on embryo development assessment across four imaging modalities, IADA demonstrates significant improvements over existing methods, achieving up to 25.19\% higher accuracy while maintaining balanced performance across classes. In challenging scenarios with low-quality imaging systems, IADA shows robust generalization with AUC improvements of up to 12.56\%. These results demonstrate IADA's potential for developing reliable and equitable medical imaging systems for diverse clinical settings. The code is made public available at \url{https://github.com/yinghemedical/imbalance-aware_domain_adaptation}

* 15 pages

Via

Access Paper or Ask Questions

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Dec 30, 2024

Liang Chen, Zekun Wang, Shuhuai Ren, Lei Li, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong(+17 more)

Figure 1 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Figure 2 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Figure 3 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Figure 4 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Abstract:Building on the foundations of language modeling in natural language processing, Next Token Prediction (NTP) has evolved into a versatile training objective for machine learning tasks across various modalities, achieving considerable success. As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the NTP framework, transforming the multimodal information into tokens and predict the next one given the context. This survey introduces a comprehensive taxonomy that unifies both understanding and generation within multimodal learning through the lens of NTP. The proposed taxonomy covers five key aspects: Multimodal tokenization, MMNTP model architectures, unified task representation, datasets \& evaluation, and open challenges. This new taxonomy aims to aid researchers in their exploration of multimodal intelligence. An associated GitHub repository collecting the latest papers and repos is available at https://github.com/LMM101/Awesome-Multimodal-Next-Token-Prediction

* 69 papes, 18 figures, repo at https://github.com/LMM101/Awesome-Multimodal-Next-Token-Prediction

Via

Access Paper or Ask Questions

Seamless Detection: Unifying Salient Object Detection and Camouflaged Object Detection

Dec 22, 2024

Yi Liu, Chengxin Li, Xiaohui Dong, Lei Li, Dingwen Zhang, Shoukun Xu, Jungong Han

Abstract:Achieving joint learning of Salient Object Detection (SOD) and Camouflaged Object Detection (COD) is extremely challenging due to their distinct object characteristics, i.e., saliency and camouflage. The only preliminary research treats them as two contradictory tasks, training models on large-scale labeled data alternately for each task and assessing them independently. However, such task-specific mechanisms fail to meet real-world demands for addressing unknown tasks effectively. To address this issue, in this paper, we pioneer a task-agnostic framework to unify SOD and COD. To this end, inspired by the agreeable nature of binary segmentation for SOD and COD, we propose a Contrastive Distillation Paradigm (CDP) to distil the foreground from the background, facilitating the identification of salient and camouflaged objects amidst their surroundings. To probe into the contribution of our CDP, we design a simple yet effective contextual decoder involving the interval-layer and global context, which achieves an inference speed of 67 fps. Besides the supervised setting, our CDP can be seamlessly integrated into unsupervised settings, eliminating the reliance on extensive human annotations. Experiments on public SOD and COD datasets demonstrate the superiority of our proposed framework in both supervised and unsupervised settings, compared with existing state-of-the-art approaches. Code is available on https://github.com/liuyi1989/Seamless-Detection.

Via

Access Paper or Ask Questions

GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models

Dec 17, 2024

Mukai Li, Lei Li, Shansan Gong, Qi Liu

Figure 1 for GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models

Figure 2 for GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models

Figure 3 for GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models

Figure 4 for GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models

Abstract:Visual Language Models (VLMs) demonstrate impressive capabilities in processing multimodal inputs, yet applications such as visual agents, which require handling multiple images and high-resolution videos, demand enhanced long-range modeling. Moreover, existing open-source VLMs lack systematic exploration into extending their context length, and commercial models often provide limited details. To tackle this, we aim to establish an effective solution that enhances long context performance of VLMs while preserving their capacities in short context scenarios. Towards this goal, we make the best design choice through extensive experiment settings from data curation to context window extending and utilizing: (1) we analyze data sources and length distributions to construct ETVLM - a data recipe to balance the performance across scenarios; (2) we examine existing position extending methods, identify their limitations and propose M-RoPE++ as an enhanced approach; we also choose to solely instruction-tune the backbone with mixed-source data; (3) we discuss how to better utilize extended context windows and propose hybrid-resolution training. Built on the Qwen-VL series model, we propose Giraffe, which is effectively extended to 128K lengths. Evaluated on extensive long context VLM benchmarks such as VideoMME and Viusal Haystacks, our Giraffe achieves state-of-the-art performance among similarly sized open-source long VLMs and is competitive with commercial model GPT-4V. We will open-source the code, data, and models.

* Working in progress

Via

Access Paper or Ask Questions

MeshArt: Generating Articulated Meshes with Structure-guided Transformers

Dec 16, 2024

Daoyi Gao, Yawar Siddiqui, Lei Li, Angela Dai

Figure 1 for MeshArt: Generating Articulated Meshes with Structure-guided Transformers

Figure 2 for MeshArt: Generating Articulated Meshes with Structure-guided Transformers

Figure 3 for MeshArt: Generating Articulated Meshes with Structure-guided Transformers

Figure 4 for MeshArt: Generating Articulated Meshes with Structure-guided Transformers

Abstract:Articulated 3D object generation is fundamental for creating realistic, functional, and interactable virtual assets which are not simply static. We introduce MeshArt, a hierarchical transformer-based approach to generate articulated 3D meshes with clean, compact geometry, reminiscent of human-crafted 3D models. We approach articulated mesh generation in a part-by-part fashion across two stages. First, we generate a high-level articulation-aware object structure; then, based on this structural information, we synthesize each part's mesh faces. Key to our approach is modeling both articulation structures and part meshes as sequences of quantized triangle embeddings, leading to a unified hierarchical framework with transformers for autoregressive generation. Object part structures are first generated as their bounding primitives and articulation modes; a second transformer, guided by these articulation structures, then generates each part's mesh triangles. To ensure coherency among generated parts, we introduce structure-guided conditioning that also incorporates local part mesh connectivity. MeshArt shows significant improvements over state of the art, with 57.1% improvement in structure coverage and a 209-point improvement in mesh generation FID.

* Project Page: https://daoyig.github.io/Mesh_Art/

Via

Access Paper or Ask Questions

Position-aware Guided Point Cloud Completion with CLIP Model

Dec 11, 2024

Feng Zhou, Qi Zhang, Ju Dai, Lei Li, Qing Fan, Junliang Xing

Figure 1 for Position-aware Guided Point Cloud Completion with CLIP Model

Figure 2 for Position-aware Guided Point Cloud Completion with CLIP Model

Figure 3 for Position-aware Guided Point Cloud Completion with CLIP Model

Figure 4 for Position-aware Guided Point Cloud Completion with CLIP Model

Abstract:Point cloud completion aims to recover partial geometric and topological shapes caused by equipment defects or limited viewpoints. Current methods either solely rely on the 3D coordinates of the point cloud to complete it or incorporate additional images with well-calibrated intrinsic parameters to guide the geometric estimation of the missing parts. Although these methods have achieved excellent performance by directly predicting the location of complete points, the extracted features lack fine-grained information regarding the location of the missing area. To address this issue, we propose a rapid and efficient method to expand an unimodal framework into a multimodal framework. This approach incorporates a position-aware module designed to enhance the spatial information of the missing parts through a weighted map learning mechanism. In addition, we establish a Point-Text-Image triplet corpus PCI-TI and MVP-TI based on the existing unimodal point cloud completion dataset and use the pre-trained vision-language model CLIP to provide richer detail information for 3D shapes, thereby enhancing performance. Extensive quantitative and qualitative experiments demonstrate that our method outperforms state-of-the-art point cloud completion methods.

* Accepted by AAAI25

Via

Access Paper or Ask Questions

A Practical Examination of AI-Generated Text Detectors for Large Language Models

Dec 06, 2024

Brian Tufts, Xuandong Zhao, Lei Li

Figure 1 for A Practical Examination of AI-Generated Text Detectors for Large Language Models

Figure 2 for A Practical Examination of AI-Generated Text Detectors for Large Language Models

Figure 3 for A Practical Examination of AI-Generated Text Detectors for Large Language Models

Figure 4 for A Practical Examination of AI-Generated Text Detectors for Large Language Models

Abstract:The proliferation of large language models has raised growing concerns about their misuse, particularly in cases where AI-generated text is falsely attributed to human authors. Machine-generated content detectors claim to effectively identify such text under various conditions and from any language model. This paper critically evaluates these claims by assessing several popular detectors (RADAR, Wild, T5Sentinel, Fast-DetectGPT, GPTID, LogRank, Binoculars) on a range of domains, datasets, and models that these detectors have not previously encountered. We employ various prompting strategies to simulate adversarial attacks, demonstrating that even moderate efforts can significantly evade detection. We emphasize the importance of the true positive rate at a specific false positive rate (TPR@FPR) metric and demonstrate that these detectors perform poorly in certain settings, with TPR@.01 as low as 0\%. Our findings suggest that both trained and zero-shot detectors struggle to maintain high sensitivity while achieving a reasonable true positive rate.

* 8 pages. Submitted to ARR October cycle

Via

Access Paper or Ask Questions