Ming Li

School of Integrated Circuits, Peking University

From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models

Nov 06, 2024

Charles Zhang, Benji Peng, Xintian Sun, Qian Niu, Junyu Liu, Keyu Chen, Ming Li, Pohsun Feng, Ziqian Bi, Ming Liu(+5 more)

Abstract:Word embeddings and language models have transformed natural language processing (NLP) by facilitating the representation of linguistic elements in continuous vector spaces. This review visits foundational concepts such as the distributional hypothesis and contextual similarity, tracing the evolution from sparse representations like one-hot encoding to dense embeddings including Word2Vec, GloVe, and fastText. We examine both static and contextualized embeddings, underscoring advancements in models such as ELMo, BERT, and GPT and their adaptations for cross-lingual and personalized applications. The discussion extends to sentence and document embeddings, covering aggregation methods and generative topic models, along with the application of embeddings in multimodal domains, including vision, robotics, and cognitive science. Advanced topics such as model compression, interpretability, numerical encoding, and bias mitigation are analyzed, addressing both technical challenges and ethical implications. Additionally, we identify future research directions, emphasizing the need for scalable training techniques, enhanced interpretability, and robust grounding in non-textual modalities. By synthesizing current methodologies and emerging trends, this survey offers researchers and practitioners an in-depth resource to push the boundaries of embedding-based language models.

* 21 pages

Urban Flood Mapping Using Satellite Synthetic Aperture Radar Data: A Review of Characteristics, Approaches and Datasets

Nov 06, 2024

Jie Zhao, Ming Li, Yu Li, Patrick Matgen, Marco Chini

Abstract:Understanding the extent of urban flooding is crucial for assessing building damage, casualties and economic losses. Synthetic Aperture Radar (SAR) technology offers significant advantages for mapping flooded urban areas due to its ability to collect data regardless of weather and solar illumination conditions. However, the wide range of existing methods makes it difficult to choose the best approach for a specific situation and to identify future research directions. Therefore, this study provides a comprehensive review of current research on urban flood mapping using SAR data, summarizing key characteristics of floodwater in SAR images and outlining various approaches from scientific articles. Additionally, we provide a brief overview of the advantages and disadvantages of each method category, along with guidance on selecting the most suitable approach for different scenarios. This study focuses on the challenges and advancements in SAR-based urban flood mapping. It specifically addresses the limitations of spatial and temporal resolution in SAR data and discusses the essential pre-processing steps. Moreover, the article explores the potential benefits of Polarimetric SAR (PolSAR) techniques and uncertainty analysis for future research. Furthermore, it highlights a lack of open-access SAR datasets for urban flood mapping, which hinders the development of advanced deep learning-based methods. In addition, we evaluate the Technology Readiness Levels (TRLs) of urban flood mapping techniques to identify challenges and future research areas. Finally, the study explores the practical applications of SAR-based urban flood mapping in both the private and public sectors and provides a comprehensive overview of the benefits and potential impact of these methods.

* Accepted by IEEE Geoscience and Remote Sensing Magazine

EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark

Nov 03, 2024

Ming Li, Jike Zhong, Tianle Chen, Yuxiang Lai, Konstantinos Psounis

Abstract:Recent studies on large language models (LLMs) and large multimodal models (LMMs) have demonstrated promising skills in various domains including science and mathematics. However, their capability in more challenging and real-world related scenarios like engineering has not been systematically studied. To bridge this gap, we propose EEE-Bench, a multimodal benchmark aimed at assessing LMMs' capabilities in solving practical engineering tasks, using electrical and electronics engineering (EEE) as the testbed. Our benchmark consists of 2860 carefully curated problems spanning 10 essential subdomains such as analog circuits, control systems, etc. Compared to benchmarks in other domains, engineering problems are intrinsically 1) more visually complex and versatile and 2) less deterministic in solutions. Successful solutions to these problems often demand more-than-usual rigorous integration of visual and textual information as models need to understand intricate images like abstract circuits and system diagrams while taking professional instructions, making them excellent candidates for LMM evaluations. Alongside EEE-Bench, we provide extensive quantitative evaluations and fine-grained analysis of 17 widely-used open and closed-sourced LLMs and LMMs. Our results demonstrate notable deficiencies of current foundation models in EEE, with an average performance ranging from 19.48% to 46.78%. Finally, we reveal and explore a critical shortcoming in LMMs which we term laziness: the tendency to take shortcuts by relying on the text while overlooking the visual context when reasoning for technical image problems. In summary, we believe EEE-Bench not only reveals some noteworthy limitations of LMMs but also provides a valuable resource for advancing research on their application in practical engineering tasks, driving future improvements in their capability to handle complex, real-world scenarios.

* preprint

A Survey on Bundle Recommendation: Methods, Applications, and Challenges

Nov 01, 2024

Meng Sun, Lin Li, Ming Li, Xiaohui Tao, Dong Zhang, Peipei Wang, Jimmy Xiangji Huang

Abstract:In recent years, bundle recommendation systems have gained significant attention in both academia and industry due to their ability to enhance user experience and increase sales by recommending a set of items as a bundle rather than individual items. This survey provides a comprehensive review of bundle recommendation, beginning with a taxonomy for exploring product bundling. We classify it into two categories based on bundling strategy from various application domains, i.e., discriminative and generative bundle recommendation. Then we formulate the corresponding tasks of the two categories and systematically review their methods: 1) representation learning from bundle and item levels and interaction modeling for discriminative bundle recommendation; 2) representation learning from item level and bundle generation for generative bundle recommendation. Subsequently, we survey the resources of bundle recommendation including datasets and evaluation metrics, and conduct reproducibility experiments on mainstream models. Lastly, we discuss the main challenges and highlight the promising future directions in the field of bundle recommendation, aiming to serve as a useful resource for researchers and practitioners. Our code and datasets are publicly available at https://github.com/WUT-IDEA/bundle-recommendation-survey.

CRB Optimization using a Parametric Scattering Model for Extended Targets in ISAC Systems

Oct 31, 2024

Rang Liu, A. Lee Swindlehurst, Ming Li

Abstract:This paper presents a novel parametric scattering model (PSM) for sensing extended targets in integrated sensing and communication (ISAC) systems. The PSM addresses the limitations of traditional models by efficiently capturing the target's angular characteristics through a compact set of key parameters, including the central angle and angular spread, enabling efficient optimization. Based on the PSM, we first derive the Cramer-Rao Bound (CRB) for parameter estimation and then propose a beamforming design algorithm to minimize the CRB while meeting both communication signal-to-interference-plus-noise ratio (SINR) and power constraints. By integrating the PSM into the beamforming optimization process, the proposed framework achieves superior CRB performance while balancing the tradeoff between sensing accuracy and communication quality. Simulation results demonstrate that the PSM-based approach consistently outperforms traditional unstructured and discrete scattering models, particularly in resource-limited scenarios, highlighting its practical applicability and scalability.

* 5 pages, 3 figures

How Do Flow Matching Models Memorize and Generalize in Sample Data Subspaces?

Oct 31, 2024

Weiguo Gao, Ming Li

Abstract:Real-world data is often assumed to lie within a low-dimensional structure embedded in high-dimensional space. In practical settings, we observe only a finite set of samples, forming what we refer to as the sample data subspace. It serves as an essential approximation supporting tasks such as dimensionality reduction and generation. A major challenge lies in whether generative models can reliably synthesize samples that stay within this subspace rather than drifting away from the underlying structure. In this work, we provide theoretical insights into this challenge by leveraging Flow Matching models, which transform a simple prior into a complex target distribution via a learned velocity field. By treating the real data distribution as discrete, we derive analytical expressions for the optimal velocity field under a Gaussian prior, showing that generated samples memorize real data points and represent the sample data subspace exactly. To generalize to suboptimal scenarios, we introduce the Orthogonal Subspace Decomposition Network (OSDNet), which systematically decomposes the velocity field into subspace and off-subspace components. Our analysis shows that the off-subspace component decays, while the subspace component generalizes within the sample data subspace, ensuring generated samples preserve both proximity and diversity.

* 33 pages, 9 figures

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Oct 31, 2024

Ming Li, Yanhong Li, Tianyi Zhou

Abstract:What makes a difference in the post-training of LLMs? We investigate the training patterns of different layers in large language models (LLMs), through the lens of gradients, when training with different responses and initial models. We are specifically interested in how fast vs. slow thinking affects the layer-wise gradients, given the recent popularity of training LLMs on reasoning paths such as chain-of-thought (CoT) and process rewards. In our study, fast thinking without CoT leads to larger gradients and larger differences in gradients across layers than slow thinking (detailed CoT), indicating the learning stability brought by the latter. Moreover, pre-trained LLMs are less affected by the instability of fast thinking than instruction-tuned LLMs. Additionally, we study whether the gradient patterns can reflect the correctness of responses when training different LLMs using slow vs. fast thinking paths. The results show that the gradients of slow thinking can distinguish correct and irrelevant reasoning paths. As a comparison, we conduct similar gradient analyses on non-reasoning knowledge learning tasks, on which, however, trivially increasing the response length does not lead to behaviors similar to those of slow thinking. Our study strengthens the fundamental understanding of LLM training and offers new insights into its efficiency and stability, paving the way towards building a generalizable System-2 agent. Our code, data, and gradient statistics can be found at: https://github.com/MingLiiii/Layer_Gradient.

Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application

Oct 30, 2024

Keyu Chen, Cheng Fei, Ziqian Bi, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Caitlyn Heqi Yin(+15 more)

Abstract:With a focus on natural language processing (NLP) and the role of large language models (LLMs), we explore the intersection of machine learning, deep learning, and artificial intelligence. As artificial intelligence continues to revolutionize fields from healthcare to finance, NLP techniques such as tokenization, text classification, and entity recognition are essential for processing and understanding human language. This paper discusses advanced data preprocessing techniques and the use of frameworks like Hugging Face for implementing transformer-based models. Additionally, it highlights challenges such as handling multilingual data, reducing bias, and ensuring model robustness. By addressing key aspects of data processing and model fine-tuning, this work aims to provide insights into deploying effective and ethically sound AI solutions.

* 255 pages

Uncertainty Quantification via Hölder Divergence for Multi-View Representation Learning

Oct 29, 2024

an Zhang, Ming Li, Chun Li, Zhaoxia Liu, Ye Zhang, Fei Richard Yu

Abstract:Evidence-based deep learning represents a burgeoning paradigm for uncertainty estimation, offering reliable predictions with negligible extra computational overhead. Existing methods usually adopt Kullback-Leibler divergence to estimate the uncertainty of network predictions, ignoring domain gaps among various modalities. To tackle this issue, this paper introduces a novel algorithm based on Hölder Divergence (HD) to enhance the reliability of multi-view learning by addressing inherent uncertainty challenges from incomplete or noisy data. Generally, our method extracts the representations of multiple modalities through parallel network branches, and then employs HD to estimate the prediction uncertainties. Through the Dempster-Shafer theory, the uncertainties from different modalities are integrated, generating a comprehensive result that considers all available representations. Mathematically, HD proves to better measure the "distance" between the real data distribution and the predictive distribution of the model and to improve the performance of multi-class recognition tasks. Specifically, our method surpasses the existing state-of-the-art counterparts on all evaluation benchmarks. We further conduct extensive experiments on different backbones to verify our superior robustness. It is demonstrated that our method successfully pushes the corresponding performance boundaries. Finally, we perform experiments on more challenging scenarios, i.e., learning with incomplete or noisy data, revealing that our method exhibits a high tolerance to such corrupted data.

Large Language Model Benchmarks in Medical Tasks

Oct 28, 2024

Lawrence K. Q. Yan, Ming Li, Yichao Zhang, Caitlyn Heqi Yin, Cheng Fei, Benji Peng, Ziqian Bi, Pohsun Feng, Keyu Chen, Junyu Liu(+1 more)

Abstract:With the increasing application of large language models (LLMs) in the medical domain, evaluating these models' performance using benchmark datasets has become crucial. This paper presents a comprehensive survey of various benchmark datasets employed in medical LLM tasks. These datasets span multiple modalities including text, image, and multimodal benchmarks, focusing on different aspects of medical knowledge such as electronic health records (EHRs), doctor-patient dialogues, medical question-answering, and medical image captioning. The survey categorizes the datasets by modality, discussing their significance, data structure, and impact on the development of LLMs for clinical tasks such as diagnosis, report generation, and predictive decision support. Key benchmarks include MIMIC-III, MIMIC-IV, BioASQ, PubMedQA, and CheXpert, which have facilitated advancements in tasks like medical report generation, clinical summarization, and synthetic data generation. The paper summarizes the challenges and opportunities in leveraging these benchmarks for advancing multimodal medical intelligence, emphasizing the need for datasets with a greater degree of language diversity, structured omics data, and innovative approaches to synthesis. This work also provides a foundation for future research in the application of LLMs in medicine, contributing to the evolving field of medical artificial intelligence.

* 25 pages, 5 tables
