Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yan Xu

Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction

Apr 22, 2025

Yuxin Jiang, Yufei Wang, Chuhan Wu, Xinyi Dai, Yan Xu, Weinan Gan, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang(+1 more)

Abstract:The improvement of LLMs' instruction-following capabilities depends critically on the availability of high-quality instruction-response pairs. While existing automatic data synthetic methods alleviate the burden of manual curation, they often rely heavily on either the quality of seed data or strong assumptions about the structure and content of web documents. To tackle these challenges, we propose Web Reconstruction (WebR), a fully automated framework for synthesizing high-quality instruction-tuning (IT) data directly from raw web documents with minimal assumptions. Leveraging the inherent diversity of raw web content, we conceptualize web reconstruction as an instruction-tuning data synthesis task via a novel dual-perspective paradigm--Web as Instruction and Web as Response--where each web document is designated as either an instruction or a response to trigger the reconstruction process. Comprehensive experiments show that datasets generated by WebR outperform state-of-the-art baselines by up to 16.65% across four instruction-following benchmarks. Notably, WebR demonstrates superior compatibility, data efficiency, and scalability, enabling enhanced domain adaptation with minimal effort. The data and code are publicly available at https://github.com/YJiangcm/WebR.

* 15 pages, 11 figures, 9 tables

Via

Access Paper or Ask Questions

CMEdataset Advancing China Map Detection and Standardization with Digital Image Resources

Apr 10, 2025

Yan Xu, Zhenqiang Zhang, Zhiwei Zhou, Liting Geng, Yue Li, Jintao Li

Figure 1 for CMEdataset Advancing China Map Detection and Standardization with Digital Image Resources

Figure 2 for CMEdataset Advancing China Map Detection and Standardization with Digital Image Resources

Figure 3 for CMEdataset Advancing China Map Detection and Standardization with Digital Image Resources

Figure 4 for CMEdataset Advancing China Map Detection and Standardization with Digital Image Resources

Abstract:Digital images of Chinas maps play a crucial role in map detection, particularly in ensuring national sovereignty, territorial integrity, and map compliance. However, there is currently no publicly available dataset specifically dedicated to problematic maps the CME dataset. Existing datasets primarily focus on general map data and are insufficient for effectively identifying complex issues such as national boundary misrepresentations, missing elements, and blurred boundaries. Therefore, this study creates a Problematic Map dataset that covers five key problem areas, aiming to provide diverse samples for problematic map detection technologies, support high-precision map compliance detection, and enhance map data quality and timeliness. This dataset not only provides essential resources for map compliance, national security monitoring, and map updates, but also fosters innovation and application of related technologies.

Via

Access Paper or Ask Questions

Sparsity-Induced Global Matrix Autoregressive Model with Auxiliary Network Data

Mar 11, 2025

Sanyou Wu, Dan Yang, Yan Xu, Long Feng

Abstract:Jointly modeling and forecasting economic and financial variables across a large set of countries has long been a significant challenge. Two primary approaches have been utilized to address this issue: the vector autoregressive model with exogenous variables (VARX) and the matrix autoregression (MAR). The VARX model captures domestic dependencies, but treats variables exogenous to represent global factors driven by international trade. In contrast, the MAR model simultaneously considers variables from multiple countries but ignores the trade network. In this paper, we propose an extension of the MAR model that achieves these two aims at once, i.e., studying both international dependencies and the impact of the trade network on the global economy. Additionally, we introduce a sparse component to the model to differentiate between systematic and idiosyncratic cross-predictability. To estimate the model parameters, we propose both a likelihood estimation method and a bias-corrected alternating minimization version. We provide theoretical and empirical analyses of the model's properties, alongside presenting intriguing economic insights derived from our findings.

Via

Access Paper or Ask Questions

Prediction of Halo Coronal Mass Ejections Using SDO/HMI Vector Magnetic Data Products and a Transformer Model

Mar 05, 2025

Hongyang Zhang, Ju Jing, Jason T. L. Wang, Haimin Wang, Yasser Abduallah, Yan Xu, Khalid A. Alobaid, Hameedullah Farooki, Vasyl Yurchyshyn

Abstract:We present a transformer model, named DeepHalo, to predict the occurrence of halo coronal mass ejections (CMEs). Our model takes as input an active region (AR) and a profile, where the profile contains a time series of data samples in the AR that are collected 24 hours before the beginning of a day, and predicts whether the AR would produce a halo CME during that day. Each data sample contains physical parameters, or features, derived from photospheric vector magnetic field data taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO). We survey and match CME events in the Space Weather Database Of Notification, Knowledge, Information (DONKI) and Large Angle and Spectrometric Coronagraph (LASCO) CME Catalog, and compile a list of CMEs including halo CMEs and non-halo CMEs associated with ARs in the period between November 2010 and August 2023. We use the information gathered above to build the labels (positive versus negative) of the data samples and profiles at hand, where the labels are needed for machine learning. Experimental results show that DeepHalo with a true skill statistics (TSS) score of 0.907 outperforms a closely related long short-term memory network with a TSS score of 0.821. To our knowledge, this is the first time that the transformer model has been used for halo CME prediction.

* 13 pages, 8 figures

Via

Access Paper or Ask Questions

Improving the Temporal Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using a Deep Generative Model

Mar 05, 2025

Jialiang Li, Vasyl Yurchyshyn, Jason T. L. Wang, Haimin Wang, Yasser Abduallah, Khalid A. Alobaid, Chunhui Xu, Ruizhu Chen, Yan Xu

Figure 1 for Improving the Temporal Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using a Deep Generative Model

Figure 2 for Improving the Temporal Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using a Deep Generative Model

Figure 3 for Improving the Temporal Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using a Deep Generative Model

Figure 4 for Improving the Temporal Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using a Deep Generative Model

Abstract:We present a novel deep generative model, named GenMDI, to improve the temporal resolution of line-of-sight (LOS) magnetograms of solar active regions (ARs) collected by the Michelson Doppler Imager (MDI) on board the Solar and Heliospheric Observatory (SOHO). Unlike previous studies that focus primarily on spatial super-resolution of MDI magnetograms, our approach can perform temporal super-resolution, which generates and inserts synthetic data between observed MDI magnetograms, thus providing finer temporal structure and enhanced details in the LOS data. The GenMDI model employs a conditional diffusion process, which synthesizes images by considering both preceding and subsequent magnetograms, ensuring that the generated images are not only of high-quality, but also temporally coherent with the surrounding data. Experimental results show that the GenMDI model performs better than the traditional linear interpolation method, especially in ARs with dynamic evolution in magnetic fields.

* 11 pages, 7 figures

Via

Access Paper or Ask Questions

Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Feb 17, 2025

Xin Xu, Yan Xu, Tianhao Chen, Yuchen Yan, Chengwu Liu, Zaoyu Chen, Yufei Wang, Yichun Yin, Yasheng Wang, Lifeng Shang(+1 more)

Figure 1 for Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Figure 2 for Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Figure 3 for Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Figure 4 for Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Abstract:Existing approaches to mathematical reasoning with large language models (LLMs) rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. While efforts have been made to combine these methods, they primarily rely on post-selection or predefined strategies, leaving an open question: whether LLMs can autonomously adapt their reasoning strategy based on their inherent capabilities. In this work, we propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously, aligning it with their intrinsic aptitude. TATA incorporates base-LLM-aware data selection during supervised fine-tuning (SFT) to tailor training data to the model's unique abilities. This approach equips LLMs to autonomously determine and apply the appropriate reasoning strategy at test time. We evaluate TATA through extensive experiments on six mathematical reasoning benchmarks, using both general-purpose and math-specialized LLMs. Empirical results demonstrate that TATA effectively combines the complementary strengths of CoT and TIR, achieving superior or comparable performance with improved inference efficiency compared to TIR alone. Further analysis underscores the critical role of aptitude-aware data selection in enabling LLMs to make effective and adaptive reasoning decisions and align reasoning strategies with model capabilities.

* 8 pages

Via

Access Paper or Ask Questions

Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification

Feb 11, 2025

Peipei Wei, Dimitris Dimitriadis, Yan Xu, Mingwei Shen

Figure 1 for Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification

Figure 2 for Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification

Figure 3 for Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification

Figure 4 for Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification

Abstract:We present PRINCIPLE-BASED PROMPTING, a simple but effective multi-agent prompting strategy for text classification. It first asks multiple LLM agents to independently generate candidate principles based on analysis of demonstration samples with or without labels, consolidates them into final principles via a finalizer agent, and then sends them to a classifier agent to perform downstream classification tasks. Extensive experiments on binary and multi-class classification datasets with different sizes of LLMs show that our approach not only achieves substantial performance gains (1.55% - 19.37%) over zero-shot prompting on macro-F1 score but also outperforms other strong baselines (CoT and stepback prompting). Principles generated by our approach help LLMs perform better on classification tasks than human crafted principles on two private datasets. Our multi-agent PRINCIPLE-BASED PROMPTING approach also shows on-par or better performance compared to demonstration-based few-shot prompting approaches, yet with substantially lower inference costs. Ablation studies show that label information and the multi-agent cooperative LLM framework play an important role in generating high-quality principles to facilitate downstream classification tasks.

* To be published in AAAI 2025 Workshop on Advancing LLM-Based Multi-Agent Collaboration

Via

Access Paper or Ask Questions

CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs

Feb 11, 2025

Lejla Skelic, Yan Xu, Matthew Cox, Wenjie Lu, Tao Yu, Ruonan Han

Abstract:The role of Large Language Models (LLMs) has not been extensively explored in analog circuit design, which could benefit from a reasoning-based approach that transcends traditional optimization techniques. In particular, despite their growing relevance, there are no benchmarks to assess LLMs' reasoning capability about circuits. Therefore, we created the CIRCUIT dataset consisting of 510 question-answer pairs spanning various levels of analog-circuit-related subjects. The best-performing model on our dataset, GPT-4o, achieves 48.04% accuracy when evaluated on the final numerical answer. To evaluate the robustness of LLMs on our dataset, we introduced a unique feature that enables unit-test-like evaluation by grouping questions into unit tests. In this case, GPT-4o can only pass 27.45% of the unit tests, highlighting that the most advanced LLMs still struggle with understanding circuits, which requires multi-level reasoning, particularly when involving circuit topologies. This circuit-specific benchmark highlights LLMs' limitations, offering valuable insights for advancing their application in analog integrated circuit design.

Via

Access Paper or Ask Questions

AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Oct 22, 2024

Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yan Xu

Figure 1 for AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 2 for AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 3 for AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 4 for AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Abstract:Large-scale visual-language pre-trained models (VLPMs) have demonstrated exceptional performance in downstream object detection through text prompts for natural scenes. However, their application to zero-shot nuclei detection on histopathology images remains relatively unexplored, mainly due to the significant gap between the characteristics of medical images and the web-originated text-image pairs used for pre-training. This paper aims to investigate the potential of the object-level VLPM, Grounded Language-Image Pre-training (GLIP), for zero-shot nuclei detection. Specifically, we propose an innovative auto-prompting pipeline, named AttriPrompter, comprising attribute generation, attribute augmentation, and relevance sorting, to avoid subjective manual prompt design. AttriPrompter utilizes VLPMs' text-to-image alignment to create semantically rich text prompts, which are then fed into GLIP for initial zero-shot nuclei detection. Additionally, we propose a self-trained knowledge distillation framework, where GLIP serves as the teacher with its initial predictions used as pseudo labels, to address the challenges posed by high nuclei density, including missed detections, false positives, and overlapping instances. Our method exhibits remarkable performance in label-free nuclei detection, outperforming all existing unsupervised methods and demonstrating excellent generality. Notably, this work highlights the astonishing potential of VLPMs pre-trained on natural image-text pairs for downstream tasks in the medical field as well. Code will be released at https://github.com/wuyongjianCODE/AttriPrompter.

* This article has been accepted for publication in a future issue of IEEE Transactions on Medical Imaging (TMI), but has not been fully edited. Content may change prior to final publication. Citation information: DOI: https://doi.org/10.1109/TMI.2024.3473745 . Code: https://github.com/wuyongjianCODE/AttriPrompter

Via

Access Paper or Ask Questions

Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Sep 24, 2024

Jiankai Sun, Aidan Curtis, Yang You, Yan Xu, Michael Koehle, Leonidas Guibas, Sachin Chitta, Mac Schwager, Hui Li

Figure 1 for Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Figure 2 for Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Figure 3 for Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Figure 4 for Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Abstract:Generalizable long-horizon robotic assembly requires reasoning at multiple levels of abstraction. End-to-end imitation learning (IL) has been proven a promising approach, but it requires a large amount of demonstration data for training and often fails to meet the high-precision requirement of assembly tasks. Reinforcement Learning (RL) approaches have succeeded in high-precision assembly tasks, but suffer from sample inefficiency and hence, are less competent at long-horizon tasks. To address these challenges, we propose a hierarchical modular approach, named ARCH (Adaptive Robotic Composition Hierarchy), which enables long-horizon high-precision assembly in contact-rich settings. ARCH employs a hierarchical planning framework, including a low-level primitive library of continuously parameterized skills and a high-level policy. The low-level primitive library includes essential skills for assembly tasks, such as grasping and inserting. These primitives consist of both RL and model-based controllers. The high-level policy, learned via imitation learning from a handful of demonstrations, selects the appropriate primitive skills and instantiates them with continuous input parameters. We extensively evaluate our approach on a real robot manipulation platform. We show that while trained on a single task, ARCH generalizes well to unseen tasks and outperforms baseline methods in terms of success rate and data efficiency. Videos can be found at https://long-horizon-assembly.github.io.

Via

Access Paper or Ask Questions