Fan Yang

PAFFA: Premeditated Actions For Fast Agents

Dec 10, 2024

Shambhavi Krishna, Zheng Chen, Vaibhav Kumar, Xiaojiang Huang, Yingjie Li, Fan Yang, Xiang Li

Abstract: Modern AI assistants have made significant progress in natural language understanding and API/tool integration, with emerging efforts to incorporate diverse interfaces (such as Web interfaces) for enhanced scalability and functionality. However, current approaches that rely heavily on repeated LLM-driven HTML parsing are computationally expensive and error-prone, particularly when handling dynamic web interfaces and multi-step tasks. To overcome these challenges, we introduce PAFFA (Premeditated Actions For Fast Agents), a framework designed to enhance web interaction capabilities through an Action API Library of reusable, verified browser interaction functions. By pre-computing interaction patterns and employing two core methodologies, "Dist-Map" for task-agnostic element distillation and "Unravel" for incremental page-wise exploration, PAFFA reduces inference calls by 87% while maintaining robust performance even as website structures evolve. This framework accelerates multi-page task execution and offers a scalable solution to advance autonomous web agent research.

* 9 pages

Neuro-Symbolic Data Generation for Math Reasoning

Dec 06, 2024

Zenan Li, Zhi Zhou, Yuan Yao, Yu-Feng Li, Chun Cao, Fan Yang, Xian Zhang, Xiaoxing Ma

Abstract: A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent, or merely a result of insufficient exposure to high-quality mathematical data. To explore this, we developed an automated method for generating high-quality, supervised mathematical datasets. The method carefully mutates existing math problems, ensuring both diversity and validity of the newly generated problems. This is achieved by a neuro-symbolic data generation framework that combines the intuitive informalization strengths of LLMs with the precise symbolic reasoning of math solvers, along with projected Markov chain Monte Carlo sampling in the highly irregular symbolic space. Empirical experiments demonstrate the high quality of data generated by the proposed method, and that the LLMs, specifically LLaMA-2 and Mistral, when realigned with the generated data, surpass their state-of-the-art counterparts.

* Published as a conference paper at NeurIPS 2024

SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition

Nov 29, 2024

Fangze Fu, Wei Ai, Fan Yang, Yuntao Shou, Tao Meng, Keqin Li

Abstract: Multimodal Emotion Recognition in Conversations (MERC) aims to classify utterance emotions using textual, auditory, and visual modal features. Most existing MERC methods assume each utterance has complete modalities, overlooking the common issue of incomplete modalities in real-world scenarios. Recently, graph neural networks (GNNs) have achieved notable results in Incomplete Multimodal Emotion Recognition in Conversations (IMERC). However, traditional GNNs focus on binary relationships between nodes, limiting their ability to capture more complex, higher-order information. Moreover, repeated message passing can cause over-smoothing, reducing their capacity to preserve essential high-frequency details. To address these issues, we propose a Spectral Domain Reconstruction Graph Neural Network (SDR-GNN) for incomplete multimodal learning in conversational emotion recognition. SDR-GNN constructs an utterance semantic interaction graph using a sliding window based on both speaker and context relationships to model emotional dependencies. To capture higher-order and high-frequency information, SDR-GNN utilizes weighted relationship aggregation, ensuring consistent semantic feature extraction across utterances. Additionally, it performs multi-frequency aggregation in the spectral domain, enabling efficient recovery of incomplete modalities by extracting both high- and low-frequency information. Finally, multi-head attention is applied to fuse and optimize features for emotion recognition. Extensive experiments on various real-world datasets demonstrate that our approach is effective in incomplete multimodal learning and outperforms current state-of-the-art methods.

* 17 pages, 8 figures

RoadGen: Generating Road Scenarios for Autonomous Vehicle Testing

Nov 29, 2024

Fan Yang, You Lu, Bihuan Chen, Peng Qin, Xin Peng

Abstract: With the rapid development of autonomous vehicles, there is an increasing demand for scenario-based testing to simulate diverse driving scenarios. However, as the basis of any driving scenario, road scenarios (e.g., road topology and geometry) have received little attention in the literature. Despite several advances, existing approaches either generate basic road components without a complete road network, or generate a complete road network but with simple road components. The resulting road scenarios lack diversity in both topology and geometry. To address this problem, we propose RoadGen to systematically generate diverse road scenarios. The key idea is to connect eight types of parameterized road components to form road scenarios with high diversity in topology and geometry. Our evaluation has demonstrated the effectiveness and usefulness of RoadGen in generating diverse road scenarios for simulation.

* 7 pages

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

Nov 26, 2024

Fan Yang, Ru Zhen, Jianing Wang, Yanhao Zhang, Haoxiang Chen, Haonan Lu, Sicheng Zhao, Guiguang Ding

Abstract: AIGC images are prevalent across various fields, yet they frequently suffer from quality issues like artifacts and unnatural textures. Specialized models aim to predict defect region heatmaps but face two primary challenges: (1) lack of explainability, failing to provide reasons and analyses for subtle defects; and (2) inability to leverage common sense and logical reasoning, leading to poor generalization. Multimodal large language models (MLLMs) promise better comprehension and reasoning but face their own challenges: (1) difficulty in fine-grained defect localization due to the limitations in capturing tiny details; and (2) constraints in providing pixel-wise outputs necessary for precise heatmap generation. To address these challenges, we propose HEIE: a novel MLLM-Based Hierarchical Explainable image Implausibility Evaluator. We introduce the CoT-Driven Explainable Trinity Evaluator, which integrates heatmaps, scores, and explanation outputs, using CoT to decompose complex tasks into subtasks of increasing difficulty and enhance interpretability. Our Adaptive Hierarchical Implausibility Mapper synergizes low-level image features with high-level mapper tokens from LLMs, enabling precise local-to-global hierarchical heatmap predictions through an uncertainty-based adaptive token approach. Moreover, we propose a new dataset: Expl-AIGI-Eval, designed to facilitate interpretable implausibility evaluation of AIGC images. Our method demonstrates state-of-the-art performance through extensive experiments.

Uncertainty-Aware Regression for Socio-Economic Estimation via Multi-View Remote Sensing

Nov 21, 2024

Fan Yang, Sahoko Ishida, Mengyan Zhang, Daniel Jenson, Swapnil Mishra, Jhonathan Navott, Seth Flaxman

Abstract: Remote sensing imagery offers rich spectral data across extensive areas for Earth observation. Many attempts have been made to leverage these data with transfer learning to develop scalable alternatives for estimating socio-economic conditions, reducing reliance on expensive survey-collected data. However, much of this research has primarily focused on daytime satellite imagery due to the limitation that most pre-trained models are trained on 3-band RGB images. Consequently, modeling techniques for spectral bands beyond the visible spectrum have not been thoroughly investigated. Additionally, quantifying uncertainty in remote sensing regression has been less explored, yet it is essential for more informed targeting and iterative collection of ground truth survey data. In this paper, we introduce a novel framework that leverages generic foundational vision models to process remote sensing imagery using combinations of three spectral bands to exploit multi-spectral data. We also employ methods such as heteroscedastic regression and Bayesian modeling to generate uncertainty estimates for the predictions. Experimental results demonstrate that our method outperforms existing models that use RGB or multi-spectral models with unstructured band usage. Moreover, our framework helps identify uncertain predictions, guiding future ground truth data acquisition.

* 11 pages, 4 figures
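
The abstract above mentions heteroscedastic regression as one way to obtain uncertainty estimates. As a minimal sketch of the general idea (not the authors' implementation), a heteroscedastic model predicts both a mean and a log-variance per input and is commonly trained with the Gaussian negative log-likelihood. The function name `gaussian_nll` and its parameters are illustrative assumptions.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var)).

    Assumption: the model outputs a log-variance so that the
    variance exp(log_var) is always positive.
    """
    variance = math.exp(log_var)
    return 0.5 * (log_var + (y - mean) ** 2 / variance + math.log(2 * math.pi))


# A large residual is penalized less when the model also reports high variance,
# which is what lets the predicted variance act as an uncertainty estimate.
confident_but_wrong = gaussian_nll(y=3.0, mean=0.0, log_var=0.0)
uncertain_and_wrong = gaussian_nll(y=3.0, mean=0.0, log_var=math.log(9.0))
print(confident_but_wrong > uncertain_and_wrong)
```

The `log_var` term keeps the model from inflating the variance everywhere: reporting high uncertainty on a sample it predicts well increases the loss.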

Kwai-STaR: Transform LLMs into State-Transition Reasoners

Nov 07, 2024

Xingyu Lu, Yuhang Hu, Changyi Liu, Tianke Zhang, Zhenyu Yang, Zhixiang Ding, Shengsheng Qian, Meng Du, Ruiwen Kang, Kaiyu Tang (+5 more)

Abstract: Mathematical reasoning presents a significant challenge to the cognitive capabilities of LLMs. Various methods have been proposed to enhance the mathematical ability of LLMs. However, few recognize the value of state transition for LLM reasoning. In this work, we define mathematical problem-solving as a process of transitioning from an initial unsolved state to the final resolved state, and propose the Kwai-STaR framework, which transforms LLMs into State-Transition Reasoners to improve their intuitive reasoning capabilities. Our approach comprises three main steps: (1) Define the state space tailored to mathematical reasoning. (2) Generate state-transition data based on the state space. (3) Convert original LLMs into State-Transition Reasoners via a curricular training strategy. Our experiments validate the effectiveness of Kwai-STaR in enhancing mathematical reasoning: after training on the small-scale Kwai-STaR dataset, general LLMs, including Mistral-7B and LLaMA-3, achieve considerable performance gains on the GSM8K and GSM-Hard datasets. Additionally, the state transition-based design endows Kwai-STaR with remarkable training and inference efficiency. Further experiments are underway to establish the generality of Kwai-STaR.

* 6 pages, 2 figures

Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation

Nov 05, 2024

Xianghui Yang, Huiwen Shi, Bowen Zhang, Fan Yang, Jiacheng Wang, Hongxu Zhao, Xinhai Liu, Xinzhou Wang, Qingxiang Lin, Jiaao Yu (+8 more)

Abstract: While 3D generative models have greatly improved artists' workflows, the existing diffusion models for 3D generation suffer from slow generation and poor generalization. To address this issue, we propose a two-stage approach named Hunyuan3D-1.0, including a lite version and a standard version, both of which support text- and image-conditioned generation. In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB in approximately 4 seconds. These multi-view images capture rich details of the 3D asset from different viewpoints, relaxing the task from single-view to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset given the generated multi-view images in approximately 7 seconds. The reconstruction network learns to handle the noise and inconsistency introduced by the multi-view diffusion and leverages the available information from the condition image to efficiently recover the 3D structure. Our framework involves the text-to-image model, i.e., Hunyuan-DiT, making it a unified framework to support both text- and image-conditioned 3D generation. Our standard version has 3x more parameters than our lite version and other existing models. Our Hunyuan3D-1.0 achieves an impressive balance between speed and quality, significantly reducing generation time while maintaining the quality and diversity of the produced assets.

* Technical Report; 3D Generation

Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency

Oct 28, 2024

Zenan Li, Yifan Wu, Zhaoyu Li, Xinming Wei, Xian Zhang, Fan Yang, Xiaoxing Ma

Abstract: Autoformalization, the task of automatically translating natural language descriptions into a formal language, poses a significant challenge across various domains, especially in mathematics. Recent advancements in large language models (LLMs) have unveiled their promising capabilities to formalize even competition-level math problems. However, we observe a considerable discrepancy between pass@1 and pass@k accuracies in LLM-generated formalizations. To address this gap, we introduce a novel framework that scores and selects the best result from k autoformalization candidates based on two complementary self-consistency methods: symbolic equivalence and semantic consistency. Specifically, symbolic equivalence identifies the logical homogeneity among autoformalization candidates using automated theorem provers, and semantic consistency evaluates the preservation of the original meaning by informalizing the candidates and computing the similarity between the embeddings of the original and informalized texts. Our extensive experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy, achieving up to 0.22-1.35x relative improvements across various LLMs and baseline methods.

* Published as a conference paper at NeurIPS 2024. Code is available at https://github.com/Miracle-Messi/Isa-AutoFormal
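
The selection step described in the abstract above can be illustrated with a toy sketch. This is not the authors' code: the `equivalent` predicate is a hypothetical stand-in for an automated theorem prover, the embeddings are made-up vectors, and the weighted sum with `alpha` is an assumed way of combining the two scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_candidate(candidates, equivalent, original_emb, informalized_embs, alpha=0.5):
    """Pick the index of the best of k candidates.

    Symbolic score: fraction of candidates judged equivalent to this one
    (hypothetical `equivalent` predicate standing in for a prover).
    Semantic score: similarity between the original text's embedding and
    the embedding of the candidate's informalization.
    The linear combination weighted by alpha is an assumption.
    """
    k = len(candidates)
    symbolic = [sum(equivalent(c, d) for d in candidates) / k for c in candidates]
    semantic = [cosine(original_emb, e) for e in informalized_embs]
    scores = [alpha * s + (1 - alpha) * m for s, m in zip(symbolic, semantic)]
    best = max(range(k), key=scores.__getitem__)
    return best, scores


# Toy data: two candidates agree with each other, one is an outlier.
candidates = ["A", "A'", "B"]
same_class = lambda c, d: c[0] == d[0]  # stand-in for a prover check
best, scores = select_candidate(
    candidates,
    same_class,
    original_emb=[1.0, 0.0],
    informalized_embs=[[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
)
print(candidates[best])
```

In this toy setup the first candidate wins because it belongs to the largest equivalence class and its informalization is closest to the original statement.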

Theoretical Insights into Line Graph Transformation on Graph Learning

Oct 21, 2024

Fan Yang, Xingyue Huang

Abstract: Line graph transformation has been widely studied in graph theory, where each node in a line graph corresponds to an edge in the original graph. This has inspired a series of graph neural networks (GNNs) applied to transformed line graphs, which have proven effective in various graph representation learning tasks. However, there is limited theoretical study on how line graph transformation affects the expressivity of GNN models. In this study, we focus on two types of graphs known to be challenging for the Weisfeiler-Leman (WL) tests: Cai-Fürer-Immerman (CFI) graphs and strongly regular graphs, and show that applying line graph transformation helps exclude these challenging graph properties, thus potentially assisting WL tests in distinguishing these graphs. We empirically validate our findings by conducting a series of experiments that compare the accuracy and efficiency of graph isomorphism tests and GNNs on both line-transformed and original graphs across these graph structure types.

* 21 pages, code available at https://github.com/lukeyf/graphs-and-lines
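
The line graph transformation mentioned in the abstract above has a standard definition: each edge of the original graph becomes a node, and two such nodes are adjacent when the original edges share an endpoint. A minimal sketch for undirected simple graphs follows. The function name `line_graph` and the edge-list input format are illustrative choices, not taken from the linked repository.

```python
from itertools import combinations


def line_graph(edges):
    """Return the line graph of an undirected simple graph as an adjacency dict.

    Each key is an edge of the input graph (as a sorted tuple); its value is
    the set of other edges that share an endpoint with it.
    """
    nodes = sorted({tuple(sorted(e)) for e in edges})
    adjacency = {e: set() for e in nodes}
    for e, f in combinations(nodes, 2):
        if set(e) & set(f):  # the two edges share an endpoint
            adjacency[e].add(f)
            adjacency[f].add(e)
    return adjacency


# The line graph of a triangle is again a triangle;
# the line graph of a 4-node path is a 3-node path.
triangle_lg = line_graph([(0, 1), (1, 2), (0, 2)])
path_lg = line_graph([(0, 1), (1, 2), (2, 3)])
print(triangle_lg)
print(path_lg)
```

The pairwise comparison is quadratic in the number of edges, which is fine for small examples; a larger graph would call for grouping edges by endpoint instead.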
