Fei Wu

Causal Inference with Complex Treatments: A Survey

Jul 19, 2024

Yingrong Wang, Haoxuan Li, Minqin Zhu, Anpeng Wu, Ruoxuan Xiong, Fei Wu, Kun Kuang

Abstract: Causal inference plays an important role in explanatory analysis and decision making across various fields like statistics, marketing, health care, and education. Its main task is to estimate treatment effects and make intervention policies. Most previous works focus on the binary treatment setting, in which there is only one treatment for a unit to adopt or not. However, in practice, the treatment can be much more complex, encompassing multi-valued, continuous, or bundled options. In this paper, we refer to these as complex treatments and systematically and comprehensively review the causal inference methods for addressing them. First, we formally revisit the problem definition, the basic assumptions, and their possible variations under specific conditions. Second, we sequentially review the related methods for multi-valued, continuous, and bundled treatment settings. In each situation, we tentatively divide the methods into two categories: those conforming to the unconfoundedness assumption and those violating it. Subsequently, we discuss the available datasets and open-source codes. Finally, we provide a brief summary of these works and suggest potential directions for future research.

Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

Jun 24, 2024

Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Kun Kuang, Fei Wu

Abstract: Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular and plug-and-play nature allows the integration of various domain-specific LoRAs, enhancing LLM capabilities. Open-source platforms like Huggingface and Modelscope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to train specialized adapters, which are then uploaded to a central platform to improve LLMs. This platform uses these domain-specific adapters to handle mixed-task requests requiring personalized service. Previous research on LoRA composition either focuses on specific tasks or fixes the LoRA selection during training. However, in UML, the pool of LoRAs is dynamically updated with new uploads, requiring a generalizable selection mechanism for unseen LoRAs. Additionally, the mixed-task nature of downstream requests necessitates personalized services. To address these challenges, we propose Retrieval-Augmented Mixture of LoRA Experts (RAMoLE), a framework that adaptively retrieves and composes multiple LoRAs based on input prompts. RAMoLE has three main components: LoraRetriever for identifying and retrieving relevant LoRAs, an on-the-fly MoLE mechanism for coordinating the retrieved LoRAs, and efficient batch inference for handling heterogeneous requests. Experimental results show that RAMoLE consistently outperforms baselines, highlighting its effectiveness and scalability.

* arXiv admin note: substantial text overlap with arXiv:2402.09997
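
As a rough illustration of the retrieve-then-compose idea described in the abstract above, the sketch below picks the top-k adapters by cosine similarity between a prompt embedding and per-adapter embeddings, then adds a weighted sum of their low-rank updates to a frozen weight matrix. The function names (`retrieve_top_k`, `compose_lora`), the random embeddings, and the uniform weighting are all assumptions made for illustration. This is not the RAMoLE implementation, whose retriever and on-the-fly MoLE coordination are learned components.

```python
import numpy as np

def retrieve_top_k(prompt_emb, adapter_embs, k):
    """Return indices of the k adapters most similar to the prompt (cosine similarity)."""
    p = prompt_emb / np.linalg.norm(prompt_emb)
    a = adapter_embs / np.linalg.norm(adapter_embs, axis=1, keepdims=True)
    scores = a @ p
    return np.argsort(-scores)[:k]

def compose_lora(x, W, adapters, weights):
    """Apply the frozen weight W plus a weighted sum of low-rank updates B @ A."""
    out = x @ W.T
    for (A, B), w in zip(adapters, weights):
        out = out + w * (x @ A.T @ B.T)
    return out

# Hypothetical toy setup: 5 adapters of rank 2 for an 8 -> 6 linear layer.
rng = np.random.default_rng(0)
d_in, d_out, rank, n_adapters = 8, 6, 2, 5
W = rng.normal(size=(d_out, d_in))
pool = [(rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank)))
        for _ in range(n_adapters)]
adapter_embs = rng.normal(size=(n_adapters, 4))
# A prompt whose embedding lies very close to adapter 3's embedding.
prompt_emb = adapter_embs[3] + 0.01 * rng.normal(size=4)

top = retrieve_top_k(prompt_emb, adapter_embs, k=2)
selected = [pool[i] for i in top]
weights = np.full(len(selected), 1.0 / len(selected))  # uniform mixture (assumption)
x = rng.normal(size=(3, d_in))
y = compose_lora(x, W, selected, weights)
```

Because retrieval depends only on embeddings, a newly uploaded adapter can be added to the pool by appending its embedding, without retraining the composition step in this toy version.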

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Jun 17, 2024

Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye(+12 more)

Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extraction and understanding tasks, and their capacity to process within-document data formats such as charts and equations remains under-explored. To address these issues, we present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community, using our custom auto-labeling pipeline. DocGenome features four key characteristics: 1) Completeness: It is the first dataset to structure data from all modalities including 13 layout attributes along with their LaTeX source codes. 2) Logicality: It provides 6 logical relationships between different entities within each scientific document. 3) Diversity: It covers various document-oriented tasks, including document classification, visual grounding, document layout detection, document transformation, open-ended single-page QA and multi-page QA. 4) Correctness: It undergoes rigorous quality control checks conducted by a specialized team. We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.

* Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page (22 pages, 11 figures)

Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems

May 31, 2024

Shengyu Zhang, Ziqi Jiang, Jiangchao Yao, Fuli Feng, Kun Kuang, Zhou Zhao, Shuo Li, Hongxia Yang, Tat-Seng Chua, Fei Wu

Abstract: Recommendation performance usually exhibits a long-tail distribution over users: a small portion of head users enjoy much more accurate recommendation services than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source) and the biased training of recommender models (a model source). Since addressing this problem should not sacrifice overall performance, a wise choice is to eliminate the model bias while maintaining the natural heterogeneity. The key to debiased training lies in eliminating the effect of confounders that influence both the user's historical behaviors and the next behavior. Emerging causal recommendation methods achieve this by modeling the causal effect between user behaviors, but they potentially neglect unobserved confounders (e.g., friend suggestions) that are hard to measure in practice. To address unobserved confounders, we resort to the front-door adjustment (FDA) in causal theory and propose a causal multi-teacher distillation framework (CausalD). FDA requires proper mediators in order to estimate the causal effects of historical behaviors on the next behavior. To achieve this, we equip CausalD with multiple heterogeneous recommendation models to model the mediator distribution. Then, the causal effect estimated by FDA is the expectation of the recommendation prediction over the mediator distribution and the prior distribution of historical behaviors, which is technically achieved by a multi-teacher ensemble. To pursue efficient inference, CausalD further distills multiple teachers into one student model to directly infer the causal effect for making recommendations.

* TKDE 2023
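
For reference, the standard front-door adjustment formula from causal inference can be written as below. Assigning X to the historical behaviors, M to the mediator, and Y to the next behavior is our reading of the abstract, not notation taken from the paper.

```latex
P\big(Y \mid do(X = x)\big)
  = \sum_{m} P(M = m \mid X = x)
    \sum_{x'} P(Y \mid M = m, X = x')\, P(X = x')
```

The outer sum is an expectation over the mediator distribution and the inner sum is an expectation over the prior distribution of historical behaviors, which matches the two expectations the abstract mentions.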

More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs

May 28, 2024

Chengyuan Liu, Shihang Wang, Yangyang Kang, Lizhi Qing, Fubang Zhao, Changlong Sun, Kun Kuang, Fei Wu

Abstract: The performance on general tasks decreases after Large Language Models (LLMs) are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF). However, this paper presents a further challenge for real applications of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI), which necessitates the integration of both the general capabilities and domain knowledge within a single instance. The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks. Taking the legal domain as an example, we carefully design three groups of practical training and testing tasks and construct the corresponding datasets. To better incorporate general capabilities across domain-specific scenarios, we introduce ALoRA, which utilizes a multi-head attention module upon LoRA, facilitating direct information transfer from preceding tokens to the current one. This enhancement permits the representation to dynamically switch between domain-specific knowledge and general competencies according to the attention. Extensive experiments are conducted on the proposed tasks. The results demonstrate the significance of our setting and the effectiveness of our method.

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

May 23, 2024

Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Xuelong Li, Zhen Wang

Abstract: Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. In particular, LLM planning for multi-agent collaboration requires communication among agents or credit assignment as feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer to generate actions that maximize the advantage function. It endows the LLM with the foresight to discern whether the action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents and query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are given at https://read-llm.github.io/.

* The first two authors contributed equally

RemoCap: Disentangled Representation Learning for Motion Capture

May 21, 2024

Hongsheng Wang, Lizao Zhang, Zhangnan Zhong, Shuolin Xu, Xinrui Zhou, Shengyu Zhang, Huahao Xu, Fei Wu, Feng Lin

Abstract: Reconstructing 3D human bodies from realistic motion sequences remains a challenge due to pervasive and complex occlusions. Current methods struggle to capture the dynamics of occluded body parts, leading to model penetration and distorted motion. RemoCap leverages Spatial Disentanglement (SD) and Motion Disentanglement (MD) to overcome these limitations. SD addresses occlusion interference between the target human body and surrounding objects. It achieves this by disentangling target features along the dimension axis. By aligning features based on their spatial positions in each dimension, SD isolates the target object's response within a global window, enabling accurate capture despite occlusions. The MD module employs a channel-wise temporal shuffling strategy to simulate diverse scene dynamics. This process effectively disentangles motion features, allowing RemoCap to reconstruct occluded parts with greater fidelity. Furthermore, this paper introduces a sequence velocity loss that promotes temporal coherence. This loss constrains inter-frame velocity errors, ensuring the predicted motion exhibits realistic consistency. Extensive comparisons with state-of-the-art (SOTA) methods on benchmark datasets demonstrate RemoCap's superior performance in 3D human body reconstruction. On the 3DPW dataset, RemoCap surpasses all competitors, achieving the best results in MPVPE (81.9), MPJPE (72.7), and PA-MPJPE (44.1) metrics. Codes are available at https://wanghongsheng01.github.io/RemoCap/.
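
To make the idea of constraining inter-frame velocity errors concrete, here is a minimal sketch of one plausible velocity loss over joint positions. The abstract does not give the exact formulation, so the finite-difference velocity, the L2 norm, the plain mean, and the name `sequence_velocity_loss` are all assumptions rather than RemoCap's actual loss.

```python
import numpy as np

def sequence_velocity_loss(pred, target):
    """Mean L2 error between predicted and ground-truth inter-frame velocities.

    pred, target: arrays of shape (frames, joints, 3).
    """
    pred_vel = pred[1:] - pred[:-1]        # finite-difference velocity per frame pair
    target_vel = target[1:] - target[:-1]
    return float(np.mean(np.linalg.norm(pred_vel - target_vel, axis=-1)))

# Hypothetical toy sequence: every joint moves 1 unit per frame along each axis.
frames, joints = 5, 4
t = np.arange(frames, dtype=float).reshape(frames, 1, 1)
target = np.tile(t, (1, joints, 3))

shifted = target + 10.0   # constant positional offset, identical velocities
jittery = target.copy()
jittery[2] += 1.0         # a one-frame jump that breaks temporal coherence

loss_shifted = sequence_velocity_loss(shifted, target)
loss_jittery = sequence_velocity_loss(jittery, target)
```

In this toy version a constant positional offset incurs no penalty while a single-frame jump does, which is why such a term would normally be paired with a position loss rather than used alone.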

Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

May 21, 2024

Hongsheng Wang, Weiyue Zhang, Sihao Liu, Xinrui Zhou, Shengyu Zhang, Fei Wu, Feng Lin

Abstract: Although 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach leverages explicit semantic priors of body parts to ensure the consistency of geometric topology, thereby enabling the capture of the complex geometrical and topological associations among body parts. Additionally, we disentangle high-frequency features from global human features to refine surface details in body parts. Extensive experiments demonstrate that our method exhibits superior performance in human body reconstruction, particularly in enhancing surface details and accurately reconstructing body part junctions. Codes are available at https://wanghongsheng01.github.io/HUGS/.

NieR: Normal-Based Lighting Scene Rendering

May 21, 2024

Hongsheng Wang, Yang Wang, Yalan Liu, Fayuan Hu, Shengyu Zhang, Fei Wu, Feng Lin

Abstract: In real-world road scenes, diverse material properties lead to complex light reflection phenomena, making accurate color reproduction crucial for enhancing the realism and safety of simulated driving environments. However, existing methods often struggle to capture the full spectrum of lighting effects, particularly in dynamic scenarios where viewpoint changes induce significant material color variations. To address this challenge, we introduce NieR (Normal-Based Lighting Scene Rendering), a novel framework that takes into account the nuances of light reflection on diverse material surfaces, leading to more precise rendering. To simulate the lighting synthesis process, we present the LD (Light Decomposition) module, which captures the lighting reflection characteristics on surfaces. Furthermore, to address dynamic lighting scenes, we propose the HNGD (Hierarchical Normal Gradient Densification) module to overcome the limitations of sparse Gaussian representation. Specifically, we dynamically adjust the Gaussian density based on normal gradients. Experimental evaluations demonstrate that our method outperforms state-of-the-art (SOTA) methods in terms of visual quality and exhibits significant advantages in performance indicators. Codes are available at https://wanghongsheng01.github.io/NieR/.

NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction

May 21, 2024

Hongsheng Wang, Nanjie Yao, Xinrui Zhou, Shengyu Zhang, Huahao Xu, Fei Wu, Feng Lin

Abstract: In the animation industry, 3D modelers typically rely on front and back non-overlapped concept designs to guide the 3D modeling of anime characters. However, there is currently a lack of automated approaches for generating anime characters directly from these 2D designs. In light of this, we explore a novel task of reconstructing anime characters from non-overlapped views. This presents two main challenges: existing multi-view approaches cannot be directly applied due to the absence of overlapping regions, and there is a scarcity of full-body anime character data and standard benchmarks. To bridge the gap, we present Non-Overlapped Views for 3D Anime Character Reconstruction (NOVA-3D), a new framework that implements a method for view-aware feature fusion to learn 3D-consistent features effectively and synthesizes full-body anime characters from non-overlapped front and back views directly. To facilitate this line of research, we collected the NOVA-Human dataset, which comprises multi-view images and accurate camera parameters for 3D anime characters. Extensive experiments demonstrate that the proposed method outperforms baseline approaches, achieving superior reconstruction of anime characters with exceptional detail fidelity. In addition, to further verify the effectiveness of our method, we applied it to the animation head reconstruction task and improved the state-of-the-art baseline to 94.453 in SSIM, 7.726 in LPIPS, and 19.575 in PSNR on average. Codes and datasets are available at https://wanghongsheng01.github.io/NOVA-3D/.
