Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Feng Lin

Peter

ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction

Nov 07, 2024

Hongsheng Wang, Zehui Feng, Tong Xiao, Genfan Yang, Shengyu Zhang, Fei Wu, Feng Lin

Figure 1 for ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction

Figure 2 for ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction

Figure 3 for ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction

Figure 4 for ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction

Abstract:Current 3D human motion reconstruction methods from monocular videos rely on features within the current reconstruction window, leading to distortion and deformations in the human structure under local occlusions or blurriness in video frames. To estimate realistic 3D human mesh sequences based on incomplete features, we propose Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction (ProGraph). For missing parts recovery, we exploit the explicit topological-aware probability distribution across the entire motion sequence. To restore the complete human, Graph Topological Modeling (GTM) learns the underlying topological structure, focusing on the relationships inherent in the individual parts. Next, to generate blurred motion parts, Temporal-alignable Probability Distribution (TPDist) utilizes the GTM to predict features based on distribution. This interactive mechanism facilitates motion consistency, allowing the restoration of human parts. Furthermore, Hierarchical Human Loss (HHLoss) constrains the probability distribution errors of inter-frame features during topological structure variation. Our Method achieves superior results than other SOTA methods in addressing occlusions and blurriness on 3DPW.

Via

Access Paper or Ask Questions

Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion

Nov 06, 2024

Tiantian Liu, Hongwei Yao, Tong Wu, Zhan Qin, Feng Lin, Kui Ren, Chun Chen

Figure 1 for Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion

Figure 2 for Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion

Figure 3 for Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion

Figure 4 for Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion

Abstract:Embeddings have become a cornerstone in the functionality of large language models (LLMs) due to their ability to transform text data into rich, dense numerical representations that capture semantic and syntactic properties. These embedding vector databases serve as the long-term memory of LLMs, enabling efficient handling of a wide range of natural language processing tasks. However, the surge in popularity of embedding vector databases in LLMs has been accompanied by significant concerns about privacy leakage. Embedding vector databases are particularly vulnerable to embedding inversion attacks, where adversaries can exploit the embeddings to reverse-engineer and extract sensitive information from the original text data. Existing defense mechanisms have shown limitations, often struggling to balance security with the performance of downstream tasks. To address these challenges, we introduce Eguard, a novel defense mechanism designed to mitigate embedding inversion attacks. Eguard employs a transformer-based projection network and text mutual information optimization to safeguard embeddings while preserving the utility of LLMs. Our approach significantly reduces privacy risks, protecting over 95% of tokens from inversion while maintaining high performance across downstream tasks consistent with original embeddings.

Via

Access Paper or Ask Questions

ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features

Aug 03, 2024

Peng Cheng, Yuwei Wang, Peng Huang, Zhongjie Ba, Xiaodong Lin, Feng Lin, Li Lu, Kui Ren

Abstract:Extensive research has revealed that adversarial examples (AE) pose a significant threat to voice-controllable smart devices. Recent studies have proposed black-box adversarial attacks that require only the final transcription from an automatic speech recognition (ASR) system. However, these attacks typically involve many queries to the ASR, resulting in substantial costs. Moreover, AE-based adversarial audio samples are susceptible to ASR updates. In this paper, we identify the root cause of these limitations, namely the inability to construct AE attack samples directly around the decision boundary of deep learning (DL) models. Building on this observation, we propose ALIF, the first black-box adversarial linguistic feature-based attack pipeline. We leverage the reciprocal process of text-to-speech (TTS) and ASR models to generate perturbations in the linguistic embedding space where the decision boundary resides. Based on the ALIF pipeline, we present the ALIF-OTL and ALIF-OTA schemes for launching attacks in both the digital domain and the physical playback environment on four commercial ASRs and voice assistants. Extensive evaluations demonstrate that ALIF-OTL and -OTA significantly improve query efficiency by 97.7% and 73.3%, respectively, while achieving competitive performance compared to existing methods. Notably, ALIF-OTL can generate an attack sample with only one query. Furthermore, our test-of-time experiment validates the robustness of our approach against ASR updates.

* Published in the 2024 IEEE Symposium on Security and Privacy (SP)

Via

Access Paper or Ask Questions

Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

May 28, 2024

Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan

Figure 1 for Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

Figure 2 for Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

Figure 3 for Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

Figure 4 for Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

Abstract:Video recognition remains an open challenge, requiring the identification of diverse content categories within videos. Mainstream approaches often perform flat classification, overlooking the intrinsic hierarchical structure relating categories. To address this, we formalize the novel task of hierarchical video recognition, and propose a video-language learning framework tailored for hierarchical recognition. Specifically, our framework encodes dependencies between hierarchical category levels, and applies a top-down constraint to filter recognition predictions. We further construct a new fine-grained dataset based on medical assessments for rehabilitation of stroke patients, serving as a challenging benchmark for hierarchical recognition. Through extensive experiments, we demonstrate the efficacy of our approach for hierarchical recognition, significantly outperforming conventional methods, especially for fine-grained subcategories. The proposed framework paves the way for hierarchical modeling in video understanding tasks, moving beyond flat categorization.

Via

Access Paper or Ask Questions

Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

May 21, 2024

Hongsheng Wang, Weiyue Zhang, Sihao Liu, Xinrui Zhou, Shengyu Zhang, Fei Wu, Feng Lin

Figure 1 for Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

Figure 2 for Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

Figure 3 for Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

Figure 4 for Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

Abstract:Although 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach involves leveraging explicitly semantic priors of body parts to ensure the consistency of geometric topology, thereby enabling the capture of the complex geometrical and topological associations among body parts. Additionally, we disentangle high-frequency features from global human features to refine surface details in body parts. Extensive experiments demonstrate that our method exhibits superior performance in human body reconstruction, particularly in enhancing surface details and accurately reconstructing body part junctions. Codes are available at https://wanghongsheng01.github.io/HUGS/.

Via

Access Paper or Ask Questions

MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

May 21, 2024

Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Shengyu Zhang, Feng Lin, Fei Wu

Figure 1 for MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

Figure 2 for MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

Figure 3 for MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

Figure 4 for MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

Abstract:Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Additionally, to address local occlusions in single-view, based on KGAS, UID identifies significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve the Human NeRF and the Gaussian Splatting by 33.94% and 16.75% in LPIPS* respectively. Codes are available at https://wanghongsheng01.github.io/MOSS/.

Via

Access Paper or Ask Questions

NieR: Normal-Based Lighting Scene Rendering

May 21, 2024

Hongsheng Wang, Yang Wang, Yalan Liu, Fayuan Hu, Shengyu Zhang, Fei Wu, Feng Lin

Figure 1 for NieR: Normal-Based Lighting Scene Rendering

Figure 2 for NieR: Normal-Based Lighting Scene Rendering

Figure 3 for NieR: Normal-Based Lighting Scene Rendering

Figure 4 for NieR: Normal-Based Lighting Scene Rendering

Abstract:In real-world road scenes, diverse material properties lead to complex light reflection phenomena, making accurate color reproduction crucial for enhancing the realism and safety of simulated driving environments. However, existing methods often struggle to capture the full spectrum of lighting effects, particularly in dynamic scenarios where viewpoint changes induce significant material color variations. To address this challenge, we introduce NieR (Normal-Based Lighting Scene Rendering), a novel framework that takes into account the nuances of light reflection on diverse material surfaces, leading to more precise rendering. To simulate the lighting synthesis process, we present the LD (Light Decomposition) module, which captures the lighting reflection characteristics on surfaces. Furthermore, to address dynamic lighting scenes, we propose the HNGD (Hierarchical Normal Gradient Densification) module to overcome the limitations of sparse Gaussian representation. Specifically, we dynamically adjust the Gaussian density based on normal gradients. Experimental evaluations demonstrate that our method outperforms state-of-the-art (SOTA) methods in terms of visual quality and exhibits significant advantages in performance indicators. Codes are available at https://wanghongsheng01.github.io/NieR/.

Via

Access Paper or Ask Questions

NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction

May 21, 2024

Hongsheng Wang, Nanjie Yao, Xinrui Zhou, Shengyu Zhang, Huahao Xu, Fei Wu, Feng Lin

Figure 1 for NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction

Figure 2 for NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction

Figure 3 for NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction

Figure 4 for NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction

Abstract:In the animation industry, 3D modelers typically rely on front and back non-overlapped concept designs to guide the 3D modeling of anime characters. However, there is currently a lack of automated approaches for generating anime characters directly from these 2D designs. In light of this, we explore a novel task of reconstructing anime characters from non-overlapped views. This presents two main challenges: existing multi-view approaches cannot be directly applied due to the absence of overlapping regions, and there is a scarcity of full-body anime character data and standard benchmarks. To bridge the gap, we present Non-Overlapped Views for 3D \textbf{A}nime Character Reconstruction (NOVA-3D), a new framework that implements a method for view-aware feature fusion to learn 3D-consistent features effectively and synthesizes full-body anime characters from non-overlapped front and back views directly. To facilitate this line of research, we collected the NOVA-Human dataset, which comprises multi-view images and accurate camera parameters for 3D anime characters. Extensive experiments demonstrate that the proposed method outperforms baseline approaches, achieving superior reconstruction of anime characters with exceptional detail fidelity. In addition, to further verify the effectiveness of our method, we applied it to the animation head reconstruction task and improved the state-of-the-art baseline to 94.453 in SSIM, 7.726 in LPIPS, and 19.575 in PSNR on average. Codes and datasets are available at https://wanghongsheng01.github.io/NOVA-3D/.

Via

Access Paper or Ask Questions

RemoCap: Disentangled Representation Learning for Motion Capture

May 21, 2024

Hongsheng Wang, Lizao Zhang, Zhangnan Zhong, Shuolin Xu, Xinrui Zhou, Shengyu Zhang, Huahao Xu, Fei Wu, Feng Lin

Figure 1 for RemoCap: Disentangled Representation Learning for Motion Capture

Figure 2 for RemoCap: Disentangled Representation Learning for Motion Capture

Figure 3 for RemoCap: Disentangled Representation Learning for Motion Capture

Figure 4 for RemoCap: Disentangled Representation Learning for Motion Capture

Abstract:Reconstructing 3D human bodies from realistic motion sequences remains a challenge due to pervasive and complex occlusions. Current methods struggle to capture the dynamics of occluded body parts, leading to model penetration and distorted motion. RemoCap leverages Spatial Disentanglement (SD) and Motion Disentanglement (MD) to overcome these limitations. SD addresses occlusion interference between the target human body and surrounding objects. It achieves this by disentangling target features along the dimension axis. By aligning features based on their spatial positions in each dimension, SD isolates the target object's response within a global window, enabling accurate capture despite occlusions. The MD module employs a channel-wise temporal shuffling strategy to simulate diverse scene dynamics. This process effectively disentangles motion features, allowing RemoCap to reconstruct occluded parts with greater fidelity. Furthermore, this paper introduces a sequence velocity loss that promotes temporal coherence. This loss constrains inter-frame velocity errors, ensuring the predicted motion exhibits realistic consistency. Extensive comparisons with state-of-the-art (SOTA) methods on benchmark datasets demonstrate RemoCap's superior performance in 3D human body reconstruction. On the 3DPW dataset, RemoCap surpasses all competitors, achieving the best results in MPVPE (81.9), MPJPE (72.7), and PA-MPJPE (44.1) metrics. Codes are available at https://wanghongsheng01.github.io/RemoCap/.

Via

Access Paper or Ask Questions

When LLM-based Code Generation Meets the Software Development Process

Mar 23, 2024

Feng Lin, Dong Jae Kim, Tse-Husn, Chen

Figure 1 for When LLM-based Code Generation Meets the Software Development Process

Figure 2 for When LLM-based Code Generation Meets the Software Development Process

Figure 3 for When LLM-based Code Generation Meets the Software Development Process

Figure 4 for When LLM-based Code Generation Meets the Software Development Process

Abstract:Software process models play a pivotal role in fostering collaboration and communication within software teams, enabling them to tackle intricate development tasks effectively. This paper introduces LCG, a code generation framework inspired by established software engineering practices. LCG leverages multiple Large Language Model (LLM) agents to emulate various software process models, namely LCGWaterfall, LCGTDD, and LCGScrum. Each model assigns LLM agents specific roles such as requirement engineer, architect, developer, tester, and scrum master, mirroring typical development activities and communication patterns. Through collaborative efforts utilizing chain-of-thought and prompt composition techniques, the agents continuously refine themselves to enhance code quality. Utilizing GPT3.5 as the underlying LLM and baseline (GPT), we evaluate LCG across four code generation benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET. Results indicate LCGScrum outperforms other models, achieving Pass@1 scores of 75.2, 65.5, 82.5, and 56.7 in HumanEval, HumanEval-ET, MBPP, and MBPP-ET, respectively - an average 15% improvement over GPT. Analysis reveals distinct impacts of development activities on generated code, with design and code reviews contributing to enhanced exception handling, while design, testing, and code reviews mitigate code smells. Furthermore, temperature values exhibit negligible influence on Pass@1 across all models. However, variations in Pass@1 are notable for different GPT3.5 model versions, ranging from 5 to over 60 in HumanEval, highlighting the stability of LCG across model versions. This stability underscores the importance of adopting software process models to bolster the quality and consistency of LLM-generated code.

Via

Access Paper or Ask Questions