Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Wang

School of Physics and Astronomy, Shanghai Jiao Tong University, State Key Laboratory of Dark Matter Physics, Shanghai Jiao Tong University, Tsung-Dao Lee Institute, Shanghai Jiao Tong University

HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

Apr 20, 2024

Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang

Figure 1 for HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

Figure 2 for HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

Figure 3 for HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

Figure 4 for HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

Abstract:This paper investigates the challenging problem of learned image compression (LIC) with extreme low bitrates. Previous LIC methods based on transmitting quantized continuous features often yield blurry and noisy reconstruction due to the severe quantization loss. While previous LIC methods based on learned codebooks that discretize visual space usually give poor-fidelity reconstruction due to the insufficient representation power of limited codewords in capturing faithful details. We propose a novel dual-stream framework, HyrbidFlow, which combines the continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity under extreme low bitrates. The codebook-based stream benefits from the high-quality learned codebook priors to provide high quality and clarity in reconstructed images. The continuous feature stream targets at maintaining fidelity details. To achieve the ultra low bitrate, a masked token-based transformer is further proposed, where we only transmit a masked portion of codeword indices and recover the missing indices through token generation guided by information from the continuous feature stream. We also develop a bridging correction network to merge the two streams in pixel decoding for final image reconstruction, where the continuous stream features rectify biases of the codebook-based pixel decoder to impose reconstructed fidelity details. Experimental results demonstrate superior performance across several datasets under extremely low bitrates, compared with existing single-stream codebook-based or continuous-feature-based LIC methods.

Via

Access Paper or Ask Questions

LayeredMAPF: a decomposition of MAPF instance without compromising solvability

Apr 19, 2024

Zhuo Yao, Wei Wang

Figure 1 for LayeredMAPF: a decomposition of MAPF instance without compromising solvability

Figure 2 for LayeredMAPF: a decomposition of MAPF instance without compromising solvability

Figure 3 for LayeredMAPF: a decomposition of MAPF instance without compromising solvability

Figure 4 for LayeredMAPF: a decomposition of MAPF instance without compromising solvability

Abstract:Generally, the calculation and memory space required for multi-agent path finding (MAPF) grows exponentially as the number of agents increases. This often results in some MAPF instances being unsolvable under limited computational resources and memory space, thereby limiting the application of MAPF in complex scenarios. Hence, we propose a decomposition approach for MAPF instances, which breaks down instances involving a large number of agents into multiple isolated subproblems involving fewer agents. Moreover, we present a framework to enable general MAPF algorithms to solve each subproblem independently and merge their solutions into one conflict-free final solution, without compromising on solvability. Unlike existing works that propose isolated methods aimed at reducing the time cost of MAPF, our method is applicable to all MAPF methods. In our results, we apply decomposition to multiple state-of-the-art MAPF methods using a classic MAPF benchmark (https://movingai.com/benchmarks/mapf.html). The decomposition of MAPF instances is completed on average within 1s, and its application to seven MAPF methods reduces the memory usage and time cost significantly, particularly for serial methods. To facilitate further research within the community, we have made the source code of the proposed algorithm publicly available (https://github.com/JoeYao-bit/LayeredMAPF).

Via

Access Paper or Ask Questions

No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Apr 15, 2024

Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng, Wei Wang, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo, Ming-Hsuan Yang

Figure 1 for No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Figure 2 for No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Figure 3 for No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Figure 4 for No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Abstract:Inherent ambiguity in layout annotations poses significant challenges to developing accurate 360{\deg} room layout estimation models. To address this issue, we propose a novel Bi-Layout model capable of predicting two distinct layout types. One stops at ambiguous regions, while the other extends to encompass all visible areas. Our model employs two global context embeddings, where each embedding is designed to capture specific contextual information for each layout type. With our novel feature guidance module, the image feature retrieves relevant context from these embeddings, generating layout-aware features for precise bi-layout predictions. A unique property of our Bi-Layout model is its ability to inherently detect ambiguous regions by comparing the two predictions. To circumvent the need for manual correction of ambiguous annotations during testing, we also introduce a new metric for disambiguating ground truth layouts. Our method demonstrates superior performance on benchmark datasets, notably outperforming leading approaches. Specifically, on the MatterportLayout dataset, it improves 3DIoU from 81.70% to 82.57% across the full test set and notably from 54.80% to 59.97% in subsets with significant ambiguity. Project page: https://liagm.github.io/Bi_Layout/

* CVPR 2024, Project page: https://liagm.github.io/Bi_Layout/

Via

Access Paper or Ask Questions

DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Apr 14, 2024

Yuqi Wang, Zeqiang Wang, Wei Wang, Qi Chen, Kaizhu Huang, Anh Nguyen, Suparna De

Figure 1 for DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Figure 2 for DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Figure 3 for DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Figure 4 for DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Abstract:Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement and adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multi-task learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original language models. Ablation studies validate the contribution of each augmentation method in improving robustness. Our best-performing model ranked 12th in terms of faithfulness and 8th in terms of consistency, respectively, out of the 32 participants.

Via

Access Paper or Ask Questions

Hierarchical Attention Models for Multi-Relational Graphs

Apr 14, 2024

Roshni G. Iyer, Wei Wang, Yizhou Sun

Figure 1 for Hierarchical Attention Models for Multi-Relational Graphs

Figure 2 for Hierarchical Attention Models for Multi-Relational Graphs

Figure 3 for Hierarchical Attention Models for Multi-Relational Graphs

Figure 4 for Hierarchical Attention Models for Multi-Relational Graphs

Abstract:We present Bi-Level Attention-Based Relational Graph Convolutional Networks (BR-GCN), unique neural network architectures that utilize masked self-attentional layers with relational graph convolutions, to effectively operate on highly multi-relational data. BR-GCN models use bi-level attention to learn node embeddings through (1) node-level attention, and (2) relation-level attention. The node-level self-attentional layers use intra-relational graph interactions to learn relation-specific node embeddings using a weighted aggregation of neighborhood features in a sparse subgraph region. The relation-level self-attentional layers use inter-relational graph interactions to learn the final node embeddings using a weighted aggregation of relation-specific node embeddings. The BR-GCN bi-level attention mechanism extends Transformer-based multiplicative attention from the natural language processing (NLP) domain, and Graph Attention Networks (GAT)-based attention, to large-scale heterogeneous graphs (HGs). On node classification, BR-GCN outperforms baselines from 0.29% to 14.95% as a stand-alone model, and on link prediction, BR-GCN outperforms baselines from 0.02% to 7.40% as an auto-encoder model. We also conduct ablation studies to evaluate the quality of BR-GCN's relation-level attention and discuss how its learning of graph structure may be transferred to enrich other graph neural networks (GNNs). Through various experiments, we show that BR-GCN's attention mechanism is both scalable and more effective in learning compared to state-of-the-art GNNs.

Via

Access Paper or Ask Questions

Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

Apr 12, 2024

Yang Li, Songlin Yang, Wei Wang, Ziwen He, Bo Peng, Jing Dong

Figure 1 for Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

Figure 2 for Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

Figure 3 for Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

Figure 4 for Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

Abstract:Highly realistic AI generated face forgeries known as deepfakes have raised serious social concerns. Although DNN-based face forgery detection models have achieved good performance, they are vulnerable to latest generative methods that have less forgery traces and adversarial attacks. This limitation of generalization and robustness hinders the credibility of detection results and requires more explanations. In this work, we provide counterfactual explanations for face forgery detection from an artifact removal perspective. Specifically, we first invert the forgery images into the StyleGAN latent space, and then adversarially optimize their latent representations with the discrimination supervision from the target detection model. We verify the effectiveness of the proposed explanations from two aspects: (1) Counterfactual Trace Visualization: the enhanced forgery images are useful to reveal artifacts by visually contrasting the original images and two different visualization methods; (2) Transferable Adversarial Attacks: the adversarial forgery images generated by attacking the detection model are able to mislead other detection models, implying the removed artifacts are general. Extensive experiments demonstrate that our method achieves over 90% attack success rate and superior attack transferability. Compared with naive adversarial noise methods, our method adopts both generative and discriminative model priors, and optimize the latent representations in a synthesis-by-analysis way, which forces the search of counterfactual explanations on the natural face manifold. Thus, more general counterfactual traces can be found and better adversarial attack transferability can be achieved.

* Accepted to ICME2024

Via

Access Paper or Ask Questions

Efficient and Scalable Chinese Vector Font Generation via Component Composition

Apr 10, 2024

Jinyu Song, Weitao You, Shuhui Shi, Shuxuan Guo, Lingyun Sun, Wei Wang

Figure 1 for Efficient and Scalable Chinese Vector Font Generation via Component Composition

Figure 2 for Efficient and Scalable Chinese Vector Font Generation via Component Composition

Figure 3 for Efficient and Scalable Chinese Vector Font Generation via Component Composition

Figure 4 for Efficient and Scalable Chinese Vector Font Generation via Component Composition

Abstract:Chinese vector font generation is challenging due to the complex structure and huge amount of Chinese characters. Recent advances remain limited to generating a small set of characters with simple structure. In this work, we first observe that most Chinese characters can be disassembled into frequently-reused components. Therefore, we introduce the first efficient and scalable Chinese vector font generation approach via component composition, allowing generating numerous vector characters from a small set of components. To achieve this, we collect a large-scale dataset that contains over \textit{90K} Chinese characters with their components and layout information. Upon the dataset, we propose a simple yet effective framework based on spatial transformer networks (STN) and multiple losses tailored to font characteristics to learn the affine transformation of the components, which can be directly applied to the B\'ezier curves, resulting in Chinese characters in vector format. Our qualitative and quantitative experiments have demonstrated that our method significantly surpasses the state-of-the-art vector font generation methods in generating large-scale complex Chinese characters in both font generation and zero-shot font extension.

* 15 pages, 23 figures

Via

Access Paper or Ask Questions

Object Detectors in the Open Environment: Challenges, Solutions, and Outlook

Apr 09, 2024

Siyuan Liang, Wei Wang, Ruoyu Chen, Aishan Liu, Boxi Wu, Ee-Chien Chang, Xiaochun Cao, Dacheng Tao

Figure 1 for Object Detectors in the Open Environment: Challenges, Solutions, and Outlook

Figure 2 for Object Detectors in the Open Environment: Challenges, Solutions, and Outlook

Figure 3 for Object Detectors in the Open Environment: Challenges, Solutions, and Outlook

Figure 4 for Object Detectors in the Open Environment: Challenges, Solutions, and Outlook

Abstract:With the emergence of foundation models, deep learning-based object detectors have shown practical usability in closed set scenarios. However, for real-world tasks, object detectors often operate in open environments, where crucial factors (e.g., data distribution, objective) that influence model learning are often changing. The dynamic and intricate nature of the open environment poses novel and formidable challenges to object detectors. Unfortunately, current research on object detectors in open environments lacks a comprehensive analysis of their distinctive characteristics, challenges, and corresponding solutions, which hinders their secure deployment in critical real-world scenarios. This paper aims to bridge this gap by conducting a comprehensive review and analysis of object detectors in open environments. We initially identified limitations of key structural components within the existing detection pipeline and propose the open environment object detector challenge framework that includes four quadrants (i.e., out-of-domain, out-of-category, robust learning, and incremental learning) based on the dimensions of the data / target changes. For each quadrant of challenges in the proposed framework, we present a detailed description and systematic analysis of the overarching goals and core difficulties, systematically review the corresponding solutions, and benchmark their performance over multiple widely adopted datasets. In addition, we engage in a discussion of open problems and potential avenues for future research. This paper aims to provide a fresh, comprehensive, and systematic understanding of the challenges and solutions associated with open-environment object detectors, thus catalyzing the development of more solid applications in real-world scenarios. A project related to this survey can be found at https://github.com/LiangSiyuan21/OEOD_Survey.

* 37 pages, 17 figures

Via

Access Paper or Ask Questions

Image and Video Compression using Generative Sparse Representation with Fidelity Controls

Apr 09, 2024

Wei Jiang, Wei Wang

Abstract:We propose a framework for learned image and video compression using the generative sparse visual representation (SVR) guided by fidelity-preserving controls. By embedding inputs into a discrete latent space spanned by learned visual codebooks, SVR-based compression transmits integer codeword indices, which is efficient and cross-platform robust. However, high-quality (HQ) reconstruction in the decoder relies on intermediate feature inputs from the encoder via direct connections. Due to the prohibitively high transmission costs, previous SVR-based compression methods remove such feature links, resulting in largely degraded reconstruction quality. In this work, we treat the intermediate features as fidelity-preserving control signals that guide the conditioned generative reconstruction in the decoder. Instead of discarding or directly transferring such signals, we draw them from a low-quality (LQ) fidelity-preserving alternative input that is sent to the decoder with very low bitrate. These control signals provide complementary fidelity cues to improve reconstruction, and their quality is determined by the compression rate of the LQ alternative, which can be tuned to trade off bitrate, fidelity and perceptual quality. Our framework can be conveniently used for both learned image compression (LIC) and learned video compression (LVC). Since SVR is robust against input perturbations, a large portion of codeword indices between adjacent frames can be the same. By only transferring different indices, SVR-based LIC and LVC can share a similar processing pipeline. Experiments over standard image and video compression benchmarks demonstrate the effectiveness of our approach.

Via

Access Paper or Ask Questions

CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST)

Apr 02, 2024

Xu Li, Ruiqi Sun, Jiameng Lv, Peng Jia, Nan Li, Chengliang Wei, Zou Hu, Xinzhong Er, Yun Chen, Zhang Ban(+22 more)

Abstract:Strong gravitational lensing is a powerful tool for investigating dark matter and dark energy properties. With the advent of large-scale sky surveys, we can discover strong lensing systems on an unprecedented scale, which requires efficient tools to extract them from billions of astronomical objects. The existing mainstream lens-finding tools are based on machine learning algorithms and applied to cut-out-centered galaxies. However, according to the design and survey strategy of optical surveys by CSST, preparing cutouts with multiple bands requires considerable efforts. To overcome these challenges, we have developed a framework based on a hierarchical visual Transformer with a sliding window technique to search for strong lensing systems within entire images. Moreover, given that multi-color images of strong lensing systems can provide insights into their physical characteristics, our framework is specifically crafted to identify strong lensing systems in images with any number of channels. As evaluated using CSST mock data based on an Semi-Analytic Model named CosmoDC2, our framework achieves precision and recall rates of 0.98 and 0.90, respectively. To evaluate the effectiveness of our method in real observations, we have applied it to a subset of images from the DESI Legacy Imaging Surveys and media images from Euclid Early Release Observations. 61 new strong lensing system candidates are discovered by our method. However, we also identified false positives arising primarily from the simplified galaxy morphology assumptions within the simulation. This underscores the practical limitations of our approach while simultaneously highlighting potential avenues for future improvements.

* The paper is accepted by the AJ. The complete code could be downloaded with DOI of: 10.12149/101393. Comments are welcome

Via

Access Paper or Ask Questions