Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhao Zhang

Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video

Dec 04, 2024

Shanding Diao, Yang Zhao, Yuan Chen, Zhao Zhang, Wei Jia, Ronggang Wang

Figure 1 for Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video

Figure 2 for Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video

Figure 3 for Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video

Figure 4 for Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video

Abstract:With the rapid development of stereoscopic display technologies, especially glasses-free 3D screens, and virtual reality devices, stereoscopic conversion has become an important task to address the lack of high-quality stereoscopic image and video resources. Current stereoscopic conversion algorithms typically struggle to balance reconstruction performance and inference efficiency. This paper proposes a planar video real-time stereoscopic conversion network based on multi-plane images (MPI), which consists of a detail branch for generating MPI and a depth-semantic branch for perceiving depth information. Unlike models that depend on explicit depth map inputs, the proposed method employs a lightweight depth-semantic branch to extract depth-aware features implicitly. To optimize the lightweight branch, a heavy training but light inference strategy is adopted, which involves designing a coarse-to-fine auxiliary branch that is only used during the training stage. In addition, the proposed method simplifies the MPI rendering process for stereoscopic conversion scenarios to further accelerate the inference. Experimental results demonstrate that the proposed method can achieve comparable performance to some state-of-the-art (SOTA) models and support real-time inference at 2K resolution. Compared to the SOTA TMPI algorithm, the proposed method obtains similar subjective quality while achieving over $40\times$ inference acceleration.

* 9 pages, 7 figures

Via

Access Paper or Ask Questions

FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Oct 23, 2024

Wei Chen, Meng Yuan, Zhao Zhang, Ruobing Xie, Fuzhen Zhuang, Deqing Wang, Rui Liu

Figure 1 for FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Figure 2 for FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Figure 3 for FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Figure 4 for FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Abstract:As trustworthy AI continues to advance, the fairness issue in recommendations has received increasing attention. A recommender system is considered unfair when it produces unequal outcomes for different user groups based on user-sensitive attributes (e.g., age, gender). Some researchers have proposed data augmentation-based methods aiming at alleviating user-level unfairness by altering the skewed distribution of training data among various user groups. Despite yielding promising results, they often rely on fairness-related assumptions that may not align with reality, potentially reducing the data quality and negatively affecting model effectiveness. To tackle this issue, in this paper, we study how to implement high-quality data augmentation to improve recommendation fairness. Specifically, we propose FairDgcl, a dynamic graph adversarial contrastive learning framework aiming at improving fairness in recommender system. First, FairDgcl develops an adversarial contrastive network with a view generator and a view discriminator to learn generating fair augmentation strategies in an adversarial style. Then, we propose two dynamic, learnable models to generate contrastive views within contrastive learning framework, which automatically fine-tune the augmentation strategies. Meanwhile, we theoretically show that FairDgcl can simultaneously generate enhanced representations that possess both fairness and accuracy. Lastly, comprehensive experiments conducted on four real-world datasets demonstrate the effectiveness of the proposed FairDgcl.

* 12 pages, submitted to TKDE

Via

Access Paper or Ask Questions

TextMaster: Universal Controllable Text Edit

Oct 13, 2024

Aoqiang Wang, Jian Wang, Zhenyu Yan, Wenxiang Shang, Ran Lin, Zhao Zhang

Abstract:In image editing tasks, high-quality text editing capabilities can significantly reduce human and material resource costs. Current methods rely heavily on training data based on OCR text segment detection, where the text is tightly aligned with the mask area. This reliance creates a strong dependency on the mask area and lacks modules for adjusting text spacing and size in various scenarios. When the amount of text to be edited does not match the modification area or when the mask area is too large, significant issues may arise. Furthermore, no existing methods have explored controllable style transfer for text editing.To address these challenges, we propose TextMaster, a solution capable of accurately editing text with high realism and proper layout in any scenario and image area. Our approach employs adaptive standard letter spacing as guidance during training and uses adaptive mask boosting to prevent the leakage of text position and size information. We also utilize an attention mechanism to calculate the bounding box regression loss for each character, making text layout methods learnable across different scenarios. By injecting high-resolution standard font information and applying perceptual loss in the text editing area, we further enhance text rendering accuracy and fidelity. Additionally, we achieve style consistency between the modified and target text through a novel style injection method. Extensive qualitative and quantitative evaluations demonstrate that our method outperforms all existing approaches.

Via

Access Paper or Ask Questions

Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues

Sep 22, 2024

Mingshen Wang, Zhao Zhang, Feng Li, Ke Xu, Kang Miao, Meng Wang

Abstract:Dynamic quantization has attracted rising attention in image super-resolution (SR) as it expands the potential of heavy SR models onto mobile devices while preserving competitive performance. Existing methods explore layer-to-bit configuration upon varying local regions, adaptively allocating the bit to each layer and patch. Despite the benefits, they still fall short in the trade-off of SR accuracy and quantization efficiency. Apart from this, adapting the quantization level for each layer individually can disturb the original inter-layer relationships, thus diminishing the representation capability of quantized models. In this work, we propose Granular-DQ, which capitalizes on the intrinsic characteristics of images while dispensing with the previous consideration for layer sensitivity in quantization. Granular-DQ conducts a multi-granularity analysis of local patches with further exploration of their information densities, achieving a distinctive patch-wise and layer-invariant dynamic quantization paradigm. Specifically, Granular-DQ initiates by developing a granularity-bit controller (GBC) to apprehend the coarse-to-fine granular representations of different patches, matching their proportional contribution to the entire image to determine the proper bit-width allocation. On this premise, we investigate the relation between bit-width and information density, devising an entropy-to-bit (E2B) mechanism that enables further fine-grained dynamic bit adaption of high-bit patches. Extensive experiments validate the superiority and generalization ability of Granular-DQ over recent state-of-the-art methods on various SR models. Code will be available at \url{https://github.com/MmmingS/Granular-DQ.git}.

Via

Access Paper or Ask Questions

MarsCode Agent: AI-native Automated Bug Fixing

Sep 04, 2024

Yizhou Liu, Pengfei Gao, Xinchen Wang, Jie Liu, Yexuan Shi, Zhao Zhang, Chao Peng

Figure 1 for MarsCode Agent: AI-native Automated Bug Fixing

Figure 2 for MarsCode Agent: AI-native Automated Bug Fixing

Figure 3 for MarsCode Agent: AI-native Automated Bug Fixing

Figure 4 for MarsCode Agent: AI-native Automated Bug Fixing

Abstract:Recent advances in large language models (LLMs) have shown significant potential to automate various software development tasks, including code completion, test generation, and bug fixing. However, the application of LLMs for automated bug fixing remains challenging due to the complexity and diversity of real-world software systems. In this paper, we introduce MarsCode Agent, a novel framework that leverages LLMs to automatically identify and repair bugs in software code. MarsCode Agent combines the power of LLMs with advanced code analysis techniques to accurately localize faults and generate patches. Our approach follows a systematic process of planning, bug reproduction, fault localization, candidate patch generation, and validation to ensure high-quality bug fixes. We evaluated MarsCode Agent on SWE-bench, a comprehensive benchmark of real-world software projects, and our results show that MarsCode Agent achieves a high success rate in bug fixing compared to most of the existing automated approaches.

* Yizhou Liu and Pengfei Gao contributed equally and the order is determined by rolling the dice. Chao Peng is the corresponding author

Via

Access Paper or Ask Questions

Aligning Teacher with Student Preferences for Tailored Training Data Generation

Jun 27, 2024

Yantao Liu, Zhao Zhang, Zijun Yao, Shulin Cao, Lei Hou, Juanzi Li

Figure 1 for Aligning Teacher with Student Preferences for Tailored Training Data Generation

Figure 2 for Aligning Teacher with Student Preferences for Tailored Training Data Generation

Figure 3 for Aligning Teacher with Student Preferences for Tailored Training Data Generation

Figure 4 for Aligning Teacher with Student Preferences for Tailored Training Data Generation

Abstract:Large Language Models (LLMs) have shown significant promise as copilots in various tasks. Local deployment of LLMs on edge devices is necessary when handling privacy-sensitive data or latency-sensitive tasks. The computational constraints of such devices make direct deployment of powerful large-scale LLMs impractical, necessitating the Knowledge Distillation from large-scale models to lightweight models. Lots of work has been done to elicit diversity and quality training examples from LLMs, but little attention has been paid to aligning teacher instructional content based on student preferences, akin to "responsive teaching" in pedagogy. Thus, we propose ARTE, dubbed Aligning TeacheR with StudenT PreferencEs, a framework that aligns the teacher model with student preferences to generate tailored training examples for Knowledge Distillation. Specifically, we elicit draft questions and rationales from the teacher model, then collect student preferences on these questions and rationales using students' performance with in-context learning as a proxy, and finally align the teacher model with student preferences. In the end, we repeat the first step with the aligned teacher model to elicit tailored training examples for the student model on the target task. Extensive experiments on academic benchmarks demonstrate the superiority of ARTE over existing instruction-tuning datasets distilled from powerful LLMs. Moreover, we thoroughly investigate the generalization of ARTE, including the generalization of fine-tuned student models in reasoning ability and the generalization of aligned teacher models to generate tailored training data across tasks and students. In summary, our contributions lie in proposing a novel framework for tailored training example generation, demonstrating its efficacy in experiments, and investigating the generalization of both student & aligned teacher models in ARTE.

Via

Access Paper or Ask Questions

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Jun 13, 2024

Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

Figure 1 for Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Figure 2 for Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Figure 3 for Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Figure 4 for Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Abstract:Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness. However, these approaches fail to effectively restore content in dynamic range extremes, which are regions with pixel values close to 0 or 255. To address the full scope of challenges in HDR imaging and surpass the limitations of current models, we propose a novel two-stage approach. The first stage maps the color and brightness to an appropriate range while keeping the existing details, and the second stage utilizes a diffusion prior to generate content in dynamic range extremes lost during capture. This generative refinement module can also be used as a plug-and-play module to enhance and complement existing LDR enhancement models. The proposed method markedly improves the quality and details of LDR images, demonstrating superior performance through rigorous experimental validation. The project page is at https://sagiri0208.github.io

* https://sagiri0208.github.io

Via

Access Paper or Ask Questions

A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+

Jun 08, 2024

Jianzhao Wang, Yanyan Wei, Dehua Hu, Yilin Zhang, Shengeng Tang, Dan Guo, Zhao Zhang

Abstract:This technical report presents our team's solution for the WeatherProof Dataset Challenge: Semantic Segmentation in Adverse Weather at CVPR'24 UG2+. We propose a two-stage deep learning framework for this task. In the first stage, we preprocess the provided dataset by concatenating images into video sequences. Subsequently, we leverage a low-rank video deraining method to generate high-fidelity pseudo ground truths. These pseudo ground truths offer superior alignment compared to the original ground truths, facilitating model convergence during training. In the second stage, we employ the InternImage network to train for the semantic segmentation task using the generated pseudo ground truths. Notably, our meticulously designed framework demonstrates robustness to degraded data captured under adverse weather conditions. In the challenge, our solution achieved a competitive score of 0.43 on the Mean Intersection over Union (mIoU) metric, securing a respectable rank of 4th.

Via

Access Paper or Ask Questions

SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

May 25, 2024

Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang, Maryam Mehri Dehnavi

Figure 1 for SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

Figure 2 for SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

Figure 3 for SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

Figure 4 for SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

Abstract:We propose SLoPe, a Double-Pruned Sparse Plus Lazy Low-rank Adapter Pretraining method for LLMs that improves the accuracy of sparse LLMs while accelerating their pretraining and inference and reducing their memory footprint. Sparse pretraining of LLMs reduces the accuracy of the model, to overcome this, prior work uses dense models during fine-tuning. SLoPe improves the accuracy of sparsely pretrained models by adding low-rank adapters in the final 1% iterations of pretraining without adding significant overheads to the model pretraining and inference. In addition, SLoPe uses a double-pruned backward pass formulation that prunes the transposed weight matrix using N:M sparsity structures to enable an accelerated sparse backward pass. SLoPe accelerates the training and inference of models with billions of parameters up to $1.14\times$ and $1.34\times$ respectively (OPT-33B and OPT-66B) while reducing their memory usage by up to $0.77\times$ and $0.51\times$ for training and inference respectively.

Via

Access Paper or Ask Questions

Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2

May 24, 2024

Yeqing Lin, Minji Lee, Zhao Zhang, Mohammed AlQuraishi

Figure 1 for Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2

Figure 2 for Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2

Figure 3 for Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2

Figure 4 for Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2

Abstract:Protein diffusion models have emerged as a promising approach for protein design. One such pioneering model is Genie, a method that asymmetrically represents protein structures during the forward and backward processes, using simple Gaussian noising for the former and expressive SE(3)-equivariant attention for the latter. In this work we introduce Genie 2, extending Genie to capture a larger and more diverse protein structure space through architectural innovations and massive data augmentation. Genie 2 adds motif scaffolding capabilities via a novel multi-motif framework that designs co-occurring motifs with unspecified inter-motif positions and orientations. This makes possible complex protein designs that engage multiple interaction partners and perform multiple functions. On both unconditional and conditional generation, Genie 2 achieves state-of-the-art performance, outperforming all known methods on key design metrics including designability, diversity, and novelty. Genie 2 also solves more motif scaffolding problems than other methods and does so with more unique and varied solutions. Taken together, these advances set a new standard for structure-based protein design. Genie 2 inference and training code, as well as model weights, are freely available at: https://github.com/aqlaboratory/genie2.

Via

Access Paper or Ask Questions