Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yun Liu

Guangdong University of Finance, Guangzhou, China

Five A$^{+}$ Network: You Only Need 9K Parameters for Underwater Image Enhancement

May 15, 2023

Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Shi Jun, Yun Liu, Erkang Chen

Abstract:A lightweight underwater image enhancement network is of great significance for resource-constrained platforms, but balancing model size, computational efficiency, and enhancement performance has proven difficult for previous approaches. In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only $\sim$ 9k parameters and $\sim$ 0.01s processing time. The FA$^{+}$Net employs a two-stage enhancement structure. The strong prior stage aims to decompose challenging underwater degradations into sub-problems, while the fine-grained stage incorporates multi-branch color enhancement module and pixel attention module to amplify the network's perception of details. To the best of our knowledge, FA$^{+}$Net is the only network with the capability of real-time enhancement of 1080P images. Thorough extensive experiments and comprehensive visual comparison, we show that FA$^{+}$Net outperforms previous approaches by obtaining state-of-the-art performance on multiple datasets while significantly reducing both parameter count and computational complexity. The code is open source at https://github.com/Owen718/FiveAPlus-Network.

Via

Access Paper or Ask Questions

Predicting Cardiovascular Disease Risk using Photoplethysmography and Deep Learning

May 09, 2023

Wei-Hung Weng, Sebastien Baur, Mayank Daswani, Christina Chen, Lauren Harrell, Sujay Kakarmath, Mariam Jabara, Babak Behsaz, Cory Y. McLean, Yossi Matias(+6 more)

Abstract:Cardiovascular diseases (CVDs) are responsible for a large proportion of premature deaths in low- and middle-income countries. Early CVD detection and intervention is critical in these populations, yet many existing CVD risk scores require a physical examination or lab measurements, which can be challenging in such health systems due to limited accessibility. Here we investigated the potential to use photoplethysmography (PPG), a sensing technology available on most smartphones that can potentially enable large-scale screening at low cost, for CVD risk prediction. We developed a deep learning PPG-based CVD risk score (DLS) to predict the probability of having major adverse cardiovascular events (MACE: non-fatal myocardial infarction, stroke, and cardiovascular death) within ten years, given only age, sex, smoking status and PPG as predictors. We compared the DLS with the office-based refit-WHO score, which adopts the shared predictors from WHO and Globorisk scores (age, sex, smoking status, height, weight and systolic blood pressure) but refitted on the UK Biobank (UKB) cohort. In UKB cohort, DLS's C-statistic (71.1%, 95% CI 69.9-72.4) was non-inferior to office-based refit-WHO score (70.9%, 95% CI 69.7-72.2; non-inferiority margin of 2.5%, p<0.01). The calibration of the DLS was satisfactory, with a 1.8% mean absolute calibration error. Adding DLS features to the office-based score increased the C-statistic by 1.0% (95% CI 0.6-1.4). DLS predicts ten-year MACE risk comparable with the office-based refit-WHO score. It provides a proof-of-concept and suggests the potential of a PPG-based approach strategies for community-based primary prevention in resource-limited regions.

* main: 24 pages (3 tables, 2 figures, 42 references), supplementary: 25 pages (9 tables, 4 figures, 11 references)

Via

Access Paper or Ask Questions

Indiscernible Object Counting in Underwater Scenes

Apr 23, 2023

Guolei Sun, Zhaochong An, Yun Liu, Ce Liu, Christos Sakaridis, Deng-Ping Fan, Luc Van Gool

Figure 1 for Indiscernible Object Counting in Underwater Scenes

Figure 2 for Indiscernible Object Counting in Underwater Scenes

Figure 3 for Indiscernible Object Counting in Underwater Scenes

Figure 4 for Indiscernible Object Counting in Underwater Scenes

Abstract:Recently, indiscernible scene understanding has attracted a lot of attention in the vision community. We further advance the frontier of this field by systematically studying a new challenge named indiscernible object counting (IOC), the goal of which is to count objects that are blended with respect to their surroundings. Due to a lack of appropriate IOC datasets, we present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points. Our dataset consists of a large number of indiscernible objects (mainly fish) in underwater scenes, making the annotation process all the more challenging. IOCfish5K is superior to existing datasets with indiscernible scenes because of its larger scale, higher image resolutions, more annotations, and denser scenes. All these aspects make it the most challenging dataset for IOC so far, supporting progress in this area. For benchmarking purposes, we select 14 mainstream methods for object counting and carefully evaluate them on IOCfish5K. Furthermore, we propose IOCFormer, a new strong baseline that combines density and regression branches in a unified framework and can effectively tackle object counting under concealed scenes. Experiments show that IOCFormer achieves state-of-the-art scores on IOCfish5K.

* To appear in CVPR 2023. The resources are available at https://github.com/GuoleiSun/Indiscernible-Object-Counting

Via

Access Paper or Ask Questions

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

Apr 04, 2023

Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang

Figure 1 for UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

Figure 2 for UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

Figure 3 for UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

Figure 4 for UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

Abstract:We propose a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud observations and proprioceptive information under a table-top setting, namely UniDexGrasp++. To address the challenge of learning the vision-based policy across thousands of object instances, we propose Geometry-aware Curriculum Learning (GeoCurriculum) and Geometry-aware iterative Generalist-Specialist Learning (GiGSL) which leverage the geometry feature of the task and significantly improve the generalizability. With our proposed techniques, our final policy shows universal dexterous grasping on thousands of object instances with 85.4% and 78.2% success rate on the train set and test set which outperforms the state-of-the-art baseline UniDexGrasp by 11.7% and 11.3%, respectively.

Via

Access Paper or Ask Questions

CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis

Mar 25, 2023

Juntian Zheng, Qingyuan Zheng, Lixing Fang, Yun Liu, Li Yi

Figure 1 for CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis

Figure 2 for CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis

Figure 3 for CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis

Figure 4 for CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis

Abstract:In this work, we focus on a novel task of category-level functional hand-object manipulation synthesis covering both rigid and articulated object categories. Given an object geometry, an initial human hand pose as well as a sparse control sequence of object poses, our goal is to generate a physically reasonable hand-object manipulation sequence that performs like human beings. To address such a challenge, we first design CAnonicalized Manipulation Spaces (CAMS), a two-level space hierarchy that canonicalizes the hand poses in an object-centric and contact-centric view. Benefiting from the representation capability of CAMS, we then present a two-stage framework for synthesizing human-like manipulation animations. Our framework achieves state-of-the-art performance for both rigid and articulated categories with impressive visual effects. Codes and video results can be found at our project homepage: https://cams-hoi.github.io/

* CVPR 2023 Received

Via

Access Paper or Ask Questions

PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds

Mar 23, 2023

Yun Liu, Xuefeng Yan, Zhilei Chen, Zhiqi Li, Zeyong Wei, Mingqiang Wei

Figure 1 for PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds

Figure 2 for PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds

Figure 3 for PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds

Figure 4 for PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds

Abstract:Self-supervised learning is attracting large attention in point cloud understanding. However, exploring discriminative and transferable features still remains challenging due to their nature of irregularity and sparsity. We propose a geometrically and adaptively masked auto-encoder for self-supervised learning on point clouds, termed \textit{PointGame}. PointGame contains two core components: GATE and EAT. GATE stands for the geometrical and adaptive token embedding module; it not only absorbs the conventional wisdom of geometric descriptors that captures the surface shape effectively, but also exploits adaptive saliency to focus on the salient part of a point cloud. EAT stands for the external attention-based Transformer encoder with linear computational complexity, which increases the efficiency of the whole pipeline. Unlike cutting-edge unsupervised learning models, PointGame leverages geometric descriptors to perceive surface shapes and adaptively mines discriminative features from training data. PointGame showcases clear advantages over its competitors on various downstream tasks under both global and local fine-tuning strategies. The code and pre-trained models will be publicly available.

Via

Access Paper or Ask Questions

DEHRFormer: Real-time Transformer for Depth Estimation and Haze Removal from Varicolored Haze Scenes

Mar 13, 2023

Sixiang Chen, Tian Ye, Jun Shi, Yun Liu, JingXia Jiang, Erkang Chen, Peng Chen

Figure 1 for DEHRFormer: Real-time Transformer for Depth Estimation and Haze Removal from Varicolored Haze Scenes

Figure 2 for DEHRFormer: Real-time Transformer for Depth Estimation and Haze Removal from Varicolored Haze Scenes

Figure 3 for DEHRFormer: Real-time Transformer for Depth Estimation and Haze Removal from Varicolored Haze Scenes

Figure 4 for DEHRFormer: Real-time Transformer for Depth Estimation and Haze Removal from Varicolored Haze Scenes

Abstract:Varicolored haze caused by chromatic casts poses haze removal and depth estimation challenges. Recent learning-based depth estimation methods are mainly targeted at dehazing first and estimating depth subsequently from haze-free scenes. This way, the inner connections between colored haze and scene depth are lost. In this paper, we propose a real-time transformer for simultaneous single image Depth Estimation and Haze Removal (DEHRFormer). DEHRFormer consists of a single encoder and two task-specific decoders. The transformer decoders with learnable queries are designed to decode coupling features from the task-agnostic encoder and project them into clean image and depth map, respectively. In addition, we introduce a novel learning paradigm that utilizes contrastive learning and domain consistency learning to tackle weak-generalization problem for real-world dehazing, while predicting the same depth map from the same scene with varicolored haze. Experiments demonstrate that DEHRFormer achieves significant performance improvement across diverse varicolored haze scenes over previous depth estimation networks and dehazing approaches.

* Accepted to ICASSP'2023

Via

Access Paper or Ask Questions

RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

Feb 23, 2023

Jingxia Jiang, Jinbin Bai, Yun Liu, Junjie Yin, Sixiang Chen, Tian Ye, Erkang Chen

Figure 1 for RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

Figure 2 for RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

Figure 3 for RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

Figure 4 for RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

Abstract:Underwater images typically experience mixed degradations of brightness and structure caused by the absorption and scattering of light by suspended particles. To address this issue, we propose a Real-time Spatial and Frequency Domains Modulation Network (RSFDM-Net) for the efficient enhancement of colors and details in underwater images. Specifically, our proposed conditional network is designed with Adaptive Fourier Gating Mechanism (AFGM) and Multiscale Convolutional Attention Module (MCAM) to generate vectors carrying low-frequency background information and high-frequency detail features, which effectively promote the network to model global background information and local texture details. To more precisely correct the color cast and low saturation of the image, we introduce a Three-branch Feature Extraction (TFE) block in the primary net that processes images pixel by pixel to integrate the color information extended by the same channel (R, G, or B). This block consists of three small branches, each of which has its own weights. Extensive experiments demonstrate that our network significantly outperforms over state-of-the-art methods in both visual quality and quantitative metrics.

Via

Access Paper or Ask Questions

Generative Sentiment Transfer via Adaptive Masking

Feb 23, 2023

Yingze Xie, Jie Xu, LiQiang Qiao, Yun Liu, Feiren Huang, Chaozhuo Li

Figure 1 for Generative Sentiment Transfer via Adaptive Masking

Figure 2 for Generative Sentiment Transfer via Adaptive Masking

Figure 3 for Generative Sentiment Transfer via Adaptive Masking

Figure 4 for Generative Sentiment Transfer via Adaptive Masking

Abstract:Sentiment transfer aims at revising the input text to satisfy a given sentiment polarity while retaining the original semantic content. The nucleus of sentiment transfer lies in precisely separating the sentiment information from the content information. Existing explicit approaches generally identify and mask sentiment tokens simply based on prior linguistic knowledge and manually-defined rules, leading to low generality and undesirable transfer performance. In this paper, we view the positions to be masked as the learnable parameters, and further propose a novel AM-ST model to learn adaptive task-relevant masks based on the attention mechanism. Moreover, a sentiment-aware masked language model is further proposed to fill in the blanks in the masked positions by incorporating both context and sentiment polarity to capture the multi-grained semantics comprehensively. AM-ST is thoroughly evaluated on two popular datasets, and the experimental results demonstrate the superiority of our proposal.

Via

Access Paper or Ask Questions

Large Language Models Encode Clinical Knowledge

Dec 26, 2022

Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl(+20 more)

Abstract:Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.

Via

Access Paper or Ask Questions