Yihao Liu

Unifying Image Processing as Visual Prompting Question Answering

Oct 16, 2023
Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications. Traditionally, task-specific models are developed for individual tasks and designing such models requires distinct expertise. Building upon the success of large language models (LLMs) in natural language processing (NLP), there is a similar trend in computer vision, which focuses on developing large-scale models through pretraining and in-context learning. This paradigm shift reduces the reliance on task-specific models, yielding a powerful unified model to deal with various tasks. However, these advances have predominantly concentrated on high-level vision tasks, with less attention paid to low-level vision tasks. To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction tasks, etc. Our proposed framework, named PromptGIP, unifies these diverse image processing tasks within a universal framework. Inspired by NLP question answering (QA) techniques, we employ a visual prompting question answering paradigm. Specifically, we treat the input-output image pair as a structured question-answer sentence, thereby reprogramming the image processing task as a prompting QA problem. PromptGIP can undertake diverse cross-domain tasks using provided visual prompts, eliminating the need for task-specific finetuning. Our methodology offers a universal and adaptive solution to general image processing. While PromptGIP has demonstrated a certain degree of out-of-domain task generalization capability, further research is expected to fully explore its more powerful emergent generalization.

* 16 pages, 12 figures

Towards Efficient SDRTV-to-HDRTV by Learning from Image Formation

Sep 08, 2023
Xiangyu Chen, Zheyuan Li, Zhengwen Zhang, Jimmy S. Ren, Yihao Liu, Jingwen He, Yu Qiao, Jiantao Zhou, Chao Dong

Modern displays are capable of rendering video content with high dynamic range (HDR) and wide color gamut (WCG). However, the majority of available resources are still in standard dynamic range (SDR). As a result, there is significant value in transforming existing SDR content into the HDRTV standard. In this paper, we define and analyze the SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV content. Our analysis and observations indicate that a naive end-to-end supervised training pipeline suffers from severe gamut transition errors. To address this issue, we propose a novel three-step solution pipeline called HDRTVNet++, which includes adaptive global color mapping, local enhancement, and highlight refinement. The adaptive global color mapping step uses global statistics as guidance to perform image-adaptive color mapping. A local enhancement network is then deployed to enhance local details. Finally, we combine the two sub-networks above as a generator and achieve highlight consistency through GAN-based joint training. Our method is primarily designed for ultra-high-definition TV content and is therefore effective and lightweight for processing 4K resolution images. We also construct a dataset using HDR videos in the HDR10 standard, named HDRTV1K, which contains 1235 training images and 117 testing images, all in 4K resolution. In addition, we select five metrics to evaluate the results of SDRTV-to-HDRTV algorithms. Our final results demonstrate state-of-the-art performance both quantitatively and visually. The code, model and dataset are available at https://github.com/xiaom233/HDRTVNet-plus.

* Extended version of HDRTVNet

Toward Process Controlled Medical Robotic System

Aug 10, 2023
Yihao Liu, Amir Kheradmand, Mehran Armand

Medical errors, defined as unintended acts either of omission or commission that cause the failure of medical actions, are the third leading cause of death in the United States. The application of autonomy and robotics can alleviate some causes of medical errors by improving accuracy and providing means to precisely follow planned procedures. However, for robotic applications to improve safety, they must maintain constant operating conditions in the presence of disturbances, and provide reliable measurements, evaluation, and control for each state of the procedure. This article addresses the need for process control in medical robotic systems, and proposes a standardized design cycle toward its automation. Monitoring and controlling the changing conditions in a medical or surgical environment necessitates a clear definition of workflows and their procedural dependencies. We propose integrating process control into medical robotic workflows to identify changes in the states of the system and environment, possible operations, and transitions to new states. The system thereby translates clinician experiences and procedure workflows into machine-interpretable languages. The design cycle using hFSM formulation can be a deterministic process, which opens up possibilities for higher-level automation in medical robotics. As shown in our work, a standardized design cycle and software paradigm pave the way toward controlled workflows that can be automatically generated. Additionally, a modular design for a robotic system architecture that integrates hFSM can provide easy software and hardware integration. This article discusses the system design, software implementation, and example applications to robot-assisted transcranial magnetic stimulation and robot-assisted femoroplasty. We also provide assessments of these two system examples by testing their robotic tool placement.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
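
The idea of states, operations, and transitions in the abstract above can be illustrated with a minimal sketch. It assumes that hFSM denotes a hierarchical finite-state machine; the class names, state names, and events below are hypothetical and are not taken from the article. An event that a state does not handle is passed up to its parent state, so a rule such as "abort" can be declared once for a whole procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class State:
    """A state with an optional parent state and event -> target-name transitions."""
    name: str
    parent: Optional["State"] = None
    transitions: Dict[str, str] = field(default_factory=dict)


class HierarchicalFSM:
    """Hypothetical hierarchical FSM: unhandled events bubble up to parent states."""

    def __init__(self, states: Iterable[State], initial: str) -> None:
        self.states = {s.name: s for s in states}
        self.current = self.states[initial]

    def dispatch(self, event: str) -> bool:
        """Apply an event; return True if some state in the hierarchy handled it."""
        node: Optional[State] = self.current
        while node is not None:
            target = node.transitions.get(event)
            if target is not None:
                self.current = self.states[target]
                return True
            node = node.parent  # fall back to the enclosing state
        return False


def build_demo() -> HierarchicalFSM:
    """Illustrative workflow only; all state and event names are assumptions."""
    procedure = State("procedure", transitions={"abort": "safe_stop"})
    registration = State("registration", parent=procedure,
                         transitions={"registered": "tool_placement"})
    tool_placement = State("tool_placement", parent=procedure,
                           transitions={"placed": "done"})
    safe_stop = State("safe_stop")
    done = State("done")
    return HierarchicalFSM(
        [procedure, registration, tool_placement, safe_stop, done],
        initial="registration",
    )
```

Calling `build_demo().dispatch("abort")` from any sub-state of `procedure` moves the machine to `safe_stop`, because the transition is inherited from the parent state rather than repeated in each child.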

MomentaMorph: Unsupervised Spatial-Temporal Registration with Momenta, Shooting, and Correction

Aug 05, 2023
Zhangxing Bian, Shuwen Wei, Yihao Liu, Junyu Chen, Jiachen Zhuo, Fangxu Xing, Jonghye Woo, Aaron Carass, Jerry L. Prince

Tagged magnetic resonance imaging (tMRI) has been employed for decades to measure the motion of tissue undergoing deformation. However, registration-based motion estimation from tMRI is difficult due to the periodic patterns in these images, particularly when the motion is large. With larger motion, the registration approach can become trapped in local optima, leading to motion estimation errors. We introduce a novel "momenta, shooting, and correction" framework for Lagrangian motion estimation in the presence of repetitive patterns and large motion. This framework, grounded in Lie algebra and Lie group principles, accumulates momenta in the tangent vector space and employs exponential mapping in the diffeomorphic space for rapid approximation towards true optima, circumventing local optima. A subsequent correction step ensures convergence to true optima. The results on a 2D synthetic dataset and a real 3D tMRI dataset demonstrate our method's efficiency in estimating accurate, dense, and diffeomorphic 2D/3D motion fields amidst large motion and repetitive patterns.

* Accepted by MICCAI Workshop 2023: Time-Series Data Analytics and Learning (MTSAIL)

A Survey on Deep Learning in Medical Image Registration: New Technologies, Uncertainty, Evaluation Metrics, and Beyond

Jul 28, 2023
Junyu Chen, Yihao Liu, Shuwen Wei, Zhangxing Bian, Shalini Subramanian, Aaron Carass, Jerry L. Prince, Yong Du

Over the past decade, deep learning technologies have greatly advanced the field of medical image registration. The initial developments, such as ResNet-based and U-Net-based networks, laid the groundwork for deep learning-driven image registration. Subsequent progress has been made in various aspects of deep learning-based registration, including similarity measures, deformation regularizations, and uncertainty estimation. These advancements have not only enriched the field of deformable image registration but have also facilitated its application in a wide range of tasks, including atlas construction, multi-atlas segmentation, motion estimation, and 2D-3D registration. In this paper, we present a comprehensive overview of the most recent advancements in deep learning-based image registration. We begin with a concise introduction to the core concepts of deep learning-based image registration. Then, we delve into innovative network architectures, loss functions specific to registration, and methods for estimating registration uncertainty. Additionally, this paper explores appropriate evaluation metrics for assessing the performance of deep learning models in registration tasks. Finally, we highlight the practical applications of these novel techniques in medical imaging and discuss the future prospects of deep learning-based image registration.

EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device

Jul 03, 2023
Haowei Li, Wenqing Yan, Du Liu, Long Qian, Yuxing Yang, Yihao Liu, Zhe Zhao, Hui Ding, Guangzhi Wang

Augmented Reality (AR) has been used to facilitate surgical guidance during External Ventricular Drain (EVD) surgery, reducing the risks of misplacement in manual operations. During this procedure, the key challenge is accurately estimating the spatial relationship between pre-operative images and actual patient anatomy in the AR environment. This research proposes a novel framework utilizing Time of Flight (ToF) depth sensors integrated in commercially available AR Head Mounted Devices (HMD) for precise EVD surgical guidance. As previous studies have demonstrated depth errors for ToF sensors, we first assessed their properties on AR-HMDs. Subsequently, a depth error model and patient-specific parameter identification method are introduced for accurate surface information. A tracking pipeline combining retro-reflective markers and point clouds is then proposed for accurate head tracking. The head surface is reconstructed using depth data for spatial registration, avoiding fixing tracking targets rigidly on the patient's skull. First, a depth value error of 7.580 ± 1.488 mm was revealed on human skin, indicating the significance of depth correction. Our results showed that the error was reduced by over 85% using the proposed depth correction method on head phantoms made of different materials. Meanwhile, the head surface reconstructed with corrected depth data achieved sub-millimetre accuracy. An experiment on a sheep head revealed a reconstruction error of 0.79 mm. Furthermore, a user study was conducted to evaluate performance in simulated EVD surgery, where five surgeons performed nine k-wire injections on a head phantom with virtual guidance. Results of this study revealed a translational accuracy of 2.09 ± 0.16 mm and an orientational accuracy of 2.97 ± 0.91 degrees.

LeCo: Lightweight Compression via Learning Serial Correlations

Jun 27, 2023
Yihao Liu, Xinyu Zeng, Huanchen Zhang

Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Despite a comprehensive study on dictionary-based encodings to approach Shannon's entropy, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to remove the serial redundancy in a value sequence automatically to achieve an outstanding compression ratio and decompression performance simultaneously. LeCo presents a general approach to this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and six real-world data sets shows that a prototype of LeCo achieves a Pareto improvement on both compression ratio and random access speed over the existing solutions. When integrating LeCo into widely used applications, we observe up to a 3.9x speedup in filter-scanning a Parquet file and a 16% increase in RocksDB's throughput.
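
The notion of removing serial redundancy with a learned model can be sketched as follows. This is a minimal illustration under stated assumptions, not LeCo's actual implementation: it fits a single least-squares line to one block of integers and stores only the model parameters plus small non-negative residuals. The function names (`fit_line`, `encode`, `decode_at`) and the dictionary layout are hypothetical. With the slope forced to zero and the intercept set to the block minimum, the same scheme reduces to Frame-of-Reference.

```python
from typing import Dict, List, Tuple


def fit_line(values: List[int]) -> Tuple[float, float]:
    """Least-squares fit of values[i] ~ intercept + slope * i."""
    n = len(values)
    if n < 2:
        return (float(values[0]) if values else 0.0, 0.0)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(intercept: float, slope: float, i: int) -> int:
    """Integer prediction for position i; encoder and decoder must share this."""
    return int(round(intercept + slope * i))


def encode(values: List[int]) -> Dict[str, object]:
    """Store model parameters plus residuals shifted to be non-negative."""
    intercept, slope = fit_line(values)
    residuals = [v - predict(intercept, slope, i) for i, v in enumerate(values)]
    bias = min(residuals) if residuals else 0
    deltas = [r - bias for r in residuals]
    # Bits needed per residual if the deltas were bit-packed at a fixed width.
    width = max(deltas).bit_length() if deltas else 0
    return {"intercept": intercept, "slope": slope, "bias": bias,
            "width": width, "deltas": deltas}


def decode_at(block: Dict[str, object], i: int) -> int:
    """Random access: rebuild one value from the model and its residual."""
    base = predict(block["intercept"], block["slope"], i)
    return base + block["bias"] + block["deltas"][i]
```

Because every residual in a block shares one fixed bit width, a single value can be reconstructed without decoding its neighbours, which is the property that makes random access cheap in this kind of scheme.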

Efficient HDR Reconstruction from Real-World Raw Images

Jun 22, 2023
Qirui Yang, Yihao Liu, Jingyu Yang

High dynamic range (HDR) imaging is still a significant yet challenging problem due to the limited dynamic range of generic image sensors. Most existing learning-based HDR reconstruction methods take a set of bracketed-exposure sRGB images to extend the dynamic range, and are thus computationally and memory inefficient because they require the Image Signal Processor (ISP) to produce multiple sRGB images from the raw ones. In this paper, we propose to broaden the dynamic range from the raw inputs and perform only one ISP processing for the reconstructed HDR raw image. Our key insights are threefold: (1) we design a new computational raw HDR data formation pipeline and construct the first real-world raw HDR dataset, RealRaw-HDR; (2) we develop a lightweight and efficient HDR model, RepUNet, using the structural re-parameterization technique; (3) we propose a plug-and-play motion alignment loss to mitigate motion misalignment between short- and long-exposure images. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in both visual quality and quantitative metrics.
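
Structural re-parameterization, mentioned in point (2), can be illustrated in general terms with a small sketch. The abstract does not describe RepUNet's branches, so the layout below is an assumption: a single-channel 1D block with a 3-tap convolution, a 1-tap convolution, and an identity shortcut, in the style commonly used for this technique. Because all three branches are linear, they can be folded into one 3-tap kernel for inference. All function names are hypothetical.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Cross-correlation with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def multi_branch(x: List[float], k3: List[float], k1: float) -> List[float]:
    """Training-time form: 3-tap branch + 1-tap branch + identity shortcut."""
    a = conv1d_same(x, k3)
    b = conv1d_same(x, [k1])
    return [a[i] + b[i] + x[i] for i in range(len(x))]


def merge_branches(k3: List[float], k1: float) -> List[float]:
    """Inference-time form: fold the 1-tap and identity branches into the centre tap."""
    merged = list(k3)
    merged[1] += k1 + 1.0  # the identity branch acts as a 1-tap kernel of weight 1
    return merged
```

For any input `x`, `conv1d_same(x, merge_branches(k3, k1))` matches `multi_branch(x, k3, k1)` up to floating-point rounding, so the merged model does the work of three branches with one convolution.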
