The main obstacles for the practical deployment of DNA-based data storage platforms are the prohibitively high cost of synthetic DNA and the large number of errors introduced during synthesis. In particular, synthetic DNA products contain both individual oligo (fragment) symbol errors as well as missing DNA oligo errors, with rates that exceed those of modern storage systems by orders of magnitude. These errors can be corrected either through the use of a large number of redundant oligos or through cycles of writing, reading, and rewriting of information that eliminate the errors. Both approaches add to the overall storage cost and are hence undesirable. Here we propose the first method for storing quantized images in DNA that uses signal processing and machine learning techniques to deal with error and cost issues without resorting to the use of redundant oligos or rewriting. Our methods rely on decoupling the RGB channels of images, performing specialized quantization and compression on the individual color channels, and using new discoloration detection and image inpainting techniques. We demonstrate the performance of our approach experimentally on a collection of movie posters stored in DNA.
In recent years, the connections between deep residual networks and first-order Ordinary Differential Equations (ODEs) have been disclosed. In this work, we further bridge the deep neural architecture design with the second-order ODEs and propose a novel reversible neural network, termed as m-RevNet, that is characterized by inserting momentum update to residual blocks. The reversible property allows us to perform backward pass without access to activation values of the forward pass, greatly relieving the storage burden during training. Furthermore, the theoretical foundation based on second-order ODEs grants m-RevNet with stronger representational power than vanilla residual networks, which potentially explains its performance gains. For certain learning scenarios, we analytically and empirically reveal that our m-RevNet succeeds while standard ResNet fails. Comprehensive experiments on various image classification and semantic segmentation benchmarks demonstrate the superiority of our m-RevNet over ResNet, concerning both memory efficiency and recognition performance.
In this work, we propose a camera self-calibration algorithm for generic cameras with arbitrary non-linear distortions. We jointly learn the geometry of the scene and the accurate camera parameters without any calibration objects. Our camera model consists of a pinhole model, a fourth order radial distortion, and a generic noise model that can learn arbitrary non-linear camera distortions. While traditional self-calibration algorithms mostly rely on geometric constraints, we additionally incorporate photometric consistency. This requires learning the geometry of the scene, and we use Neural Radiance Fields (NeRF). We also propose a new geometric loss function, viz., projected ray distance loss, to incorporate geometric consistency for complex non-linear camera models. We validate our approach on standard real image datasets and demonstrate that our model can learn the camera intrinsics and extrinsics (pose) from scratch without COLMAP initialization. Also, we show that learning accurate camera models in a differentiable manner allows us to improve PSNR over baselines. Our module is an easy-to-use plugin that can be applied to NeRF variants to improve performance. The code and data are currently available at https://github.com/POSTECH-CVLab/SCNeRF
For steel product manufacturing in indoor factories, steel defect detection is important for quality control. For example, a steel sheet is extremely delicate, and must be accurately inspected. However, to maintain the painted steel parts of the infrastructure around a severe outdoor environment, corrosion detection is critical for predictive maintenance. In this paper, we propose a general-purpose application for steel anomaly detection that consists of the following four components. The first, a learner, is a unit image classification network to determine whether the region of interest or background has been recognised, after dividing the original large sized image into 256 square unit images. The second, an extractor, is a discriminator feature encoder based on a pre-trained steel generator with a patch generative adversarial network discriminator(GAN). The third, an anomaly detector, is a one-class support vector machine(SVM) to predict the anomaly score using the discriminator feature. The fourth, an indicator, is an anomalous probability map used to visually explain the anomalous features. Furthermore, we demonstrated our method through the inspection of steel sheet defects with 13,774 unit images using high-speed cameras, and painted steel corrosion with 19,766 unit images based on an eye inspection of the photographs. Finally, we visualise anomalous feature maps of steel using a strip and painted steel inspection dataset
In recent years, there has been a growing interest in realizing methodologies to integrate more and more computation at the level of the image sensor. The rising trend has seen an increased research interest in developing novel event cameras that can facilitate CNN computation directly in the sensor. However, event-based cameras are not generally available in the market, limiting performance exploration on high-level models and algorithms. This paper presents an event camera simulator that can be a potent tool for hardware design prototyping, parameter optimization, attention-based innovative algorithm development, and benchmarking. The proposed simulator implements a distributed computation model to identify relevant regions in an image frame. Our simulator's relevance computation model is realized as a collection of modules and performs computations in parallel. The distributed computation model is configurable, making it highly useful for design space exploration. The Rendering engine of the simulator samples frame-regions only when there is a new event. The simulator closely emulates an image processing pipeline similar to that of physical cameras. Our experimental results show that the simulator can effectively emulate event vision with low overheads.
Accurate characterization of in utero human brain maturation is critical as it involves complex and interconnected structural and functional processes that may influence health later in life. Magnetic resonance imaging is a powerful tool to investigate equivocal neurological patterns during fetal development. However, the number of acquisitions of satisfactory quality available in this cohort of sensitive subjects remains scarce, thus hindering the validation of advanced image processing techniques. Numerical phantoms can mitigate these limitations by providing a controlled environment with a known ground truth. In this work, we present FaBiAN, an open-source Fetal Brain magnetic resonance Acquisition Numerical phantom that simulates clinical T2-weighted fast spin echo sequences of the fetal brain. This unique tool is based on a general, flexible and realistic setup that includes stochastic fetal movements, thus providing images of the fetal brain throughout maturation comparable to clinical acquisitions. We demonstrate its value to evaluate the robustness and optimize the accuracy of an algorithm for super-resolution fetal brain magnetic resonance imaging from simulated motion-corrupted 2D low-resolution series as compared to a synthetic high-resolution reference volume. We also show that the images generated can complement clinical datasets to support data-intensive deep learning methods for fetal brain tissue segmentation.
In 1-bit compressive sensing, each measurement is quantized to a single bit, namely the sign of a linear function of an unknown vector, and the goal is to accurately recover the vector. While it is most popular to assume a standard Gaussian sensing matrix for 1-bit compressive sensing, using structured sensing matrices such as partial Gaussian circulant matrices is of significant practical importance due to their faster matrix operations. In this paper, we provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing with randomly signed partial Gaussian circulant matrices and generative models. Under suitable assumptions, we match guarantees that were previously only known to hold for i.i.d.~Gaussian matrices that require significantly more computation. We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our theoretical results.
In a regular open set detection problem, samples of known classes (also called closed set classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and (2) also detect samples that do not belong to any of the known classes (we say they belong to some unknown or open set classes). This paper studies the problem of zero-shot open-set detection, which still performs the same two tasks in testing but has no training except using the given known class names. This paper proposes a novel and yet simple method (called ZO-CLIP) to solve the problem. ZO-CLIP builds on top of the recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained multi-modal model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate some candidate unknown class names for each test sample and computes a confidence score based on both the known class names and candidate unknown class names for zero-shot open set detection. Experimental results on 5 benchmark datasets for open set detection confirm that ZO-CLIP outperforms the baselines by a large margin.
Image enhancement aims at improving the information content of original image for a specific purpose. This purpose could be for visual interpretation or for effective extraction of required details. Nevertheless, some acquired images are often associated with pixels of low dynamic range and as such result in low contrast images. Enhancing the contrast therefore tends to increase the dynamic range of the gray levels in the acquired image so as to span the full intensity range. Techniques such as Histogram Equalization (HE) and fuzzy technique can be adopted for contrast enhancement. HE adjusts the contrast of an input image by modifying the intensity distribution of its histogram. It is characterized by providing a global approach to image enhancement, computationally fast and easy to implement approach but can introduce unnatural artifacts and other undesirable elements to the resulting image. Fuzzy technique on its part enhances image by mapping the image gray level intensities into a fuzzy plane using membership functions; modifying the membership functions as desired and mapping back into the gray level plane. Thus, details at desired areas can be enhanced at the expense of increase in computational cost. This paper explores the effect of the use of HE and fuzzy technique to enhance low contrast images. Their performances are evaluated using the Mean squared error (MSE), Peak to signal noise ratio (PSNR), entropy and Absolute mean brightness error (AMBE).
Crop and weed monitoring is an important challenge for agriculture and food production nowadays. Thanks to recent advances in data acquisition and computation technologies, agriculture is evolving to a more smart and precision farming to meet with the high yield and high quality crop production. Classification and recognition in Unmanned Aerial Vehicles (UAV) images are important phases for crop monitoring. Advances in deep learning models relying on Convolutional Neural Network (CNN) have achieved high performances in image classification in the agricultural domain. Despite the success of this architecture, CNN still faces many challenges such as high computation cost, the need of large labelled datasets, ... Natural language processing's transformer architecture can be an alternative approach to deal with CNN's limitations. Making use of the self-attention paradigm, Vision Transformer (ViT) models can achieve competitive or better results without applying any convolution operations. In this paper, we adopt the self-attention mechanism via the ViT models for plant classification of weeds and crops: red beet, off-type beet (green leaves), parsley and spinach. Our experiments show that with small set of labelled training data, ViT models perform better compared to state-of-the-art CNN-based models EfficientNet and ResNet, with a top accuracy of 99.8\% achieved by the ViT model.