Text-driven 3D scene generation techniques have made rapid progress in recent years. Their success is mainly attributed to using existing generative models to iteratively perform image warping and inpainting to generate 3D scenes. However, these methods heavily rely on the outputs of existing models, leading to error accumulation in geometry and appearance that prevent the models from being used in various scenarios (e.g., outdoor and unreal scenarios). To address this limitation, we generatively refine the newly generated local views by querying and aggregating global 3D information, and then progressively generate the 3D scene. Specifically, we employ a tri-plane features-based NeRF as a unified representation of the 3D scene to constrain global 3D consistency, and propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior from 2D diffusion model as well as the global 3D information of the current scene. Our extensive experiments demonstrate that, in comparison to previous methods, our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
Video frame interpolation aims to generate high-quality intermediate frames from boundary frames and increase frame rate. While existing linear, symmetric and nonlinear models are used to bridge the gap from the lack of inter-frame motion, they cannot reconstruct real motions. Event cameras, however, are ideal for capturing inter-frame dynamics with their extremely high temporal resolution. In this paper, we propose an event-and-frame-based video frame interpolation method named IDO-VFI that assigns varying amounts of computation for different sub-regions via optical flow guidance. The proposed method first estimates the optical flow based on frames and events, and then decides whether to further calculate the residual optical flow in those sub-regions via a Gumbel gating module according to the optical flow amplitude. Intermediate frames are eventually generated through a concise Transformer-based fusion network. Our proposed method maintains high-quality performance while reducing computation time and computational effort by 10% and 17% respectively on Vimeo90K datasets, compared with a unified process on the whole region. Moreover, our method outperforms state-of-the-art frame-only and frames-plus-events methods on multiple video frame interpolation benchmarks. Codes and models are available at https://github.com/shicy17/IDO-VFI.
Machine learning algorithms have the potential to improve patient outcomes in digital pathology. However, generalization of these tools is currently limited by sensitivity to variations in tissue preparation, staining procedures and scanning equipment that lead to domain shift in digitized slides. To overcome this limitation and improve model generalization, we studied the effectiveness of two Synthetic DOmain-Targeted Augmentation (S-DOTA) methods, namely CycleGAN-enabled Scanner Transform (ST) and targeted Stain Vector Augmentation (SVA), and compared them against the International Color Consortium (ICC) profile-based color calibration (ICC Cal) method and a baseline method using traditional brightness, color and noise augmentations. We evaluated the ability of these techniques to improve model generalization to various tasks and settings: four models, two model types (tissue segmentation and cell classification), two loss functions, six labs, six scanners, and three indications (hepatocellular carcinoma (HCC), nonalcoholic steatohepatitis (NASH), prostate adenocarcinoma). We compared these methods based on the macro-averaged F1 scores on in-distribution (ID) and out-of-distribution (OOD) test sets across multiple domains, and found that S-DOTA methods (i.e., ST and SVA) led to significant improvements over ICC Cal and baseline on OOD data while maintaining comparable performance on ID data. Thus, we demonstrate that S-DOTA may help address generalization due to domain shift in real world applications.
In-time and accurate assessments of on-road vehicle emissions play a central role in urban air quality and health policymaking. However, official insight is hampered by the Inspection/Maintenance (I/M) procedure conducted in the laboratory annually. It not only has a large gap to real-world situations (e.g., meteorological conditions) but also is incapable of regular supervision. Here we build a unique dataset including 103831 light-duty gasoline vehicles, in which on-road remote sensing (ORRS) measurements are linked to the I/M records based on the vehicle identification numbers and license plates. On this basis, we develop an ensemble model framework that integrates three machining learning algorithms, including neural network (NN), extreme gradient boosting (XGBoost), and random forest (RF). We demonstrate that this ensemble model could rapidly assess the vehicle-specific emissions (i.e., CO, HC, and NO). In particular, the model performs quite well for the passing vehicles under normal conditions (i.e., lower VSP (< 18 kw/t), temperature (6 ~ 32 {\deg}C), relative humidity (< 80%), and wind speed (< 5m/s)). Together with the current emission standard, we identify a large number of the dirty (2.33%) or clean (74.92%) vehicles in the real world. Our results show that the ORRS measurements, assisted by the machine-learning-based ensemble model developed here, can realize day-to-day supervision of on-road vehicle-specific emissions. This approach framework provides a valuable opportunity to reform the I/M procedures globally and mitigate urban air pollution deeply.
Several AutoML approaches have been proposed to automate the machine learning (ML) process, such as searching for the ML model architectures and hyper-parameters. However, these AutoML pipelines only focus on improving the learning accuracy of benign samples while ignoring the ML model robustness under adversarial attacks. As ML systems are increasingly being used in a variety of mission-critical applications, improving the robustness of ML systems has become of utmost importance. In this paper, we propose the first robust AutoML framework, Robusta--based on reinforcement learning (RL)--to perform feature selection, aiming to select features that lead to both accurate and robust ML systems. We show that a variation of the 0-1 robust loss can be directly optimized via an RL-based combinatorial search in the feature selection scenario. In addition, we employ heuristics to accelerate the search procedure based on feature scoring metrics, which are mutual information scores, tree-based classifiers feature importance scores, F scores, and Integrated Gradient (IG) scores, as well as their combinations. We conduct extensive experiments and show that the proposed framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples compared with other feature selection methods.
Polarized light microscopy provides high contrast to birefringent specimen and is widely used as a diagnostic tool in pathology. However, polarization microscopy systems typically operate by analyzing images collected from two or more light paths in different states of polarization, which lead to relatively complex optical designs, high system costs or experienced technicians being required. Here, we present a deep learning-based holographic polarization microscope that is capable of obtaining quantitative birefringence retardance and orientation information of specimen from a phase recovered hologram, while only requiring the addition of one polarizer/analyzer pair to an existing holographic imaging system. Using a deep neural network, the reconstructed holographic images from a single state of polarization can be transformed into images equivalent to those captured using a single-shot computational polarized light microscope (SCPLM). Our analysis shows that a trained deep neural network can extract the birefringence information using both the sample specific morphological features as well as the holographic amplitude and phase distribution. To demonstrate the efficacy of this method, we tested it by imaging various birefringent samples including e.g., monosodium urate (MSU) and triamcinolone acetonide (TCA) crystals. Our method achieves similar results to SCPLM both qualitatively and quantitatively, and due to its simpler optical design and significantly larger field-of-view, this method has the potential to expand the access to polarization microscopy and its use for medical diagnosis in resource limited settings.
We present a computational live bacteria detection system that periodically captures coherent microscopy images of bacterial growth inside a 60 mm diameter agar-plate and analyzes these time-lapsed holograms using deep neural networks for rapid detection of bacterial growth and classification of the corresponding species. The performance of our system was demonstrated by rapid detection of Escherichia coli and total coliform bacteria (i.e., Klebsiella aerogenes and Klebsiella pneumoniae subsp. pneumoniae) in water samples. These results were confirmed against gold-standard culture-based results, shortening the detection time of bacterial growth by >12 h as compared to the Environmental Protection Agency (EPA)-approved analytical methods. Our experiments further confirmed that this method successfully detects 90% of bacterial colonies within 7-10 h (and >95% within 12 h) with a precision of 99.2-100%, and correctly identifies their species in 7.6-12 h with 80% accuracy. Using pre-incubation of samples in growth media, our system achieved a limit of detection (LOD) of ~1 colony forming unit (CFU)/L within 9 h of total test time. This computational bacteria detection and classification platform is highly cost-effective (~$0.6 per test) and high-throughput with a scanning speed of 24 cm2/min over the entire plate surface, making it highly suitable for integration with the existing analytical methods currently used for bacteria detection on agar plates. Powered by deep learning, this automated and cost-effective live bacteria detection platform can be transformative for a wide range of applications in microbiology by significantly reducing the detection time, also automating the identification of colonies, without labeling or the need for an expert.
Leveraging multilingual parallel texts to automatically generate paraphrases has drawn much attention as size of high-quality paraphrase corpus is limited. Round-trip translation, also known as the pivoting method, is a typical approach to this end. However, we notice that the pivoting process involves multiple machine translation models and is likely to incur semantic drift during the two-step translations. In this paper, inspired by the Transformer-based language models, we propose a simple and unified paraphrasing model, which is purely trained on multilingual parallel data and can conduct zero-shot paraphrase generation in one step. Compared with the pivoting approach, paraphrases generated by our model is more semantically similar to the input sentence. Moreover, since our model shares the same architecture as GPT (Radford et al., 2018), we are able to pre-train the model on large-scale unparallel corpus, which further improves the fluency of the output sentences. In addition, we introduce the mechanism of denoising auto-encoder (DAE) to improve diversity and robustness of the model. Experimental results show that our model surpasses the pivoting method in terms of relevance, diversity, fluency and efficiency.
We report a framework based on a generative adversarial network (GAN) that performs high-fidelity color image reconstruction using a single hologram of a sample that is illuminated simultaneously by light at three different wavelengths. The trained network learns to eliminate missing-phase-related artifacts, and generates an accurate color transformation for the reconstructed image. Our framework is experimentally demonstrated using lung and prostate tissue sections that are labeled with different histological stains. This framework is envisaged to be applicable to point-of-care histopathology, and presents a significant improvement in the throughput of coherent microscopy systems given that only a single hologram of the specimen is required for accurate color imaging.
In this paper, we study the problem of monotone (weakly) DR-submodular continuous maximization. While previous methods require the gradient information of the objective function, we propose a derivative-free algorithm LDGM for the first time. We define $\beta$ and $\alpha$ to characterize how close a function is to continuous DR-submodulr and submodular, respectively. Under a convex polytope constraint, we prove that LDGM can achieve a $(1-e^{-\beta}-\epsilon)$-approximation guarantee after $O(1/\epsilon)$ iterations, which is the same as the best previous gradient-based algorithm. Moreover, in some special cases, a variant of LDGM can achieve a $((\alpha/2)(1-e^{-\alpha})-\epsilon)$-approximation guarantee for (weakly) submodular functions. We also compare LDGM with the gradient-based algorithm Frank-Wolfe under noise, and show that LDGM can be more robust. Empirical results on budget allocation verify the effectiveness of LDGM.