Various volatile aerosols have been associated with adverse health effects; however, characterization of these aerosols is challenging due to their dynamic nature. Here we present a method that directly measures the volatility of particulate matter (PM) using computational microscopy and deep learning. This method was applied to aerosols generated by electronic cigarettes (e-cigs), which vaporize a liquid mixture (e-liquid) that mainly consists of propylene glycol (PG), vegetable glycerin (VG), nicotine, and flavoring compounds. E-cig generated aerosols were recorded by a field-portable computational microscope, using an impaction-based air sampler. A lensless digital holographic microscope inside this mobile device continuously records the inline holograms of the collected particles. A deep learning-based algorithm is used to automatically reconstruct the microscopic images of e-cig generated particles from their holograms, and rapidly quantify their volatility. To evaluate the effects of e-liquid composition on aerosol dynamics, we measured the volatility of the particles generated by flavorless, nicotine-free e-liquids with various PG/VG volumetric ratios, revealing a negative correlation between the particles' volatility and the volumetric ratio of VG in the e-liquid. For a given PG/VG composition, the addition of nicotine dominated the evaporation dynamics of the e-cig aerosol and the aforementioned negative correlation was no longer observed. We also revealed that flavoring additives in e-liquids significantly decrease the volatility of e-cig aerosol. The presented holographic volatility measurement technique and the associated mobile device might provide new insights on the volatility of e-cig generated particles and can be applied to characterize various volatile PM.
Light detection and ranging (LiDAR) has been widely used in autonomous driving and large-scale manufacturing. Although state-of-the-art scanning LiDAR can perform long-range three-dimensional imaging, the frame rate is limited by both round-trip delay and the beam steering speed, hindering the development of high-speed autonomous vehicles. For hundred-meter level ranging applications, a several-time speedup is highly desirable. Here, we uniquely combine fiber-based encoders with wavelength-division multiplexing devices to implement all-optical time-encoding on the illumination light. Using this method, parallel detection and fast inertia-free spectral scanning can be achieved simultaneously with single-pixel detection. As a result, the frame rate of a scanning LiDAR can be multiplied with scalability. We demonstrate a 4.4-fold speedup for a maximum 75-m detection range, compared with a time-of-flight-limited laser ranging system. This approach has the potential to improve the velocity of LiDAR-based autonomous vehicles to the regime of hundred kilometers per hour and open up a new paradigm for ultrafast-frame-rate LiDAR imaging.
The continuous speech separation (CSS) is a task to separate the speech sources from a long, partially overlapped recording, which involves a varying number of speakers. A straightforward extension of conventional utterance-level speech separation to the CSS task is to segment the long recording with a size-fixed window and process each window separately. Though effective, this extension fails to model the long dependency in speech and thus leads to sub-optimum performance. The recent proposed dual-path modeling could be a remedy to this problem, thanks to its capability in jointly modeling the cross-window dependency and the local-window processing. In this work, we further extend the dual-path modeling framework for CSS task. A transformer-based dual-path system is proposed, which integrates transform layers for global modeling. The proposed models are applied to LibriCSS, a real recorded multi-talk dataset, and consistent WER reduction can be observed in the ASR evaluation for separated speech. Also, a dual-path transformer equipped with convolutional layers is proposed. It significantly reduces the computation amount by 30% with better WER evaluation. Furthermore, the online processing dual-path models are investigated, which shows 10% relative WER reduction compared to the baseline.
Digital holography is one of the most widely used label-free microscopy techniques in biomedical imaging. Recovery of the missing phase information of a hologram is an important step in holographic image reconstruction. Here we demonstrate a convolutional recurrent neural network (RNN) based phase recovery approach that uses multiple holograms, captured at different sample-to-sensor distances to rapidly reconstruct the phase and amplitude information of a sample, while also performing autofocusing through the same network. We demonstrated the success of this deep learning-enabled holography method by imaging microscopic features of human tissue samples and Papanicolaou (Pap) smears. These results constitute the first demonstration of the use of recurrent neural networks for holographic imaging and phase recovery, and compared with existing methods, the presented approach improves the reconstructed image quality, while also increasing the depth-of-field and inference speed.
Conventional Supervised Learning approaches focus on the mapping from input features to output labels. After training, the learnt models alone are adapted onto testing features to predict testing labels in isolation, with training data wasted and their associations ignored. To take full advantage of the vast number of training data and their associations, we propose a novel learning paradigm called Memory-Associated Differential (MAD) Learning. We first introduce an additional component called Memory to memorize all the training data. Then we learn the differences of labels as well as the associations of features in the combination of a differential equation and some sampling methods. Finally, in the evaluating phase, we predict unknown labels by inferencing from the memorized facts plus the learnt differences and associations in a geometrically meaningful manner. We gently build this theory in unary situations and apply it on Image Recognition, then extend it into Link Prediction as a binary situation, in which our method outperforms strong state-of-the-art baselines on three citation networks and ogbl-ddi dataset.
Ultra-lightweight model design is an important topic for the deployment of existing speech enhancement and source separation techniques on low-resource platforms. Various lightweight model design paradigms have been proposed in recent years; however, most models still suffer from finding a balance between model size, model complexity, and model performance. In this paper, we propose the group communication with context codec (GC3) design to decrease both model size and complexity without sacrificing the model performance. Group communication splits a high-dimensional feature into groups of low-dimensional features and applies a module to capture the inter-group dependency. A model can then be applied to the groups in parallel with a significantly smaller width. A context codec is applied to decrease the length of a sequential feature, where a context encoder compresses the temporal context of local features into a single feature representing the global characteristics of the context, and a context decoder decompresses the transformed global features back to the context features. Experimental results show that GC3 can achieve on par or better performance than a wide range of baseline architectures with as small as 2.5% model size.
This work presented a new drone-based face detection dataset Drone LAMS in order to solve issues of low performance of drone-based face detection in scenarios such as large angles which was a predominant working condition when a drone flies high. The proposed dataset captured images from 261 videos with over 43k annotations and 4.0k images with pitch or yaw angle in the range of -90{\deg} to 90{\deg}. Drone LAMS showed significant improvement over currently available drone-based face detection datasets in terms of detection performance, especially with large pitch and yaw angle. Detailed analysis of how key factors, such as duplication rate, annotation method, etc., impact dataset performance was also provided to facilitate further usage of a drone on face detection.
Polarized light microscopy provides high contrast to birefringent specimen and is widely used as a diagnostic tool in pathology. However, polarization microscopy systems typically operate by analyzing images collected from two or more light paths in different states of polarization, which lead to relatively complex optical designs, high system costs or experienced technicians being required. Here, we present a deep learning-based holographic polarization microscope that is capable of obtaining quantitative birefringence retardance and orientation information of specimen from a phase recovered hologram, while only requiring the addition of one polarizer/analyzer pair to an existing holographic imaging system. Using a deep neural network, the reconstructed holographic images from a single state of polarization can be transformed into images equivalent to those captured using a single-shot computational polarized light microscope (SCPLM). Our analysis shows that a trained deep neural network can extract the birefringence information using both the sample specific morphological features as well as the holographic amplitude and phase distribution. To demonstrate the efficacy of this method, we tested it by imaging various birefringent samples including e.g., monosodium urate (MSU) and triamcinolone acetonide (TCA) crystals. Our method achieves similar results to SCPLM both qualitatively and quantitatively, and due to its simpler optical design and significantly larger field-of-view, this method has the potential to expand the access to polarization microscopy and its use for medical diagnosis in resource limited settings.
Recent advances in deep learning have been providing non-intuitive solutions to various inverse problems in optics. At the intersection of machine learning and optics, diffractive networks merge wave-optics with deep learning to design task-specific elements to all-optically perform various tasks such as object classification and machine vision. Here, we present a diffractive network, which is used to shape an arbitrary broadband pulse into a desired optical waveform, forming a compact pulse engineering system. We experimentally demonstrate the synthesis of square pulses with different temporal-widths by manufacturing passive diffractive layers that collectively control both the spectral amplitude and the phase of an input terahertz pulse. Our results constitute the first demonstration of direct pulse shaping in terahertz spectrum, where a complex-valued spectral modulation function directly acts on terahertz frequencies. Furthermore, a Lego-like physical transfer learning approach is presented to illustrate pulse-width tunability by replacing part of an existing network with newly trained diffractive layers, demonstrating its modularity. This learning-based diffractive pulse engineering framework can find broad applications in e.g., communications, ultra-fast imaging and spectroscopy.
Machine vision systems mostly rely on lens-based optical imaging architectures that relay the spatial information of objects onto high pixel-count opto-electronic sensor arrays, followed by digital processing of this information. Here, we demonstrate an optical machine vision system that uses trainable matter in the form of diffractive layers to transform and encode the spatial information of objects into the power spectrum of the diffracted light, which is used to perform optical classification of objects with a single-pixel spectroscopic detector. Using a time-domain spectroscopy setup with a plasmonic nanoantenna-based detector, we experimentally validated this framework at terahertz spectrum to optically classify the images of handwritten digits by detecting the spectral power of the diffracted light at ten distinct wavelengths, each representing one class/digit. We also report the coupling of this spectral encoding achieved through a diffractive optical network with a shallow electronic neural network, separately trained to reconstruct the images of handwritten digits based on solely the spectral information encoded in these ten distinct wavelengths within the diffracted light. These reconstructed images demonstrate task-specific image decompression and can also be cycled back as new inputs to the same diffractive network to improve its optical object classification. This unique framework merges the power of deep learning with the spatial and spectral processing capabilities of trainable matter, and can also be extended to other spectral-domain measurement systems to enable new 3D imaging and sensing modalities integrated with spectrally encoded classification tasks performed through diffractive networks.