Online construction of open-ended language scenes is crucial for robotic applications that require open-vocabulary interactive scene understanding. Recently, neural implicit representations have provided a promising direction for online interactive mapping. However, integrating open-vocabulary scene understanding into online neural implicit mapping still faces three challenges: the lack of local scene-updating ability, blurry spatial hierarchical semantic segmentation, and difficulty in maintaining multi-view consistency. To this end, we propose O2V-mapping, which uses voxel-based language and geometric features to create an open-vocabulary field, thereby allowing local updates during the online training process. Additionally, we leverage a foundation model for image segmentation to extract language features for object-level entities, achieving clear segmentation boundaries and hierarchical semantic features. To preserve the consistency of 3D object properties across viewpoints, we propose a spatial adaptive voxel adjustment mechanism and a multi-view weight selection method. Extensive experiments on open-vocabulary object localization and semantic segmentation demonstrate that O2V-mapping achieves online construction of language scenes with improved accuracy, outperforming the previous SOTA method.
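Why voxel-based features permit local updates can be shown with a small data structure. The sketch below is a generic weighted-average voxel fusion, not O2V-mapping's spatial adaptive voxel adjustment or its multi-view weight selection. The class `SparseFeatureVoxelGrid`, the per-observation `weights` (a stand-in for per-view confidence), and the toy 2-D "language features" are all hypothetical. The point is that integrating a new view modifies only the voxels that view touches.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


class SparseFeatureVoxelGrid:
    """Hypothetical sparse voxel grid holding one fused feature vector per voxel."""

    def __init__(self, voxel_size: float, dim: int) -> None:
        self.voxel_size = voxel_size
        self.dim = dim
        self.feat_sum: Dict[Voxel, List[float]] = {}  # weighted sum of features
        self.weight: Dict[Voxel, float] = {}  # total fused weight per voxel

    def key(self, p: Point) -> Voxel:
        # Floor-quantize a 3D point to the integer index of its voxel.
        s = self.voxel_size
        return (int(p[0] // s), int(p[1] // s), int(p[2] // s))

    def integrate(
        self,
        points: Sequence[Point],
        feats: Sequence[Sequence[float]],
        weights: Sequence[float],
    ) -> Set[Voxel]:
        """Fuse one view's observations; voxels it does not touch are unchanged."""
        touched: Set[Voxel] = set()
        for p, f, w in zip(points, feats, weights):
            k = self.key(p)
            acc = self.feat_sum.setdefault(k, [0.0] * self.dim)
            for i, v in enumerate(f):
                acc[i] += w * v
            self.weight[k] = self.weight.get(k, 0.0) + w
            touched.add(k)
        return touched

    def feature(self, k: Voxel) -> List[float]:
        # Weighted average of every feature fused into voxel k so far.
        return [v / self.weight[k] for v in self.feat_sum[k]]


grid = SparseFeatureVoxelGrid(voxel_size=0.5, dim=2)
# View 1: two observations that fall in the same voxel, weight 1 each.
grid.integrate([(0.1, 0.1, 0.1), (0.2, 0.3, 0.4)], [[1.0, 0.0], [1.0, 0.0]], [1.0, 1.0])
# View 2: a conflicting feature for that voxel (weight 2) plus a new voxel.
touched = grid.integrate([(0.3, 0.3, 0.3), (2.0, 0.0, 0.0)], [[0.0, 1.0], [0.0, 1.0]], [2.0, 1.0])
print(sorted(touched))          # [(0, 0, 0), (4, 0, 0)]
print(grid.feature((0, 0, 0)))  # [0.5, 0.5]
```

Storing running sums plus a total weight, rather than a single averaged vector, lets any later view be folded in without revisiting earlier frames.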
Online dense mapping of urban scenes is a fundamental cornerstone of scene understanding and navigation for autonomous vehicles. Recent mapping methods are mainly based on NeRF, whose rendering speed is too slow to meet online requirements. 3D Gaussian Splatting (3DGS), which renders hundreds of times faster than NeRF, holds greater potential for online dense mapping. However, integrating 3DGS into a street-view dense mapping framework still faces two challenges: incomplete reconstruction, caused by the absence of geometric information beyond the LiDAR coverage area, and the heavy computation required to reconstruct large urban scenes. To this end, we propose HGS-Mapping, an online dense mapping framework for unbounded large-scale scenes. To achieve complete reconstruction, our framework introduces a Hybrid Gaussian Representation, which models different parts of the scene using Gaussians with distinct properties. Furthermore, we employ a hybrid Gaussian initialization mechanism and an adaptive update method to achieve high-fidelity and rapid reconstruction. To the best of our knowledge, we are the first to integrate a Gaussian representation into online dense mapping of urban scenes. Our approach achieves SOTA reconstruction accuracy while using only 66% as many Gaussians, resulting in 20% faster reconstruction.
Anatomy-specific RF receive coil arrays, routinely adopted in magnetic resonance imaging (MRI) for signal acquisition, are commonly burdened by bulky, fixed, and rigid configurations, which may cause patient discomfort, cumbersome positioning, and suboptimal sensitivity in certain situations. Herein, leveraging the inherent flexibility and electric-field-confining property of coaxial cables, we present for the first time wireless, ultra-lightweight, coaxially-shielded MRI coils that achieve a signal-to-noise ratio (SNR) comparable to or surpassing that of state-of-the-art commercial receive coil arrays, with the potential for improved patient comfort, easier implementation, and significantly reduced cost. The proposed coils are versatile: they function independently in form-fitting configurations that closely conform to relatively small anatomical sites, and collectively, by coupling inductively as metamaterials, to extend their field of view to larger anatomical regions without compromising coil sensitivity. The wireless, coaxially-shielded MRI coils reported herein pave the way toward next-generation MRI coils.
Recent advances in metamaterials have opened the possibility of a wireless means of improving the signal-to-noise ratio (SNR) in magnetic resonance imaging (MRI). Unlike traditional closely packed local coil arrays with rigid designs and numerous components, these lightweight, cost-effective metamaterials eliminate the need for radio-frequency (RF) cabling, baluns, adapters, and interfaces. However, their clinical adoption has been limited by low sensitivity, a bulky physical footprint, and a narrow range of specific use cases. Herein, we introduce a wearable metamaterial built from commercially available coaxial cable and designed for a 3.0 T MRI system. This metamaterial inherits the coaxially-shielded structure of its constituent cable, which confines the electric field within the cable; this mitigates electric coupling to the load while providing safer clinical adoption, lower signal loss, and resistance to frequency shifts. Weighing only 50 g, the metamaterial maximizes its sensitivity by conforming to the anatomical region of interest. MRI images acquired with this metamaterial using various pulse sequences demonstrate up to a 2-fold SNR enhancement compared with a state-of-the-art 16-channel knee coil. This work introduces a new paradigm for constructing metamaterials for the MRI environment, paving the way for next-generation wireless MRI technology.
One challenge in speech translation is that much spoken content is long-form, whereas short units are necessary for obtaining high-quality translations. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be translated independently so as to maximize overall translation quality. We counter the tendency of LLMs to hallucinate by incorporating finite-state constraints during decoding; these eliminate invalid outputs without requiring additional training. We find that LLMs can be adapted to transcripts containing ASR errors through prompt-tuning or fine-tuning. Relative to a state-of-the-art automatic punctuation baseline, our best LLM improves average BLEU by 2.9 points for English-German, English-Spanish, and English-Arabic TED talk translation across 9 test sets, solely by improving segmentation.
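How a finite-state constraint removes invalid outputs can be sketched with a tiny automaton whose only legal outputs are the source transcript copied verbatim, with an optional break marker between words. This is a minimal illustration under stated assumptions, not the authors' decoder: `toy_score` is a hypothetical stand-in for LLM log-probabilities, decoding is greedy rather than beam search, and the `<brk>` token and the exact automaton are illustrative choices.

```python
from typing import Callable, List, Sequence

BREAK = "<brk>"  # hypothetical segment-boundary token
EOS = "</s>"


def allowed_tokens(source: Sequence[str], pos: int, last_was_break: bool) -> List[str]:
    """Finite-state constraint: copy the source verbatim, optionally inserting
    a single BREAK between two source words (never at the start or end)."""
    if pos == len(source):
        return [EOS]
    allowed = [source[pos]]
    if pos > 0 and not last_was_break:
        allowed.append(BREAK)
    return allowed


def constrained_greedy_decode(
    source: Sequence[str], score: Callable[[List[str], str], float]
) -> List[List[str]]:
    """Greedy decoding restricted to automaton-allowed tokens, so the model
    cannot add, drop, or alter words; it can only choose where to break."""
    output: List[str] = []
    pos, last_was_break = 0, False
    while True:
        candidates = allowed_tokens(source, pos, last_was_break)
        best = max(candidates, key=lambda tok: score(output, tok))
        if best == EOS:
            break
        output.append(best)
        if best == BREAK:
            last_was_break = True
        else:
            pos += 1
            last_was_break = False
    # Split the constrained output into independently translatable segments.
    segments: List[List[str]] = [[]]
    for tok in output:
        if tok == BREAK:
            segments.append([])
        else:
            segments[-1].append(tok)
    return segments


def toy_score(prefix: List[str], token: str) -> float:
    # Hypothetical scorer: favor a break right after a sentence-final word.
    if token == BREAK:
        return 1.0 if prefix and prefix[-1].endswith(".") else -1.0
    return 0.0


source = ["so", "we", "began.", "then", "it", "rained."]
segments = constrained_greedy_decode(source, toy_score)
print(segments)  # [['so', 'we', 'began.'], ['then', 'it', 'rained.']]
```

Because the scorer is only ever queried on automaton-approved candidates, any probability the model places on hallucinated words is simply never considered, which is why no extra training is needed.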
Ongoing effort has been devoted to applying metamaterials to boost the imaging performance of magnetic resonance imaging (MRI), owing to their unique capacity for electromagnetic field confinement and enhancement. However, major obstacles to the widespread clinical adoption of conventional metamaterials remain because of several notable restrictions, namely their typically bulky and rigid structures, deviations from their optimal resonance frequency, and their inevitable interference with the RF transmission field in MRI. Herein, we address these restrictions and report a conformal, smart metamaterial that can not only be readily tuned by a controlling circuit to achieve the desired, precise frequency match with MRI, but can also selectively amplify the magnetic field during the RF reception phase by passively sensing the excitation signal strength. It therefore remains off during the RF transmission phase, ensuring optimal performance when applied to MRI as an additive technology. By addressing a host of current technological challenges, the metamaterial presented herein paves the way toward wide-ranging utilization of metamaterials in clinical MRI, translating this promising technology to the MRI bedside.
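The frequency match mentioned above can be made concrete with a worked calculation. Assuming a simplified single-resonator lumped LC model (a real metamaterial is an array of coupled resonators, and the 100 nH inductance below is purely hypothetical), the target is the proton Larmor frequency f0 = (γ/2π)·B0, with γ/2π ≈ 42.577 MHz/T for 1H, and the resonance condition f = 1/(2π√(LC)) gives the capacitance the tuning circuit must provide. A nominal 3 T field is assumed for illustration.

```python
import math

# Gyromagnetic ratio over 2*pi for 1H, in Hz per tesla (approximate).
GAMMA_BAR_1H_HZ_PER_T = 42.577478e6


def larmor_frequency(b0_tesla: float) -> float:
    """Proton precession frequency at field strength b0_tesla."""
    return GAMMA_BAR_1H_HZ_PER_T * b0_tesla


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonance of an ideal lumped LC circuit: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def capacitance_for(freq_hz: float, inductance_h: float) -> float:
    """Capacitance that makes an ideal LC circuit resonate at freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (omega ** 2 * inductance_h)


f0 = larmor_frequency(3.0)  # nominal 3 T system (assumption)
inductance = 100e-9          # hypothetical effective inductance, 100 nH
capacitance = capacitance_for(f0, inductance)
print(f"f0 = {f0 / 1e6:.2f} MHz, C = {capacitance * 1e12:.2f} pF")
```

With these assumed values f0 is about 127.7 MHz and the required capacitance is about 15.5 pF; since f scales as 1/√C, a small capacitance offset shifts the resonance, which is why precise, circuit-controlled tuning matters.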
Deep learning models have demonstrated impressive performance in various domains. However, their prolonged training time remains a critical problem. Manually designed parallel training strategies can improve efficiency, but they are time-consuming to design and offer little flexibility. Automatic parallelism has therefore been proposed to automate the search for parallel strategies. Even so, existing approaches search a suboptimal strategy space because they treat automatic parallelism as two independent stages, namely inter- and intra-layer parallelism. To address this issue, we propose UniAP, which uses mixed integer quadratic programming to unify inter- and intra-layer automatic parallelism. To the best of our knowledge, UniAP is the first work to unify these two categories in the search for a globally optimal strategy. Experimental results show that UniAP outperforms state-of-the-art methods by up to 1.70× in throughput and reduces strategy search time by up to 16× across four Transformer-like models.
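Why a two-stage search can miss the joint optimum is easy to see on a toy problem. The sketch below assumes a made-up cost model: four layers, two intra-layer strategies (`"dp"` and `"tp"`) with hypothetical compute times, a hypothetical resharding cost when adjacent layers in a stage switch strategy, and a two-stage pipeline whose time is its slowest stage. It uses exhaustive enumeration as a stand-in for a MIQP solver; it is not UniAP's formulation. The pairwise resharding term is the kind of cost that becomes quadratic when strategies are encoded as 0/1 variables.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

STRATS = ("dp", "tp")  # hypothetical intra-layer strategies
# Hypothetical per-layer compute time under each strategy.
COMPUTE: List[Dict[str, float]] = [
    {"dp": 1.0, "tp": 3.0},
    {"dp": 1.0, "tp": 3.0},
    {"dp": 6.0, "tp": 2.0},
    {"dp": 6.0, "tp": 2.0},
]
RESHARD = 1.0  # hypothetical cost when adjacent layers use different strategies


def stage_time(layers: Sequence[int], plan: Sequence[str]) -> float:
    t = sum(COMPUTE[i][s] for i, s in zip(layers, plan))
    # Pairwise term: depends on the strategies of two adjacent layers at once.
    t += sum(RESHARD for a, b in zip(plan, plan[1:]) if a != b)
    return t


def best_plan_time(layers: Sequence[int]) -> float:
    # Intra-layer search: try every strategy assignment for the stage.
    return min(stage_time(layers, p) for p in product(STRATS, repeat=len(layers)))


def pipeline_time(cut: int) -> float:
    # Inter-layer choice: layers [0, cut) on stage 1, [cut, n) on stage 2.
    n = len(COMPUTE)
    return max(best_plan_time(range(0, cut)), best_plan_time(range(cut, n)))


def two_stage_search() -> Tuple[int, float]:
    n = len(COMPUTE)

    def default_balance(k: int) -> float:
        # Stage 1 fixes the cut using default "dp" costs only.
        return max(sum(COMPUTE[i]["dp"] for i in range(k)),
                   sum(COMPUTE[i]["dp"] for i in range(k, n)))

    cut = min(range(1, n), key=default_balance)
    return cut, pipeline_time(cut)  # stage 2 then picks strategies


def joint_search() -> Tuple[int, float]:
    # Unified search: cut and per-layer strategies are chosen together.
    cut = min(range(1, len(COMPUTE)), key=pipeline_time)
    return cut, pipeline_time(cut)


two_cut, two_time = two_stage_search()
joint_cut, joint_time = joint_search()
print(f"two-stage: cut={two_cut}, time={two_time}")    # two-stage: cut=3, time=5.0
print(f"joint:     cut={joint_cut}, time={joint_time}")  # joint:     cut=2, time=4.0
```

The two-stage search balances the pipeline before knowing that layers 2 and 3 become cheap under `"tp"`, so it locks in a worse cut. Enumeration is exponential in the number of layers, which is why a solver-based formulation is needed at realistic scale.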
Time-periodic form or expression is a ubiquitous natural and man-made phenomenon observable across scientific and engineering disciplines. In this article, we propose a theory of periodic sequence (TPS), which can be formulated as a foundational theory for computational science and engineering, to transform arbitrary time-periodic electromagnetic (EM) problems into a computational space of mapped discrete events that is characterized in neither the frequency domain nor the time domain. Within the TPS framework, periodic-sequential Maxwell's curl equations are decomposed and decoupled into independent, parallel instances via designated mappings. The fundamental solutions and mapped responses of EM periodic sequences are elucidated and corroborated by RF/microwave measurements. Its inherent computational parallelism and unique frequency-independent property make TPS a promising methodology for computational electromagnetics applications such as high-speed signal-integrity analysis and broadband RF transmission.
We introduce LAST, a LAttice-based Speech Transducer library in JAX. With an emphasis on flexibility, ease of use, and scalability, LAST implements the differentiable weighted finite state automaton (WFSA) algorithms needed for training and inference, and these scale to large WFSAs such as a recognition lattice over an entire utterance. Although these WFSA algorithms are well known in the literature, new challenges arise from the performance characteristics of modern architectures and from nuances of automatic differentiation. We describe a suite of generally applicable techniques employed in LAST to address these challenges, and demonstrate their effectiveness with benchmarks on TPUv3 and V100 GPUs.
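What "differentiable WFSA algorithm" means can be seen on the textbook case: the forward (shortest-distance) algorithm in the log semiring computes log Z over all lattice paths, and its gradient with respect to each arc weight is that arc's posterior probability. The pure-Python sketch below is not LAST's API; `forward`, `backward`, and `arc_posteriors` are hypothetical names. It assumes an acyclic lattice with states numbered topologically (every arc goes from a lower to a higher state), start state 0, and a single final state. In JAX the backward pass could instead be obtained by applying `jax.grad` to the forward computation; here forward-backward computes the same gradient explicitly.

```python
import math
from typing import List, Tuple

Arc = Tuple[int, int, float]  # (source state, destination state, log-weight)
NEG_INF = float("-inf")


def logaddexp(a: float, b: float) -> float:
    # Log-semiring addition: log(exp(a) + exp(b)), computed stably.
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward(num_states: int, arcs: List[Arc]) -> List[float]:
    """alpha[q] = log-sum of path weights from state 0 to q."""
    alpha = [NEG_INF] * num_states
    alpha[0] = 0.0
    # Processing arcs by increasing source finalizes alpha[src] before use.
    for src, dst, w in sorted(arcs, key=lambda a: a[0]):
        alpha[dst] = logaddexp(alpha[dst], alpha[src] + w)
    return alpha


def backward(num_states: int, arcs: List[Arc]) -> List[float]:
    """beta[q] = log-sum of path weights from q to the final state."""
    beta = [NEG_INF] * num_states
    beta[-1] = 0.0
    for src, dst, w in sorted(arcs, key=lambda a: a[0], reverse=True):
        beta[src] = logaddexp(beta[src], w + beta[dst])
    return beta


def arc_posteriors(num_states: int, arcs: List[Arc]) -> Tuple[float, List[float]]:
    """Return log Z and d(log Z)/d(w) for each arc, i.e. the arc posteriors."""
    alpha = forward(num_states, arcs)
    beta = backward(num_states, arcs)
    log_z = alpha[-1]
    return log_z, [math.exp(alpha[s] + w + beta[d] - log_z) for s, d, w in arcs]


# Toy 3-state lattice with two parallel arcs 0->1 and a skip arc 0->2.
ARCS: List[Arc] = [(0, 1, 0.0), (0, 1, -1.0), (1, 2, 0.5), (0, 2, -0.3)]
log_z, post = arc_posteriors(3, ARCS)
print(round(log_z, 4), [round(p, 4) for p in post])
```

Every path leaves state 0 through exactly one arc, so the posteriors of arcs 0, 1, and 3 sum to 1, a quick sanity check on any implementation.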
Machine learning enables the development of new, supplemental, and empowering tools that can either extend existing technologies or create new ones. In education, there is room for a tool that works with generic student course-review formats to organize and summarize students' views on the pedagogical practices to which they are exposed. Student opinions are often gathered through a general comment section that asks how students feel about their courses without polling specifics about course content. Herein, we present a novel approach to summarizing and organizing students' opinions by analyzing their sentiment towards a course as a function of the language and vocabulary they use to convey their opinions about a class and its contents. This analysis is derived from their responses to the general comment section at the end of post-course review surveys. Performed with Python, LaTeX, and Google's Natural Language API, the analysis converts unstructured text data into both general and topic-specific sub-reports that convey students' views in a novel way.
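One way to turn sentence-level sentiment into topic-specific sub-reports is to group scores by the vocabulary each sentence uses. The sketch below assumes each comment has already been split into sentences with a sentiment score on Google's Natural Language API scale of -1.0 (negative) to 1.0 (positive); the scores shown are invented placeholders, not real API output. The `TOPICS` keyword lists and the `topic_sentiment` helper are hypothetical, and simple keyword matching stands in for whatever topic extraction the authors actually used.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical topic vocabulary; a real report would curate or derive its own.
TOPICS: Dict[str, List[str]] = {
    "lectures": ["lecture", "lectures", "slides"],
    "homework": ["homework", "assignment", "assignments"],
    "exams": ["exam", "exams", "midterm", "final"],
}


def topic_sentiment(sentences: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Average sentiment score of the sentences that mention each topic."""
    by_topic: Dict[str, List[float]] = defaultdict(list)
    for text, score in sentences:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for topic, vocab in TOPICS.items():
            if words & set(vocab):  # sentence uses this topic's vocabulary
                by_topic[topic].append(score)
    return {topic: mean(scores) for topic, scores in by_topic.items()}


# Placeholder (sentence, score) pairs standing in for per-sentence API results.
comments = [
    ("The lectures were engaging and clear.", 0.8),
    ("Homework assignments took far too long.", -0.6),
    ("The midterm was fair, but the lectures before it felt rushed.", -0.2),
    ("Weekly homework helped me keep up.", 0.4),
]
report = topic_sentiment(comments)
print({topic: round(score, 2) for topic, score in report.items()})
```

A sentence that mentions several topics contributes to each of them, so the mixed midterm/lectures sentence lowers both the "lectures" and "exams" averages.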