Abstract: Text-driven 3D content generation has advanced significantly in recent years, but several challenges remain. In practice, users often provide extremely simple text inputs while expecting high-quality 3D content. Producing good results from such minimal text is difficult because text-to-3D models depend strongly on the quality of the input prompt. The generation process also exhibits high variability, which makes it hard to control. As a result, multiple iterations are typically needed to produce content that meets user expectations, reducing generation efficiency. To address this issue, we propose using GPT-4V for self-optimization, which significantly improves the efficiency of generating satisfactory content in a single attempt. Furthermore, the controllability of text-to-3D generation methods has not been fully explored. Our approach lets users provide not only textual descriptions but also additional conditions, such as style, edges, scribbles, poses, or combinations of multiple conditions, allowing more precise control over the generated 3D content. In addition, during training we integrate multi-view information, including multi-view depth, masks, features, and images, to address the Janus problem common in 3D content generation. Extensive experiments demonstrate that our method generalizes robustly, enabling efficient and controllable generation of high-quality 3D content.
Abstract: Current object detectors often suffer significant performance degradation in real-world applications when they encounter distributional shifts. Consequently, the out-of-distribution (OOD) generalization capability of object detectors has attracted increasing attention from researchers. Despite this growing interest, there is still no large-scale, comprehensive dataset and evaluation benchmark with fine-grained annotations tailored to assessing OOD generalization on more intricate tasks such as object detection and grounding. To address this gap, we introduce COUNTS, a large-scale OOD dataset with object-level annotations. COUNTS encompasses 14 natural distributional shifts, over 222K samples, and more than 1,196K labeled bounding boxes. Leveraging COUNTS, we introduce two novel benchmarks: O(OD)² and OODG. O(OD)² is designed to comprehensively evaluate the OOD generalization capabilities of object detectors using controlled distribution shifts between training and testing data. OODG, in turn, assesses the OOD generalization of grounding abilities in multimodal large language models (MLLMs). Our findings reveal that, while large models and extensive pre-training data substantially enhance performance in in-distribution (IID) scenarios, significant limitations and opportunities for improvement persist in OOD contexts for both object detectors and MLLMs. In visual grounding tasks, even the advanced GPT-4o and Gemini-1.5 achieve only 56.7% and 28.0% accuracy, respectively. We hope COUNTS facilitates advancements in the development and assessment of robust object detectors and MLLMs capable of maintaining high performance under distributional shifts.
Abstract: At present, MRI scans are performed inside a fully enclosed RF shielding room, which imposes stringent installation requirements and causes unnecessary patient discomfort. We aim to develop an electromagnetic interference (EMI) cancellation strategy for MRI with no or incomplete RF shielding. In this study, we present a simultaneous-sensing and deep-learning-driven EMI cancellation strategy that models, predicts, and removes EMI signals from acquired MRI signals. Specifically, during each MRI scan, separate EMI sensing coils placed at various spatial locations simultaneously sample environmental and internal EMI signals within two windows: the conventional MRI signal acquisition window and an EMI characterization acquisition window. A CNN model is then trained on the EMI characterization data to relate the EMI signals detected by the EMI sensing coils to the EMI signals in the MRI receive coil. This model is used to retrospectively predict and remove the EMI signal components detected by the MRI receive coil during the MRI signal acquisition window. We implemented and demonstrated this strategy for various EMI sources on a mobile ultra-low-field 0.055 T permanent-magnet MRI scanner and a 1.5 T superconducting-magnet MRI scanner with no or incomplete RF shielding. Our experimental results demonstrate that the method is highly effective and robust in predicting and removing EMI from various sources, both external environments and internal scanner electronics, at both 0.055 T (2.3 MHz) and 1.5 T (64 MHz), producing final image signal-to-noise ratios comparable to those obtained with a fully enclosed RF shielding room. The proposed strategy enables MRI operation with no or incomplete RF shielding, easing MRI installation and operational requirements. It is also potentially applicable to other scenarios that require accurate RF signal detection or discrimination in the presence of external and internal EMI or RF sources.
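The two-window idea in the abstract above can be illustrated with a small sketch. This is not the authors' method: the abstract describes a CNN, whereas the sketch below substitutes a plain complex least-squares fit as a simplified stand-in, and all names (`fit_emi_model`, `remove_emi`, `demo`), the synthetic signals, and the coupling weights are hypothetical. It only shows the flow of fitting on the characterization window and subtracting the prediction in the acquisition window.

```python
import numpy as np


def fit_emi_model(sensing_cal, receive_cal):
    """Fit weights mapping sensing-coil samples to receive-coil EMI.

    Linear least-squares stand-in for the CNN named in the abstract.
    sensing_cal: (n_samples, n_coils) complex, characterization window.
    receive_cal: (n_samples,) complex, receive coil in the same window.
    """
    weights, *_ = np.linalg.lstsq(sensing_cal, receive_cal, rcond=None)
    return weights


def remove_emi(sensing_acq, receive_acq, weights):
    """Subtract the predicted EMI from the receive-coil acquisition signal."""
    return receive_acq - sensing_acq @ weights


def demo(seed=0):
    """Synthetic check: return mean squared error before and after removal."""
    rng = np.random.default_rng(seed)
    n_samples, n_coils = 2000, 3

    def complex_noise(shape):
        return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

    # Hypothetical coupling from each sensing coil into the receive coil.
    true_weights = np.array([0.8 - 0.2j, -0.5 + 0.1j, 0.3 + 0.4j])

    # Characterization window: receive coil sees EMI plus small thermal noise.
    sensing_cal = complex_noise((n_samples, n_coils))
    receive_cal = sensing_cal @ true_weights + 0.01 * complex_noise(n_samples)
    weights = fit_emi_model(sensing_cal, receive_cal)

    # Acquisition window: a toy MR signal plus EMI from the same sources.
    sensing_acq = complex_noise((n_samples, n_coils))
    mr_signal = np.exp(2j * np.pi * 0.01 * np.arange(n_samples))
    receive_acq = (mr_signal + sensing_acq @ true_weights
                   + 0.01 * complex_noise(n_samples))

    cleaned = remove_emi(sensing_acq, receive_acq, weights)
    error_before = np.mean(np.abs(receive_acq - mr_signal) ** 2)
    error_after = np.mean(np.abs(cleaned - mr_signal) ** 2)
    return error_before, error_after
```

Calling `demo()` returns the mean squared error against the toy MR signal before and after subtraction. A linear model suffices here only because the synthetic coupling is linear and stationary by construction; the abstract's choice of a CNN suggests the real sensing-to-receive relationship is not assumed to be this simple.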