Large-scale road surface reconstruction is becoming increasingly important to autonomous driving systems, as it provides valuable training and testing data. In this paper, we introduce a simple yet efficient method, RoMe, for large-scale Road surface reconstruction via Mesh representations. To simplify the problem, RoMe decomposes a 3D road surface into a triangle mesh and a multilayer perceptron (MLP) that models the road elevation implicitly. To preserve fine surface details, each mesh vertex carries two extra attributes, namely color and semantics. To improve the efficiency of RoMe in large-scale environments, a novel waypoint sampling method is introduced. As such, RoMe can properly preserve road surface details with computational complexity that is only linear in the road area. In addition, to improve the accuracy of RoMe, extrinsics optimization is proposed to mitigate inaccurate extrinsic calibrations. Experimental results on popular public datasets demonstrate the high efficiency and accuracy of RoMe.
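To make the decomposition concrete, below is a minimal PyTorch sketch of how a triangle mesh with per-vertex color and semantic attributes can be paired with an MLP that models elevation implicitly. The class names, layer sizes, and toy data are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of RoMe-style road-surface decomposition (assumed names).
import torch
import torch.nn as nn

class ElevationMLP(nn.Module):
    """Implicitly models road elevation z = f(x, y)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xy):               # xy: (N, 2)
        return self.net(xy)              # (N, 1) elevation

class RoadMesh(nn.Module):
    """Triangle mesh whose vertices carry color and semantic attributes."""
    def __init__(self, xy, faces, num_classes=10):
        super().__init__()
        self.register_buffer("xy", xy)          # (V, 2) fixed ground grid
        self.register_buffer("faces", faces)    # (F, 3) triangle indices
        self.color = nn.Parameter(torch.zeros(xy.shape[0], 3))
        self.semantics = nn.Parameter(torch.zeros(xy.shape[0], num_classes))
        self.elevation = ElevationMLP()

    def vertices(self):
        z = self.elevation(self.xy)             # query implicit elevation
        return torch.cat([self.xy, z], dim=-1)  # (V, 3) vertex positions

xy = torch.rand(100, 2) * 10.0                  # toy 10 m x 10 m road patch
faces = torch.randint(0, 100, (50, 3))
mesh = RoadMesh(xy, faces)
print(mesh.vertices().shape)                    # torch.Size([100, 3])
```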
ChatGPT has stimulated a research boom in the field of large language models. In this paper, we assess the capabilities of ChatGPT on information extraction (IE) from four perspectives: Performance, Evaluation Criteria, Robustness, and Error Types. Specifically, we first evaluate ChatGPT's performance on 17 datasets covering 14 IE sub-tasks under the zero-shot, few-shot, and chain-of-thought scenarios, and find a substantial performance gap between ChatGPT and SOTA results. Next, we re-examine this gap and propose a soft-matching strategy for evaluation that more accurately reflects ChatGPT's performance. Then, we analyze the robustness of ChatGPT on the 14 IE sub-tasks and find that: 1) ChatGPT rarely outputs invalid responses; 2) irrelevant context and long-tail target types greatly degrade ChatGPT's performance; 3) ChatGPT struggles to understand subject-object relationships in the relation extraction (RE) task. Finally, we analyze ChatGPT's errors and find that "unannotated spans" is the dominant error type. This raises concerns about the quality of annotated data and suggests the possibility of annotating data with ChatGPT. The data and code are released on GitHub.
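The abstract does not specify the soft-matching criterion, so the sketch below assumes a simple fuzzy string-similarity threshold as one plausible instance of such a strategy: a predicted span is credited when it is close enough to the gold span rather than only on exact equality.

```python
# A minimal sketch of soft-matching evaluation (assumed criterion, not the
# paper's exact strategy): fuzzy similarity with a tunable threshold.
from difflib import SequenceMatcher

def soft_match(pred: str, gold: str, threshold: float = 0.8) -> bool:
    """Count a prediction as correct if it is 'close enough' to the gold span."""
    pred, gold = pred.strip().lower(), gold.strip().lower()
    if pred == gold:                      # exact match still counts
        return True
    return SequenceMatcher(None, pred, gold).ratio() >= threshold

# Exact matching would mark this prediction wrong; soft matching accepts it.
print(soft_match("the United States", "United States"))  # True (ratio ~0.87)
```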
Document-level relation extraction (DocRE) faces two overlooked challenges: the long-tail problem and the multi-label problem. Previous work focuses mainly on obtaining better contextual representations for entity pairs and hardly addresses the above challenges. In this paper, we analyze the co-occurrence correlation of relations and introduce it into the DocRE task for the first time. We argue that the correlations can not only transfer knowledge between data-rich relations and data-scarce ones to assist in the training of long-tailed relations, but also reflect semantic distance, guiding the classifier to identify semantically close relations for multi-label entity pairs. Specifically, we use relation embeddings as a medium and propose two co-occurrence prediction sub-tasks, from coarse- and fine-grained perspectives, to capture relation correlations. Finally, the learned correlation-aware embeddings are used to guide the extraction of relational facts. Extensive experiments on two popular DocRE datasets show that our method achieves superior results compared to baselines. Insightful analysis also demonstrates the potential of relation correlations to address the above challenges.
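As a hedged illustration of where the supervision for such sub-tasks could come from, the sketch below computes relation co-occurrence statistics from multi-label training annotations. The data format and normalization are illustrative assumptions, not the datasets' actual schema.

```python
# A minimal sketch: relation co-occurrence statistics from multi-label
# entity-pair annotations (toy data, assumed format).
import numpy as np

num_relations = 5
# Each entry: the set of relation ids expressed by one entity pair.
entity_pair_labels = [
    {0, 2},        # e.g. two semantically close relations co-occurring
    {0, 2, 3},
    {1},
    {0, 3},
]

cooc = np.zeros((num_relations, num_relations))
for labels in entity_pair_labels:
    for i in labels:
        for j in labels:
            if i != j:
                cooc[i, j] += 1

# Normalize rows into conditional co-occurrence probabilities P(j | i),
# which could supervise coarse-/fine-grained co-occurrence prediction.
row_sums = cooc.sum(axis=1, keepdims=True)
probs = np.divide(cooc, row_sums, out=np.zeros_like(cooc), where=row_sums > 0)
print(probs.round(2))
```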
Accurate localization ability is fundamental in autonomous driving. Traditional visual localization frameworks approach the semantic map-matching problem with geometric models, which rely on complex parameter tuning and thus hinder large-scale deployment. In this paper, we propose BEV-Locator: an end-to-end visual semantic localization neural network using multi-view camera images. Specifically, a visual BEV (Bird's-Eye-View) encoder extracts and flattens the multi-view images into BEV space, while the semantic map features are structurally embedded as a sequence of map queries. A cross-modal transformer then associates the BEV features with the semantic map queries, and the localization information of the ego-car is recursively queried out by cross-attention modules. Finally, the ego pose can be inferred by decoding the transformer outputs. We evaluate the proposed method on the large-scale nuScenes and Qcraft datasets. The experimental results show that BEV-Locator is capable of estimating the vehicle pose under versatile scenarios, effectively associating cross-modal information from multi-view images and global semantic maps. The experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251$^\circ$ in lateral translation, longitudinal translation, and heading angle, respectively.
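The core cross-attention step can be sketched in a few lines of PyTorch: flattened BEV features serve as keys and values, embedded map elements serve as queries, and a small head decodes a pose from the attended output. The dimensions and the pose head below are illustrative assumptions.

```python
# A minimal sketch of map-query / BEV-feature cross-attention (assumed dims).
import torch
import torch.nn as nn

d_model = 256
bev_feats = torch.rand(1, 50 * 50, d_model)   # flattened BEV grid (B, HW, C)
map_queries = torch.rand(1, 32, d_model)      # embedded map elements (B, M, C)

cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
fused, _ = cross_attn(query=map_queries, key=bev_feats, value=bev_feats)

# Decode a pose (dx, dy, dyaw) from the pooled transformer output.
pose_head = nn.Linear(d_model, 3)
pose = pose_head(fused.mean(dim=1))
print(pose.shape)  # torch.Size([1, 3])
```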
Artificial intelligence has recently been widely used in computational imaging. Deep neural networks (DNNs) improve the signal-to-noise ratio of retrieved images whose quality would otherwise be degraded by low sampling ratios or noisy environments. This work proposes a new computational imaging scheme based on the sequence transduction mechanism of the transformer network. A simulated database helps the network acquire signal-translation ability, so that the signal from an experimental single-pixel detector is 'translated' into a 2D image in an end-to-end manner. High-quality images with no background noise can be retrieved at sampling ratios as low as 2%. The illumination patterns can be either well-designed speckle patterns for sub-Nyquist imaging or random speckle patterns. Moreover, our method is robust to noise interference. This translation mechanism opens a new direction for DNN-assisted ghost imaging and can be used in various computational imaging scenarios.
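A hedged sketch of the idea follows: the single-pixel forward model produces a short 1D measurement sequence (about 2% of the pixel count), and a transformer encoder treats each measurement as a token and regresses the 2D image. The architecture below is an illustrative assumption, not the paper's exact network.

```python
# A minimal sketch of single-pixel measurement 'translation' (assumed model).
import torch
import torch.nn as nn

H = W = 32
num_patterns = int(0.02 * H * W)              # ~2% sampling ratio -> 20 shots

patterns = torch.rand(num_patterns, H * W)    # illumination speckle patterns
scene = torch.rand(H * W)                     # unknown object (flattened)
signal = patterns @ scene                     # single-pixel detector values

class SignalToImage(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(1, d_model)    # one token per measurement
        enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.head = nn.Linear(num_patterns * d_model, H * W)

    def forward(self, y):                     # y: (B, num_patterns)
        tok = self.embed(y.unsqueeze(-1))     # (B, T, d_model)
        tok = self.encoder(tok)
        return self.head(tok.flatten(1)).view(-1, H, W)

model = SignalToImage()
print(model(signal.unsqueeze(0)).shape)       # torch.Size([1, 32, 32])
```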
For the management of Head and Neck Cancer (HNC) patients, automatic gross tumor volume (GTV) segmentation and accurate pre-treatment cancer recurrence prediction are of great importance: they assist physicians in designing personalized management plans, which have the potential to improve treatment outcomes and quality of life for HNC patients. In this paper, we developed an automated primary tumor (GTVp) and lymph node (GTVn) segmentation method based on combined pre-treatment positron emission tomography/computed tomography (PET/CT) scans of HNC patients. We extracted radiomics features from the segmented tumor volumes and constructed a multi-modality recurrence-free survival (RFS) prediction model that fuses the prediction results from separate CT radiomics, PET radiomics, and clinical models. We performed 5-fold cross-validation to train and evaluate our methods on the MICCAI 2022 HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR) dataset. The ensemble prediction results on the testing cohort achieved Dice scores of 0.77 and 0.73 for GTVp and GTVn segmentation, respectively, and a C-index of 0.67 for RFS prediction. The code is publicly available (https://github.com/wangkaiwan/HECKTOR-2022-AIRT). Our team's name is AIRT.
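To illustrate the fusion and the evaluation metric, the sketch below averages per-model risk scores (one plausible late-fusion rule; the paper's actual weighting is not given here) and computes a pairwise concordance index on toy survival data.

```python
# A minimal sketch of late fusion of recurrence-risk scores and a pairwise
# C-index. All numbers and the equal-weight fusion are illustrative.
import numpy as np

ct_risk = np.array([0.8, 0.2, 0.5, 0.9])     # CT radiomics model output
pet_risk = np.array([0.7, 0.3, 0.4, 0.8])    # PET radiomics model output
clin_risk = np.array([0.6, 0.1, 0.6, 0.7])   # clinical model output

fused_risk = (ct_risk + pet_risk + clin_risk) / 3.0   # simple averaging

def c_index(time, risk, event):
    """Fraction of comparable pairs where the earlier event has higher risk."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:        # i recurred before j
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

time = np.array([12.0, 40.0, 25.0, 8.0])     # months to recurrence/censoring
event = np.array([1, 0, 1, 1])               # 1 = recurrence observed
print(round(c_index(time, fused_risk, event), 3))
```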
The quality of quantitative differential phase contrast (qDPC) reconstruction can be severely degraded by a mismatch between the backgrounds of the two obliquely illuminated images, yielding problematic phase recovery results. Such background mismatches may arise from the illumination patterns, inhomogeneous media distribution, or other defocused layers. In previous reports, the background was calibrated manually, which is time-consuming and unstable, since new calibrations are needed whenever the optical system is modified. It is also impossible to calibrate the background contributed by defocused layers, or for highly dynamic observations in which the background changes over time. To tackle the background mismatch and increase experimental robustness, we propose Retinex-qDPC, which uses the images' edge features as the data-fidelity term, yielding L2-Retinex-qDPC and L1-Retinex-qDPC for background-robust qDPC reconstruction. The split Bregman method is used to solve the L1-Retinex-qDPC problem. We compare both Retinex-qDPC models against state-of-the-art DPC reconstruction algorithms, including total-variation-regularized qDPC and isotropic qDPC, using both simulated and experimental data. Results show that Retinex-qDPC significantly improves phase recovery quality by suppressing the impact of background mismatch. Among them, L1-Retinex-qDPC outperforms L2-Retinex-qDPC and the other state-of-the-art DPC algorithms. In general, Retinex-qDPC increases experimental robustness against background illumination without any modification of the optical system, which will benefit all qDPC applications.
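As a hedged sketch of the idea (the paper's exact formulation is not reproduced here), an edge-domain data-fidelity term of the Retinex type can be written as $\hat{\phi}=\arg\min_{\phi}\left\|\nabla(A\phi)-\nabla I\right\|_{p}+\tau R(\phi)$, where $A$ is the assumed phase-to-intensity forward operator of the qDPC system, $I$ is the measured (background-corrupted) image, $p\in\{1,2\}$ distinguishes L1- from L2-Retinex-qDPC, and $R$ is a regularizer. The intuition is that a smooth background perturbs $I$ but barely changes its gradient $\nabla I$, so fitting in the edge domain suppresses the mismatch; for $p=1$ the non-smooth norm can be handled by split Bregman with an auxiliary variable $d=\nabla(A\phi)-\nabla I$.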
In this paper, we present a method for speckle pattern design using deep learning. The speckle patterns acquire unique features after passing through the convolutions of Speckle-Net, our well-designed framework for speckle pattern generation. We then apply our method to a computational ghost imaging system. Standard deep-learning-assisted ghost imaging methods use the network to recognize the reconstructed objects or to implement the imaging algorithms. In contrast, this innovative application optimizes the illuminating speckle patterns via Speckle-Net for specific sampling ratios. Our method therefore outperforms other ghost imaging techniques, particularly in its ability to retrieve high-quality images at extremely low sampling ratios. It opens a new route towards nontrivial speckle generation by applying a standard loss function to specified objectives through a modified deep neural network. It also has great potential for applications in dynamic speckle illumination microscopy, structured illumination microscopy, X-ray imaging, photo-acoustic imaging, and optical lattices.
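The sketch below illustrates the core design choice in the simplest possible form: the illumination patterns are themselves trainable parameters, optimized jointly with a reconstruction network at a fixed sampling ratio. Speckle-Net's actual convolutional layers and loss are not specified here; this is an assumed stand-in.

```python
# A minimal sketch of learning illumination patterns end-to-end (assumed
# architecture, not Speckle-Net itself).
import torch
import torch.nn as nn

H = W = 16
num_patterns = 25                      # sampling ratio ~ 25/256 ~ 10%

class LearnedGhostImaging(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable illumination patterns, trained jointly with the decoder.
        self.patterns = nn.Parameter(torch.rand(num_patterns, H * W))
        self.decoder = nn.Sequential(
            nn.Linear(num_patterns, 256), nn.ReLU(),
            nn.Linear(256, H * W),
        )

    def forward(self, scene):          # scene: (B, H*W)
        bucket = scene @ self.patterns.t()     # simulated detector signal
        return self.decoder(bucket)            # reconstructed image

model = LearnedGhostImaging()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
scene = torch.rand(8, H * W)
for _ in range(5):                     # toy training loop
    recon = model(scene)
    loss = nn.functional.mse_loss(recon, scene)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```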
Optical imaging through scattering media is a long-standing challenge. Although many approaches have been developed to focus light or image objects through scattering media, they are either invasive, restricted to stationary or slowly moving media, or require high-resolution cameras and complex algorithms to retrieve the images. Here we introduce a computational imaging technique that overcomes these restrictions by exploiting spatial-temporal encoded patterns (STEP). We demonstrate non-invasive imaging through scattering media with a single-pixel photodetector and show that the method is insensitive to the motion of the media. We further demonstrate that our image reconstruction algorithm is much more efficient than correlation-based algorithms for single-pixel imaging, which may allow fast imaging in currently unreachable scenarios.
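For orientation, the sketch below shows the generic single-pixel measurement model underlying such schemes: a sequence of temporally varying patterns illuminates the scene and a photodetector records one value per pattern. The least-squares recovery used here is purely illustrative; the paper's STEP reconstruction algorithm is not reproduced.

```python
# A minimal sketch of single-pixel imaging with temporally varying patterns
# (noise-free toy; assumed least-squares recovery, not the STEP algorithm).
import numpy as np

rng = np.random.default_rng(0)
H = W = 16
T = 400                                  # number of projected patterns

patterns = rng.random((T, H * W))        # spatial-temporal encoded patterns
scene = rng.random(H * W)                # object hidden behind the medium
signal = patterns @ scene                # single-pixel photodetector trace

# Recover the image from the 1D measurement trace.
recon, *_ = np.linalg.lstsq(patterns, signal, rcond=None)
print(np.allclose(recon, scene, atol=1e-6))  # True in this noise-free toy
```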