Event cameras have emerged as a promising vision sensor in recent years due to their unparalleled temporal resolution and dynamic range. While registration of 2D RGB images to 3D point clouds is a long-standing problem in computer vision, no prior work studies 2D-3D registration for event cameras. To this end, we propose E2PNet, the first learning-based method for event-to-point cloud registration. The core of E2PNet is a novel feature representation network called Event-Points-to-Tensor (EP2T), which encodes event data into a 2D grid-shaped feature tensor. This grid-shaped feature enables matured RGB-based frameworks to be easily used for event-to-point cloud registration, without changing hyper-parameters and the training procedure. EP2T treats the event input as spatio-temporal point clouds. Unlike standard 3D learning architectures that treat all dimensions of point clouds equally, the novel sampling and information aggregation modules in EP2T are designed to handle the inhomogeneity of the spatial and temporal dimensions. Experiments on the MVSEC and VECtor datasets demonstrate the superiority of E2PNet over hand-crafted and other learning-based methods. Compared to RGB-based registration, E2PNet is more robust to extreme illumination or fast motion due to the use of event data. Beyond 2D-3D registration, we also show the potential of EP2T for other vision tasks such as flow estimation, event-to-image reconstruction and object recognition. The source code can be found at: https://github.com/Xmu-qcj/E2PNet.
Millimeter-wave (mmWave,>30 GHz) radars are the key enabler in the coming 6G era for high-resolution sensing and detection of targets. Photonic radar provides an effective approach to overcome the limitations of electronic radars thanks to the high frequency, broad bandwidth, and excellent reconfigurability of photonic systems. However, conventional photonic radars are mostly realized in tabletop systems composed of bulky discrete components, whereas the more compact integrated photonic radars are difficult to reach the mmWave bands due to the unsatisfactory bandwidths and signal integrity of the underlining electro-optic modulators. Here, we overcome these challenges and demonstrate a centimeter-resolution integrated photonic radar operating in the mmWave V band (40-50 GHz) based on a 4-inch wafer-scale thin-film lithium niobate (TFLN) technology. The fabricated TFLN mmWave photonic integrated circuit consists of a first electro-optic modulator capable of generating a broadband linear frequency modulated mmWave radar waveform through optical frequency multiplication of a low-frequency input signal, and a second electro-optic modulator responsible for frequency de-chirp of the received reflected echo wave, therefore greatly relieving the bandwidth requirements for the analog-to-digital converter in the receiver. Thanks to the absence of optical and electrical filters in the system, our integrated photonic mmWave radar features continuous on-demand tunability of the center frequency and bandwidth, currently only limited by the bandwidths of electrical amplifiers. We achieve multi-target ranging with a resolution of 1.50 cm and velocity measurement with a resolution of 0.067 m/s. Furthermore, we construct an inverse synthetic aperture radar (ISAR) and successfully demonstrate the imaging of targets with various shapes and postures with a two-dimensional resolution of 1.50 cm * 1.06 cm.
Multi-agent systems are characterized by environmental uncertainty, varying policies of agents, and partial observability, which result in significant risks. In the context of Multi-Agent Reinforcement Learning (MARL), learning coordinated and decentralized policies that are sensitive to risk is challenging. To formulate the coordination requirements in risk-sensitive MARL, we introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles. This principle requires that the collection of risk-sensitive action selections of each agent should be equivalent to the risk-sensitive action selection of the central policy. Current MARL value factorization methods do not satisfy the RIGM principle for common risk metrics such as the Value at Risk (VaR) metric or distorted risk measurements. Therefore, we propose RiskQ to address this limitation, which models the joint return distribution by modeling quantiles of it as weighted quantile mixtures of per-agent return distribution utilities. RiskQ satisfies the RIGM principle for the VaR and distorted risk metrics. We show that RiskQ can obtain promising performance through extensive experiments. The source code of RiskQ is available in https://github.com/xmu-rl-3dv/RiskQ.
Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to the users who can run simulations with the command line options, input files, or with scripts using MLatom as a Python package, both on their computers and on the online XACS cloud computing at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. The users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations such as AIQM1 approaching coupled-cluster accuracy. The developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of the interfaces to many state-of-the-art software packages and libraries.
Characterizing urban environments with broad coverages and high precision is more important than ever for achieving the UN's Sustainable Development Goals (SDGs) as half of the world's populations are living in cities. Urban building height as a fundamental 3D urban structural feature has far-reaching applications. However, so far, producing readily available datasets of recent urban building heights with fine spatial resolutions and global coverages remains a challenging task. Here, we provide an up-to-date global product of urban building heights based on a fine grid size of 150 m around 2020 by combining the spaceborne lidar instrument of GEDI and multi-sourced data including remotely sensed images (i.e., Landsat-8, Sentinel-2, and Sentinel-1) and topographic data. Our results revealed that the estimated method of building height samples based on the GEDI data was effective with 0.78 of Pearson's r and 3.67 m of RMSE in comparison to the reference data. The mapping product also demonstrated good performance as indicated by its strong correlation with the reference data (i.e., Pearson's r = 0.71, RMSE = 4.60 m). Compared with the currently existing products, our global urban building height map holds the ability to provide a higher spatial resolution (i.e., 150 m) with a great level of inherent details about the spatial heterogeneity and flexibility of updating using the GEDI samples as inputs. This work will boost future urban studies across many fields including climate, environmental, ecological, and social sciences.
Deep neural networks usually process information through multiple hidden layers. However, most hardware reservoir computing recurrent networks only have one hidden reservoir layer, which significantly limits the capability of solving real-world complex tasks. Here we show a deep photonic reservoir computing (PRC) architecture, which is constructed by cascading injection-locked semiconductor lasers. In particular, the connection between successive hidden layers is all optical, without any optical-electrical conversion or analog-digital conversion. The proof of concept is demonstrated on a PRC consisting of 4 hidden layers and 320 interconnected neurons. In addition, we apply the deep PRC in the real-world signal equalization of an optical fiber communication system. It is found that the deep PRC owns strong ability to compensate the nonlinearity of fibers.
The unstructured nature of point clouds demands that local aggregation be adaptive to different local structures. Previous methods meet this by explicitly embedding spatial relations into each aggregation process. Although this coupled approach has been shown effective in generating clear semantics, aggregation can be greatly slowed down due to repeated relation learning and redundant computation to mix directional and point features. In this work, we propose to decouple the explicit modelling of spatial relations from local aggregation. We theoretically prove that basic neighbor pooling operations can too function without loss of clarity in feature fusion, so long as essential spatial information has been encoded in point features. As an instantiation of decoupled local aggregation, we present DeLA, a lightweight point network, where in each learning stage relative spatial encodings are first formed, and only pointwise convolutions plus edge max-pooling are used for local aggregation then. Further, a regularization term is employed to reduce potential ambiguity through the prediction of relative coordinates. Conceptually simple though, experimental results on five classic benchmarks demonstrate that DeLA achieves state-of-the-art performance with reduced or comparable latency. Specifically, DeLA achieves over 90\% overall accuracy on ScanObjectNN and 74\% mIoU on S3DIS Area 5. Our code is available at https://github.com/Matrix-ASC/DeLA .
This study attempts to segment teeth and root-canals simultaneously from CBCT images, but there are very challenging problems in this process. First, the clinical CBCT image data is very large (e.g., 672 *688 * 688), and the use of downsampling operation will lose useful information about teeth and root canals. Second, teeth and root canals are very different in morphology, and it is difficult for a simple network to identify them precisely. In addition, there are weak edges at the tooth, between tooth and root canal, which makes it very difficult to segment such weak edges. To this end, we propose a coarse-to-fine segmentation method based on inverse feature fusion transformer and uncertainty estimation to address above challenging problems. First, we use the downscaled volume data (e.g., 128 * 128 * 128) to conduct coarse segmentation and map it to the original volume to obtain the area of teeth and root canals. Then, we design a transformer with reverse feature fusion, which can bring better segmentation effect of different morphological objects by transferring deeper features to shallow features. Finally, we design an auxiliary branch to calculate and refine the difficult areas in order to improve the weak edge segmentation performance of teeth and root canals. Through the combined tooth and root canal segmentation experiment of 157 clinical high-resolution CBCT data, it is verified that the proposed method is superior to the existing tooth or root canal segmentation methods.
Image deraining is a challenging task that involves restoring degraded images affected by rain streaks.
Progress in artificial intelligence and machine learning over the past decade has been driven by the ability to train larger deep neural networks (DNNs), leading to a compute demand that far exceeds the growth in hardware performance afforded by Moore's law. Training DNNs is an extremely memory-intensive process, requiring not just the model weights but also activations and gradients for an entire minibatch to be stored. The need to provide high-density and low-leakage on-chip memory motivates the exploration of emerging non-volatile memory for training accelerators. Spin-Transfer-Torque MRAM (STT-MRAM) offers several desirable properties for training accelerators, including 3-4x higher density than SRAM, significantly reduced leakage power, high endurance and reasonable access time. On the one hand, MRAM write operations require high write energy and latency due to the need to ensure reliable switching. In this study, we perform a comprehensive device-to-system evaluation and co-optimization of STT-MRAM for efficient ML training accelerator design. We devised a cross-layer simulation framework to evaluate the effectiveness of STT-MRAM as a scratchpad replacing SRAM in a systolic-array-based DNN accelerator. To address the inefficiency of writes in STT-MRAM, we propose to reduce write voltage and duration. To evaluate the ensuing accuracy-efficiency trade-off, we conduct a thorough analysis of the error tolerance of input activations, weights, and errors during the training. We propose heterogeneous memory configurations that enable training convergence with good accuracy. We show that MRAM provide up to 15-22x improvement in system level energy across a suite of DNN benchmarks under iso-capacity and iso-area scenarios. Further optimizing STT-MRAM write operations can provide over 2x improvement in write energy for minimal degradation in application-level training accuracy.