Utilizing a pivot language effectively can significantly improve low-resource machine translation. Usually, the two translation models, source-pivot and pivot-target, are trained individually and do not utilize the limited (source, target) parallel data. This work proposes an end-to-end training method for the cascaded translation model and introduces an improved decoding algorithm. The input to the pivot-target model is replaced with a weighted pivot embedding computed from the probability distribution output by the source-pivot model, which allows the cascade to be trained end-to-end. In addition, we mitigate the inconsistency between tokens and probability distributions that arises when beam search is used in pivot decoding. Experiments demonstrate that our method enhances translation quality.
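To make the weighted pivot embedding concrete, here is a minimal PyTorch sketch of the idea: the hard pivot tokens are replaced by an expectation over the pivot-target embedding table, so the whole cascade stays differentiable. Tensor names, shapes, and the function itself are illustrative assumptions, not the paper's code.

```python
import torch

def weighted_pivot_embedding(pivot_probs: torch.Tensor,
                             pivot_target_embedding: torch.nn.Embedding) -> torch.Tensor:
    """pivot_probs: (batch, pivot_len, vocab) softmax output of the source-pivot model.
    Returns (batch, pivot_len, dim) soft embeddings fed to the pivot-target model."""
    # The matmul is differentiable, so the pivot-target loss can back-propagate
    # into the source-pivot model, enabling end-to-end training of the cascade.
    return pivot_probs @ pivot_target_embedding.weight

# Toy usage with hypothetical sizes.
emb = torch.nn.Embedding(100, 16)                       # pivot-target embedding table
probs = torch.softmax(torch.randn(2, 5, 100), dim=-1)   # source-pivot output
soft_inputs = weighted_pivot_embedding(probs, emb)      # shape (2, 5, 16)
```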
Training Graph Neural Networks (GNNs) on large graphs is challenging due to the conflict between high memory demand and limited GPU memory. Recently, distributed full-graph GNN training has been widely adopted to tackle this problem. However, the substantial inter-GPU communication overhead can cause severe throughput degradation. Existing communication compression techniques mainly target traditional DNN training, whose bottleneck lies in synchronizing gradients and parameters. We find they do not work well in distributed GNN training, where the barrier is the layer-wise communication of features during the forward pass and of feature gradients during the backward pass. To this end, we propose Sylvie, an efficient distributed GNN training framework that employs one-bit quantization in GNNs and further pipelines the curtailed communication with computation to dramatically shrink the overhead while maintaining model quality. In detail, Sylvie provides a lightweight Low-bit Module to quantize the sent data and dequantize the received data back to full-precision values in each layer. Additionally, we propose a Bounded Staleness Adaptor to control the introduced staleness and achieve further performance gains. We conduct theoretical convergence analysis and extensive experiments on various models and datasets to demonstrate that Sylvie can boost training throughput by up to 28.1x.
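The Low-bit Module's behavior can be illustrated with a simple sign-plus-scale scheme: each feature tensor is compressed to one bit per element plus a single float scale before transfer, and restored on the receiving GPU. The paper's exact quantizer may differ; names below are assumptions.

```python
import torch

def quantize_1bit(x: torch.Tensor):
    """Compress a feature tensor to signs plus one per-tensor scale."""
    scale = x.abs().mean()    # scale preserving the average magnitude
    signs = x >= 0            # 1 bit per element (bool tensor)
    return signs, scale

def dequantize_1bit(signs: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover a full-precision approximation on the receiving GPU."""
    return torch.where(signs, scale, -scale)

# Toy usage: send `signs` and `scale` instead of the 32-bit feature tensor.
x = torch.randn(4, 8)
signs, scale = quantize_1bit(x)
x_hat = dequantize_1bit(signs, scale)   # full-precision approximation of x
```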
Multi-view imaging systems enable uniform coverage of 3D space and reduce the impact of occlusion, which benefits 3D object detection and tracking accuracy. However, existing imaging systems built with multi-view cameras or depth sensors are limited to small scenes and suffer from complicated setups. In this paper, we propose a wireless multi-view multi-modal 3D imaging system that is generally applicable to large outdoor scenes and consists of a master node and several slave nodes. Multiple spatially distributed slave nodes equipped with cameras and LiDARs are connected to form a wireless sensor network. While providing flexibility and scalability, the system applies automatic spatio-temporal calibration to obtain accurate 3D multi-view multi-modal data. Among existing 3D imaging systems, it is the first to integrate multi-view RGB cameras and LiDARs in large outdoor scenes. We perform point-cloud-based 3D object detection and long-term tracking on the 3D imaging dataset collected by this system. The experimental results show that multi-view point clouds greatly improve 3D object detection and tracking accuracy across complex and varied outdoor environments.
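As a rough illustration of what the spatio-temporal calibration enables, the sketch below fuses per-node LiDAR point clouds into a common world frame using 4x4 extrinsic transforms; function names and interfaces are assumptions, not the system's actual code.

```python
import numpy as np

def fuse_point_clouds(clouds: list, extrinsics: list) -> np.ndarray:
    """clouds[i]: (N_i, 3) points in slave node i's LiDAR frame.
    extrinsics[i]: (4, 4) node-to-world transform from spatial calibration."""
    fused = []
    for pts, T in zip(clouds, extrinsics):
        homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # homogeneous (N, 4)
        fused.append((homo @ T.T)[:, :3])                    # into the world frame
    return np.vstack(fused)

# Toy usage with identity extrinsics (a calibrated system would supply real ones).
clouds = [np.random.rand(10, 3), np.random.rand(12, 3)]
world_cloud = fuse_point_clouds(clouds, [np.eye(4), np.eye(4)])  # (22, 3)
```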
The ubiquity of camera-embedded devices and the advances in deep learning have stimulated various intelligent mobile video applications. These applications often demand on-device processing of video streams to deliver real-time, high-quality services, owing to privacy and robustness concerns. However, their performance is constrained by the raw video streams, which tend to be captured in dim light with the small-aperture cameras of ubiquitous mobile platforms. Despite extensive low-light video enhancement solutions, they are unfit for deployment on mobile devices due to their complex models and their disregard of system dynamics such as energy budgets. In this paper, we propose AdaEnlight, an energy-aware low-light video stream enhancement system for mobile devices. It achieves real-time video enhancement with competitive visual quality while adapting its runtime behavior to the platform-imposed dynamic energy budgets. We report extensive experiments on diverse datasets, scenarios, and platforms, and demonstrate the superiority of AdaEnlight over state-of-the-art low-light image and video enhancement solutions.
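AdaEnlight's actual adaptation policy is not spelled out here, but the flavor of energy-aware adaptation can be sketched as a controller that picks the highest-quality enhancement configuration fitting the remaining per-frame energy budget. All names, configurations, and numbers below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    joules_per_frame: float   # profiled offline on the target device
    quality: float            # e.g., expected visual-quality gain

CONFIGS = [Config("full", 0.80, 3.0), Config("mid", 0.45, 2.2), Config("lite", 0.20, 1.4)]

def choose_config(budget_j: float, frames_left: int) -> Config:
    """Pick the best configuration whose energy cost fits the per-frame budget."""
    per_frame = budget_j / max(frames_left, 1)
    feasible = [c for c in CONFIGS if c.joules_per_frame <= per_frame]
    # Fall back to the cheapest configuration if nothing fits the budget.
    return max(feasible, key=lambda c: c.quality) if feasible else CONFIGS[-1]

print(choose_config(budget_j=150.0, frames_left=300).name)  # "mid"
```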
In a modern power system with an increasing proportion of renewable energy, wind power prediction is crucial to power grid dispatching plans due to the volatility of wind power. However, traditional centralized forecasting methods raise concerns about data privacy and data islands. To address data privacy while keeping data usable, we propose a forecasting scheme that combines federated learning and deep reinforcement learning (DRL) for ultra-short-term wind power forecasting, called federated deep reinforcement learning (FedDRL). First, this paper uses the deep deterministic policy gradient (DDPG) algorithm as the basic forecasting model to improve prediction accuracy. Second, we integrate the DDPG forecasting model into the federated learning framework. The designed FedDRL obtains an accurate prediction model in a decentralized way by sharing model parameters instead of private data, thereby avoiding sensitive privacy issues. Simulation results show that the proposed FedDRL outperforms traditional prediction methods in forecasting accuracy. More importantly, while maintaining forecasting performance, FedDRL effectively protects data privacy and relieves communication pressure compared with the traditional centralized forecasting method. In addition, a simulation with different federated learning parameters confirms the robustness of the proposed scheme.
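The federated step of FedDRL can be sketched as standard federated averaging over the DDPG network parameters: clients train locally on private wind data and exchange only weights. The toy model and uniform weighting below are placeholders, not the paper's exact setup.

```python
import torch

def fed_avg(state_dicts, weights=None):
    """Weighted average of client model parameters (FedAvg)."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    avg = {k: torch.zeros_like(v, dtype=torch.float32) for k, v in state_dicts[0].items()}
    for sd, w in zip(state_dicts, weights):
        for k, v in sd.items():
            avg[k] += w * v.float()
    return avg

# Toy round: three clients with identical architectures train locally,
# then the server averages and broadcasts the parameters back.
clients = [torch.nn.Linear(8, 1) for _ in range(3)]
global_sd = fed_avg([c.state_dict() for c in clients])
for c in clients:
    c.load_state_dict(global_sd)
```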
Realistic dynamic garments on animated characters have many AR/VR applications. While authoring such dynamic garment geometry remains a challenging task, data-driven simulation provides an attractive alternative, especially if it can be controlled simply through the motion of the underlying character. In this work, we focus on motion-guided dynamic 3D garments, especially loose garments. In a data-driven setup, we first learn a generative space of plausible garment geometries. Then, we learn a mapping to this space that captures motion-dependent dynamic deformations, conditioned on the previous state of the garment as well as its relative position with respect to the underlying body. Technically, we model garment dynamics, driven by the input character motion, by predicting per-frame local displacements in a canonical state of the garment that is enriched with frame-dependent skinning weights to bring the garment into the global space. We resolve any remaining per-frame collisions by predicting residual local displacements. The resulting garment geometry is used as history to enable iterative rollout prediction. We demonstrate plausible generalization to unseen body shapes and motion inputs, and show improvements over multiple state-of-the-art alternatives.
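The prediction pipeline described above can be summarized as a rollout loop; `dynamics_net`, `skinning`, and `collision_net` stand in for the learned components, and all shapes and interfaces are assumptions for illustration.

```python
import torch

def rollout(dynamics_net, collision_net, skinning, garment_canon, body_motion):
    """garment_canon: (V, 3) canonical garment vertices.
    body_motion: per-frame body pose/shape features driving the garment."""
    history = garment_canon.clone()
    frames = []
    for pose in body_motion:
        # 1) predict per-frame local displacements in the canonical state
        local = garment_canon + dynamics_net(history, pose)
        # 2) frame-dependent skinning brings the garment into the global space
        global_verts = skinning(local, pose)
        # 3) residual local displacements resolve remaining collisions
        global_verts = global_verts + collision_net(global_verts, pose)
        history = global_verts            # feed back as history for the next frame
        frames.append(global_verts)
    return frames

# Trivial stubs so the loop runs end to end.
zero_net = lambda verts, pose: torch.zeros_like(verts)
identity_skin = lambda verts, pose: verts
frames = rollout(zero_net, zero_net, identity_skin, torch.zeros(4, 3), body_motion=[None, None])
```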
Great progress has been achieved in the autonomous driving community in the past few years. As a safety-critical problem, however, anomaly detection remains a huge hurdle towards large-scale deployment of autonomous vehicles in the real world. While many approaches, such as uncertainty estimation or segmentation-based image resynthesis, are extremely promising, there is more to be explored. Inspired in particular by work on anomaly detection based on image resynthesis, we propose a novel approach to anomaly detection through style transfer. We leverage generative models to map an image from its original style domain of road traffic to an arbitrary one and back, generating pixelwise anomaly scores. However, our experiments proved our hypothesis wrong, and we were unable to produce significant results. Nevertheless, we share our findings so that others can learn from our experiments.
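The scoring idea can be sketched as a round trip through the style domain, with per-pixel reconstruction error as the anomaly score; the two generators below are placeholders for trained style-transfer models, and the interface is an assumption.

```python
import numpy as np

def anomaly_scores(image: np.ndarray, to_style, from_style) -> np.ndarray:
    """image: (H, W, 3) float array in [0, 1].
    Returns an (H, W) map of pixelwise anomaly scores."""
    reconstructed = from_style(to_style(image))   # round trip through the style domain
    # In-distribution road-traffic content should survive the round trip;
    # anomalous regions are expected to reconstruct poorly.
    return np.abs(image - reconstructed).mean(axis=-1)

# Identity placeholders make the sketch runnable (scores are all zero here).
img = np.random.rand(4, 4, 3)
scores = anomaly_scores(img, to_style=lambda x: x, from_style=lambda x: x)
```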
Pre-trained sequence-to-sequence models have significantly improved Neural Machine Translation (NMT). Unlike prior works, where pre-trained models usually adopt a unidirectional decoder, this paper demonstrates that pre-training a sequence-to-sequence model with a bidirectional decoder can produce notable performance gains for both autoregressive and non-autoregressive NMT. Specifically, we propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora across many languages. We also introduce two simple but effective methods to enhance CeMAT: aligned code-switching and masking, and dynamic dual-masking. We conduct extensive experiments and show that CeMAT achieves significant performance improvements across all scenarios from low- to extremely high-resource languages, i.e., up to +14.4 BLEU on low-resource languages and +7.9 BLEU on average for autoregressive NMT. For non-autoregressive NMT, we demonstrate that it also produces consistent gains of up to +5.3 BLEU. To the best of our knowledge, this is the first work to pre-train a unified model for fine-tuning on both NMT tasks. Code, data, and pre-trained models are available at https://github.com/huawei-noah/Pretrained-Language-Model/CeMAT
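Of the two enhancements, dual-masking is the easier to illustrate: tokens are masked on both the source and target sides of a bilingual pair, so the bidirectional decoder learns to infill conditioned on context from both sides. The ratios below are made up, and the alignment-based code-switching step is omitted.

```python
import random

MASK = "<mask>"

def dual_mask(src_tokens, tgt_tokens, p_src=0.15, p_tgt=0.5, seed=None):
    """Independently mask source and target tokens (illustrative ratios only)."""
    rng = random.Random(seed)
    mask = lambda toks, p: [MASK if rng.random() < p else t for t in toks]
    return mask(src_tokens, p_src), mask(tgt_tokens, p_tgt)

src, tgt = "the cat sat".split(), "die Katze sass".split()
print(dual_mask(src, tgt, seed=0))   # masked (source, target) pair
```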
Triangular machine translation is a special case of low-resource machine translation in which the language pair of interest has limited parallel data, but both languages have abundant parallel data with a pivot language. Naturally, the key to triangular machine translation is the successful exploitation of such auxiliary data. In this work, we propose a transfer-learning-based approach that utilizes all types of auxiliary data. As we train the auxiliary source-pivot and pivot-target translation models, we initialize some pivot-side parameters with a pre-trained language model and freeze them, encouraging both translation models to work in the same pivot language space so that they can be smoothly transferred to the source-target translation model. Experiments show that our approach outperforms previous ones.
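The freezing step can be sketched in PyTorch: pivot-side parameters are initialized from the pre-trained language model and excluded from gradient updates, so both auxiliary models remain in the same pivot representation space. Module and key names here are illustrative assumptions.

```python
import torch

def init_and_freeze_pivot_side(model: torch.nn.Module, pretrained_lm_state: dict):
    """Copy pre-trained LM parameters into the model and freeze them."""
    model.load_state_dict(pretrained_lm_state, strict=False)  # copy shared params
    for name, param in model.named_parameters():
        if name in pretrained_lm_state:
            param.requires_grad = False     # freeze pivot-side parameters

# Toy usage: a stand-in pivot-side module and a matching LM state dict.
pivot_decoder = torch.nn.Linear(8, 8)
lm_state = {"weight": torch.zeros(8, 8), "bias": torch.zeros(8)}
init_and_freeze_pivot_side(pivot_decoder, lm_state)
# Only the remaining trainable parameters would be handed to the optimizer.
trainable = [p for p in pivot_decoder.parameters() if p.requires_grad]   # empty here
```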
Sign language translation, a technology with profound social significance, has attracted growing interest from researchers in recent years. However, existing sign language translation methods need to read the entire video before starting translation, which leads to high inference latency and limits their application in real-life scenarios. To solve this problem, we propose SimulSLT, the first end-to-end simultaneous sign language translation model, which can translate sign language videos into target text concurrently. SimulSLT is composed of a text decoder, a boundary predictor, and a masked encoder. We 1) use the wait-k strategy for simultaneous translation; 2) design a novel boundary predictor based on the integrate-and-fire module to output gloss boundaries, which model the correspondence between the sign language video and the gloss; and 3) propose an innovative re-encode method that helps the model obtain richer contextual information, allowing the existing video features to interact fully. Experimental results on the RWTH-PHOENIX-Weather 2014T dataset show that SimulSLT achieves BLEU scores exceeding those of the latest end-to-end non-simultaneous sign language translation model while maintaining low latency, which demonstrates the effectiveness of our method.
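The wait-k strategy itself is standard and easy to sketch: read k source segments before emitting the first target token, then alternate reads and writes. The schedule below is a generic illustration of the policy, not SimulSLT's implementation.

```python
def wait_k_schedule(num_segments: int, k: int):
    """Yield ("read", i) and ("write", j) actions under a wait-k policy."""
    written = 0
    for read in range(1, num_segments + 1):
        yield ("read", read)
        if read >= k:                 # after k reads, write one token per read
            written += 1
            yield ("write", written)
    while written < num_segments:     # source exhausted: flush remaining tokens
        written += 1
        yield ("write", written)

# For k=3 and 5 segments: read 1..3, then alternate write/read, then flush.
print(list(wait_k_schedule(num_segments=5, k=3)))
```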