Cross-domain recommender (CDR) systems aim to enhance the performance of the target domain by utilizing data from other related domains. However, irrelevant information from the source domain may instead degrade target-domain performance, which is known as the negative transfer problem. There have been some attempts to address this problem, mostly by designing adaptive representations for overlapped users. However, representation adaptations rely solely on the expressive capacity of the CDR model and lack an explicit constraint to filter out source-domain collaborative information that is irrelevant to the target domain. In this paper, we propose a novel Collaborative information regularized User Transformation (CUT) framework to tackle the negative transfer problem by directly filtering users' collaborative information. In CUT, user similarity in the target domain is adopted as a constraint for user transformation learning to filter the user collaborative information from the source domain. CUT first learns user similarity relationships from the target domain. Then, source-target information transfer is guided by the user similarity, where we design a user transformation layer to learn target-domain user representations and a contrastive loss to supervise the transferred user collaborative information. The results show significant performance improvement of CUT compared with state-of-the-art single-domain and cross-domain methods. Further analysis of the target-domain results illustrates that CUT can effectively alleviate the negative transfer problem.
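The abstract above does not spell out the form of the contrastive loss, so the following is only a minimal sketch of one way a target-domain similarity matrix could supervise transformed user representations. It assumes an InfoNCE-style loss in which users whose target-domain similarity exceeds a threshold are treated as positives; the function name, the `threshold` and `temperature` parameters, and the thresholding rule are all hypothetical and are not taken from the CUT paper.

```python
import numpy as np

def similarity_guided_contrastive_loss(transformed, target_sim,
                                       threshold=0.5, temperature=0.1):
    """InfoNCE-style loss (illustrative assumption, not the CUT loss).

    transformed: (n, d) user representations after a transformation layer.
    target_sim:  (n, n) user-user similarity learned in the target domain.
    Users j with target_sim[i, j] >= threshold are positives for user i.
    Requires n >= 2 and nonzero representation vectors.
    """
    z = transformed / np.linalg.norm(transformed, axis=1, keepdims=True)
    n = z.shape[0]
    off_diag = ~np.eye(n, dtype=bool)

    # Cosine-similarity logits; a user is never compared with itself.
    logits = np.where(off_diag, z @ z.T / temperature, -np.inf)

    # Numerically stable log-softmax over the other users in each row.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    positives = (target_sim >= threshold) & off_diag
    has_pos = positives.any(axis=1)
    if not has_pos.any():
        return 0.0

    # Average negative log-probability of the positives for each user.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_user = -pos_log_prob[has_pos] / positives.sum(axis=1)[has_pos]
    return float(per_user.mean())

# Toy data: users 0 and 1 have close representations, as do users 2 and 3.
reps = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
sim_aligned = np.array([[1, 1, 0, 0], [1, 1, 0, 0],
                        [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
sim_misaligned = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                           [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
loss_aligned = similarity_guided_contrastive_loss(reps, sim_aligned)
loss_misaligned = similarity_guided_contrastive_loss(reps, sim_misaligned)
print(loss_aligned, loss_misaligned)
```

In this toy case the loss is small when the representations agree with the target-domain similarity structure and large when they contradict it, which is the kind of constraint the abstract describes.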
Nash equilibrium is one of the most influential solution concepts in game theory. With the development of computer science and artificial intelligence, there is an increasing demand for Nash equilibrium computation, especially in Internet economics and multi-agent learning. This paper reviews various algorithms for computing Nash equilibria and their approximate solutions in finite normal-form games from both theoretical and empirical perspectives. For the theoretical part, we classify the algorithms in the literature and present the basic ideas behind their design and analysis. For the empirical part, we present a comprehensive comparison of the algorithms in the literature over different kinds of games. Based on these results, we provide practical suggestions on the implementation and use of these algorithms. Finally, we present a series of open problems arising from both theoretical and practical considerations.
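To make the notion of an approximate solution concrete, here is a minimal sketch for the two-player case. It uses the standard definition that a strategy profile is an epsilon-approximate Nash equilibrium if no player can gain more than epsilon by deviating unilaterally. The function name `nash_gap` is illustrative and does not come from the survey.

```python
import numpy as np

def nash_gap(A, B, x, y):
    """Largest gain either player can obtain by a unilateral deviation.

    A, B: payoff matrices of the row and column player (same shape).
    x, y: mixed strategies of the row and column player.
    The profile (x, y) is an epsilon-approximate Nash equilibrium
    exactly when the returned value is at most epsilon.
    """
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Best pure-strategy response payoff minus the current expected payoff.
    regret_row = np.max(A @ y) - x @ A @ y
    regret_col = np.max(x @ B) - x @ B @ y
    return float(max(regret_row, regret_col))

# Matching pennies: the uniform profile is an exact equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
uniform = np.array([0.5, 0.5])
pure = np.array([1.0, 0.0])
print(nash_gap(A, B, uniform, uniform))  # 0.0
print(nash_gap(A, B, pure, pure))        # 2.0
```

Because a best response can always be found among the pure strategies, taking the maximum over the rows of `A @ y` and the columns of `x @ B` is enough to measure each player's regret.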
AI in Math deals with mathematics in a constructive manner so that reasoning becomes automated, less laborious, and less error-prone. For algorithms, the question becomes how to automate analyses for specific problems. For the first time, this work provides an automatic method for approximation analysis on a well-studied problem in theoretical computer science: computing approximate Nash equilibria in two-player games. We observe that such algorithms can be reformulated into a search-and-mix paradigm, which involves a search phase followed by a mixing phase. By doing so, we are able to fully automate the procedure of designing and analyzing the mixing phase. As an example, we illustrate how to apply our method with a program to analyze the approximation bounds of all the algorithms in the literature. The same approximation bounds are computed without any hand-written proof. Our automatic method relies heavily on the LP-relaxation structure in approximate Nash equilibria. Since many approximation algorithms and online algorithms adopt LP relaxation, our approach may be extended to automate the analysis of other algorithms.
Autonomous driving faces great safety challenges due to the lack of a global perspective and limited long-range perception capabilities. It has been widely agreed that vehicle-infrastructure cooperation is required to achieve Level 5 autonomy. However, there is still no dataset from real scenarios available for computer vision researchers to work on vehicle-infrastructure cooperation-related problems. To accelerate computer vision research and innovation for Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD), we release the DAIR-V2X dataset, which is the first large-scale, multi-modality, multi-view dataset from real scenarios for VICAD. DAIR-V2X comprises 71254 LiDAR frames and 71254 camera frames, and all frames are captured from real scenes with 3D annotations. The Vehicle-Infrastructure Cooperative 3D Object Detection problem (VIC3D) is introduced, formulating the problem of collaboratively locating and identifying 3D objects using sensory inputs from both vehicle and infrastructure. In addition to solving traditional 3D object detection problems, a solution to VIC3D needs to consider the temporal asynchrony problem between vehicle and infrastructure sensors and the data transmission cost between them. Furthermore, we propose Time Compensation Late Fusion (TCLF), a late fusion framework for the VIC3D task, as a benchmark based on DAIR-V2X. Find data, code, and more up-to-date information at https://thudair.baai.ac.cn/index and https://github.com/AIR-THU/DAIR-V2X.
We consider stability issues in minimizing a continuous (possibly parameterized, nonconvex and nonsmooth) real-valued function $f$. We call a point stationary if all its possible directional derivatives are nonnegative. In this work, we focus on two notions of stability on stationary points of $f$: parametric stability and convergence stability. Parametric considerations are widely studied in various fields, including smoothed analysis, numerical stability, condition numbers and sensitivity analysis for linear programming. Parametric stability asks whether minor perturbations of the parameters lead to dramatic changes in the position and $f$ value of a stationary point. Meanwhile, convergence stability indicates a non-escapable solution: any point sequence iteratively produced by an optimization algorithm cannot escape from a neighborhood of a stationary point but gets close to it, in the sense that such stationary points are stable with respect to the precision parameter and algorithmic numerical errors. It turns out that these notions have deep connections to geometry theory. We show that parametric stability is linked to deformations of graphs of functions. On the other hand, convergence stability is concerned with area partitioning of the function domain. Utilizing these connections, we prove quite tight conditions for these two stability notions for a wide range of functions and optimization algorithms with small enough step sizes and precision parameters. These conditions are subtle in the sense that a slightly weaker function requirement contradicts naive intuition and leads to wrong conclusions. We present three applications of this theory. These applications shed light on Nash equilibrium computation, nonconvex and nonsmooth optimization, as well as the new optimization methodology of deep neural networks.
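The stationarity notion used above can be written out as follows. This is a sketch under one assumption: "possible" directional derivatives are read here as those directions $d$ for which the one-sided limit exists, which the abstract does not state explicitly.

```latex
% x^* is stationary for f if, for every direction d for which the limit exists,
f'(x^*; d) \;=\; \lim_{t \downarrow 0} \frac{f(x^* + t d) - f(x^*)}{t} \;\ge\; 0 .
% Example: for f(x) = |x| we have f'(0; d) = |d| \ge 0, so x^* = 0 is stationary
% although f is not differentiable there; for f(x) = -|x| we have
% f'(0; d) = -|d| < 0 for d \neq 0, so x^* = 0 is not stationary.
```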
Current perception datasets for autonomous driving are mainly limited to the frontal view, with sensors mounted on the vehicle. None of them is designed for the overlooked roadside perception tasks. On the other hand, the data captured from roadside cameras have strengths over frontal-view data, which are believed to facilitate a safer and more intelligent autonomous driving system. To accelerate the progress of roadside perception, we present Rope3D, the first high-diversity, challenging Roadside Perception 3D dataset from a novel view. The dataset consists of 50k images and over 1.5M 3D objects in various scenes, which are captured under different settings including various cameras with ambiguous mounting positions, camera specifications, viewpoints, and different environmental conditions. We conduct strict 2D-3D joint annotation and comprehensive data analysis, and set up a new 3D roadside perception benchmark with metrics and an evaluation devkit. Furthermore, we tailor the existing frontal-view monocular 3D object detection approaches and propose to leverage the geometry constraint to resolve the inherent ambiguities caused by various sensors and viewpoints. Our dataset is available at https://thudair.baai.ac.cn/rope.
Electron microscopy (EM) enables the reconstruction of neural circuits at the level of individual synapses, which has been transformative for scientific discoveries. However, due to the complex morphology, an accurate reconstruction of cortical axons has become a major challenge. Worse still, there is no publicly available large-scale EM dataset from the cortex that provides dense ground truth segmentation for axons, making it difficult to develop and evaluate large-scale axon reconstruction methods. To address this, we introduce the AxonEM dataset, which consists of two 30x30x30 um^3 EM image volumes from the human and mouse cortex, respectively. We thoroughly proofread over 18,000 axon instances to provide dense 3D axon instance segmentation, enabling large-scale evaluation of axon reconstruction methods. In addition, we densely annotate nine ground truth subvolumes for training for each data volume. With this, we reproduce two published state-of-the-art methods and provide their evaluation results as a baseline. We publicly release our code and data at https://connectomics-bazaar.github.io/proj/AxonEM/index.html to foster the development of advanced methods.
Underwater image enhancement algorithms have attracted much attention in underwater vision tasks. However, these algorithms are mainly evaluated on different datasets and with different metrics. In this paper, we set up an effective and public underwater test dataset named U45, which includes the color casts, low contrast and haze-like effects of underwater degradation, and propose a fusion adversarial network for enhancing underwater images. Meanwhile, a well-designed adversarial loss, including an Lgt loss and an Lfe loss, is presented to focus on the image features of the ground truth and the image features of the image enhanced by the fusion enhancement method, respectively. The proposed network corrects color casts effectively and has a faster testing time with fewer parameters. Experimental results on the U45 dataset demonstrate that the proposed method achieves performance better than or comparable to that of other state-of-the-art methods in terms of qualitative and quantitative evaluations. Moreover, an ablation study demonstrates the contribution of each component, and the application test further shows the effectiveness of the enhanced images.
Mapping all the neurons in the brain requires automatic reconstruction of entire cells from volume electron microscopy data. The flood-filling networks (FFN) architecture can achieve leading performance. However, training the network is computationally very expensive. To reduce the training time, we implemented synchronous, data-parallel distributed training using the Horovod framework on top of the published FFN code. We demonstrated the scaling of FFN training up to 1024 Intel Knights Landing (KNL) nodes at the Argonne Leadership Computing Facility. We investigated the training accuracy with different optimizers, learning rates, and optional warm-up periods. We discovered that square root scaling of the learning rate works best beyond 16 nodes, in contrast to the case of a smaller number of nodes, where linear learning rate scaling with warm-up performs best. Our distributed training reaches 95% accuracy in approximately 4.5 hours on 1024 KNL nodes using the Adam optimizer.
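The two learning-rate scaling rules compared above can be sketched as follows. This is an illustration only: the abstract gives neither the base learning rate nor the warm-up schedule, so `base_lr`, `warmup_steps`, the linear warm-up ramp, and the helper names are hypothetical. The switch at 16 nodes follows the finding stated in the abstract.

```python
import math

def scaled_lr(base_lr, num_nodes, rule):
    """Scale a single-node learning rate for data-parallel training."""
    if rule == "linear":
        return base_lr * num_nodes
    if rule == "sqrt":
        return base_lr * math.sqrt(num_nodes)
    raise ValueError(f"unknown scaling rule: {rule!r}")

def choose_rule(num_nodes, switch_point=16):
    """Square-root scaling beyond the switch point, linear scaling otherwise."""
    return "sqrt" if num_nodes > switch_point else "linear"

def warmup_lr(target_lr, base_lr, step, warmup_steps):
    """Ramp linearly from base_lr to target_lr over warmup_steps (assumed schedule)."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    return base_lr + (target_lr - base_lr) * step / warmup_steps

base_lr = 0.001  # hypothetical single-node learning rate
for nodes in (4, 16, 64, 1024):
    rule = choose_rule(nodes)
    print(nodes, rule, scaled_lr(base_lr, nodes, rule))
```

With these hypothetical numbers, linear scaling at 1024 nodes would multiply the learning rate by 1024, whereas square-root scaling multiplies it by only 32, which illustrates why the choice of rule matters more as the node count grows.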