GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models become increasingly large, they take longer to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance counter profiling overhead, it uses an analytical model to detect the training iteration change and only collects performance counter data when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting and a local search algorithm to find a trade-off between execution time and energy consumption. We evaluate GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites running on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with a modest average execution time increase of 5.1%.
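The trade-off search described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the clock grid, the toy `predict_time` and `predict_energy` functions (which merely stand in for gradient-boosting models), and the equal weighting in `cost`. It shows only the general shape of a local search over GPU clock settings, not GPOEO's actual algorithm.

```python
# Hypothetical sketch: hill-climbing over GPU core clocks (MHz).
# The predictors below are toy stand-ins, NOT GPOEO's trained models.

F_MAX = 1800
CLOCKS = list(range(600, F_MAX + 1, 100))  # assumed candidate clock grid


def predict_time(clock):
    """Toy relative execution time (1.0 at the maximum clock)."""
    return 0.3 + 0.7 * (F_MAX / clock)


def predict_energy(clock):
    """Toy relative energy: relative power times relative time."""
    power = 0.3 + 0.7 * (clock / F_MAX) ** 3
    return power * predict_time(clock)


def cost(clock, weight=0.5):
    """Weighted sum of the two objectives (the weight is an assumption)."""
    return weight * predict_energy(clock) + (1 - weight) * predict_time(clock)


def local_search(candidates, start_index):
    """Move to the cheaper neighbouring setting until no neighbour improves."""
    i = start_index
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(candidates)]
        best = min(neighbours, key=lambda j: cost(candidates[j]))
        if cost(candidates[best]) >= cost(candidates[i]):
            return candidates[i]
        i = best


# Start from the highest clock (an arbitrary choice for this sketch).
chosen = local_search(CLOCKS, len(CLOCKS) - 1)
print(chosen, round(predict_energy(chosen), 3), round(predict_time(chosen), 3))
```

Because the toy cost has a single minimum over the grid, the neighbour-by-neighbour search reaches the same setting as an exhaustive scan while evaluating fewer candidates; with real predictors that property would have to be checked rather than assumed.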
Braess's Paradox (BP) is the observation that adding one or more roads to an existing road network can counter-intuitively increase traffic congestion and slow down the overall traffic flow. Previously, the existence of the BP has been modeled using the static traffic assignment model, which solves for the user equilibrium subject to network flow conservation to find the equilibrium state and distributes all vehicles instantaneously. Such an approach neglects the dynamic nature of real-world traffic, including vehicle behaviors and the interaction between vehicles and the infrastructure. As such, this article proposes a dynamic traffic network model and empirically validates the existence of the BP under dynamic traffic. In particular, we use a microsimulation environment to study the impacts of an added path on a grid network. We explore how the network flow, vehicle travel time, and network capacity respond, as well as when the BP will occur.
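For readers unfamiliar with the paradox, the static case can be worked through on the classic four-node textbook network. This is not the dynamic model or the grid network the abstract describes; the demand of 4000 vehicles, the fixed 45-minute links, and the `flow / 100` congestion rule are the usual textbook values, assumed here for illustration.

```python
# Classic static Braess example (textbook values, assumed for illustration).
# Routes: top = Start -> A -> End, bottom = Start -> B -> End,
#         shortcut = Start -> A -> B -> End (uses the added zero-cost link A -> B).


def route_costs(f_top, f_bottom, f_shortcut):
    """Travel time in minutes on each route for the given route flows."""
    # Congestible links take (vehicles on the link) / 100 minutes.
    start_to_a = (f_top + f_shortcut) / 100
    b_to_end = (f_bottom + f_shortcut) / 100
    return {
        "top": start_to_a + 45,
        "bottom": 45 + b_to_end,
        "shortcut": start_to_a + 0 + b_to_end,
    }


# Without the added link, the user equilibrium splits the demand evenly.
before = route_costs(2000, 2000, 0)
# With the link, every driver takes the shortcut; no one gains by switching back.
after = route_costs(0, 0, 4000)

print("before:", before["top"], "minutes per driver")
print("after: ", after["shortcut"], "minutes per driver")
```

In the second case the shortcut is still the cheapest of the three routes, so the state is a user equilibrium even though every driver is worse off than before the link was added.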
With the continuous development of neural networks in computer vision tasks, more and more network architectures have achieved outstanding success. As one of the most advanced neural network architectures, DenseNet shortcuts all feature maps to solve the problem of model depth. Although this network architecture has excellent accuracy at low MACs (multiplications and accumulations), it takes excessive inference time. To solve this problem, HarDNet reduces the connections between feature maps, making the remaining connections resemble harmonic waves. However, this compression method may reduce model accuracy and increase MACs and model size. It only reduces the memory access time; its overall performance still needs to be improved. Therefore, we propose a new network architecture, ThreshNet, that uses a threshold mechanism to further optimize the method of connections. ThreshNet discards different numbers of connections for different convolutional layers to compress the feature maps. We evaluate the proposed architecture on three datasets, CIFAR-10, CIFAR-100, and SVHN, for image classification. Experimental results show that ThreshNet achieves up to 60% reduction in inference time compared to DenseNet, and up to 35% faster training speed and 20% reduction in error rate compared to HarDNet on these datasets.
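The difference between dense and harmonic connectivity can be made concrete with a short sketch. It assumes the harmonic rule as commonly stated for HarDNet, in which layer k takes input from layer k - 2^n whenever 2^n divides k; the block size of 8 layers is arbitrary. ThreshNet's threshold rule is not specified in the abstract, so it is deliberately not modeled here.

```python
# Sketch of connection patterns inside one block (layer 0 is the block input).
# Assumption: harmonic rule "layer k <- layer k - 2**n if 2**n divides k".


def dense_inputs(k):
    """DenseNet-style: layer k receives every earlier layer's feature map."""
    return list(range(k))


def harmonic_inputs(k):
    """HarDNet-style: layer k receives layer k - 2**n for each 2**n dividing k."""
    inputs = []
    step = 1
    while step <= k and k % step == 0:
        inputs.append(k - step)
        step *= 2
    return inputs


NUM_LAYERS = 8  # arbitrary block size for illustration
dense_links = sum(len(dense_inputs(k)) for k in range(1, NUM_LAYERS + 1))
harmonic_links = sum(len(harmonic_inputs(k)) for k in range(1, NUM_LAYERS + 1))

for k in range(1, NUM_LAYERS + 1):
    print(k, harmonic_inputs(k))
print("dense links:", dense_links, "harmonic links:", harmonic_links)
```

Fewer incoming links per layer means fewer feature maps to read from memory, which is the memory access saving the abstract attributes to HarDNet.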
The detection and identification of toxic comments are conducive to creating a civilized and harmonious Internet environment. In this experiment, we collected various data sets related to toxic comments. Because of the characteristics of comment data, we performed data cleaning and feature extraction on it from different angles to obtain different toxic comment training sets. In terms of model construction, we used the training sets to train models based on TF-IDF and separately fine-tuned a BERT model. Finally, we packaged the code into software that scores toxic comments in real time.
Data center interconnects (DCIs) will have to support throughputs of 400 Gbps or more per wavelength in the near future. To achieve such high data rates, coherent modulation and detection are used, which conventionally require high-speed data conversion and signal processing in the digital domain. Alternatively, high-speed signal conditioning and processing could be carried out in co-designed photonic and electronic integrated circuits, in the optical and electrical analog domains, respectively, to achieve reduced power consumption, latency, form factor, and cost. A few demonstrations in the literature of analog-domain processing electronic integrated circuits (EICs), including equalizer and carrier phase recovery (CPR) modules, showcase progress in this direction. In this brief, for the first time, we present the integration of a silicon photonic integrated coherent receiver (ICR) module with a CPR module, as part of a complete coherent receiver solution. A phase shifter in the ICR (fabricated in a 220 nm silicon-on-insulator technology) receives feedback from a CPR EIC, and the combination compensates for the time-varying phase offset between the modulated signal and the unmodulated carrier in a closed-loop configuration. In this proof-of-concept demonstration, we present experimental results obtained from the stand-alone silicon photonic ICR along with its system-level integration with the CPR chip, for QPSK signals. The technique can be extended to a higher-order modulation format, such as 16-QAM, for data rate scaling. The proposed scheme is suitable for homodyne systems, such as polarization-multiplexed carrier-based self-homodyne links.
Deep Learning (DL) has shown remarkable results in solving inverse problems in various domains. In particular, the Tikhonet approach is very effective at deconvolving optical astronomical images (Sureau et al. 2020). Yet, this approach only uses the $\ell_2$ loss, which does not guarantee the preservation of physical information (e.g. flux and shape) of the object reconstructed in the image. In Nammour et al. (2021), a new loss function was proposed in the framework of sparse deconvolution, which better preserves the shape of galaxies and reduces the pixel error. In this paper, we extend Tikhonet to take into account this shape constraint, and apply our new DL method, called ShapeNet, to simulated optical and radio-interferometry data sets. The originality of the paper lies in i) the shape constraint we use in the neural network framework, ii) the application of deep learning to radio-interferometry image deconvolution for the first time, and iii) the generation of a simulated radio data set that we make available to the community. A range of examples illustrates the results.
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
An Xception model reaches state-of-the-art (SOTA) accuracy on the ESC-50 dataset for audio event detection through knowledge transfer from ImageNet weights, pretraining on AudioSet, and an on-the-fly data augmentation pipeline. This paper presents an ablation study that analyzes which components contribute to the boost in performance and training time. A smaller Xception model is also presented which nears SOTA performance with almost a third of the parameters.
Visual inertial odometry (VIO) is widely used for the state estimation of multicopters, but it may function poorly in environments with few visual features or in overly aggressive flights. In this work, we propose a perception-aware collision avoidance local planner for multicopters. Our approach is able to fly the vehicle to a goal position at high speed, avoiding obstacles in the environment while achieving good VIO state estimation accuracy. The proposed planner samples a group of minimum jerk trajectories and finds collision-free trajectories among them, which are then evaluated based on their speed to the goal and perception quality. Both the features' motion blur and their locations are considered for the perception quality. The best trajectory from the evaluation is tracked by the vehicle and is updated in a receding horizon manner when new images are received from the camera. All the sampled trajectories have zero speed and acceleration at the end, and the planner assumes no other visual features except those already found by the VIO. As a result, the vehicle will follow the current trajectory to the end and stop safely if no new trajectories are found, avoiding collision or flying into areas without features. The proposed method can run in real time on a small embedded computer on board. We validated the effectiveness of our proposed approach through experiments in indoor and outdoor environments. Compared to a perception-agnostic planner, the proposed planner kept more features in the camera's view and made the flight less aggressive, making the VIO more accurate. It also reduced VIO failures, which occurred for the perception-agnostic planner but not for the proposed planner. The experiment video can be found at https://youtu.be/LjZju4KEH9Q.
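The sampled trajectories described in the abstract above end with zero speed and acceleration. A minimal single-axis sketch of such a trajectory follows, assuming the standard result that a minimum jerk motion between fully specified start and end states is a quintic polynomial. The function names, the sampled goal positions, and the durations are hypothetical, and the collision check and perception-quality scoring are not modeled.

```python
# Single-axis sketch: quintic (minimum jerk) trajectory that ends at rest.
# Names, goals, and durations below are illustrative assumptions.


def min_jerk_to_rest(p0, v0, a0, pf, duration):
    """Coefficients c0..c5 of p(t) reaching pf with zero velocity and acceleration."""
    T = duration
    dp = pf - (p0 + v0 * T + 0.5 * a0 * T ** 2)  # remaining position error
    dv = 0.0 - (v0 + a0 * T)                     # remaining velocity error
    da = 0.0 - a0                                # remaining acceleration error
    c3 = (10 * dp - 4 * dv * T + 0.5 * da * T ** 2) / T ** 3
    c4 = (-15 * dp + 7 * dv * T - da * T ** 2) / T ** 4
    c5 = (6 * dp - 3 * dv * T + 0.5 * da * T ** 2) / T ** 5
    return [p0, v0, 0.5 * a0, c3, c4, c5]


def evaluate(coeffs, t):
    """Position, velocity, and acceleration of the polynomial at time t."""
    pos = sum(c * t ** k for k, c in enumerate(coeffs))
    vel = sum(k * c * t ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)
    acc = sum(k * (k - 1) * c * t ** (k - 2) for k, c in enumerate(coeffs) if k >= 2)
    return pos, vel, acc


# Sample a small group of candidates from the current state (p=0, v=2, a=0).
samples = [
    (pf, T, min_jerk_to_rest(0.0, 2.0, 0.0, pf, T))
    for pf in (3.0, 4.0, 5.0)
    for T in (2.0, 3.0)
]
end_states = [(pf, evaluate(coeffs, T)) for pf, T, coeffs in samples]
for pf, state in end_states:
    print(pf, [round(x, 6) for x in state])
```

Since every candidate stops at its end point, the vehicle can safely fly any of them to completion if no replacement arrives, which is the property the abstract relies on.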
Joint radio-based communication, localization, and sensing is a rapidly emerging research field with a wide range of potential applications. Smart city, mobility, and logistics concepts, which benefit greatly from these capabilities, are key components for maximizing the efficiency of modern transportation systems. In urban environments, both the search for parking space and freight transport are time- and space-consuming and are bottlenecks in these transportation chains. Location information for these heterogeneous requirement profiles (both active and passive localization of objects) can be provided by retrofittable wireless sensor networks, which are typically deployed only for active localization. An additional passive detection of objects can be achieved by assessing signal reflections and multipath properties of the transmission channel stored within the Channel Impulse Response (CIR). In this work, a proof-of-concept realization and preliminary experimental results of a CIR-based occupancy detection for parking lots are presented. As the time resolution depends on the available bandwidth, the CIR of ultra-wideband transceivers is used. The CIR is smoothed, and time-variant changes within it are detected by performing a background subtraction. Finally, the reflecting objects are mapped to individual parking lots. The developed method is tested in an in-house parking garage. This work provides a foundation for passive occupancy detection, whose capabilities can prospectively be enhanced by exploiting additional physical layers, such as 5G or even 6G.
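The smoothing and background subtraction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain: the moving-average window, the detection threshold, the function names, and the synthetic CIR magnitudes are all hypothetical, and the final mapping of reflections to parking lots is left out.

```python
# Sketch of CIR-based change detection on synthetic magnitude data.
# Window size, threshold, and sample values are illustrative assumptions.


def smooth(cir, window=3):
    """Moving average over CIR tap magnitudes (shorter window at the edges)."""
    half = window // 2
    out = []
    for i in range(len(cir)):
        lo, hi = max(0, i - half), min(len(cir), i + half + 1)
        out.append(sum(cir[lo:hi]) / (hi - lo))
    return out


def changed_taps(cir, background, threshold, window=3):
    """Indices of taps whose smoothed magnitude differs from the background."""
    current = smooth(cir, window)
    reference = smooth(background, window)
    return [
        i for i, (c, r) in enumerate(zip(current, reference))
        if abs(c - r) > threshold
    ]


# Background: empty scene with only the direct path at tap 2.
background = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Current measurement: an extra reflection appears at tap 6.
occupied = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0]

detected = changed_taps(occupied, background, threshold=0.1)
unchanged = changed_taps(background, background, threshold=0.1)
print(detected, unchanged)
```

Each tap index corresponds to a propagation delay, so a cluster of changed taps indicates a new reflector at a particular path length; finer tap spacing, which requires more bandwidth, separates reflectors that are closer together.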