Taxi arrival time prediction is an essential part of building intelligent transportation systems. Traditional arrival time estimation methods mainly rely on traffic map feature extraction, which cannot model complex situations or nonlinear spatial and temporal relationships. Therefore, we propose a Multi-View Spatial-Temporal Model (MVSTM) to capture spatial-temporal and trajectory dependencies. Specifically, we use graph2vec to model the spatial view, a dual-channel temporal module to model the trajectory view, and structural embedding to model the traffic semantics. Experiments on large-scale taxi trajectory data show that our approach is more effective than recent methods. The source code can be obtained from https://github.com/775269512/SIGSPATIAL-2021-GISCUP-4th-Solution.
Federated Learning (FL) opens new perspectives for training machine learning models while keeping personal data on the users' premises. Specifically, in FL, models are trained on the users' devices and only model updates (i.e., gradients) are sent to a central server for aggregation purposes. However, the long list of inference attacks that leak private data from gradients, published in recent years, has emphasized the need to devise effective protection mechanisms to incentivize the adoption of FL at scale. While there exist solutions to mitigate these attacks on the server side, little has been done to protect users from attacks performed on the client side. In this context, the use of Trusted Execution Environments (TEEs) on the client side is among the most promising solutions. However, existing frameworks (e.g., DarkneTZ) require statically putting a large portion of the machine learning model into the TEE to effectively protect against complex attacks or a combination of attacks. We present GradSec, a solution that allows protecting in a TEE only the sensitive layers of a machine learning model, either statically or dynamically, hence reducing both the TCB size and the overall training time by up to 30% and 56%, respectively, compared to state-of-the-art competitors.
Buses and heavy vehicles have more blind spots than cars and other road vehicles because of their large size. As a result, accidents caused by these heavy vehicles are more likely to be fatal and to result in severe injuries to other road users. These possible blind-spot collisions can be identified early using vision-based object detection approaches. Yet, the existing state-of-the-art vision-based object detection models rely heavily on a single feature descriptor for making decisions. In this research, the design of two convolutional neural networks (CNNs) based on high-level feature descriptors and their integration with Faster R-CNN is proposed to detect blind-spot collisions for heavy vehicles. Moreover, a fusion approach is proposed to integrate two pre-trained networks (i.e., Resnet 50 and Resnet 101) for extracting high-level features for blind-spot vehicle detection. The fusion of features significantly improves the performance of Faster R-CNN and outperforms the existing state-of-the-art methods. Both approaches are validated on a self-recorded blind-spot vehicle detection dataset for buses and an online LISA dataset for vehicle detection. For the two proposed approaches, false detection rates (FDRs) of 3.05% and 3.49% are obtained on the self-recorded dataset, making these approaches suitable for real-time applications.
A social network (SN) is a social structure consisting of a group of individuals and the interactions between them. SNs have recently been widely used and, subsequently, have become suitable and popular platforms for product promotion and information diffusion. People in an SN directly influence each other's interests and behavior. One of the most important problems in SNs is to find the people who, if chosen as the seed nodes of a network diffusion scenario, have the maximum influence on other nodes in the network in a cascading manner. Influential diffusers are the people who, when chosen as the seed set for a diffusion process in the network, lead to the largest number of people learning about the diffused entity. This is a well-known problem in the literature, known as the influence maximization (IM) problem. Although it has been proven that this problem is NP-hard and has no known polynomial-time solution, it has been shown that it has the properties of submodular functions and, therefore, can be approximately solved using a greedy algorithm. Most of the methods proposed to improve this complexity are based on the assumption that the entire graph is visible. However, this assumption does not hold for many real-world graphs. This study is conducted to extend current maximization methods with link prediction techniques to pseudo-visibility graphs. To this end, a graph generation method called the exponential random graph model (ERGM) is used for link prediction. The proposed method is tested using data from the SNAP dataset collection of Stanford University. According to the experimental tests, the proposed method is efficient on real-world graphs.
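The greedy approach to influence maximization mentioned above can be illustrated with a minimal sketch. It assumes the independent cascade diffusion model with a single propagation probability and a Monte Carlo spread estimate; the abstract does not state which diffusion model is used, and the ERGM-based link prediction step is not shown. All function names and parameters are hypothetical.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """Run one independent-cascade simulation and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, prob, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_cascade(graph, seeds, prob, rng) for _ in range(runs)) / runs


def greedy_seed_selection(graph, k, prob=0.1, runs=200, seed=0):
    """Pick up to k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    chosen = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for candidate in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [candidate], prob, runs, rng)
            if spread > best_spread:
                best_node, best_spread = candidate, spread
        chosen.append(best_node)
    return chosen


if __name__ == "__main__":
    # Hypothetical toy graph: adjacency lists of directed edges.
    toy_graph = {"a": ["b", "c"], "b": ["d"], "e": ["f"]}
    print(greedy_seed_selection(toy_graph, k=2, prob=1.0, runs=1))
```

With the propagation probability set to 1.0 the spread is deterministic, which makes the toy example easy to check by hand; realistic settings use smaller probabilities and many simulation runs.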
Virtual reality (VR) technology is commonly used in entertainment applications; however, it has also been deployed in practical applications in more serious aspects of our lives, such as safety. To support people working in dangerous industries, VR can ensure that operators carry out standardized tasks and work collaboratively to deal with potential risks. Surprisingly, little research has focused on how people can work collaboratively in VR environments, and few studies have paid attention to the cognitive load of operators in their collaborative tasks. Once task demands become complex, many researchers focus on optimizing the design of the interaction interfaces to reduce the cognitive load on the operator. That approach could be of merit; however, it can actually subject operators to a more significant cognitive load, potentially causing more errors and a failure of collaboration. In this paper, we propose a new collaborative VR system that supports two teleoperators working in the VR environment to remotely control an uncrewed ground vehicle. We use a comparative experiment to evaluate the collaborative VR system, focusing on the time spent on tasks and the total number of operations. Our results show that the total number of processes and the cognitive load during operations were significantly lower in the two-person group than in the single-person group. Our study sheds light on designing VR systems to support collaborative work with respect to the flow of work of teleoperators instead of simply optimizing the design outcomes.
Food image classification serves as the foundation of image-based dietary assessment to predict food categories. Since there are many different food classes in real life, conventional models cannot achieve sufficiently high accuracy. Personalized classifiers aim to largely improve the accuracy of food image classification for each individual. However, a lack of public personal food consumption data proves to be a challenge for training such models. To address this issue, we propose a novel framework to simulate personal food consumption data patterns, leveraging the use of a modified Markov chain model and self-supervised learning. Our method is capable of creating an accurate future data pattern from a limited amount of initial data, and our simulated data patterns can be closely correlated with the initial data pattern. Furthermore, we use Dynamic Time Warping distance and Kullback-Leibler divergence as metrics to evaluate the effectiveness of our method on the public Food-101 dataset. Our experimental results demonstrate promising performance compared with random simulation and the original Markov chain method.
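As a rough illustration of the ingredients named above, the following sketch simulates a future food sequence with a plain first-order Markov chain and compares sequences using Kullback-Leibler divergence and Dynamic Time Warping. It is only a sketch under stated assumptions: it uses the unmodified Markov chain rather than the modified model proposed in the abstract, it omits the self-supervised learning component, it assumes a non-empty history, and the 0/1 mismatch cost used for DTW on categorical labels is a hypothetical choice. All names are illustrative.

```python
import math
import random
from collections import Counter, defaultdict


def fit_transitions(sequence):
    """Count first-order transitions between consecutive food labels."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts


def simulate(sequence, length, seed=0):
    """Sample a future sequence from the fitted chain, starting at the last observed label."""
    rng = random.Random(seed)
    counts = fit_transitions(sequence)
    state = sequence[-1]
    output = []
    for _ in range(length):
        options = counts.get(state)
        if not options:
            # Dead end: fall back to the overall label frequencies.
            options = Counter(sequence)
        foods = list(options)
        weights = [options[food] for food in foods]
        state = rng.choices(foods, weights=weights)[0]
        output.append(state)
    return output


def kl_divergence(p_seq, q_seq, eps=1e-9):
    """KL divergence between the smoothed label frequencies of two sequences."""
    labels = sorted(set(p_seq) | set(q_seq))
    p_counts, q_counts = Counter(p_seq), Counter(q_seq)
    p_total = len(p_seq) + eps * len(labels)
    q_total = len(q_seq) + eps * len(labels)
    divergence = 0.0
    for label in labels:
        p = (p_counts[label] + eps) / p_total
        q = (q_counts[label] + eps) / q_total
        divergence += p * math.log(p / q)
    return divergence


def dtw_distance(a, b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Dynamic Time Warping distance with a pluggable per-element cost."""
    inf = float("inf")
    table = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    table[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = cost(a[i - 1], b[j - 1])
            table[i][j] = step + min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
    return table[len(a)][len(b)]


if __name__ == "__main__":
    history = ["rice", "egg", "rice", "egg"]  # hypothetical initial data
    future = simulate(history, length=4)
    print(future, kl_divergence(history, future), dtw_distance(history, future))
```

A low KL divergence indicates that the simulated sequence preserves the label frequencies of the initial data, while DTW additionally reflects the order in which foods appear.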
In this paper, we investigate a reconfigurable intelligent surface (RIS)-aided multiuser full-duplex secure communication system with hardware impairments at the transceivers and the RIS, where multiple eavesdroppers overhear the two-way transmitted signals simultaneously, and an RIS is applied to enhance the secrecy performance. Aiming at maximizing the sum secrecy rate (SSR), a joint optimization problem of the transmit beamforming at the base station (BS) and the reflecting beamforming at the RIS is formulated under the transmit power constraint of the BS and the unit modulus constraint of the phase shifters. As the environment is time-varying and the system is high-dimensional, this non-convex optimization problem is mathematically intractable. A deep reinforcement learning (DRL)-based algorithm is explored to obtain a satisfactory solution by repeatedly interacting with and learning from the dynamic environment. Extensive simulation results show that the DRL-based secure beamforming algorithm is significantly effective in improving the SSR. It is also found that, with appropriate neural network parameters, the performance of the DRL-based method can be greatly improved and the convergence of the neural network can be accelerated.
As event-based sensing gains in popularity, theoretical understanding is needed to harness this technology's potential. Instead of recording video by capturing frames, event-based cameras have sensors that emit events when their inputs change, thus encoding information in the timing of events. This creates new challenges in establishing reconstruction guarantees and algorithms, but also provides advantages over frame-based video. We use time encoding machines (TEMs) to model event-based sensors: TEMs also encode their inputs by emitting events characterized by their timing, and reconstruction from time encodings is well understood. We consider the case of time encoding bandlimited video and demonstrate a dependence between spatial sensor density and overall spatial and temporal resolution. Such a dependence does not occur in frame-based video, where temporal resolution depends solely on the frame rate of the video and spatial resolution depends solely on the pixel grid. However, this dependence arises naturally in event-based video and allows oversampling in space to provide better time resolution. As such, event-based vision encourages using more sensors that emit fewer events over time.
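To make the notion of time encoding concrete, the sketch below implements a discrete-time integrate-and-fire encoder, one common form of time encoding machine. The abstract does not state which TEM variant is used, so the integrate-and-fire choice, the discretization with step `dt`, and the parameter names are all assumptions made for illustration.

```python
def time_encode(signal, dt, threshold, bias=0.0):
    """Integrate (signal + bias) over time and emit an event time at each threshold crossing."""
    integral = 0.0
    spike_times = []
    for index, sample in enumerate(signal):
        integral += (sample + bias) * dt
        if integral >= threshold:
            # Record the event time, then subtract the threshold to reset the integrator.
            spike_times.append((index + 1) * dt)
            integral -= threshold
    return spike_times


if __name__ == "__main__":
    # Hypothetical constant inputs sampled every 0.25 time units.
    weak = [1.0] * 12
    strong = [2.0] * 12
    print(time_encode(weak, dt=0.25, threshold=1.0))
    print(time_encode(strong, dt=0.25, threshold=1.0))
```

The output contains only event times, and a stronger input produces events more often, which is the sense in which information is carried by the timing of events rather than by sampled amplitudes.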
Streaming services use recommender systems to surface the right music to users. Playlists are a popular way to present music in a list-like fashion, i.e., as a plain list of songs. An alternative is tours, in which the songs alternate with segues that explain the connections between consecutive songs. Tours address the user need of seeking background information about songs, and are found to be superior to playlists, given the right user context. In this work, we provide, for the first time, a user-centered evaluation of two tour-generation algorithms (Greedy and Optimal) using semi-structured interviews. We assess the algorithms, discuss attributes of the tours that the algorithms produce, identify which attributes are desirable and which are not, and enumerate several possible improvements to the algorithms, along with practical suggestions on how to implement the improvements. Our main findings are that Greedy generates more likeable tours than Optimal, and that three important attributes of tours are segue diversity, song arrangement and song familiarity. More generally, we provide insights into how to present music to users, which could inform the design of user-centered recommender systems.
Deploying machine learning (ML) on milliwatt-scale edge devices (tinyML) is gaining popularity due to recent breakthroughs in ML and IoT. However, the capabilities of tinyML are restricted by strict power and compute constraints. The majority of contemporary research in tinyML focuses on model compression techniques such as model pruning and quantization to fit ML models on low-end devices. Nevertheless, the improvements in energy consumption and inference time obtained by existing techniques are limited because aggressive compression quickly shrinks model capacity and accuracy. Another approach to improving inference time and/or reducing power while preserving model capacity is through early-exit networks. These networks place intermediate classifiers along a baseline neural network, which facilitate early exit from neural network computation if an intermediate classifier exhibits sufficient confidence in its prediction. Previous work on early-exit networks has focused on large networks, beyond what would typically be used for tinyML applications. In this paper, we discuss the challenges of adding early exits to state-of-the-art tiny-CNNs and devise an early-exit architecture, T-RECX, that addresses these challenges. In addition, we develop a method to alleviate the effect of network overthinking at the final exit by leveraging the high-level representations learned by the early exit. We evaluate T-RECX on three CNNs from the MLPerf tiny benchmark suite for image classification, keyword spotting and visual wake word detection tasks. Our results demonstrate that T-RECX significantly reduces the average inference time of tiny-CNNs, achieving a 32.58% average reduction in FLOPS at the cost of 1% accuracy across all evaluated models. Also, our techniques increase the accuracy of the baseline network in two out of the three models we evaluate.
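The early-exit mechanism described above can be sketched as a simple confidence check. This is a generic illustration, not the T-RECX architecture: it assumes a single early exit, uses the maximum softmax probability as the confidence measure, and treats `early_head`, `final_head` and the threshold value as hypothetical placeholders for trained classifiers and a tuned setting.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(features, early_head, final_head, threshold=0.9):
    """Return (predicted class, exit used); skip the final head when the early head is confident."""
    probs = softmax(early_head(features))
    confidence = max(probs)
    if confidence >= threshold:
        return probs.index(confidence), "early"
    # Not confident enough: pay for the rest of the network.
    probs = softmax(final_head(features))
    return probs.index(max(probs)), "final"


if __name__ == "__main__":
    # Hypothetical stand-ins for trained classifier heads returning logits.
    confident_head = lambda x: [5.0, 0.0]
    unsure_head = lambda x: [0.1, 0.0]
    final_head = lambda x: [0.0, 3.0]
    print(early_exit_predict(None, confident_head, final_head))
    print(early_exit_predict(None, unsure_head, final_head))
```

Lowering the threshold lets more inputs leave at the early exit, which saves computation on average but risks more misclassifications, so the threshold sets the trade-off between inference cost and accuracy.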