Large Language Models (LLMs) have demonstrated impressive performance across a wide range of applications; however, assessing their reasoning capabilities remains a significant challenge. In this paper, we introduce a framework grounded in group and symmetry principles, which have played a crucial role in fields such as physics and mathematics, and offer another way to evaluate their capabilities. While the proposed framework is general, to showcase the benefits of employing these properties, we focus on arithmetic reasoning and investigate the performance of these models on four group properties: closure, identity, inverse, and associativity. Our findings reveal that LLMs studied in this work struggle to preserve group properties across different test regimes. In the closure test, we observe biases towards specific outputs and an abrupt degradation in their performance from 100% to 0% after a specific sequence length. They also perform poorly in the identity test, which represents adding irrelevant information in the context, and show sensitivity when subjected to inverse test, which examines the robustness of the model with respect to negation. In addition, we demonstrate that breaking down problems into smaller steps helps LLMs in the associativity test that we have conducted. To support these tests we have developed a synthetic dataset which will be released.
The spread of the Coronavirus disease-2019 epidemic has caused many courses and exams to be conducted online. The cheating behavior detection model in examination invigilation systems plays a pivotal role in guaranteeing the equality of long-distance examinations. However, cheating behavior is rare, and most researchers do not comprehensively take into account features such as head posture, gaze angle, body posture, and background information in the task of cheating behavior detection. In this paper, we develop and present CHEESE, a CHEating detection framework via multiplE inStancE learning. The framework consists of a label generator that implements weak supervision and a feature encoder to learn discriminative features. In addition, the framework combines body posture and background features extracted by 3D convolution with eye gaze, head posture and facial features captured by OpenFace 2.0. These features are fed into the spatio-temporal graph module by stitching to analyze the spatio-temporal changes in video clips to detect the cheating behaviors. Our experiments on three datasets, UCF-Crime, ShanghaiTech and Online Exam Proctoring (OEP), prove the effectiveness of our method as compared to the state-of-the-art approaches, and obtain the frame-level AUC score of 87.58% on the OEP dataset.
In this paper, we present a novel sequential team selection model in soccer. Specifically, we model the stochastic process of player injury and unavailability using player-specific information learned from real-world soccer data. Monte-Carlo Tree Search is used to select teams for games that optimise long-term team performance across a soccer season by reasoning over player injury probability. We validate our approach compared to benchmark solutions for the 2018/19 English Premier League season. Our model achieves similar season expected points to the benchmark whilst reducing first-team injuries by ~13% and the money inefficiently spent on injured players by ~11% - demonstrating the potential to reduce costs and improve player welfare in real-world soccer teams.
The popularity of LiDAR devices and sensor technology has gradually empowered users from autonomous driving to forest monitoring, and research on 3D LiDAR has made remarkable progress over the years. Unlike 2D images, whose focused area is visible and rich in texture information, understanding the point distribution can help companies and researchers find better ways to develop point-based 3D applications. In this work, we contribute an unreal-based LiDAR simulation tool and a 3D simulation dataset named LiDAR-Forest, which can be used by various studies to evaluate forest reconstruction, tree DBH estimation, and point cloud compression for easy visualization. The simulation is customizable in tree species, LiDAR types and scene generation, with low cost and high efficiency.
To foster an immersive and natural human-robot interaction, the implementation of tactile perception and feedback becomes imperative, effectively bridging the conventional sensory gap. In this paper, we propose a dual-modal electronic skin (e-skin) that integrates magnetic tactile sensing and vibration feedback for enhanced human-robot interaction. The dual-modal tactile e-skin offers multi-functional tactile sensing and programmable haptic feedback, underpinned by a layered structure comprised of flexible magnetic films, soft silicone, a Hall sensor and actuator array, and a microcontroller unit. The e-skin captures the magnetic field changes caused by subtle deformations through Hall sensors, employing deep learning for accurate tactile perception. Simultaneously, the actuator array generates mechanical vibrations to facilitate haptic feedback, delivering diverse mechanical stimuli. Notably, the dual-modal e-skin is capable of transmitting tactile information bidirectionally, enabling object recognition and fine-weighing operations. This bidirectional tactile interaction framework will enhance the immersion and efficiency of interactions between humans and robots.
This paper presents a range inertial localization algorithm for a 3D prior map. The proposed algorithm tightly couples scan-to-scan and scan-to-map point cloud registration factors along with IMU factors on a sliding window factor graph. The tight coupling of the scan-to-scan and scan-to-map registration factors enables a smooth fusion of sensor ego-motion estimation and map-based trajectory correction that results in robust tracking of the sensor pose under severe point cloud degeneration and defective regions in a map. We also propose an initial sensor state estimation algorithm that robustly estimates the gravity direction and IMU state and helps perform global localization in 3- or 4-DoF for system initialization without prior position information. Experimental results show that the proposed method outperforms existing state-of-the-art methods in extremely severe situations where the point cloud data becomes degenerate, there are momentary sensor interruptions, or the sensor moves along the map boundary or into unmapped regions.
Masked Autoencoder~(MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the semantically irrelevant pre-training information might result in negative transfer, impeding MAE's scalability. To address this issue, we propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE), which can be trained once but provides customized pre-training models for diverse downstream tasks. Different from the mixture of experts (MoE), our MoCE trains each expert only with semantically relevant images by using cluster-conditional gates. Thus, each downstream task can be allocated to its customized model pre-trained with data most similar to the downstream data. Experiments on a collection of 11 downstream tasks show that MoCE outperforms the vanilla MAE by 2.45\% on average. It also obtains new state-of-the-art self-supervised learning results on detection and segmentation.
A robust and efficient optimization-based 2D/3D registration framework is crucial for the navigation system of orthopedic surgical robots. It can provide precise position information of surgical instruments and implants during surgery. While artificial intelligence technology has advanced rapidly in recent years, traditional optimization-based registration methods remain indispensable in the field of 2D/3D registration.he exceptional precision of this method enables it to be considered as a post-processing step of the learning-based methods, thereby offering a reliable assurance for registration. In this paper, we present a coarse-to-fine registration framework based on the CMA-ES algorithm. We conducted intensive testing of our method using data from different parts of the spine. The results shows the effectiveness of the proposed framework on real orthopedic spine surgery clinical data. This work can be viewed as an additional extension that complements the optimization-based methods employed in our previous studies.
Multidimensional scaling (MDS) is the act of embedding proximity information about a set of $n$ objects in $d$-dimensional Euclidean space. As originally conceived by the psychometric community, MDS was concerned with embedding a fixed set of proximities associated with a fixed set of objects. Modern concerns, e.g., that arise in developing asymptotic theories for statistical inference on random graphs, more typically involve studying the limiting behavior of a sequence of proximities associated with an increasing set of objects. Standard results from the theory of point-to-set maps imply that, if $n$ is fixed and a sequence of proximities converges, then the limit of the embedded structures is the embedded structure of the limiting proximities. But what if $n$ increases? It then becomes necessary to reformulate MDS so that the entire sequence of embedding problems can be viewed as a sequence of optimization problems in a fixed space. We present such a reformulation and derive some consequences.
We study the problem of preferential Bayesian optimization (BO), where we aim to optimize a black-box function with only preference feedback over a pair of candidate solutions. Inspired by the likelihood ratio idea, we construct a confidence set of the black-box function using only the preference feedback. An optimistic algorithm with an efficient computational method is then developed to solve the problem, which enjoys an information-theoretic bound on the cumulative regret, a first-of-its-kind for preferential BO. This bound further allows us to design a scheme to report an estimated best solution, with a guaranteed convergence rate. Experimental results on sampled instances from Gaussian processes, standard test functions, and a thermal comfort optimization problem all show that our method stably achieves better or competitive performance as compared to the existing state-of-the-art heuristics, which, however, do not have theoretical guarantees on regret bounds or convergence.