Generalization is at the core of machine learning models. However, the definition of generalization is not entirely clear. We employ set theory to introduce the concepts of algorithms, hypotheses, and dataset generalization. We analyze the properties of dataset generalization and prove a theorem on surrogate generalization procedures. This theorem leads to our generalization method. Through a generalization experiment on the MNIST dataset, we obtain 13,541 sample bases. When we use the entire training set to evaluate the model's performance, the models achieve an accuracy of 99.945%. However, if we shift the sample bases or modify the neural network structure, the performance experiences a significant decline. We also identify consistently mispredicted samples and find that they are all challenging examples. The experiments substantiated the accuracy of the generalization definition and the effectiveness of the proposed methods. Both the set-theoretic deduction and the experiments help us better understand generalization.
Tool use is a hallmark of advanced intelligence, exemplified in both animal behavior and robotic capabilities. This paper investigates the feasibility of imbuing robots with the ability to creatively use tools in tasks that involve implicit physical constraints and long-term planning. Leveraging Large Language Models (LLMs), we develop RoboTool, a system that accepts natural language instructions and outputs executable code for controlling robots in both simulated and real-world environments. RoboTool incorporates four pivotal components: (i) an "Analyzer" that interprets natural language to discern key task-related concepts, (ii) a "Planner" that generates comprehensive strategies based on the language input and key concepts, (iii) a "Calculator" that computes parameters for each skill, and (iv) a "Coder" that translates these plans into executable Python code. Our results show that RoboTool can not only comprehend explicit or implicit physical constraints and environmental factors but also demonstrate creative tool use. Unlike traditional Task and Motion Planning (TAMP) methods that rely on explicit optimization, our LLM-based system offers a more flexible, efficient, and user-friendly solution for complex robotics tasks. Through extensive experiments, we validate that RoboTool is proficient in handling tasks that would otherwise be infeasible without the creative use of tools, thereby expanding the capabilities of robotic systems. Demos are available on our project page: https://creative-robotool.github.io/.
Pan-sharpening algorithm utilizes panchromatic image and multispectral image to obtain a high spatial and high spectral image. However, the optimizations of the algorithms are designed with different standards. We adopt the simple matrix equation to describe the Pan-sharpening problem. The solution existence condition and the acquirement of spectral and spatial resolution are discussed. A down-sampling enhancement method was introduced for better acquiring the spatial and spectral down-sample matrices. By the generalized inverse theory, we derived two forms of general inverse matrix formulations that can correspond to the two prominent classes of Pan-sharpening methods, that is, component substitution and multi-resolution analysis methods. Specifically, the Gram Schmidt Adaptive(GSA) was proved to follow the general inverse matrix formulation of component substitution. A model prior to the general inverse matrix of the spectral function was rendered. The theoretical errors are analyzed. Synthetic experiments and real data experiments are implemented. The proposed methods are better and sharper than other methods qualitatively in both synthetic and real experiments. The down-sample enhancement effect is shown of better results both quantitatively and qualitatively in real experiments. The generalized inverse matrix theory help us better understand the Pan-sharpening.
Training control policies in simulation is more appealing than on real robots directly, as it allows for exploring diverse states in a safe and efficient manner. Yet, robot simulators inevitably exhibit disparities from the real world, yielding inaccuracies that manifest as the simulation-to-real gap. Existing literature has proposed to close this gap by actively modifying specific simulator parameters to align the simulated data with real-world observations. However, the set of tunable parameters is usually manually selected to reduce the search space in a case-by-case manner, which is hard to scale up for complex systems and requires extensive domain knowledge. To address the scalability issue and automate the parameter-tuning process, we introduce an approach that aligns the simulator with the real world by discovering the causal relationship between the environment parameters and the sim-to-real gap. Concretely, our method learns a differentiable mapping from the environment parameters to the differences between simulated and real-world robot-object trajectories. This mapping is governed by a simultaneously-learned causal graph to help prune the search space of parameters, provide better interpretability, and improve generalization. We perform experiments to achieve both sim-to-sim and sim-to-real transfer, and show that our method has significant improvements in trajectory alignment and task success rate over strong baselines in a challenging manipulation task.
Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest in automated ECG interpretation using machine learning, most current studies focus solely on classification or regression tasks and overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data. Also, since interpreting ECG as images are more affordable and accessible, we process ECG as encoded images and adopt a vision-language learning paradigm to jointly learn vision-language alignment between encoded ECG images and ECG diagnosis reports. Encoding ECG into images can result in an efficient ECG retrieval system, which will be highly practical and useful in clinical applications. More importantly, our findings could serve as a crucial resource for providing diagnostic services in regions where only paper-printed ECG images are accessible due to past underdevelopment.
Relative radiometric normalization(RRN) of different satellite images of the same terrain is necessary for change detection, object classification/segmentation, and map-making tasks. However, traditional RRN models are not robust, disturbing by object change, and RRN models precisely considering object change can not robustly obtain the no-change set. This paper proposes auto robust relative radiometric normalization methods via latent change noise modeling. They utilize the prior knowledge that no change points possess small-scale noise under relative radiometric normalization and that change points possess large-scale radiometric noise after radiometric normalization, combining the stochastic expectation maximization method to quickly and robustly extract the no-change set to learn the relative radiometric normalization mapping functions. This makes our model theoretically grounded regarding the probabilistic theory and mathematics deduction. Specifically, when we select histogram matching as the relative radiometric normalization learning scheme integrating with the mixture of Gaussian noise(HM-RRN-MoG), the HM-RRN-MoG model achieves the best performance. Our model possesses the ability to robustly against clouds/fogs/changes. Our method naturally generates a robust evaluation indicator for RRN that is the no-change set root mean square error. We apply the HM-RRN-MoG model to the latter vegetation/water change detection task, which reduces the radiometric contrast and NDVI/NDWI differences on the no-change set, generates consistent and comparable results. We utilize the no-change set into the building change detection task, efficiently reducing the pseudo-change and boosting the precision.
Relative radiometric normalization (RRN) mosaicking among multiple remote sensing images is crucial for the downstream tasks, including map-making, image recognition, semantic segmentation, and change detection. However, there are often seam lines on the mosaic boundary and radiometric contrast left, especially in complex scenarios, making the appearance of mosaic images unsightly and reducing the accuracy of the latter classification/recognition algorithms. This paper renders a novel automatical approach to eliminate seam lines in complex RRN mosaicking scenarios. It utilizes the histogram matching on the overlap area to alleviate radiometric contrast, Poisson editing to remove the seam lines, and merging procedure to determine the normalization transfer order. Our method can handle the mosaicking seam lines with arbitrary shapes and images with extreme topological relationships (with a small intersection area). These conditions make the main feathering or blending methods, e.g., linear weighted blending and Laplacian pyramid blending, unavailable. In the experiment, our approach visually surpasses the automatic methods without Poisson editing and the manual blurring and feathering method using GIMP software.
In the field of machine learning, it is still a critical issue to identify and supervise the learned representation without manually intervention or intuition assistance to extract useful knowledge or serve for the latter tasks in machine learning. In this work, we focus on supervising the influential factors extracted by the variational autoencoder(VAE). The VAE is proposed to learn independent low dimension representation while facing the problem that sometimes pre-set factors are ignored. We argue that the mutual information of the input and each learned factor of the representation plays a necessary indicator. We find the VAE objective inclines to induce mutual information sparsity in factor dimension over the data intrinsic dimension and results in some non-influential factors whose function on data reconstruction could be ignored. We show mutual information also influences the lower bound of VAE's reconstruction error and latter classification task. To make such indicator applicable, we design an algorithm on calculating the mutual information for VAE and prove its consistency. Experimental results on Mnist, CelebA and Deap datasets show that mutual information can help determine influential factors, of which some are interpretable and can be used to further generation and classification tasks, and help discover the variant that connects with emotion on Deap dataset.
By simulating the easy-to-hard learning manners of humans/animals, the learning regimes called curriculum learning~(CL) and self-paced learning~(SPL) have been recently investigated and invoked broad interests. However, the intrinsic mechanism for analyzing why such learning regimes can work has not been comprehensively investigated. To this issue, this paper proposes a concave conjugacy theory for looking into the insight of CL/SPL. Specifically, by using this theory, we prove the equivalence of the SPL regime and a latent concave objective, which is closely related to the known non-convex regularized penalty widely used in statistics and machine learning. Beyond the previous theory for explaining CL/SPL insights, this new theoretical framework on one hand facilitates two direct approaches for designing new SPL models for certain tasks, and on the other hand can help conduct the latent objective of self-paced curriculum learning, which is the advanced version of both CL/SPL and possess advantages of both learning regimes to a certain extent. This further facilitates a theoretical understanding for SPCL, instead of only CL/SPL as conventional. Under this theory, we attempt to attain intrinsic latent objectives of two curriculum forms, the partial order and group curriculums, which easily follow the theoretical understanding of the corresponding SPCL regimes.
Self-paced learning (SPL) is a new methodology that simulates the learning principle of humans/animals to start learning easier aspects of a learning task, and then gradually take more complex examples into training. This new-coming learning regime has been empirically substantiated to be effective in various computer vision and pattern recognition tasks. Recently, it has been proved that the SPL regime has a close relationship to a implicit self-paced objective function. While this implicit objective could provide helpful interpretations to the effectiveness, especially the robustness, insights under the SPL paradigms, there are still no theoretical results strictly proved to verify such relationship. To this issue, in this paper, we provide some convergence results on this implicit objective of SPL. Specifically, we prove that the learning process of SPL always converges to critical points of this implicit objective under some mild conditions. This result verifies the intrinsic relationship between SPL and this implicit objective, and makes the previous robustness analysis on SPL complete and theoretically rational.