The Traveling Salesman Problem (TSP) is a well-known problem in combinatorial optimization with applications in various domains. However, existing TSP solvers face challenges in producing high-quality solutions with low latency. To address this issue, we propose NAR4TSP, which produces TSP solutions in a Non-Autoregressive (NAR) manner using a specially designed Graph Neural Network (GNN), achieving faster inference speed. Moreover, NAR4TSP is trained using an enhanced Reinforcement Learning (RL) strategy, eliminating the dependency on costly labels used to train conventional supervised learning-based NAR models. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR decoding. The experimental results on both synthetic and real-world TSP instances demonstrate that NAR4TSP outperforms four state-of-the-art models in terms of solution quality, inference latency, and generalization ability. Lastly, we present visualizations of NAR4TSP's decoding process and its overall path planning to showcase the feasibility of implementing NAR4TSP in an end-to-end manner and its effectiveness, respectively.
We study the problem of registration for medical CT images from a novel perspective -- the sensitivity to degree of deformations in CT images. Although some learning-based methods have shown success in terms of average accuracy, their ability to handle regions with local large deformation (LLD) may significantly decrease compared to dealing with regions with minor deformation. This motivates our research into this issue. Two main causes of LLDs are organ motion and changes in tissue structure, with the latter often being a long-term process. In this paper, we propose a novel registration model called Cascade-Dilation Inter-Layer Differential Network (CDIDN), which exhibits both high deformation impedance capability (DIC) and accuracy. CDIDN improves its resilience to LLDs in CT images by enhancing LLDs in the displacement field (DF). It uses a feature-based progressive decomposition of LLDs, blending feature flows of different levels into a main flow in a top-down manner. It leverages Inter-Layer Differential Module (IDM) at each level to locally refine the main flow and globally smooth the feature flow, and also integrates feature velocity fields that can effectively handle feature deformations of various degrees. We assess CDIDN using lungs as representative organs with large deformation. Our findings show that IDM significantly enhances LLDs of the DF, by which improves the DIC and accuracy of the model. Compared with other outstanding learning-based methods, CDIDN exhibits the best DIC and excellent accuracy. Based on vessel enhancement and enhanced LLDs of the DF, we propose a novel method to accurately track the appearance, disappearance, enlargement, and shrinkage of pulmonary lesions, which effectively addresses detection of early lesions and peripheral lung lesions, issues of false enlargement, false shrinkage, and mutilation of lesions.
Implicit neural representation (INR) characterizes the attributes of a signal as a function of corresponding coordinates which emerges as a sharp weapon for solving inverse problems. However, the expressive power of INR is limited by the spectral bias in the network training. In this paper, we find that such a frequency-related problem could be greatly solved by re-arranging the coordinates of the input signal, for which we propose the disorder-invariant implicit neural representation (DINER) by augmenting a hash-table to a traditional INR backbone. Given discrete signals sharing the same histogram of attributes and different arrangement orders, the hash-table could project the coordinates into the same distribution for which the mapped signal can be better modeled using the subsequent INR network, leading to significantly alleviated spectral bias. Furthermore, the expressive power of the DINER is determined by the width of the hash-table. Different width corresponds to different geometrical elements in the attribute space, \textit{e.g.}, 1D curve, 2D curved-plane and 3D curved-volume when the width is set as $1$, $2$ and $3$, respectively. More covered areas of the geometrical elements result in stronger expressive power. Experiments not only reveal the generalization of the DINER for different INR backbones (MLP vs. SIREN) and various tasks (image/video representation, phase retrieval, refractive index recovery, and neural radiance field optimization) but also show the superiority over the state-of-the-art algorithms both in quality and speed. \textit{Project page:} \url{https://ezio77.github.io/DINER-website/}
Fine-grained classification and counting of bone marrow erythroid cells are vital for evaluating the health status and formulating therapeutic schedules for leukemia or hematopathy. Due to the subtle visual differences between different types of erythroid cells, it is challenging to apply existing image-based deep learning models for fine-grained erythroid cell classification. Moreover, there is no large open-source datasets on erythroid cells to support the model training. In this paper, we introduce BMEC (Bone Morrow Erythroid Cells), the first large fine-grained image dataset of erythroid cells, to facilitate more deep learning research on erythroid cells. BMEC contains 5,666 images of individual erythroid cells, each of which is extracted from the bone marrow erythroid cell smears and professionally annotated to one of the four types of erythroid cells. To distinguish the erythroid cells, one key indicator is the cell shape which is closely related to the cell growth and maturation. Therefore, we design a novel shape-aware image classification network for fine-grained erythroid cell classification. The shape feature is extracted from the shape mask image and aggregated to the raw image feature with a shape attention module. With the shape-attended image feature, our network achieved superior classification performance (81.12\% top-1 accuracy) on the BMEC dataset comparing to the baseline methods. Ablation studies also demonstrate the effectiveness of incorporating the shape information for the fine-grained cell classification. To further verify the generalizability of our method, we tested our network on two additional public white blood cells (WBC) datasets and the results show our shape-aware method can generally outperform recent state-of-the-art works on classifying the WBC. The code and BMEC dataset can be found on https://github.com/wangye8899/BMEC.
Implicit neural representation (INR) characterizes the attributes of a signal as a function of corresponding coordinates which emerges as a sharp weapon for solving inverse problems. However, the capacity of INR is limited by the spectral bias in the network training. In this paper, we find that such a frequency-related problem could be largely solved by re-arranging the coordinates of the input signal, for which we propose the disorder-invariant implicit neural representation (DINER) by augmenting a hash-table to a traditional INR backbone. Given discrete signals sharing the same histogram of attributes and different arrangement orders, the hash-table could project the coordinates into the same distribution for which the mapped signal can be better modeled using the subsequent INR network, leading to significantly alleviated spectral bias. Experiments not only reveal the generalization of the DINER for different INR backbones (MLP vs. SIREN) and various tasks (image/video representation, phase retrieval, and refractive index recovery) but also show the superiority over the state-of-the-art algorithms both in quality and speed.
In contrast to humans and animals who naturally execute seamless motions, learning and smoothly executing sequences of actions remains a challenge in robotics. This paper introduces a novel skill-agnostic framework that learns to sequence and blend skills based on differentiable optimization. Our approach encodes sequences of previously-defined skills as quadratic programs (QP), whose parameters determine the relative importance of skills along the task. Seamless skill sequences are then learned from demonstrations by exploiting differentiable optimization layers and a tailored loss formulated from the QP optimality conditions. Via the use of differentiable optimization, our work offers novel perspectives on multitask control. We validate our approach in a pick-and-place scenario with planar robots, a pouring experiment with a real humanoid robot, and a bimanual sweeping task with a human model.
Achieving realistic, vivid, and human-like synthesized conversational gestures conditioned on multi-modal data is still an unsolved problem, due to the lack of available datasets, models and standard evaluation metrics. To address this, we build Body-Expression-Audio-Text dataset, BEAT, which has i) 76 hours, high-quality, multi-modal data captured from 30 speakers talking with eight different emotions and in four different languages, ii) 32 millions frame-level emotion and semantic relevance annotations.Our statistical analysis on BEAT demonstrates the correlation of conversational gestures with facial expressions, emotions, and semantics, in addition to the known correlation with audio, text, and speaker identity. Qualitative and quantitative experiments demonstrate metrics' validness, ground truth data quality, and baseline's state-of-the-art performance. To the best of our knowledge, BEAT is the largest motion capture dataset for investigating the human gestures, which may contribute to a number of different research fields including controllable gesture synthesis, cross-modality analysis, emotional gesture recognition. The data, code and model will be released for research.
We construct a contextual network to represent a document with syntactic and semantic relations between word-sentence pairs, based on which we devise an unsupervised algorithm called CNATAR (Contextual Network And Text Analysis Rank) to score sentences, and rank them through a bi-objective 0-1 knapsack maximization problem over topic analysis and sentence scores. We show that CNATAR outperforms the combined ranking of the three human judges provided on the SummBank dataset under both ROUGE and BLEU metrics, which in term significantly outperforms each individual judge's ranking. Moreover, CNATAR produces so far the highest ROUGE scores over DUC-02, and outperforms previous supervised algorithms on the CNN/DailyMail and NYT datasets. We also compare the performance of CNATAR and the latest supervised neural-network summarization models and compute oracle results.