Celine
Abstract:Recent advances in reasoning-centric language models have highlighted reinforcement learning (RL) as a promising method for aligning models with verifiable rewards. However, it remains contentious whether RL truly expands a model's reasoning capabilities or merely amplifies high-reward outputs already latent in the base model's distribution, and whether continually scaling up RL compute reliably leads to improved reasoning performance. In this work, we challenge prevailing assumptions by demonstrating that prolonged RL (ProRL) training can uncover novel reasoning strategies that are inaccessible to base models, even under extensive sampling. We introduce ProRL, a novel training methodology that incorporates KL divergence control, reference policy resetting, and a diverse suite of tasks. Our empirical analysis reveals that RL-trained models consistently outperform base models across a wide range of pass@k evaluations, including scenarios where base models fail entirely regardless of the number of attempts. We further show that reasoning boundary improvements correlates strongly with task competence of base model and training duration, suggesting that RL can explore and populate new regions of solution space over time. These findings offer new insights into the conditions under which RL meaningfully expands reasoning boundaries in language models and establish a foundation for future work on long-horizon RL for reasoning. We release model weights to support further research: https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B
Abstract:In cluttered spaces, such as forests, drone picking up a payload via an abseil claw is an open challenge, as the cable is likely tangled and blocked by the branches and obstacles. To address such a challenge, in this work, a cooperative aerial system is proposed, which consists of a payload drone and a dexterous rappelling end droid. The two ends are linked via a Kevlar tether cable. The end droid is actuated by four propellers, which enable mid-air dexterous adjustment of clawing angle and guidance of cable movement. To avoid tanglement and rappelling obstacles, a trajectory optimization method that integrates cable length constraints and dynamic feasibility is developed, which guarantees safe pickup. A tether cable dynamic model is established to evaluate real-time cable status, considering both taut and sagging conditions. Simulation and real-world experiments are conducted to demonstrate that the proposed system is capable of picking up payload in cluttered spaces. As a result, the end droid can reach the target point successfully under cable constraints and achieve passive retrieval during the lifting phase without propulsion, which enables effective and efficient aerial manipulation.
Abstract:Traditional Chinese medicine, as an essential component of traditional medicine, contains active ingredients that serve as a crucial source for modern drug development, holding immense therapeutic potential and development value. A multi-layered and complex network is formed from Chinese medicine to diseases and used to predict the potential associations between Chinese medicine ingredients and diseases. This study proposes an ingredient-disease association prediction model (Node2Vec-DGI-EL) based on hierarchical graph representation learning. First, the model uses the Node2Vec algorithm to extract node embedding vectors from the network as the initial features of the nodes. Next, the network nodes are deeply represented and learned using the DGI algorithm to enhance the model's expressive power. To improve prediction accuracy and robustness, an ensemble learning method is incorporated to achieve more accurate ingredient-disease association predictions. The effectiveness of the model is then evaluated through a series of theoretical verifications. The results demonstrated that the proposed model significantly outperformed existing methods, achieving an AUC of 0.9987 and an AUPR of 0.9545, thereby indicating superior predictive capability. Ablation experiments further revealed the contribution and importance of each module. Additionally, case studies explored potential associations, such as triptonide with hypertensive retinopathy and methyl ursolate with colorectal cancer. Molecular docking experiments validated these findings, showing the triptonide-PGR interaction and the methyl ursolate-NFE2L2 interaction can bind stable. In conclusion, the Node2Vec-DGI-EL model focuses on TCM datasets and effectively predicts ingredient-disease associations, overcoming the reliance on node semantic information.
Abstract:State space models (SSMs) have emerged as an efficient alternative to Transformer models for language modeling, offering linear computational complexity and constant memory usage as context length increases. However, despite their efficiency in handling long contexts, recent studies have shown that SSMs, such as Mamba models, generally underperform compared to Transformers in long-context understanding tasks. To address this significant shortfall and achieve both efficient and accurate long-context understanding, we propose LongMamba, a training-free technique that significantly enhances the long-context capabilities of Mamba models. LongMamba builds on our discovery that the hidden channels in Mamba can be categorized into local and global channels based on their receptive field lengths, with global channels primarily responsible for long-context capability. These global channels can become the key bottleneck as the input context lengthens. Specifically, when input lengths largely exceed the training sequence length, global channels exhibit limitations in adaptively extend their receptive fields, leading to Mamba's poor long-context performance. The key idea of LongMamba is to mitigate the hidden state memory decay in these global channels by preventing the accumulation of unimportant tokens in their memory. This is achieved by first identifying critical tokens in the global channels and then applying token filtering to accumulate only those critical tokens. Through extensive benchmarking across synthetic and real-world long-context scenarios, LongMamba sets a new standard for Mamba's long-context performance, significantly extending its operational range without requiring additional training. Our code is available at https://github.com/GATECH-EIC/LongMamba.
Abstract:Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like Common Crawl do not include explicit domain labels, while manually curating labeled datasets such as The Pile is labor-intensive. Consequently, identifying an optimal pre-training data mixture remains a challenging problem, despite its significant benefits for pre-training performance. To address these challenges, we propose CLustering-based Iterative Data Mixture Bootstrapping (CLIMB), an automated framework that discovers, evaluates, and refines data mixtures in a pre-training setting. Specifically, CLIMB embeds and clusters large-scale datasets in a semantic space and then iteratively searches for optimal mixtures using a smaller proxy model and a predictor. When continuously trained on 400B tokens with this mixture, our 1B model exceeds the state-of-the-art Llama-3.2-1B by 2.0%. Moreover, we observe that optimizing for a specific domain (e.g., Social Sciences) yields a 5% improvement over random sampling. Finally, we introduce ClimbLab, a filtered 1.2-trillion-token corpus with 20 clusters as a research playground, and ClimbMix, a compact yet powerful 400-billion-token dataset designed for efficient pre-training that delivers superior performance under an equal token budget. We analyze the final data mixture, elucidating the characteristics of an optimal data mixture. Our data is available at: https://research.nvidia.com/labs/lpr/climb/
Abstract:In this paper, we characterize the adaptive multiple path loss exponent (AMPLE) radio propagation model under urban macrocell (UMa) and urban microcell (UMi) scenarios from 0.85-5 GHz using Ranplan Professional. We first enhance the original AMPLE model by introducing an additional frequency coefficient to support path loss prediction across multiple carrier frequencies. By using measurement-validated Ranplan Professional simulator, we simulate four cities and validate the simulations for further path loss model characterization. Specifically, we extract the close-in (CI) model parameters from the simulations and compare them with parameters extracted from measurements in other works. Under the ray-based model characterization, we compare the AMPLE model with the 3rd Generation Partnership Project (3GPP) path loss model, the CI model, the alpha-beta-gamma (ABG) model, and those with simulation calibrations. In addition to standard performance metrics, we introduce the prediction-measurement difference error (PMDE) to assess overall prediction alignment with measurement, and mean simulation time per data point to evaluate model complexity. The results show that the AMPLE model outperforms existing models while maintaining similar model complexity.
Abstract:In recent years, miniature wall-climbing robots have attracted widespread attention due to their significant potential in equipment inspection and in-situ repair applications. Traditional wall-climbing systems typically rely on electromagnetic, electrostatic, vacuum suction, or van der Waals forces for controllable adhesion. However, these conventional methods impose limitations when striving for both a compact design and high-speed mobility. This paper proposes a novel Vibration-Based Adhesion (VBA) technique, which utilizes a flexible disk vibrating near a surface to generate a strong and controllable attractive force without direct contact. By employing an electric motor as the vibration source, the constructed VBA system was experimentally evaluated, achieving an adhesion-to-weight ratio exceeding 51 times. The experimental results demonstrate that this adhesion mechanism not only provides a high normal force but also maintains minimal shear force, making it particularly suitable for high-speed movement and heavy load applications in miniature wall-climbing robots.
Abstract:Repositioning drug-disease relationships has always been a hot field of research. However, actual cases of biologically validated drug relocation remain very limited, and existing models have not yet fully utilized the structural information of the drug. Furthermore, most repositioning models are only used to complete the relationship matrix, and their practicality is poor when dealing with drug cold start problems. This paper proposes a structure-enhanced multimodal relationship prediction model (SMRP). SMPR is based on the SMILE structure of the drug, using the Mol2VEC method to generate drug embedded representations, and learn disease embedded representations through heterogeneous network graph neural networks. Ultimately, a drug-disease relationship matrix is constructed. In addition, to reduce the difficulty of users' use, SMPR also provides a cold start interface based on structural similarity based on reposition results to simply and quickly predict drug-related diseases. The repositioning ability and cold start capability of the model are verified from multiple perspectives. While the AUC and ACUPR scores of repositioning reach 99% and 61% respectively, the AUC of cold start achieve 80%. In particular, the cold start Recall indicator can reach more than 70%, which means that SMPR is more sensitive to positive samples. Finally, case analysis is used to verify the practical value of the model and visual analysis directly demonstrates the improvement of the structure to the model. For quick use, we also provide local deployment of the model and package it into an executable program.
Abstract:Generative retrieval constitutes an innovative approach in information retrieval, leveraging generative language models (LM) to generate a ranked list of document identifiers (docid) for a given query. It simplifies the retrieval pipeline by replacing the large external index with model parameters. However, existing works merely learned the relationship between queries and document identifiers, which is unable to directly represent the relevance between queries and documents. To address the above problem, we propose a novel and general generative retrieval framework, namely Leveraging Document-Oriented Contrastive Learning in Generative Retrieval (DOGR), which leverages contrastive learning to improve generative retrieval tasks. It adopts a two-stage learning strategy that captures the relationship between queries and documents comprehensively through direct interactions. Furthermore, negative sampling methods and corresponding contrastive learning objectives are implemented to enhance the learning of semantic representations, thereby promoting a thorough comprehension of the relationship between queries and documents. Experimental results demonstrate that DOGR achieves state-of-the-art performance compared to existing generative retrieval methods on two public benchmark datasets. Further experiments have shown that our framework is generally effective for common identifier construction techniques.
Abstract:Repetitive action counting (RAC) aims to estimate the number of class-agnostic action occurrences in a video without exemplars. Most current RAC methods rely on a raw frame-to-frame similarity representation for period prediction. However, this approach can be significantly disrupted by common noise such as action interruptions and inconsistencies, leading to sub-optimal counting performance in realistic scenarios. In this paper, we introduce a foreground localization optimization objective into similarity representation learning to obtain more robust and efficient video features. We propose a Localization-Aware Multi-Scale Representation Learning (LMRL) framework. Specifically, we apply a Multi-Scale Period-Aware Representation (MPR) with a scale-specific design to accommodate various action frequencies and learn more flexible temporal correlations. Furthermore, we introduce the Repetition Foreground Localization (RFL) method, which enhances the representation by coarsely identifying periodic actions and incorporating global semantic information. These two modules can be jointly optimized, resulting in a more discerning periodic action representation. Our approach significantly reduces the impact of noise, thereby improving counting accuracy. Additionally, the framework is designed to be scalable and adaptable to different types of video content. Experimental results on the RepCountA and UCFRep datasets demonstrate that our proposed method effectively handles repetitive action counting.