Abstract: Large language models (LLMs) are increasingly used to complete complex tasks by selecting and coordinating external tools across multiple steps. This requires aligning tool choices with subtask intent while satisfying the directed execution dependencies among tools. Existing methods model these dependencies as tool graphs and integrate the graphs into LLMs through retrieval, serialization, or prompt-level injection. However, these external graph-use strategies all follow a matching paradigm, which often fails to align tool choices with the underlying subtask structure and produces semantically plausible plans that violate graph constraints. The problem is compounded by error accumulation: an early incorrect tool selection moves the plan into an invalid graph state, and subsequent predictions drift away from the valid execution path. To address these challenges, we propose GRAFT, a graph-tokenized language model framework for dependency-aware tool planning. GRAFT internalizes the tool graph by mapping each tool node to a dedicated special token and learning directed tool dependencies within the representation space. It further introduces on-policy tool context distillation, which trains the model on its own sampled trajectories while distilling stepwise planning signals. Experiments show that GRAFT achieves state-of-the-art performance in exact sequence matching and dependency legality, supporting more reliable LLM tool planning in complex workflows.
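To make the notion of "dependency legality" concrete, the following is a minimal sketch of checking a predicted tool sequence against a directed tool graph. It is not GRAFT itself. The tool names, the `TOOL_GRAPH` adjacency map, and the rule that every consecutive pair of tools must be a directed edge are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical tool graph: an edge u -> v means tool v may directly follow tool u.
TOOL_GRAPH: Dict[str, Set[str]] = {
    "search_flights": {"select_flight"},
    "select_flight": {"book_flight"},
    "book_flight": {"send_receipt"},
    "send_receipt": set(),
}


def first_violation(plan: List[str], graph: Dict[str, Set[str]]) -> Optional[int]:
    """Return the index of the first step that breaks the graph, or None if legal."""
    for i, tool in enumerate(plan):
        # A tool that is not a node of the graph is always a violation.
        if tool not in graph:
            return i
        # From the second step on, the previous tool must point to this one.
        if i > 0 and tool not in graph[plan[i - 1]]:
            return i
    return None


def is_legal(plan: List[str], graph: Dict[str, Set[str]]) -> bool:
    """A plan is legal (under the assumed rule) if it has no violating step."""
    return first_violation(plan, graph) is None
```

Reporting the index of the first violating step, rather than only a boolean, shows where a plan leaves the valid execution path, which is the point at which the error accumulation described above would begin.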
Abstract: This research presents a novel active detection model that uses deep reinforcement learning to detect traffic objects in real-world scenarios. The model employs a deep Q-network built on an LSTM-CNN architecture that identifies target zones and aligns them with specific categories of traffic objects, following a top-down approach with efficient feature extraction from the environment. It integrates historical and current actions and observations when making each decision. The state space and reward function are designed to account for the number of time steps, so that the model completes the task in fewer steps. Experiments show that the model locates traffic signal lights and speed limit signs with high precision. These findings highlight the potential of deep reinforcement learning-based active detection in traffic-related applications.
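The abstract does not specify the reward function, so the following is only an illustrative sketch of one way a reward can take time steps into account. It rewards any improvement in overlap (IoU) with the target box and subtracts a small per-step penalty, which favors shorter episodes. The function names, the IoU-based shaping, and the constants `step_penalty`, `success_bonus`, and `iou_threshold` are assumptions, not the paper's design.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def step_reward(
    prev_box: Box,
    new_box: Box,
    target: Box,
    step_penalty: float = 0.05,   # assumed cost charged for every action
    success_bonus: float = 1.0,   # assumed bonus once the target is localized
    iou_threshold: float = 0.5,   # assumed localization criterion
) -> float:
    """Reward IoU improvement, charge a fixed cost per step, add a bonus on success."""
    new_iou = iou(new_box, target)
    reward = (new_iou - iou(prev_box, target)) - step_penalty
    if new_iou >= iou_threshold:
        reward += success_bonus
    return reward
```

Under these assumptions, an action that leaves the box unchanged yields a negative reward, so an agent maximizing return is pushed to reach the target in as few steps as possible.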