Abstract: Deep Research (DR) requires LLM agents to autonomously perform multi-step information seeking, processing, and reasoning to generate comprehensive reports. In contrast to existing studies that mainly focus on unstructured web content, a more challenging DR task should additionally utilize structured knowledge to provide a solid data foundation, facilitate quantitative computation, and lead to in-depth analyses. In this paper, we refer to this novel task as Knowledgeable Deep Research (KDR), which requires DR agents to generate reports with both structured and unstructured knowledge. Furthermore, we propose the Hybrid Knowledge Analysis framework (HKA), a multi-agent architecture that reasons over both kinds of knowledge and integrates text, figures, and tables into coherent multimodal reports. The key design is the Structured Knowledge Analyzer, which utilizes both coding and vision-language models to produce figures, tables, and corresponding insights. To support systematic evaluation, we construct KDR-Bench, which covers 9 domains, includes 41 expert-level questions, and incorporates a large number of structured knowledge resources (e.g., 1,252 tables). We further annotate the main conclusions and key points for each question and propose three categories of evaluation metrics: general-purpose, knowledge-centric, and vision-enhanced. Experimental results demonstrate that HKA consistently outperforms most existing DR agents on general-purpose and knowledge-centric metrics, and even surpasses the Gemini DR agent on vision-enhanced metrics, highlighting its effectiveness in deep, structure-aware knowledge analysis. Finally, we hope this work can serve as a new foundation for structured knowledge analysis in DR agents and facilitate future multimodal DR studies.
Abstract: Vehicle color recognition plays an important role in intelligent traffic management and in assisting criminal investigations. However, existing vehicle color recognition research covers at most 13 color classes and achieves low recognition accuracy, which makes it difficult to meet the needs of practical applications. To this end, this paper builds a benchmark dataset, Vehicle Color-24, which covers 24 vehicle colors and contains 10,091 vehicle images taken from 100 hours of urban road surveillance video. In addition, to address the long-tail distribution of the Vehicle Color-24 dataset and the low recognition rate of existing methods, this paper proposes a Smooth Modulated Neural Network with Multi-layer Feature Representation (SMNN-MFR) for recognizing the 24 vehicle colors. SMNN-MFR consists of four parts: feature extraction, multi-scale feature fusion, proposal (suggestion frame) generation, and smooth modulation. The model is trained and validated on the Vehicle Color-24 benchmark dataset. Comprehensive experiments show that the algorithm achieves an average recognition accuracy of 94.96% on the 24-color benchmark, which is 33.47% higher than that of the Faster RCNN network. In addition, the model achieves an average accuracy of 97.25% when recognizing 8 colors, improving on the detection accuracy of algorithms on similar datasets. Visualization and ablation experiments further support the rationality of the network settings and the effectiveness of each module. The code and dataset are published at: https://github.com/mendy-2013.