Abstract:We introduce SecCodeBench-V2, a publicly released benchmark for evaluating Large Language Model (LLM) copilots' capabilities of generating secure code. SecCodeBench-V2 comprises 98 generation and fix scenarios derived from Alibaba Group's industrial productions, where the underlying security issues span 22 common CWE (Common Weakness Enumeration) categories across five programming languages: Java, C, Python, Go, and Node.js. SecCodeBench-V2 adopts a function-level task formulation: each scenario provides a complete project scaffold and requires the model to implement or patch a designated target function under fixed interfaces and dependencies. For each scenario, SecCodeBench-V2 provides executable proof-of-concept (PoC) test cases for both functional validation and security verification. All test cases are authored and double-reviewed by security experts, ensuring high fidelity, broad coverage, and reliable ground truth. Beyond the benchmark itself, we build a unified evaluation pipeline that assesses models primarily via dynamic execution. For most scenarios, we compile and run model-generated artifacts in isolated environments and execute PoC test cases to validate both functional correctness and security properties. For scenarios where security issues cannot be adjudicated with deterministic test cases, we additionally employ an LLM-as-a-judge oracle. To summarize performance across heterogeneous scenarios and difficulty levels, we design a Pass@K-based scoring protocol with principled aggregation over scenarios and severity, enabling holistic and comparable evaluation across models. Overall, SecCodeBench-V2 provides a rigorous and reproducible foundation for assessing the security posture of AI coding assistants, with results and artifacts released at https://alibaba.github.io/sec-code-bench. The benchmark is publicly available at https://github.com/alibaba/sec-code-bench.
Abstract:The maturation of Large Audio Language Models (LALMs) has raised growing expectations for them to comprehend complex audio much like humans. Current efforts primarily replicate text-based reasoning by contextualizing audio content through a one-time encoding, which introduces a critical information bottleneck. Drawing inspiration from human cognition, we propose audio-interleaved reasoning to break through this bottleneck. It treats audio as an active reasoning component, enabling sustained audio engagement and perception-grounded analysis. To instantiate it, we introduce a two-stage training framework, first teaching LALMs to localize salient audio segments through supervised fine-tuning, and then incentivizing proficient re-listening via reinforcement learning. In parallel, a structured data generation pipeline is developed to produce high-quality training data. Consequently, we present Echo, a LALM capable of dynamically re-listening to audio in demand during reasoning. On audio comprehension benchmarks, Echo achieves overall superiority in both challenging expert-level and general-purpose tasks. Comprehensive analysis further confirms the efficiency and generalizability of audio-interleaved reasoning, establishing it as a promising direction for advancing audio comprehension. Project page: https://github.com/wdqqdw/Echo.




Abstract:A new application for real-time monitoring of the lack of movement in older adults' own homes is proposed, aiming to support people's lives and independence in their later years. A lightweight camera monitoring system, based on an RGB-D camera and a compact computer processor, was developed and piloted in community homes to observe the daily behavior of older adults. Instances of body inactivity were detected in everyday scenarios anonymously and unobtrusively. These events can be explained at a higher level, such as a loss of consciousness or physiological deterioration. The accuracy of the inactivity monitoring system is assessed, and statistics of inactivity events related to the daily behavior of the older adults are provided. The results demonstrate that our method performs accurately in inactivity detection across various environments, including low room lighting, TV flickering, and different camera views.




Abstract:Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches are categorized into feature imputation and label prediction and are primarily focused on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore encounter three main shortcomings in imputation, including the need for different imputation methods for various missing data mechanisms, heavy dependence on the assumption of data distribution, and potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, where the ML model learns the similarity between an incomplete sample and its complete counterpart and the dissimilarity between other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study offers a valuable contribution to addressing the challenges associated with ML modeling in the presence of missing data by providing a practical solution that achieves high predictive accuracy and model interpretability.



Abstract:The study of tunnel failure characteristics under the load of external explosion source is an important problem in tunnel design and protection, in particular, it is of great significance to construct an intelligent topological feature description of the tunnel failure process. The failure characteristics of tunnels under explosive loading are described by using discrete element method and persistent homology-based machine learning. Firstly, the discrete element model of shallow buried tunnel was established in the discrete element software, and the explosive load was equivalent to a series of uniformly distributed loads acting on the surface by Saint-Venant principle, and the dynamic response of the tunnel under multiple explosive loads was obtained through iterative calculation. The topological characteristics of surrounding rock is studied by persistent homology-based machine learning. The geometric, physical and interunit characteristics of the tunnel subjected to explosive loading are extracted, and the nonlinear mapping relationship between the topological quantity of persistent homology, and the failure characteristics of the surrounding rock is established, and the results of the intelligent description of the failure characteristics of the tunnel are obtained. The research shows that the length of the longest Betty 1 bar code is closely related to the stability of the tunnel, which can be used for effective early warning of the tunnel failure, and an intelligent description of the tunnel failure process can be established to provide a new idea for tunnel engineering protection.



Abstract:With respect to machine operation tasks, the experiences from different skill level operators, especially novices, can provide worthy understanding about the manner in which they perceive the operational environment and formulate knowledge to deal with various operation situations. In this study, we describe the operator's behaviors by utilizing the relations among their head, hand, and operation location (hotspot) during the operation. A total of 40 experiences associated with a sewing machine operation task performed by amateur operators was recorded via a head-mounted RGB-D camera. We examined important features of operational behaviors in different skill level operators and confirmed their correlation to the difficulties of the operation steps. The result shows that the pure-gazing behavior is significantly reduced when the operator's skill improved. Moreover, the hand-approaching duration and the frequency of attention movement before operation are strongly correlated to the operational difficulty in such machine operating environments.