Tao Dong

From Correctness to Collaboration: Toward a Human-Centered Framework for Evaluating AI Agent Behavior in Software Engineering

Dec 29, 2025

Tao Dong, Harini Sampath, Ja Young Lee, Sherry Y. Shi, Andrew Macvean

Abstract: As Large Language Models (LLMs) evolve from code generators into collaborative partners for software engineers, our methods for evaluation are lagging. Current benchmarks, focused on code correctness, fail to capture the nuanced, interactive behaviors essential for successful human-AI partnership. To bridge this evaluation gap, this paper makes two core contributions. First, we present a foundational taxonomy of desirable agent behaviors for enterprise software engineering, derived from an analysis of 91 sets of user-defined agent rules. This taxonomy defines four key expectations of agent behavior: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solving Problems Effectively, and Collaborating with the User. Second, recognizing that these expectations are not static, we introduce the Context-Adaptive Behavior (CAB) Framework. This emerging framework reveals how behavioral expectations shift along two empirically derived axes: the Time Horizon (from immediate needs to future ideals), established through interviews with 15 expert engineers, and the Type of Work (from enterprise production to rapid prototyping, for example), identified through a prompt analysis of a prototyping agent. Together, these contributions offer a human-centered foundation for designing and evaluating the next generation of AI agents, moving the field's focus from the correctness of generated code toward the dynamics of true collaborative intelligence.

Multi Objective Resource Optimization of Wireless Network Based on Cross Domain Virtual Network Embedding

Feb 03, 2022

Chao Wang, Tao Dong, Youxiang Duan, Qifeng Sun, Peiying Zhang

Abstract: The rapid development of virtual network architectures has made widespread use of wireless networks possible. With the growing popularity of the artificial intelligence (AI) industry in daily life, efficient allocation of wireless network resources has become a problem. In particular, network users who request wireless network resources from different management domains still face many practical problems. From the perspective of virtual network embedding (VNE), this paper designs and implements a multi-objective optimization VNE algorithm for wireless network resource allocation. Resource allocation in a virtual network is essentially the problem of allocating underlying resources to virtual network requests (VNRs). The proposed objective formula jointly considers mapping cost, network delay, and VNR acceptance rate. VNE is completed through node mapping and link mapping. In experiments and simulations against other VNE algorithms, the cross-domain VNE algorithm proposed in this paper performs best on all three indicators. This shows the effectiveness of the algorithm in wireless network resource allocation.

A Machine Learning Framework for Stock Selection

Aug 08, 2018

XingYu Fu, JinHong Du, YiFeng Guo, MingWen Liu, Tao Dong, XiuWen Duan

Abstract: This paper demonstrates how to apply machine learning algorithms to distinguish good stocks from bad stocks. To this end, we construct 244 technical and fundamental features to characterize each stock, and label stocks according to their ranking with respect to the return-to-volatility ratio. Algorithms ranging from traditional statistical learning methods to recently popular deep learning methods, e.g., Logistic Regression (LR), Random Forest (RF), Deep Neural Network (DNN), and Stacking, are trained to solve the classification task. A Genetic Algorithm (GA) is also used to implement feature selection. The effectiveness of the stock selection strategy is validated in the Chinese stock market in both statistical and practical aspects, showing that: 1) Stacking outperforms the other models, reaching an AUC score of 0.972; 2) the Genetic Algorithm picks a subset of 114 features, and the prediction performance of all models remains almost unchanged after the selection procedure, which suggests some features are indeed redundant; 3) LR and DNN are radical models, RF is a risk-neutral model, and Stacking is somewhere between DNN and RF; 4) the portfolios constructed by our models outperform the market average in back tests.

A Computational Method for Evaluating UI Patterns

Jul 11, 2018

Bardia Doosti, Tao Dong, Biplab Deka, Jeffrey Nichols

Abstract: UI design languages, such as Google's Material Design, make applications both easier to develop and easier to learn by providing a set of standard UI components. Nonetheless, it is hard to assess the impact of design languages in the wild. Moreover, designers often get stranded in strongly opinionated debates about the merits of certain UI components, such as the Floating Action Button and the Navigation Drawer. To address these challenges, this short paper introduces a method for measuring the impact of design languages and informing design debates by analyzing a dataset consisting of view hierarchies, screenshots, and app metadata for more than 9,000 mobile apps. Our data analysis shows that use of Material Design is positively correlated with app ratings and, to some extent, with the number of installs. Furthermore, we show that use of UI components varies by app category, suggesting that a more nuanced view is needed in design debates.
