Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yewon Lim

Online Agent-as-a-Judge: Situation-Generating Evaluation for Interactive Agents

Jun 06, 2026

Hyogon Ryu, Jeonghwan Kim, Yewon Lim, Chaeun Lee, Jeongwook Kim, Donghoon Ham

Abstract:Evaluating LLM-powered interactive social agents is challenging because socially relevant behaviors depend not only on isolated outputs, but also on prior interactions, social roles, and downstream actions. Existing methods typically allow a target agent to act freely in an environment and then score the resulting trajectory. However, this passive setup can miss capabilities that only become observable under specific social circumstances; for example, conflict handling may remain untested if no disagreement arises. We propose Online Agent-as-a-Judge, a situation-generating evaluation framework for interactive social agents. Online Agent-as-a-Judge deploys an in-world evaluator agent that interacts with the target agent through the environment's native dialogue and action protocol, actively eliciting situations relevant to the evaluation criteria. The resulting trajectories provide evidence for assessing both immediate responses and subsequent behavior. In a life-simulation environment with $32$ designer-authored social criteria, Online Agent-as-a-Judge improves criteria coverage and agreement with human labels, yielding more reliable evidence-grounded evaluations of behaviors that passive methods can leave unobserved.

* ICML 2026 Workshop on Trustworthy AI for Good

Via

Access Paper or Ask Questions

DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection

Jun 02, 2024

Yewon Lim, Changyeon Lee, Aerin Kim, Oren Etzioni

Abstract:A dramatic influx of diffusion-generated images has marked recent years, posing unique challenges to current detection technologies. While the task of identifying these images falls under binary classification, a seemingly straightforward category, the computational load is significant when employing the "reconstruction then compare" technique. This approach, known as DIRE (Diffusion Reconstruction Error), not only identifies diffusion-generated images but also detects those produced by GANs, highlighting the technique's broad applicability. To address the computational challenges and improve efficiency, we propose distilling the knowledge embedded in diffusion models to develop rapid deepfake detection models. Our approach, aimed at creating a small, fast, cheap, and lightweight diffusion synthesized deepfake detector, maintains robust performance while significantly reducing operational demands. Maintaining performance, our experimental results indicate an inference speed 3.2 times faster than the existing DIRE framework. This advance not only enhances the practicality of deploying these systems in real-world settings but also paves the way for future research endeavors that seek to leverage diffusion model knowledge.

* 6 pages, 1 figure

Via

Access Paper or Ask Questions

Scalable Object Detection on Embedded Devices Using Weight Pruning and Singular Value Decomposition

Mar 17, 2023

Dohyun Ham, Jaeyeop Jeong, June-Kyoo Park, Raehyeon Jeong, Seungmin Jeon, Hyeongjun Jeon, Yewon Lim

Figure 1 for Scalable Object Detection on Embedded Devices Using Weight Pruning and Singular Value Decomposition

Figure 2 for Scalable Object Detection on Embedded Devices Using Weight Pruning and Singular Value Decomposition

Figure 3 for Scalable Object Detection on Embedded Devices Using Weight Pruning and Singular Value Decomposition

Figure 4 for Scalable Object Detection on Embedded Devices Using Weight Pruning and Singular Value Decomposition

Abstract:This paper presents a method for optimizing object detection models by combining weight pruning and singular value decomposition (SVD). The proposed method was evaluated on a custom dataset of street work images obtained from https://universe.roboflow.com/roboflow-100/street-work. The dataset consists of 611 training images, 175 validation images, and 87 test images with 7 classes. We compared the performance of the optimized models with the original unoptimized model in terms of frame rate, mean average precision (mAP@50), and weight size. The results show that the weight pruning + SVD model achieved a 0.724 mAP@50 with a frame rate of 1.48 FPS and a weight size of 12.1 MB, outperforming the original model (0.717 mAP@50, 1.50 FPS, and 12.3 MB). Precision-recall curves were also plotted for all models. Our work demonstrates that the proposed method can effectively optimize object detection models while balancing accuracy, speed, and model size.

* 8 pages, 3 figures. A report of the project done as part of the Yonsei-Roboin project for the 2nd semester, 2022

Via

Access Paper or Ask Questions