Irene Testini

Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents

Jun 10, 2025

Irene Testini, José Hernández-Orallo, Lorenzo Pacchiardi

Abstract: Data science aims to extract insights from data to support decision-making processes. Recently, Large Language Models (LLMs) have increasingly been used as assistants for data science, suggesting ideas, techniques and small code snippets, or helping with the interpretation of results and reporting. Proper automation of some data-science activities is now promised by the rise of LLM agents, i.e., AI systems powered by an LLM equipped with additional affordances (such as code execution and knowledge bases) that can perform self-directed actions and interact with digital environments. In this paper, we survey the evaluation of LLM assistants and agents for data science. We find (1) a dominant focus on a small subset of goal-oriented activities, largely ignoring data management and exploratory activities; (2) a concentration on pure assistance or fully autonomous agents, without considering intermediate levels of human-AI collaboration; and (3) an emphasis on human substitution, thereby neglecting the possibility of higher levels of automation through task transformation.

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Apr 08, 2024

Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao

Abstract: Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentric monocular/binocular views, with rich modalities including video, multi-channel audio, directional binaural delay, location data and textual scene descriptions within each scene captured, presenting a comprehensive observation of the world. Figure 1 offers a glimpse of all 28 scene categories of our 360+x dataset. To the best of our knowledge, this is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world. Through our benchmark analysis, we present five different scene understanding tasks on the proposed 360+x dataset to evaluate the impact and benefit of each data modality and perspective in panoptic scene understanding. We hope this unique dataset can broaden the scope of comprehensive scene understanding and encourage the community to approach these problems from more diverse perspectives.

* The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024
* CVPR 2024 (Oral Presentation), Project page: https://x360dataset.github.io/
