Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Roberto Daza

Leveraging Avatar Fingerprinting: A Multi-Generator Photorealistic Talking-Head Public Database and Benchmark

Mar 27, 2026

Laura Pedrouzo-Rodriguez, Luis F. Gomez, Ruben Tolosana, Ruben Vera-Rodriguez, Roberto Daza, Aythami Morales, Julian Fierrez

Abstract:Recent advances in photorealistic avatar generation have enabled highly realistic talking-head avatars, raising security concerns regarding identity impersonation in AI-mediated communication. To advance in this challenging problem, the task of avatar fingerprinting aims to determine whether two avatar videos are driven by the same human operator or not. However, current public databases in the literature are scarce and based solely on old-fashioned talking-head avatar generators, not representing realistic scenarios for the current task of avatar fingerprinting. To overcome this situation, the present article introduces AVAPrintDB, a new publicly available multi-generator talking-head avatar database for avatar fingerprinting. AVAPrintDB is constructed from two audiovisual corpora and three state-of-the-art avatar generators (GAGAvatar, LivePortrait, HunyuanPortrait), representing different synthesis paradigms, and includes both self- and cross-reenactments to simulate legitimate usage and impersonation scenarios. Building on this database, we also define a standardized and reproducible benchmark for avatar fingerprinting, considering public state-of-the-art avatar fingerprinting systems and exploring novel methods based on Foundation Models (DINOv2 and CLIP). Also, we conduct a comprehensive analysis under generator and dataset shift. Our results show that, while identity-related motion cues persist across synthetic avatars, current avatar fingerprinting systems remain highly sensitive to changes in the synthesis pipeline and source domain. The AVAPrintDB, benchmark protocols, and avatar fingerprinting systems are publicly available to facilitate reproducible research.

Via

Access Paper or Ask Questions

EduEVAL-DB: A Role-Based Dataset for Pedagogical Risk Evaluation in Educational Explanations

Feb 19, 2026

Javier Irigoyen, Roberto Daza, Aythami Morales, Julian Fierrez, Francisco Jurado, Alvaro Ortigosa, Ruben Tolosana

Abstract:This work introduces EduEVAL-DB, a dataset based on teacher roles designed to support the evaluation and training of automatic pedagogical evaluators and AI tutors for instructional explanations. The dataset comprises 854 explanations corresponding to 139 questions from a curated subset of the ScienceQA benchmark, spanning science, language, and social science across K-12 grade levels. For each question, one human-teacher explanation is provided and six are generated by LLM-simulated teacher roles. These roles are inspired by instructional styles and shortcomings observed in real educational practice and are instantiated via prompt engineering. We further propose a pedagogical risk rubric aligned with established educational standards, operationalizing five complementary risk dimensions: factual correctness, explanatory depth and completeness, focus and relevance, student-level appropriateness, and ideological bias. All explanations are annotated with binary risk labels through a semi-automatic process with expert teacher review. Finally, we present preliminary validation experiments to assess the suitability of EduEVAL-DB for evaluation. We benchmark a state-of-the-art education-oriented model (Gemini 2.5 Pro) against a lightweight local Llama 3.1 8B model and examine whether supervised fine-tuning on EduEVAL-DB supports pedagogical risk detection using models deployable on consumer hardware.

* 10 pages, 3 figures. Published in Intl. Conf. on Learning Analytics & Knowledge Workshops (LAK Workshops 2026, GenAI-LA 26)

Via

Access Paper or Ask Questions

GenAI-LA: Generative AI and Learning Analytics Workshop (LAK 2026), April 27--May 1, 2026, Bergen, Norway

Feb 17, 2026

Javier Irigoyen, Roberto Daza, Aythami Morales, Julian Fierrez, Francisco Jurado, Alvaro Ortigosa, Ruben Tolosana

* 10 pages, 3 figures. Published in Intl. Conf. on Learning Analytics & Knowledge Workshops (LAK Workshops 2026, GenAI-LA 26)

Via

Access Paper or Ask Questions

A Multimodal Dataset of Student Oral Presentations with Sensors and Evaluation Data

Jan 12, 2026

Alvaro Becerra, Ruth Cobos, Roberto Daza

Abstract:Oral presentation skills are a critical component of higher education, yet comprehensive datasets capturing real-world student performance across multiple modalities remain scarce. To address this gap, we present SOPHIAS (Student Oral Presentation monitoring for Holistic Insights & Analytics using Sensors), a 12-hour multimodal dataset containing recordings of 50 oral presentations (10-15-minute presentation followed by 5-15-minute Q&A) delivered by 65 undergraduate and master's students at the Universidad Autonoma de Madrid. SOPHIAS integrates eight synchronized sensor streams from high-definition webcams, ambient and webcam audio, eye-tracking glasses, smartwatch physiological sensors, and clicker, keyboard, and mouse interactions. In addition, the dataset includes slides and rubric-based evaluations from teachers, peers, and self-assessments, along with timestamped contextual annotations. The dataset captures presentations conducted in real classroom settings, preserving authentic student behaviors, interactions, and physiological responses. SOPHIAS enables the exploration of relationships between multimodal behavioral and physiological signals and presentation performance, supports the study of peer assessment, and provides a benchmark for developing automated feedback and Multimodal Learning Analytics tools. The dataset is publicly available for research through GitHub and Science Data Bank.

* Article under review in the journal Scientific Data. GitHub repository of the dataset at: https://github.com/dataGHIA/SOPHIAS

Via

Access Paper or Ask Questions

MOSAIC-F: A Framework for Enhancing Students' Oral Presentation Skills through Personalized Feedback

Jun 10, 2025

Alvaro Becerra, Daniel Andres, Pablo Villegas, Roberto Daza, Ruth Cobos

Figure 1 for MOSAIC-F: A Framework for Enhancing Students' Oral Presentation Skills through Personalized Feedback

Figure 2 for MOSAIC-F: A Framework for Enhancing Students' Oral Presentation Skills through Personalized Feedback

Abstract:In this article, we present a novel multimodal feedback framework called MOSAIC-F, an acronym for a data-driven Framework that integrates Multimodal Learning Analytics (MMLA), Observations, Sensors, Artificial Intelligence (AI), and Collaborative assessments for generating personalized feedback on student learning activities. This framework consists of four key steps. First, peers and professors' assessments are conducted through standardized rubrics (that include both quantitative and qualitative evaluations). Second, multimodal data are collected during learning activities, including video recordings, audio capture, gaze tracking, physiological signals (heart rate, motion data), and behavioral interactions. Third, personalized feedback is generated using AI, synthesizing human-based evaluations and data-based multimodal insights such as posture, speech patterns, stress levels, and cognitive load, among others. Finally, students review their own performance through video recordings and engage in self-assessment and feedback visualization, comparing their own evaluations with peers and professors' assessments, class averages, and AI-generated recommendations. By combining human-based and data-based evaluation techniques, this framework enables more accurate, personalized and actionable feedback. We tested MOSAIC-F in the context of improving oral presentation skills.

* Accepted in LASI Spain 25: Learning Analytics Summer Institute Spain 2025

Via

Access Paper or Ask Questions

M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards

Feb 21, 2025

Alvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez

Figure 1 for M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards

Abstract:We present a demonstration of a web-based system called M2LADS ("System for Generating Multimodal Learning Analytics Dashboards"), designed to integrate, synchronize, visualize, and analyze multimodal data recorded during computer-based learning sessions with biosensors. This system presents a range of biometric and behavioral data on web-based dashboards, providing detailed insights into various physiological and activity-based metrics. The multimodal data visualized include electroencephalogram (EEG) data for assessing attention and brain activity, heart rate metrics, eye-tracking data to measure visual attention, webcam video recordings, and activity logs of the monitored tasks. M2LADS aims to assist data scientists in two key ways: (1) by providing a comprehensive view of participants' experiences, displaying all data categorized by the activities in which participants are engaged, and (2) by synchronizing all biosignals and videos, facilitating easier data relabeling if any activity information contains errors.

* Published in the Workshop on Innovation and Responsibility in AI-Supported Education (iRAISE25) at AAAI 2025

Via

Access Paper or Ask Questions

SMARTe-VR: Student Monitoring and Adaptive Response Technology for e-learning in Virtual Reality

Jan 19, 2025

Roberto Daza, Lin Shengkai, Aythami Morales, Julian Fierrez, Katashi Nagao

Abstract:This work introduces SMARTe-VR, a platform for student monitoring in an immersive virtual reality environment designed for online education. SMARTe-VR is aimed to gather data for adaptive learning, focusing on facial biometrics and learning metadata. The platform allows instructors to create tailored learning sessions with video lectures, featuring an interface with an Auto QA system to evaluate understanding, interaction tools (e.g., textbook highlighting and lecture tagging), and real-time feedback. Additionally, we release a dataset containing 5 research challenges with data from 10 users in VR-based TOEIC sessions. This dataset, spanning over 25 hours, includes facial features, learning metadata, 450 responses, question difficulty levels, concept tags, and understanding labels. Alongside the database, we present preliminary experiments using Item Response Theory models, adapted for understanding detection using facial features. Two architectures were explored: a Temporal Convolutional Network for local features and a Multilayer Perceptron for global features.

* Published in the Workshop on Artificial Intelligence for Education (AI4EDU) at AAAI 2025

Via

Access Paper or Ask Questions

IMPROVE: Impact of Mobile Phones on Remote Online Virtual Education

Dec 13, 2024

Roberto Daza, Alvaro Becerra, Ruth Cobos, Julian Fierrez, Aythami Morales

Abstract:This work presents the IMPROVE dataset, designed to evaluate the effects of mobile phone usage on learners during online education. The dataset not only assesses academic performance and subjective learner feedback but also captures biometric, behavioral, and physiological signals, providing a comprehensive analysis of the impact of mobile phone use on learning. Multimodal data were collected from 120 learners in three groups with different phone interaction levels. A setup involving 16 sensors was implemented to collect data that have proven to be effective indicators for understanding learner behavior and cognition, including electroencephalography waves, videos, eye tracker, etc. The dataset includes metadata from the processed videos like face bounding boxes, facial landmarks, and Euler angles for head pose estimation. In addition, learner performance data and self-reported forms are included. Phone usage events were labeled, covering both supervisor-triggered and uncontrolled events. A semi-manual re-labeling system, using head pose and eye tracker data, is proposed to improve labeling accuracy. Technical validation confirmed signal quality, with statistical analyses revealing biometric changes during phone use.

* Article under review in the journal Scientific Data. GitHub repository of the dataset at: https://github.com/BiDAlab/IMPROVE

Via

Access Paper or Ask Questions

DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning

Aug 14, 2024

Roberto Daza, Luis F. Gomez, Julian Fierrez, Aythami Morales, Ruben Tolosana, Javier Ortega-Garcia

Figure 1 for DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning

Figure 2 for DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning

Figure 3 for DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning

Figure 4 for DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning

Abstract:This work introduces an innovative method for estimating attention levels (cognitive load) using an ensemble of facial analysis techniques applied to webcam videos. Our method is particularly useful, among others, in e-learning applications, so we trained, evaluated, and compared our approach on the mEBAL2 database, a public multi-modal database acquired in an e-learning environment. mEBAL2 comprises data from 60 users who performed 8 different tasks. These tasks varied in difficulty, leading to changes in their cognitive loads. Our approach adapts state-of-the-art facial analysis technologies to quantify the users' cognitive load in the form of high or low attention. Several behavioral signals and physiological processes related to the cognitive load are used, such as eyeblink, heart rate, facial action units, and head pose, among others. Furthermore, we conduct a study to understand which individual features obtain better results, the most efficient combinations, explore local and global features, and how temporary time intervals affect attention level estimation, among other aspects. We find that global facial features are more appropriate for multimodal systems using score-level fusion, particularly as the temporal window increases. On the other hand, local features are more suitable for fusion through neural network training with score-level fusion approaches. Our method outperforms existing state-of-the-art accuracies using the public mEBAL2 benchmark.

* Article accepted in the IEEE Access journal. Accessible at https://ieeexplore.ieee.org/document/10633208

Via

Access Paper or Ask Questions

Visual Attention Analysis in Online Learning

May 31, 2024

Miriam Navarro, Álvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez

Figure 1 for Visual Attention Analysis in Online Learning

Figure 2 for Visual Attention Analysis in Online Learning

Figure 3 for Visual Attention Analysis in Online Learning

Figure 4 for Visual Attention Analysis in Online Learning

Abstract:In this paper, we present an approach in the Multimodal Learning Analytics field. Within this approach, we have developed a tool to visualize and analyze eye movement data collected during learning sessions in online courses. The tool is named VAAD (an acronym for Visual Attention Analysis Dashboard). These eye movement data have been gathered using an eye-tracker and subsequently processed and visualized for interpretation. The purpose of the tool is to conduct a descriptive analysis of the data by facilitating its visualization, enabling the identification of differences and learning patterns among various learner populations. Additionally, it integrates a predictive module capable of anticipating learner activities during a learning session. Consequently, VAAD holds the potential to offer valuable insights into online learning behaviors from both descriptive and predictive perspectives.

* Accepted in CEDI 2024 (VII Congreso Espa\~nol de Inform\'atica), A Coru\~na, Spain

Via

Access Paper or Ask Questions