Abstract:Going beyond the prediction of numerical scores, recent research in automated essay scoring has increasingly emphasized the generation of high-quality feedback that provides justification and actionable guidance. To mitigate the high cost of expert annotation, prior work has commonly relied on LLM-generated feedback to train essay assessment models. However, such feedback is often incorporated without explicit quality validation, resulting in the propagation of noise in downstream applications. To address this limitation, we propose FeedEval, an LLM-based framework for evaluating LLM-generated essay feedback along three pedagogically grounded dimensions: specificity, helpfulness, and validity. FeedEval employs dimension-specialized LLM evaluators trained on datasets curated in this study to assess multiple feedback candidates and select high-quality feedback for downstream use. Experiments on the ASAP++ benchmark show that FeedEval closely aligns with human expert judgments and that essay scoring models trained with FeedEval-filtered high-quality feedback achieve superior scoring performance. Furthermore, revision experiments using small LLMs show that the high-quality feedback identified by FeedEval leads to more effective essay revisions. We will release our code and curated datasets upon acceptance.
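The candidate-selection step described above can be illustrated with a minimal sketch. It is not the FeedEval implementation: the abstract does not state how the three dimension scores are combined, so the unweighted mean used here is an assumption, and `select_feedback` and the `scorers` mapping are hypothetical names standing in for the dimension-specialized LLM evaluators.

```python
# Hypothetical sketch: pick the best feedback candidate from per-dimension scores.
# The three dimension names come from the abstract; everything else is assumed.
DIMENSIONS = ("specificity", "helpfulness", "validity")


def select_feedback(candidates, scorers):
    """Return the candidate with the highest mean score across DIMENSIONS.

    candidates: non-empty list of feedback strings.
    scorers: dict mapping each dimension name to a callable(str) -> float.
             In practice each callable would wrap an LLM evaluator.
    """
    def aggregate(text):
        # Assumption: an unweighted mean over the three dimensions.
        return sum(scorers[d](text) for d in DIMENSIONS) / len(DIMENSIONS)

    return max(candidates, key=aggregate)


# Toy stand-in scorers, used only to make the sketch executable.
toy_scorers = {
    "specificity": lambda t: float(len(t.split())),
    "helpfulness": lambda t: 1.0 if "try" in t.lower() else 0.0,
    "validity": lambda t: 1.0,
}
best = select_feedback(
    ["Good essay.", "Try adding evidence to paragraph two."],
    toy_scorers,
)
```

Replacing the toy scorers with calls to trained evaluators would leave the selection logic unchanged; a thresholding rule per dimension would be an equally plausible alternative to the mean.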
Abstract:Foundation models pretrained on large-scale data have demonstrated remarkable zero-shot generalization capabilities across domains. Building on the success of TabPFN for tabular data and its recent extension to time series, we investigate whether graph node classification can be effectively reformulated as a tabular learning problem. We introduce TabPFN-GN, which transforms graph data into tabular features by extracting node attributes, structural properties, positional encodings, and optionally smoothed neighborhood features. This enables TabPFN to perform direct node classification without any graph-specific training or language model dependencies. Our experiments on 12 benchmark datasets reveal that TabPFN-GN achieves performance competitive with GNNs on homophilous graphs and consistently outperforms them on heterophilous graphs. These results demonstrate that principled feature engineering can bridge the gap between tabular and graph domains, providing a practical alternative to task-specific GNN training and LLM-dependent graph foundation models.
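The graph-to-table transformation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the TabPFN-GN feature pipeline: the function name `graph_to_table` is hypothetical, node degree stands in for "structural properties", a one-hop neighbor mean stands in for "smoothed neighborhood features", and positional encodings are omitted for brevity.

```python
# Hypothetical sketch: turn an undirected graph into one tabular row per node.
def graph_to_table(features, edges):
    """Build rows of [own attributes | degree | mean of neighbor attributes].

    features: list of equal-length float lists, one per node (node i at index i).
    edges: iterable of (u, v) index pairs, treated as undirected.
    """
    n = len(features)
    dim = len(features[0]) if n else 0
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    rows = []
    for i in range(n):
        nbrs = sorted(neighbors[i])
        if nbrs:
            # One-hop smoothing: average each attribute over the neighbors.
            smoothed = [
                sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)
            ]
        else:
            smoothed = [0.0] * dim  # isolated node: no neighborhood signal
        rows.append(list(features[i]) + [float(len(nbrs))] + smoothed)
    return rows


# Path graph 0 - 1 - 2 with two attributes per node.
table = graph_to_table(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [(0, 1), (1, 2)],
)
```

Each resulting row can be passed to any tabular classifier together with the labels of the training nodes, which is the sense in which node classification becomes a tabular problem.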
Abstract:Deep learning has demonstrated great promise in cancer classification from whole-slide images (WSIs) but remains constrained by the need for extensive annotations. Annotation-free methods, such as multiple instance learning (MIL) and self-supervised learning (SSL), have emerged to address this challenge; however, current SSL techniques often depend on synthetic augmentations or temporal context, which may not adequately capture the intricate spatial relationships inherent to histopathology. In this work, we introduce a novel spatial context-driven positive pair sampling strategy for SSL that leverages the natural coherence of adjacent patches in WSIs. By constructing biologically relevant positive pairs from spatially proximate patches, our approach harnesses inherent spatial coherence to enhance patch-level representations, ultimately boosting slide-level classification performance. Experiments on multiple datasets reveal that our strategy improves classification accuracy by 5% to 10% over the standard method, paving the way for more clinically relevant AI models in cancer diagnosis. The code is available at https://anonymous.4open.science/r/contextual-pairs-E72F/.
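The idea of drawing positive pairs from spatially adjacent patches can be sketched as below. This is an illustrative assumption about how such sampling might look, not the code in the linked repository: `sample_positive_pairs`, the grid-coordinate representation, and the `radius` parameter are hypothetical.

```python
# Hypothetical sketch: sample (anchor, positive) patch pairs from grid neighbors.
import random


def sample_positive_pairs(coords, num_pairs, radius=1, seed=0):
    """Return up to num_pairs pairs of spatially adjacent patch coordinates.

    coords: iterable of (row, col) grid positions of tissue patches in one slide.
    radius: maximum Chebyshev distance between anchor and positive (assumed).
    Patches with no neighbor inside the radius are never used as anchors.
    """
    rng = random.Random(seed)
    present = set(coords)
    offsets = [
        (dr, dc)
        for dr in range(-radius, radius + 1)
        for dc in range(-radius, radius + 1)
        if (dr, dc) != (0, 0)
    ]
    # Map each patch to the neighboring patches that actually exist.
    neighbors = {}
    for r, c in sorted(present):
        found = [(r + dr, c + dc) for dr, dc in offsets if (r + dr, c + dc) in present]
        if found:
            neighbors[(r, c)] = found
    if not neighbors:
        return []

    anchors = sorted(neighbors)
    pairs = []
    for _ in range(num_pairs):
        anchor = rng.choice(anchors)
        positive = rng.choice(neighbors[anchor])
        pairs.append((anchor, positive))
    return pairs


# Two touching patches plus one isolated patch at (5, 5).
pairs = sample_positive_pairs([(0, 0), (0, 1), (5, 5)], num_pairs=10)
```

In a contrastive SSL setup, the two patches of each pair would take the place of the two augmented views of a single image; the isolated patch at (5, 5) is skipped because it has no spatial neighbor.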