Image-to-image translation is the process of converting an image from one domain to another using deep learning techniques.
This paper introduces a hybrid two-stage registration framework for reconstructing three-dimensional (3D) kidney anatomy from macroscopic slices, using CT-derived models as the geometric reference standard. The approach addresses the data-scarcity and high-distortion challenges typical of macroscopic imaging, where fully learning-based registration (e.g., VoxelMorph) often fails to generalize due to limited training diversity and large nonrigid deformations that exceed the capture range of unconstrained convolutional filters. In the proposed pipeline, the Optimal Cross-section Matching (OCM) algorithm first performs constrained global alignment: translation, rotation, and uniform scaling to establish anatomically consistent slice initialization. Next, a lightweight deep-learning refinement network, inspired by VoxelMorph, predicts residual local deformations between consecutive slices. The core novelty of this architecture lies in its hierarchical decomposition of the registration manifold. This hybrid OCM+DL design integrates explicit geometric priors with the flexible learning capacity of neural networks, ensuring stable optimization and plausible deformation fields even with few training examples. Experiments on an original dataset of 40 kidneys demonstrated better results compared to single-stage baselines. The pipeline maintains physical calibration via Hough-based grid detection and employs Bezier-based contour smoothing for robust meshing and volume estimation. Although validated on kidney data, the proposed framework generalizes to other soft-tissue organs reconstructed from optical or photographic cross-sections. By decoupling interpretable global optimization from data-efficient deep refinement, the method advances the precision, reproducibility, and anatomical realism of multimodal 3D reconstructions for surgical planning, morphological assessment, and medical education.
Radiomics enables quantitative medical image analysis by converting imaging data into structured, high-dimensional feature representations for predictive modeling. Despite methodological developments and encouraging retrospective results, radiomics continue to face persistent challenges related to feature instability, limited reproducibility, validation bias, and restricted clinical translation. Existing reviews largely focus on application-specific outcomes or isolated pipeline components, with limited analysis of how interdependent design choices across acquisition, preprocessing, feature engineering, modeling, and evaluation collectively affect robustness and generalizability. This survey provides an end-to-end analysis of radiomics pipelines, examining how methodological decisions at each stage influence feature stability, model reliability, and translational validity. This paper reviews radiomic feature extraction, selection, and dimensionality reduction strategies; classical machine and deep learning-based modeling approaches; and ensemble and hybrid frameworks, with emphasis on validation protocols, data leakage prevention, and statistical reliability. Clinical applications are discussed with a focus on evaluation rigor rather than reported performance metrics. The survey identifies open challenges in standardization, domain shift, and clinical deployment, and outlines future directions such as hybrid radiomics-artificial intelligence models, multimodal fusion, federated learning, and standardized benchmarking.
Whereas reinforcement learning has been applied with success to a range of robotic control problems in complex, uncertain environments, reliance on extensive data - typically sourced from simulation environments - limits real-world deployment due to the domain gap between simulated and physical systems, coupled with limited real-world sample availability. We propose a novel method for sim-to-real transfer of reinforcement learning policies, based on a reinterpretation of neural style transfer from image processing to synthesise novel training data from unpaired unlabelled real world datasets. We employ a variational autoencoder to jointly learn self-supervised feature representations for style transfer and generate weakly paired source-target trajectories to improve physical realism of synthesised trajectories. We demonstrate the application of our approach based on the case study of robot cutting of unknown materials. Compared to baseline methods, including our previous work, CycleGAN, and conditional variational autoencoder-based time series translation, our approach achieves improved task completion time and behavioural stability with minimal real-world data. Our framework demonstrates robustness to geometric and material variation, and highlights the feasibility of policy adaptation in challenging contact-rich tasks where real-world reward information is unavailable.
Social bias in generative AI can manifest not only as performance disparities but also as associational bias, whereby models learn and reproduce stereotypical associations between concepts and demographic groups, even in the absence of explicit demographic information (e.g., associating doctors with men). These associations can persist, propagate, and potentially amplify across repeated exchanges in inter-model communication pipelines, where one generative model's output becomes another's input. This is especially salient for human-centred perception tasks, such as human activity recognition and affect prediction, where inferences about behaviour and internal states can lead to errors or stereotypical associations that propagate into unequal treatment. In this work, focusing on human activity and affective expression, we study how such associations evolve within an inter-model communication pipeline that alternates between image generation and image description. Using the RAF-DB and PHASE datasets, we quantify demographic distribution drift induced by model-to-model information exchange and assess whether these drifts are systematic using an explainability pipeline. Our results reveal demographic drifts toward younger representations for both actions and emotions, as well as toward more female-presenting representations, primarily for emotions. We further find evidence that some predictions are supported by spurious visual regions (e.g., background or hair) rather than concept-relevant cues (e.g., body or face). We also examine whether these demographic drifts translate into measurable differences in downstream behaviour, i.e., while predicting activity and emotion labels. Finally, we outline mitigation strategies spanning data-centric, training and deployment interventions, and emphasise the need for careful safeguards when deploying interconnected models in human-centred AI systems.
While text-to-image (T2I) models have advanced considerably, their capability to associate colors with implicit concepts remains underexplored. To address the gap, we introduce ColorConceptBench, a new human-annotated benchmark to systematically evaluate color-concept associations through the lens of probabilistic color distributions. ColorConceptBench moves beyond explicit color names or codes by probing how models translate 1,281 implicit color concepts using a foundation of 6,369 human annotations. Our evaluation of seven leading T2I models reveals that current models lack sensitivity to abstract semantics, and crucially, this limitation appears resistant to standard interventions (e.g., scaling and guidance). This demonstrates that achieving human-like color semantics requires more than larger models, but demands a fundamental shift in how models learn and represent implicit meaning.
This paper presents AnthropoCam, a mobile-based neural style transfer (NST) system optimized for the visual synthesis of Anthropocene environments. Unlike conventional artistic NST, which prioritizes painterly abstraction, stylizing human-altered landscapes demands a careful balance between amplifying material textures and preserving semantic legibility. Industrial infrastructures, waste accumulations, and modified ecosystems contain dense, repetitive patterns that are visually expressive yet highly susceptible to semantic erosion under aggressive style transfer. To address this challenge, we systematically investigate the impact of NST parameter configurations on the visual translation of Anthropocene textures, including feature layer selection, style and content loss weighting, training stability, and output resolution. Through controlled experiments, we identify an optimal parameter manifold that maximizes stylistic expression while preventing semantic erasure. Our results demonstrate that appropriate combinations of convolutional depth, loss ratios, and resolution scaling enable the faithful transformation of anthropogenic material properties into a coherent visual language. Building on these findings, we implement a low-latency, feed-forward NST pipeline deployed on mobile devices. The system integrates a React Native frontend with a Flask-based GPU backend, achieving high-resolution inference within 3-5 seconds on general mobile hardware. This enables real-time, in-situ visual intervention at the site of image capture, supporting participatory engagement with Anthropocene landscapes. By coupling domain-specific NST optimization with mobile deployment, AnthropoCam reframes neural style transfer as a practical and expressive tool for real-time environmental visualization in the Anthropocene.
Text-to-image (T2I) models are increasingly employed by users worldwide. However, prior research has pointed to the high sensitivity of T2I towards particular input languages - when faced with languages other than English (i.e., different surface forms of the same prompt), T2I models often produce culturally stereotypical depictions, prioritizing the surface over the prompt's semantics. Yet a comprehensive analysis of this behavior, which we dub Surface-over-Semantics (SoS), is missing. We present the first analysis of T2I models' SoS tendencies. To this end, we create a set of prompts covering 171 cultural identities, translated into 14 languages, and use it to prompt seven T2I models. To quantify SoS tendencies across models, languages, and cultures, we introduce a novel measure and analyze how the tendencies we identify manifest visually. We show that all but one model exhibit strong surface-level tendency in at least two languages, with this effect intensifying across the layers of T2I text encoders. Moreover, these surface tendencies frequently correlate with stereotypical visual depictions.
We study the online centralized charging scheduling problem (OCCSP). In this problem, a central authority must decide, in real time, when to charge dynamically arriving electric vehicles (EVs), subject to capacity limits, with the objective of balancing load across a finite planning horizon. To solve the problem, we first gamify it; that is, we model it as a game where charging blocks are placed within temporal and capacity constraints on a grid. We design heuristic policies, train learning agents with expert demonstrations, and improve them using Dataset Aggregation (DAgger). From a theoretical standpoint, we show that gamification reduces model complexity and yields tighter generalization bounds than vector-based formulations. Experiments across multiple EV arrival patterns confirm that gamified learning enhances load balancing. In particular, the image-to-movement model trained with DAgger consistently outperforms heuristic baselines, vector-based approaches, and supervised learning agents, while also demonstrating robustness in sensitivity analyses. These operational gains translate into tangible economic value. In a real-world case study for the Greater Montréal Area (Québec, Canada) using utility cost data, the proposed methods lower system costs by tens of millions of dollars per year over the prevailing practice and show clear potential to delay costly grid upgrades.
Glaucoma is a top cause of irreversible blindness globally, making early detection and longitudinal follow-up pivotal to preventing permanent vision loss. Current screening and progression assessment, however, rely on single tests or loosely linked examinations, introducing subjectivity and fragmented care. Limited access to high-quality imaging tools and specialist expertise further compromises consistency and equity in real-world use. To address these gaps, we developed Fair-Eye Net, a fair, reliable multimodal AI system closing the clinical loop from glaucoma screening to follow-up and risk alerting. It integrates fundus photos, OCT structural metrics, VF functional indices, and demographic factors via a dual-stream heterogeneous fusion architecture, with an uncertainty-aware hierarchical gating strategy for selective prediction and safe referral. A fairness constraint reduces missed diagnoses in disadvantaged subgroups. Experimental results show it achieved an AUC of 0.912 (96.7% specificity), cut racial false-negativity disparity by 73.4% (12.31% to 3.28%), maintained stable cross-domain performance, and enabled 3-12 months of early risk alerts (92% sensitivity, 88% specificity). Unlike post hoc fairness adjustments, Fair-Eye Net optimizes fairness as a primary goal with clinical reliability via multitask learning, offering a reproducible path for clinical translation and large-scale deployment to advance global eye health equity.
Sickle cell disease causes erythrocytes to become sickle-shaped, affecting their movement in the bloodstream and reducing oxygen delivery. It has a high global prevalence and places a significant burden on healthcare systems, especially in resource-limited regions. Automated classification of sickle cells in blood images is crucial, allowing the specialist to reduce the effort required and avoid errors when quantifying the deformed cells and assessing the severity of a crisis. Recent studies have proposed various erythrocyte representation and classification methods. Since classification depends solely on cell shape, a suitable approach models erythrocytes as closed planar curves in shape space. This approach employs elastic distances between shapes, which are invariant under rotations, translations, scaling, and reparameterizations, ensuring consistent distance measurements regardless of the curves' position, starting point, or traversal speed. While previous methods exploiting shape space distances had achieved high accuracy, we refined the model by considering the geometric characteristics of healthy and sickled erythrocytes. Our method proposes (1) to employ a fixed parameterization based on the major axis of each cell to compute distances and (2) to align each cell with two templates using this parameterization before computing distances. Aligning shapes to templates before distance computation, a concept successfully applied in areas such as molecular dynamics, and using a fixed parameterization, instead of minimizing distances across all possible parameterizations, simplifies calculations. This strategy achieves 96.03\% accuracy rate in both supervised classification and unsupervised clustering. Our method ensures efficient erythrocyte classification, maintaining or improving accuracy over shape space models while significantly reducing computational costs.