Abstract: The rise of Large Language Models (LLMs) has increased the need for scalable, high-performance inference systems, yet most existing frameworks assume homogeneous, resource-rich hardware, an assumption that is often unrealistic in academic or otherwise resource-constrained settings. We introduce AIvailable, a low-cost, highly available LLM-as-a-Service (LLMaaS) platform that uses a software-defined approach to run LLMs across heterogeneous and legacy GPU nodes, including NVIDIA and AMD devices, with a focus on fully utilizing each node's VRAM. AIvailable provides fully GPU-accelerated inference with no CPU fallback, and its unified client interface allows seamless interaction with all deployed LLMs as a single logical unit. The architecture comprises four main components: the Client Interface for user access, the Service Frontend for secure request routing and load balancing, the SDAI Controller for orchestration, deployment, and monitoring, and the Service Backend of heterogeneous GPU nodes that execute inference workloads. By abstracting GPU-specific details and providing dynamic, VRAM-aware allocation and reallocation of models, AIvailable ensures efficient use of resources and resilience against failures and workload fluctuations. Targeting academic labs, private companies, and other resource-constrained organizations, it supports diverse open LLMs, helping to democratize generative AI through the repurposing of legacy GPUs.
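The "dynamic, VRAM-aware allocation" mentioned above can be pictured as a bin-packing problem over the free VRAM of each node. The following minimal Python sketch shows one greedy variant (largest model first, placed on the node with the most free VRAM); all names, sizes, and the strategy itself are illustrative assumptions, not the actual logic of the SDAI Controller.

# Hypothetical sketch of VRAM-aware model placement across heterogeneous GPU nodes.
# The greedy strategy and all identifiers are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class GpuNode:
    name: str            # e.g. "nvidia-rtx2080" or "amd-mi25" (hypothetical labels)
    vram_free_mb: int    # remaining free VRAM on this node, in MB
    models: list = field(default_factory=list)

def place_models(models: dict[str, int], nodes: list[GpuNode]) -> dict[str, str]:
    """Assign each model (name -> VRAM footprint in MB) to the node with the
    most free VRAM that can still hold it, processing largest models first."""
    placement = {}
    for model, need_mb in sorted(models.items(), key=lambda kv: -kv[1]):
        candidates = [n for n in nodes if n.vram_free_mb >= need_mb]
        if not candidates:
            raise RuntimeError(f"no node has {need_mb} MB free for {model}")
        node = max(candidates, key=lambda n: n.vram_free_mb)
        node.vram_free_mb -= need_mb
        node.models.append(model)
        placement[model] = node.name
    return placement

nodes = [GpuNode("nvidia-rtx2080", 8192), GpuNode("amd-mi25", 16384)]
print(place_models({"llama3-8b-q4": 6200, "phi3-mini": 2800}, nodes))

A real controller would additionally have to react to node failures and load changes by recomputing such a placement and migrating models, which is what the abstract's "reallocation" refers to.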
Abstract: Space missions are becoming increasingly ambitious, necessitating high-performance onboard spacecraft computing systems. In response, field-programmable gate arrays (FPGAs) have garnered significant interest due to their flexibility, cost-effectiveness, and potential for radiation tolerance. Concurrently, neural networks (NNs) are being recognized for their capability to execute space mission tasks such as autonomous operations, sensor data analysis, and data compression. This survey serves as a valuable resource for researchers aiming to implement FPGA-based NN accelerators in space applications. By analyzing the existing literature, identifying trends and gaps, and proposing future research directions, this work highlights the potential of these accelerators to enhance onboard computing systems.