Liang Luo

SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling

Apr 13, 2026

Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

Mar 21, 2026

Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations

Dec 15, 2025

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

Oct 06, 2025

SSG-DiT: A Spatial Signal Guided Framework for Controllable Video Generation

Aug 23, 2025

Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation

May 02, 2025

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

Feb 26, 2025

The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit

Jan 04, 2025

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Nov 07, 2024

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Jul 31, 2024