Tokenization


Inverting Trojans in LLMs

Sep 19, 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025

SONAR: Self-Distilled Continual Pre-training for Domain Adaptive Audio Representation

Sep 19, 2025

LiteLong: Resource-Efficient Long-Context Data Synthesis for LLMs

Sep 19, 2025

DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm

Sep 19, 2025

SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models

Sep 19, 2025

Mental Accounts for Actions: EWA-Inspired Attention in Decision Transformers

Sep 19, 2025

Localmax dynamics for attention in transformers and its asymptotic behavior

Sep 19, 2025

A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

Sep 19, 2025

Monte Carlo Tree Diffusion with Multiple Experts for Protein Design

Sep 19, 2025