Kyuhong Shim

Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching

Mar 13, 2026

Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs

May 28, 2025

Voicing Personas: Rewriting Persona Descriptions into Style Prompts for Controllable Text-to-Speech

May 21, 2025

Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models

May 13, 2025

Chain-of-Rank: Enhancing Large Language Models for Domain-Specific RAG in Edge Device

Feb 21, 2025

Learning Primitive Relations for Compositional Zero-Shot Learning

Jan 24, 2025

Unlocking Transfer Learning for Open-World Few-Shot Recognition

Nov 15, 2024

Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models

Oct 29, 2024

Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP

Oct 11, 2024

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Oct 02, 2024