Chen Zhang

SenseTime Research

Training Interactive Agent in Large FPS Game Map with Rule-enhanced Reinforcement Learning

Oct 07, 2024

EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis

Sep 27, 2024

Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models

Sep 27, 2024

Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification

Sep 24, 2024

Aligning Language Models Using Follow-up Likelihood as Reward Signal

Sep 20, 2024

Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization

Sep 16, 2024

A Compressive Memory-based Retrieval Approach for Event Argument Extraction

Sep 14, 2024

Half-VAE: An Encoder-Free VAE to Bypass Explicit Inverse Mapping

Sep 06, 2024

Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models

Sep 05, 2024

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Sep 04, 2024