
Zhiyong Wu

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT

Jan 02, 2025

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Dec 27, 2024

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

Dec 12, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024

The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024

Dec 02, 2024

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Oct 30, 2024

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Oct 24, 2024

A Controlled Study on Long Context Extension and Generalization in LLMs

Sep 18, 2024

Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis

Sep 13, 2024

RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion

Sep 10, 2024