Jian Xue

ExpLLM: Towards Chain of Thought for Facial Expression Recognition

Sep 04, 2024
Via arXiv

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Jun 12, 2024
Via arXiv

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Oct 23, 2023
Via arXiv

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

Oct 06, 2023
Via arXiv

DiariST: Streaming Speech Translation with Speaker Diarization

Sep 14, 2023
Via arXiv

FoodSAM: Any Food Segmentation

Aug 11, 2023
Via arXiv

Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

Jul 30, 2023
Via arXiv

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Jul 07, 2023
Via arXiv

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

Mar 01, 2023
Via arXiv

Markerless Body Motion Capturing for 3D Character Animation based on Multi-view Cameras

Dec 12, 2022
Via arXiv