Bin Wang

and Other Contributors

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024, via arXiv

Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

Jul 01, 2024, via arXiv

A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

Jun 26, 2024, via arXiv

AudioBench: A Universal Benchmark for Audio Large Language Models

Jun 25, 2024, via arXiv

Gaze-directed Vision GNN for Mitigating Shortcut Learning in Medical Image

Jun 20, 2024, via arXiv

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

Jun 19, 2024, via arXiv

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Jun 17, 2024, via arXiv

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Jun 14, 2024, via arXiv

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024, via arXiv

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024, via arXiv