
Tiancheng Zhao

OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

Jul 06, 2024

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Jun 25, 2024

Preserving Knowledge in Large Language Model: A Model-Agnostic Self-Decompression Approach

Jun 17, 2024

QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

Jun 15, 2024

HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

Jun 06, 2024

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

Mar 11, 2024

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Dec 22, 2023

Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models

Oct 20, 2023

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Aug 25, 2023

RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model

Jun 20, 2023