
Yair Kittenplon

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

Nov 07, 2024
Via arXiv

M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

Jun 12, 2024
Via arXiv

Question Aware Vision Transformer for Multimodal Reasoning

Feb 08, 2024
Via arXiv

Towards Models that Can See and Read

Jan 18, 2023
Via arXiv

Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Feb 14, 2022
Via arXiv

FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation

Nov 19, 2020
Via arXiv