Sijie Cheng

Position: Life-Logging Video Streams Make the Privacy-Utility Trade-off Inevitable

May 11, 2026

Evaluating Memory Capability in Continuous Lifelog Scenario

Apr 13, 2026

Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress

Mar 18, 2026

Building Egocentric Procedural AI Assistant: Methods, Benchmarks, and Challenges

Nov 17, 2025

StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs

Mar 26, 2025

VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

Oct 15, 2024

Instruction-Guided Visual Masking

May 30, 2024

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

May 24, 2024

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

Mar 13, 2024

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

Feb 28, 2024