Tejas Vyas

Distilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoT

May 10, 2026

Alaa Asfour, Christopher Indris, Leihan Chen, Tejas Vyas, Guanghui Wang

Abstract: Large-scale 3D vision-language models (VLMs) like LLaVA-3D offer strong spatial reasoning but are difficult to deploy due to high computational costs. We propose a knowledge distillation framework that transfers spatial reasoning from a 7B teacher to a 2.29B student model. Our approach achieves 8.7x lower inference latency and a 3x reduction in model size while retaining 54-72% of the teacher's performance. The framework utilizes VGGT as the vision encoder and a multi-task distillation pipeline with uncertainty-aware loss weighting. To improve reasoning without chain-of-thought (CoT) data, we introduce "Hidden CoT": learnable latent tokens that serve as an internal scratchpad before answer generation. This is the first use of latent scratchpad reasoning in distilled 3D VLMs. The student model jointly performs spatial description, depth estimation, and object detection. Experiments on ScanNet and 3D-FRONT show strong spatial understanding, reaching 68-72% accuracy on proximity and contact tasks. Our framework enables efficient 3D scene QA on resource-constrained platforms.

Assessing Patient Eligibility for Inspire Therapy through Machine Learning and Deep Learning Models

Feb 01, 2024

Mohsena Chowdhury, Tejas Vyas, Rahul Alapati, Andrés M Bur, Guanghui Wang

Figures 1-4 for Assessing Patient Eligibility for Inspire Therapy through Machine Learning and Deep Learning Models

Abstract: Inspire therapy is an FDA-approved internal neurostimulation treatment for obstructive sleep apnea. However, not all patients respond to this therapy, and determining candidacy is challenging even for experienced otolaryngologists. This paper makes the first attempt to leverage both machine learning and deep learning techniques to predict patient responsiveness to Inspire therapy, using medical data and videos captured through Drug-Induced Sleep Endoscopy (DISE), an essential procedure for Inspire therapy. To achieve this, we gathered and annotated three datasets from 127 patients. Two of these datasets comprise endoscopic videos focused on the Base of the Tongue and Velopharynx. The third dataset comprises the patients' clinical information. Using these datasets, we benchmarked and compared the performance of six deep learning models and five classical machine learning algorithms. The results demonstrate the potential of machine learning and deep learning techniques to determine a patient's eligibility for Inspire therapy, paving the way for future advancements in this field.

Predicting Mitral Valve mTEER Surgery Outcomes Using Machine Learning and Deep Learning Techniques

Jan 24, 2024

Tejas Vyas, Mohsena Chowdhury, Xiaojiao Xiao, Mathias Claeys, Géraldine Ong, Guanghui Wang

Figures 1-4 for Predicting Mitral Valve mTEER Surgery Outcomes Using Machine Learning and Deep Learning Techniques

Abstract: Mitral Transcatheter Edge-to-Edge Repair (mTEER) is a medical procedure utilized for the treatment of mitral valve disorders. However, predicting the outcome of the procedure poses a significant challenge. This paper makes the first attempt to harness classical machine learning (ML) and deep learning (DL) techniques for predicting mitral valve mTEER surgery outcomes. To achieve this, we compiled a dataset from 467 patients, encompassing labeled echocardiogram videos and patient reports containing Transesophageal Echocardiography (TEE) measurements detailing Mitral Valve Repair (MVR) treatment outcomes. Leveraging this dataset, we conducted a benchmark evaluation of six ML algorithms and two DL models. The results underscore the potential of ML and DL in predicting mTEER surgery outcomes, providing insight for future investigation and advancements in this domain.

* 5 pages, 1 figure
