Yandex
Abstract:Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural language prompts. In practice, however, geometric descriptions can be under-specified or internally inconsistent: critical dimensions may be missing and constraints may conflict. Existing fine-tuned models tend to reactively follow user instructions and hallucinate dimensions when the text is ambiguous. To address this, we propose a proactive agentic framework for text-to-CadQuery generation, named ProCAD, that resolves specification issues before code synthesis. Our framework pairs a proactive clarifying agent, which audits the prompt and asks targeted clarification questions only when necessary to produce a self-consistent specification, with a CAD coding agent that translates the specification into an executable CadQuery program. We fine-tune the coding agent on a curated high-quality text-to-CadQuery dataset and train the clarifying agent via agentic SFT on clarification trajectories. Experiments show that proactive clarification significantly improves robustness to ambiguous prompts while keeping interaction overhead low. ProCAD outperforms frontier closed-source models, including Claude Sonnet 4.5, reducing the mean Chamfer distance by 79.9 percent and lowering the invalidity ratio from 4.8 percent to 0.9 percent. Our code and datasets will be made publicly available.



Abstract:Despite the prominence of neural network approaches in the field of recommender systems, simple methods such as matrix factorization with quadratic loss are still used in industry for several reasons. These models can be trained with alternating least squares, which makes them easy to implement in a massively parallel manner, thus making it possible to utilize billions of events from real-world datasets. Large-scale recommender systems need to account for severe popularity skew in the distributions of users and items, so a lot of research is focused on implementing sparse, mixed dimension or shared embeddings to reduce both the number of parameters and overfitting on rare users and items. In this paper we propose two matrix factorization models with mixed dimension embeddings, which can be optimized in a massively parallel fashion using the alternating least squares approach.