Abstract:Designing visually diverse and high-quality designs remains a manual, time-consuming process, limiting scalability and personalization in creative workflows. We present a system for generating editable design variations using a decoder-only language model, the Creative Pre-trained Transformer (CPT), trained to predict visual style attributes in design templates. At the core of our approach is a new representation called Creative Markup Language (CML), a compact, machine-learning-friendly format that captures canvas-level structure, page layout, and element-level details (text, images, and vector graphics), including both content and style. We fine-tune CPT on a large corpus of design templates authored by professional designers, enabling it to learn meaningful, context-aware predictions for attributes such as color schemes and font choices. The model produces semantically structured and stylistically coherent outputs, preserving internal consistency across elements. Unlike generative image models, our system yields fully editable design documents rather than pixel-only images, allowing users to iterate and personalize within a design editor. In experiments, our approach generates contextual color and font variations for existing templates and shows promise in adjusting layouts while maintaining design principles.
Abstract:We present a model-centric diagnostic framework that treats training state as a latent variable and unifies a family of internal readouts -- head-gradient norms, confidence, entropy, margin, and related signals -- as anchor-relative projections of that state. A preliminary version of this work introduced a head-gradient probe for checkpoint selection. In this version, we focus on the unifying perspective and structural diagnostics; full algorithmic details, theoretical analysis, and experimental validation will appear in a forthcoming paper. We outline the conceptual scaffold: any prediction head induces a local loss landscape whose geometry (gradient magnitude, curvature, sharpness) reflects how well the upstream features are aligned with the task. Different readout choices -- gradient norms, softmax entropy, predictive margin -- correspond to different projections of this geometry, each with complementary strengths. The framework suggests that checkpoint selection, early stopping, and lightweight architecture pre-screening can all be viewed as querying the same underlying state through different lenses. Illustrative experiments on ImageNet classification and COCO detection/segmentation hint at the practical potential; rigorous benchmarks and ablations are deferred to the full paper.
Abstract:We propose a validation-free checkpointing signal from a single forward-backward pass: the Frobenius norm of the classifier-head gradient on one detached-feature batch, ||g||_F = ||dL/dW||_F. Across ImageNet-1k CNNs and Transformers, this proxy is strongly negative with Top-1 and positive with loss. Selecting the checkpoint with the minimum head gradient in a short tail window closes most of the gap to the oracle (4.24% +/- 2.00% with a universal setup, about 1.12% with light per-family tuning). For practical deployment, a head-scale normalization is more stable within classic CNN families (e.g., ResNets), while a feature-scale normalization works well for Transformers and modern CNNs. The same one-batch probe also predicts COCO detection/segmentation mAP. In diffusion (UNet/DDPM on CIFAR-10), it tracks progress and enables near-oracle tail-window selection; it is positively correlated with same-distribution probe MSE and negatively with FID (lower is better), so it can be used as a lightweight, label-free monitor. Validation labels are never used beyond reporting. The probe adds much less than 0.1% of an epoch and works as a drop-in for validation-free checkpoint selection and early stopping.