Abstract:Automating analog circuit design remains a longstanding challenge in Electronic Design Automation (EDA). While Transformer-based Large Language Models (LLMs) have revolutionized software code generation, their application to analog hardware design is hindered by two critical limitations: (i) the scarcity of analog design datasets containing natural language description of a design and its corresponding netlist, and (ii) the inefficiency of general-purpose tokenizers (e.g., Byte Pair Encoding (BPE)) in capturing the inherent graph structure of circuits. To bridge this gap, first, we curate the largest annotated dataset of analog circuit netlists to date, comprising 31,341 netlist-natural language description pairs across all major circuit classes. Furthermore, we propose Circuit Tokenizer (CKT), a novel circuit graph tokenizer designed to encode netlist connectivity by explicitly mining frequent subcircuits. In terms of scalability, CKT overcomes the bottleneck of prior circuit graph serialization methods where vocabulary size scales linearly with maximum number of components in the dataset, n_max, (O(n_max)); instead, CKT decouples vocabulary growth from circuit complexity, achieving a constant O(1) complexity. Empirically, CKT outperforms standard BPE on circuit topology representation, reducing sequence length by 57% and achieving a 2.3x superior compression ratio using a compact, fixed vocabulary of size 512. Leveraging this optimized tokenization, we train a circuit-specific language model, CircuitFormer, a 511M parameter encoder-decoder transformer. Our model achieves 100% syntactic correctness and an 83% functional success rate across all major analog circuit categories, outperforming state-of-the-art open-source LLMs by 10% and 14%, respectively, while requiring 240x fewer parameters. The dataset is publicly available at https://huggingface.co/datasets/touhid314/cktformer-dataset.
Abstract:Automated data extraction from line charts remains fundamentally bottlenecked by extreme stylistic diversity and a severe scarcity of comprehensively annotated, real-world datasets. Current end-to-end pipelines depend heavily on costly manual annotations, crippling their ability to generalize across arbitrary aesthetics and grid layouts. Furthermore, existing models suffer from two critical failure modes during reconstruction. First, extracting thin, intersecting curves frequently causes structural fragmentation and the erasure of fine visual details, as standard architectures struggle against complex backgrounds. Second, semantic association is notoriously error-prone; current pipelines rely on rigid spatial heuristics that easily break down against the unpredictable legend placements of in-the-wild charts. Finally, measuring true progress is hindered by evaluation protocols that assess isolated sub-tasks rather than holistic, end-to-end data reconstruction. To address these foundational issues, we introduce ChartZero, a parsing framework that leverages synthetic priors to enable robust zero-shot chart data extraction. By training exclusively on a purely synthetic dataset of simple mathematical functions, our model completely bypasses the real-world annotation bottleneck. We overcome curve fragmentation via a novel Global Orthogonal Instance (GOI) loss, and replace brittle spatial rules with an open-vocabulary, Vision-Language Model (VLM)-guided legend matching strategy. Accompanied by a new metric and benchmark specifically designed for full end-to-end reconstruction, our evaluations demonstrate that ChartZero significantly advances generalized plot digitization without requiring real-world supervision. Code and dataset will be released upon acceptance.
Abstract:We present IKIWISI ("I Know It When I See It"), an interactive visual pattern generator for assessing vision-language models in video object recognition when ground truth is unavailable. IKIWISI transforms model outputs into a binary heatmap where green cells indicate object presence and red cells indicate object absence. This visualization leverages humans' innate pattern recognition abilities to evaluate model reliability. IKIWISI introduces "spy objects": adversarial instances users know are absent, to discern models hallucinating on nonexistent items. The tool functions as a cognitive audit mechanism, surfacing mismatches between human and machine perception by visualizing where models diverge from human understanding. Our study with 15 participants found that users considered IKIWISI easy to use, made assessments that correlated with objective metrics when available, and reached informed conclusions by examining only a small fraction of heatmap cells. This approach not only complements traditional evaluation methods through visual assessment of model behavior with custom object sets, but also reveals opportunities for improving alignment between human perception and machine understanding in vision-language systems.
Abstract:This paper presents a curated list of 90 objects essential for the navigation of blind and low-vision (BLV) individuals, encompassing road, sidewalk, and indoor environments. We develop the initial list by analyzing 21 publicly available videos featuring BLV individuals navigating various settings. Then, we refine the list through feedback from a focus group study involving blind, low-vision, and sighted companions of BLV individuals. A subsequent analysis reveals that most contemporary datasets used to train recent computer vision models contain only a small subset of the objects in our proposed list. Furthermore, we provide detailed object labeling for these 90 objects across 31 video segments derived from the original 21 videos. Finally, we make the object list, the 21 videos, and object labeling in the 31 video segments publicly available. This paper aims to fill the existing gap and foster the development of more inclusive and effective navigation aids for the BLV community.
Abstract:In this thesis, we propose an approach to identity resolution across social media platforms using the topics, sentiments, and timings of the posts on the platforms. After collecting the public posts of around 5000 profiles from Disqus and Twitter, we analyze their posts to match their profiles across the two platforms. We pursue both temporal and non-temporal methods in our analysis. While neither approach proves definitively superior, the temporal approach generally performs better. We found that the temporal window size influences results more than the shifting amount. On the other hand, our sentiment analysis shows that the inclusion of sentiment makes little difference, probably due to flawed data extraction methods. We also experimented with a distance-based reward-and-punishment-focused scoring model, which achieved an accuracy of 24.198% and an average rank of 158.217 out of 2525 in our collected corpus. Future work includes refining sentiment analysis by evaluating sentiments per topic, extending temporal analysis with additional phases, and improving the scoring model through weight adjustments and modified rewards.
Abstract:This paper introduces a dataset for improving real-time object recognition systems to aid blind and low-vision (BLV) individuals in navigation tasks. The dataset comprises 21 videos of BLV individuals navigating outdoor spaces, and a taxonomy of 90 objects crucial for BLV navigation, refined through a focus group study. We also provide object labeling for the 90 objects across 31 video segments created from the 21 videos. A deeper analysis reveals that most contemporary datasets used in training computer vision models contain only a small subset of the taxonomy in our dataset. Preliminary evaluation of state-of-the-art computer vision models on our dataset highlights shortcomings in accurately detecting key objects relevant to BLV navigation, emphasizing the need for specialized datasets. We make our dataset publicly available, offering valuable resources for developing more inclusive navigation systems for BLV individuals.