Internal porosity remains a critical defect mode in additively manufactured components, compromising structural performance and limiting industrial adoption. Automated defect detection methods exist but lack interpretability, preventing engineers from understanding the physical basis of criticality predictions. This study presents an explainable computer vision framework for pore detection and criticality assessment in three-dimensional tomographic volumes. Sequential grayscale slices were reconstructed into volumetric datasets, and intensity-based thresholding with connected component analysis identified 500 individual pores. Each pore was characterized using geometric descriptors including size, aspect ratio, extent, and spatial position relative to the specimen boundary. A pore interaction network was constructed using percentile-based Euclidean distance criteria, yielding 24,950 inter-pore connections. Machine learning models predicted pore criticality scores from extracted features, and SHAP analysis quantified individual feature contributions. Results demonstrate that normalized surface distance dominates model predictions, carrying more than an order of magnitude greater importance than any other descriptor. Pore size exerts only minimal influence, and the remaining geometric descriptors show negligible impact. The strong inverse relationship between surface proximity and criticality reveals boundary-driven failure mechanisms. This interpretable framework enables transparent defect assessment and provides actionable insights for process optimization and quality control in additive manufacturing.
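A minimal sketch of the detection pipeline described in this abstract (intensity thresholding, connected-component labeling, and per-pore descriptors including a normalized surface-distance feature), using NumPy and SciPy. The threshold value, mask conventions, and function names are illustrative assumptions, not the study's actual implementation.

```python
# Illustrative pore-detection sketch: thresholding, connected-component labeling,
# and simple geometric descriptors. Parameter values are placeholders.
import numpy as np
from scipy import ndimage

def extract_pore_features(volume, specimen_mask, pore_threshold=80):
    """volume: 3-D grayscale array; specimen_mask: boolean array of solid material."""
    # Pores appear as dark voxels inside the specimen.
    pores = (volume < pore_threshold) & specimen_mask

    # Connected-component analysis assigns an integer label to each pore.
    labels, n_pores = ndimage.label(pores)

    # Distance from every interior voxel to the specimen boundary.
    dist_to_surface = ndimage.distance_transform_edt(specimen_mask)
    max_dist = dist_to_surface.max()

    features = []
    for idx in range(1, n_pores + 1):
        voxels = np.argwhere(labels == idx)
        size = len(voxels)                                  # pore volume in voxels
        centroid = voxels.mean(axis=0)
        extents = voxels.max(axis=0) - voxels.min(axis=0) + 1
        aspect_ratio = extents.max() / extents.min()
        # Normalized surface distance: 0 at the boundary, 1 at the deepest interior point.
        surf_dist = dist_to_surface[tuple(np.round(centroid).astype(int))] / max_dist
        features.append({"size": size, "aspect_ratio": float(aspect_ratio),
                         "extent": int(extents.max()), "surface_distance": float(surf_dist)})
    return features
```

The resulting feature table can be fed to any tabular model, with SHAP applied afterwards to attribute predictions to the individual descriptors.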
The rapid digitalization of customer service has intensified the demand for conversational agents capable of providing accurate and natural interactions. In the Algerian context, this is complicated by the linguistic complexity of Darja, a dialect characterized by non-standardized orthography, extensive code-switching with French, and the simultaneous use of Arabic and Latin (Arabizi) scripts. This paper introduces DziriBOT, a hybrid intelligent conversational agent specifically engineered to overcome these challenges. We propose a multi-layered architecture that integrates specialized Natural Language Understanding (NLU) with Retrieval-Augmented Generation (RAG), allowing for both structured service flows and dynamic, knowledge-intensive responses grounded in curated enterprise documentation. To address the low-resource nature of Darja, we systematically evaluate three distinct approaches: a sparse-feature Rasa pipeline, classical machine learning baselines, and transformer-based fine-tuning. Our experimental results demonstrate that the fine-tuned DziriBERT model achieves state-of-the-art performance, significantly outperforming traditional baselines, particularly in handling orthographic noise and rare intents. Ultimately, DziriBOT provides a robust, scalable solution that bridges the gap between formal language models and the linguistic realities of Algerian users, offering a blueprint for dialect-aware automation in the regional market.
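A minimal sketch of the transformer fine-tuning stage for Darja intent classification with Hugging Face Transformers. The checkpoint id, intent labels, example utterances, and hyperparameters are assumptions for illustration, not the paper's configuration.

```python
# Illustrative fine-tuning of a Darja BERT encoder for intent classification.
# Checkpoint id, labels, and hyperparameters are assumptions, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "alger-ia/dziribert"            # assumed public DziriBERT checkpoint
intents = ["check_balance", "report_issue", "ask_offer"]   # hypothetical intents

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=len(intents))

# Toy training pairs mixing Arabizi and Arabic-script Darja.
texts = ["ch7al rani khalass?", "عندي مشكل في الخدمة", "wach kayen des offres jdod?"]
labels = torch.tensor([0, 1, 2])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):                     # tiny loop; a real run iterates over a DataLoader
    optimizer.zero_grad()
    out = model(**batch, labels=labels)    # cross-entropy loss computed internally
    out.loss.backward()
    optimizer.step()
```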
Automating the translation of natural-language specifications into logic programs is a challenging task central to neurosymbolic engineering. We present ASP-Bench, a benchmark comprising 128 natural-language problem instances: 64 base problems, each with an easy and a hard variant. It evaluates systems that translate natural-language problems into Answer Set Programs (ASPs), a prominent form of logic programming. It provides systematic coverage of ASP features, including choice rules, aggregates, and optimization. Each problem includes reference validators that check whether solutions satisfy the problem specification. We characterize problems along seven largely independent reasoning aspects (optimization, temporal reasoning, default logic, resource allocation, recursion, spatial reasoning, and quantitative complexity), providing a multidimensional view of modeling difficulty. We test the benchmark using an agentic approach based on the ReAct (Reason and Act) framework, which achieves full saturation, demonstrating that iterative refinement driven by solver feedback provides a reliable and robust approach for modeling natural-language problems in ASP. Our analysis across multiple agent runs enables us to gain insights into what determines a problem's modeling hardness.
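A minimal sketch of a solver-in-the-loop refinement cycle of the kind the agentic approach relies on: a draft encoding is solved with clingo, and grounding errors or rejected answer sets are fed back into the next drafting attempt. The `draft_program` and `validate` callables stand in for the LLM call and the benchmark's reference validators and are hypothetical placeholders.

```python
# Illustrative solver-in-the-loop refinement for natural language -> ASP translation.
# draft_program() stands in for an LLM call, validate() for a reference validator.
import clingo

def solve_asp(program: str):
    """Ground and solve an ASP encoding, returning (answer_sets, error_message)."""
    ctl = clingo.Control(["0"])               # "0" = enumerate all answer sets
    try:
        ctl.add("base", [], program)
        ctl.ground([("base", [])])
    except RuntimeError as err:               # syntax/grounding errors become feedback
        return [], str(err)
    models = []
    ctl.solve(on_model=lambda m: models.append([str(a) for a in m.symbols(shown=True)]))
    return models, ""

def refine(problem_text, draft_program, validate, max_iters=5):
    feedback = ""
    for _ in range(max_iters):
        program = draft_program(problem_text, feedback)   # LLM proposes or repairs an encoding
        models, error = solve_asp(program)
        if error:
            feedback = f"Solver error: {error}"
            continue
        ok, reason = validate(models)                     # check against the problem specification
        if ok:
            return program
        feedback = f"Answer sets rejected: {reason}"
    return None
```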
Audio Foundation Models (AFMs), a specialized category of Generative AI (GenAI), have the potential to transform signal processing (SP) education by integrating core applications such as speech and audio enhancement, denoising, source separation, feature extraction, automatic classification, and real-time signal analysis into learning and research. This paper introduces SPEduAFM, a conceptual AFM tailored for SP education, bridging traditional SP principles with GenAI-driven innovations. Through an envisioned case study, we outline how AFMs can enable a range of applications, including automated lecture transcription, interactive demonstrations, and inclusive learning tools, showcasing their potential to transform abstract concepts into engaging, practical experiences. This paper also addresses challenges such as ethics, explainability, and customization by highlighting dynamic, real-time auditory interactions that foster experiential and authentic learning. By presenting SPEduAFM as a forward-looking vision, we aim to inspire broader adoption of GenAI in engineering education, enhancing accessibility, engagement, and innovation in the classroom and beyond.
Large language models (LLMs) are increasingly used to automate feature engineering in tabular learning. Given task-specific information, LLMs can propose diverse feature transformation operations to enhance downstream model performance. However, current approaches typically treat the LLM as a black-box optimizer responsible for both proposing and selecting operations based solely on its internal heuristics. These heuristics often lack calibrated estimates of operation utility and consequently lead to repeated exploration of low-yield operations, without a principled strategy for prioritizing promising directions. In this paper, we propose a human-LLM collaborative feature engineering framework for tabular learning. We begin by decoupling the transformation operation proposal and selection processes: LLMs are used solely to generate operation candidates, while selection is guided by explicitly modeling the utility and uncertainty of each proposed operation. Since accurate utility estimation can be difficult, especially in the early rounds of feature engineering, we design a mechanism within the framework that selectively elicits human expert preference feedback, i.e., comparisons of which operations are more promising, and incorporates it into the selection process to help identify more effective operations. Our evaluations on both a synthetic study and a real user study demonstrate that the proposed framework improves feature engineering performance across a variety of tabular datasets and reduces users' cognitive load during the feature engineering process.
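One simple way to make "explicitly modeling utility and uncertainty" concrete is a bandit-style selector: a UCB-like score over LLM-proposed operations, with pairwise human preferences nudging the utility estimates. The scoring rule and the preference update below are our own illustrative choices, not necessarily the framework's actual model.

```python
# Illustrative selection of LLM-proposed feature-engineering operations via explicit
# utility/uncertainty estimates plus optional pairwise human preference feedback.
import math

class OperationSelector:
    def __init__(self, explore_weight=1.0):
        self.stats = {}                        # op -> {"mean": utility estimate, "count": trials}
        self.explore_weight = explore_weight
        self.total_trials = 0

    def _score(self, op):
        s = self.stats.setdefault(op, {"mean": 0.0, "count": 0})
        if s["count"] == 0:
            return float("inf")                # untried operations are maximally uncertain
        bonus = self.explore_weight * math.sqrt(math.log(self.total_trials + 1) / s["count"])
        return s["mean"] + bonus               # estimated utility + uncertainty bonus

    def select(self, candidate_ops):
        return max(candidate_ops, key=self._score)

    def update_utility(self, op, observed_gain):
        """Record the validation-score gain obtained after applying the operation."""
        s = self.stats.setdefault(op, {"mean": 0.0, "count": 0})
        s["count"] += 1
        s["mean"] += (observed_gain - s["mean"]) / s["count"]
        self.total_trials += 1

    def update_preference(self, preferred_op, other_op, step=0.1):
        """Incorporate a human comparison: shift utility toward the preferred operation."""
        for op in (preferred_op, other_op):
            self.stats.setdefault(op, {"mean": 0.0, "count": 1})
        self.stats[preferred_op]["mean"] += step
        self.stats[other_op]["mean"] -= step
```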
In today's highly competitive employment environment, the selection of suitable personnel is essential for organizational success. This study presents an automated personnel selection system that utilizes sophisticated natural language processing (NLP) methods to assess and rank software engineering applicants. A distinctive dataset was created by aggregating LinkedIn profiles that include essential features such as education, work experience, abilities, and self-introduction, further enhanced with expert assessments that serve as reference standards. The research combines large language models (LLMs) with multicriteria decision-making (MCDM) theory to develop the LLM-TOPSIS framework. In this context, we applied the TOPSIS method enhanced by fuzzy logic (Fuzzy TOPSIS) to address the intrinsic ambiguity and subjectivity in human assessments. We used triangular fuzzy numbers (TFNs) to represent criteria weights and scores, thereby addressing the ambiguity frequently encountered in candidate evaluations. For candidate ranking, the DistilRoBERTa model was fine-tuned and integrated with the fuzzy TOPSIS method, achieving rankings closely aligned with human expert evaluations and attaining an accuracy of up to 91% for the Experience attribute and the Overall attribute. The study underlines the potential of NLP-driven frameworks to improve recruitment procedures by enhancing scalability and consistency while reducing bias. Future endeavors will concentrate on augmenting the dataset, enhancing model interpretability, and verifying the system in actual recruitment scenarios to better evaluate its practical applicability. This research highlights the promising potential of merging NLP with fuzzy decision-making methods in personnel selection, enabling scalable and unbiased solutions to recruitment difficulties.
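A compact worked sketch of the Fuzzy TOPSIS ranking step with triangular fuzzy numbers, following the standard formulation (normalize, weight, measure distances to the fuzzy positive/negative ideal solutions, rank by closeness coefficient). The candidate scores, criteria, and weights are invented for illustration and all criteria are treated as benefit criteria.

```python
# Illustrative Fuzzy TOPSIS ranking with triangular fuzzy numbers (TFNs).
import math

def fuzzy_topsis(matrix, weights):
    """matrix[i][j] and weights[j] are TFNs (a, b, c); returns closeness coefficients."""
    n_alts, n_crit = len(matrix), len(weights)

    # Normalize each criterion by its largest upper bound, then apply fuzzy weights.
    weighted = []
    for i in range(n_alts):
        row = []
        for j in range(n_crit):
            c_max = max(matrix[k][j][2] for k in range(n_alts))
            a, b, c = matrix[i][j]
            wa, wb, wc = weights[j]
            row.append((a / c_max * wa, b / c_max * wb, c / c_max * wc))
        weighted.append(row)

    def dist(m, n):
        # Vertex distance between two TFNs.
        return math.sqrt(sum((m[k] - n[k]) ** 2 for k in range(3)) / 3)

    fpis, fnis = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)    # fuzzy positive/negative ideal solutions
    scores = []
    for row in weighted:
        d_plus = sum(dist(v, fpis) for v in row)
        d_minus = sum(dist(v, fnis) for v in row)
        scores.append(d_minus / (d_plus + d_minus))   # closeness coefficient in [0, 1]
    return scores

# Two hypothetical candidates scored on Experience and Skills (linguistic terms mapped to TFNs).
candidates = [[(7, 9, 10), (5, 7, 9)],
              [(3, 5, 7), (7, 9, 10)]]
criterion_weights = [(0.5, 0.7, 0.9), (0.3, 0.5, 0.7)]
print(fuzzy_topsis(candidates, criterion_weights))    # higher value = better ranked
```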
While Large Language Models (LLMs) show significant potential in hardware engineering, current benchmarks suffer from saturation and limited task diversity, failing to reflect LLMs' performance in real industrial workflows. To address this gap, we propose a comprehensive benchmark for AI-aided chip design that rigorously evaluates LLMs across three critical tasks: Verilog generation, debugging, and reference model generation. Our benchmark features 44 realistic modules with complex hierarchical structures, 89 systematic debugging cases, and 132 reference model samples across Python, SystemC, and CXXRTL. Evaluation results reveal substantial performance gaps, with state-of-the-art Claude-4.5-opus achieving only 30.74% on Verilog generation and 13.33% on Python reference model generation, demonstrating significant challenges compared to existing saturated benchmarks where SOTA models achieve over 95% pass rates. Additionally, to help enhance LLM reference model generation, we provide an automated toolbox for high-quality training data generation, facilitating future research in this underexplored domain. Our code is available at https://github.com/zhongkaiyu/ChipBench.git.
Webshells remain a primary foothold for attackers to compromise servers, particularly within PHP ecosystems. However, existing detection mechanisms often struggle to keep pace with rapid variant evolution and sophisticated obfuscation techniques that camouflage malicious intent. Furthermore, many current defenses suffer from high false-alarm rates when encountering benign administrative scripts that employ heavy obfuscation for intellectual property protection. To address these challenges, we present ShellForge, an adversarial co-evolution framework that couples automated webshell generation with multi-view detection to continuously harden defensive boundaries. The framework operates through an iterative co-training loop where a generator and a detector mutually reinforce each other via the exchange of hard samples. The generator is optimized through supervised fine-tuning and preference-based reinforcement learning to synthesize functional, highly evasive variants. Simultaneously, we develop a multi-view fusion detector that integrates semantic features from long-string compression, structural features from pruned abstract syntax trees, and global statistical indicators such as Shannon entropy. To minimize false positives, ShellForge utilizes an LLM-based transformation to create de-malicious samples (scripts that retain complex obfuscation patterns but lack harmful payloads) that serve as high-quality hard negatives during training. Evaluations on the public FWOID benchmark demonstrate that ShellForge significantly enhances defensive robustness. Upon convergence, the detector maintains a 0.981 F1-score while the generator achieves a 0.939 evasion rate against commercial engines on VirusTotal.
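A small sketch of the kind of global statistical indicators a webshell detector can fuse: byte-level Shannon entropy, longest quoted string, and counts of eval-like calls over a PHP source. The feature set and the regexes below are a simplified illustration, not ShellForge's actual detector views.

```python
# Illustrative global indicators for a PHP script: Shannon entropy and simple lexical
# statistics (obfuscated payloads tend to push entropy and string lengths upward).
import math
import re

def shannon_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    counts = {}
    for b in data:
        counts[b] = counts.get(b, 0) + 1
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def simple_indicators(source: str) -> dict:
    strings = re.findall(r'"([^"]*)"|\'([^\']*)\'', source)
    longest = max((len(a or b) for a, b in strings), default=0)
    return {
        "entropy": shannon_entropy(source.encode("utf-8", errors="ignore")),
        "longest_string": longest,
        "eval_like_calls": len(re.findall(r'\b(eval|assert|system|shell_exec)\s*\(', source)),
    }

print(simple_indicators('<?php eval(base64_decode("aGVsbG8=")); ?>'))
```

In a multi-view setup, such statistics would be concatenated with compression-based and AST-derived features before classification.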
Precise segmentation of retinal arteries and veins underpins the diagnosis of systemic cardiovascular conditions. However, standard convolutional architectures often yield topologically disjointed segmentations, characterized by gaps and discontinuities that render reliable graph-based clinical analysis impossible despite high pixel-level accuracy. To address this, we introduce a topology-aware framework engineered to maintain vascular connectivity. Our architecture incorporates a Topological Feature Fusion Module (TFFM) that maps local feature representations into a latent graph space and deploys Graph Attention Networks to capture global structural dependencies often missed by fixed receptive fields. Furthermore, we drive the learning process with a hybrid objective function, coupling Tversky loss for class imbalance with soft clDice loss to explicitly penalize topological disconnects. Evaluation on the Fundus-AVSeg dataset reveals state-of-the-art performance, achieving a combined Dice score of 90.97% and a 95% Hausdorff Distance of 3.50 pixels. Notably, our method decreases vessel fragmentation by approximately 38% relative to baselines, yielding topologically coherent vascular trees viable for automated biomarker quantification. We open-source our code at https://tffm-module.github.io/.
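A minimal PyTorch sketch of such a hybrid objective: a Tversky term for class imbalance plus a soft clDice term built on iterative soft skeletonization, following the published soft-clDice formulation. The alpha/beta values, iteration count, and mixing weight are placeholders rather than the paper's settings.

```python
# Illustrative hybrid objective: Tversky loss + soft clDice loss for 2-D vessel masks.
# Hyperparameters below are placeholders, not the paper's configuration.
import torch
import torch.nn.functional as F

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """pred, target: (N, 1, H, W) tensors in [0, 1]; beta > alpha penalizes false negatives more."""
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    return 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def soft_skeleton(x, iterations=10):
    """Differentiable skeletonization by repeated soft erosion and opening."""
    def erode(img):   return -F.max_pool2d(-img, kernel_size=3, stride=1, padding=1)
    def opening(img): return F.max_pool2d(erode(img), kernel_size=3, stride=1, padding=1)
    skel = F.relu(x - opening(x))
    for _ in range(iterations):
        x = erode(x)
        delta = F.relu(x - opening(x))
        skel = skel + F.relu(delta - skel * delta)
    return skel

def soft_cldice_loss(pred, target, iterations=10, eps=1e-6):
    skel_pred, skel_true = soft_skeleton(pred, iterations), soft_skeleton(target, iterations)
    tprec = ((skel_pred * target).sum() + eps) / (skel_pred.sum() + eps)   # topology precision
    tsens = ((skel_true * pred).sum() + eps) / (skel_true.sum() + eps)     # topology sensitivity
    return 1 - 2 * tprec * tsens / (tprec + tsens)

def hybrid_loss(pred, target, cl_weight=0.3):
    return (1 - cl_weight) * tversky_loss(pred, target) + cl_weight * soft_cldice_loss(pred, target)
```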
SQL is central to enterprise data engineering, yet generating fully correct SQL code in a single attempt remains difficult, even for experienced developers and advanced text-to-SQL LLMs, often requiring multiple debugging iterations. We introduce OurBench, the first benchmark for enterprise-level SQL reasoning and debugging. Our benchmark is built on two key innovations: (1) an automated construction workflow that uses reverse engineering to systematically inject realistic bugs into large-scale SQL code, enabling scalable and diverse benchmark generation; and (2) an execution-free evaluation framework tailored to enterprise settings, providing fast, accurate, and resource-efficient assessment. OurBench comprises 469 OurBenchSyn queries featuring syntax errors with explicit error messages, and 516 OurBenchSem queries targeting semantic errors in which the code fails to meet user intent. The queries are highly complex, averaging over 140 lines and featuring deep and wide abstract syntax trees. Evaluation of nearly 30 LLMs reveals a substantial performance gap: the best-performing model, Claude-4-Sonnet, achieves only 36.46 percent accuracy on OurBenchSyn and 32.17 percent on OurBenchSem, while most models score below 20 percent. We further explore four solution strategies, identify key challenges, and outline promising directions for enterprise SQL debugging with LLMs.
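To make "deep and wide abstract syntax trees" concrete, a short sketch that measures AST depth and node count for a SQL query using the open-source sqlglot parser; sqlglot is our choice for illustration and not necessarily the benchmark's tooling, and the example query is invented.

```python
# Illustrative measurement of SQL AST depth and size with the sqlglot parser.
import sqlglot
from sqlglot.expressions import Expression

def ast_stats(sql: str):
    tree = sqlglot.parse_one(sql)

    def children(node):
        out = []
        for value in node.args.values():
            if isinstance(value, list):
                out.extend(v for v in value if isinstance(v, Expression))
            elif isinstance(value, Expression):
                out.append(value)
        return out

    def depth(node):
        kids = children(node)
        return 1 + (max(map(depth, kids)) if kids else 0)   # longest root-to-leaf path

    def count(node):
        return 1 + sum(count(k) for k in children(node))    # total number of AST nodes

    return {"depth": depth(tree), "nodes": count(tree)}

print(ast_stats("SELECT t.id, SUM(o.amount) FROM t JOIN o ON t.id = o.tid "
                "WHERE o.ts > '2024-01-01' GROUP BY t.id HAVING SUM(o.amount) > 100"))
```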