Fatih Cagatay Akyon

Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Jun 02, 2026

Glenn Jocher, Jing Qiu, Mengyu Liu, Shuai Lyu, Fatih Cagatay Akyon, Muhammet Esat Kalfaoglu

Abstract: Real-time vision demands models that are accurate, efficient, and simple to deploy across diverse hardware. The YOLO family has become widely deployed for this reason, yet most YOLO detectors still rely on non-maximum suppression at inference, carry heavy detection heads due to Distribution Focal Loss, require long training schedules, and can leave the smallest objects without positive label assignments. We present Ultralytics YOLO26, a unified real-time vision model family that addresses these limitations through coordinated architecture and training advances. YOLO26 uses a dual-head design for native NMS-free end-to-end inference and removes DFL entirely, yielding a lighter head with unconstrained regression range. Its training pipeline combines MuSGD, a hybrid Muon-SGD optimizer adapted from large language model training; Progressive Loss, which shifts supervision toward the inference-time head; and STAL, a label assignment strategy that guarantees positive coverage for small objects. Beyond detection, YOLO26 introduces task-specific head and loss designs for instance segmentation, pose estimation, and oriented detection, producing consistent gains across tasks and scales. The family spans five scales (n/s/m/l/x) and supports detection, instance segmentation, pose estimation, classification, and oriented detection in a single pipeline, with an open-vocabulary extension, YOLOE-26, for text-, visual-, and prompt-free inference. Across all scales, YOLO26 achieves 40.9-57.5 mAP on COCO at 1.7-11.8 ms T4 TensorRT latency, advancing the accuracy-latency Pareto front over prior real-time detectors, while YOLOE-26x reaches 40.6 AP on LVIS minival under text prompting. Code and models are available at https://github.com/ultralytics/ultralytics.

* 31 pages, 8 figures
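
For readers unfamiliar with the post-processing step this abstract says most YOLO detectors still depend on, the following is a minimal sketch of classic greedy non-maximum suppression. It is not taken from the paper or from the Ultralytics codebase; the function names, the `(x_min, y_min, x_max, y_max)` box layout, and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(greedy_nms(demo_boxes, demo_scores))
```

An end-to-end detector in the sense described above emits its final boxes directly, so a pairwise filtering pass like this one is not needed at inference time.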

SenBen: Sensitive Scene Graphs for Explainable Content Moderation

Apr 09, 2026

Fatih Cagatay Akyon, Alptekin Temizel

Abstract: Content moderation systems classify images as safe or unsafe but lack spatial grounding and interpretability: they cannot explain what sensitive behavior was detected, who is involved, or where it occurs. We introduce the Sensitive Benchmark (SenBen), the first large-scale scene graph benchmark for sensitive content, comprising 13,999 frames from 157 movies annotated with Visual Genome-style scene graphs (25 object classes, 28 attributes including affective states such as pain, fear, aggression, and distress, 14 predicates) and 16 sensitivity tags across 5 categories. We distill a frontier VLM into a compact 241M student model using a multi-task recipe that addresses vocabulary imbalance in autoregressive scene graph generation through suffix-based object identity, Vocabulary-Aware Recall (VAR) Loss, and a decoupled Query2Label tag head with asymmetric loss, yielding a +6.4 percentage point improvement in SenBen Recall over standard cross-entropy training. On grounded scene graph metrics, our student model outperforms all evaluated VLMs except Gemini models and all commercial safety APIs, while achieving the highest object detection and captioning scores across all models, at $7.6\times$ faster inference and $16\times$ less GPU memory.

* Accepted at CVPRW 2026

DroBoost: An Intelligent Score and Model Boosting Method for Drone Detection

Jun 30, 2024

Ogulcan Eryuksel, Kamil Anil Ozfuttu, Fatih Cagatay Akyon, Kadir Sahin, Efe Buyukborekci, Devrim Cavusoglu, Sinan Altinuc

Abstract: Drone detection is a challenging object detection task where visibility conditions and quality of the images may be unfavorable, and detections might become difficult due to complex backgrounds, small visible objects, and hard-to-distinguish objects. Both provide high confidence for drone detections, and eliminating false detections requires efficient algorithms and approaches. Our previous work, built on YOLOv5, uses both real and synthetic data and a Kalman-based tracker to track the detections and increase their confidence using temporal information. Our current work builds on that approach with several improvements. We use a more diverse dataset that combines multiple sources with synthetic samples chosen from a large synthetic dataset based on an error analysis of the base model. To obtain more resilient confidence scores for objects, we introduce a classification component that discriminates whether the object is a drone or not. Finally, we develop a more advanced scoring algorithm for object tracking that we use to adjust localization confidence. The proposed technique won 1st place in the Drone vs. Bird Challenge (Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques at ICIAP 2021).

State-of-the-Art in Nudity Classification: A Comparative Analysis

Dec 26, 2023

Fatih Cagatay Akyon, Alptekin Temizel

Abstract: This paper presents a comparative analysis of existing nudity classification techniques for classifying images based on the presence of nudity, with a focus on their application in content moderation. The evaluation focuses on CNN-based models, vision transformers, and popular open-source safety checkers from Stable Diffusion and the Large-scale Artificial Intelligence Open Network (LAION). The study identifies the limitations of current evaluation datasets and highlights the need for more diverse and challenging datasets. The paper discusses the potential implications of these findings for developing more accurate and effective image classification systems on online platforms. Overall, the study emphasizes the importance of continually improving image classification models to ensure the safety and well-being of platform users. The project page, including the demonstrations and results, is publicly available at https://github.com/fcakyon/content-moderation-deep-learning.

* Published at ICASSP 2023

Deep Architectures for Content Moderation and Movie Content Rating

Dec 12, 2022

Fatih Cagatay Akyon, Alptekin Temizel

Abstract: Rating a video based on its content is an important step for classifying video age categories. Movie content rating and TV show rating are the two most common rating systems established by professional committees. However, manual review and evaluation of scene/film content by a committee is tedious work, and it becomes increasingly difficult with the ever-growing amount of online video content. As such, a desirable solution is to use computer vision based video content analysis techniques to automate the evaluation process. In this paper, related works are summarized for action recognition, multi-modal learning, movie genre classification, and sensitive content detection in the context of content moderation and movie content rating. The project page is available at https://github.com/fcakyon/content-moderation-deep-learning.

Sequence Models for Drone vs Bird Classification

Jul 21, 2022

Fatih Cagatay Akyon, Erdem Akagunduz, Sinan Onur Altinuc, Alptekin Temizel

Abstract: Drone detection has become an essential task in object detection as drone costs have decreased and drone technology has improved. It is, however, difficult to detect distant drones when there is weak contrast, long range, and low visibility. In this work, we propose several sequence classification architectures to reduce the detected false-positive ratio of drone tracks. Moreover, we propose a new drone vs. bird sequence classification dataset to train and evaluate the proposed architectures. 3D CNN, LSTM, and Transformer based sequence classification architectures have been trained on the proposed dataset to show the effectiveness of the proposed idea. As experiments show, using sequence information, bird classification and overall F1 scores can be increased by up to 73% and 35%, respectively. Among all sequence classification models, the R(2+1)D-based fully convolutional model yields the best transfer learning and fine-tuning results.

* Submitted to AVSS 2022

Classification of Intra-Pulse Modulation of Radar Signals by Feature Fusion Based Convolutional Neural Networks

May 19, 2022

Fatih Cagatay Akyon, Yasar Kemal Alp, Gokhan Gok, Orhan Arikan

Abstract: Detection and classification of radars based on the pulses they transmit is an important application in electronic warfare systems. In this work, we propose a novel deep-learning based technique that automatically recognizes intra-pulse modulation types of radar signals. The re-assigned spectrogram of the measured radar signal and the detected outliers of its instantaneous phases, filtered by a special function, are used for training multiple convolutional neural networks. Automatically extracted features from the networks are fused to distinguish frequency and phase modulated signals. Simulation results show that the proposed FF-CNN (Feature Fusion based Convolutional Neural Network) technique outperforms the current state-of-the-art alternatives and is easily scalable across a broad range of modulation types.

* Published at EUSIPCO 2018 (26th European Signal Processing Conference)

Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection

Feb 15, 2022

Fatih Cagatay Akyon, Sinan Onur Altinuc, Alptekin Temizel

Abstract: Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by a small number of pixels in the image and lack sufficient detail, making them difficult to detect using conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed that provides a generic slicing aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without any fine-tuning. Experimental evaluations, using object detection baselines on the VisDrone and xView aerial object detection datasets, show that the proposed inference method can increase object detection AP by 6.8%, 5.1% and 5.3% for FCOS, VFNet and TOOD detectors, respectively. Moreover, the detection accuracy can be further increased with slicing aided fine-tuning, resulting in a cumulative increase of 12.7%, 13.4% and 14.5% AP in the same order. The proposed technique has been integrated with Detectron2, MMDetection and YOLOv5 models and is publicly available at https://github.com/obss/sahi.git.

* Submitted to ICIP 2022, 5 pages, 4 figures, 2 tables
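
The slicing idea in this abstract can be illustrated with a small sketch: split the image into overlapping windows, run any detector on each window, then shift the predicted boxes back into full-image coordinates. The code below is an assumption-laden illustration, not the `sahi` library's API; the function names, the 20% overlap default, and the edge-handling rule (the last window is pulled back so it stays inside the image) are all hypothetical choices. Merging duplicate predictions from overlapping windows is deliberately left out.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)


def compute_slice_boxes(image_w: int, image_h: int,
                        slice_w: int, slice_h: int,
                        overlap: float = 0.2) -> List[Box]:
    """Tile an image with overlapping windows that never leave the image."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step_x = max(1, int(slice_w * (1.0 - overlap)))
    step_y = max(1, int(slice_h * (1.0 - overlap)))
    boxes: List[Box] = []
    y = 0
    while True:
        y_max = min(y + slice_h, image_h)
        y_min = max(0, y_max - slice_h)  # pull the last row back inside
        x = 0
        while True:
            x_max = min(x + slice_w, image_w)
            x_min = max(0, x_max - slice_w)  # pull the last column back inside
            boxes.append((x_min, y_min, x_max, y_max))
            if x_max >= image_w:
                break
            x += step_x
        if y_max >= image_h:
            break
        y += step_y
    return boxes


def shift_to_full_image(box: Box, slice_box: Box) -> Box:
    """Map a box predicted inside a slice back to full-image coordinates."""
    dx, dy = slice_box[0], slice_box[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


if __name__ == "__main__":
    for window in compute_slice_boxes(1000, 500, 512, 512):
        print(window)
```

Because each window is processed at the detector's native input size, a small object occupies a larger share of the pixels the detector sees, which is the intuition behind the reported gains.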

Track Boosting and Synthetic Data Aided Drone Detection

Dec 01, 2021

Fatih Cagatay Akyon, Ogulcan Eryuksel, Kamil Anil Ozfuttu, Sinan Onur Altinuc

Abstract: This is the paper for the first place winning solution of the Drone vs. Bird Challenge, organized by AVSS 2021. As the usage of drones increases with lowered costs and improved drone technology, drone detection emerges as a vital object detection task. However, detecting distant drones under unfavorable conditions, namely weak contrast, long range, and low visibility, requires effective algorithms. Our method approaches the drone detection problem by fine-tuning a YOLOv5 model with real and synthetically generated data and using a Kalman-based object tracker to boost detection confidence. Our results indicate that augmenting the real data with an optimal subset of synthetic data can increase the performance. Moreover, temporal information gathered by object tracking methods can increase performance further.

Automated question generation and question answering from Turkish texts using text-to-text transformers

Nov 21, 2021

Fatih Cagatay Akyon, Devrim Cavusoglu, Cemil Cengiz, Sinan Onur Altinuc, Alptekin Temizel

Abstract: While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience and resources. To reduce the expenses associated with the manual construction of questions and to satisfy the need for a continuous supply of new questions, automatic question generation (QG) techniques can be utilized. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG and answer extraction tasks using a Turkish QA dataset. To the best of our knowledge, this is the first academic work that attempts to perform automated text-to-text question generation from Turkish texts. Evaluation results show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on the TQuADv1 and TQuADv2 datasets and the XQuAD Turkish split. The source code and pre-trained models are available at https://github.com/obss/turkish-question-generation.

* 10 pages, 3 figures, 7 tables
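
In a text-to-text multi-task setup such as the one this abstract describes, a single annotated QA triple can be turned into one training pair per task by changing the input prefix. The sketch below shows that general pattern only. The prefixes (`question:`, `generate question:`, `extract answers:`), the `<hl>` highlight marker, and the function name are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, List


def build_multitask_examples(context: str, question: str,
                             answer: str) -> List[Dict[str, str]]:
    """Create one (source, target) pair per task from a single QA triple.

    All prefixes and the <hl> marker are illustrative assumptions.
    """
    # Mark the first occurrence of the answer so the QG task knows
    # which span the generated question should target.
    highlighted = context.replace(answer, f"<hl> {answer} <hl>", 1)
    return [
        {"task": "qa",
         "source": f"question: {question} context: {context}",
         "target": answer},
        {"task": "qg",
         "source": f"generate question: {highlighted}",
         "target": question},
        {"task": "answer_extraction",
         "source": f"extract answers: {context}",
         "target": answer},
    ]


if __name__ == "__main__":
    pairs = build_multitask_examples(
        context="Ankara is the capital of Turkey.",
        question="What is the capital of Turkey?",
        answer="Ankara",
    )
    for pair in pairs:
        print(pair["task"], "|", pair["source"], "->", pair["target"])
```

Mixing the three kinds of pairs in one training set lets a single sequence-to-sequence model serve all three tasks, with the prefix selecting the behavior at inference time.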
