Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Marcelo Becker

The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks

Jun 11, 2025

João Manoel Herrera Pinheiro, Suzana Vilas Boas de Oliveira, Thiago Henrique Segreto Silva, Pedro Antonio Rabelo Saraiva, Enzo Ferreira de Souza, Leonardo André Ambrosio, Marcelo Becker

Abstract:This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques - including several less common transformations - across 14 different Machine Learning algorithms and 16 datasets for classification and regression tasks. We meticulously analyzed impacts on predictive performance (using metrics such as accuracy, MAE, MSE, and $R^2$) and computational costs (training time, inference time, and memory usage). Key findings reveal that while ensemble methods (such as Random Forest and gradient boosting models like XGBoost, CatBoost and LightGBM) demonstrate robust performance largely independent of scaling, other widely used models such as Logistic Regression, SVMs, TabNet, and MLPs show significant performance variations highly dependent on the chosen scaler. This extensive empirical analysis, with all source code, experimental results, and model parameters made publicly available to ensure complete transparency and reproducibility, offers model-specific crucial guidance to practitioners on the need for an optimal selection of feature scaling techniques.

* 27 pages

Via

Access Paper or Ask Questions

A Leaf-Level Dataset for Soybean-Cotton Detection and Segmentation

Mar 03, 2025

Thiago H. Segreto, Juliano Negri, Paulo H. Polegato, João Manoel Herrera Pinheiro, Ricardo Godoy, Marcelo Becker

Abstract:Soybean and cotton are major drivers of many countries' agricultural sectors, offering substantial economic returns but also facing persistent challenges from volunteer plants and weeds that hamper sustainable management. Effectively controlling volunteer plants and weeds demands advanced recognition strategies that can identify these amidst complex crop canopies. While deep learning methods have demonstrated promising results for leaf-level detection and segmentation, existing datasets often fail to capture the complexity of real-world agricultural fields. To address this, we collected 640 high-resolution images from a commercial farm spanning multiple growth stages, weed pressures, and lighting variations. Each image is annotated at the leaf-instance level, with 7,221 soybean and 5,190 cotton leaves labeled via bounding boxes and segmentation masks, capturing overlapping foliage, small leaf size, and morphological similarities. We validate this dataset using YOLOv11, demonstrating state-of-the-art performance in accurately identifying and segmenting overlapping foliage. Our publicly available dataset supports advanced applications such as selective herbicide spraying and pest monitoring and can foster more robust, data-driven strategies for soybean-cotton management.

Via

Access Paper or Ask Questions

CropNav: a Framework for Autonomous Navigation in Real Farms

Nov 17, 2024

Mateus Valverde Gasparino, Vitor Akihiro Hisano Higuti, Arun Narenthiran Sivakumar, Andres Eduardo Baquero Velasquez, Marcelo Becker, Girish Chowdhary

Figure 1 for CropNav: a Framework for Autonomous Navigation in Real Farms

Figure 2 for CropNav: a Framework for Autonomous Navigation in Real Farms

Figure 3 for CropNav: a Framework for Autonomous Navigation in Real Farms

Figure 4 for CropNav: a Framework for Autonomous Navigation in Real Farms

Abstract:Small robots that can operate under the plant canopy can enable new possibilities in agriculture. However, unlike larger autonomous tractors, autonomous navigation for such under canopy robots remains an open challenge because Global Navigation Satellite System (GNSS) is unreliable under the plant canopy. We present a hybrid navigation system that autonomously switches between different sets of sensing modalities to enable full field navigation, both inside and outside of crop. By choosing the appropriate path reference source, the robot can accommodate for loss of GNSS signal quality and leverage row-crop structure to autonomously navigate. However, such switching can be tricky and difficult to execute over scale. Our system provides a solution by automatically switching between an exteroceptive sensing based system, such as Light Detection And Ranging (LiDAR) row-following navigation and waypoints path tracking. In addition, we show how our system can detect when the navigate fails and recover automatically extending the autonomous time and mitigating the necessity of human intervention. Our system shows an improvement of about 750 m per intervention over GNSS-based navigation and 500 m over row following navigation.

* Presented in the 2023 IEEE International Conference on Robotics and Automation (ICRA)

Via

Access Paper or Ask Questions

Breast Cancer Classification Using Gradient Boosting Algorithms Focusing on Reducing the False Negative and SHAP for Explainability

Mar 14, 2024

João Manoel Herrera Pinheiro, Marcelo Becker

Abstract:Cancer is one of the diseases that kill the most women in the world, with breast cancer being responsible for the highest number of cancer cases and consequently deaths. However, it can be prevented by early detection and, consequently, early treatment. Any development for detection or perdition this kind of cancer is important for a better healthy life. Many studies focus on a model with high accuracy in cancer prediction, but sometimes accuracy alone may not always be a reliable metric. This study implies an investigative approach to studying the performance of different machine learning algorithms based on boosting to predict breast cancer focusing on the recall metric. Boosting machine learning algorithms has been proven to be an effective tool for detecting medical diseases. The dataset of the University of California, Irvine (UCI) repository has been utilized to train and test the model classifier that contains their attributes. The main objective of this study is to use state-of-the-art boosting algorithms such as AdaBoost, XGBoost, CatBoost and LightGBM to predict and diagnose breast cancer and to find the most effective metric regarding recall, ROC-AUC, and confusion matrix. Furthermore, our study is the first to use these four boosting algorithms with Optuna, a library for hyperparameter optimization, and the SHAP method to improve the interpretability of our model, which can be used as a support to identify and predict breast cancer. We were able to improve AUC or recall for all the models and reduce the False Negative for AdaBoost and LigthGBM the final AUC were more than 99.41\% for all models.

* 9 pages, 16 figures

Via

Access Paper or Ask Questions

Automatic Routing System for Intelligent Warehouses

Jul 13, 2023

Kelen C. T. Vivaldini, Jorge P. M. Galdames, Thales B. Pasqual, Rafael M. Sobral, Roberto C. Araújo, Marcelo Becker, Glauco A. P. Caurin

Abstract:Automation of logistic processes is essential to improve productivity and reduce costs. In this context, intelligent warehouses are becoming a key to logistic systems thanks to their ability of optimizing transportation tasks and, consequently, reducing costs. This paper initially presents briefly routing systems applied on intelligent warehouses. Then, we present the approach used to develop our router system. This router system is able to solve traffic jams and collisions, generate conflict-free and optimized paths before sending the final paths to the robotic forklifts. It also verifies the progress of all tasks. When a problem occurs, the router system can change the task priorities, routes, etc. in order to avoid new conflicts. In the routing simulations, each vehicle executes its tasks starting from a predefined initial pose, moving to the desired position. Our algorithm is based on Dijkstra's shortest path and the time window approaches and it was implemented in C language. Computer simulation tests were used to validate the algorithm efficiency under different working conditions. Several simulations were carried out using the Player/Stage Simulator to test the algorithms. Thanks to the simulations, we could solve many faults and refine the algorithms before embedding them in real robots.

* 2010 IEEE International Conference on Robotics and Automation, International workshop on Robotics and Intelligent Transportation System, Full Day Workshop, May 7th 2010, Anchorage, Alaska. Organizers,Christian Laugier (INRIA, France), Ming Lin (University of North Carolina, USA), Philippe Martinet IFMA and LASMEA, France),Urbano Nunes (ISR, Portugal)

Via

Access Paper or Ask Questions

Visual Localization and Mapping in Dynamic and Changing Environments

Sep 21, 2022

João Carlos Virgolino Soares, Vivian Suzano Medeiros, Gabriel Fischer Abati, Marcelo Becker, Glauco Caurin, Marcelo Gattass, Marco Antonio Meggiolaro

Figure 1 for Visual Localization and Mapping in Dynamic and Changing Environments

Figure 2 for Visual Localization and Mapping in Dynamic and Changing Environments

Figure 3 for Visual Localization and Mapping in Dynamic and Changing Environments

Figure 4 for Visual Localization and Mapping in Dynamic and Changing Environments

Abstract:The real-world deployment of fully autonomous mobile robots depends on a robust SLAM (Simultaneous Localization and Mapping) system, capable of handling dynamic environments, where objects are moving in front of the robot, and changing environments, where objects are moved or replaced after the robot has already mapped the scene. This paper presents Changing-SLAM, a method for robust Visual SLAM in both dynamic and changing environments. This is achieved by using a Bayesian filter combined with a long-term data association algorithm. Also, it employs an efficient algorithm for dynamic keypoints filtering based on object detection that correctly identify features inside the bounding box that are not dynamic, preventing a depletion of features that could cause lost tracks. Furthermore, a new dataset was developed with RGB-D data especially designed for the evaluation of changing environments on an object level, called PUC-USP dataset. Six sequences were created using a mobile robot, an RGB-D camera and a motion capture system. The sequences were designed to capture different scenarios that could lead to a tracking failure or a map corruption. To the best of our knowledge, Changing-SLAM is the first Visual SLAM system that is robust to both dynamic and changing environments, not assuming a given camera pose or a known map, being also able to operate in real time. The proposed method was evaluated using benchmark datasets and compared with other state-of-the-art methods, proving to be highly accurate.

* 14 pages, 13 figures

Via

Access Paper or Ask Questions

EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers

Sep 18, 2022

Ricardo V. Godoy, Tharik J. S. Reis, Paulo H. Polegato, Gustavo J. G. Lahr, Ricardo L. Saute, Frederico N. Nakano, Helio R. Machado, Americo C. Sakamoto, Marcelo Becker, Glauco A. P. Caurin

Figure 1 for EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers

Figure 2 for EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers

Figure 3 for EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers

Figure 4 for EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers

Abstract:Epilepsy is one of the most common neurological diseases, characterized by transient and unprovoked events called epileptic seizures. Electroencephalogram (EEG) is an auxiliary method used to perform both the diagnosis and the monitoring of epilepsy. Given the unexpected nature of an epileptic seizure, its prediction would improve patient care, optimizing the quality of life and the treatment of epilepsy. Predicting an epileptic seizure implies the identification of two distinct states of EEG in a patient with epilepsy: the preictal and the interictal. In this paper, we developed two deep learning models called Temporal Multi-Channel Transformer (TMC-T) and Vision Transformer (TMC-ViT), adaptations of Transformer-based architectures for multi-channel temporal signals. Moreover, we accessed the impact of choosing different preictal duration, since its length is not a consensus among experts, and also evaluated how the sample size benefits each model. Our models are compared with fully connected, convolutional, and recurrent networks. The algorithms were patient-specific trained and evaluated on raw EEG signals from the CHB-MIT database. Experimental results and statistical validation demonstrated that our TMC-ViT model surpassed the CNN architecture, state-of-the-art in seizure prediction.

* 15 pages, 10 figures

Via

Access Paper or Ask Questions

Multi-Sensor Fusion based Robust Row Following for Compact Agricultural Robots

Jun 28, 2021

Andres Eduardo Baquero Velasquez, Vitor Akihiro Hisano Higuti, Mateus Valverde Gasparino, Arun Narenthiran Sivakumar, Marcelo Becker, Girish Chowdhary

Figure 1 for Multi-Sensor Fusion based Robust Row Following for Compact Agricultural Robots

Figure 2 for Multi-Sensor Fusion based Robust Row Following for Compact Agricultural Robots

Figure 3 for Multi-Sensor Fusion based Robust Row Following for Compact Agricultural Robots

Figure 4 for Multi-Sensor Fusion based Robust Row Following for Compact Agricultural Robots

Abstract:This paper presents a state-of-the-art LiDAR based autonomous navigation system for under-canopy agricultural robots. Under-canopy agricultural navigation has been a challenging problem because GNSS and other positioning sensors are prone to significant errors due to attentuation and multi-path caused by crop leaves and stems. Reactive navigation by detecting crop rows using LiDAR measurements is a better alternative to GPS but suffers from challenges due to occlusion from leaves under the canopy. Our system addresses this challenge by fusing IMU and LiDAR measurements using an Extended Kalman Filter framework on low-cost hardwware. In addition, a local goal generator is introduced to provide locally optimal reference trajectories to the onboard controller. Our system is validated extensively in real-world field environments over a distance of 50.88~km on multiple robots in different field conditions across different locations. We report state-of-the-art distance between intervention results, showing that our system is able to safely navigate without interventions for 386.9~m on average in fields without significant gaps in the crop rows, 56.1~m in production fields and 47.5~m in fields with gaps (space of 1~m without plants in both sides of the row).

Via

Access Paper or Ask Questions