Neeraj Kumar

Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN

Oct 27, 2023
Neeraj Kumar, Ankur Narang, Brejesh Lall

In this paper, we present a Diffusion GAN based approach (Prosodic Diff-TTS) that generates high-fidelity speech from a style description and content text within only 4 denoising steps. It leverages a novel conditional prosodic layer normalization to incorporate style embeddings into a generator architecture consisting of a multi-head attention based phoneme encoder and a mel spectrogram decoder. The style embedding is generated by fine-tuning a pretrained BERT model on auxiliary tasks such as pitch, speaking speed, emotion, and gender classification. We demonstrate the efficacy of our proposed architecture on the multi-speaker LibriTTS and PromptSpeech datasets, using multiple quantitative metrics that measure generation accuracy and MOS.
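
As a rough illustration of how a layer normalization can be conditioned on a style embedding, the following minimal sketch normalizes a feature vector and then applies a gain and bias predicted from the style vector. It is not the authors' implementation: the function name, the projection matrices `w_scale` and `w_shift`, and the `(1 + gain)` parameterization are assumptions made for this example.

```python
import math

def conditional_layer_norm(x, style, w_scale, w_shift, eps=1e-5):
    """Normalize x, then apply a style-dependent scale and shift.

    x: list of floats (hidden features for one frame or phoneme)
    style: list of floats (style embedding)
    w_scale, w_shift: hypothetical projection matrices, one row per feature of x
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    # Linear projections of the style embedding give a per-feature gain and bias.
    gain = [sum(w * s for w, s in zip(row, style)) for row in w_scale]
    bias = [sum(w * s for w, s in zip(row, style)) for row in w_shift]
    # (1 + gain) is an assumed parameterization: zero weights give plain layer norm.
    return [(1.0 + g) * n + b for g, n, b in zip(gain, normed, bias)]
```

With all-zero projection weights the function reduces to ordinary layer normalization, so any change in the output can be attributed to the style embedding.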

Distraction-free Embeddings for Robust VQA

Aug 31, 2023
Atharvan Dogra, Deeksha Varshney, Ashwin Kalyan, Ameet Deshpande, Neeraj Kumar

The generation of effective latent representations and their subsequent refinement to incorporate precise information is an essential prerequisite for Vision-Language Understanding (VLU) tasks such as Video Question Answering (VQA). However, most existing methods for VLU focus on sparsely sampling or fine-graining the input information (e.g., sampling a sparse set of frames or text tokens), or adding external knowledge. We present a novel "DRAX: Distraction Removal and Attended Cross-Alignment" method to rid our cross-modal representations of distractors in the latent space. We do not exclusively confine the perception of any input information from various modalities but instead use an attention-guided distraction removal method to increase focus on task-relevant information in latent embeddings. DRAX also ensures semantic alignment of embeddings during cross-modal fusions. We evaluate our approach on a challenging benchmark (SUTD-TrafficQA dataset), testing the framework's abilities for feature and event queries, temporal relation understanding, forecasting, hypothesis, and causal analysis through extensive experiments.

An Effective Meaningful Way to Evaluate Survival Models

Jun 01, 2023
Shi-ang Qi, Neeraj Kumar, Mahtab Farrokh, Weijie Sun, Li-Hao Kuan, Rajesh Ranganath, Ricardo Henao, Russell Greiner

One straightforward metric to evaluate a survival prediction model is based on the Mean Absolute Error (MAE) -- the average of the absolute difference between the time predicted by the model and the true event time, over all subjects. Unfortunately, this is challenging because, in practice, the test set includes (right) censored individuals, meaning we do not know when a censored individual actually experienced the event. In this paper, we explore various metrics to estimate MAE for survival datasets that include (many) censored individuals. Moreover, we introduce a novel and effective approach for generating realistic semi-synthetic survival datasets to facilitate the evaluation of metrics. Our findings, based on the analysis of the semi-synthetic datasets, reveal that our proposed metric (MAE using pseudo-observations) is able to rank models accurately based on their performance, and often closely matches the true MAE -- in particular, it is better than several alternative methods.
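
To make the censoring problem concrete, here is a minimal sketch of two simple ways to compute MAE when some subjects are right-censored. These are illustrative baselines only, not the pseudo-observation metric the abstract proposes; the function names and the toy data are assumptions for this example.

```python
def mae_uncensored(pred, time, event):
    """MAE over subjects whose event was observed (event flag is truthy)."""
    errs = [abs(p - t) for p, t, e in zip(pred, time, event) if e]
    return sum(errs) / len(errs)

def mae_hinge(pred, time, event):
    """MAE that penalizes a censored subject only if the prediction
    falls before the censoring time (the event must happen after it)."""
    errs = []
    for p, t, e in zip(pred, time, event):
        if e:
            errs.append(abs(p - t))
        else:
            errs.append(max(t - p, 0.0))
    return sum(errs) / len(errs)

# Toy data: the third subject is censored at time 7.
pred = [5.0, 8.0, 3.0]
time = [6.0, 4.0, 7.0]
event = [1, 1, 0]
print(mae_uncensored(pred, time, event))  # 2.5
print(mae_hinge(pred, time, event))       # 3.0
```

The first variant discards censored subjects entirely, while the second uses only a lower bound on their error; neither recovers the true MAE, which is the gap the abstract describes.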

* Accepted to ICML 2023 

The ACROBAT 2022 Challenge: Automatic Registration Of Breast Cancer Tissue

May 29, 2023
Philippe Weitz, Masi Valkonen, Leslie Solorzano, Circe Carr, Kimmo Kartasalo, Constance Boissin, Sonja Koivukoski, Aino Kuusela, Dusan Rasic, Yanbo Feng, Sandra Sinius Pouplier, Abhinav Sharma, Kajsa Ledesma Eriksson, Stephanie Robertson, Christian Marzahl, Chandler D. Gatenbee, Alexander R. A. Anderson, Marek Wodzinski, Artur Jurgas, Niccolò Marini, Manfredo Atzori, Henning Müller, Daniel Budelmann, Nick Weiss, Stefan Heldmann, Johannes Lotz, Jelmer M. Wolterink, Bruno De Santi, Abhijeet Patil, Amit Sethi, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Mahtab Farrokh, Neeraj Kumar, Russell Greiner, Leena Latonen, Anne-Vibeke Laenkholm, Johan Hartman, Pekka Ruusuvuori, Mattias Rantalainen

The alignment of tissue between histopathological whole-slide-images (WSI) is crucial for research and clinical applications. Advances in computing, deep learning, and availability of large WSI datasets have revolutionised WSI analysis. Therefore, the current state-of-the-art in WSI registration is unclear. To address this, we conducted the ACROBAT challenge, based on the largest WSI registration dataset to date, including 4,212 WSIs from 1,152 breast cancer patients. The challenge objective was to align WSIs of tissue that was stained with routine diagnostic immunohistochemistry to its H&E-stained counterpart. We compare the performance of eight WSI registration algorithms, including an investigation of the impact of different WSI properties and clinical covariates. We find that conceptually distinct WSI registration methods can lead to highly accurate registration performances and identify covariates that impact performances across methods. These results establish the current state-of-the-art in WSI registration and guide researchers in selecting and developing methods.

KL Regularized Normalization Framework for Low Resource Tasks

Dec 21, 2022
Neeraj Kumar, Ankur Narang, Brejesh Lall

Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. It is difficult to obtain a large quantity of supervised data due to the limited availability of resources and time. In light of this, a significant amount of research has been conducted in the area of adopting large pre-trained datasets for diverse downstream tasks via fine-tuning, linear probing, or prompt tuning in low resource settings. Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks and have been successfully used in a wide variety of applications. Many normalization techniques have been proposed, but the success of normalization in low resource downstream NLP and speech tasks is limited. One of the reasons is the inability to capture expressiveness by rescaling parameters of normalization. We propose Kullback-Leibler (KL) Regularized normalization (KL-Norm), which makes the normalized data well behaved and helps in better generalization: it reduces over-fitting, generalises well on out-of-domain distributions, and removes irrelevant biases and features, with a negligible increase in model parameters and memory overheads. Detailed experimental evaluation on multiple low resource NLP and speech tasks demonstrates the superior performance of KL-Norm as compared to other popular normalization and regularization techniques.

* arXiv admin note: text overlap with arXiv:2106.05469 by other authors 

Dynamic Molecular Graph-based Implementation for Biophysical Properties Prediction

Dec 20, 2022
Carter Knutson, Gihan Panapitiya, Rohith Varikoti, Neeraj Kumar

Graph Neural Networks (GNNs) have revolutionized molecular discovery, helping to understand patterns and identify unknown features that can aid in predicting biophysical properties and protein-ligand interactions. However, current models typically rely on 2-dimensional molecular representations as input, and while the utilization of 2/3-dimensional structural data has gained deserved traction in recent years, many of these models are still limited to static graph representations. We propose a novel approach based on the transformer model utilizing GNNs for characterizing dynamic features of protein-ligand interactions. Our message passing transformer pre-trains on a set of molecular dynamics data from physics-based simulations to learn coordinate construction and make binding probability and affinity predictions as a downstream task. Through extensive testing we compare our results with existing models; our MDA-PLI model was able to outperform the molecular interaction prediction models with an RMSE of 1.2958. The geometric encodings enabled by our transformer architecture and the addition of time series data add a new dimensionality to this form of research.

* 4 pages and appendix, 3 figures, Ellis Critical assessment of molecular machine learning workshop [ML4Molecules] 2022 poster session 

Swarm of UAVs for Network Management in 6G: A Technical Review

Oct 06, 2022
Muhammad Asghar Khan, Neeraj Kumar, Syed Agha Hassnain Mohsan, Wali Ullah Khan, Moustafa M. Nasralla, Mohammed H. Alsharif, Justyna Żywiołek, Insaf Ullah

Fifth-generation (5G) cellular networks have led to the implementation of beyond 5G (B5G) networks, which are capable of incorporating autonomous services into swarms of unmanned aerial vehicles (UAVs). They provide capacity expansion strategies to address massive connectivity issues and guarantee ultra-high throughput and low latency, especially in extreme or emergency situations where network density, bandwidth, and traffic patterns fluctuate. On the one hand, 6G technology integrates AI/ML, IoT, and blockchain to establish ultra-reliable, intelligent, secure, and ubiquitous UAV networks. 6G networks, on the other hand, rely on new enabling technologies such as air interface and transmission technologies, as well as a unique network design, posing new challenges for swarms of UAVs. Keeping these challenges in mind, this article focuses on the security and privacy, intelligence, and energy-efficiency issues faced by swarms of UAVs operating in 6G mobile networks. In this state-of-the-art review, we integrate blockchain and AI/ML with UAV networks utilizing the 6G ecosystem. The key findings are then presented, and potential research challenges are identified. We conclude the review by shedding light on future research in this emerging field.

* 19, 9 

An Overview of Violence Detection Techniques: Current Challenges and Future Directions

Sep 21, 2022
Nadia Mumtaz, Naveed Ejaz, Shabana Habib, Syed Muhammad Mohsin, Prayag Tiwari, Shahab S. Band, Neeraj Kumar

The Big Video Data generated in today's smart cities has raised concerns about its purposeful usage: surveillance cameras, among many other sources, are the most prominent contributors to these huge volumes of data, making automated analysis a difficult task in terms of computation and precision. Violence Detection (VD), broadly falling under the Action and Activity recognition domain, is used to analyze Big Video Data for anomalous actions caused by humans. The VD literature is traditionally based on manually engineered features, though deep learning based standalone models have since been developed for real-time VD analysis. This paper focuses on an overview of deep sequence learning approaches along with localization strategies for the detected violence. This overview also dives into the initial image processing and machine learning-based VD literature and their possible advantages, such as efficiency, over the current complex models. Furthermore, the datasets are discussed to provide an analysis of the current models, explaining their pros and cons, with future directions in the VD domain derived from an in-depth analysis of the previous methods.

* Artificial Intelligence Review 

De novo design of protein target specific scaffold-based Inhibitors via Reinforcement Learning

May 21, 2022
Andrew D. McNaughton, Mridula S. Bontha, Carter R. Knutson, Jenna A. Pope, Neeraj Kumar

Efficient design and discovery of target-driven molecules is a critical step in facilitating lead optimization in drug discovery. Current approaches to develop molecules for a target protein are intuition-driven, hampered by slow iterative design-test cycles due to computational challenges in utilizing 3D structural data, and ultimately limited by the expertise of the chemist - leading to bottlenecks in molecular design. In this contribution, we propose a novel framework, called 3D-MolGNN$_{RL}$, coupling reinforcement learning (RL) to a deep generative model based on 3D-Scaffold to generate target candidates specific to a protein building up atom by atom from the starting core scaffold. 3D-MolGNN$_{RL}$ provides an efficient way to optimize key features by multi-objective reward function within a protein pocket using parallel graph neural network models. The agent learns to build molecules in 3D space while optimizing the activity, binding affinity, potency, and synthetic accessibility of the candidates generated for infectious disease protein targets. Our approach can serve as an interpretable artificial intelligence (AI) tool for lead optimization with optimized activity, potency, and biophysical properties.

* Published at the MLDD workshop, ICLR 2022 

Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Apr 25, 2022
Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer, Soonmee Cha, Madhura Ingalhalikar, Manali Jadhav, Umang Pandey, Jitender Saini, John Garrett, Matthew Larson, Robert Jeraj, Stuart Currie, Russell Frood, Kavi Fatania, Raymond Y Huang, Ken Chang, Carmen Balana, Jaume Capellades, Josep Puig, Johannes Trenkler, Josef Pichler, Georg Necker, Andreas Haunschmidt, Stephan Meckel, Gaurav Shukla, Spencer Liem, Gregory S Alexander, Joseph Lombardo, Joshua D Palmer, Adam E Flanders, Adam P Dicker, Haris I Sair, Craig K Jones, Archana Venkataraman, Meirui Jiang, Tiffany Y So, Cheng Chen, Pheng Ann Heng, Qi Dou, Michal Kozubek, Filip Lux, Jan Michálek, Petr Matula, Miloš Keřkovský, Tereza Kopřivová, Marek Dostál, Václav Vybíhal, Michael A Vogelbaum, J Ross Mitchell, Joaquim Farinhas, Joseph A Maldjian, Chandan Ganesh Bangalore Yogananda, Marco C Pinho, Divya Reddy, James Holcomb, Benjamin C Wagner, Benjamin M Ellingson, Timothy F Cloughesy, Catalina Raymond, Talia Oughourlian, Akifumi Hagiwara, Chencai Wang, Minh-Son To, Sargam Bhardwaj, Chee Chong, Marc Agzarian, Alexandre Xavier Falcão, Samuel B Martins, Bernardo C A Teixeira, Flávia Sprenger, David Menotti, Diego R Lucio, Pamela LaMontagne, Daniel Marcus, Benedikt Wiestler, Florian Kofler, Ivan Ezhov, Marie Metz, Rajan Jain, Matthew Lee, Yvonne W Lui, Richard McKinley, Johannes Slotboom, Piotr Radojewski, Raphael Meier, Roland Wiest, Derrick Murcia, Eric Fu, Rourke Haas, John Thompson, David Ryan Ormond, Chaitra Badve, Andrew E Sloan, Vachan Vadmal, Kristin Waite, Rivka R Colen, Linmin Pei, Murat Ak, Ashok Srinivasan, J Rajiv Bapuraj, Arvind Rao, Nicholas Wang, 
Ota Yoshiaki, Toshio Moritani, Sevcan Turk, Joonsang Lee, Snehal Prabhudesai, Fanny Morón, Jacob Mandel, Konstantinos Kamnitsas, Ben Glocker, Luke V M Dixon, Matthew Williams, Peter Zampakis, Vasileios Panagiotopoulos, Panagiotis Tsiganos, Sotiris Alexiou, Ilias Haliassos, Evangelia I Zacharaki, Konstantinos Moustakas, Christina Kalogeropoulou, Dimitrios M Kardamakis, Yoon Seong Choi, Seung-Koo Lee, Jong Hee Chang, Sung Soo Ahn, Bing Luo, Laila Poisson, Ning Wen, Pallavi Tiwari, Ruchika Verma, Rohan Bareja, Ipsa Yadav, Jonathan Chen, Neeraj Kumar, Marion Smits, Sebastian R van der Voort, Ahmed Alafandi, Fatih Incekara, Maarten MJ Wijnenga, Georgios Kapsas, Renske Gahrmann, Joost W Schouten, Hendrikus J Dubbink, Arnaud JPE Vincent, Martin J van den Bent, Pim J French, Stefan Klein, Yading Yuan, Sonam Sharma, Tzu-Chi Tseng, Saba Adabi, Simone P Niclou, Olivier Keunen, Ann-Christin Hau, Martin Vallières, David Fortin, Martin Lepage, Bennett Landman, Karthik Ramadass, Kaiwen Xu, Silky Chotai, Lola B Chambless, Akshitkumar Mistry, Reid C Thompson, Yuriy Gusev, Krithika Bhuvaneshwar, Anousheh Sayah, Camelia Bencheqroun, Anas Belouali, Subha Madhavan, Thomas C Booth, Alysha Chelliah, Marc Modat, Haris Shuaib, Carmen Dragos, Aly Abayazeed, Kenneth Kolodziej, Michael Hill, Ahmed Abbassy, Shady Gamal, Mahmoud Mekhaimar, Mohamed Qayati, Mauricio Reyes, Ji Eun Park, Jihye Yun, Ho Sung Kim, Abhishek Mahajan, Mark Muzi, Sean Benson, Regina G H Beets-Tan, Jonas Teuwen, Alejandro Herrera-Trujillo, Maria Trujillo, William Escobar, Ana Abello, Jose Bernal, Jhon Gómez, Joseph Choi, Stephen Baek, Yusung Kim, Heba Ismael, Bryan Allen, John M Buatti, Aikaterini Kotrotsou, Hongwei Li, Tobias Weiss, Michael Weller, Andrea Bink, Bertrand Pouymayou, Hassan F Shaykh, Joel Saltz, Prateek Prasanna, Sampurna Shrestha, Kartik M Mani, David Payne, Tahsin Kurc, Enrique Pelaez, Heydy Franco-Maldonado, Francis Loayza, Sebastian Quevedo, Pamela Guevara, Esteban Torche, Cristobal Mendoza, Franco Vera, 
Elvis Ríos, Eduardo López, Sergio A Velastin, Godwin Ogbole, Dotun Oyekunle, Olubunmi Odafe-Oyibotha, Babatunde Osobu, Mustapha Shu'aibu, Adeleye Dorcas, Mayowa Soneye, Farouk Dako, Amber L Simpson, Mohammad Hamghalam, Jacob J Peoples, Ricky Hu, Anh Tran, Danielle Cutler, Fabio Y Moraes, Michael A Boss, James Gimpel, Deepak Kattil Veettil, Kendall Schmidt, Brian Bialecki, Sailaja Marella, Cynthia Price, Lisa Cimino, Charles Apgar, Prashant Shah, Bjoern Menze, Jill S Barnholtz-Sloan, Jason Martin, Spyridon Bakas

Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
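
The idea of training across sites by sharing only numerical model updates can be sketched as a weighted average of per-site parameter vectors. This is a minimal sketch assuming a FedAvg-style aggregation weighted by local dataset size; the abstract does not state which aggregation rule the study used, and the function name and toy numbers are hypothetical.

```python
def federated_average(site_updates, site_sizes):
    """Combine per-site parameter vectors into one consensus vector.

    site_updates: list of equal-length lists of floats, one per site
    site_sizes: number of local training samples at each site (assumed weighting)
    """
    total = sum(site_sizes)
    n_params = len(site_updates[0])
    return [
        sum(update[i] * size for update, size in zip(site_updates, site_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical sites: only these vectors leave each site, never the raw scans.
consensus = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(consensus)  # [2.5, 3.5]
```

Weighting by sample count lets larger sites contribute more to the consensus model, which is one common design choice among several possible aggregation schemes.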

* federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS 