
Niladri Chatterjee


FL Games: A Federated Learning Framework for Distribution Shifts

Oct 31, 2022
Sharut Gupta, Kartik Ahuja, Mohammad Havaei, Niladri Chatterjee, Yoshua Bengio

Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server. However, participating clients typically each hold data from a different distribution, which can lead to catastrophic failures in generalization on data from a new client representing an unseen domain. In this work, we argue that in order to generalize better across non-i.i.d. clients, it is imperative to learn only correlations that are stable and invariant across domains. We propose FL GAMES, a game-theoretic framework for federated learning that learns causal features that are invariant across clients. While training to achieve the Nash equilibrium, the traditional best response strategy suffers from high-frequency oscillations. We demonstrate that FL GAMES effectively resolves this challenge and exhibits smooth performance curves. Further, FL GAMES scales well in the number of clients, requires significantly fewer communication rounds, and is agnostic to device heterogeneity. Through empirical evaluation, we demonstrate that FL GAMES achieves high out-of-distribution performance on various benchmarks.
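
The oscillation problem with the traditional best response strategy mentioned in this abstract can be illustrated on a toy two-player game (an illustrative sketch only, not the paper's algorithm): responding to the opponent's last move alone cycles forever, while responding to the running average of past moves (fictitious play, one classic form of smoothed best response) drives empirical play toward the mixed Nash equilibrium.

```python
# Toy matching-pennies game: the row player wants the actions to match,
# the column player wants them to differ. The only Nash equilibrium is
# mixed: each player plays action 1 with probability 1/2.

def naive_best_response(rounds):
    """Each player best-responds to the opponent's LAST action only."""
    history, a_row, a_col = [], 0, 0
    for _ in range(rounds):
        a_row, a_col = a_col, 1 - a_row  # row matches, column mismatches
        history.append((a_row, a_col))
    return history  # the joint action cycles with period 4: pure oscillation

def fictitious_play(rounds):
    """Each player best-responds to the EMPIRICAL FREQUENCY of the
    opponent's past actions -- a smoothed best response."""
    row_belief = [1, 1]  # pseudo-counts of column playing action 0 / 1
    col_belief = [1, 1]  # pseudo-counts of row playing action 0 / 1
    ones_row = ones_col = 0
    for _ in range(rounds):
        p = row_belief[1] / sum(row_belief)  # row's belief: P(col plays 1)
        a_row = 1 if p > 0.5 else 0          # row matches the likelier action
        q = col_belief[1] / sum(col_belief)  # col's belief: P(row plays 1)
        a_col = 0 if q > 0.5 else 1          # col mismatches the likelier action
        row_belief[a_col] += 1
        col_belief[a_row] += 1
        ones_row += a_row
        ones_col += a_col
    return ones_row / rounds, ones_col / rounds  # empirical frequencies
```

Under naive best response the joint action sequence repeats with period 4 indefinitely; under fictitious play both players' empirical frequencies approach 1/2, mirroring the smooth convergence behaviour the abstract attributes to FL GAMES.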

* Accepted as ORAL at NeurIPS Workshop on Federated Learning: Recent Advances and New Challenges. arXiv admin note: text overlap with arXiv:2205.11101 

FL Games: A federated learning framework for distribution shifts

May 23, 2022
Sharut Gupta, Kartik Ahuja, Mohammad Havaei, Niladri Chatterjee, Yoshua Bengio

Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server. However, participating clients typically each hold data from a different distribution, whereby predictive models with strong in-distribution generalization can fail catastrophically on unseen domains. In this work, we argue that in order to generalize better across non-i.i.d. clients, it is imperative to learn only correlations that are stable and invariant across domains. We propose FL Games, a game-theoretic framework for federated learning that learns causal features invariant across clients. While training to achieve the Nash equilibrium, the traditional best response strategy suffers from high-frequency oscillations. We demonstrate that FL Games effectively resolves this challenge and exhibits smooth performance curves. Further, FL Games scales well in the number of clients, requires significantly fewer communication rounds, and is agnostic to device heterogeneity. Through empirical evaluation, we demonstrate that FL Games achieves high out-of-distribution performance on various benchmarks.


Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations

Apr 20, 2022
Qingyu Chen, Alexis Allot, Robert Leaman, Rezarta Islamaj Doğan, Jingcheng Du, Li Fang, Wang Kai, Shuo Xu, Yuefu Zhang, Parsa Bagherzadeh, Sabine Bergler, Aakash Bhatnagar, Nidhir Bhavsar, Yung-Chun Chang, Sheng-Jie Lin, Wentai Tang, Hongtong Zhang, Ilija Tavchioski, Shubo Tian, Jinfeng Zhang, Yulia Otmakhova, Antonio Jimeno Yepes, Hang Dong, Honghan Wu, Richard Dufour, Yanis Labrak, Niladri Chatterjee, Kushagri Tandon, Fréjus Laleye, Loïc Rakotoson, Emmanuele Chersoni, Jinghang Gu, Annemarie Friedrich, Subhash Chandra Pujari, Mariia Chizhikova, Naveen Sivadasan, Naveen Sivadasan, Zhiyong Lu

The COVID-19 pandemic has been severely impacting global society since December 2019. Extensive research has been undertaken to understand the characteristics of the virus and to design vaccines and drugs. The related findings have been reported in the biomedical literature at a rate of about 10,000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200,000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g., Diagnosis and Treatment) to the articles in LitCovid. Despite the continuing advances in biomedical text mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30,000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest-performing submissions achieved 0.8875, 0.9181, and 0.9394 for macro F1-score, micro F1-score, and instance-based F1-score, respectively. The level of participation and the results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development.
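
The three F1 variants reported in this abstract can be computed from first principles; here is a minimal sketch on toy data (the toy labels are invented for illustration, and this is not the track's official evaluation script):

```python
def micro_f1(true_sets, pred_sets):
    """Pool true positives, false positives, and false negatives
    over all labels and articles, then compute a single F1."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def macro_f1(true_sets, pred_sets, labels):
    """Compute F1 per label, then average over labels."""
    scores = []
    for lab in labels:
        tp = sum(1 for t, p in zip(true_sets, pred_sets) if lab in t and lab in p)
        fp = sum(1 for t, p in zip(true_sets, pred_sets) if lab not in t and lab in p)
        fn = sum(1 for t, p in zip(true_sets, pred_sets) if lab in t and lab not in p)
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

def instance_f1(true_sets, pred_sets):
    """Compute F1 between the predicted and true label sets of each
    article, then average over articles (instance-based F1)."""
    scores = [2 * len(t & p) / (len(t) + len(p)) if t | p else 1.0
              for t, p in zip(true_sets, pred_sets)]
    return sum(scores) / len(scores)
```

On three toy articles with true labels {0,1}, {1}, {2} and predictions {0}, {1,2}, {2}, micro F1 is 0.75 while macro and instance-based F1 are both 7/9, showing how the averaging scheme changes the score even on identical predictions.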


Interpretation of Black Box NLP Models: A Survey

Mar 31, 2022
Shivani Choudhary, Niladri Chatterjee, Subir Kumar Saha

An increasing number of machine learning models have been deployed in high-stakes domains such as finance and healthcare. Despite their superior performance, many models are black boxes by nature and hard to explain. There are growing efforts by researchers to develop methods for interpreting these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee the stability of the resulting explanation. Experiments on both simulated and real-world datasets are provided to demonstrate the effectiveness of our method.
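
The central-limit-theorem stopping rule described in this abstract can be sketched as follows. This is a hedged illustration: the function name, tolerances, and stopping criterion are my own generic choices, not the paper's exact procedure.

```python
import math
import random

def estimate_until_stable(sample, tol=0.05, z=1.96, n_min=30, n_max=100_000):
    """Keep drawing perturbation-based importance estimates until the
    approximate 95% confidence half-width z*s/sqrt(n) falls below tol,
    i.e. until the explanation statistic is stable under resampling."""
    xs = []
    while len(xs) < n_max:
        xs.append(sample())
        n = len(xs)
        if n < n_min:
            continue  # the normal approximation needs a minimum sample
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        half_width = z * s / math.sqrt(n)
        if half_width < tol:
            return mean, half_width, n
    raise RuntimeError("estimate did not stabilize within n_max samples")
```

For a perturbation distribution with standard deviation sigma, this rule stops after roughly (z * sigma / tol)**2 draws, which is the CLT-driven trade-off between explanation stability and sampling cost.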


Temporal Random Indexing of Context Vectors Applied to Event Detection

Sep 30, 2020
Yashank Singh, Niladri Chatterjee

In this paper we explore new representations for encoding language data. The space complexity of the common one-hot encoding grows linearly with the size of the word corpus. We address this by using Random Indexing (RI) of context vectors with non-zero entries. We propose a novel RI representation in which we exploit the effect of imposing a probability distribution on the number of randomized entries, which leads to a class of RI representations. We also propose an algorithm, log-linear in the size of the word corpus, that tracks the semantic relationship of a query word to other words and suggests the events relevant to the word in question. Finally, we run simulations on the novel RI representations using the proposed algorithms for tweets relevant to the word "iPhone" and present the results. The RI representation is shown to be faster and more space-efficient than BoW embeddings.
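
Basic Random Indexing as used in this line of work can be sketched as follows: each word gets a fixed sparse ternary index vector, and a word's context vector is the sum of the index vectors of its neighbours. The dimensionality, sparsity, window size, and toy corpus below are arbitrary illustrative choices, not the paper's configuration.

```python
import math
import random

DIM, NNZ = 512, 8  # vector dimension and number of nonzero entries

def index_vector(rng):
    """Sparse ternary random index vector: NNZ entries of +/-1, rest 0."""
    v = [0.0] * DIM
    for pos in rng.sample(range(DIM), NNZ):
        v[pos] = rng.choice((-1.0, 1.0))
    return v

def context_vectors(sentences, window=1, seed=0):
    """Accumulate each word's context vector as the sum of the index
    vectors of words co-occurring within the window."""
    rng = random.Random(seed)
    idx, ctx = {}, {}
    for sent in sentences:
        for w in sent:
            if w not in idx:
                idx[w] = index_vector(rng)
                ctx[w] = [0.0] * DIM
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[w] = [a + b for a, b in zip(ctx[w], idx[sent[j]])]
    return ctx

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Words that appear in identical contexts end up with identical context vectors (cosine 1), while words with disjoint contexts get near-orthogonal sums of random sparse vectors, and the store grows with the vocabulary rather than with corpus size.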

* 8 pages, 12 figures 

An improved Bayesian TRIE based model for SMS text normalization

Aug 04, 2020
Abhinava Sikdar, Niladri Chatterjee

Normalization of SMS text, commonly known as texting language, has been pursued for more than a decade. A probabilistic approach based on the Trie data structure was proposed in the literature and found to outperform earlier HMM-based approaches in predicting the correct alternative for an out-of-lexicon word. However, the success of the Trie-based approach depends largely on how correctly the underlying probabilities of word occurrences are estimated. In this work we propose a structural modification to the existing Trie-based model along with a novel training algorithm and probability generation scheme. We prove two theorems on statistical properties of the proposed Trie and use them to claim that it is an unbiased and consistent estimator of the occurrence probabilities of the words. We further fuse our model into the paradigm of noisy-channel-based error correction and provide a heuristic to go beyond a Damerau-Levenshtein distance of one. We also run simulations to support our claims and show the superiority of the proposed scheme over previous works.
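
A minimal counting Trie with the word-occurrence probability estimate this abstract builds on can be sketched as a baseline (the paper's contribution is a structural modification, training algorithm, and probability scheme beyond this plain-frequency version):

```python
class TrieNode:
    __slots__ = ("children", "count", "word_count")

    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.count = 0       # insertions passing through this node
        self.word_count = 0  # insertions ending exactly at this node

class CountingTrie:
    """Trie that stores word frequencies so it can estimate the
    occurrence probability of any inserted word."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        node.count += 1
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1
        node.word_count += 1

    def probability(self, word):
        """Relative-frequency estimate of `word` among all insertions;
        0.0 for words never inserted."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return 0.0
        return node.word_count / self.root.count if self.root.count else 0.0
```

Inserting "good" three times plus "gud" and "great" once each yields probability 0.6 for "good" and 0.2 for "gud"; a noisy-channel corrector would combine such estimates with an error model over candidate expansions of a texting-language token.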

* 7 pages, 8 figures, under review at Pattern Recognition Letters 

Rough Set based Aggregate Rank Measure & its Application to Supervised Multi Document Summarization

Feb 09, 2020
Nidhika Yadav, Niladri Chatterjee

Most problems in Machine Learning cater to classification, where the objects of the universe are assigned to a relevant class. Ranking the classified objects of the universe within each decision class is a challenging problem. In this paper we propose a novel Rough Set based membership, called Rank Measure, to solve this problem. It is utilized for ranking the elements of a particular class. It differs from the Pawlak Rough Set based membership function, which gives an equivalent characterization of the Rough Set based approximations. It becomes paramount to look beyond the traditional approach of computing memberships while handling the inconsistent, erroneous, and missing data typically present in real-world problems. This led us to propose the aggregate Rank Measure. The contribution of the paper is threefold. Firstly, it proposes a Rough Set based measure to be utilized for numerical characterization of within-class ranking of objects. Secondly, it proposes and establishes the properties of the Rank Measure and the aggregate Rank Measure based membership. Thirdly, we apply the concepts of membership and aggregate ranking to the problem of supervised Multi Document Summarization, wherein the important sentences are first determined using various supervised learning techniques and then post-processed using the proposed ranking measure. The results show significant improvement in accuracy.
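
For contrast with the proposed Rank Measure, the classical Pawlak rough membership the abstract refers to can be sketched as follows (an illustrative baseline only; the attribute names and toy decision table of sentence features are invented for the example):

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Group objects that are indiscernible on the chosen attributes,
    i.e. that share the same attribute-value tuple."""
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[tuple(values[a] for a in attrs)].add(obj)
    return list(classes.values())

def rough_membership(x, target, table, attrs):
    """Pawlak rough membership |[x] ∩ X| / |[x]|: the fraction of the
    objects indiscernible from x that belong to the target set X."""
    for cls in equivalence_classes(table, attrs):
        if x in cls:
            return len(cls & target) / len(cls)
    raise KeyError(x)
```

On a toy table of sentence features, a sentence whose equivalence class mixes summary and non-summary members gets a fractional membership, which is exactly the degenerate within-class tie the paper's Rank Measure is designed to break.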

* The paper proposes a novel Rough Set based technique to compute rank in a decision system. This is further evaluated on the problem of Supervised Text Summarization. The paper contains 9 pages, illustrative examples, theoretical properties, and experimental evaluations on standard datasets 