Reshmi Ghosh

Leveraging Language Models to Detect Greenwashing

Oct 30, 2023
Avalon Vinella, Margaret Capetz, Rebecca Pattichis, Christina Chance, Reshmi Ghosh

In recent years, the repercussions of climate change have increasingly captured public interest. Consequently, corporations are emphasizing their environmental efforts in sustainability reports to bolster their public image. Yet, the absence of stringent regulations governing the review of such reports allows potential greenwashing. In this study, we introduce a novel methodology to train a language model on generated labels for greenwashing risk. Our primary contributions encompass a mathematical formulation to quantify greenwashing risk, a ClimateBERT model fine-tuned for this problem, and a comparative analysis of results. On a test set comprising sustainability reports, our best model achieved an average accuracy of 86.34% and an F1 score of 0.67, demonstrating that our methods are a promising direction for this task.
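
As a concrete illustration of the modeling setup, the sketch below fine-tunes a ClimateBERT-style encoder as a binary greenwashing-risk classifier with Hugging Face Transformers. The checkpoint name, toy labels, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

    # A minimal sketch, assuming a publicly available ClimateBERT checkpoint on the
    # Hugging Face Hub; labels, data, and hyperparameters are illustrative only.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    checkpoint = "climatebert/distilroberta-base-climate-f"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # Toy stand-ins for sustainability-report sentences with generated risk labels.
    data = Dataset.from_dict({
        "text": ["We are committed to a greener tomorrow.",
                 "Scope 1 emissions fell 12% against our 2020 baseline."],
        "label": [1, 0],  # 1 = higher greenwashing risk (hypothetical labeling scheme)
    }).map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length",
                               max_length=128), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="greenwashing-clf", num_train_epochs=3,
                               per_device_train_batch_size=8),
        train_dataset=data,
    )
    trainer.train()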


Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models

Oct 26, 2023
Reshmi Ghosh, Harjeet Singh Kajal, Sharanya Kamath, Dhuri Shrivastava, Samyadeep Basu, Hansi Zeng, Soundararajan Srinivasan

Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP that can assist many downstream tasks. However, current work on topic segmentation often focuses on the segmentation of structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not transfer well to unstructured conversational data; (b) training from scratch with only a relatively small dataset of the target unstructured domain improves segmentation results by a significant margin. We stress-test our proposed topic segmentation approach by experimenting with multiple loss functions to mitigate the effects of label imbalance in unstructured conversational datasets. Our empirical evaluation indicates that the Focal Loss is a robust alternative to Cross-Entropy and re-weighted Cross-Entropy losses when segmenting unstructured and semi-structured chats.

* Accepted to IntelliSys 2023. arXiv admin note: substantial text overlap with arXiv:2211.14954 
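
The Focal Loss referenced above can be written as a small drop-in replacement for Cross-Entropy; the PyTorch sketch below is a generic implementation, with gamma and alpha values that are illustrative rather than the settings used in the paper.

    # A minimal PyTorch sketch of the Focal Loss; gamma and alpha are illustrative.
    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
        """logits: (N, 2) boundary scores; targets: (N,) with 1 = topic boundary."""
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-example cross-entropy
        pt = torch.exp(-ce)                                      # probability of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # re-weight the rare class
        return (alpha_t * (1 - pt) ** gamma * ce).mean()         # down-weight easy examples

    # Usage: a drop-in replacement for F.cross_entropy in the training loop.
    loss = focal_loss(torch.randn(16, 2), torch.randint(0, 2, (16,)))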

On Surgical Fine-tuning for Language Encoders

Oct 25, 2023
Abhilasha Lodha, Gayatri Belapurkar, Saloni Chalkapurkar, Yuanming Tao, Reshmi Ghosh, Samyadeep Basu, Dmitrii Petrov, Soundararajan Srinivasan

Fine-tuning all the layers of a pre-trained neural language encoder (either using all the parameters or using parameter-efficient methods) is often the de facto way of adapting it to a new task. We show evidence that, for different downstream language tasks, fine-tuning only a subset of layers is sufficient to obtain performance that is close to, and often better than, fine-tuning all the layers in the language encoder. We propose an efficient metric based on the diagonal of the Fisher information matrix (FIM score) to select the candidate layers for selective fine-tuning. We show empirically, on GLUE and SuperGLUE tasks and across distinct language encoders, that this metric can effectively select layers leading to strong downstream performance. Our work highlights that task-specific information corresponding to a given downstream task is often localized within a few layers, and tuning only those layers is sufficient for strong performance. Additionally, we demonstrate that the FIM-score ranking of layers is robust and remains largely stable throughout the optimization process.

* Accepted to EMNLP 2023 
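
One way to read the FIM score is as an empirical diagonal Fisher approximation aggregated per layer; the sketch below illustrates that idea with a generic PyTorch loop, where the grouping rule, batch count, and function name are assumptions rather than the paper's exact implementation.

    # A minimal sketch of a FIM-style layer score: accumulate squared gradients (an
    # empirical diagonal Fisher approximation) over a few batches and aggregate per
    # layer; grouping rule and names are illustrative, not the paper's code.
    from collections import defaultdict

    def fim_layer_scores(model, dataloader, loss_fn, num_batches=8):
        scores = defaultdict(float)
        for step, (inputs, labels) in enumerate(dataloader):
            if step >= num_batches:
                break
            model.zero_grad()
            loss_fn(model(inputs), labels).backward()
            for name, param in model.named_parameters():
                if param.grad is not None:
                    layer = name.rsplit(".", 1)[0]           # crude per-module grouping;
                    scores[layer] += param.grad.pow(2).sum().item()  # coarsen per block as needed
        # Modules with the largest scores are the candidates for selective fine-tuning.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)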

Topic Segmentation in the Wild: Towards Segmentation of Semi-structured & Unstructured Chats

Nov 27, 2022
Reshmi Ghosh, Harjeet Singh Kajal, Sharanya Kamath, Dhuri Shrivastava, Samyadeep Basu, Soundararajan Srinivasan


Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP that can assist many downstream tasks. However, current work on topic segmentation often focuses on the segmentation of structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not transfer well to unstructured texts; (b) training from scratch with only a relatively small dataset of the target unstructured domain improves segmentation results by a significant margin.

* NeurIPS 2022: ENLSP 
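
A common way to operationalize neural topic segmentation, assumed here purely for illustration and not necessarily the exact architecture studied in this work, is to score each sentence as a topic boundary or not, on top of pre-computed sentence embeddings:

    # A minimal sketch framing segmentation as per-sentence boundary prediction over
    # pre-computed sentence embeddings; an illustrative formulation only.
    import torch
    import torch.nn as nn

    class BoundaryClassifier(nn.Module):
        def __init__(self, hidden_dim=384):
            super().__init__()
            # Bi-LSTM contextualizes each sentence within the conversation.
            self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden_dim, 2)    # topic boundary vs. continuation

        def forward(self, sent_embs):                   # (batch, num_sents, hidden_dim)
            out, _ = self.lstm(sent_embs)
            return self.head(out)                       # (batch, num_sents, 2) logits

    # Embeddings can come from any sentence encoder; train with Cross-Entropy or Focal Loss.
    logits = BoundaryClassifier()(torch.randn(2, 30, 384))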

Reconstruction of Long-Term Historical Demand Data

Sep 10, 2022
Reshmi Ghosh, Michael Craig, H. Scott Matthews, Constantine Samaras, Laure Berti-Equille


Long-term planning of a robust power system requires an understanding of changing demand patterns. Electricity demand is highly weather-sensitive; thus, supply-side variation from introducing intermittent renewable sources, juxtaposed with variable demand, will introduce additional challenges in the grid planning process. By understanding the spatial and temporal variability of temperature over the US, the response of demand to natural variability can be separated from the response to climate change-related effects on temperature, which is especially important because the effects of the former are not well known. Through this project, we aim to better support the technology & policy development process for power systems by developing machine and deep learning 'back-forecasting' models to reconstruct multidecadal demand records and study the natural variability of temperature and its influence on demand.

* Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021 
* Accepted to Tackling Climate Change with Machine Learning Workshop, ICML 2021 
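
To make the back-forecasting idea concrete, the sketch below fits a regressor on a period where both temperature and demand are observed and applies it to a period with temperature records only; the features, model choice, and synthetic data are illustrative assumptions, not the project's actual pipeline.

    # A minimal sketch of 'back-forecasting': fit a regressor where both temperature and
    # demand are observed, then reconstruct demand for decades with temperature records
    # only. Features, model choice, and synthetic data are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)

    # Observed period: hourly temperature plus a calendar feature, with metered demand.
    temp_obs = rng.uniform(-10, 35, size=5000)                    # degrees C (stand-in)
    hour_obs = rng.integers(0, 24, size=5000)
    X_obs = np.column_stack([temp_obs, hour_obs])
    demand_obs = 40 + 0.8 * np.abs(temp_obs - 18) + rng.normal(0, 2, 5000)  # toy load (GW)

    model = GradientBoostingRegressor().fit(X_obs, demand_obs)

    # Historical period: only temperature survives; reconstruct the demand record from it.
    X_hist = np.column_stack([rng.uniform(-10, 35, size=2000), rng.integers(0, 24, size=2000)])
    demand_reconstructed = model.predict(X_hist)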