Aditya Joshi

LangLingual: A Personalised, Exercise-oriented English Language Learning Tool Leveraging Large Language Models

Oct 27, 2025

Sammriddh Gupta, Sonit Singh, Aditya Joshi, Mira Kim

Abstract: Language educators strive to create a rich experience for learners, but they may be restricted in the extent of feedback and practice they can provide. We present the design and development of LangLingual, a conversational agent built using the LangChain framework and powered by Large Language Models. The system is specifically designed to provide real-time, grammar-focused feedback, generate context-aware language exercises and track learner proficiency over time. The paper discusses the architecture, implementation and evaluation of LangLingual in detail. The results indicate strong usability, positive learning outcomes and encouraging learner engagement.

* 14 pages

LLMs for Law: Evaluating Legal-Specific LLMs on Contract Understanding

Aug 11, 2025

Amrita Singh, H. Suhan Karaca, Aditya Joshi, Hye-young Paik, Jiaojiao Jiang

Abstract: Despite advances in legal NLP, no comprehensive evaluation covering multiple legal-specific LLMs currently exists for contract classification tasks in contract understanding. To address this gap, we present an evaluation of 10 legal-specific LLMs on three English language contract understanding tasks and compare them with 7 general-purpose LLMs. The results show that legal-specific LLMs consistently outperform general-purpose models, especially on tasks requiring nuanced legal understanding. Legal-BERT and Contracts-BERT establish new SOTAs on two of the three tasks, despite having 69% fewer parameters than the best-performing general-purpose LLM. We also identify CaseLaw-BERT and LexLM as strong additional baselines for contract understanding. Our results provide a holistic evaluation of legal-specific LLMs and will facilitate the development of more accurate contract understanding systems.

* Under review. 4 pages + references

What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction

Aug 11, 2025

Charlie Wyatt, Aditya Joshi, Flora Salim

Abstract: Transformer-based models primarily rely on Next Token Prediction (NTP), which predicts the next token in a sequence based on the preceding context. However, NTP's focus on single-token prediction often limits a model's ability to plan ahead or maintain long-range coherence, raising questions about how well LLMs can predict longer contexts, such as full sentences within structured documents. While NTP encourages local fluency, it provides no explicit incentive to ensure global coherence across sentence boundaries, an essential skill for reconstructive or discursive tasks. To investigate this, we evaluate three commercial LLMs (GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Flash) on Masked Sentence Prediction (MSP), the task of infilling a randomly removed sentence, using three domains: ROCStories (narrative), Recipe1M (procedural), and Wikipedia (expository). We assess both fidelity (similarity to the original sentence) and cohesiveness (fit within the surrounding context). Our key finding reveals that commercial LLMs, despite their superlative performance in other tasks, are poor at predicting masked sentences in low-structured domains, highlighting a gap in current model capabilities.

* Under Review

Nek Minit: Harnessing Pragmatic Metacognitive Prompting for Explainable Sarcasm Detection of Australian and Indian English

May 21, 2025

Ishmanbir Singh, Dipankar Srirag, Aditya Joshi

Abstract: Sarcasm is a challenge to sentiment analysis because of the incongruity between stated and implied sentiment. The challenge is exacerbated when the implication may be relevant to a specific country or geographical region. Pragmatic metacognitive prompting (PMP) is a cognition-inspired technique that has been used for pragmatic reasoning. In this paper, we harness PMP for explainable sarcasm detection for Australian and Indian English, alongside a benchmark dataset for standard English. We manually add sarcasm explanations to an existing sarcasm-labeled dataset for Australian and Indian English called BESSTIE, and compare the performance for explainable sarcasm detection on it with that on FLUTE, a standard English dataset containing sarcasm explanations. Our approach utilising PMP, when evaluated on two open-weight LLMs (GEMMA and LLAMA), achieves statistically significant performance improvement across all tasks and datasets when compared with four alternative prompting strategies. We also find that alternative techniques such as agentic prompting mitigate context-related failures by enabling external knowledge retrieval. The focused contribution of our work is utilising PMP in generating sarcasm explanations for varieties of English.

* Under review. 4 pages + references

A Survey on Multimodal Music Emotion Recognition

Apr 26, 2025

Rashini Liyanarachchi, Aditya Joshi, Erik Meijering

Abstract: Multimodal music emotion recognition (MMER) is an emerging discipline in music information retrieval that has experienced a surge in interest in recent years. This survey provides a comprehensive overview of the current state-of-the-art in MMER. Discussing the different approaches and techniques used in this field, the paper introduces a four-stage MMER framework, including multimodal data selection, feature extraction, feature processing, and final emotion prediction. The survey further reveals significant advancements in deep learning methods and the increasing importance of feature fusion techniques. Despite these advancements, challenges such as the need for large annotated datasets, datasets with more modalities, and real-time processing capabilities remain. This paper also contributes to the field by identifying critical gaps in current research and suggesting potential directions for future research. The gaps underscore the importance of developing robust, scalable, and interpretable models for MMER, with implications for applications in music recommendation systems, therapeutic tools, and entertainment.

CAMU: Context Augmentation for Meme Understanding

Apr 24, 2025

Girish A. Koushik, Diptesh Kanojia, Helen Treharne, Aditya Joshi

Abstract: Social media memes are a challenging domain for hate detection because they intertwine visual and textual cues into culturally nuanced messages. We introduce a novel framework, CAMU, which leverages large vision-language models to generate more descriptive captions, a caption-scoring neural network to emphasise hate-relevant content, and parameter-efficient fine-tuning of CLIP's text encoder for an improved multimodal understanding of memes. Experiments on publicly available hateful meme datasets show that simple projection layer fine-tuning yields modest gains, whereas selectively tuning deeper text encoder layers significantly boosts performance on all evaluation metrics. Moreover, our approach attains high accuracy (0.807) and F1-score (0.806) on the Hateful Memes dataset, on par with the existing SoTA framework while being much more efficient, offering practical advantages in real-world scenarios that rely on fixed decision thresholds. CAMU also achieves the best F1-score of 0.673 on the MultiOFF dataset for offensive meme identification, demonstrating its generalisability. Additional analyses on benign confounders reveal that robust visual grounding and nuanced text representations are crucial for reliable hate and offence detection. We will publicly release CAMU along with the resultant models for further research. Disclaimer: This paper includes references to potentially disturbing, hateful, or offensive content due to the nature of the task.

* Under review at ACM MM 2025

Harnessing Test-time Adaptation for NLU tasks Involving Dialects of English

Mar 17, 2025

Duke Nguyen, Aditya Joshi, Flora Salim

Abstract: Test-time adaptation (TTA) is an excellent method which helps generalize models across domains, tasks, and distributions without the use of labeled datasets. Thus, TTA is very useful in natural language processing (NLP) in the dialectal setting, since models are often trained on Standard American English (SAE) and evaluated on Indian English or Nigerian English, whose distributions differ significantly from that of SAE. This is especially useful since dialectal datasets are scarce. In this paper, we explore one of the most famous TTA techniques, SHOT, in dialectal NLP. We finetune and evaluate SHOT on different combinations of dialectal GLUE. Our findings show that SHOT is a viable technique when labeled datasets are unavailable. We also theoretically propose the concept of dialectal gap and show that it has a positive correlation with the effectiveness of SHOT. We also find that in many cases, finetuning on SAE yields higher performance than finetuning on dialectal data. Our code is available at https://github.com/dukenguyenxyz/dialect-adaptation

RACCOON: A Retrieval-Augmented Generation Approach for Location Coordinate Capture from News Articles

Jan 20, 2025

Jonathan Lin, Aditya Joshi, Hye-young Paik, Tri Dung Doung, Deepti Gurdasani

Abstract: Geocoding involves automatic extraction of location coordinates of incidents reported in news articles, and can be used for epidemic intelligence or disaster management. This paper introduces Retrieval-Augmented Coordinate Capture Of Online News articles (RACCOON), an open-source geocoding approach that extracts geolocations from news articles. RACCOON uses a retrieval-augmented generation (RAG) approach where candidate locations and associated information are retrieved in the form of context from a location database, and a prompt containing the retrieved context, location mentions and news articles is fed to an LLM to generate the location coordinates. Our evaluation on three datasets, two underlying LLMs, three baselines and several ablation tests based on the components of RACCOON demonstrates the utility of RACCOON. To the best of our knowledge, RACCOON is the first RAG-based approach for geocoding using pre-trained LLMs.

* Accepted at WWW 2025 as a short paper. 4 pages with references

BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English

Dec 06, 2024

Dipankar Srirag, Aditya Joshi, Jordan Painter, Diptesh Kanojia

Abstract: Despite large language models (LLMs) being known to exhibit bias against non-mainstream varieties, there are no known labeled datasets for sentiment analysis of English. To address this gap, we introduce BESSTIE, a benchmark for sentiment and sarcasm classification for three varieties of English: Australian (en-AU), Indian (en-IN), and British (en-UK). Using web-based content from two domains, namely, Google Place reviews and Reddit comments, we collect datasets for these language varieties using two methods: location-based and topic-based filtering. Native speakers of the language varieties manually annotate the datasets with sentiment and sarcasm labels. Subsequently, we fine-tune nine LLMs (representing a range of encoder/decoder and mono/multilingual models) on these datasets, and evaluate their performance on the two tasks. Our results reveal that the models consistently perform better on inner-circle varieties (i.e., en-AU and en-UK), with significant performance drops for en-IN, particularly in sarcasm detection. We also report challenges in cross-variety generalisation, highlighting the need for language variety-specific datasets such as ours. BESSTIE promises to be a useful evaluative benchmark for future research in equitable LLMs, specifically in terms of language varieties. The BESSTIE datasets, code, and models are currently available on request, while the paper is under review. Please email aditya.joshi@unsw.edu.au.

* 10 pages, 7 figures, under review

Comparison of Multilingual and Bilingual Models for Satirical News Detection of Arabic and English

Nov 16, 2024

Omar W. Abdalla, Aditya Joshi, Rahat Masood, Salil S. Kanhere

Abstract: Satirical news is real news combined with a humorous comment or exaggerated content, and it often mimics the format and style of real news. However, satirical news is often misunderstood as misinformation, especially by individuals from different cultural and social backgrounds. This research addresses the challenge of distinguishing satire from truthful news by leveraging multilingual satire detection methods in English and Arabic. We explore both zero-shot and chain-of-thought (CoT) prompting using two language models, Jais-chat (13B) and LLaMA-2-chat (7B). Our results show that CoT prompting offers a significant advantage for the Jais-chat model over the LLaMA-2-chat model. Specifically, Jais-chat achieved the best performance, with an F1-score of 80% in English when using CoT prompting. These results highlight the importance of structured reasoning in CoT, which enhances contextual understanding and is vital for complex tasks like satire detection.

* ALTA 2024 (Selected for publication)
