Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox


SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text

Feb 02, 2017
Deepak Gupta, Shubham Tripathi, Asif Ekbal, Pushpak Bhattacharyya



Use of social media has grown dramatically during the last few years. Users follow informal languages in communicating through social media. The language of communication is often mixed in nature, where people transcribe their regional language with English and this technique is found to be extremely popular. Natural language processing (NLP) aims to infer the information from these text where Part-of-Speech (PoS) tagging plays an important role in getting the prosody of the written text. For the task of PoS tagging on Code-Mixed Indian Social Media Text, we develop a supervised system based on Conditional Random Field classifier. In order to tackle the problem effectively, we have focused on extracting rich linguistic features. We participate in three different language pairs, ie. English-Hindi, English-Bengali and English-Telugu on three different social media platforms, Twitter, Facebook & WhatsApp. The proposed system is able to successfully assign coarse as well as fine-grained PoS tag labels for a given a code-mixed sentence. Experiments show that our system is quite generic that shows encouraging performance levels on all the three language pairs in all the domains.

* 5 pages, ICON 2016 


Share this with someone who'll enjoy it:

   Access Paper Source



Share this with someone who'll enjoy it: