Yongfeng Huang

Neural Chinese Word Segmentation with Dictionary Knowledge

Jul 11, 2018

Junxin Liu, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, Xing Xie

Abstract: Chinese word segmentation (CWS) is an important task for Chinese NLP. Recently, many neural-network-based methods have been proposed for CWS. However, these methods require a large number of labeled sentences for model training and usually cannot exploit the useful information in Chinese dictionaries. In this paper, we propose two methods to exploit dictionary information for CWS. The first is based on pseudo-labeled data generation, and the second is based on multi-task learning. Experimental results on two benchmark datasets show that our approach effectively improves the performance of Chinese word segmentation, especially when training data is insufficient.

* This paper has been accepted by The Seventh CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2018)
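
The abstract does not say how the pseudo-labeled data is generated, so the following is only a minimal sketch of one plausible reading: sample words from a dictionary, concatenate them into a pseudo sentence, and label each character with a BMES tag (Begin, Middle, End, Single), a scheme commonly used for CWS. The function names, the toy dictionary, and the sampling strategy are all assumptions for illustration, not the authors' method.

```python
import random


def bmes_tags(word):
    """Return one BMES tag per character of a single word."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def make_pseudo_sentence(dictionary, n_words, rng):
    """Concatenate randomly sampled dictionary words and tag every character.

    Assumption: uniform sampling with replacement; the paper may do otherwise.
    """
    words = [rng.choice(dictionary) for _ in range(n_words)]
    chars = [ch for w in words for ch in w]
    tags = [t for w in words for t in bmes_tags(w)]
    return chars, tags


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy dictionary, not taken from the paper.
    toy_dictionary = ["北京", "大学", "研究生", "我", "喜欢"]
    chars, tags = make_pseudo_sentence(toy_dictionary, 4, rng)
    print("".join(chars))
    print(" ".join(tags))
```

Because the word boundaries are known by construction, every pseudo sentence comes with exact labels, which is what would make such data usable alongside a small set of human-labeled sentences.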

Clinical Assistant Diagnosis for Electronic Medical Record Based on Convolutional Neural Network

Apr 23, 2018

Zhongliang Yang, Yongfeng Huang, Yiran Jiang, Yuxi Sun, Yu-Jin Zhang, Pengcheng Luo

Abstract: Automatically extracting useful information from electronic medical records and conducting disease diagnosis is a promising task for both clinical decision support (CDS) and natural language processing (NLP). Most existing systems rely on manually constructed knowledge bases and perform auxiliary diagnosis by rule matching. In this study, we present a clinical intelligent decision approach based on convolutional neural networks (CNNs), which automatically extracts high-level semantic information from electronic medical records and then performs diagnosis without manually constructed rules or knowledge bases. We use 18,590 real-world clinical electronic medical records to train and test the proposed model. Experimental results show that the proposed model achieves 98.67% accuracy and 96.02% recall, which strongly supports the feasibility and effectiveness of using a convolutional neural network to automatically learn high-level semantic features of electronic medical records and then assist diagnosis.

* 9 pages, 4 figures, Accepted by Scientific Reports
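
For readers unfamiliar with the two reported metrics, here is a minimal sketch of how accuracy and per-class recall are conventionally computed. The abstract does not state how recall was aggregated across diagnoses, so the per-class definition, the function names, and the toy labels below are assumptions for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of records truly labeled `positive` that were predicted as such."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    y_true = ["flu", "flu", "cold", "cold"]
    y_pred = ["flu", "cold", "cold", "cold"]
    print(accuracy(y_true, y_pred))
    print(recall(y_true, y_pred, "flu"))
```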

Image Captioning with Object Detection and Localization

Jun 08, 2017

Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang

Abstract: Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method, closely related to the human visual system, that automatically learns to describe the content of images. Our model consists of two sub-models: an object detection and localization model, which extracts information about objects and their spatial relationships in images, and a deep recurrent neural network (RNN) based on long short-term memory (LSTM) units with an attention mechanism for sentence generation. Each word of the description is automatically aligned to different objects of the input image as it is generated, which is similar to the attention mechanism of the human visual system. Experimental results on the COCO dataset showcase the merit of the proposed method, which outperforms previous benchmark models.

Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval

Mar 20, 2017

Yuting Hu, Liang Zheng, Yi Yang, Yongfeng Huang

Abstract: This paper contributes a new large-scale dataset for weakly supervised cross-media retrieval, named Twitter100k. Current datasets, such as Wikipedia, NUS-WIDE and Flickr30k, have two major limitations. First, these datasets lack content diversity, i.e., only some pre-defined classes are covered. Second, texts in these datasets are written in well-organized language, leading to inconsistency with realistic applications. To overcome these drawbacks, the proposed Twitter100k dataset is characterized by two aspects: 1) it has 100,000 image-text pairs randomly crawled from Twitter and thus has no constraint on the image categories; 2) text in Twitter100k is written in informal language by the users. Since strongly supervised methods leverage class labels that may be missing in practice, this paper focuses on weakly supervised learning for cross-media retrieval, in which only text-image pairs are exploited during training. We extensively benchmark the performance of four subspace learning methods and three variants of the Correspondence AutoEncoder, along with various text features, on Wikipedia, Flickr30k and Twitter100k. Novel insights are provided. As a minor contribution, inspired by the characteristics of Twitter100k, we propose an OCR-based cross-media retrieval method. In experiments, we show that the proposed OCR-based method improves the baseline performance.
