Ting Liu

Victor

Class Binarization to NeuroEvolution for Multiclass Classification

Aug 26, 2023

Gongjin Lan, Zhenyu Gao, Lingyao Tong, Ting Liu

Abstract: Multiclass classification is a fundamental and challenging task in machine learning. Existing techniques for multiclass classification can be categorized as (i) decomposition into binary, (ii) extension from binary, and (iii) hierarchical classification. Class binarization, which decomposes multiclass classification into a set of binary classifications that can be solved efficiently with binary classifiers, is a popular technique for multiclass classification. Neuroevolution, a general and powerful technique for evolving the structure and weights of neural networks, has been successfully applied to binary classification. In this paper, we apply class binarization techniques to a neuroevolution algorithm, NeuroEvolution of Augmenting Topologies (NEAT), which is used to generate neural networks for multiclass classification. We propose a new method that applies Error-Correcting Output Codes (ECOC) to design class binarization strategies for neuroevolution in multiclass classification. The ECOC strategies are compared with the One-vs-One and One-vs-All class binarization strategies on three well-known datasets: Digit, Satellite, and Ecoli. We analyse their performance in four respects: multiclass classification degradation, accuracy, evolutionary efficiency, and robustness. The results show that NEAT with ECOC achieves high accuracy with low variance. Specifically, it shows significant benefits in its flexible number of binary classifiers and its strong robustness.

* 14 pages, 17 figures
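
The three class binarization strategies named in the abstract can be illustrated with code matrices, where each row is a class and each column is one binary classifier. The following is a minimal sketch, not the paper's implementation: it leaves out NEAT entirely, the function names are hypothetical, and the 4-class ECOC matrix is a textbook exhaustive code chosen for illustration rather than a code taken from the paper.

```python
from itertools import combinations


def one_vs_all_codes(k):
    """k columns: classifier j separates class j (+1) from all others (-1)."""
    return [[1 if i == j else -1 for j in range(k)] for i in range(k)]


def one_vs_one_codes(k):
    """k*(k-1)/2 columns: one per class pair; 0 marks classes a column ignores."""
    pairs = list(combinations(range(k), 2))
    return [[1 if i == a else -1 if i == b else 0 for a, b in pairs]
            for i in range(k)]


def decode(code_matrix, outputs):
    """Pick the class whose codeword disagrees least with the binary outputs.

    Entries equal to 0 in a codeword are skipped when counting disagreements.
    """
    def distance(codeword):
        return sum(1 for c, o in zip(codeword, outputs) if c != 0 and c != o)

    return min(range(len(code_matrix)), key=lambda i: distance(code_matrix[i]))


# Illustrative exhaustive ECOC for 4 classes with 7 binary classifiers.
# Any two rows differ in 4 positions, so one wrong classifier is tolerated.
ECOC_4 = [
    [1, 1, 1, 1, 1, 1, 1],
    [-1, -1, -1, -1, 1, 1, 1],
    [-1, -1, 1, 1, -1, -1, 1],
    [-1, 1, -1, 1, -1, 1, -1],
]

if __name__ == "__main__":
    # Codeword of class 2 with the first classifier's output flipped.
    noisy = [1, -1, 1, 1, -1, -1, 1]
    print(decode(ECOC_4, noisy))
```

In this sketch the number of columns is fixed by the strategy for One-vs-All and One-vs-One, while an ECOC matrix can be designed with more or fewer columns, which is one way to read the "flexible number of binary classifiers" mentioned in the abstract.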

A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions

Aug 24, 2023

Jiawei Lin, Jiaqi Guo, Shizhao Sun, Weijiang Xu, Ting Liu, Jian-Guang Lou, Dongmei Zhang

Abstract: Creating layouts is a fundamental step in graphic design. In this work, we propose to use text as the guidance to create graphic layouts, i.e., Text-to-Layout, aiming to lower the design barriers. Text-to-Layout is a challenging task because it needs to consider the implicit, combined, and incomplete layout constraints from text, none of which has been studied in previous work. To address this, we present a two-stage approach named parse-then-place. The approach introduces an intermediate representation (IR) between text and layout to represent diverse layout constraints. With the IR, Text-to-Layout is decomposed into a parse stage and a place stage. The parse stage takes a textual description as input and generates an IR, in which the implicit constraints from the text are transformed into explicit ones. The place stage generates layouts based on the IR. To model combined and incomplete constraints, we use a Transformer-based layout generation model and carefully design a way to represent constraints and layouts as sequences. In addition, we adopt the pretrain-then-finetune strategy to boost the performance of the layout generation model with large-scale unlabeled layouts. To evaluate our approach, we construct two Text-to-Layout datasets and conduct experiments on them. Quantitative results, qualitative analysis, and user studies demonstrate the effectiveness of our approach.

* Accepted by ICCV2023

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

Aug 23, 2023

Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng

Abstract: We are concerned with a challenging scenario in unpaired multiview video learning. In this case, the model aims to learn comprehensive multiview representations while the cross-view semantic information exhibits variations. We propose Semantics-based Unpaired Multiview Learning (SUM-L) to tackle this unpaired multiview learning problem. The key idea is to build cross-view pseudo-pairs and perform view-invariant alignment by leveraging the semantic information of videos. To improve the data efficiency of multiview learning, we further perform video-text alignment for first-person and third-person videos, fully leveraging the semantic knowledge to improve video representations. Extensive experiments on multiple benchmark datasets verify the effectiveness of our framework. Our method also outperforms multiple existing view-alignment methods under a more challenging scenario than typical paired or unpaired multimodal or multiview learning. Our code is available at https://github.com/wqtwjt1996/SUM-L.

* Proceedings of IEEE International Conference on Computer Vision (ICCV) 2023

Through the Lens of Core Competency: Survey on Evaluation of Large Language Models

Aug 15, 2023

Ziyu Zhuang, Qiguang Chen, Longxuan Ma, Mingda Li, Yi Han, Yushan Qian, Haopeng Bai, Zixian Feng, Weinan Zhang, Ting Liu

Abstract: From pre-trained language models (PLMs) to large language models (LLMs), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical uses. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to evaluate thoroughly for two reasons. First, traditional NLP tasks have become inadequate due to the excellent performance of LLMs. Second, existing evaluation tasks struggle to keep up with the wide range of applications in real-world scenarios. To tackle these problems, existing works have proposed various benchmarks to better evaluate LLMs. To clarify the numerous evaluation tasks in both academia and industry, we investigate multiple papers concerning LLM evaluation. We summarize 4 core competencies of LLMs: reasoning, knowledge, reliability, and safety. For every competency, we introduce its definition, corresponding benchmarks, and metrics. Under this competency architecture, similar tasks are combined to reflect the corresponding ability, while new tasks can also be easily added to the system. Finally, we give our suggestions on the future direction of LLM evaluation.

Fine-grained building roof instance segmentation based on domain adapted pretraining and composite dual-backbone

Aug 10, 2023

Guozhang Liu, Baochai Peng, Ting Liu, Pan Zhang, Mengke Yuan, Chaoran Lu, Ningning Cao, Sen Zhang, Simin Huang, Tao Wang

Abstract: The diversity of building architecture styles of global cities situated on various landforms, the degraded optical imagery affected by clouds and shadows, and the significant inter-class imbalance of roof types pose challenges for designing a robust and accurate building roof instance segmentor. To address these issues, we propose an effective framework for semantic interpretation of individual buildings with high-resolution optical satellite imagery. Specifically, the domain adapted pretraining strategy and the composite dual-backbone greatly facilitate discriminative feature learning. Moreover, a new data augmentation pipeline, stochastic weight averaging (SWA) training, and an instance segmentation based model ensemble in testing are used to obtain an additional performance boost. Experimental results show that our approach ranks first in the 2023 IEEE GRSS Data Fusion Contest (DFC) Track 1 test phase ($mAP_{50}$: 50.6\%). Notably, we have also explored the potential of multimodal data fusion with both optical satellite imagery and SAR data.
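
The $mAP_{50}$ figure quoted above depends on deciding when a predicted roof instance matches a ground-truth instance. A minimal sketch of that matching test follows, assuming the common convention that a prediction counts as a true positive when its mask IoU with a ground-truth mask reaches 0.5. The function names and the pixel-set mask representation are hypothetical, and the full contest metric (ranking by score and averaging over classes) is not reproduced here.

```python
def mask_iou(pred_pixels, gt_pixels):
    """Intersection over union of two masks given as iterables of (row, col)."""
    pred, gt = set(pred_pixels), set(gt_pixels)
    union = pred | gt
    # Two empty masks have no overlap to measure; treat the IoU as 0.
    return len(pred & gt) / len(union) if union else 0.0


def is_true_positive(pred_pixels, gt_pixels, iou_threshold=0.5):
    """Assumed AP50-style matching rule: IoU at or above the threshold."""
    return mask_iou(pred_pixels, gt_pixels) >= iou_threshold


if __name__ == "__main__":
    gt = {(0, 0), (0, 1), (1, 0), (2, 2)}
    pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
    # 3 shared pixels out of 5 in the union.
    print(mask_iou(pred, gt), is_true_positive(pred, gt))
```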

HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation

Aug 10, 2023

Chaoran Lu, Ningning Cao, Pan Zhang, Ting Liu, Baochai Peng, Guozhang Liu, Mengke Yuan, Sen Zhang, Simin Huang, Tao Wang

Abstract: Unifying the correlated tasks of single-view satellite image building extraction and height estimation is a promising way to share representations and obtain a generalist model for large-scale urban 3D reconstruction. However, the common spatial misalignment between building footprints and stereo-reconstructed nDSM height labels degrades performance on both tasks. To address this issue, we propose a Height-hierarchy Guided Dual-decoder Network (HGDNet) to estimate building height. Under the guidance of a synthesized discrete height-hierarchy nDSM, an auxiliary height-hierarchical building extraction branch enhances the height estimation branch with implicit constraints, yielding an accuracy improvement of more than 6% on the DFC 2023 Track 2 dataset. An additional two-stage cascade architecture is adopted to achieve more accurate building extraction. Experiments on the DFC 2023 Track 2 dataset show the superiority of the proposed method in building height estimation ($\delta_1$: 0.8012) and instance extraction ($AP_{50}$: 0.7730), and the final average score of 0.7871 ranks first in the test phase.
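
The abstract reports a $\delta_1$ score without defining it. In depth and height estimation this usually denotes threshold accuracy: the fraction of valid pixels whose predicted-to-true ratio, taken in whichever direction is larger, stays below 1.25. The sketch below assumes that definition. The function name, the flat-list input, and the rule for skipping non-positive values are illustrative choices and are not taken from the paper or the contest toolkit.

```python
def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) below the threshold.

    Pixels where either height is non-positive are skipped (an assumption),
    since the ratio is undefined there.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not valid:
        return 0.0
    hits = sum(1 for p, g in valid if max(p / g, g / p) < threshold)
    return hits / len(valid)


if __name__ == "__main__":
    predicted_heights = [10.0, 12.0, 30.0, 5.0]
    true_heights = [10.0, 10.0, 20.0, 5.0]
    # Ratios are 1.0, 1.2, 1.5, 1.0, so three of four pixels pass.
    print(delta1(predicted_heights, true_heights))
```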

EGE-UNet: an Efficient Group Enhanced UNet for skin lesion segmentation

Jul 17, 2023

Jiacheng Ruan, Mingye Xie, Jingsheng Gao, Ting Liu, Yuzhuo Fu

Abstract: Transformer and its variants have been widely used for medical image segmentation. However, the large number of parameters and the computational load of these models make them unsuitable for mobile health applications. To address this issue, we propose a more efficient approach, the Efficient Group Enhanced UNet (EGE-UNet). We incorporate a Group multi-axis Hadamard Product Attention module (GHPA) and a Group Aggregation Bridge module (GAB) in a lightweight manner. The GHPA groups input features and applies the Hadamard Product Attention mechanism (HPA) on different axes to extract pathological information from diverse perspectives. The GAB effectively fuses multi-scale information by grouping low-level features, high-level features, and a mask generated by the decoder at each stage. Comprehensive experiments on the ISIC2017 and ISIC2018 datasets demonstrate that EGE-UNet outperforms existing state-of-the-art methods. In short, compared to TransFuse, our model achieves superior segmentation performance while reducing parameter and computation costs by 494x and 160x, respectively. Moreover, to the best of our knowledge, this is the first model with a parameter count limited to just 50KB. Our code is available at https://github.com/JCruan519/EGE-UNet.

* 10 pages, 4 figures, 2 tables. This paper has been early accepted by MICCAI 2023 and has received the MICCAI Student-Author Registration (STAR) Award

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Jul 06, 2023

Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman (+7 more)

Abstract: We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task. Moreover, we propose a scalar VideoGLUE score (VGS) to measure an FM's efficacy and efficiency when adapting to general video understanding tasks. Our main findings are as follows. First, task-specialized models significantly outperform the six FMs studied in this work, in sharp contrast to what FMs have achieved in natural language and image understanding. Second, video-native FMs, whose pretraining data contains the video modality, are generally better than image-native FMs in classifying motion-rich videos, localizing actions in time, and understanding a video of more than one action. Third, the video-native FMs can perform well on video tasks under light adaptations to downstream tasks (e.g., freezing the FM backbones), while image-native FMs win in full end-to-end finetuning. The first two observations reveal the need and tremendous opportunities to conduct research on video-focused FMs, and the last confirms that both tasks and adaptation methods matter when it comes to the evaluation of FMs.

UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language

Jul 06, 2023

Nuwa Xi, Sendong Zhao, Haochun Wang, Chi Liu, Bing Qin, Ting Liu

Abstract: Decoding text stimuli from cognitive signals (e.g. fMRI) enhances our understanding of the human language system, paving the way for building versatile Brain-Computer Interfaces. However, existing studies largely focus on decoding individual word-level fMRI volumes from a restricted vocabulary, which is far too idealized for real-world application. In this paper, we propose fMRI2text, the first open-vocabulary task aiming to bridge fMRI time series and human language. Furthermore, to explore the potential of this new task, we present a baseline solution, UniCoRN: the Unified Cognitive Signal ReconstructioN for Brain Decoding. By reconstructing both individual time points and time series, UniCoRN establishes a robust encoder for cognitive signals (fMRI & EEG). Leveraging a pre-trained language model as decoder, UniCoRN proves its efficacy in decoding coherent text from fMRI series across various split settings. Our model achieves a 34.77% BLEU score on fMRI2text, and a 37.04% BLEU when generalized to EEG-to-text decoding, thereby surpassing the former baseline. Experimental results indicate the feasibility of decoding consecutive fMRI volumes, and the effectiveness of decoding different cognitive signals using a unified structure.

* the 61st Annual Meeting of the Association for Computational Linguistics

I run as fast as a rabbit, can you? A Multilingual Simile Dialogue Dataset

Jun 09, 2023

Longxuan Ma, Weinan Zhang, Shuhan Zhou, Churui Sun, Changxin Ke, Ting Liu

Abstract: A simile is a figure of speech that compares two different things (called the tenor and the vehicle) via shared properties. The tenor and the vehicle are usually connected with comparator words such as "like" or "as". Simile phenomena are unique and complex in a real-life dialogue scene, where the tenor and the vehicle can be verbal phrases or sentences, be mentioned by different speakers, appear in different sentences, or occur in reversed order. However, current simile research usually focuses on similes in a triplet (tenor, property, vehicle) or in a single sentence where the tenor and vehicle are usually entities or noun phrases, which cannot reflect complex simile phenomena in real scenarios. In this paper, we propose a novel and high-quality multilingual simile dialogue (MSD) dataset to facilitate the study of complex simile phenomena. The MSD is the largest manually annotated simile dataset ($\sim$20K) and it contains both English and Chinese data. The MSD data can also be used on dialogue tasks to test the ability of dialogue systems when using similes. We design 3 simile tasks (recognition, interpretation, and generation) and 2 dialogue tasks (retrieval and generation) with MSD. For each task, we provide experimental results from strong pre-trained or state-of-the-art models. The experiments demonstrate the challenge of MSD, and we have released the data/code on GitHub.

* 13 Pages, 1 Figure, 12 Tables, ACL 2023 findings
