Gang Sun

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

May 21, 2023
Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao

Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries over relational databases. It has traditionally been implemented in a cascaded manner and faces two challenges: 1) model training suffers from data scarcity, as little parallel data is available; and 2) systems must be robust to diverse out-of-domain speech samples that differ from the source data. In this work, we propose Wav2SQL, the first direct speech-to-SQL parsing model, which avoids error compounding across cascaded systems. Specifically, 1) to accelerate speech-driven SQL parsing research in the community, we release MASpider, a large-scale multi-speaker dataset; 2) leveraging recent progress in large-scale pre-training, we show that it alleviates the data scarcity issue and allows for direct speech-to-SQL parsing; and 3) we apply speech re-programming and a gradient reversal classifier to reduce acoustic variance and learn style-agnostic representations, improving generalization to unseen out-of-domain custom data. Experimental results demonstrate that Wav2SQL avoids error compounding and achieves state-of-the-art results, with up to a 2.5% accuracy improvement over the baseline.
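The gradient reversal classifier mentioned in the abstract builds on the gradient reversal layer (GRL): an identity map in the forward pass whose gradient is negated in the backward pass, so that a speaker/style classifier trained on top of it pushes the encoder toward style-agnostic features. A minimal sketch of the layer itself, as a generic illustration of the technique rather than Wav2SQL's actual code (the class name and `lam` coefficient are our own):

```python
import numpy as np

class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the
    backward pass, so a downstream classifier's training signal pushes
    the upstream encoder AWAY from encoding the classified attribute."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # forward pass: pass features through unchanged
        return x

    def backward(self, grad_output):
        # backward pass: reverse (and scale) the incoming gradient
        return -self.lam * grad_output
```

In a typical setup the layer sits between the shared encoder and the style-classifier head, and the reversal coefficient is often annealed upward during training.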

SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation

May 10, 2023
Ran Shen, Gang Sun, Hao Shen, Yiling Li, Liangfeng Jin, Han Jiang

Converting text into structured query language (Text2SQL) is a research hotspot in natural language processing (NLP) with broad application prospects. In the era of big data, databases have penetrated every industry; the data collected is large in scale, diverse in variety, and wide in scope, which makes querying it cumbersome and inefficient and places higher demands on Text2SQL models. In practical applications, current mainstream end-to-end Text2SQL models are not only difficult to build, owing to their complex structure and heavy training-data requirements, but also difficult to tune because of their massive number of parameters, and their accuracy often falls short of the desired result. To address this, this paper proposes a pipelined Text2SQL method, SPSQL, which decomposes the Text2SQL task into four subtasks: table selection, column selection, SQL generation, and value filling. These can be cast as a text classification problem, a sequence labeling problem, and two text generation problems, respectively. We then construct data formats for the different subtasks from existing data and improve the accuracy of the overall model by improving the accuracy of each submodel. We also use a named entity recognition module and data augmentation to optimize the overall model. We construct the dataset from the marketing business data of the State Grid Corporation of China. Experiments demonstrate that our proposed method achieves the best performance compared with end-to-end and other pipeline methods.
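The four-stage decomposition described above can be illustrated with a toy pipeline. The string-matching stand-ins below are purely illustrative placeholders for the paper's trained classification, labeling, and generation submodels; all function names and the schema format are our own assumptions:

```python
def select_tables(question, schema):
    # stage 1 (text classification stand-in): keep tables named in the question
    return [t for t in schema if t.lower() in question.lower()]

def select_columns(question, schema, tables):
    # stage 2 (sequence labeling stand-in): keep columns mentioned in the question
    return {t: [c for c in schema[t] if c.lower() in question.lower()]
            for t in tables}

def generate_sql(tables, columns):
    # stage 3 (text generation stand-in): emit a SQL skeleton with value slots
    t = tables[0]
    cols = ", ".join(columns[t]) or "*"
    return f"SELECT {cols} FROM {t} WHERE {{col}} = {{val}}"

def fill_values(sql, col, val):
    # stage 4: fill the condition slots of the skeleton
    return sql.format(col=col, val=repr(val))
```

Each stage consumes the previous stage's output, so per-stage accuracy improvements compound into overall-model gains, which is the premise of the pipelined design.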

* 8 pages, 6 figures 

ESCM: An Efficient and Secure Communication Mechanism for UAV Networks

Apr 26, 2023
Haoxiang Luo, Yifan Wu, Gang Sun, Hongfang Yu, Shizhong Xu, Mohsen Guizani

Unmanned aerial vehicles (UAVs) are gradually entering various human activities and have become an important part of the satellite-air-ground-sea integrated network (SAGS) for 6G communication. To achieve high mobility, UAVs have strict requirements on communication latency, and they must not be illegally hijacked and used as weapons for malicious attacks. Therefore, an efficient and secure communication method designed specifically for UAV networks is required. This paper proposes such a communication mechanism, named ESCM. For communication efficiency, ESCM designs a routing protocol based on the artificial bee colony (ABC) algorithm to accelerate communication between UAVs. Meanwhile, we use blockchain to guarantee the communication security of UAV networks. However, blockchain suffers from unstable links in high-mobility network scenarios, resulting in low consensus efficiency and high communication overhead. Therefore, ESCM also introduces the concept of the digital twin, mapping UAVs from the physical world into cyberspace and transforming the UAV network into a static virtual network, which we call CyberUAV. In CyberUAV, we design a blockchain system and propose a consensus algorithm based on network coding, named Proof of Network Coding (PoNC). PoNC not only ensures the security of ESCM but also further improves its performance through network coding. Simulation results show that ESCM has clear advantages in communication efficiency and security. Moreover, encoding messages through PoNC consensus increases network throughput, and making the mobile blockchain static through the digital twin improves the consensus success rate.

Performance Analysis and Comparison of Non-ideal Wireless PBFT and RAFT Consensus Networks in 6G Communications

Apr 18, 2023
Haoxiang Luo, Xiangyue Yang, Hongfang Yu, Gang Sun, Shizhong Xu, Bo Lei

Due to advantages in security and privacy, blockchain is considered a key enabling technology to support 6G communications. Practical Byzantine Fault Tolerance (PBFT) and RAFT are seen as the most applicable consensus mechanisms (CMs) in blockchain-enabled wireless networks. However, previous studies on PBFT and RAFT rarely consider the channel performance of the physical layer, such as path loss and channel fading, so their results are far from those of real networks. Additionally, 6G communications will widely deploy high-frequency signals such as terahertz (THz) and millimeter wave (mmWave), while the performance of PBFT and RAFT remains unknown when these signals are transmitted in wireless PBFT or RAFT networks. It is therefore urgent to study the performance of non-ideal wireless PBFT and RAFT networks with THz and mmWave signals, so that PBFT and RAFT can better play their role in the 6G era. In this paper, we study and compare the performance of THz and mmWave signals in non-ideal wireless PBFT and RAFT networks, considering Rayleigh fading (RF) and close-in free space (FS) reference distance path loss. Performance is evaluated by five metrics: consensus success rate, latency, throughput, reliability gain, and energy consumption. Meanwhile, we find and derive a maximum distance between two nodes within which the CMs are guaranteed to succeed, which we name the active distance of the CMs. The results not only analyze the performance of non-ideal wireless PBFT and RAFT networks, but also provide important references for the future transmission of THz and mmWave signals in PBFT and RAFT networks.
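As a toy illustration of why physical-layer fading matters for consensus, the following Monte Carlo sketch estimates a PBFT-style consensus success rate when each leader-to-replica link fails under Rayleigh fading with a simple power-law path loss. This is a simplified stand-in, not the paper's analytical model (which uses the close-in free-space reference distance path loss and evaluates more metrics); all parameter values are illustrative:

```python
import numpy as np

def link_success_prob(d, snr_th=1.0, p_tx=1.0, alpha=2.0, d0=1.0):
    """Under Rayleigh fading the received SNR is exponentially distributed
    with mean p_tx * (d0/d)**alpha, so P(SNR > threshold) has a closed form."""
    mean_snr = p_tx * (d0 / d) ** alpha
    return np.exp(-snr_th / mean_snr)

def pbft_success_rate(n, f, d, trials=10000, rng=None):
    """Toy criterion: a round succeeds when at least 2f+1 of the n replicas,
    all at distance d from the leader, receive the message."""
    rng = rng or np.random.default_rng(0)
    p = link_success_prob(d)
    delivered = rng.binomial(n, p, size=trials)  # replicas reached per trial
    return np.mean(delivered >= 2 * f + 1)
```

The sketch already reproduces the qualitative finding: the success rate is near 1 below some node distance and collapses beyond it, which motivates the paper's notion of an "active distance".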

* arXiv admin note: substantial text overlap with arXiv:2303.15759 

Performance Analysis of Non-ideal Wireless PBFT Networks with mmWave and Terahertz Signals

Mar 28, 2023
Haoxiang Luo, Xiangyue Yang, Hongfang Yu, Gang Sun, Shizhong Xu, Long Luo

Due to advantages in security and privacy, blockchain is considered a key enabling technology to support 6G communications. Practical Byzantine Fault Tolerance (PBFT) is seen as the most applicable consensus mechanism in blockchain-enabled wireless networks. However, previous studies on PBFT do not consider the channel performance of the physical layer, such as path loss and channel fading, so their results are far from those of real networks. Additionally, 6G communications will widely deploy high-frequency signals such as millimeter wave (mmWave) and terahertz (THz), while the performance of PBFT remains unknown when these signals are transmitted in wireless PBFT networks. It is therefore urgent to study the performance of non-ideal wireless PBFT networks with mmWave and THz signals, so that PBFT can better play its role in the 6G era. In this paper, we study and compare the performance of mmWave and THz signals in non-ideal wireless PBFT networks, considering Rayleigh fading (RF) and close-in free space (FS) reference distance path loss. Performance is evaluated by consensus success rate and delay. Meanwhile, we find and derive a maximum distance between two nodes within which PBFT consensus is guaranteed to succeed, which we name the active distance of PBFT. The results not only analyze the performance of non-ideal wireless PBFT networks, but also provide an important reference for the future transmission of mmWave and THz signals in PBFT networks.

* IEEE International Conference on Metaverse Computing, Networking and Applications (MetaCom) 2023 

HiFuse: Hierarchical Multi-Scale Feature Fusion Network for Medical Image Classification

Sep 21, 2022
Xiangzuo Huo, Gang Sun, Shengwei Tian, Yan Wang, Long Yu, Jun Long, Wendong Zhang, Aolun Li

Medical image classification has developed rapidly under the impetus of convolutional neural networks (CNNs). Due to the fixed receptive field of the convolution kernel, however, it is difficult for CNNs to capture the global features of medical images. Although the self-attention-based Transformer can model long-range dependencies, it has high computational complexity and lacks local inductive bias. Much research has demonstrated that global and local features are both crucial for image classification, yet medical images exhibit substantial noise, scattered features, intra-class variation, and inter-class similarity. This paper proposes HiFuse, a three-branch hierarchical multi-scale feature fusion network for medical image classification. It fuses the advantages of the Transformer and the CNN across multi-scale hierarchies without compromising either's modeling strengths, improving classification accuracy on a variety of medical images. A parallel hierarchy of local and global feature blocks efficiently extracts local features and global representations at various semantic scales, with the flexibility to model at different scales and computational complexity linear in image size. Moreover, an adaptive hierarchical feature fusion (HFF) block comprehensively exploits the features obtained at different hierarchical levels; it contains spatial attention, channel attention, a residual inverted MLP, and a shortcut to adaptively fuse semantic information across the scales of each branch. The accuracy of our proposed model is 7.6% higher than the baseline on the ISIC2018 dataset, 21.5% higher on the Covid-19 dataset, and 10.4% higher on the Kvasir dataset. Compared with other advanced models, HiFuse performs best. Our code is open source and available at https://github.com/huoxiangzuo/HiFuse.

PerFED-GAN: Personalized Federated Learning via Generative Adversarial Networks

Feb 18, 2022
Xingjian Cao, Gang Sun, Hongfang Yu, Mohsen Guizani

Federated learning is gaining popularity as a distributed machine learning method for deploying AI-dependent IoT applications while protecting client data privacy and security. Because clients differ, a single global model may not perform well on all of them, so personalized federated learning, which trains a personalized model better suited to each client's individual needs, has become a research hotspot. Most personalized federated learning research, however, focuses on data heterogeneity while ignoring the need for model-architecture heterogeneity. Most existing federated learning methods impose a uniform model architecture on all participating clients, which cannot accommodate each client's individual model and local data distribution requirements and also increases the risk of client model leakage. This paper proposes a federated learning method based on co-training and generative adversarial networks (GANs) that allows each client to design its own model and participate in federated training independently, without sharing any model architecture or parameter information with other clients or a center. In our experiments, the proposed method outperforms existing methods in mean test accuracy by 42% when clients' model architectures and data distributions vary significantly.

CoFED: Cross-silo Heterogeneous Federated Multi-task Learning via Co-training

Feb 17, 2022
Xingjian Cao, Zonghang Li, Hongfang Yu, Gang Sun

Federated Learning (FL) is a machine learning technique that enables participants to collaboratively train high-quality models without exchanging their private data. Participants in cross-silo FL settings are independent organizations with different task needs; they are concerned not only with data privacy but also with independently training their own unique models for intellectual-property reasons. Most existing FL schemes are incapable of handling such scenarios. In this paper, we propose CoFED, a communication-efficient FL scheme based on pseudo-labeling unlabeled data in the style of co-training. To the best of our knowledge, it is the first FL scheme simultaneously compatible with heterogeneous tasks, heterogeneous models, and heterogeneous training algorithms. Experimental results show that CoFED achieves better performance at lower communication cost. In particular, for non-IID settings and heterogeneous models, the proposed method improves performance by 35%.

Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks

Oct 29, 2018
Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi

While the use of bottom-up local operators in convolutional neural networks (CNNs) matches some of the statistics of natural images well, it may also prevent such models from capturing contextual long-range feature interactions. In this work, we propose a simple, lightweight approach for better context exploitation in CNNs. We do so by introducing a pair of operators: gather, which efficiently aggregates feature responses from a large spatial extent, and excite, which redistributes the pooled information to local features. The operators are cheap, both in terms of the number of added parameters and computational complexity, and can be integrated directly into existing architectures to improve their performance. Experiments on several datasets show that gather-excite can bring benefits comparable to increasing the depth of a CNN at a fraction of the cost. For example, we find that ResNet-50 with gather-excite operators outperforms its 101-layer counterpart on ImageNet with no additional learnable parameters. We also propose a parametric gather-excite operator pair which yields further performance gains, relate it to the recently introduced Squeeze-and-Excitation Networks, and analyse the effects of these changes on the CNN feature activation statistics.
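The parameter-free gather-excite pair can be sketched in a few lines: gather average-pools responses over a spatial extent, and excite upsamples the pooled signal back to full resolution and uses it as a sigmoid gate on the original features. A NumPy sketch under our own simplifying assumptions (a single feature map, an extent that divides the spatial size, and nearest-neighbour upsampling):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gather_excite(x, extent=2):
    """x: (C, H, W) feature map; extent: side of the pooling window.
    Parameter-free variant: no learnable weights are introduced."""
    c, h, w = x.shape
    # gather: average-pool each (extent x extent) neighbourhood
    pooled = x.reshape(c, h // extent, extent, w // extent, extent).mean(axis=(2, 4))
    # excite: nearest-neighbour upsample to (H, W), then gate the input
    gate = sigmoid(np.repeat(np.repeat(pooled, extent, axis=1), extent, axis=2))
    return x * gate
```

With a larger extent (up to global pooling), each location is modulated by context far beyond its local receptive field, which is the effect the abstract describes.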

* accepted for publication at NIPS 2018 

Squeeze-and-Excitation Networks

Oct 25, 2018
Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu

The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at minimal additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251%, surpassing the winning entry of 2016 by a relative improvement of ~25%. Models and code are available at https://github.com/hujie-frank/SENet.
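The SE block described above has a well-known three-step structure: squeeze (global average pooling to a per-channel descriptor), excitation (a bottleneck MLP with a sigmoid), and channel-wise rescaling. A NumPy sketch of a single block, with the bottleneck weights passed in explicitly rather than learned:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """x: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) are the
    bottleneck weights, with r the reduction ratio."""
    z = x.mean(axis=(1, 2))           # squeeze: global average pool -> (C,)
    s = sigmoid(w2 @ relu(w1 @ z))    # excitation: FC -> ReLU -> FC -> sigmoid
    return x * s[:, None, None]       # scale: recalibrate each channel
```

Because the block only reads a pooled channel descriptor and emits one scalar gate per channel, it adds little computation, which is why it can be stacked throughout existing architectures at minimal cost.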

* journal version of the CVPR 2018 paper, submitted to IEEE Trans. PAMI 