Alert button
Picture for Xin Zhang

Xin Zhang

Alert button

Department of Agricultural and Biological Engineering, Mississippi State University

Real-time Strawberry Detection Based on Improved YOLOv5s Architecture for Robotic Harvesting in open-field environment

Sep 01, 2023
Zixuan He, Salik Ram Khanal, Xin Zhang, Manoj Karkee, Qin Zhang

This study proposed a YOLOv5-based custom object detection model to detect strawberries in an outdoor environment. The original architecture of the YOLOv5s was modified by replacing the C3 module with the C2f module in the backbone network, which provided a better feature gradient flow. Secondly, the Spatial Pyramid Pooling Fast in the final layer of the backbone network of YOLOv5s was combined with Cross Stage Partial Net to improve the generalization ability over the strawberry dataset in this study. The proposed architecture was named YOLOv5s-Straw. The RGB images dataset of the strawberry canopy with three maturity classes (immature, nearly mature, and mature) was collected in open-field environment and augmented through a series of operations including brightness reduction, brightness increase, and noise adding. To verify the superiority of the proposed method for strawberry detection in open-field environment, four competitive detection models (YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, and YOLOv8s) were trained, and tested under the same computational environment and compared with YOLOv5s-Straw. The results showed that the highest mean average precision of 80.3% was achieved using the proposed architecture whereas the same was achieved with YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, and YOLOv8s were 73.4%, 77.8%, 79.8%, 79.3%, respectively. Specifically, the average precision of YOLOv5s-Straw was 82.1% in the immature class, 73.5% in the nearly mature class, and 86.6% in the mature class, which were 2.3% and 3.7%, respectively, higher than that of the latest YOLOv8s. The model included 8.6*10^6 network parameters with an inference speed of 18ms per image while the inference speed of YOLOv8s had a slower inference speed of 21.0ms and heavy parameters of 11.1*10^6, which indicates that the proposed model is fast enough for real time strawberry detection and localization for the robotic picking.

* 20 pages; 15 figures 
Viaarxiv icon

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Aug 31, 2023
Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu

Current speech large language models build upon discrete speech representations, which can be categorized into semantic tokens and acoustic tokens. However, existing speech tokens are not specifically designed for speech language modeling. To assess the suitability of speech tokens for building speech language models, we established the first benchmark, SLMTokBench. Our results indicate that neither semantic nor acoustic tokens are ideal for this purpose. Therefore, we propose SpeechTokenizer, a unified speech tokenizer for speech large language models. SpeechTokenizer adopts the Encoder-Decoder architecture with residual vector quantization (RVQ). Unifying semantic and acoustic tokens, SpeechTokenizer disentangles different aspects of speech information hierarchically across different RVQ layers. Furthermore, We construct a Unified Speech Language Model (USLM) leveraging SpeechTokenizer. Experiments show that SpeechTokenizer performs comparably to EnCodec in speech reconstruction and demonstrates strong performance on the SLMTokBench benchmark. Also, USLM outperforms VALL-E in zero-shot Text-to-Speech tasks. Code and models are available at https://github.com/ZhangXInFD/SpeechTokenizer/.

* SpeechTokenizer project page is https://0nutation.github.io/SpeechTokenizer.github.io/ 
Viaarxiv icon

MRI Field-transfer Reconstruction with Limited Data: Regularization by Neural Style Transfer

Aug 21, 2023
Guoyao Shen, Yancheng Zhu, Hernan Jara, Sean B. Andersson, Chad W. Farris, Stephan Anderson, Xin Zhang

Figure 1 for MRI Field-transfer Reconstruction with Limited Data: Regularization by Neural Style Transfer
Figure 2 for MRI Field-transfer Reconstruction with Limited Data: Regularization by Neural Style Transfer
Figure 3 for MRI Field-transfer Reconstruction with Limited Data: Regularization by Neural Style Transfer
Figure 4 for MRI Field-transfer Reconstruction with Limited Data: Regularization by Neural Style Transfer

Recent works have demonstrated success in MRI reconstruction using deep learning-based models. However, most reported approaches require training on a task-specific, large-scale dataset. Regularization by denoising (RED) is a general pipeline which embeds a denoiser as a prior for image reconstruction. The potential of RED has been demonstrated for multiple image-related tasks such as denoising, deblurring and super-resolution. In this work, we propose a regularization by neural style transfer (RNST) method to further leverage the priors from the neural transfer and denoising engine. This enables RNST to reconstruct a high-quality image from a noisy low-quality image with different image styles and limited data. We validate RNST with clinical MRI scans from 1.5T and 3T and show that RNST can significantly boost image quality. Our results highlight the capability of the RNST framework for MRI reconstruction and the potential for reconstruction tasks with limited data.

* 30 pages, 8 figures, 2 tables, 1 algorithm chart 
Viaarxiv icon

Towards General Text Embeddings with Multi-stage Contrastive Learning

Aug 07, 2023
Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang

We present GTE, a general-purpose text embedding model trained with multi-stage contrastive learning. In line with recent advancements in unifying various NLP tasks into a single format, we train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources. By significantly increasing the number of training data during both unsupervised pre-training and supervised fine-tuning stages, we achieve substantial performance gains over existing embedding models. Notably, even with a relatively modest parameter count of 110M, GTE$_\text{base}$ outperforms the black-box embedding API provided by OpenAI and even surpasses 10x larger text embedding models on the massive text embedding benchmark. Furthermore, without additional fine-tuning on each programming language individually, our model outperforms previous best code retrievers of similar size by treating code as text. In summary, our model achieves impressive results by effectively harnessing multi-stage contrastive learning, offering a powerful and efficient text embedding model with broad applicability across various NLP and code-related tasks.

Viaarxiv icon

Data-Driven Modeling with Experimental Augmentation for the Modulation Strategy of the Dual-Active-Bridge Converter

Aug 03, 2023
Xinze Li, Josep Pou, Jiaxin Dong, Fanfan Lin, Changyun Wen, Suvajit Mukherjee, Xin Zhang

Figure 1 for Data-Driven Modeling with Experimental Augmentation for the Modulation Strategy of the Dual-Active-Bridge Converter
Figure 2 for Data-Driven Modeling with Experimental Augmentation for the Modulation Strategy of the Dual-Active-Bridge Converter
Figure 3 for Data-Driven Modeling with Experimental Augmentation for the Modulation Strategy of the Dual-Active-Bridge Converter
Figure 4 for Data-Driven Modeling with Experimental Augmentation for the Modulation Strategy of the Dual-Active-Bridge Converter

For the performance modeling of power converters, the mainstream approaches are essentially knowledge-based, suffering from heavy manpower burden and low modeling accuracy. Recent emerging data-driven techniques greatly relieve human reliance by automatic modeling from simulation data. However, model discrepancy may occur due to unmodeled parasitics, deficient thermal and magnetic models, unpredictable ambient conditions, etc. These inaccurate data-driven models based on pure simulation cannot represent the practical performance in physical world, hindering their applications in power converter modeling. To alleviate model discrepancy and improve accuracy in practice, this paper proposes a novel data-driven modeling with experimental augmentation (D2EA), leveraging both simulation data and experimental data. In D2EA, simulation data aims to establish basic functional landscape, and experimental data focuses on matching actual performance in real world. The D2EA approach is instantiated for the efficiency optimization of a hybrid modulation for neutral-point-clamped dual-active-bridge (NPC-DAB) converter. The proposed D2EA approach realizes 99.92% efficiency modeling accuracy, and its feasibility is comprehensively validated in 2-kW hardware experiments, where the peak efficiency of 98.45% is attained. Overall, D2EA is data-light and can achieve highly accurate and highly practical data-driven models in one shot, and it is scalable to other applications, effortlessly.

* IEEE.Trans.Ind.Electron. Early Access (2023) 1-11  
* 11 pages 
Viaarxiv icon

Particle swarm optimization with state-based adaptive velocity limit strategy

Aug 02, 2023
Xinze Li, Kezhi Mao, Fanfan Lin, Xin Zhang

Figure 1 for Particle swarm optimization with state-based adaptive velocity limit strategy
Figure 2 for Particle swarm optimization with state-based adaptive velocity limit strategy
Figure 3 for Particle swarm optimization with state-based adaptive velocity limit strategy
Figure 4 for Particle swarm optimization with state-based adaptive velocity limit strategy

Velocity limit (VL) has been widely adopted in many variants of particle swarm optimization (PSO) to prevent particles from searching outside the solution space. Several adaptive VL strategies have been introduced with which the performance of PSO can be improved. However, the existing adaptive VL strategies simply adjust their VL based on iterations, leading to unsatisfactory optimization results because of the incompatibility between VL and the current searching state of particles. To deal with this problem, a novel PSO variant with state-based adaptive velocity limit strategy (PSO-SAVL) is proposed. In the proposed PSO-SAVL, VL is adaptively adjusted based on the evolutionary state estimation (ESE) in which a high value of VL is set for global searching state and a low value of VL is set for local searching state. Besides that, limit handling strategies have been modified and adopted to improve the capability of avoiding local optima. The good performance of PSO-SAVL has been experimentally validated on a wide range of benchmark functions with 50 dimensions. The satisfactory scalability of PSO-SAVL in high-dimension and large-scale problems is also verified. Besides, the merits of the strategies in PSO-SAVL are verified in experiments. Sensitivity analysis for the relevant hyper-parameters in state-based adaptive VL strategy is conducted, and insights in how to select these hyper-parameters are also discussed.

* 33 pages, 8 figures 
Viaarxiv icon

Artificial-Intelligence-Based Triple Phase Shift Modulation for Dual Active Bridge Converter with Minimized Current Stress

Aug 01, 2023
Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao

The dual active bridge (DAB) converter has been popular in many applications for its outstanding power density and bidirectional power transfer capacity. Up to now, triple phase shift (TPS) can be considered as one of the most advanced modulation techniques for DAB converter. It can widen zero voltage switching range and improve power efficiency significantly. Currently, current stress of the DAB converter has been an important performance indicator when TPS modulation is applied for smaller size and higher efficiency. However, to minimize the current stress when the DAB converter is under TPS modulation, two difficulties exist in analysis process and realization process, respectively. Firstly, three degrees of modulation variables in TPS modulation bring challenges to the analysis of current stress in different operating modes. This analysis and deduction process leads to heavy computational burden and also suffers from low accuracy. Secondly, to realize TPS modulation, if a lookup table is adopted after the optimization of modulation variables, modulation performance will be unsatisfactory because of the discrete nature of lookup table. Therefore, an AI-based TPS modulation (AI-TPSM) strategy is proposed in this paper. Neural network (NN) and fuzzy inference system (FIS) are utilized to deal with the two difficulties mentioned above. With the proposed AI-TPSM, the optimization of TPS modulation for minimized current stress will enjoy high degree of automation which can relieve engineers' working burden and improve accuracy. In the end of this paper, the effectiveness of the proposed AI-TPSM has been experimentally verified with a 1 kW prototype.

* 12 pages, 29 figures 
Viaarxiv icon

Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency

Aug 01, 2023
Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao

Figure 1 for Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency
Figure 2 for Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency
Figure 3 for Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency
Figure 4 for Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency

Dual active bridge (DAB) converter is the key enabler in many popular applications such as wireless charging, electric vehicle and renewable energy. ZVS range and efficiency are two significant performance indicators for DAB converter. To obtain the desired ZVS and efficiency performance, modulation should be carefully designed. Hybrid modulation considers several single modulation strategies to achieve good comprehensive performance. Conventionally, to design a hybrid modulation, harmonic approach or piecewise approach is used, but they suffer from time-consuming model building process and inaccuracy. Therefore, an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation is proposed. Generally, the HEPS modulation is developed in an automated fashion, which alleviates cumbersome model building process while keeping high model accuracy. In HEPS modulation, two EPS strategies are considered to realize optimal efficiency with full ZVS operation over entire operating ranges. Specifically, to build data-driven models of ZVS and efficiency performance, extreme gradient boosting (XGBoost), which is a state-of-the-art ensemble learning algorithm, is adopted. Afterwards, particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) is utilized to select the best EPS strategy and optimize modulation parameters. With 1 kW hardware experiments, the feasibility of HEPS has been verified, achieving optimal efficiency with maximum of 97.1% and full-range ZVS operation.

* 13 pages, 32 figures 
Viaarxiv icon

Evaluating Large Language Models for Radiology Natural Language Processing

Jul 27, 2023
Zhengliang Liu, Tianyang Zhong, Yiwei Li, Yutong Zhang, Yi Pan, Zihao Zhao, Peixin Dong, Chao Cao, Yuxiao Liu, Peng Shu, Yaonai Wei, Zihao Wu, Chong Ma, Jiaqi Wang, Sheng Wang, Mengyue Zhou, Zuowei Jiang, Chunlin Li, Jason Holmes, Shaochen Xu, Lu Zhang, Haixing Dai, Kai Zhang, Lin Zhao, Yuanhao Chen, Xu Liu, Peilong Wang, Pingkun Yan, Jun Liu, Bao Ge, Lichao Sun, Dajiang Zhu, Xiang Li, Wei Liu, Xiaoyan Cai, Xintao Hu, Xi Jiang, Shu Zhang, Xin Zhang, Tuo Zhang, Shijie Zhao, Quanzheng Li, Hongtu Zhu, Dinggang Shen, Tianming Liu

Figure 1 for Evaluating Large Language Models for Radiology Natural Language Processing
Figure 2 for Evaluating Large Language Models for Radiology Natural Language Processing
Figure 3 for Evaluating Large Language Models for Radiology Natural Language Processing
Figure 4 for Evaluating Large Language Models for Radiology Natural Language Processing

The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP). LLMs have revolutionized a multitude of domains, and they have made a significant impact in the medical field. Large language models are now more abundant than ever, and many of these models exhibit bilingual capabilities, proficient in both English and Chinese. However, a comprehensive evaluation of these models remains to be conducted. This lack of assessment is especially apparent within the context of radiology NLP. This study seeks to bridge this gap by critically evaluating thirty two LLMs in interpreting radiology reports, a crucial component of radiology NLP. Specifically, the ability to derive impressions from radiologic findings is assessed. The outcomes of this evaluation provide key insights into the performance, strengths, and weaknesses of these LLMs, informing their practical applications within the medical domain.

Viaarxiv icon