Abstract:Transformer-based methods for RGB-D Salient Object Detection (SOD) have gained significant interest, owing to the transformer's exceptional capacity to capture long-range pixel dependencies. Nevertheless, current RGB-D SOD methods face challenges, such as the quadratic complexity of the attention mechanism and the limited local detail extraction. To overcome these limitations, we propose a novel Superpixel Token Enhancing Network (STENet), which introduces superpixels into cross-modal interaction. STENet follows the two-stream encoder-decoder structure. Its cores are two tailored superpixel-driven cross-modal interaction modules, responsible for global and local feature enhancement. Specifically, we update the superpixel generation method by expanding the neighborhood range of each superpixel, allowing for flexible transformation between pixels and superpixels. With the updated superpixel generation method, we first propose the Superpixel Attention Global Enhancing Module to model the global pixel-to-superpixel relationship rather than the traditional global pixel-to-pixel relationship, which can capture region-level information and reduce computational complexity. We also propose the Superpixel Attention Local Refining Module, which leverages pixel similarity within superpixels to filter out a subset of pixels (i.e., local pixels) and then performs feature enhancement on these local pixels, thereby capturing concerned local details. Furthermore, we fuse the globally and locally enhanced features along with the cross-scale features to achieve comprehensive feature representation. Experiments on seven RGB-D SOD datasets reveal that our STENet achieves competitive performance compared to state-of-the-art methods. The code and results of our method are available at https://github.com/Mark9010/STENet.




Abstract:Since the breakthrough of ChatGPT, large language models (LLMs) have garnered significant attention in the research community. With the development of LLMs, the question of text style transfer for conversational models has emerged as a natural extension, where chatbots may possess their own styles or even characters. However, standard evaluation metrics have not yet been established for this new settings. This paper aims to address this issue by proposing the LMStyle Benchmark, a novel evaluation framework applicable to chat-style text style transfer (C-TST), that can measure the quality of style transfer for LLMs in an automated and scalable manner. In addition to conventional style strength metrics, LMStyle Benchmark further considers a novel aspect of metrics called appropriateness, a high-level metrics take account of coherence, fluency and other implicit factors without the aid of reference samples. Our experiments demonstrate that the new evaluation methods introduced by LMStyle Benchmark have a higher correlation with human judgments in terms of appropriateness. Based on LMStyle Benchmark, we present a comprehensive list of evaluation results for popular LLMs, including LLaMA, Alpaca, and Vicuna, reflecting their stylistic properties, such as formality and sentiment strength, along with their appropriateness.




Abstract:In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining. Our goal is to provide an easy-to-use BERT pretraining toolkit for the research community and industry. Thus, the pretraining of popular language models on customized datasets is affordable with limited resources. Experiments show that, to achieve the same or better GLUE scores, the time cost of our toolkit is over $6\times$ times less for BERT Base and $9\times$ times less for BERT Large when compared with the original BERT paper. The documentation and code are released at https://github.com/extreme-bert/extreme-bert under the Apache-2.0 license.