Bo Wang

Tencent, WeChat Pay

How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

Sep 18, 2024

Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B. Burkhardt(+32 more)

Abstract:The cell is arguably the smallest unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of AI-powered Virtual Cells, where robust representations of cells and cellular systems under different conditions are directly learned from growing biological data across measurements and scales. We discuss desired capabilities of AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using Virtual Instruments. We further address the challenges, opportunities and requirements to realize this vision including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions is within reach.

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Sep 17, 2024

Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang(+1 more)

Abstract:We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Additionally, Matryoshka Representation Learning is integrated into the training process, allowing flexible truncation of embedding dimensions without compromising performance. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks.

* 20 pages, pp11-13 references, pp14-20 appendix and experiment tables
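
The truncation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the embedding is a plain list of floats; the function name `truncate_embedding` and the re-normalization step are illustrative assumptions, not details taken from the paper.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale them to unit length.

    Hypothetical helper: the abstract only states that embeddings can be
    truncated; re-normalizing afterwards is an assumption made here so that
    dot products between truncated vectors remain cosine similarities.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a real model output.
full = [3.0, 4.0, 1.0, 2.0]
short = truncate_embedding(full, 2)
print(len(short), short)
```

With a model trained for this kind of truncation, the shorter vector trades some accuracy for lower storage and faster similarity search.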

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Sep 13, 2024

Ye Bai, Haonan Chen, Jitong Chen, Zhuo Chen, Yi Deng, Xiaohong Dong, Lamtharn Hantrakul, Weituo Hao, Qingqing Huang, Zhongyi Huang(+28 more)

Abstract:We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music generation with performance controls from multi-modal inputs, including style descriptions, audio references, musical scores, and voice prompts. For post-production editing, it offers interactive tools for editing lyrics and vocal melodies directly in the generated audio. We encourage readers to listen to demo audio examples at https://team.doubao.com/seed-music.

* Seed-Music technical report, 20 pages, 5 figures

Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models

Sep 07, 2024

Michael Günther, Isabelle Mohr, Bo Wang, Han Xiao

Abstract:Many use cases require retrieving smaller portions of text, and dense vector-based retrieval systems often perform better with shorter text segments, as the semantics are less likely to be "over-compressed" in the embeddings. Consequently, practitioners often split text documents into smaller chunks and encode them separately. However, chunk embeddings created in this way can lose contextual information from surrounding chunks, resulting in suboptimal representations. In this paper, we introduce a novel method called "late chunking," which leverages long context embedding models to first embed all tokens of the long text, with chunking applied after the transformer model and just before mean pooling. The resulting chunk embeddings capture the full contextual information, leading to superior results across various retrieval tasks without the need for additional training. Moreover, our method is generic enough to be applied to any long-context embedding model.

* 4 pages, early draft
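
The late chunking step described in the abstract above (chunk boundaries applied to token embeddings after the transformer, then mean pooling per chunk) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the token vectors are hard-coded stand-ins for the output of a long-context embedding model, and the names `mean_pool` and `late_chunking` are hypothetical.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


def late_chunking(token_embeddings, chunk_spans):
    """Pool contextualized token embeddings into one vector per chunk.

    token_embeddings: per-token vectors from ONE forward pass over the whole
        text, so each token has already attended to the full document.
    chunk_spans: (start, end) token index pairs, with end exclusive.
    """
    chunk_vectors = []
    for start, end in chunk_spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        chunk_vectors.append(mean_pool(token_embeddings[start:end]))
    return chunk_vectors


# Toy data: 4 tokens with 2-dimensional embeddings, split into 2 chunks.
tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
spans = [(0, 2), (2, 4)]
print(late_chunking(tokens, spans))  # [[2.0, 1.0], [1.0, 3.0]]
```

The contrast with conventional chunking is the order of operations: there, each chunk is encoded separately, so its tokens never see the surrounding text.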

Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever

Sep 04, 2024

Rohan Jha, Bo Wang, Michael Günther, Georgios Mastrapas, Saba Sturua, Isabelle Mohr, Andreas Koukounas, Mohammad Kalim Akram, Nan Wang, Han Xiao

Abstract:Multi-vector dense models, such as ColBERT, have proven highly effective in information retrieval. ColBERT's late interaction scoring approximates the joint query-document attention seen in cross-encoders while maintaining inference efficiency closer to traditional dense retrieval models, thanks to its bi-encoder architecture and recent optimizations in indexing and search. In this paper, we introduce a novel architecture and a training framework to support long context window and multilingual retrieval. Our new model, Jina-ColBERT-v2, demonstrates strong performance across a range of English and multilingual retrieval tasks,

* 8 pages, references at pp7,8; EMNLP workshop submission
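
The late interaction scoring mentioned in the abstract above is commonly computed in ColBERT-style models as a MaxSim sum: for each query token vector, take the maximum similarity over all document token vectors, then add these maxima. The sketch below assumes dot-product similarity and uses toy vectors; the function names are illustrative and it is not the Jina-ColBERT-v2 code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """Sum over query tokens of the best-matching document token similarity."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy multi-vector representations (one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [0.0, 0.5]]

print(maxsim_score(query, doc_a))  # 1.5
print(maxsim_score(query, doc_b))  # 1.0
```

Because query and document vectors are produced independently, document vectors can be indexed ahead of time and only the cheap MaxSim step runs at query time.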

D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching

Aug 23, 2024

Jingyu Liu, Minquan Wang, Ye Ma, Bo Wang, Aozhu Chen, Quan Chen, Peng Jiang, Xirong Li

Abstract:Videos showcasing specific products are increasingly important for E-commerce. Key moments naturally exist as the first appearance of a specific product, presentation of its distinctive features, the presence of a buying link, etc. Adding proper sound effects (SFX) to these key moments, or video decoration with SFX (VDSFX), is crucial for enhancing user engagement. Previous studies on adding SFX to videos perform video-to-SFX matching at a holistic level and cannot add SFX to a specific moment. Meanwhile, previous studies on video highlight detection or video moment retrieval consider only moment localization, leaving moment-to-SFX matching untouched. By contrast, this paper proposes D&M, a unified method that accomplishes key moment detection and moment-to-SFX matching simultaneously. Moreover, for the new VDSFX task we build a large-scale dataset, SFX-Moment, from an E-commerce platform. For a fair comparison, we build competitive baselines by extending a number of current video moment detection methods to the new task. Extensive experiments on SFX-Moment show the superior performance of the proposed method over the baselines. Code and data will be released.

* 9 pages, 4 figures

Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

Aug 22, 2024

Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

Abstract:Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a large-scale and diverse dataset, including 4650 CT scans with various cancer types from over 40 medical centers. The winning team established a new state-of-the-art with a deep learning-based cascaded framework, achieving average Dice Similarity Coefficient scores of 92.3% for organs and 64.9% for lesions on the hidden multi-national testing set. The dataset and code of top teams are publicly available, offering a benchmark platform to drive further innovations at https://codalab.lisn.upsaclay.fr/competitions/12239.

* MICCAI 2024 FLARE Challenge Summary
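
The Dice Similarity Coefficient reported in the abstract above measures overlap between a predicted and a reference segmentation mask as 2|A ∩ B| / (|A| + |B|). The sketch below assumes binary masks flattened into lists of 0/1 values; the function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not the challenge's evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy masks: 3 predicted voxels, 3 reference voxels, 2 in common.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
print(round(dice_coefficient(pred, truth), 4))  # 0.6667
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no voxels, which is why lesion scores are typically lower than organ scores: small structures are penalized heavily for each missed voxel.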

SSNeRF: Sparse View Semi-supervised Neural Radiance Fields with Augmentation

Aug 17, 2024

Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

Abstract:Sparse-view NeRF is challenging because limited input images lead to an under-constrained optimization problem for volume rendering. Existing methods address this issue by relying on supplementary information, such as depth maps. However, generating this supplementary information accurately remains problematic and often leads to NeRF producing images with undesired artifacts. To address these artifacts and enhance robustness, we propose SSNeRF, a sparse-view semi-supervised NeRF method based on a teacher-student framework. Our key idea is to challenge the NeRF module with progressively severe sparse-view degradation while providing high-confidence pseudo labels. This approach helps the NeRF model become aware of noise and incomplete information associated with sparse views, thus improving its robustness. The novelty of SSNeRF lies in its sparse-view-specific augmentations and semi-supervised learning mechanism. In this approach, the teacher NeRF generates novel views along with confidence scores, while the student NeRF, perturbed by the augmented input, learns from the high-confidence pseudo labels. Our sparse-view degradation augmentation progressively injects noise into volume rendering weights, perturbs feature maps in vulnerable layers, and simulates sparse-view blurriness. These augmentation strategies force the student NeRF to recognize degradation and produce clearer rendered views. By transferring the student's parameters to the teacher, the teacher gains increased robustness in subsequent training iterations. Extensive experiments demonstrate the effectiveness of our SSNeRF in generating novel views with less sparse-view degradation. We will release code upon acceptance.

SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model

Aug 13, 2024

Dayong Wu, Jiaqi Li, Baoxin Wang, Honghong Zhao, Siyuan Xue, Yanjie Yang, Zhijun Chang, Rui Zhang, Li Qian, Bo Wang(+3 more)

Abstract:Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system Spark Research Assistant (SparkRA) based on our SciLit-LLM. SparkRA is accessible online and provides three primary functions: literature investigation, paper reading, and academic writing. As of July 30, 2024, SparkRA has garnered over 50,000 registered users, with a total usage count exceeding 1.3 million.

U-DECN: End-to-End Underwater Object Detection ConvNet with Improved DeNoising Training

Aug 11, 2024

Zhuoyan Liu, Bo Wang, Ye Li

Abstract:Underwater object detection has higher requirements of running speed and deployment efficiency for the detector due to its specific environmental challenges. NMS of two- or one-stage object detectors and transformer architecture of query-based end-to-end object detectors are not conducive to deployment on underwater embedded devices with limited processing power. As for the detrimental effect of underwater color cast noise, recent underwater object detectors make network architecture or training complex, which also hinders their application and deployment on underwater vehicle platforms. In this paper, we propose the Underwater DECO with improved deNoising training (U-DECN), a query-based end-to-end object detector (with ConvNet encoder-decoder architecture) for underwater color cast noise that addresses the above problems. We integrate advanced technologies from DETR variants into DECO and design optimization methods specifically for the ConvNet architecture, including Separate Contrastive DeNoising Forward and Deformable Convolution in SIM. To address the underwater color cast noise issue, we propose an underwater color denoising query to improve the generalization of the model to object feature information biased by different color cast noise. Our U-DECN, with ResNet-50 backbone, achieves 61.4 AP (50 epochs), 63.3 AP (72 epochs), 64.0 AP (100 epochs) on DUO, and 21 FPS (5 times faster than Deformable DETR and DINO at 4 FPS) on NVIDIA AGX Orin by TensorRT FP16, outperforming the other state-of-the-art query-based end-to-end object detectors. The code is available at https://github.com/LEFTeyex/U-DECN.
