



Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will enable broader applications across diverse tasks. In this report, we mainly provide a comprehensive review of our experience in building Seed1.5-VL across model design, data construction, and training at various stages, in the hope that it can inspire further research. Seed1.5-VL is now accessible at https://www.volcengine.com/ (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428).




Abstract: In recent years, facial makeup transfer has attracted growing attention due to its efficiency and flexibility in transferring makeup styles between different faces. Although recent works have achieved realistic results, most of them fail to handle heavy makeup styles with multiple colors and subtle details. We therefore propose a novel GAN model for heavy makeup transfer that remains robust to different poses and expressions. First, a Makeup Multi-Extraction Network is introduced to learn region-wise makeup features from multiple layers. Then, a key transfer module called Detailed Region-Adaptive Normalization is proposed to fuse different levels of makeup styles adaptively, substantially improving the quality of heavy makeup transfer. Using the outputs of these two components, a Makeup Transfer Network performs the makeup transfer. To evaluate the efficacy of the proposed method, we collected a new makeup dataset containing a wide range of heavy styles. Experiments show that our method achieves state-of-the-art results on both light and heavy makeup styles and is robust to different poses and expressions.