Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daisuke Okanohara

Preferred Elements, Inc.

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Apr 24, 2025

Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa, Kazusato Oko, Shoichiro Yamaguchi, Sosuke Kobayashi, Seiya Tokui, Kohei Hayashi, Daisuke Okanohara, Taiji Suzuki

Abstract:The ability to acquire latent semantics is one of the key properties that determines the performance of language models. One convenient approach to invoke this ability is to prepend metadata (e.g. URLs, domains, and styles) at the beginning of texts in the pre-training data, making it easier for the model to access latent semantics before observing the entire text. Previous studies have reported that this technique actually improves the performance of trained models in downstream tasks; however, this improvement has been observed only in specific downstream tasks, without consistent enhancement in average next-token prediction loss. To understand this phenomenon, we closely investigate how prepending metadata during pre-training affects model performance by examining its behavior using artificial data. Interestingly, we found that this approach produces both positive and negative effects on the downstream tasks. We demonstrate that the effectiveness of the approach depends on whether latent semantics can be inferred from the downstream task's prompt. Specifically, through investigations using data generated by probabilistic context-free grammars, we show that training with metadata helps improve model's performance when the given context is long enough to infer the latent semantics. In contrast, the technique negatively impacts performance when the context lacks the necessary information to make an accurate posterior inference.

Via

Access Paper or Ask Questions

PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Oct 10, 2024

Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai(+9 more)

Figure 1 for PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Figure 2 for PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Figure 3 for PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Figure 4 for PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Abstract:We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch using 2 trillion tokens, with architecture such as QK Normalization and Z-Loss to ensure training stability during the training process. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly in Japanese-specific tasks, achieving results that are competitive with frontier models like GPT-4.

Via

Access Paper or Ask Questions

Speed-accuracy trade-off for the diffusion models: Wisdom from nonequlibrium thermodynamics and optimal transport

Jul 05, 2024

Kotaro Ikeda, Tomoya Uda, Daisuke Okanohara, Sosuke Ito

Abstract:We discuss a connection between a generative model, called the diffusion model, and nonequilibrium thermodynamics for the Fokker-Planck equation, called stochastic thermodynamics. Based on the techniques of stochastic thermodynamics, we derive the speed-accuracy trade-off for the diffusion models, which is a trade-off relationship between the speed and accuracy of data generation in diffusion models. Our result implies that the entropy production rate in the forward process affects the errors in data generation. From a stochastic thermodynamic perspective, our results provide quantitative insight into how best to generate data in diffusion models. The optimal learning protocol is introduced by the conservative force in stochastic thermodynamics and the geodesic of space by the 2-Wasserstein distance in optimal transport theory. We numerically illustrate the validity of the speed-accuracy trade-off for the diffusion models with different noise schedules such as the cosine schedule, the conditional optimal transport, and the optimal transport.

* 26 pages, 5 figures

Via

Access Paper or Ask Questions

Generative Model for Constructing Reaction Path from Initial to Final States

Jan 19, 2024

Akihide Hayashi, So Takamoto, Ju Li, Daisuke Okanohara

Abstract:Mapping out reaction pathways and their corresponding activation barriers is a significant aspect of molecular simulation. Given their inherent complexity and nonlinearity, even generating a initial guess of these paths remains a challenging problem. Presented in this paper is an innovative approach that utilizes neural networks to generate initial guess for these reaction pathways. The proposed method is initiated by inputting the coordinates of the initial state, followed by progressive alterations to its structure. This iterative process culminates in the generation of the approximate representation of the reaction path and the coordinates of the final state. The application of this method extends to complex reaction pathways illustrated by organic reactions. Training was executed on the Transition1x dataset, an organic reaction pathway dataset. The results revealed generation of reactions that bore substantial similarities with the corresponding test data. The method's flexibility allows for reactions to be generated either to conform to predetermined conditions or in a randomized manner.

Via

Access Paper or Ask Questions

Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

Feb 26, 2022

Tomoki Ando, Hiroto Iino, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata

Figure 1 for Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

Figure 2 for Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

Figure 3 for Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

Figure 4 for Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

Abstract:We propose a new method for collision-free path planning by Conditional Generative Adversarial Networks (cGANs) by mapping its latent space to only the collision-free areas of the robot joint space when an obstacle map is given as a condition. When manipulating a robot arm, it is necessary to generate a trajectory that avoids contact with the robot itself or the surrounding environment for safety reasons, and it is convenient to generate multiple arbitrary trajectories appropriate for respective purposes. In the proposed method, various trajectories to avoid obstacles can be generated by connecting the start and goal with arbitrary line segments in this latent space. Our method simply provides this collision-free latent space after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated and actual UR5e 6-DoF robotic arm. We confirmed that different trajectories can be generated according to different optimization conditions.

* 8 pages, 6 figures. Submitted to RA-L (IEEE Robotics and Automation Letters) with IROS 2022 Option. An accompanying video is available at https://www.youtube.com/watch?v=bZTbWxLt6Bo. arXiv admin note: substantial text overlap with arXiv:2202.07203

Via

Access Paper or Ask Questions

Collision-free Path Planning in the Latent Space through cGANs

Feb 15, 2022

Tomoki Ando, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata

Figure 1 for Collision-free Path Planning in the Latent Space through cGANs

Figure 2 for Collision-free Path Planning in the Latent Space through cGANs

Figure 3 for Collision-free Path Planning in the Latent Space through cGANs

Figure 4 for Collision-free Path Planning in the Latent Space through cGANs

Abstract:We show a new method for collision-free path planning by cGANs by mapping its latent space to only the collision-free areas of the robot joint space. Our method simply provides this collision-free latent space after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated two-link robot arm.

* 10pages, 9figures

Via

Access Paper or Ask Questions

Neural Multi-scale Image Compression

May 16, 2018

Ken Nakanishi, Shin-ichi Maeda, Takeru Miyato, Daisuke Okanohara

Figure 1 for Neural Multi-scale Image Compression

Figure 2 for Neural Multi-scale Image Compression

Figure 3 for Neural Multi-scale Image Compression

Figure 4 for Neural Multi-scale Image Compression

Abstract:This study presents a new lossy image compression method that utilizes the multi-scale features of natural images. Our model consists of two networks: multi-scale lossy autoencoder and parallel multi-scale lossless coder. The multi-scale lossy autoencoder extracts the multi-scale image features to quantized variables and the parallel multi-scale lossless coder enables rapid and accurate lossless coding of the quantized variables via encoding/decoding the variables in parallel. Our proposed model achieves comparable performance to the state-of-the-art model on Kodak and RAISE-1k dataset images, and it encodes a PNG image of size $768 \times 512$ in 70 ms with a single GPU and a single CPU process and decodes it into a high-fidelity image in approximately 200 ms.

* 15 pages, 15 figures

Via

Access Paper or Ask Questions

A Deep-Learning Approach for Operation of an Automated Realtime Flare Forecast

Jun 06, 2016

Yuko Hada-Muranushi, Takayuki Muranushi, Ayumi Asai, Daisuke Okanohara, Rudy Raymond, Gentaro Watanabe, Shigeru Nemoto, Kazunari Shibata

Figure 1 for A Deep-Learning Approach for Operation of an Automated Realtime Flare Forecast

Figure 2 for A Deep-Learning Approach for Operation of an Automated Realtime Flare Forecast

Figure 3 for A Deep-Learning Approach for Operation of an Automated Realtime Flare Forecast

Figure 4 for A Deep-Learning Approach for Operation of an Automated Realtime Flare Forecast

Abstract:Automated forecasts serve important role in space weather science, by providing statistical insights to flare-trigger mechanisms, and by enabling tailor-made forecasts and high-frequency forecasts. Only by realtime forecast we can experimentally measure the performance of flare-forecasting methods while confidently avoiding overlearning. We have been operating unmanned flare forecast service since August, 2015 that provides 24-hour-ahead forecast of solar flares, every 12 minutes. We report the method and prediction results of the system.

* 6 pages, 4 figures

Via

Access Paper or Ask Questions