Lukas Galke

Gumbel-MPNN: Graph Rewiring with Gumbel-Softmax

Aug 24, 2025
Via arXiv

Guarded Query Routing for Large Language Models

May 20, 2025
Via arXiv

Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?

Feb 17, 2025
Via arXiv

FlexDeMo: Decoupled Momentum Optimization for Fully and Hybrid Sharded Training

Feb 10, 2025
Via arXiv

A Transformer-based Autoregressive Decoder Architecture for Hierarchical Text Classification

Jan 23, 2025
Via arXiv

Continual Learning for Encoder-only Language Models via a Discrete Key-Value Bottleneck

Dec 11, 2024
Via arXiv

Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search

Nov 26, 2024
Via arXiv

Hierarchical Text Classification (HTC) vs. eXtreme Multilabel Classification (XML): Two Sides of the Same Medal

Nov 20, 2024
Via arXiv

When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization

Nov 08, 2024
Via arXiv

Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5

Oct 15, 2024
Via arXiv