
Jungwook Choi

Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

Jul 03, 2024

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization

Nov 09, 2023

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Aug 13, 2023

PillarAcc: Sparse PointPillars Accelerator for Real-Time Point Cloud 3D Object Detection on Edge Devices

May 15, 2023

Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers

Feb 23, 2023

Exploring Attention Map Reuse for Efficient Transformer Neural Networks

Jan 29, 2023

Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization

Jan 04, 2023

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

Nov 20, 2022

Learning from Distinctive Candidates to Optimize Reduced-Precision Convolution Program on Tensor Cores

Feb 24, 2022

NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Dec 03, 2021