
Shahar Katz

AlignTree: Efficient Defense Against LLM Jailbreak Attacks

Nov 15, 2025

Execution Guided Line-by-Line Code Generation

Jun 12, 2025

Segment-Based Attention Masking for GPTs

Dec 24, 2024

Reversed Attention: On The Gradient Descent Of Attention Layers In GPT

Dec 22, 2024

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space

Feb 20, 2024

Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT

May 22, 2023