Aaron Sandoval

Factor(U,T): Controlling Untrusted AI by Monitoring their Plans

Dec 12, 2025

Direct Confidence Alignment: Aligning Verbalized Confidence with Internal Confidence In Large Language Models

Dec 12, 2025