Agent UQ on $\tau^{2}$-Bench Harness

Official codebase for the ACL 2026 position paper "Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities", which addresses uncertainty quantification (UQ) for LLM agents.

By Changdae Oh¹, Seongheon Park¹, To Eun Kim², Jiatong Li¹, Wendi Li¹, Samuel Yeh¹,
Sean Du³, Hamed Hassani⁴, Paul Bogdan⁵, Dawn Song⁶, and Sharon Li¹.

¹University of Wisconsin–Madison, ²Carnegie Mellon University, ³Nanyang Technological University,
⁴University of Pennsylvania, ⁵University of Southern California, ⁶University of California, Berkeley

News

[Apr 10, 2026] The $\tau^2$-bench UQ artifacts (actual trajectories and uncertainty measurements) used in our paper are now available on Hugging Face Datasets🤗
[Apr 5, 2026] The AgentUQ position paper was accepted to ACL 2026 (main conference)🎉
[Feb 26, 2026] The AgentUQ position paper was accepted to the ICLR 2026 Workshop "Agentic AI in the Wild: From Hallucinations to Reliable Autonomy"🎉

Code Under Cleaning Phase

Almost there.

Citation

@article{oh2026uncertainty,
    title={Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities},
    author={Oh, Changdae and Park, Seongheon and Kim, To Eun and Li, Jiatong and Li, Wendi and Yeh, Samuel and Du, Xuefeng and Hassani, Hamed and Bogdan, Paul and Song, Dawn and Li, Sharon},
    journal={arXiv preprint arXiv:2602.05073},
    year={2026}
}

License

This work is released under the MIT License.

Acknowledgement

This project builds on $\tau^2$-bench by Sierra Research. The original benchmark framework, domains, evaluation system, and task definitions are their work.
