Currently, our evaluation.py scripts do not support evaluating LoRA adapters.
As a result, agents always have to store fully merged weights, even when they only train adapters.
We could change the evaluation to merge adapters automatically at load time. Since an adapter is much smaller than the full model weights, this would keep the disk footprint of the benchmark much lower.
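
One possible shape for this, as a minimal sketch rather than the actual evaluation.py code: detect whether a checkpoint directory holds an adapter and, if so, merge it in memory before evaluating. The function names is_lora_adapter_dir and load_model_for_eval are hypothetical, and the sketch assumes that adapters are saved with the PEFT library (which writes an adapter_config.json file next to the adapter weights) and that the models are causal language models.

```python
import os


def is_lora_adapter_dir(checkpoint_dir: str) -> bool:
    """Return True if the directory looks like a PEFT adapter checkpoint.

    Assumption: adapters were saved with PEFT's save_pretrained, which
    writes an adapter_config.json file into the output directory.
    """
    return os.path.isfile(os.path.join(checkpoint_dir, "adapter_config.json"))


def load_model_for_eval(checkpoint_dir: str):
    """Hypothetical loader: merge adapters on the fly, else load full weights.

    Assumption: the benchmark evaluates causal language models.
    """
    if is_lora_adapter_dir(checkpoint_dir):
        # Imported lazily so that peft is only needed for adapter checkpoints.
        from peft import AutoPeftModelForCausalLM

        # Loads the base model named in adapter_config.json plus the adapter.
        model = AutoPeftModelForCausalLM.from_pretrained(checkpoint_dir)
        # Folds the LoRA weights into the base weights in memory only;
        # nothing merged is written back to disk.
        return model.merge_and_unload()

    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(checkpoint_dir)
```

With a loader like this, agents would only need to store the adapter directory, and the merge would happen each time a checkpoint is evaluated.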