Skip to content

Improve niah test#133

Merged
alessiodevoto merged 13 commits into
mainfrom
aledev/passkey
Sep 5, 2025
Merged

Improve niah test#133
alessiodevoto merged 13 commits into
mainfrom
aledev/passkey

Conversation

@alessiodevoto

Copy link
Copy Markdown
Collaborator

PR description

This PR contains minimal improvements over the recently merged NIAH test. It allows to insert the needle at different depths (to avoid loading the model in memory multiple times) and fixes a minor bug in the evaluation pipeline, allowing for model = "auto".

Checklist

  • Tests are working (make test)
  • Code is formatted correctly (make style, on errors try fix with make format)
  • Copyright header is included
  • All commits are signed-off using git commit -s
  • (new press) mypress_press.py is in the presses directory
  • (new press) MyPress is in __init__.py
  • (new press) README.md is updated with a 1 liner about the new press in the Available presses section
  • (new press) New press is in the default_presses list in tests/default_presses.py
  • (new press) A docstring is provided that follows the same structure as the existing ones

Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Sep 1, 2025

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@alessiodevoto

Copy link
Copy Markdown
Collaborator Author

/ok to test bf80954

@JoelSeniorLiang

Copy link
Copy Markdown
Contributor

Another issue, fix the dataset by llama's tokenizer will result in longer sequence in NIAH-like tasks for others, like Qwen family, since they use different lengths in digitals for one token. That usually leads to a lower performance for Qwen in KVPress eval utilities.

@alessiodevoto

alessiodevoto commented Sep 2, 2025

Copy link
Copy Markdown
Collaborator Author

Hi @JoelSeniorLiang, true! That only applies to RULER though, where we have pre-computed lengths. Here I wanted to avoid this problem (as you said it is important for NIAH), so we tokenize with the model's own tokenizer and count length and depth with that one !

Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@alessiodevoto

Copy link
Copy Markdown
Collaborator Author

/ok to test b51f0fd

Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@alessiodevoto

Copy link
Copy Markdown
Collaborator Author

/ok to test a99fcf2

@alessiodevoto alessiodevoto merged commit 70ccaf1 into main Sep 5, 2025
3 checks passed
@alessiodevoto alessiodevoto deleted the aledev/passkey branch September 5, 2025 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants