Skip to content

fix: hash request kwargs and headers correctly#255

Merged
D4Vinci merged 4 commits into
D4Vinci:devfrom
yetval:fix/request-fingerprint
Apr 30, 2026
Merged

fix: hash request kwargs and headers correctly#255
D4Vinci merged 4 commits into
D4Vinci:devfrom
yetval:fix/request-fingerprint

Conversation

@yetval
Copy link
Copy Markdown
Contributor

@yetval yetval commented Apr 26, 2026

Summary
Request fingerprint collisions: Request.update_fingerprint() previously hashed only kwarg names and lowercased header values. Distinct requests could collapse to the same fingerprint when fp_include_kwargs or fp_include_headers were enabled, which can silently break scheduler deduplication, cache replay, and checkpoint restore. Fix: hash kwarg names together with their values and preserve header values as-is.

Repro
from scrapling.spiders.request import Request

r1 = Request("https://example.com", timeout=1)
r2 = Request("https://example.com", timeout=2)

assert r1.update_fingerprint(include_kwargs=True) != r2.update_fingerprint(include_kwargs=True)

Files changed
scrapling/spiders/request.py — fingerprint kwargs/header handling

@samrusani
Copy link
Copy Markdown

I did a quick local check against this patch and the behavior matches the PR description:

from scrapling.spiders.request import Request

assert Request("https://example.com", timeout=1).update_fingerprint(include_kwargs=True) != Request("https://example.com", timeout=2).update_fingerprint(include_kwargs=True)
assert Request("https://example.com", headers={"X-Test": "A"}).update_fingerprint(include_headers=True) != Request("https://example.com", headers={"X-Test": "a"}).update_fingerprint(include_headers=True)

tests/spiders/test_request.py also passed locally for me after applying the patch (28 passed).

One small suggestion: it would be worth adding these as regression tests in tests/spiders/test_request.py, since the contribution guide asks bug fixes to include code that reproduces the bug. That would lock in both the kwargs-value collision and the header-value case handling.

@yetval
Copy link
Copy Markdown
Contributor Author

yetval commented Apr 26, 2026

Hey @samrusani, I have went ahead and added tests. Thanks for the suggestions!

Comment thread scrapling/spiders/request.py Outdated
kwargs = (key.lower() for key in self._session_kwargs.keys() if key.lower() not in ("data", "json"))
data["kwargs"] = "".join(set(_convert_to_bytes(key).hex() for key in kwargs))
filtered_kwargs = {
key.lower(): str(value)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using str(value) for kwarg values is fragile if someone passes a non-primitive.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just fixed this.

@yetval yetval requested a review from D4Vinci April 30, 2026 13:59
Copy link
Copy Markdown
Owner

@D4Vinci D4Vinci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@D4Vinci
Copy link
Copy Markdown
Owner

D4Vinci commented Apr 30, 2026

Good catch as always @yetval
We will merge once the checks pass again.

@D4Vinci D4Vinci added bug Something isn't working Ready-to-Merge This PR is approved and ready to merge once the author addresses the raised points. labels Apr 30, 2026
@D4Vinci D4Vinci merged commit 90a853d into D4Vinci:dev Apr 30, 2026
5 checks passed
@D4Vinci D4Vinci mentioned this pull request May 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Ready-to-Merge This PR is approved and ready to merge once the author addresses the raised points.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants