feat(dns-instrument): capture redirect chain hops via onHeadersReceived #1158
…op tracking [CL-6]
- Remove `onHeadersReceivedEventDetails` from shared `PendingResponse`
- Add `redirect_url` column to `dns_responses` schema
- Rename `onCompleteDnsHandler` to `onHeadersReceivedDnsHandler`
Pull request overview
This PR updates the DNS instrumentation to log at webRequest.onHeadersReceived instead of onCompleted, enabling capture of DNS records across redirect chains, and synchronizes storage schemas to persist the associated URL per hop.
Changes:
- Switch DNS logging trigger to `onHeadersReceived` and record the per-event URL as `redirect_url`.
- Add `redirect_url` to the `dns_responses` schema across SQLite (SQL), Parquet (PyArrow), and the WebExtension TypeScript types.
- Strengthen the DNS instrumentation test to assert that recorded rows contain URL content (not just row count).
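
To illustrate why the trigger matters, here is a minimal Python simulation of a redirect chain. It is not the extension code: the chain, the event dictionaries, and the function names `simulate_events` and `collect_rows` are hypothetical. It assumes that `onHeadersReceived` fires once per hop while `onCompleted` fires only for the final response.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, Dict[str, object]]

# Hypothetical three-hop chain: (url, HTTP status, resolved address).
CHAIN = [
    ("http://test.localhost:8000/start", 302, "127.0.0.1"),
    ("http://test.localhost:8000/middle", 302, "127.0.0.1"),
    ("http://test.localhost:8000/final", 200, "127.0.0.1"),
]


def simulate_events(chain) -> List[Event]:
    """Emit one onHeadersReceived per hop and a single onCompleted at the end."""
    events: List[Event] = []
    for url, status, ip in chain:
        events.append(
            ("onHeadersReceived", {"url": url, "statusCode": status, "ip": ip})
        )
    final_url, final_status, final_ip = chain[-1]
    events.append(
        ("onCompleted", {"url": final_url, "statusCode": final_status, "ip": final_ip})
    )
    return events


def collect_rows(events: List[Event], trigger: str) -> List[Dict[str, object]]:
    """Build one dns_responses-like row for every event matching the trigger."""
    return [
        {"redirect_url": details["url"], "used_address": details["ip"]}
        for name, details in events
        if name == trigger
    ]


events = simulate_events(CHAIN)
per_hop = collect_rows(events, "onHeadersReceived")
final_only = collect_rows(events, "onCompleted")
print(len(per_hop), len(final_only))  # prints "3 1"
```

Under these assumptions, logging on `onCompleted` yields a single row for the final URL, whereas logging on `onHeadersReceived` yields one row per hop, each carrying its own `redirect_url`.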
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `Extension/src/background/dns-instrument.ts` | Moves DNS collection to `onHeadersReceived` and stores `redirect_url = details.url`. |
| `Extension/src/types/browser-web-request-event-details.ts` | Adds a typed `onHeadersReceived` details interface with optional `ip`. |
| `Extension/src/schema.ts` | Adds optional `redirect_url` to the `DnsResolved` record type. |
| `openwpm/storage/schema.sql` | Adds `redirect_url` column to `dns_responses`. |
| `openwpm/storage/parquet_schema.py` | Adds `redirect_url` (and syncs `used_address`) for `dns_responses` Parquet schema. |
| `test/test_dns_instrument.py` | Adds assertions around `redirect_url` presence/content. |
| `test/storage/test_values.py` | Updates generated test values for `dns_responses` to include `redirect_url`/`used_address`. |
| `.gitignore` | Ignores local Crosslink/Claude generated state files and directories. |

    assert result["redirect_url"] is not None
    assert "test.localhost:8000" in result["redirect_url"]

    # Each redirect hop should record the URL it was associated with
    redirect_urls = [r["redirect_url"] for r in results]
    assert all(url is not None for url in redirect_urls)

Because the test indexes into results[0] from an unfiltered SELECT * FROM dns_responses (no ORDER BY), the new behavior of logging on onHeadersReceived (redirect hops + potentially additional requests) can make row ordering nondeterministic and the assertions flaky. Consider filtering the query to the expected request/hostname and/or adding an explicit ORDER BY (e.g., id or time_stamp) before selecting the row(s) to assert against.
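
One way to act on this suggestion is sketched below with an in-memory SQLite database. The two-column table is a deliberately reduced, hypothetical stand-in for the real `dns_responses` schema, and the helper `fetch_dns_rows` and the inserted URLs are illustrative assumptions, not code from the PR.

```python
import sqlite3


def fetch_dns_rows(conn: sqlite3.Connection, host_fragment: str):
    """Return matching rows in insertion order (filtered and explicitly ordered)."""
    cursor = conn.execute(
        "SELECT id, redirect_url FROM dns_responses "
        "WHERE redirect_url LIKE ? ORDER BY id",
        (f"%{host_fragment}%",),
    )
    return cursor.fetchall()


# Hypothetical, reduced schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE dns_responses (id INTEGER PRIMARY KEY, redirect_url TEXT)"
)
conn.executemany(
    "INSERT INTO dns_responses (redirect_url) VALUES (?)",
    [
        ("http://test.localhost:8000/start",),
        ("http://unrelated.invalid/favicon.ico",),  # extra request to ignore
        ("http://test.localhost:8000/final",),
    ],
)

rows = fetch_dns_rows(conn, "test.localhost:8000")
urls = [row["redirect_url"] for row in rows]
```

With the filter and `ORDER BY id` in place, `rows[0]` always refers to the first hop for the expected host, regardless of how many unrelated requests were logged in between.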
Codecov Report

All modified and coverable lines are covered by tests.

    @@           Coverage Diff           @@
    ##           master    #1158   +/-   ##
    =======================================
      Coverage   62.16%   62.16%
    =======================================
      Files          40       40
      Lines        3898     3898
    =======================================
      Hits         2423     2423
      Misses       1475     1475

Adds a /CONNECTION_ABORT/ handler to the test server that sends headers then forcibly closes the connection, and a test proving DNS data is still captured. This guards against reverting from onHeadersReceived to onCompleted, which would miss aborted requests.
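
A handler of this kind could look roughly like the following sketch, built on Python's standard `http.server`. It is an assumption-laden illustration, not the PR's test server: the class `AbortHandler` and the helper `demo` are hypothetical, and the connection is ended with a plain socket shutdown before the promised body is sent, which may differ from how the real handler aborts.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class AbortHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: sends headers, then drops the connection."""

    def do_GET(self):
        if self.path.startswith("/CONNECTION_ABORT/"):
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            # Promise a body that is never delivered.
            self.send_header("Content-Length", "1000")
            self.end_headers()
            self.wfile.flush()
            try:
                # Close both directions right after the headers.
                self.connection.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass
            self.close_connection = True
            return
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def demo():
    """Start the server on a free port and request the aborting path once."""
    server = HTTPServer(("127.0.0.1", 0), AbortHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/CONNECTION_ABORT/")
        response = conn.getresponse()  # headers arrive normally
        status = response.status
        try:
            response.read()  # body never arrives
            error = None
        except http.client.IncompleteRead as exc:
            error = type(exc).__name__
        conn.close()
        return status, error
    finally:
        server.shutdown()
        server.server_close()


status, error = demo()
```

The client sees a complete status line and headers but never a complete body, which is the situation in which an `onHeadersReceived` listener has already fired while a normal completion never happens.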
Summary
- Switch DNS logging from `onCompleted` to `onHeadersReceived` to capture all redirect chain hops
- Add `redirect_url` column to `dns_responses` schema (SQL, Parquet, TypeScript)
- Remove `onHeadersReceivedEventDetails` from the shared `PendingResponse` class; DNS works independently
- Rename `onCompleteDnsHandler` to `onHeadersReceivedDnsHandler`

Supersedes #1021. Incorporates fixes from adversarial review (VDD methodology).
VDD Review History
Test plan
- `test_dns_instrument.py` verifies `redirect_url` content per hop
- `pre-commit run --all-files` passes