Improve get_custody_columns validation, caching and error handling#6308
Conversation
|
I don't really like the complexity in error handling here, will work on simplifying it tomorrow. |
b9a2cfd to
b455de6
Compare
get_custody_columns and improve error handlingget_custody_columns validation, caching and error handling
|
@pawanjay176 I've re-written the PR and I think it's in a better place now and ready for review. |
b4c6f0d to
3e040e5
Compare
f3e99e9 to
73a340c
Compare
73a340c to
82a578f
Compare
# Conflicts: # beacon_node/lighthouse_network/src/discovery/subnet_predicate.rs # beacon_node/lighthouse_network/src/peer_manager/peerdb.rs # beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs # beacon_node/lighthouse_network/src/types/globals.rs # beacon_node/network/src/service.rs # consensus/types/src/data_column_subnet_id.rs
…ute_custody_requirement_subnets` function.
…onnection since we get them on metadata anyway.
5def040 to
0a717ff
Compare
| try_create_int_gauge_vec( | ||
| "peers_per_custody_subnet_count", | ||
| "The current count of peers by custody subnet count", | ||
| &["custody_subnet_count"], |
There was a problem hiding this comment.
I would consider doing a histogram here as we can have buckets instead of exact values. Say we have a peer with csc 128 and another with 127, can grafana bucket these two labels into the same range?
There was a problem hiding this comment.
Yeah I think buckets are useful, but historgrams accumulate observersations over time and it doesn't have the ability to track the current count for each bucket at a specific moment. I feel like gauge is more suitable but if you have a way to represent the metric in histogram I'm interested to hear.
There was a problem hiding this comment.
In Lodestar we reset the histogram buckets before every scrape. It feels weird but renders nicely on Grafana.
There was a problem hiding this comment.
I think it's a good idea. I've tried it and unfortunately couldn't find a way to set it with the prometheus crate though.
Closest thing that I could find was histogram.metric().clear_histogram(); but it doesn't do anything to the actual metric.
I'm going to merge this now, if we find any better way to do it, happy to do it in a follow up PR.
|
@mergify queue |
✅ The pull request has been merged automaticallyDetailsThe pull request has been merged automatically at c0b4f01 |
Add retropgf funding (sigp#6324) Migrate from `ethereum-types` to `alloy-primitives` (sigp#6078) * Remove use of ethers_core::RlpStream * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Remove old code * Simplify keccak call * Remove unused package * Merge branch 'unstable' of https://github.com/ethDreamer/lighthouse into remove_use_of_ethers_core * Merge branch 'unstable' into remove_use_of_ethers_core * Run clippy * Merge branch 'remove_use_of_ethers_core' of https://github.com/dospore/lighthouse into remove_use_of_ethers_core * Check all cargo fmt * migrate to alloy primitives init * fix deps * integrate alloy-primitives * resolve dep issues * more changes based on dep changes * add TODOs * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Revert lock * Add BeaconBlocksByRange v3 * continue migration * Revert "Add BeaconBlocksByRange v3" This reverts commit e3ce7fc. * impl hash256 extended trait * revert some uneeded diffs * merge conflict resolved * fix subnet id rshift calc * rename to FixedBytesExtended * debugging * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix failed test * fixing more tests * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * introduce a shim to convert between the two u256 types * move alloy to wrokspace * align alloy versions * update * update web3signer test certs * refactor * resolve failing tests * linting * fix graffiti string test * fmt * fix ef test * resolve merge conflicts * remove udep and revert cert * cargo patch * cyclic dep * fix build error * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve conflicts, update deps * merge unstable * fmt * fix deps * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve merge conflicts * resolve conflicts, make necessary changes * Remove patch * fmt * remove file * merge conflicts * sneaking in a smol change * bump versions * Merge remote-tracking branch 'origin/unstable' into migrate-to-alloy-primitives * Updates for peerDAS * Update ethereum_hashing to prevent dupe * updated alloy-consensus, removed TODOs * cargo update * endianess fix * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fmt * fix merge * fix test * fixed_bytes crate * minor fixes * convert u256 to i64 * panic free mixin to_low_u64_le * from_str_radix * computbe_subnet api and ensuring we use big-endian * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix test * Simplify subnet_id test * Simplify some more tests * Add tests to fixed_bytes crate * Merge branch 'unstable' into migrate-to-alloy-primitives update libp2p to version 0.54 (sigp#6249) * update libp2p to version 0.54.0 * address review * Merge branch 'unstable' of github.com:sigp/lighthouse into update-libp2p * Merge branch 'update-libp2p' of github.com:sigp/lighthouse into update-libp2p Drop block data from BlockError and BlobError (sigp#5735) * Drop block data from BlockError and BlobError * Debug release tests * Fix release tests * Revert unnecessary changes to lighthouse_metrics Update manual checkpoint sync content in Lighthouse book (sigp#6338) * Update manual checkpiont sync * Update faq * Minor revision * Minor revision Light Client Bug Fix (sigp#6299) * Light Client Bug Fix Ignore Rust 1.82 warnings about void patterns (sigp#6357) * Ignore Rust 1.82 warnings about void patterns Enable `large_stack_frames` lint (sigp#6343) * Enable `large_stack_frames` lint Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric (sigp#6340) * Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric, and move metrics into the compute function, recording only successful computation. * Move `discard_timer_on_break` usage to caller site. * Merge branch 'unstable' into compute-data-column-metric * Merge branch 'unstable' into compute-data-column-metric Metadata request ordering (sigp#6336) * Send metadata request ordering * Merge branch 'unstable' into metadata-order Clarify validator monitor block log (sigp#6342) * Clarify validator monitor block log * Merge branch 'unstable' into clarify-block-log Check known parent on rpc blob process (sigp#5893) * Check known parent on rpc blob process * fix test * Merge branch 'unstable' of https://github.com/sigp/lighthouse into blob-unknown-parent Remove beta tag from gossipsub 1.2 (sigp#6344) * Remove the beta tag from gossipsub v1.2 * fix clippy * Merge branch 'unstable' into remove-beta-tag Fix lints for Rust 1.81 (sigp#6363) * Fix lints for Rust 1.81 Use tikv-jemallocator instead of jemallocator (sigp#6354) * Use tikv-jemallocator instead of jemallocator * Merge branch 'unstable' into tikv-jemallocator * Bump tikv-jemallocator and tikv-jemalloc-ctl Improve `get_custody_columns` validation, caching and error handling (sigp#6308) * Improve `get_custody_columns` validation, caching and error handling. * Merge branch 'unstable' into get-custody-columns-error-handing * Fix failing test and add more test. * Fix failing test and add more test. * Merge branch 'unstable' into get-custody-columns-error-handing * Add unit test to make sure the default specs won't panic on the `compute_custody_requirement_subnets` function. * Add condition when calling `compute_custody_subnets_from_metadata` and update logs. * Validate `csc` when returning from enr. Remove `csc` computation on connection since we get them on metadata anyway. * Add `peers_per_custody_subnet_count` to track peer csc and supernodes. * Disconnect peers with invalid metadata and find other peers instead. * Fix sampling tests. * Merge branch 'unstable' into get-custody-columns-error-handing * Merge branch 'unstable' into get-custody-columns-error-handing Delete legacy payload reconstruction (sigp#6213) * Delete legacy payload reconstruction * Delete unneeded failing test * Merge remote-tracking branch 'origin/unstable' into remove-more-ethers * Merge remote-tracking branch 'origin/unstable' into remove-more-ethers * Cleanups simplify rpc codec logic (sigp#6304) * simplify rpc codec logic * Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec * Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec * Merge branch 'unstable' of github.com:sigp/lighthouse into simply-rpc-codec * Merge branch 'unstable' into simplify-rpc-codec * Merge branch 'unstable' into simplify-rpc-codec
Add retropgf funding (sigp#6324) Migrate from `ethereum-types` to `alloy-primitives` (sigp#6078) * Remove use of ethers_core::RlpStream * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Remove old code * Simplify keccak call * Remove unused package * Merge branch 'unstable' of https://github.com/ethDreamer/lighthouse into remove_use_of_ethers_core * Merge branch 'unstable' into remove_use_of_ethers_core * Run clippy * Merge branch 'remove_use_of_ethers_core' of https://github.com/dospore/lighthouse into remove_use_of_ethers_core * Check all cargo fmt * migrate to alloy primitives init * fix deps * integrate alloy-primitives * resolve dep issues * more changes based on dep changes * add TODOs * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Revert lock * Add BeaconBlocksByRange v3 * continue migration * Revert "Add BeaconBlocksByRange v3" This reverts commit e3ce7fc. * impl hash256 extended trait * revert some uneeded diffs * merge conflict resolved * fix subnet id rshift calc * rename to FixedBytesExtended * debugging * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix failed test * fixing more tests * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * introduce a shim to convert between the two u256 types * move alloy to wrokspace * align alloy versions * update * update web3signer test certs * refactor * resolve failing tests * linting * fix graffiti string test * fmt * fix ef test * resolve merge conflicts * remove udep and revert cert * cargo patch * cyclic dep * fix build error * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve conflicts, update deps * merge unstable * fmt * fix deps * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve merge conflicts * resolve conflicts, make necessary changes * Remove patch * fmt * remove file * merge conflicts * sneaking in a smol change * bump versions * Merge remote-tracking branch 'origin/unstable' into migrate-to-alloy-primitives * Updates for peerDAS * Update ethereum_hashing to prevent dupe * updated alloy-consensus, removed TODOs * cargo update * endianess fix * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fmt * fix merge * fix test * fixed_bytes crate * minor fixes * convert u256 to i64 * panic free mixin to_low_u64_le * from_str_radix * computbe_subnet api and ensuring we use big-endian * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix test * Simplify subnet_id test * Simplify some more tests * Add tests to fixed_bytes crate * Merge branch 'unstable' into migrate-to-alloy-primitives update libp2p to version 0.54 (sigp#6249) * update libp2p to version 0.54.0 * address review * Merge branch 'unstable' of github.com:sigp/lighthouse into update-libp2p * Merge branch 'update-libp2p' of github.com:sigp/lighthouse into update-libp2p Drop block data from BlockError and BlobError (sigp#5735) * Drop block data from BlockError and BlobError * Debug release tests * Fix release tests * Revert unnecessary changes to lighthouse_metrics Update manual checkpoint sync content in Lighthouse book (sigp#6338) * Update manual checkpiont sync * Update faq * Minor revision * Minor revision Light Client Bug Fix (sigp#6299) * Light Client Bug Fix Ignore Rust 1.82 warnings about void patterns (sigp#6357) * Ignore Rust 1.82 warnings about void patterns Enable `large_stack_frames` lint (sigp#6343) * Enable `large_stack_frames` lint Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric (sigp#6340) * Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric, and move metrics into the compute function, recording only successful computation. * Move `discard_timer_on_break` usage to caller site. * Merge branch 'unstable' into compute-data-column-metric * Merge branch 'unstable' into compute-data-column-metric Metadata request ordering (sigp#6336) * Send metadata request ordering * Merge branch 'unstable' into metadata-order Clarify validator monitor block log (sigp#6342) * Clarify validator monitor block log * Merge branch 'unstable' into clarify-block-log Check known parent on rpc blob process (sigp#5893) * Check known parent on rpc blob process * fix test * Merge branch 'unstable' of https://github.com/sigp/lighthouse into blob-unknown-parent Remove beta tag from gossipsub 1.2 (sigp#6344) * Remove the beta tag from gossipsub v1.2 * fix clippy * Merge branch 'unstable' into remove-beta-tag Fix lints for Rust 1.81 (sigp#6363) * Fix lints for Rust 1.81 Use tikv-jemallocator instead of jemallocator (sigp#6354) * Use tikv-jemallocator instead of jemallocator * Merge branch 'unstable' into tikv-jemallocator * Bump tikv-jemallocator and tikv-jemalloc-ctl Improve `get_custody_columns` validation, caching and error handling (sigp#6308) * Improve `get_custody_columns` validation, caching and error handling. * Merge branch 'unstable' into get-custody-columns-error-handing * Fix failing test and add more test. * Fix failing test and add more test. * Merge branch 'unstable' into get-custody-columns-error-handing * Add unit test to make sure the default specs won't panic on the `compute_custody_requirement_subnets` function. * Add condition when calling `compute_custody_subnets_from_metadata` and update logs. * Validate `csc` when returning from enr. Remove `csc` computation on connection since we get them on metadata anyway. * Add `peers_per_custody_subnet_count` to track peer csc and supernodes. * Disconnect peers with invalid metadata and find other peers instead. * Fix sampling tests. * Merge branch 'unstable' into get-custody-columns-error-handing * Merge branch 'unstable' into get-custody-columns-error-handing Delete legacy payload reconstruction (sigp#6213) * Delete legacy payload reconstruction * Delete unneeded failing test * Merge remote-tracking branch 'origin/unstable' into remove-more-ethers * Merge remote-tracking branch 'origin/unstable' into remove-more-ethers * Cleanups simplify rpc codec logic (sigp#6304) * simplify rpc codec logic * Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec * Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec * Merge branch 'unstable' of github.com:sigp/lighthouse into simply-rpc-codec * Merge branch 'unstable' into simplify-rpc-codec * Merge branch 'unstable' into simplify-rpc-codec
…igp#6308) * Improve `get_custody_columns` validation, caching and error handling. * Merge branch 'unstable' into get-custody-columns-error-handing * Fix failing test and add more test. * Fix failing test and add more test. * Merge branch 'unstable' into get-custody-columns-error-handing # Conflicts: # beacon_node/lighthouse_network/src/discovery/subnet_predicate.rs # beacon_node/lighthouse_network/src/peer_manager/peerdb.rs # beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs # beacon_node/lighthouse_network/src/types/globals.rs # beacon_node/network/src/service.rs # consensus/types/src/data_column_subnet_id.rs * Add unit test to make sure the default specs won't panic on the `compute_custody_requirement_subnets` function. * Add condition when calling `compute_custody_subnets_from_metadata` and update logs. * Validate `csc` when returning from enr. Remove `csc` computation on connection since we get them on metadata anyway. * Add `peers_per_custody_subnet_count` to track peer csc and supernodes. * Disconnect peers with invalid metadata and find other peers instead. * Fix sampling tests. * Merge branch 'unstable' into get-custody-columns-error-handing * Merge branch 'unstable' into get-custody-columns-error-handing
Issue Addressed
This PR addresses two issues:
get_custody_columnsto prevent infinite loop when supplied custody subnet count is invalid (> total number of data column subnets).network_globals.custody_columnsto the caller site (see @pawanjay176's comment here)NetworkGlobalsinitialization rather than computing it every time.Error handling
For node's
custody_subnet_countfrom local metadata:For peer's
custody_subnet_countfrom ENR & metadata:custody_subnet_countis invalid, fall back tocustody_requirementcustody_requirement, they are still useful to us