feat: update to zarr3#2
Conversation
WalkthroughThe update enhances the project documentation, specifically the README, to clarify the functionality related to retrieving dClimate GIS Zarr datasets via IPFS and STAC. It also modifies the Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client
participant F as load_ipfs_via_stac
participant H as get_dataset_hamt_cid_from_stac
participant I as _get_dataset_by_ipfs_cid
participant X as xr.open_zarr
C->>F: Request dataset via STAC
F->>H: Get HAMT root IPFS CID
H-->>F: Return CID
F->>I: Load dataset by IPFS CID
I-->>F: Return dataset
F->>X: Open Zarr dataset
X-->>F: Return Zarr dataset
F-->>C: Provide dataset
Poem
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pyproject.toml (1)
18-18: Consider using minimum version constraintsWhile removing specific version constraints gives flexibility, it's generally a good practice to specify minimum versions to ensure compatibility. Consider using
xarray>=X.Y.Zandzarr>=3.0.0to ensure compatibility with the Zarr3 update.Also applies to: 20-20
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
README.md(1 hunks)dclimate_zarr_client/geotemporal_data.py(1 hunks)dclimate_zarr_client/ipfs_retrieval.py(2 hunks)pyproject.toml(2 hunks)tests/conftest.py(2 hunks)tests/test_client.py(2 hunks)
🔇 Additional comments (9)
dclimate_zarr_client/ipfs_retrieval.py (2)
8-8: Added import for Zarr3 supportThe import of
IPFSZarr3from thepy_hamtmodule is correctly added to support the Zarr3 update.
203-205: Properly implemented Zarr3 storage adapterThe implementation correctly wraps the HAMT store with
IPFSZarr3before passing it toxr.open_zarr, which is necessary for compatibility with Zarr3. Theread_only=Trueparameter is properly preserved.dclimate_zarr_client/geotemporal_data.py (1)
500-500: Simplified rolling aggregation implementationThis change simplifies the code by directly calling the aggregation method on the rolled object instead of using
getattron theDatasetRollingclass. This makes the code more readable and maintainable while maintaining the same functionality.README.md (1)
80-82: Standardized virtual environment namingThe update to use
.venvinstead ofmyenvfollows industry best practices. The.venvname is recognized by many Python tools and IDEs automatically, and keeping it hidden (with the dot prefix) helps reduce clutter in directory listings.tests/test_client.py (2)
77-77: Updated ZipStore import path for Zarr3 compatibilityCorrectly updated the import path from
zarr.ZipStoretozarr.storage.ZipStoreto match the structural changes in the Zarr3 library.
89-89: Updated ZipStore import path for Zarr3 compatibilityConsistently updated the second instance of ZipStore to use the correct import path from the
storagesubmodule.tests/conftest.py (3)
10-10: Appropriate import addition for Zarr3 compatibilityGood addition of the explicit import for
zarr.storagewhich supports the updated path to accessZipStore.
19-19: Correctly updated ZipStore import pathThe change from
zarr.ZipStoretozarr.storage.ZipStorealigns with Zarr3's organization where storage classes are properly namespaced.
25-27: Improved formatting and correct ZipStore import pathThe ZipStore path has been correctly updated to use the storage submodule, and the code formatting has been improved by splitting parameters across multiple lines for better readability.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (7)
tests/test_zarr_encryption_ipfs.py (5)
11-11: Remove unusedregister_codecimport.According to the static analysis hints,
numcodecs.register_codecis never used in this file. Removing it will help maintain a clean import list.- from numcodecs import register_codec🧰 Tools
🪛 Ruff (0.8.2)
11-11:
numcodecs.register_codecimported but unusedRemove unused import:
numcodecs.register_codec(F401)
12-12: Remove commented-out import if not needed.This commented-out line might be a leftover. Consider removing it to keep the codebase clean and consistent.
- # from zarr.codecs import BytesCodec
13-13: Remove unusedBytesBytesCodecfrom the import.
BytesBytesCodecis imported but not referenced anywhere in this file. Keeping imports minimal ensures clarity and avoids confusion.-from numcodecs.zarr3 import BytesBytesCodec, Blosc +from numcodecs.zarr3 import Blosc🧰 Tools
🪛 Ruff (0.8.2)
13-13:
numcodecs.zarr3.BytesBytesCodecimported but unusedRemove unused import:
numcodecs.zarr3.BytesBytesCodec(F401)
71-71: Reassess repeatedzarr.registry.register_codeccalls with changing keys.Registering the same codec multiple times for the same identifier but with different encryption keys may lead to confusion or unintended side effects, especially if different parts of the code rely on a single registered codec.
Also applies to: 107-107, 120-120, 162-162, 173-173
123-124: Remove or restore commented-out assertions if still relevant.These commented-out checks might be important for ensuring encryption correctness. Uncomment them or remove them altogether to maintain clarity and avoid confusion.
- # assert test_ds["temp"].encoding["compressors"][1]._encryption_key is not None - # Check length of compressor is just 1 for default ZstdCodecdclimate_zarr_client/encryption_codec.py (2)
9-24: Consider storing the encryption key per instance if concurrency is expected.
_encryption_keyis maintained at the class level. If multiple coroutines attempt encoding/decoding concurrently with different keys, it may lead to race conditions or incorrect decryptions. Moving the key to an instance attribute could mitigate concurrency issues.
36-37: Ensure consistent key validation for better user feedback.The check for a 32-byte key is correct, but consider adding more context in the error message (e.g., expected length vs. actual length) to help users diagnose configuration issues faster.
- raise ValueError("Encryption key must be 32 bytes") + raise ValueError(f"Encryption key must be 32 bytes, but got {len(encryption_key)} bytes")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dclimate_zarr_client/encryption_codec.py(1 hunks)tests/test_zarr_encryption_ipfs.py(5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
11-11: numcodecs.register_codec imported but unused
Remove unused import: numcodecs.register_codec
(F401)
13-13: numcodecs.zarr3.BytesBytesCodec imported but unused
Remove unused import: numcodecs.zarr3.BytesBytesCodec
(F401)
🪛 GitHub Actions: pre-commit
tests/test_zarr_encryption_ipfs.py
[error] 1-1: pre-commit hook 'trailing-whitespace' failed. Files were modified by this hook.
dclimate_zarr_client/encryption_codec.py
[error] 1-1: pre-commit hook 'end-of-file-fixer' failed. Files were modified by this hook.
🪛 GitHub Actions: Python application
tests/test_zarr_encryption_ipfs.py
[error] 134-134: ValueError: MAC check failed
🔇 Additional comments (3)
dclimate_zarr_client/encryption_codec.py (3)
66-84: Confirm that asynchronous decryption meets performance and concurrency needs.As
_decode_singlerelies onasyncio.to_thread, ensure your environment is prepared for thread-based concurrency and that read operations won't block other critical tasks.
85-106: Review parallel encoding usage.Like the decryption method,
_encode_singleoffloads work to a separate thread. Verify that creating multiple threads for chunk-based encryptions best suits your performance and resource constraints.
107-117: Validate additional overhead for nonce and tag.The calculated addition of 40 bytes per chunk is standard for XChaCha20-Poly1305. Confirm that this overhead is acceptable, especially if your dataset has many small chunks.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_zarr_encryption_ipfs.py (1)
10-14: Update imports for zarr3 compatibilityThe import changes correctly add zarr and json modules needed for zarr3 compatibility. There are a couple of improvements to make:
- The import
numcodecs.register_codecis no longer used since you're now usingzarr.registry.register_codec.- The static analysis correctly points out that
BytesBytesCodecis imported but not directly used in this file.import os import shutil import tempfile from multiformats import CID import numpy as np import pandas as pd import xarray as xr import pytest import zarr import json -from numcodecs import register_codec -# from zarr.codecs import BytesCodec -from numcodecs.zarr3 import BytesBytesCodec, Blosc +from numcodecs.zarr3 import Blosc from Crypto.Random import get_random_bytes from py_hamt import HAMT, IPFSStore, IPFSZarr3🧰 Tools
🪛 Ruff (0.8.2)
12-12:
numcodecs.register_codecimported but unusedRemove unused import:
numcodecs.register_codec(F401)
14-14:
numcodecs.zarr3.BytesBytesCodecimported but unusedRemove unused import:
numcodecs.zarr3.BytesBytesCodec(F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_zarr_encryption_ipfs.py(5 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
tests/test_zarr_encryption_ipfs.py (1)
dclimate_zarr_client/encryption_codec.py (1)
EncryptionCodec(9-117)
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
12-12: numcodecs.register_codec imported but unused
Remove unused import: numcodecs.register_codec
(F401)
14-14: numcodecs.zarr3.BytesBytesCodec imported but unused
Remove unused import: numcodecs.zarr3.BytesBytesCodec
(F401)
🪛 GitHub Actions: pre-commit
tests/test_zarr_encryption_ipfs.py
[error] 1-1: pre-commit hook 'trailing-whitespace' failed. Files were modified by this hook.
🔇 Additional comments (5)
tests/test_zarr_encryption_ipfs.py (5)
71-71: Correct codec registration for zarr3The update to use
zarr.registry.register_codecwith an explicit codec identifier is the correct approach for zarr3 compatibility.Also applies to: 159-159
76-76: Update encoding to use compressors instead of filtersThe change from using
filterstocompressorscorrectly aligns with zarr3 terminology, and includes both compression and encryption in a sequential pipeline.
82-95: Great addition of zarr.json validationThese validation tests are excellent for ensuring the codec is correctly registered and configured in the zarr3 metadata. Testing the codec configuration ensures that no sensitive data like encryption keys are serialized.
118-121: Updated assertions for zarr3 compressorsThe assertions are correctly updated to check for
compressorsinstead offiltersto align with zarr3 terminology.
127-127: Properly using IPFSZarr3 wrapperThe introduction of
IPFSZarr3to wrap the HAMT store and the corresponding updates to read/write operations are consistent with the zarr3 migration. This provides a cleaner abstraction for working with IPFS storage in zarr3.Also applies to: 131-131, 143-143, 146-146, 161-161
| zarr.registry.register_codec("xchacha20poly1305", EncryptionCodec) | ||
There was a problem hiding this comment.
Fix trailing whitespace
The code has trailing whitespace on line 72 which is causing pipeline failures.
# Register the codec
zarr.registry.register_codec("xchacha20poly1305", EncryptionCodec)
-
+📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| zarr.registry.register_codec("xchacha20poly1305", EncryptionCodec) | |
| zarr.registry.register_codec("xchacha20poly1305", EncryptionCodec) | |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
tests/test_zarr_encryption_ipfs.py (3)
10-14: Clean up unused importsThere are two unused imports according to the static analysis:
numcodecs.register_codec(line 12)numcodecs.zarr3.BytesBytesCodec(line 14)import zarr import json -from numcodecs import register_codec -# from zarr.codecs import BytesCodec -from numcodecs.zarr3 import BytesBytesCodec, Blosc +# from zarr.codecs import BytesCodec +from numcodecs.zarr3 import Blosc🧰 Tools
🪛 Ruff (0.8.2)
12-12:
numcodecs.register_codecimported but unusedRemove unused import:
numcodecs.register_codec(F401)
14-14:
numcodecs.zarr3.BytesBytesCodecimported but unusedRemove unused import:
numcodecs.zarr3.BytesBytesCodec(F401)
23-32: Use contextlib.suppress for cleaner error handlingThe try-except-pass pattern can be replaced with
contextlib.suppressfor better readability.# Register the codec type *once* globally or in a fixture # This only tells Zarr about the *class*, not the key. @pytest.fixture(scope="module", autouse=True) def register_encryption_codec(): - try: - # Attempt registration, ignore if already done - zarr.registry.register_codec("xchacha20poly1305", EncryptionCodec) - except ValueError: # Or whatever error your registry throws on duplicate - pass + from contextlib import suppress + # Attempt registration, ignore if already done + with suppress(ValueError): # Or whatever error your registry throws on duplicate + zarr.registry.register_codec("xchacha20poly1305", EncryptionCodec)🧰 Tools
🪛 Ruff (0.8.2)
27-31: Use
contextlib.suppress(ValueError)instead oftry-except-pass(SIM105)
187-194: Use a more specific exception class in pytest.raisesUsing
pytest.raises(Exception)is too broad and might catch unexpected exceptions. Specify the exact exception class you're expecting (likeValueErroror a crypto-specific exception).# Use the *wrong* key in the context with EncryptionCodec.key_context(wrong_encryption_key): loaded_failure = xr.open_zarr(store=hamt1_read_z3, consolidated=False) # Accessing data should raise an exception since we don't have the correct encryption key - with pytest.raises(Exception): + with pytest.raises((ValueError, RuntimeError)): # or the specific exception class from the crypto library _ = loaded_failure["temp"].values🧰 Tools
🪛 Ruff (0.8.2)
192-192:
pytest.raises(Exception)should be considered evil(B017)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dclimate_zarr_client/encryption_codec.py(1 hunks)tests/test_zarr_encryption_ipfs.py(3 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
tests/test_zarr_encryption_ipfs.py (1)
dclimate_zarr_client/encryption_codec.py (3)
EncryptionCodec(11-120)key_context(33-41)compute_encoded_size(110-120)
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
12-12: numcodecs.register_codec imported but unused
Remove unused import: numcodecs.register_codec
(F401)
14-14: numcodecs.zarr3.BytesBytesCodec imported but unused
Remove unused import: numcodecs.zarr3.BytesBytesCodec
(F401)
27-31: Use contextlib.suppress(ValueError) instead of try-except-pass
(SIM105)
118-119: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
181-181: Local variable test_read is assigned to but never used
Remove assignment to unused variable test_read
(F841)
192-192: pytest.raises(Exception) should be considered evil
(B017)
🪛 GitHub Actions: pre-commit
tests/test_zarr_encryption_ipfs.py
[error] 179-179: Ruff: Local variable 'test_read' is assigned to but never used (F841).
[error] 1-1: pre-commit hook 'trailing-whitespace' failed. Files were modified by this hook.
[error] 1-1: pre-commit hook 'ruff' failed. Files were modified by this hook.
[error] 1-1: pre-commit hook 'ruff-format' failed. Files were modified by this hook.
dclimate_zarr_client/encryption_codec.py
[error] 1-1: pre-commit hook 'end-of-file-fixer' failed. Files were modified by this hook.
🔇 Additional comments (8)
tests/test_zarr_encryption_ipfs.py (3)
33-41: Good use of session-scoped fixture for consistent encryption keyThe fixture effectively ensures the same encryption key is used throughout the test session, which is essential for testing encrypt/decrypt operations reliably.
44-95: Comprehensive update for Zarr3 compatibilityThe function now correctly uses Zarr3 patterns with compressors instead of filters, and properly manages the encryption key through the context manager. The validation of the zarr.json file is a good addition to ensure codec registration works as expected.
116-126: Good test coverage for key validation and new functionalityThe use of the context manager in
test_bad_encryption_keysis appropriate, and the newtest_compute_encoded_sizetest verifies the additional functionality added to the codec.🧰 Tools
🪛 Ruff (0.8.2)
118-119: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
dclimate_zarr_client/encryption_codec.py (5)
1-15: Excellent modernization for Zarr3 compatibilityThe transition to
BytesBytesCodecand the use of contextvars for thread-safe key management are excellent improvements. The imports are well organized and the class structure is now much more appropriate for Zarr3.🧰 Tools
🪛 GitHub Actions: pre-commit
[error] 1-1: pre-commit hook 'end-of-file-fixer' failed. Files were modified by this hook.
16-24: Improved constructor with better default and documentationAdding a default value for the header parameter and improving the docstring makes the class more user-friendly and self-documenting.
25-42: Greatly improved key management with context manager patternThe context manager pattern for key management is a significant security improvement over the previous approach. It ensures keys are properly cleaned up and provides thread safety through contextvars.
43-67: Well-designed serialization methods for Zarr3 compatibilityThe
from_dictandto_dictmethods are well-implemented with proper type hints and comprehensive docstrings. They correctly handle the Zarr3 serialization format.
68-109: Effective use of asynchronous processing for crypto operationsThe async methods with
asyncio.to_threadare a good approach for handling potentially blocking crypto operations, enabling better performance in IO-bound scenarios.
| def compute_encoded_size(self, input_byte_length: int, chunk_spec) -> int: | ||
| """Calculate the size of the encoded data. | ||
|
|
||
| Args: | ||
| input_byte_length (int): Size of the input bytes. | ||
| chunk_spec: Array specification (unused in this calculation). | ||
|
|
||
| Returns: | ||
| int: Encoded size (input size + 40 bytes for nonce and tag). | ||
| """ | ||
| return input_byte_length + 40 # 24 bytes nonce + 16 bytes tag No newline at end of file |
There was a problem hiding this comment.
Missing newline at end of file
The static analysis indicates an end-of-file issue. Add a newline at the end of the file to fix this pipeline failure.
return input_byte_length + 40 # 24 bytes nonce + 16 bytes tag
+📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def compute_encoded_size(self, input_byte_length: int, chunk_spec) -> int: | |
| """Calculate the size of the encoded data. | |
| Args: | |
| input_byte_length (int): Size of the input bytes. | |
| chunk_spec: Array specification (unused in this calculation). | |
| Returns: | |
| int: Encoded size (input size + 40 bytes for nonce and tag). | |
| """ | |
| return input_byte_length + 40 # 24 bytes nonce + 16 bytes tag | |
| def compute_encoded_size(self, input_byte_length: int, chunk_spec) -> int: | |
| """Calculate the size of the encoded data. | |
| Args: | |
| input_byte_length (int): Size of the input bytes. | |
| chunk_spec: Array specification (unused in this calculation). | |
| Returns: | |
| int: Encoded size (input size + 40 bytes for nonce and tag). | |
| """ | |
| return input_byte_length + 40 # 24 bytes nonce + 16 bytes tag |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
dclimate_zarr_client/encryption_codec.py (1)
99-109:⚠️ Potential issueAdd newline at end of file
The file is missing a newline at the end, which is causing pipeline failures.
return input_byte_length + 40 # 24 bytes nonce + 16 bytes tag +
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dclimate_zarr_client/encryption_codec.py(2 hunks)tests/test_zarr_encryption_ipfs.py(5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
12-12: numcodecs.register_codec imported but unused
Remove unused import: numcodecs.register_codec
(F401)
14-14: numcodecs.zarr3.BytesBytesCodec imported but unused
Remove unused import: numcodecs.zarr3.BytesBytesCodec
(F401)
🪛 GitHub Actions: pre-commit
tests/test_zarr_encryption_ipfs.py
[error] 1-1: pre-commit hook 'trailing-whitespace' failed. Files were modified by this hook.
dclimate_zarr_client/encryption_codec.py
[error] 1-1: pre-commit hook 'end-of-file-fixer' failed. Files were modified by this hook.
🔇 Additional comments (12)
tests/test_zarr_encryption_ipfs.py (6)
22-30: Good addition of a session-scoped fixture!Creating a session-scoped fixture for the encryption key ensures consistency across tests and follows best practices.
84-85: Good implementation of the compressors pipelineThe implementation correctly uses the Zarr3 compressor chain with Blosc first for compression and EncryptionCodec second for encryption, following Zarr3 best practices.
90-102: Excellent validation of codec configurationThe assertions validate that the codec is correctly registered and configured in the zarr.json file, which helps ensure the serialization is working as expected.
118-122: Good test for compute_encoded_sizeThis test validates the new method that calculates the encoded size with nonce and tag overhead, which is important for estimating storage requirements.
138-144: Good implementation of IPFSZarr3 wrapperUsing the IPFSZarr3 wrapper around the HAMT store correctly implements the Zarr3 compatibility layer.
168-177: Good negative test caseTesting with an incorrect encryption key ensures that encrypted data is properly protected and inaccessible without the correct key.
dclimate_zarr_client/encryption_codec.py (6)
9-10: Good inheritance from BytesBytesCodecThe update to inherit from
BytesBytesCodeccorrectly aligns with Zarr3's codec architecture.
14-24: Nice improvement with default header valueAdding a default value for the
headerparameter makes the codec more user-friendly while maintaining customization options.
33-44: Good implementation of from_dictThe
from_dictmethod correctly implements the Zarr3 configuration loading pattern with proper type hinting and error handling.
46-55: Well-implemented to_dict methodThe
to_dictmethod correctly implements the Zarr3 configuration serialization pattern with the appropriate structure.
57-76: Excellent async implementation of decodeUsing
asyncio.to_threadfor the blocking cryptographic operations is a great approach to prevent I/O blocking in asynchronous contexts.
77-97: Good async implementation of encodeThe encryption logic is well-structured with clear separation between buffer handling and cryptographic operations.
| }, | ||
| attrs={"description": "Test dataset with random weather data"}, | ||
| ) | ||
|
|
There was a problem hiding this comment.
Fix trailing whitespace
The trailing whitespace on this line is causing pipeline failures.
-
+📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2 +/- ##
=======================================
Coverage ? 65.42%
=======================================
Files ? 8
Lines ? 778
Branches ? 111
=======================================
Hits ? 509
Misses ? 221
Partials ? 48 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
dclimate_zarr_client/encryption_codec.py (1)
27-33:Details
❓ Verification inconclusive
Consider replacing global state with a context manager.
The static
_encryption_keyclass variable creates global state which can cause issues in concurrent environments. Consider implementing a context manager as suggested in the AI summary to manage key lifecycle more safely.@classmethod @contextmanager def key_context(cls, key: bytes | None): """Temporarily set encryption key within a context.""" if key is not None: if len(key) != 32: # 32 bytes = 64 hex chars raise ValueError("Encryption key must be 32 bytes") old_key = cls._encryption_key cls._encryption_key = key try: yield finally: cls._encryption_key = old_key else: yield
🌐 Web query:
Is global state in Python classes thread-safe?💡 Result:
Global state in Python classes is not inherently thread-safe. While Python's Global Interpreter Lock (GIL) prevents concurrent execution of bytecode across threads, thread safety depends on the specific operations performed:
Shared State Risks
Class attributes and global variables can become corrupted in multi-threaded environments when:
- Multiple threads modify the same attribute/variable
- Operations require multiple bytecode steps (e.g.,
x += 1is not atomic[2][6])- Methods lack synchronization for state-modifying operations[3][6]
Thread-Safe Implementation Strategies
To ensure safety:import threading class SharedState: _lock = threading.Lock() data = {} @classmethod def update(cls, key, value): with cls._lock: # Critical section protection cls.data[key] = value
- Use
threading.Lock()for mutual exclusion in critical sections[1][3]- Leverage atomic operations where possible (e.g.,
queue.Queuefor thread-safe data exchange)[3]- Design classes to minimize shared mutable state[3][5]
GIL Limitations
While the GIL ensures bytecode-level atomicity for single operations like:L.append(x) # Atomic under GIL[2]It doesn't protect multi-step operations like:
if x in cache: # Non-atomic check-then-act cache[x] += 1Race conditions can occur between the existence check and modification[2][6].
For singleton patterns requiring thread-safety, a locked metaclass approach is effective:
class SingletonMeta(type): _instances = {} _lock = threading.Lock() def __call__(cls, *args, **kwargs): with cls._lock: # Thread-safe instantiation if cls not in cls._instances: cls._instances[cls] = super().__call__(*args, **kwargs) return cls._instances[cls]Key recommendations[1][3][5]:
- Always use locks when modifying shared class/global state
- Prefer thread-safe data structures (
queue.Queue,collections.deque)- Avoid global state where possible through dependency injection
- Test thread interactions explicitly using stress tests
Without explicit synchronization mechanisms, concurrent access to global class state risks data races, inconsistent states, and hard-to-debug issues[2][6].
Citations:
- 1: https://python.plainenglish.io/singleton-pattern-in-python-managing-global-state-with-thread-safety-166d21279904
- 2: https://discuss.python.org/t/atomic-and-thread-safe-in-python-world/51575
- 3: https://www.cloudthat.com/resources/blog/writing-thread-safe-programs-in-python
- 4: https://forums.raspberrypi.com/viewtopic.php?t=234164
- 5: https://labex.io/tutorials/python-how-to-modify-global-state-safely-436783
- 6: https://discuss.python.org/t/what-is-thread-safety-as-applied-to-python/13685
Action Required: Revisit Global Key Management to Address Thread Safety
The current class-level
_encryption_keyis set via a static method, which introduces global mutable state and is not inherently thread-safe. Transitioning to a context manager—as outlined below—can help by limiting the key's scope and lifetime. However, if the encryption key may be modified concurrently in a multi-threaded environment, consider integrating explicit thread synchronization (e.g., using athreading.Lock) within the context manager.Recommendations:
- Replace the current
set_encryption_keymethod with a context manager to temporarily set and safely restore the encryption key.- Evaluate if thread-safety is a concern in your use case and, if so, add locking to protect state transitions.
@classmethod @contextmanager def key_context(cls, key: bytes | None): """Temporarily set encryption key within a context.""" if key is not None: if len(key) != 32: # 32 bytes = 64 hex chars raise ValueError("Encryption key must be 32 bytes") # Optionally, add thread-safety via a lock if concurrent access is expected. old_key = cls._encryption_key cls._encryption_key = key try: yield finally: cls._encryption_key = old_key else: yieldPlease review these changes carefully and assess whether additional thread synchronization is needed for your environment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
dclimate_zarr_client/encryption_codec.py(2 hunks)tests/test_ipfs_retrieval.py(2 hunks)tests/test_zarr_encryption_ipfs.py(5 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
tests/test_ipfs_retrieval.py (1)
dclimate_zarr_client/geotemporal_data.py (1)
time_range(345-366)
🪛 GitHub Actions: pre-commit
tests/test_ipfs_retrieval.py
[error] 303-303: ruff-format: 1 file reformatted. Please check the changes made by the pre-commit hook.
🔇 Additional comments (11)
tests/test_ipfs_retrieval.py (1)
33-33: Appropriate update to test mock.The updated IPFS hash value is correctly matched to work with the updated dependencies for Zarr3.
tests/test_zarr_encryption_ipfs.py (5)
24-30: Good use of session-scoped fixture.Using a session-scoped fixture for the encryption key improves test consistency and allows for reliable assertions between test functions.
34-94: Well-structured Zarr3 compatibility updates.The changes to utilize compressors instead of filters align properly with Zarr3's approach. Setting
consolidated=Falseis also appropriate for IPFS HAMT stores.
97-108: Good validation of codec configuration.The added assertions properly verify that the codec is correctly registered and configured in the zarr.json file, which is essential for ensuring proper serialization in Zarr3.
125-128: Appropriate test for new functionality.The test for
compute_encoded_sizeverifies the new functionality added to the codec and confirms that it correctly accounts for the additional bytes needed for the nonce and tag.
131-186: Proper adaptation to IPFSZarr3.The test has been correctly updated to use the new IPFSZarr3 wrapper class and verifies that encryption still works correctly with the new Zarr3 structure.
dclimate_zarr_client/encryption_codec.py (5)
1-12: Proper updates for Zarr3 compatibility.The imports have been updated to use Zarr3 types, and the class now correctly inherits from
BytesBytesCodecwhich is the appropriate base class for Zarr3 codecs that transform bytes to bytes.
16-26: Good default value and improved docstrings.The constructor now includes a sensible default value for the header and has well-structured docstrings with type annotations.
35-55: Excellent Zarr3 serialization methods.The
from_dictandto_dictmethods properly implement the Zarr3 serialization interface, allowing the codec to be correctly configured from metadata.
56-98: Good implementation of async encoding/decoding.The async methods for encoding and decoding chunks allow for better performance with large datasets by moving CPU-bound operations to separate threads using
asyncio.to_thread.
99-109: Accurate encoded size calculation.The
compute_encoded_sizemethod correctly accounts for the additional 40 bytes needed for the nonce (24 bytes) and tag (16 bytes) when calculating the encoded size.
feat: initial work to move over to dynamic stac vs cid list
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (12)
dclimate_zarr_client/ipfs_retrieval.py (3)
23-26: Avoid callinglogging.basicConfigin library code.
It’s typically best practice to let the application decide how to configure logging so as not to override user-defined settings. Consider removing or conditionally applying it only if no existing logging configuration is found.
98-249: Two-step IPNS fetch approach.
Retrying with and withoutnocacheis a clever way to handle gateway behaviors. Consider refactoring the repeated request logic into a helper function to reduce duplication.
696-894: Refactor lengthylist_datasetsfor readability.
While functional, this routine is quite large. Splitting out child-collection or item-link handling into smaller helpers could improve maintainability and testability.pyproject.toml (1)
18-20: Removed version pins for xarray and zarr.
While this gives more flexibility, it also introduces some uncertainty if backward-incompatible updates are released. Consider specifying version ranges to balance stability with upgrades.README.md (2)
144-144: Fix missing comma in sentenceThere's a missing comma after "IPFS gateway" in this sentence.
-To run your own IPFS gateway follow the instructions for [installing ipfs] +To run your own IPFS gateway, follow the instructions for [installing ipfs]🧰 Tools
🪛 LanguageTool
[uncategorized] ~144-~144: A comma might be missing here.
Context: ...e/main/notebooks). To run your own IPFS gateway follow the instructions for [installing...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
146-146: Remove trailing punctuation from headingAccording to Markdown linting standards (rule MD026), headings should not have trailing punctuation.
-## Create and activate a virtual environment: +## Create and activate a virtual environment🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
146-146: Trailing punctuation in heading
Punctuation: ':'(MD026, no-trailing-punctuation)
tests/test_ipfs_retrieval.py (3)
34-53: Well-structured test for hash retrieval success pathThe test provides thorough coverage of the success path for getting an IPNS name hash, including verifying that the cache update function is called correctly.
However, the nested 'with' statements could be simplified for better readability:
endpoint_data = {"cpc-precip-conus": "bafy-real-or-mock-hash", "other": "hash2"} - with patch("requests.get") as mock_get: - mock_get.return_value.raise_for_status.return_value = None - mock_get.return_value.json.return_value = endpoint_data - # Mock file operations to simulate cache update - with ( - patch("builtins.open", mock_open()) as _, - patch("os.path.dirname", return_value="/fake/dir"), - patch( - "dclimate_zarr_client.ipfs_retrieval.update_cache_if_changed" - ) as mock_update, - ): + with patch("requests.get") as mock_get, \ + patch("builtins.open", mock_open()) as _, \ + patch("os.path.dirname", return_value="/fake/dir"), \ + patch("dclimate_zarr_client.ipfs_retrieval.update_cache_if_changed") as mock_update: + mock_get.return_value.raise_for_status.return_value = None + mock_get.return_value.json.return_value = endpoint_data + # Mock file operations to simulate cache update
61-170: Comprehensive fallback testing with nested context managersThe tests provide excellent coverage of various fallback scenarios for the IPNS hash retrieval functionality. The test assertions are clear and verify the expected behavior in each case.
However, there are multiple instances of nested 'with' statements that could be simplified throughout this section for better readability and adherence to Ruff's SIM117 rule.
For example, in the test_get_ipns_name_hash_fallback_success function:
- with patch( - "requests.get", side_effect=requests.RequestException("Simulated Error") - ): - with patch("os.path.exists", return_value=True): - fake_json = '{"cpc-precip-conus":"bafkreihashfromlocalfile"}' - # Mock open within the context of the function call - with patch( - "dclimate_zarr_client.ipfs_retrieval.open", - mock_open(read_data=fake_json), - create=True, - ): + with patch("requests.get", side_effect=requests.RequestException("Simulated Error")), \ + patch("os.path.exists", return_value=True), \ + patch("dclimate_zarr_client.ipfs_retrieval.open", + mock_open(read_data='{"cpc-precip-conus":"bafkreihashfromlocalfile"}'), + create=True):Similar simplifications could be applied to tests at lines 78-84, 88-99, 104-122, 126-140, 144-155, and 159-170.
🧰 Tools
🪛 Ruff (0.8.2)
61-64: Use a single
withstatement with multiple contexts instead of nestedwithstatementsCombine
withstatements(SIM117)
78-81: Use a single
withstatement with multiple contexts instead of nestedwithstatementsCombine
withstatements(SIM117)
88-91: Use a single
withstatement with multiple contexts instead of nestedwithstatementsCombine
withstatements(SIM117)
93-98: Use a single
withstatement with multiple contexts instead of nestedwithstatementsCombine
withstatements(SIM117)
110-118: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
144-147: Use a single
withstatement with multiple contexts instead of nestedwithstatementsCombine
withstatements(SIM117)
159-162: Use a single
withstatement with multiple contexts instead of nestedwithstatementsCombine
withstatements(SIM117)
198-261: Consider removing commented-out testsThere are several commented-out test functions that appear to be for functionality that has been changed or removed. While they might provide historical context, they also clutter the codebase and could lead to confusion.
Consider either:
- Removing these commented-out tests entirely if they're no longer needed
- Adding a clear comment explaining why they're kept for future reference, with an explanation of the functionality they were testing and why they're no longer applicable
tests/conftest.py (2)
116-149: Robust IPFS connection check with logging.
The function appropriately handles connection, timeout, and otherrequestsexceptions.
Consider using a logging utility instead of
165-178: Global constants for known dataset IDs and parameters.
Storing these in a test-specific constants section is a straightforward approach.
If reused beyond test files, consider centralizing them in a shared constants module.tests/test_integration_ipfs.py (1)
43-56: Localis_ipfs_runninghelper mirrors conftest approach.
Duplicating or partially reusing logic is fine for test clarity here, but consider unifying if feasible.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
README.md(3 hunks)dclimate_zarr_client/__init__.py(1 hunks)dclimate_zarr_client/cids.json(1 hunks)dclimate_zarr_client/client.py(8 hunks)dclimate_zarr_client/dclimate_zarr_errors.py(2 hunks)dclimate_zarr_client/ipfs_retrieval.py(3 hunks)pyproject.toml(2 hunks)tests/conftest.py(3 hunks)tests/test_client.py(3 hunks)tests/test_integration_ipfs.py(1 hunks)tests/test_ipfs_retrieval.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- dclimate_zarr_client/cids.json
🧰 Additional context used
🧬 Code Graph Analysis (6)
dclimate_zarr_client/__init__.py (1)
dclimate_zarr_client/client.py (3)
load_ipfs_via_stac(26-84)load_s3(87-103)geo_temporal_query(106-263)
dclimate_zarr_client/ipfs_retrieval.py (1)
dclimate_zarr_client/dclimate_zarr_errors.py (4)
DatasetNotFoundError(25-26)NoMetadataFoundError(17-18)IpfsConnectionError(61-62)StacCatalogError(65-66)
dclimate_zarr_client/client.py (2)
dclimate_zarr_client/ipfs_retrieval.py (2)
get_dataset_hamt_cid_from_stac(252-433)_get_dataset_by_ipfs_cid(601-679)dclimate_zarr_client/geotemporal_data.py (1)
GeotemporalData(22-648)
tests/test_ipfs_retrieval.py (2)
dclimate_zarr_client/ipfs_retrieval.py (3)
get_ipns_name_hash(514-557)update_cache_if_changed(500-511)list_datasets(733-893)dclimate_zarr_client/dclimate_zarr_errors.py (3)
DatasetNotFoundError(25-26)IpfsConnectionError(61-62)StacCatalogError(65-66)
tests/test_integration_ipfs.py (3)
dclimate_zarr_client/client.py (2)
load_ipfs_via_stac(26-84)geo_temporal_query(106-263)dclimate_zarr_client/ipfs_retrieval.py (2)
fetch_json_from_ipns(99-248)get_dataset_hamt_cid_from_stac(252-433)dclimate_zarr_client/dclimate_zarr_errors.py (3)
IpfsConnectionError(61-62)StacCatalogError(65-66)DatasetNotFoundError(25-26)
tests/test_client.py (4)
dclimate_zarr_client/dclimate_zarr_errors.py (9)
DatasetNotFoundError(25-26)IpfsConnectionError(61-62)StacCatalogError(65-66)NoDataFoundError(21-22)ConflictingGeoRequestError(9-10)InvalidForecastRequestError(29-30)ConflictingAggregationRequestError(13-14)InvalidExportFormatError(43-44)SelectionTooLargeError(5-6)dclimate_zarr_client/client.py (1)
geo_temporal_query(106-263)tests/conftest.py (1)
forecast_ds(27-32)dclimate_zarr_client/geotemporal_data.py (1)
time_range(345-366)
🪛 Ruff (0.8.2)
tests/test_ipfs_retrieval.py
61-64: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
78-81: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
88-91: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
93-98: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
110-118: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
144-147: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
159-162: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
tests/test_client.py
105-107: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🪛 LanguageTool
README.md
[uncategorized] ~144-~144: A comma might be missing here.
Context: ...e/main/notebooks). To run your own IPFS gateway follow the instructions for [installing...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
🪛 markdownlint-cli2 (0.17.2)
README.md
146-146: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
🔇 Additional comments (53)
dclimate_zarr_client/ipfs_retrieval.py (7)
4-4: No issues with adding logging import.
9-17: Imports from py_hamt and custom errors look good.
27-32: Consider concurrency for in-memory cache.
_stac_hamt_cid_cacheis a globally shared dictionary. If used in multi-threaded or multi-process scenarios, ensure proper synchronization or concurrency handling to avoid data races.Would you like guidance on making this cache thread-safe?
33-53: Configurable IPFSStore creation.
This helper neatly centralizes IPFS gateway/RPC initialization logic, making it straightforward to extend or override in the future.
55-96: Robust JSON fetch with comprehensive error handling.
Catching a variety of network and parsing exceptions is excellent for ensuring your system doesn’t fail silently on IPFS retrieval issues.
251-433: Comprehensive STAC catalog traversal.
This function covers multiple edge cases for invalid links or STAC structures; the structured approach and use of warnings is solid.
599-680: Clean IPFS-based dataset loading.
LeveragingHAMTandIPFSZarr3with strong exception handling ensures reliability when dealing with Zarr data on IPFS.dclimate_zarr_client/__init__.py (2)
2-6: Renamed reference toload_ipfs_via_stac.
Using the new function name improves clarity and matches the updated implementation.
13-13: Exportload_ipfs_via_stacin__all__.
Exposing it clearly in the public API is consistent with the rest of the module’s architecture.pyproject.toml (1)
39-45: Added custom Pytest markers.
Definingipfsandintegrationmarkers clarifies the test scope and requirements, which is good practice for organizing tests.dclimate_zarr_client/dclimate_zarr_errors.py (3)
26-26: Documentation improvement for DatasetNotFoundErrorThe updated docstring now correctly reflects that this error is raised when datasets can't be found via any of the supported retrieval methods (IPNS, STAC, or S3).
52-52: Documentation improvement for PathNotFoundErrorThe updated docstring now correctly includes IPFS as a path source that can trigger this error, alongside the existing AWS S3 reference.
61-67: Appropriate new error classes for IPFS and STAC supportThe newly added error classes provide specific error handling for IPFS connectivity issues and STAC catalog traversal problems, which aligns well with the PR's goal of enhancing IPFS/STAC support.
README.md (4)
3-9: Improved project description with clear technical contextThe updated README introduction clearly explains the project's purpose and technical approach, highlighting the use of IPFS, STAC catalog, and py-hamt for accessing Zarr datasets. This provides users with a much better understanding of what the library does and how it works.
37-37: Updated module description reflects new capabilitiesThe description of
ipfs_retrieval.pynow properly explains its expanded functionality, including IPNS name resolution, STAC catalog traversal, and interaction with IPFS gateways and RPC endpoints.
47-141: Excellent comprehensive usage examplesThe new usage section provides highly detailed examples covering different access methods (IPFS and S3), query approaches, and proper error handling. These examples will significantly help users understand how to use the library effectively, particularly the new IPFS/STAC functionality.
149-151: Updated virtual environment naming conventionThe change from
myenvto.venvfollows modern Python conventions for virtual environment naming. The corresponding activation commands have been correctly updated for both Unix and Windows systems.tests/test_client.py (6)
19-22: Appropriate imports for new error classesThe imports of the new error classes align with the error handling changes in the PR and are necessary for the new test cases.
25-33: Good use of centralized test constantsImporting test constants from conftest.py is a good practice that improves test maintainability and ensures consistent test data across test files.
111-125: Clarified fixture naming and comprehensive patchingThe fixture has been appropriately renamed to
patch_s3to better reflect its purpose. Patching the function in both modules where it might be called ensures consistent behavior throughout the tests.
214-343: Well-structured functional tests for IPFS retrievalThese new functional tests provide comprehensive coverage of the IPFS dataset retrieval functionality, with robust error handling and thorough assertions. They test both basic functionality (loading datasets) and more complex operations (geo-temporal queries with various parameters).
The tests properly catch and report specific expected exceptions as well as unexpected errors, making test failures more informative.
344-474: Comprehensive error handling testsThe new error handling tests ensure that appropriate exceptions are raised for various invalid input scenarios. This improves the robustness of the client by verifying it properly validates inputs and provides meaningful error messages.
The tests use constants from conftest.py for consistency and include message checking to ensure detailed error information.
475-578: Improved organization and validation for S3 testsMoving the S3 tests to a class structure improves organization. The extensive validation of the forecast dataset fixture before running the actual test helps prevent confusing test failures by ensuring the test data has the expected structure.
tests/test_ipfs_retrieval.py (3)
7-26: Updated imports and module-level fixture for IPFS connectivityThe imports have been appropriately updated to include the new error classes and constants needed for the tests. Adding the module-level pytest marker ensures IPFS connectivity is checked before running tests, which prevents confusing test failures when IPFS is unavailable.
177-195: Well-structured functional test for dataset listingThis test effectively verifies the
list_datasetsfunctionality by checking both the structure of the return value and that it contains expected datasets. The error handling is appropriate for a functional test, with specific catches for expected error types and a generic catch for unexpected errors.
308-406: Improved cache update tests and helper functionThe cache update tests have been enhanced with better clarity and coverage of edge cases. The
get_cache_pathhelper function now has improved documentation, and the tests properly verify cache update behavior in various scenarios, including file not found and JSON decode errors.The removal of commented-out assertion lines is a good cleanup that improves code readability.
tests/conftest.py (5)
6-6: Imports look correct.
The newly added imports (os,requests, andzarr.storage) are necessary for environment configuration, IPFS checks, and direct ZipStore usage.Also applies to: 10-10, 12-12
21-22: Fixture documentation is helpful.
Retaining local fixtures that do not require IPFS improves test independence and helps differentiate local and IPFS test coverage.
28-31: Similar approach for forecast fixture is consistent.
Having a separate fixture for forecast data reaffirms modularity and keeps IPFS-unrelated tests consistent with the real data process.
109-114: Session-scoped IPFS gateway URL fixture.
Providing a single session fixture to retrieve the IPFS gateway URL is a good practice and ensures consistency across tests.
151-163: Autouse fixture to skip tests if IPFS is unavailable.
This session-scoped approach effectively prevents failures from cascading.dclimate_zarr_client/client.py (8)
15-18: Relevant IPFS retrieval imports.
Bringing in_get_dataset_by_ipfs_cidandget_dataset_hamt_cid_from_stacclarifies the direct usage.
20-23: Catalog IPNS constant offers quick reference.
DefiningDCLIMATE_STAC_CATALOG_IPNSat the top keeps the root catalog path discoverable.
26-66: Docstring improvements forload_ipfs_via_stac.
Removing theas_ofparameter simplifies usage and clarifies the IPFS-based approach.
Ensure the readme and references elsewhere are updated to avoid confusion if users relied onas_of.
67-82: HAMT CID resolution and dataset loading.
The flow is straightforward: fetch the CID via STAC, then load the dataset.
Error handling is sufficiently delegated to the imported functions.
106-126:geo_temporal_queryparameter updates.
Replacingas_ofwith IPFS gateway/rpc stems aligns with the updated approach.
Documentation is consistent, although consider making it more explicit that historical snapshots are no longer an option.
141-148: Revised doc references for IPFS and S3 usage.
Good for clarifying which parameter is required given the data source.
222-235: Conditional branching for IPFS or S3 data retrieval.
The checks for missingbucket_nameor invalid source are well-handled.
262-262: Returning data in dictionary format.
Falling back todata.as_dict()is consistent with the documentedoutput_format=="array"path.tests/test_integration_ipfs.py (14)
3-15: File-level configuration and library skipping.
Auto-skipping tests if required libraries are missing promotes stable CI.
16-31: Comprehensive client imports with caution on internal usage.
Importing internal functions (_get_dataset_by_ipfs_cid,_stac_hamt_cid_cache) is acceptable for integration tests but watch for potential maintenance overhead if signatures change.
34-41: Global known dataset references.
Clear naming ensures readability for test usage.
58-67: Module-scoped fixture to skip on IPFS unavailability.
Pairs well with the localis_ipfs_running, achieving the skip logic at integration test scope.
76-90:test_fetch_root_catalog_from_ipns: Basic IPNS fetch test.
Verifies the root catalog retrieval robustly; good assertion coverage on returned JSON.
92-111:test_get_hamt_cid_success: Normal dataset retrieval path.
Checks both function success and format of the resulting CID.
114-125:test_get_hamt_cid_not_found: Negative path coverage.
RaisesDatasetNotFoundErroras expected for unknown dataset IDs.
128-149:test_get_hamt_cid_caching: Ensures caching mechanism.
Timing checks highlight the performance benefits of caching.
152-172:test_load_dataset_internal: Validates_get_dataset_by_ipfs_cid.
Ensures that the direct, lower-level IPFS loading path remains functional.
175-192:test_load_ipfs_via_stac_success: Main load function test.
Demonstrates the top-level client usage and verifies presence of a known variable in the dataset.
195-203:test_load_ipfs_via_stac_not_found: Expects error for nonexistent dataset.
Ensures consistent error handling for unknown STAC dataset IDs.
206-234:test_geo_temporal_query_ipfs_point_success: Single-point retrieval.
Verifies the correct filtering and data extraction, including range checks on time coords.
237-268:test_geo_temporal_query_ipfs_netcdf_output: NetCDF serialization check.
Ensures user-chosen output format is properly returned and remains valid.
271-291:test_list_datasets_ipfs: Catalog listing validation.
Checks that the catalog fetch is successful, returning a non-empty list of known dataset IDs.
Summary by CodeRabbit
EncryptionCodecclass to support asynchronous operations and improved configuration handling.