Add: Ethereum address detection to CryptoRecognizer #1837
kyoungbinkim wants to merge 6 commits into microsoft:main
```diff
 "pyyaml",
 "phonenumbers (>=8.12,<10.0.0)",
 "pydantic (>=2.0.0,<3.0.0)",
+"eth-hash[pycryptodome]>=0.5.0",
```
Is there a simple way to add this functionality without relying on this third-party package?
@omri374 Since hashlib does not provide Keccak-256 (only the standardized SHA3-256), I relied on an external package. Without external libraries, I would have to implement Keccak-256 directly in Python, which is likely to be slow.
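A quick way to see this is to compare hashlib's `sha3_256` with the well-known Keccak-256 digest of the empty input (the value Ethereum uses as the hash of empty contract code). This is only a sanity-check sketch; the hard-coded constant is the published Keccak-256 value, not something computed here.

```python
import hashlib

# Published Keccak-256 digest of b"" (Ethereum's hash of empty contract code)
KECCAK256_OF_EMPTY = "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"

# hashlib guarantees only the FIPS 202 variant, SHA3-256
sha3_of_empty = hashlib.sha3_256(b"").hexdigest()

print("sha3_256(b'')  =", sha3_of_empty)
print("keccak256(b'') =", KECCAK256_OF_EMPTY)
print("same digest?   ", sha3_of_empty == KECCAK256_OF_EMPTY)
```

The two digests differ, so `hashlib.sha3_256` cannot be used as a drop-in replacement when computing EIP-55 checksums.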
```python
# keccak256.py
# Pure-Python Keccak-256 (original Keccak padding, as used by Ethereum; NOT SHA3-256)

MASK64 = 0xFFFFFFFFFFFFFFFF

# Rotation offsets for the ρ step, indexed ROT[x][y]
ROT = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]

# Round constants for the ι step
RC = [
    0x0000000000000001, 0x0000000000008082,
    0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001,
    0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088,
    0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B,
    0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080,
    0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080,
    0x0000000080000001, 0x8000000080008008,
]


def rol(x, n):
    """Rotate a 64-bit lane left by n bits."""
    return ((x << n) | (x >> (64 - n))) & MASK64


def keccak_f(state):
    """Keccak-f[1600] permutation on 25 lanes, indexed state[x + 5*y]."""
    for rnd in range(24):
        # θ: XOR each lane with the parities of its two neighbouring columns
        C = [state[x] ^ state[x + 5] ^ state[x + 10] ^ state[x + 15] ^ state[x + 20] for x in range(5)]
        D = [C[(x - 1) % 5] ^ rol(C[(x + 1) % 5], 1) for x in range(5)]
        for x in range(5):
            for y in range(5):
                state[x + 5 * y] ^= D[x]
        # ρ + π: rotate each lane and move it to position (y, 2x + 3y)
        B = [0] * 25
        for x in range(5):
            for y in range(5):
                B[y + 5 * ((2 * x + 3 * y) % 5)] = rol(state[x + 5 * y], ROT[x][y])
        # χ: non-linear step along each row
        for x in range(5):
            for y in range(5):
                state[x + 5 * y] = B[x + 5 * y] ^ ((~B[(x + 1) % 5 + 5 * y]) & B[(x + 2) % 5 + 5 * y])
        # ι: XOR in the round constant
        state[0] ^= RC[rnd]
    return state


def keccak_256(data: bytes) -> bytes:
    rate = 1088 // 8  # 136 bytes
    state = [0] * 25
    # Keccak pad10*1: 0x01 right after the message, 0x80 in the last byte of the block.
    # If only one padding byte fits, both land in the same byte (0x81).
    padded = bytearray(data)
    padded += bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    # Absorb: XOR each 136-byte block into the first 17 lanes, then permute
    for i in range(0, len(padded), rate):
        block = padded[i:i + rate]
        for j in range(rate // 8):
            state[j] ^= int.from_bytes(block[8 * j:8 * j + 8], "little")
        state = keccak_f(state)
    # Squeeze: 32 output bytes fit within one rate block, so the first 4 lanes suffice
    return b"".join(state[j].to_bytes(8, "little") for j in range(4))
```

Above is a pure-Python Keccak-256 implementation generated with GPT.
@kyoungbinkim
Any chance hashlib's SHA-3 can be used?
As far as I know, Ethereum adopted Keccak-256 before SHA-3 was standardized, and the final standard (FIPS 202) changed the padding, so SHA3-256 and Keccak-256 produce different digests for the same input. hashlib implements only the standardized SHA-3, so I don't think Keccak-256 can be computed with hashlib.
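To make the difference concrete: Keccak-256 and SHA3-256 share the same permutation (Keccak-f[1600]) and the same 136-byte rate; they differ only in the byte appended right after the message (0x01 for original Keccak, 0x06 for FIPS 202 SHA-3) before the final 0x80 bit. The helper `pad_message` below is a hypothetical illustration of that padding step, not code from the PR.

```python
KECCAK_RATE = 136  # bytes absorbed per block for a 256-bit output (1088-bit rate)


def pad_message(data: bytes, suffix: int, rate: int = KECCAK_RATE) -> bytes:
    """Apply pad10*1 after a domain-separation suffix.

    suffix=0x01 gives original Keccak (Ethereum's Keccak-256);
    suffix=0x06 gives FIPS 202 SHA-3 (what hashlib.sha3_256 uses).
    """
    padded = bytearray(data)
    # Always add at least one byte; a message that fills whole blocks gets a full padding block
    padded += bytes(rate - len(data) % rate)
    padded[len(data)] ^= suffix  # first padding byte
    padded[-1] ^= 0x80           # final bit of pad10*1
    return bytes(padded)


keccak_pad = pad_message(b"abc", 0x01)
sha3_pad = pad_message(b"abc", 0x06)
print(keccak_pad[3], sha3_pad[3])  # 1 6

# One free byte left: suffix and final bit share it
print(hex(pad_message(b"x" * 135, 0x01)[-1]))  # 0x81
```

Everything after padding (absorbing, permuting, squeezing) is identical, which is why a single byte is enough to make the two hash functions incompatible.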
I wonder whether the checksum validation is worth the extra code, or whether the regex is specific enough on its own?
@omri374 ?
I agree, the regex patterns seem specific enough. If we see it causes many false positives, we can add the validation logic. I'd vote for not adding another dependency just for this at this point.
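For reference, a regex-only check along these lines might look like the sketch below. The core pattern is the one listed in the PR description (`0x[a-fA-F0-9]{40}`); the `\b` anchors and the helper name `find_eth_addresses` are assumptions for illustration, not Presidio's recognizer code.

```python
import re

# Pattern from the PR description, plus word boundaries (an assumption) so that longer
# hex strings, such as 32-byte transaction hashes, are not partially matched.
ETH_ADDRESS_RE = re.compile(r"\b0x[a-fA-F0-9]{40}\b")


def find_eth_addresses(text: str) -> list[str]:
    """Return every substring that looks like an Ethereum address (no checksum check)."""
    return ETH_ADDRESS_RE.findall(text)


text = (
    "Send it to 0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed, "
    "not to tx 0x" + "ab" * 32 + "."
)
print(find_eth_addresses(text))
```

Only the 40-digit address is reported; the 64-digit transaction hash is skipped because of the trailing `\b`. A mistyped address made of valid hex digits would still match, which is the class of false positive the EIP-55 checksum is meant to catch.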
@omri374, and the same goes for implementing Keccak-256 ourselves: only if it turns out to be necessary?
@kyoungbinkim would you be interested in continuing this, considering the conversation here?
Add Ethereum address detection to CryptoRecognizer
Summary
Extends CryptoRecognizer to detect and validate Ethereum (ETH) addresses using EIP-55 checksum validation.
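EIP-55 stores the checksum in the letter case of the address: hash the lowercase hex digits (without `0x`) as ASCII with Keccak-256, then upper-case each letter whose corresponding hash nibble is 8 or more. A minimal sketch, assuming the Keccak-256 function is passed in so that the validation logic stays independent of the hashing backend; `to_checksum_address` and `is_valid_eth_address` are hypothetical names, not the PR's code.

```python
import re
from typing import Callable

# 0x followed by exactly 40 hex digits (the pattern from the PR description)
_ETH_ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")

# Any function mapping bytes -> 32-byte Keccak-256 digest
Keccak256 = Callable[[bytes], bytes]


def to_checksum_address(address: str, keccak256: Keccak256) -> str:
    """Return the EIP-55 mixed-case form of a '0x' + 40-hex-digit address."""
    body = address[2:].lower()
    # Hash the lowercase hex digits as ASCII text, without the 0x prefix
    digest = keccak256(body.encode("ascii")).hex()
    # Upper-case a letter when the matching hash nibble is >= 8 (digits are unaffected)
    return "0x" + "".join(
        ch.upper() if int(digest[i], 16) >= 8 else ch for i, ch in enumerate(body)
    )


def is_valid_eth_address(address: str, keccak256: Keccak256) -> bool:
    """Shape check plus EIP-55 checksum check for mixed-case addresses."""
    if not _ETH_ADDRESS.fullmatch(address):
        return False
    body = address[2:]
    # All-lowercase or all-uppercase addresses carry no checksum under EIP-55
    if body == body.lower() or body == body.upper():
        return True
    return address == to_checksum_address(address, keccak256)
```

With the dependency added in this PR, `keccak256` could be `keccak` from `eth_hash.auto`; with the pure-Python code from the review thread it would be `keccak_256`. The test vectors published in EIP-55 (for example `0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed`) are a sensible first check for either backend.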
Ref
Changes
- `0x[a-fA-F0-9]{40}`
- `eth-hash[pycryptodome]` dependency

Testing
Checklist