GitHub - paahaad/block-crumbs: This project fetches solana blocks, extracts some fields from instructions, and stores them in ClickHouse for analytics.

Task

Your task is to build a lightweight system that ingests Solana block data and makes it queryable for research purposes.

Approach

I used the Carbon indexer framework to handle the data pipeline, letting me focus on the core logic. With PassthroughDecoder, I receive raw transactions and manually extract all fields in TransactionAnalyzer.

Quick Start

1. Start ClickHouse (config in fs/volumes/)

docker compose up -d

2. Run the indexer

cp env.example .env  # add your Helius API key
cargo run

Schema Design

Why two tables?

Transactions — granular per-tx data for detailed analysis (fee breakdown, Jito detection, cu limit, tx complexity)
Block Stats — pre-aggregated block-level metrics to avoid expensive GROUP BY slot queries at read time

1. Transactions Table

CREATE TABLE IF NOT EXISTS transactions (
    slot UInt64,
    block_time DateTime64(3) DEFAULT 0,
    tx_index UInt64,
    signature String,
    fee_payer String,
    success UInt8,
    fee UInt64,
    base_fee UInt64,
    priority_fee UInt64,
    cu_price UInt64,
    cu_consumed UInt64,
    cu_requested UInt32,
    is_jito_bundle UInt8,
    jito_tip_amount UInt64,
    jito_tip_account String,
    landing_method LowCardinality(String),
    num_instructions UInt32,
    num_accounts UInt32,
    num_signers UInt32,
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(toDateTime(block_time))
ORDER BY (slot, tx_index)

Daily partitions — enables efficient time-range queries via partition pruning
Order by (slot, tx_index) — data is stored in the same order as transactions appear in blocks, so queries like "get all txs in slot X" read contiguous data instead of jumping around disk

2. Block Stats Table

CREATE TABLE IF NOT EXISTS block_stats (
    slot UInt64,
    block_time DateTime64(3) DEFAULT 0,
    total_tx UInt32,
    successful_tx UInt32,
    failed_tx UInt32,
    jito_bundle_count UInt32,
    priority_fee_count UInt32,
    base_fee_count UInt32,
    total_fees UInt64,
    total_priority_fees UInt64,
    total_jito_tips UInt64,
    total_cu UInt64,
    median_priority_fee UInt64,
    max_priority_fee UInt64,
    median_cu_price UInt64,
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(toDateTime(block_time))
ORDER BY slot

Order by slot — blocks are naturally sequential, so queries like "stats for slots 1000-2000" read data in order

Signals Extracted

1. Jito Bundle Detection — jito.rs

If a tx transfers SOL to any of the 8 Jito tip accounts, it's a Jito bundle. Also extract the tip amount.

2. Priority Fee Calculation — compute_budget.rs

Extract SetComputeUnitPrice and SetComputeUnitLimit from ComputeBudget instructions. Priority fee = (cu_price × cu_limit) / 1_000_000 — based on requested CU. This also enables analyzing CU efficiency: requested vs actually utilized.

3. Fee Landscape per Block

Aggregate fee distribution across each slot: what portion goes to priority fees, Jito tips, and base fees. This reveals how block space is being monetized and the competition dynamics between MEV (Jito) vs priority fee bidding.

Interesting Finding

~30-50% of requested CU goes unused means users are overpaying priority fees.

Since priority fee = cu_limit × cu_price, requesting more CU than needed means paying for unused compute. From my data, the avg consumed vs requested ratio is roughly 1:2.

SELECT 
    slot,
    sum(cu_consumed) as total_cu_used,
    sum(cu_requested) as total_cu_requested,
    round(sum(cu_consumed) / sum(cu_requested) * 100, 1) as utilization_pct
FROM transactions
WHERE slot >= (SELECT max(slot) - 10 FROM transactions)
GROUP BY slot
ORDER BY slot;

What I'd Build Next

Given more time, I'd explore how validators schedule transactions, analyzing tx ordering within blocks and how priority fees/Jito tips actually affect position. This requires understanding both the validator's scheduler and Jito's bundle processing.

Example Queries

See queries.sql

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
fs/volumes		fs/volumes
src		src
.env.example		.env.example
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md
docker-compose-clustor.yml		docker-compose-clustor.yml
docker-compose.yml		docker-compose.yml
queries.sql		queries.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Task

Approach

Quick Start

Schema Design

Signals Extracted

Interesting Finding

What I'd Build Next

Example Queries

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Task

Approach

Quick Start

Schema Design

Signals Extracted

Interesting Finding

What I'd Build Next

Example Queries

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages