Commit 98d8e61

Add multidisk-jbod-balancing.md
1 parent 0dcbefd commit 98d8e61

1 file changed

Lines changed: 104 additions & 0 deletions

@@ -0,0 +1,104 @@
---
title: "MultiDisk (JBOD) Balancing"
linkTitle: "MultiDisk (JBOD) Balancing"
---

ClickHouse provides two options for balancing inserts across the disks of a volume that has more than one disk: `round_robin` and `least_used`.

## **Round Robin (Default):**

ClickHouse writes each newly created part to the next disk in the volume, in round-robin order.

This is the default setting and is most effective when the parts created on insert are roughly the same size.

Drawbacks: may lead to disk skew when part sizes vary, because disks are chosen in turn regardless of how large each part is.
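
One way to check for skew is to sum the size of active parts per disk. The query below is a minimal sketch that relies only on the `disk_name`, `bytes_on_disk`, and `active` columns of `system.parts`; the output column aliases are illustrative.

```sql
-- Number and total size of active parts per disk, largest first.
SELECT
    disk_name,
    count() AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
GROUP BY disk_name
ORDER BY sum(bytes_on_disk) DESC;
```

A large gap between the first and last rows suggests that the volume may benefit from rebalancing.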

## **Least Used:**

ClickHouse selects the disk with the most available space and writes the part to that disk.

Change to `least_used` when even disk space consumption is desirable or when you have a JBOD volume with disks of differing sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced.

Drawbacks: may lead to hot-spots, because new parts keep going to the emptiest disk until its free space drops to the level of the others.
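
## Configurations

Configurations that can affect which disk is selected:

- storage policy volume configuration: `least_used_ttl_ms`. Applies only to the `least_used` policy and controls how long the cached free-space figures are reused before they are refreshed. The default is 60 seconds (60000 ms).
- disk setting: `keep_free_space_bytes`, `keep_free_space_ratio`. Both reserve space on a disk that ClickHouse will not fill.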
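
To see how these settings play out on a running server, the free, total, and reserved space of each disk can be read from `system.disks`. This is a sketch for inspection only; the aliases and the `free_ratio` expression are illustrative additions.

```sql
-- Free, total, and reserved (keep_free_space) bytes for every configured disk.
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    formatReadableSize(keep_free_space) AS reserved,
    round(free_space / total_space, 3) AS free_ratio
FROM system.disks
ORDER BY free_space DESC;
```

With `least_used`, the disk at the top of this result is the likely target for the next part, subject to the `least_used_ttl_ms` cache.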

Configuration to assist rebalancing:

- MergeTree setting: `min_bytes_to_rebalance_partition_over_jbod`. This setting does not control where data is written on insert. Instead, it lets a merge redistribute the parts of a partition across the disks of the same volume.

> Note: setting `min_bytes_to_rebalance_partition_over_jbod` does not guarantee balanced partitions and balanced disk usage.
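
As a rough sketch, the current value can be read from `system.merge_tree_settings` and overridden per table. The table name `my_db.my_table` and the 10 GiB threshold are hypothetical, and it is assumed here that the setting can be changed with `MODIFY SETTING` on your version; check the documentation for your release for any constraints on accepted values.

```sql
-- Current server-wide value and whether it differs from the default.
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name = 'min_bytes_to_rebalance_partition_over_jbod';

-- Hypothetical table name and threshold (10 GiB); adjust both.
ALTER TABLE my_db.my_table
    MODIFY SETTING min_bytes_to_rebalance_partition_over_jbod = 10737418240;
```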

Example of a `least_used` policy:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <default>
                <!-- The default disk takes its path from the top-level <path> setting. -->
                <keep_free_space_bytes>10737418240</keep_free_space_bytes> <!-- 10 GiB -->
            </default>
            <disk1>
                <path>/mnt/disk1/</path>
                <keep_free_space_bytes>10737418240</keep_free_space_bytes>
            </disk1>
            <disk2>
                <path>/mnt/disk2/</path>
                <keep_free_space_bytes>10737418240</keep_free_space_bytes>
            </disk2>
        </disks>
        <policies>
            <hot>
                <volumes>
                    <default>
                        <disk>disk1</disk>
                        <disk>disk2</disk>
                        <load_balancing>least_used</load_balancing>
                        <least_used_ttl_ms>60000</least_used_ttl_ms> <!-- 60s -->
                    </default>
                </volumes>
            </hot>
        </policies>
    </storage_configuration>
</clickhouse>
```
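
After the server loads the configuration, the policy can be checked and attached to a table. This is a sketch: the `load_balancing` column of `system.storage_policies` is assumed to exist on your version (drop it from the query if it does not), and the table `my_db.events` is hypothetical.

```sql
-- Volumes, disks, and balancing mode of the 'hot' policy.
SELECT policy_name, volume_name, disks, load_balancing
FROM system.storage_policies
WHERE policy_name = 'hot';

-- Hypothetical table that stores its parts on the 'hot' policy.
CREATE TABLE my_db.events
(
    ts DateTime,
    payload String
)
ENGINE = MergeTree
ORDER BY ts
SETTINGS storage_policy = 'hot';
```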

## Manual Rebalancing Parts over JBOD Disks

The query below generates `ALTER TABLE ... MOVE PART ... TO DISK` statements for large active parts, pairing each part with another disk of the same volume that has enough free space. It only prints the statements; review them before running any.

```sql
WITH
    '%' AS target_tables,     -- LIKE pattern for table names
    '%' AS target_databases   -- LIKE pattern for database names
SELECT sub.q FROM
(
    SELECT
        'ALTER TABLE ' || parts.database || '.' || parts.`table`
            || ' MOVE PART \'' || parts.name
            || '\' TO DISK \'' || other_disk_candidate || '\';' AS q,
        parts.database AS db,
        parts.`table` AS t,
        parts.name AS part_name,
        parts.disk_name AS part_disk_name,
        parts.bytes_on_disk AS part_bytes_on_disk,
        sp.storage_policy AS part_storage_policy,
        -- every disk of the volume except the one the part is already on
        arrayJoin(arrayRemove(v.disks, parts.disk_name)) AS other_disk_candidate,
        candidate_disks.free_space AS candidate_disk_free_space
    FROM system.parts AS parts
    INNER JOIN
    (
        SELECT database, `table`, storage_policy
        FROM system.tables
        WHERE (name LIKE target_tables) AND (database LIKE target_databases)
        GROUP BY 1, 2, 3
    ) AS sp ON sp.`table` = parts.`table` AND sp.database = parts.database
    INNER JOIN
    (
        SELECT policy_name, volume_name, disks AS disks
        FROM system.storage_policies
        WHERE volume_type = 0
    ) AS v ON sp.storage_policy = v.policy_name
    INNER JOIN
    (
        SELECT name, free_space
        FROM system.disks
        ORDER BY free_space DESC
    ) AS candidate_disks ON candidate_disks.name = other_disk_candidate
    WHERE parts.active = 1
        AND (parts.bytes_on_disk >= 10737418240) -- at least 10 GiB: prioritize larger parts
        AND (parts.`table` LIKE target_tables)
        AND (parts.database LIKE target_databases)
        AND candidate_disks.free_space > parts.bytes_on_disk * 2 -- 2x buffer
    ORDER BY parts.bytes_on_disk DESC, candidate_disk_free_space DESC
    LIMIT 1 BY db, t, part_name -- keep one candidate disk per part
) AS sub
FORMAT TSVRaw
```
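
Once some of the generated statements have been run, completed moves can be reviewed in `system.part_log`. This is a sketch that assumes `part_log` is enabled on the server; the one-hour window is an arbitrary choice.

```sql
-- Part moves recorded in the last hour, newest first.
SELECT event_time, database, `table`, part_name, disk_name
FROM system.part_log
WHERE event_type = 'MovePart'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC;
```

Because the candidate disks are chosen from a snapshot of free space, it may be safer to run the statements in small batches and regenerate the list between batches.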
