| 1 | +--- |
| 2 | +title: "MultiDisk (JBOD) Balancing" |
| 3 | +linkTitle: "MultiDisk (JBOD) Balancing" |
| 4 | +--- |
| 5 | + |
| 6 | +ClickHouse provides two options to balance an insert across disks in a volume with more than one disk: `round_robin` and `least_used`. |
| 7 | + |
| 8 | +## **Round Robin (Default):** |
| 9 | + |
| 10 | +ClickHouse selects the next disk in round-robin order for each new part it writes. |
| 11 | + |
| 12 | +This is the default setting and is most effective when parts created on insert are roughly the same size. |
| 13 | + |
| 14 | +Drawbacks: may lead to disk skew when part sizes vary, because free space is not taken into account |
| 15 | + |
| 16 | +## **Least Used:** |
| 17 | + |
| 18 | +ClickHouse selects the disk with the most available space and writes to that disk. |
| 19 | + |
| 20 | +Change to `least_used` when even disk space consumption is desirable or when you have a JBOD volume with differing disk sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced. |
| 21 | + |
| 22 | +Drawbacks: may lead to hot-spots, because new parts concentrate on whichever disk currently has the most free space |
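
To judge whether a volume is skewed before switching policies, it can help to look at the free space on each disk. The query below is a minimal sketch that reads only `system.disks`; the `used_pct` column is an illustrative calculation, not something ClickHouse reports itself.

```sql
-- Free and total space per disk, emptiest disk first.
SELECT
    name,
    path,
    formatReadableSize(free_space)  AS free,
    formatReadableSize(total_space) AS total,
    -- Illustrative: share of the disk already in use, in percent.
    round(100 * (1 - free_space / total_space), 1) AS used_pct
FROM system.disks
ORDER BY free_space DESC;
```

A large gap in `used_pct` between disks of the same volume suggests that `least_used` would send most new parts to a single disk until the others catch up.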
| 23 | + |
| 24 | +## Configurations |
| 25 | + |
| 26 | +Configurations that can affect which disk is selected: |
| 27 | + |
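| 28 | +- storage policy volume configuration: `least_used_ttl_ms`, which controls how long (in milliseconds) the cached free-space figures are reused before being refreshed. Applies only to the `least_used` policy; the default is 60s. |
| 29 | +- disk setting: `keep_free_space_bytes`, `keep_free_space_ratio` |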
| 30 | + |
| 31 | +Configuration to assist rebalancing: |
| 32 | + |
| 33 | +- MergeTree setting: `min_bytes_to_rebalance_partition_over_jbod`. This setting does not control where data is written on insert. It governs the redistribution of parts across disks of the same volume during a merge. |
| 34 | + |
| 35 | +> Note: setting `min_bytes_to_rebalance_partition_over_jbod` does not guarantee balanced partitions and balanced disk usage. |
| 36 | +> |
| 37 | + |
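
As a hedged illustration of how the MergeTree setting might be applied, the statement below changes it on an existing table. The table name `jbod_example` and the 1 GiB threshold are hypothetical; check that your ClickHouse version supports the setting and accepts the value before relying on it.

```sql
-- Hypothetical table name and threshold (1 GiB), for illustration only.
-- Affects part placement during merges, not on insert.
ALTER TABLE jbod_example
    MODIFY SETTING min_bytes_to_rebalance_partition_over_jbod = 1073741824;
```
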
| 38 | +Example of a `least_used` policy: |
| 39 | + |
| 40 | +```xml |
| 41 | +<clickhouse> |
| 42 | + <storage_configuration> |
| 43 | + <disks> |
| 44 | + <default> |
| 45 | + <path>/var/lib/clickhouse/</path> |
| 46 | + <keep_free_space_bytes>10737418240</keep_free_space_bytes> |
| 47 | + </default> |
| 48 | + <disk1> |
| 49 | + <path>/mnt/disk1/</path> |
| 50 | + <keep_free_space_bytes>10737418240</keep_free_space_bytes> |
| 51 | + </disk1> |
| 52 | + <disk2> |
| 53 | + <path>/mnt/disk2/</path> |
| 54 | + <keep_free_space_bytes>10737418240</keep_free_space_bytes> |
| 55 | + </disk2> |
| 56 | + </disks> |
| 57 | + <policies> |
| 58 | + <hot> |
| 59 | + <volumes> |
| 60 | + <default> |
| 61 | + <disk>disk1</disk> |
| 62 | + <disk>disk2</disk> |
| 63 | + <load_balancing>least_used</load_balancing> |
| 64 | + <least_used_ttl_ms>60000</least_used_ttl_ms> <!-- 60s --> |
| 65 | + </default> |
| 66 | + </volumes> |
| 67 | + </hot> |
| 68 | + </policies> |
| 69 | + </storage_configuration> |
| 70 | +</clickhouse> |
| 71 | +``` |
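
Once the configuration is loaded, a table opts in to the policy through the `storage_policy` setting. The sketch below assumes the `hot` policy from the example above; the table name `jbod_example` and its columns are hypothetical.

```sql
-- Confirm the server sees the policy and its disks.
SELECT policy_name, volume_name, disks
FROM system.storage_policies
WHERE policy_name = 'hot';

-- Hypothetical table that writes its parts to the 'hot' policy.
CREATE TABLE jbod_example
(
    ts    DateTime,
    value UInt64
)
ENGINE = MergeTree
ORDER BY ts
SETTINGS storage_policy = 'hot';
```
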
| 72 | + |
| 73 | +## Manual Rebalancing Parts over JBOD Disks |
| 74 | + |
| 75 | +```sql |
| 76 | +WITH |
| 77 | + '%' AS target_tables, |
| 78 | + '%' AS target_databases |
| 79 | +SELECT sub.q FROM |
| 80 | +( |
| 81 | + SELECT |
| 82 | + 'ALTER TABLE ' || parts.database || '.' || parts.`table` || ' MOVE PART \'' || parts.name ||'\' TO DISK \'' || other_disk_candidate || '\';' as q, |
| 83 | + parts.database as db, |
| 84 | + parts.`table` as t, |
| 85 | + parts.name as part_name, |
| 86 | + parts.disk_name as part_disk_name, |
| 87 | + parts.bytes_on_disk AS part_bytes_on_disk, |
| 88 | + sp.storage_policy as part_storage_policy, |
| 89 | + arrayJoin(arrayRemove(v.disks, parts.disk_name)) AS other_disk_candidate, |
| 90 | + candidate_disks.free_space AS candidate_disk_free_space |
| 91 | + FROM system.parts AS parts |
| 92 | + INNER JOIN ( SELECT database, `table`, storage_policy FROM system.tables where (name LIKE target_tables) AND (database LIKE target_databases) group by 1, 2, 3 ) AS sp ON sp.`table` = parts.`table` AND sp.database = parts.database |
| 93 | + INNER JOIN ( SELECT policy_name, volume_name, disks AS disks FROM system.storage_policies WHERE volume_type = 0 ) AS v ON sp.storage_policy = v.policy_name |
| 94 | + INNER JOIN ( SELECT name, free_space FROM system.disks ORDER BY free_space DESC ) AS candidate_disks ON candidate_disks.name = other_disk_candidate |
| 95 | + WHERE parts.active = 1 |
| 96 | + AND (parts.bytes_on_disk >= 10737418240) --10GB prioritize larger parts |
| 97 | + AND (parts.`table` LIKE target_tables) |
| 98 | + AND (parts.database LIKE target_databases) |
| 99 | + AND candidate_disks.free_space > parts.bytes_on_disk*2 -- 2x buffer |
| 100 | + ORDER BY parts.bytes_on_disk DESC, candidate_disk_free_space DESC |
| 101 | + LIMIT 1 BY db, t, part_name |
| 102 | +) as sub |
| 103 | +FORMAT TSVRaw |
| 104 | +``` |
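
The query above only prints `ALTER TABLE ... MOVE PART` statements; it does not move anything itself. After running the generated statements, a query along these lines (a sketch using only `system.parts`) can show how the active parts of each table are spread across disks:

```sql
-- Active parts and their total size per table and disk.
SELECT
    database,
    `table`,
    disk_name,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active = 1
GROUP BY database, `table`, disk_name
ORDER BY database, `table`, sum(bytes_on_disk) DESC;
```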