@Thirunarayanan Thirunarayanan commented Jan 28, 2026

MDEV-32067 InnoDB linear read ahead had better be logical

The traditional linear read-ahead, enabled by innodb_read_ahead_threshold=56,
only works if pages are allocated on adjacent page numbers, which is not
always the case for B-tree leaf pages.
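
To illustrate why physical adjacency matters, here is a minimal sketch that counts how many steps along the logical (key-order) chain of leaf pages also move to the physically next page number. The function name and the plain vector of page numbers are hypothetical and are not InnoDB code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of InnoDB: given leaf page numbers in
// logical (key) order, count the steps where the next leaf is also the
// physically next page. Page-number-based read-ahead can only help on
// such steps.
std::size_t count_adjacent_steps(const std::vector<std::uint32_t>& leaves)
{
  std::size_t adjacent = 0;
  for (std::size_t i = 1; i < leaves.size(); i++)
    if (leaves[i] == leaves[i - 1] + 1)
      adjacent++;
  return adjacent;
}
```

For a freshly built index the count is typically close to the number of leaves; after many page splits it may drop toward zero, which is the case that this change targets.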

After this change, the exact nonzero values of
innodb_read_ahead_threshold matter only for the read-ahead of
undo log pages.

Introduce Multi-Range Read (MRR) aware read-ahead that collects
the actual leaf page numbers during B-tree traversal.

buf_read_ahead_undo(): Renamed from buf_read_ahead_linear().
This function will no longer be invoked on any BLOB pages
(for which FIL_PAGE_PREV and FIL_PAGE_NEXT were not initialized
consistently) nor on any index pages. For index leaf pages,
we will introduce buf_read_ahead_one() and buf_read_ahead_pages().

buf_read_ahead_one(): Read ahead one (sibling leaf) page.
This logic cannot be disabled.

buf_read_ahead_pages(): Read ahead B-tree index leaf pages.

buf_read_ahead_random(): Split the function into two parts: one
that determines which range of pages should be read, and another
that actually initiates a read of the pages.
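
The two-part split can be sketched roughly as follows. All names and types here are hypothetical, the 64-page aligned area is an assumption, and the real function also decides whether read-ahead is warranted at all, which is omitted.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical type for illustration; not the InnoDB definition.
struct page_range { std::uint32_t low, high; };  // half-open: [low, high)

// Part 1: decide which pages to read. Assumes a read-ahead area of
// 64 pages aligned to a multiple of 64, clamped to the file size.
page_range read_ahead_range(std::uint32_t page_no, std::uint32_t file_pages)
{
  constexpr std::uint32_t area = 64;
  const std::uint32_t low = page_no - page_no % area;
  const std::uint32_t high = low + area < file_pages ? low + area : file_pages;
  return {low, high > low ? high : low};
}

// Part 2: initiate the reads. The callback stands in for submitting an
// asynchronous page read; returns the number of requests issued.
std::uint32_t read_ahead_issue(page_range r,
                               const std::function<void(std::uint32_t)>& read_page)
{
  std::uint32_t n = 0;
  for (std::uint32_t p = r.low; p < r.high; p++, n++)
    read_page(p);
  return n;
}
```

Keeping the range computation free of I/O makes it possible to reuse or test it separately from the part that submits reads.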

btr_pcur_move_to_next_page(): Invoke buf_read_ahead_one()
instead of buf_read_ahead_linear().

btr_pcur_move_backward_from_page(): Implement a fast path of
trying to acquire a latch on the previous page without waiting,
and invoke buf_read_ahead_one() on the preceding page, with the
assumption that we may be accessing that page in the near future.
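
A minimal sketch of the "latch without waiting" idea, using std::shared_mutex as a stand-in for a page latch. The types and the function name are hypothetical; the real code works with InnoDB buffer page latches and mini-transactions.

```cpp
#include <shared_mutex>

// Hypothetical stand-in for a buffer page with a latch; not InnoDB code.
struct page_t { std::shared_mutex latch; };

enum class latch_result { acquired_fast, must_wait };

// Try to latch the previous page without waiting. On failure the caller
// would fall back to the slow path (release the current page latch,
// then acquire latches in an order that cannot deadlock).
latch_result try_latch_prev(page_t& prev)
{
  return prev.latch.try_lock_shared() ? latch_result::acquired_fast
                                      : latch_result::must_wait;
}
```

The non-blocking attempt matters because a backward scan acquires latches against the usual left-to-right order, so waiting while holding the current page latch could deadlock with a forward scan.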

btr_copy_blob_prefix(): Simplify the logic. On BLOB pages other than
those of ROW_FORMAT=COMPRESSED tables, the FIL_PAGE_NEXT field is not
meaningfully initialized, and the FIL_PAGE_PREV field does not point
to anything meaningful either. buf_read_ahead_linear() expects
both fields to be set meaningfully. Only the non-default setting
innodb_random_read_ahead=ON might be meaningful here.

btr_cur_t::search_leaf(): Add MRR read-ahead context to collect
leaf page numbers at PAGE_LEVEL=1 during B-tree traversal.
The collected page numbers represent actual leaf pages that
will be accessed, which enables more targeted read-ahead than
assuming that leaf pages have consecutive page numbers.
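
The collection step can be pictured with the following sketch. The record layout, integer keys, and function name are hypothetical simplifications; the real code reads node pointer records from the level-1 page while descending the tree.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node pointer record on a level-1 page: the smallest key
// in a child leaf page and that leaf's page number.
struct node_ptr { int min_key; std::uint32_t child_page_no; };

// Collect page numbers of the leaves that may hold keys in [lo, hi],
// given the level-1 records in key order, stopping at `limit` pages.
std::vector<std::uint32_t> collect_leaves(const std::vector<node_ptr>& level1,
                                          int lo, int hi, std::size_t limit)
{
  std::vector<std::uint32_t> pages;
  for (std::size_t i = 0; i < level1.size() && pages.size() < limit; i++) {
    // Leaf i covers keys from its min_key up to the next record's min_key.
    const bool ends_before_lo =
      i + 1 < level1.size() && level1[i + 1].min_key <= lo;
    if (ends_before_lo)
      continue;
    if (level1[i].min_key > hi)
      break;
    pages.push_back(level1[i].child_page_no);
  }
  return pages;
}
```

Note that the collected page numbers need not be consecutive or even ascending, which is exactly what a page-number-based heuristic cannot handle.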

mrr_readahead_ctx_t: New structure for passing MRR context
through the call chain ha_innobase -> row_search_mvcc()
-> btr_pcur_open() -> search_leaf(). It is limited to
READ_AHEAD_PAGES=64 pages.
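
One way to picture such a context is a fixed-capacity buffer that is handed down by pointer. The layout, the member names, and the `_sketch` functions below are assumptions for illustration only; the real mrr_readahead_ctx_t may look quite different.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical layout; the real mrr_readahead_ctx_t may differ.
struct mrr_ctx_sketch
{
  static constexpr std::size_t READ_AHEAD_PAGES = 64;
  std::array<std::uint32_t, READ_AHEAD_PAGES> pages{};
  std::size_t n_pages = 0;

  // Record a page number; returns false once the limit is reached.
  bool add(std::uint32_t page_no)
  {
    if (n_pages == READ_AHEAD_PAGES)
      return false;
    pages[n_pages++] = page_no;
    return true;
  }
};

// Stand-ins for two layers of the call chain. A null context means
// that no read-ahead information is collected.
inline void search_leaf_sketch(mrr_ctx_sketch* ctx, std::uint32_t leaf)
{
  if (ctx)
    ctx->add(leaf);
}

inline void pcur_open_sketch(mrr_ctx_sketch* ctx, std::uint32_t leaf)
{
  search_leaf_sketch(ctx, leaf);
}
```

A fixed capacity keeps the context cheap to allocate on the caller's side and bounds how much read-ahead a single traversal can request.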

@Thirunarayanan Thirunarayanan force-pushed the main_MDEV-32067 branch 4 times, most recently from 2e76a99 to 1fc14cb, January 29, 2026 16:00