Commit d962a42

chunk docstring
1 parent e9ee09e commit d962a42

1 file changed

Lines changed: 23 additions & 5 deletions

activitysim/core/chunk.py

Lines changed: 23 additions & 5 deletions
@@ -37,17 +37,35 @@
 USS_CHUNK_METHODS = [USS, HYBRID_USS, BYTES]
 DEFAULT_CHUNK_METHOD = HYBRID_USS

-#
-# TRAINING_MODE
-#
-
 """
+
+The chunk_cache table is a record of the memory usage and observed row_size required for chunking the various models.
+The row size differs depending on whether memory usage is calculated by rss, uss, or explicitly allocated bytes.
+We record all three during training so the mode can be changed without necessitating retraining.
+
+tag, num_rows, rss, uss, bytes, uss_row_size, hybrid_uss_row_size, bytes_row_size
+atwork_subtour_frequency.simple, 3498, 86016, 81920, 811536, 24, 232, 232
+atwork_subtour_mode_choice.simple, 704, 20480, 20480, 1796608, 30, 2552, 2552
+atwork_subtour_scheduling.tour_1, 701, 24576, 24576, 45294082, 36, 64614, 64614
+atwork_subtour_scheduling.tour_n, 3, 20480, 20480, 97734, 6827, 32578, 32578
+auto_ownership_simulate.simulate, 5000, 77824, 24576, 1400000, 5, 280, 280
+
 MODE_RETRAIN
+rebuild chunk_cache table and save/replace in output/cache/chunk_cache.csv
+performs a complete rebuild of the chunk_cache table by doing adaptive chunking, starting from the default initial
+settings (DEFAULT_INITIAL_ROWS_PER_CHUNK) and observing rss, uss, and allocated bytes to compute row_size.
+This will run somewhat slower than the other modes because of the overhead of a small first chunk, and possible
+instability in the second chunk due to inaccuracies caused by the small initial chunk_size sample


 MODE_ADAPTIVE
+Use the existing chunk_cache to determine the sizing for the first chunk for each model, but also use the
+observed row_size to adjust the estimated row_size for subsequent chunks. At the end of the run, writes the
+updated chunk_cache to the output directory, but doesn't overwrite the 'official' cache file. If the user wishes,
+they can replace the chunk_cache with the updated version, but this is not done automatically as it is not clear
+this would be the desired behavior. (Might become clearer over time as this is exercised further.)


 MODE_PRODUCTION
-since overhead changes we don't necessarily want the same number of rows per chunk every time
+Since overhead changes we don't necessarily want the same number of rows per chunk every time
 but we do use the row_size from cache which we trust is stable
 (the whole point of MODE_PRODUCTION is to avoid the cost of observing overhead)
 which is stored in self.initial_row_size because initial_rows_per_chunk used it for the first chunk
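
The sample chunk_cache rows in the docstring are consistent with a simple relationship: each row_size column looks like the matching memory column divided by num_rows and rounded up, and hybrid_uss_row_size matches the larger of the uss and bytes figures. The sketch below checks that reading against the five sample rows. The function names are hypothetical and the formulas are inferred from the sample data only; they are not taken from chunk.py, and the hybrid rule in particular is an assumption, since bytes_row_size is the larger value in every sample row.

```python
# Sample rows copied from the chunk_cache docstring. Columns:
# tag, num_rows, rss, uss, bytes, uss_row_size, hybrid_uss_row_size, bytes_row_size
SAMPLE = [
    ("atwork_subtour_frequency.simple", 3498, 86016, 81920, 811536, 24, 232, 232),
    ("atwork_subtour_mode_choice.simple", 704, 20480, 20480, 1796608, 30, 2552, 2552),
    ("atwork_subtour_scheduling.tour_1", 701, 24576, 24576, 45294082, 36, 64614, 64614),
    ("atwork_subtour_scheduling.tour_n", 3, 20480, 20480, 97734, 6827, 32578, 32578),
    ("auto_ownership_simulate.simulate", 5000, 77824, 24576, 1400000, 5, 280, 280),
]


def ceil_div(total, num_rows):
    """Integer ceiling division: memory per row, rounded up to a whole byte."""
    return -(-total // num_rows)


def derive_row_sizes(num_rows, uss, nbytes):
    """Hypothetical helper: returns (uss_row_size, hybrid_uss_row_size, bytes_row_size)."""
    uss_row_size = ceil_div(uss, num_rows)
    bytes_row_size = ceil_div(nbytes, num_rows)
    # Assumption: the hybrid figure is the larger of the two estimates.
    hybrid_row_size = max(uss_row_size, bytes_row_size)
    return uss_row_size, hybrid_row_size, bytes_row_size


# Every sample row reproduces its recorded row_size columns under these assumptions.
for tag, num_rows, rss, uss, nbytes, *recorded in SAMPLE:
    assert derive_row_sizes(num_rows, uss, nbytes) == tuple(recorded), tag
```

Note how far the estimates diverge: for atwork_subtour_scheduling.tour_1 the uss-based figure is 36 bytes per row while the bytes-based figure is 64614, which illustrates why the docstring records all three measurements.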

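The MODE_PRODUCTION note says the cached row_size is trusted while the number of rows per chunk may still vary because overhead changes. A minimal sketch of that idea follows, assuming a fixed memory budget per chunk and a baseline overhead measured at run time. The function `rows_per_chunk` and its parameters `chunk_size` and `overhead` are hypothetical names used for illustration; this is not ActivitySim's implementation.

```python
def rows_per_chunk(chunk_size, overhead, row_size, total_rows):
    """Hypothetical sizing rule for a production-style run.

    chunk_size: memory budget in bytes (assumed setting)
    overhead:   memory already in use before the chunk is processed (assumed input)
    row_size:   bytes per row, taken from the cache rather than re-observed
    total_rows: number of rows still to be processed
    """
    available = max(chunk_size - overhead, 0)
    if row_size <= 0:
        # No usable estimate: fall back to processing everything in one chunk.
        return total_rows
    rows = available // row_size
    # Always make progress, and never ask for more rows than exist.
    return max(1, min(rows, total_rows))


# Same cached row_size (280, from the auto_ownership_simulate.simulate sample row),
# different overhead, so a different number of rows per chunk.
print(rows_per_chunk(1_000_000, 300_000, 280, 5000))  # 2500
print(rows_per_chunk(1_000_000, 440_000, 280, 5000))  # 2000
```

The design point is that only the cheap quantity (current overhead) is re-read on each run, while the expensive one (per-row memory) comes from the cache.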