|
37 | 37 | USS_CHUNK_METHODS = [USS, HYBRID_USS, BYTES] |
38 | 38 | DEFAULT_CHUNK_METHOD = HYBRID_USS |
39 | 39 |
|
40 | | -# |
41 | | -# TRAINING_MODE |
42 | | -# |
43 | | - |
44 | 40 | """ |
| 41 | +
|
| 42 | +The chunk_cache table is a record of the memory usage and observed row_size required for chunking the various models. |
| 43 | +The row size differs depending on whether memory usage is calculated by rss, uss, or explicitly allocated bytes. |
| 44 | +We record all three during training so the mode can be changed without necessitating retraining. |
| 45 | +
|
| 46 | +tag, num_rows, rss, uss, bytes, uss_row_size, hybrid_uss_row_size, bytes_row_size |
| 47 | +atwork_subtour_frequency.simple, 3498, 86016, 81920, 811536, 24, 232, 232 |
| 48 | +atwork_subtour_mode_choice.simple, 704, 20480, 20480, 1796608, 30, 2552, 2552 |
| 49 | +atwork_subtour_scheduling.tour_1, 701, 24576, 24576, 45294082, 36, 64614, 64614 |
| 50 | +atwork_subtour_scheduling.tour_n, 3, 20480, 20480, 97734, 6827, 32578, 32578 |
| 51 | +auto_ownership_simulate.simulate, 5000, 77824, 24576, 1400000, 5, 280, 280 |
| 52 | +
|
45 | 53 | MODE_RETRAIN |
| 54 | + Rebuild the chunk_cache table and save/replace it in output/cache/chunk_cache.csv.
| 55 | + Performs a complete rebuild of the chunk_cache table by doing adaptive chunking starting from the default initial
| 56 | + settings (DEFAULT_INITIAL_ROWS_PER_CHUNK) and observing rss, uss, and allocated bytes to compute row_size.
| 57 | + This will run somewhat slower than the other modes because of the overhead of a small first chunk, and possible
| 58 | + instability in the second chunk due to inaccuracies caused by the small initial chunk_size sample.
46 | 59 |
|
47 | 60 | MODE_ADAPTIVE |
| 61 | + Use the existing chunk_cache to determine the sizing for the first chunk for each model, but also use the
| 62 | + observed row_size to adjust the estimated row_size for subsequent chunks. At the end of the run, writes the
| 63 | + updated chunk_cache to the output directory, but doesn't overwrite the 'official' cache file. If the user wishes,
| 64 | + they can replace the chunk_cache with the updated version, but this is not done automatically as it is not clear
| 65 | + this would be the desired behavior. (Might become clearer over time as this is exercised further.)
48 | 66 |
|
49 | 67 | MODE_PRODUCTION |
50 | | - since overhead changes we don't necessarily want the same number of rows per chunk every time |
| 68 | + Since overhead changes we don't necessarily want the same number of rows per chunk every time |
51 | 69 | but we do use the row_size from cache which we trust is stable |
52 | 70 | (the whole point of MODE_PRODUCTION is to avoid the cost of observing overhead) |
53 | 71 | which is stored in self.initial_row_size because initial_rows_per_chunk used it for the first chunk |
|
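The sample chunk_cache rows in the docstring are consistent with each row_size being the recorded memory figure divided by num_rows and rounded up. The sketch below reproduces the three row_size columns from those rows. It is an illustration only: the function name `row_sizes` is hypothetical, and treating hybrid_uss_row_size as the larger of the uss and bytes figures is an assumption that happens to match the five sample rows, not something the diff states.

```python
# Sample rows copied from the chunk_cache table in the docstring:
# (tag, num_rows, uss, bytes)
SAMPLE_ROWS = [
    ("atwork_subtour_frequency.simple", 3498, 81920, 811536),
    ("atwork_subtour_mode_choice.simple", 704, 20480, 1796608),
    ("atwork_subtour_scheduling.tour_1", 701, 24576, 45294082),
    ("atwork_subtour_scheduling.tour_n", 3, 20480, 97734),
    ("auto_ownership_simulate.simulate", 5000, 24576, 1400000),
]


def ceil_div(total, count):
    """Integer division rounded up, avoiding floating point."""
    return (total + count - 1) // count


def row_sizes(num_rows, uss, nbytes):
    """Return (uss_row_size, hybrid_uss_row_size, bytes_row_size).

    Hypothetical helper: the hybrid value is ASSUMED to be the max of
    the other two, which matches the sample rows above.
    """
    uss_row_size = ceil_div(uss, num_rows)
    bytes_row_size = ceil_div(nbytes, num_rows)
    hybrid_uss_row_size = max(uss_row_size, bytes_row_size)
    return uss_row_size, hybrid_uss_row_size, bytes_row_size


for tag, num_rows, uss, nbytes in SAMPLE_ROWS:
    print(tag, *row_sizes(num_rows, uss, nbytes))
```

For the first row this gives 24, 232, 232, matching the uss_row_size, hybrid_uss_row_size, and bytes_row_size columns recorded in the table.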