add run_batch script formatted for CPG #179
I don't see any reason not to clean up run_batch_general, but if the major distinction is just path structure, is there any major reason not to have one script that takes a parameter for CPG? (I.e., have the path structures in a dictionary with keys for …) Tbh, I've been longing for ages to make this a CLI with flags rather than the commenting-and-uncommenting nonsense. If you agree that it's reasonable to do one file here, this would be a good excuse to do that work too. What do you think? Could we do it together next week?
(or do it yourself, of course, just if it makes it more fun)
No reason not to combine them. I made this version for a project I was working on and then thought it would be nice to polish and add to the repo, but the extra work to combine them (and make a CLI) seems worth it!
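The single-script idea could look something like this minimal sketch: path layouts live in a dictionary keyed by path style, so supporting a new layout means adding an entry rather than maintaining a second script. `PATH_TEMPLATES` and `image_path` are hypothetical names, and the template strings (including the CPG one) are illustrative assumptions, not the paths the script actually builds.

```python
# Hypothetical path templates keyed by path style (illustrative only;
# the real script's directory layouts may differ).
PATH_TEMPLATES = {
    "default": "{identifier}/{batch}/images/{plate}",
    "cpg": "{identifier}/{source}/images/{batch}/images/{plate}",
}


def image_path(path_style, identifier, batch, plate, source=""):
    """Build one plate's image directory from the template for path_style."""
    if path_style not in PATH_TEMPLATES:
        raise ValueError(
            f"unknown path_style {path_style!r}; expected one of {sorted(PATH_TEMPLATES)}"
        )
    # str.format ignores unused keyword arguments, so passing "source"
    # is harmless for templates that do not reference it.
    return PATH_TEMPLATES[path_style].format(
        identifier=identifier, batch=batch, plate=plate, source=source
    )


print(image_path("cpg", "cpg0000-jump-pilot", "2020_11_04_CPJUMP1", "Plate1", source="source_4"))
# prints cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/Plate1
```

With this shape, a CLI flag only has to pick a key; every style shares the rest of the job-building code.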
bethac07 left a comment
Make sure the parser defaults match the function defaults; otherwise, very nice work!
run_batch_general.py (outdated)
parser.add_argument(
    "--plate-format",
    dest="plate_format",
    default=384,
Make sure that here and elsewhere (i.e., rows and columns), the parser default matches the function default. Otherwise, if the user DOESN'T pass `--plate-format`, it will default to 384, so even if they passed `--rows`, those would be overwritten with the 384-well plate format.
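One common way to avoid the overwrite described above is to give `--plate-format` a default of `None`, so "not passed" is distinguishable from "passed 384". This is a minimal sketch under that assumption; `build_parser`, `resolve_rows`, `DEFAULT_ROWS`, and `PLATE_ROWS` are hypothetical names, and only `--rows` is shown (columns would work the same way).

```python
import argparse

# Hypothetical function default for row labels (a 96-well plate's rows).
DEFAULT_ROWS = "A,B,C,D,E,F,G,H"

# Row labels implied by each standard plate format.
PLATE_ROWS = {96: "ABCDEFGH", 384: "ABCDEFGHIJKLMNOP"}


def build_parser():
    parser = argparse.ArgumentParser()
    # default=None means "flag not passed", so omitting --plate-format
    # can no longer overwrite rows supplied with --rows.
    parser.add_argument(
        "--plate-format", dest="plate_format", type=int,
        choices=sorted(PLATE_ROWS), default=None,
    )
    parser.add_argument("--rows", dest="rows", default=DEFAULT_ROWS)
    return parser


def resolve_rows(args):
    """Apply a plate format only when the user explicitly asked for one."""
    if args.plate_format is not None:
        return list(PLATE_ROWS[args.plate_format])
    return args.rows.split(",")


parser = build_parser()
print(resolve_rows(parser.parse_args(["--rows", "A,B,C"])))  # ['A', 'B', 'C']
```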
run_batch_general.py (outdated)
    help="Name of the pipeline to overwrite defaults of Zproj.cppipe, illum.cppipe, qc.cppipe, assaydev.cppipe, analysis.cppipe.",
)
parser.add_argument(
    "--outputstructure",
Minor preference for this to be `--output-structure`, but I don't feel strongly about it if you'd rather not.
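The rename is cheap on the Python side, because argparse derives the attribute name from a long option by dropping the leading dashes and turning the remaining `-` into `_`. A small sketch (the default `""` and the value `example_structure` are placeholders, not the script's real values):

```python
import argparse

parser = argparse.ArgumentParser()
# With no explicit dest, argparse takes the first long option string,
# strips the leading "--", and replaces "-" with "_".
parser.add_argument("--output-structure", default="")

# "example_structure" is a placeholder value, not a real output structure.
args = parser.parse_args(["--output-structure", "example_structure"])
print(args.output_structure)  # example_structure
```

Keeping an explicit `dest="output_structure"` also works and makes the attribute name obvious at a glance.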
@bethac07 I think this now has the cleanup requested.
run_batch_general.py (outdated)
parser.add_argument(
    "--source",
    dest="source",
    default="source_4",
The default doesn't match the function default (`""` vs `"source_4"`).
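Mismatches like this are easy to reintroduce, so one option is a small check (e.g. in a test) that compares each keyword parameter's default in the function signature with the parser's default for the same `dest`. This is a sketch under assumptions: `make_jobs` is a hypothetical stand-in for the script's function, its parameters and defaults are placeholders, and it assumes each `dest` matches a parameter name.

```python
import argparse
import inspect


# Hypothetical stand-in for the script's job-building function; the
# parameter names and defaults are placeholders.
def make_jobs(step, identifier, batch, platelist, source="", plate_format=None):
    """Pretend to build jobs; only the signature matters here."""
    return []


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--source", dest="source", default="")
    parser.add_argument("--plate-format", dest="plate_format", type=int, default=None)
    return parser


def mismatched_defaults(func, parser):
    """Map each keyword parameter whose parser default differs from the
    function default to (parser_default, function_default)."""
    mismatches = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            continue  # required positional: nothing to compare
        # get_default returns None for a dest the parser does not define,
        # so a missing flag is only flagged if the function default is not None.
        parser_default = parser.get_default(name)
        if parser_default != param.default:
            mismatches[name] = (parser_default, param.default)
    return mismatches


print(mismatched_defaults(make_jobs, build_parser()))  # {}
```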
bethac07 left a comment
Super close! Just a couple last things to think about, sorry!
* `step` is the step that you would like to make jobs for.
Supported steps are `zproj`, `illum`, `qc`, `qc_persite`, `assaydev`, and `analysis`
* `path_style` is the style of the input and output paths.
What do you think about making this an optional parameter, with `default` as the default? I know you personally do a lot of CPG processing, but I don't know that it's going to be a general-enough case to make this flag mandatory on every run.
(But I want to know what you think!)
I guess it depends on what we think the purpose of run_batch_general.py is. Is it an internal resource that we also make public, or a public resource that we use a lot internally? If the former, then `default` as the default makes sense. I was thinking of it as the latter, though, especially as we've been both curating tools that access the CPG and pushing as much data to the CPG as we can. In that case, `cpg` as the default (or keeping it as a required parameter) makes the most sense.
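Either position can be expressed by turning `path_style` into an optional flag whose default is a single, easily changed value, while still catching the one CPG-specific requirement (a `--source`) up front. In this minimal sketch the `--path-style` flag name, the `cpg` default, and the `build_parser`/`parse` helpers are hypothetical; the step names and path styles come from the README text quoted in this thread.

```python
import argparse

STEPS = ["zproj", "illum", "qc", "qc_persite", "assaydev", "analysis"]


def build_parser(default_style="cpg"):
    parser = argparse.ArgumentParser()
    parser.add_argument("step", choices=STEPS)
    # Hypothetical optional flag replacing the positional path_style; which
    # style is the default is exactly the open question in this thread.
    parser.add_argument(
        "--path-style", dest="path_style",
        choices=["default", "cpg"], default=default_style,
    )
    parser.add_argument("--source", dest="source", default="")
    return parser


def parse(argv, default_style="cpg"):
    parser = build_parser(default_style)
    args = parser.parse_args(argv)
    # CPG paths need a source, so fail early with a usage error if it is missing.
    if args.path_style == "cpg" and not args.source:
        parser.error("--source is required when --path-style is cpg")
    return args


print(parse(["illum", "--source", "source_4"]).path_style)  # cpg
```

`parser.error` prints the usage message and exits with status 2, so a missing source is reported before any jobs are built.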
* `path_style` is the style of the input and output paths.
Supported options are `default` or `cpg` (for Cell Painting Gallery structure).
All paths can be overwritten with flags (see below).
* `identifier` is the project identifier (e.g. "cpg0000-jump-pilot")
Can we also have a non-CPG example, please-and-thanks?
* `identifier` is the project identifier (e.g. "cpg0000-jump-pilot")
* `batch` is the name of the data batch (e.g. "2020_11_04_CPJUMP1")
* `platelist` is the list of plates to process.
Format the list in quotes with individual plates separated by commas (e.g. "Plate1,Plate2,Plate3")
Probably want to also specify commas without spaces.
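The reason spaces matter: a plain `str.split(",")` keeps them, so a plate would be looked up as `" Plate2"`, and without surrounding quotes the shell would split `Plate1, Plate2` into two separate arguments. Documenting "no spaces" covers that; a lenient parser is a complementary option. `parse_list` below is a hypothetical helper, not something the script is known to contain.

```python
def parse_list(value):
    """Split a comma-separated CLI value into a list of items.

    Hypothetical helper: it strips whitespace around each item and drops
    empty items, so "Plate1, Plate2" is tolerated as well as "Plate1,Plate2".
    """
    return [item.strip() for item in value.split(",") if item.strip()]


# A plain split keeps the space, which would produce a plate named " Plate2".
print("Plate1, Plate2".split(","))  # ['Plate1', ' Plate2']
print(parse_list("Plate1, Plate2"))  # ['Plate1', 'Plate2']
```

Because argparse accepts any callable as `type`, a helper like this could be attached directly, e.g. `parser.add_argument("platelist", type=parse_list)`.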
### Required input for Cell Painting Gallery
Runs made from Cell Painting Gallery data require the additional flag `--source <value>` to specify the identifier-specific source of the data.
* `--plate-format <value>`: if used, can be `96` or `384` and will overwrite `rows` and `columns` to produce standard 96- or 384-well plate well names (e.g. A01, A02, etc.)
* `--rows <value>`: a custom list of row labels.
Will be combined with `columns` to generate well names.
Separate values with commas and surround with quotation marks (e.g. `"A,B,C,D,E,F,G"`)
As above, probably best to specify commas without spaces (here and below).
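The `--plate-format`/`--rows`/`--columns` behavior described in the README text can be sketched as below: a standard format overrides the custom labels, and rows are crossed with columns to produce names like A01. This is an illustration under assumptions: `well_names` and `PLATE_SHAPES` are hypothetical names, the `rows`/`columns` defaults are placeholders, and zero-padding custom columns to two digits is assumed rather than confirmed.

```python
import string

# Standard plate geometries: 96 wells = 8 rows x 12 columns,
# 384 wells = 16 rows x 24 columns.
PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}


def well_names(plate_format=None, rows="A,B", columns="1,2"):
    """Return well names such as A01, A02, ...

    If plate_format is given it overrides rows and columns; otherwise the
    comma-separated (no spaces) rows and columns are combined. The defaults
    here are placeholders.
    """
    if plate_format is not None:
        n_rows, n_cols = PLATE_SHAPES[plate_format]
        row_labels = list(string.ascii_uppercase[:n_rows])
        col_labels = [str(c) for c in range(1, n_cols + 1)]
    else:
        row_labels = rows.split(",")
        col_labels = columns.split(",")
    # Assumption: numeric columns are zero-padded to two digits ("1" -> "01").
    return [f"{r}{int(c):02d}" for r in row_labels for c in col_labels]


print(well_names(96)[:3])  # ['A01', 'A02', 'A03']
```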
Adds a run_batch script with formatting that matches data organization in the Cell Painting Gallery.
Uses run_batch_general.py as a template, with updates, cleanup, and a changed file structure.
@bethac07 is there any reason that I shouldn't perform similar cleanup on run_batch_general.py: