Return deterministic actions #5597
miguelalonsojr
left a comment
Please see CR feedback.
    # During training, clipping is done in TorchPolicy, but we need to clip before ONNX
    # export as well.
    self._clip_action_on_export = not tanh_squash
    self.deterministic = deterministic
Unless it's going to be used outside the ActionModel class, rename it to self._deterministic.
Co-authored-by: Miguel Alonso Jr. <76960110+miguelalonsojr@users.noreply.github.com>
maryamhonari
left a comment
Looks great! A few nits on the docs. I also believe the YAML settings test is not set up correctly, and I wonder how it's passing CI.
    action_spec: ActionSpec,
    conditional_sigma: bool = False,
    tanh_squash: bool = False,
    deterministic: bool = False,
Please update the docstring to cover the new deterministic parameter.
    test1_settings = run_options.behaviors["test1"]
    assert test1_settings.max_steps == 2
    assert test1_settings.network_settings.hidden_units == 2000
    assert not test1_settings.network_settings.deterministic
Nit: can we use == True here, just for readability?
    agent_action1 = action_model._sample_action(dists)
    agent_action2 = action_model._sample_action(dists)
    agent_action3 = action_model._sample_action(dists)
    assert torch.equal(agent_action1.continuous_tensor, agent_action2.continuous_tensor)
Some tests on discrete actions would be great!
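A rough, framework-free sketch of what such a discrete-action test could check. It assumes that deterministic mode means "return the Gaussian mean" for continuous actions and "take the argmax over logits" for discrete ones; the thread does not confirm this. `TinyActionModel` and its methods are hypothetical stand-ins written with the standard library only, not the real ActionModel API.

```python
import math
import random
from typing import Sequence


class TinyActionModel:
    """Hypothetical stand-in for an action model; not the ML-Agents class."""

    def __init__(self, deterministic: bool = False, seed: int = 0) -> None:
        # Kept private, since the flag is only used inside the class.
        self._deterministic = deterministic
        self._rng = random.Random(seed)

    def continuous_action(self, mean: float, std: float) -> float:
        # Assumption: deterministic mode returns the mean instead of sampling.
        if self._deterministic:
            return mean
        return self._rng.gauss(mean, std)

    def discrete_action(self, logits: Sequence[float]) -> int:
        # Assumption: deterministic mode picks the highest-logit branch.
        if self._deterministic:
            return max(range(len(logits)), key=lambda i: logits[i])
        # Otherwise sample from the softmax distribution over the logits.
        peak = max(logits)
        weights = [math.exp(x - peak) for x in logits]
        return self._rng.choices(range(len(logits)), weights=weights, k=1)[0]


def test_deterministic_discrete_actions() -> None:
    model = TinyActionModel(deterministic=True)
    logits = [0.1, 2.5, -1.0]
    actions = [model.discrete_action(logits) for _ in range(3)]
    # Repeated calls on the same logits must agree and select index 1.
    assert actions == [1, 1, 1]


def test_stochastic_discrete_actions_vary() -> None:
    model = TinyActionModel(deterministic=False, seed=0)
    logits = [0.0, 0.0, 0.0]
    actions = {model.discrete_action(logits) for _ in range(200)}
    # With uniform logits, sampling should hit more than one branch.
    assert len(actions) > 1
```

In the real test suite the same idea would presumably be expressed with torch tensors, comparing the discrete outputs of repeated _sample_action calls in the way the continuous assertion above does.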
    test1_settings = run_options.behaviors["test1"]
    assert test1_settings.max_steps == 2
    assert test1_settings.network_settings.hidden_units == 2000
    assert not test1_settings.network_settings.deterministic
IIUC this test is wrong. Above we set deterministic: true in the YAML, so it should be assert test1_settings.network_settings.deterministic == True, right?
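To illustrate the concern, here is a minimal sketch of how a negated assertion behaves once the setting is actually parsed. `NetworkSettingsSketch` and `load_network_settings` are hypothetical names, the dict stands in for the parsed YAML, and the defaults are assumptions rather than the project's real settings classes.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class NetworkSettingsSketch:
    # Hypothetical subset of fields; the False default mirrors
    # `deterministic: bool = False` from the diff above.
    hidden_units: int = 128
    deterministic: bool = False


def load_network_settings(raw: Dict[str, Any]) -> NetworkSettingsSketch:
    # Build the settings from the `network_settings` mapping, if present.
    return NetworkSettingsSketch(**raw.get("network_settings", {}))


# Dict equivalent of a YAML behavior block that sets `deterministic: true`.
behavior = {
    "max_steps": 2,
    "network_settings": {"hidden_units": 2000, "deterministic": True},
}
settings = load_network_settings(behavior)

assert settings.hidden_units == 2000
# `assert not settings.deterministic` would fail here.
assert settings.deterministic is True

# With the key omitted, the default applies and the negated assertion holds.
assert not load_network_settings({"max_steps": 2}).deterministic
```

If the negated assertion passes in CI even though the YAML sets the flag, one possible explanation is that the key is not being read from where the test expects it, which would be worth checking.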
Co-authored-by: Maryam Honari <honari.m94@gmail.com>
…y-Technologies/ml-agents into develop-staging-determinstic-action
Proposed change(s)
This PR adds the ability to retrieve actions deterministically based on the input to the model. It adds a new run-options configuration setting and a new CLI flag, --deterministic.
Types of change(s)
Checklist
Other comments