
Goal conditioning grid world : Example of goal conditioning #5193

Merged
vincentpierre merged 10 commits into main from goal-conditioning-grid-world-3
Mar 31, 2021
Conversation

@vincentpierre

Contributor

Proposed change(s)

Make the GridWorld example use the new goal conditioning.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@vincentpierre vincentpierre self-assigned this Mar 29, 2021
@vincentpierre vincentpierre marked this pull request as ready for review March 29, 2021 20:53
@vincentpierre vincentpierre changed the title from "Goal conditioning grid world 3" to "Goal conditioning grid world : Example of goal conditioning" Mar 29, 2021

public override void CollectObservations(VectorSensor sensor)
{
    Array values = Enum.GetValues(typeof(GridGoal));

Contributor

Can this happen somewhere else? It feels like abuse of CollectObservations(), since it's not touching the input VectorSensor.

Contributor Author

VectorSensor is null here, so I do not see an issue with this. The goal signal is an observation, so it makes sense to me that it is handled in CollectObservations.
Would it be better if I put this logic into a CollectGoal method with no arguments that I call in CollectObservations?

Contributor

CollectGoal is maybe fine for the example (but let's not add it to Agent). Let me think about a better way.

One problem (which I didn't realize until now) is that we don't check for a null collectObservationsSensor during the normal update step:

CollectObservations(collectObservationsSensor);

but we do check for null when the agent is done:
if (collectObservationsSensor != null)
{
    // Make sure the latest observations are being passed to training.
    collectObservationsSensor.Reset();
    using (m_CollectObservationsChecker.Start())
    {
        CollectObservations(collectObservationsSensor);
    }
}
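
The asymmetry described above can be illustrated with a small language-neutral sketch, written here in Python rather than C#. `SensorStub` and `AgentSketch` are hypothetical stand-ins, not the ML-Agents classes; the sketch only mirrors the two call sites quoted in this comment. It suggests that an override which ignores its sensor argument would run on a normal step but be skipped on the done path when no vector sensor exists.

```python
from typing import Optional


class SensorStub:
    """Hypothetical stand-in for a vector sensor (not the ML-Agents class)."""

    def reset(self) -> None:
        pass


class AgentSketch:
    """Mirrors only the two call sites quoted above; everything else is omitted."""

    def __init__(self, sensor: Optional[SensorStub]) -> None:
        self.sensor = sensor
        self.calls = 0  # how many times collect_observations ran

    def collect_observations(self, sensor: Optional[SensorStub]) -> None:
        # An override that never touches the sensor, like the goal update
        # discussed in this thread.
        self.calls += 1

    def step(self) -> None:
        # Normal update: no null check before the call.
        self.collect_observations(self.sensor)

    def done(self) -> None:
        # Done path: the call is skipped entirely when there is no sensor.
        if self.sensor is not None:
            self.sensor.reset()
            self.collect_observations(self.sensor)
```

With `AgentSketch(None)`, `step()` increments `calls` while `done()` leaves it unchanged, so logic placed in the override would depend on which path invoked it.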

if (hit.Where(col => col.gameObject.CompareTag("goal")).ToArray().Length == 1)
{
    SetReward(1f);
    ProvideReward(GridGoal.Plus);

Contributor

This is pretty confusing since the "goal" tag doesn't really mean that it's the goal anymore. Can you change them to e.g. "plus" and "ex"?

Or maybe this would be a good opportunity to stop using physics collision checks, and change the example to use a 2D array of enums? That would probably speed up training too.

Contributor Author

I changed the tags to plus and ex. I think making the grid a 2D array of enums is a good idea, but out of scope for this.
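
For reference, the 2D-array-of-enums idea from this thread could look roughly like the sketch below. It is written in Python for brevity (the example itself is C#), and every name in it (`Cell`, `make_grid`, `reward_for`) is hypothetical. The reward values are placeholders, not the ones the example uses.

```python
from enum import Enum
from typing import List, Tuple


class Cell(Enum):
    EMPTY = 0
    PLUS = 1
    EX = 2


def make_grid(size: int, plus: Tuple[int, int], ex: Tuple[int, int]) -> List[List[Cell]]:
    """Build a size x size grid with one PLUS cell and one EX cell."""
    grid = [[Cell.EMPTY for _ in range(size)] for _ in range(size)]
    grid[plus[0]][plus[1]] = Cell.PLUS
    grid[ex[0]][ex[1]] = Cell.EX
    return grid


def reward_for(cell: Cell, goal: Cell) -> float:
    """Placeholder rewards: +1 for the current goal, -1 for the other target."""
    if cell is Cell.EMPTY:
        return 0.0
    return 1.0 if cell is goal else -1.0


# Usage: an array lookup replaces the physics overlap check.
grid = make_grid(3, plus=(0, 1), ex=(2, 2))
agent_row, agent_col = 0, 1
reward = reward_for(grid[agent_row][agent_col], goal=Cell.PLUS)
```

The lookup is a plain index into the grid, so no collider queries or tag comparisons would be needed to find out what the agent stepped on.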

@chriselion

Contributor

(sorry, can't comment inline for the file removal). gridworld.png is still being referenced:

$ git grep gridworld.png 
docs/Learning-Environment-Design-Agents.md:![Agent RenderTexture Debug](images/gridworld.png)
docs/Learning-Environment-Examples.md:![GridWorld](images/gridworld.png)

(and that should have failed the link checker)

@vincentpierre

Contributor Author

> gridworld.png is still being referenced

gridworld.png is still there (it is only smaller).

@ervteng

ervteng commented Mar 29, 2021

Contributor

Might not be related to this PR, but should we add a warning in the docs about using hypernetworks for larger hidden_units values? We might even be able to auto-detect it in settings.py, e.g. print a warning if the resulting model will be bigger than 50 MB.

@vincentpierre

Contributor Author

> Might not be related to this PR, but should we add a warning in the docs about using hypernetworks for larger hidden_units values? We might even be able to auto-detect it in settings.py, e.g. if the resulting model will be bigger than 50mb print a warning

There is this line in the documentation:

If set to `hyper` (default) a [HyperNetwork](https://arxiv.org/pdf/1609.09106.pdf)
will be used to generate some of the
weights of the policy using the goal observations as input. Note that using a
HyperNetwork requires a lot of computations, it is recommended to use a smaller
number of hidden units in the policy to alleviate this.

I am hesitant to throw a warning if the model is going to be large because we never know what the user has in mind...
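
For a sense of scale, the size check suggested above could be sketched as follows. This is a rough estimate under stated assumptions, not code from settings.py: the function names are hypothetical, the hypernetwork head is assumed to be a single linear map that emits the weights and biases of one hidden_units x hidden_units layer, weights are float32, and 1 MB is taken as 10^6 bytes.

```python
import warnings

BYTES_PER_FLOAT32 = 4
SIZE_LIMIT_MB = 50  # threshold floated in the comment above


def estimate_hyper_layer_params(hidden_units: int, hyper_hidden_units: int) -> int:
    """Rough parameter count for a head that generates one dense layer.

    Assumption: a single linear map from hyper_hidden_units features to
    every generated weight and bias. Real architectures may differ.
    """
    generated = hidden_units * hidden_units + hidden_units  # weights + biases
    return hyper_hidden_units * generated + generated  # head weights + head biases


def warn_if_large(hidden_units: int, hyper_hidden_units: int) -> float:
    """Return the estimated size in MB and warn when it exceeds the limit."""
    params = estimate_hyper_layer_params(hidden_units, hyper_hidden_units)
    size_mb = params * BYTES_PER_FLOAT32 / 1e6
    if size_mb > SIZE_LIMIT_MB:
        warnings.warn(
            f"Estimated hypernetwork size is {size_mb:.1f} MB "
            f"(limit {SIZE_LIMIT_MB} MB); consider fewer hidden units."
        )
    return size_mb
```

Under these assumptions, 256 hidden units with a 256-wide hypernetwork come to about 67.6 MB for a single generated layer, while 64 and 64 stay near 1.1 MB. The estimate grows with the square of hidden_units, which is consistent with the documentation's advice to keep that number small.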

Comment thread docs/Learning-Environment-Examples.md Outdated
@@ -82,16 +82,16 @@ you would like to contribute environments, please see our

![GridWorld](images/gridworld.png)

Contributor

Is it possible to link to this environment in the goal signal docs and the Changelog? Just in case a user wants an example of how to use these features.

@vincentpierre vincentpierre merged commit 92ff2c2 into main Mar 31, 2021
@delete-merged-branch delete-merged-branch Bot deleted the goal-conditioning-grid-world-3 branch March 31, 2021 22:17
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Apr 1, 2022
3 participants