Add networksettings to reward providers by andrewcoh · Pull Request #4982 · Unity-Technologies/ml-agents

andrewcoh · 2021-02-19T20:00:56Z

Proposed change(s)

Describe the changes made in this PR.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

andrewcoh · 2021-02-19T20:05:16Z

            enum_key = RewardSignalType(key)
            t = enum_key.to_settings()
            d_final[enum_key] = strict_to_cls(val, t)
+            if "encoding_size" in val:


Backward compatible with old configs

Can you add a comment around this code so we will remember it?
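For readers following the backward-compatibility point, here is a minimal sketch of how a legacy `encoding_size` key could be mapped onto the new `network_settings` field. It is not the code from this PR: the hunk above is cut off after the `if`, so the body of the branch, the `structure_reward_signal` helper, the dataclass stand-ins, and the default values are all assumptions made for illustration.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class NetworkSettings:
    # Hypothetical stand-in for the real settings class; defaults are illustrative.
    hidden_units: int = 128
    num_layers: int = 2


@dataclass
class RewardSignalSettings:
    gamma: float = 0.99
    strength: float = 1.0
    network_settings: NetworkSettings = field(default_factory=NetworkSettings)
    # Legacy key, kept only so that old configs still load (assumption).
    encoding_size: Optional[int] = None


def structure_reward_signal(val: Dict[str, Any]) -> RewardSignalSettings:
    """Build settings from one reward-signal config entry (hypothetical helper)."""
    val = dict(val)  # do not mutate the caller's config
    net = val.pop("network_settings", None)
    settings = RewardSignalSettings(**val)
    if net is not None:
        settings.network_settings = NetworkSettings(**net)
    # Backward compatibility: older configs set `encoding_size` directly on the
    # reward signal. If no `network_settings` block is given, reuse that value
    # as the hidden layer width so old configs keep their behavior.
    if "encoding_size" in val:
        warnings.warn(
            "'encoding_size' is deprecated for reward signals; "
            "use network_settings.hidden_units instead."
        )
        if net is None:
            settings.network_settings.hidden_units = val["encoding_size"]
    return settings
```

With this sketch, an old-style entry such as `{"strength": 1.0, "encoding_size": 64}` and a new-style entry such as `{"network_settings": {"hidden_units": 64}}` both load, and the old one emits a deprecation warning.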

ervteng · 2021-02-19T20:12:15Z

cc @hvpeteet, @sini - this will introduce a (backwards-compatible) change for the YAMLs - just checking to make sure it won't cause any issues.

vincentpierre

Need to edit the documentation before merging.

vincentpierre · 2021-02-19T21:23:04Z

    gamma: float = 0.99
    strength: float = 1.0
-    normalize: bool = False
+    network_settings: NetworkSettings = attr.ib(factory=NetworkSettings)


I would make this one optional and, if it is None, use the Policy's network settings rather than our own defaults. How does that sound?

I think I'd prefer to use our defaults, since the policy may have significantly more capacity than is needed, e.g. the Crawler policy at 3/512 (layers/hidden units) vs. the 2/128 we use for the discriminator. That said, I also realize this lets users specify memory, which we probably want to explicitly prevent in the reward providers. cc @ervteng

Not opposed to either route; each has its own pros and cons. Either way, as long as it's documented it should be fine.
Is getting the Policy settings super ugly?

I'm not sure how future-proof it is for multi-agent scenarios, where we could have different policies to select from. Additionally, we currently create reward signals in optimizer/torch_optimizer.py, and in the future I think it will be necessary to remove the policy from the optimizer (also for multi-agent). In that case this would need to be addressed by either keeping the policy around or moving the creation of the reward provider. My vote is for default network settings.
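The two options weighed in this thread can be sketched side by side. This is an illustration under stated assumptions, not the merged code: it uses stdlib dataclasses in place of `attr`, and `MemorySettings`, `resolve_network_settings`, and the warning text are hypothetical names. The warning reflects the concern above about users specifying memory; what the provider then does with the memory setting is not shown in this thread, so the sketch only warns.

```python
import warnings
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemorySettings:
    # Hypothetical stand-in; values are illustrative.
    sequence_length: int = 64
    memory_size: int = 128


@dataclass
class NetworkSettings:
    # 2 layers of 128 units, the discriminator size mentioned in the thread.
    hidden_units: int = 128
    num_layers: int = 2
    memory: Optional[MemorySettings] = None


@dataclass
class RewardSignalSettings:
    gamma: float = 0.99
    strength: float = 1.0
    # Option favored in the thread: the reward provider gets its own defaults,
    # independent of the (possibly much larger) policy network.
    network_settings: NetworkSettings = field(default_factory=NetworkSettings)

    def __post_init__(self) -> None:
        # Assumption: memory in a reward provider is flagged with a warning.
        if self.network_settings.memory is not None:
            warnings.warn(
                "memory was specified in network_settings, "
                "but reward providers are not expected to use it."
            )


def resolve_network_settings(
    own: Optional[NetworkSettings], policy: NetworkSettings
) -> NetworkSettings:
    """Alternative raised in review: fall back to the policy's settings when
    the reward signal does not define its own (hypothetical helper)."""
    return own if own is not None else policy
```

The fallback variant needs access to the policy wherever the reward provider is built, which is the coupling the last comment above argues against.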

vincentpierre · 2021-02-19T21:23:40Z

            enum_key = RewardSignalType(key)
            t = enum_key.to_settings()
            d_final[enum_key] = strict_to_cls(val, t)
+            if "encoding_size" in val:


Can you add a comment around this code so we will remember it?

hvpeteet

Looks good to me, we don't use these fields for anything cloud specific.

add network settings to reward providers

b68fbdc

andrewcoh changed the base branch from master to fix-gail February 19, 2021 20:01

andrewcoh requested review from ervteng and vincentpierre and removed request for ervteng February 19, 2021 20:01

andrewcoh commented Feb 19, 2021

Comment thread config/imitation/Hallway.yaml

update pyramids rnd config

4f83bbf

andrewcoh commented Feb 19, 2021

Comment thread ml-agents/mlagents/trainers/settings.py Outdated

remove print statement

00eb2b2

ervteng reviewed Feb 19, 2021

Comment thread ml-agents/mlagents/trainers/torch/components/reward_providers/gail_reward_provider.py Outdated

fix vail tests

d971b14

andrewcoh requested a review from hvpeteet February 19, 2021 20:16

andrewcoh mentioned this pull request Feb 19, 2021

Set ignore done=False in GAIL #4971

Merged

10 tasks

vincentpierre approved these changes Feb 19, 2021

hvpeteet approved these changes Feb 19, 2021

andrewcoh added 4 commits February 20, 2021 08:52

add comment to RewardSignal structure

55ebc79

set default encoding size to optional int

fbb3ebe

update documentations in training config

cc72294

update changelog

22db076

ervteng approved these changes Feb 22, 2021

raise warning if memory specified in reward providers

b43e20c

andrewcoh merged commit 5fcbbc4 into fix-gail Feb 22, 2021

delete-merged-branch Bot deleted the fix-gail-networksettings branch February 22, 2021 21:21

github-actions Bot locked as resolved and limited conversation to collaborators Feb 23, 2022

andrewcoh merged 9 commits into fix-gail from fix-gail-networksettings
