[docs] Documentation for POCA and cooperative behaviors by ervteng · Pull Request #5056 · Unity-Technologies/ml-agents

ervteng · 2021-03-08T22:50:54Z

Proposed change(s)

Documentation for COMA2 and MultiAgentGroup.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

andrewcoh · 2021-03-11T20:04:50Z

+* Agents within groups should always set the `Max Steps` parameter the Agent script to 0, meaning
+they will never reach a max step. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire


Suggested change

* Agents within groups should always set the `Max Steps` parameter the Agent script to 0, meaning

they will never reach a max step. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire

* Agents within groups should always set the `Max Steps` parameter in the Agent script to 0. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire
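For context, a minimal sketch of handling the step limit at the group level rather than per Agent. It assumes an environment controller script: the class name `CoopEnvController`, the `maxEnvironmentSteps` field, and the body of `ResetScene()` are hypothetical, while `SimpleMultiAgentGroup`, `RegisterAgent` and `GroupEpisodeInterrupted` come from the ML-Agents package. Each Agent's own `Max Steps` is assumed to be 0 in the Inspector.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical environment controller; names other than the ML-Agents calls are illustrative.
public class CoopEnvController : MonoBehaviour
{
    public Agent[] agents;                  // assigned in the Inspector
    public int maxEnvironmentSteps = 5000;  // assumed group-level step budget

    SimpleMultiAgentGroup m_AgentGroup;
    int m_ResetTimer;

    void Start()
    {
        m_AgentGroup = new SimpleMultiAgentGroup();
        foreach (var agent in agents)
        {
            m_AgentGroup.RegisterAgent(agent);
        }
    }

    void FixedUpdate()
    {
        m_ResetTimer += 1;
        if (maxEnvironmentSteps > 0 && m_ResetTimer >= maxEnvironmentSteps)
        {
            // The step budget ran out: interrupt the episode for the whole group.
            m_AgentGroup.GroupEpisodeInterrupted();
            ResetScene();
        }
    }

    void ResetScene()
    {
        m_ResetTimer = 0;
        // Reposition agents and objects here (environment-specific).
    }
}
```

The counter lives in the controller so that every Agent in the group sees the same episode boundary.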

andrewcoh · 2021-03-11T20:06:59Z

+makes learning what to do as an individual difficult - you may get a win
+for doing nothing, and a loss for doing your best.
+
+In ML-Agents, we provide MA-POCA (MultiAgent POsthumous Credit Assignment), which


Should we say "paper coming soon" or something?

I think it is fine to not say anything. Although I am worried someone will coin the name.

vincentpierre · 2021-03-11T20:30:02Z

+}
+
+// if the team scores a goal
+m_AgentGroup.AddGroupReward(score);


Suggested change

m_AgentGroup.AddGroupReward(score);

m_AgentGroup.AddGroupReward(rewardForGoal);

vincentpierre · 2021-03-11T20:30:44Z

+ResetScene();
+```
+
+Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train


Suggested change

Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train

Multi Agent Groups can only be trained with the MA-POCA trainer, which is explicitly designed to train

Hmm, this isn't exactly true - Multi Agent Groups will run and try to train with PPO but their behaviors won't be very collaborative. I changed it to the stronger-but-not-as-hard "should be trained with".

vincentpierre · 2021-03-11T20:30:56Z

+```
+
+Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train
+cooperative environments. This can be enabled by using the `coma` trainer - see the


Suggested change

cooperative environments. This can be enabled by using the `coma` trainer - see the

cooperative environments. This can be enabled by using the `poca` trainer - see the
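For reference, selecting the trainer in the training configuration might look roughly like the fragment below. The behavior name `CoopBehavior` is hypothetical and has to match the Behavior Name set on the Agents. The hyperparameter values are placeholders, not tuned recommendations.

```yaml
behaviors:
  CoopBehavior:              # hypothetical; must match the Behavior Name in Unity
    trainer_type: poca       # previously referred to as `coma` in this PR
    hyperparameters:
      batch_size: 1024       # placeholder value
      buffer_size: 10240     # placeholder value
      learning_rate: 0.0003  # placeholder value
    network_settings:
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 5000000       # total training steps, unrelated to the Agent's Max Steps
    time_horizon: 64
```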

vincentpierre · 2021-03-11T20:32:13Z

+Team Id. If this playing field is duplicated many times in the Scene (e.g. for training
+speedup), there should be two Groups _per playing field_, and two unique Team Ids
+_for the entire Scene_. In environments with both Groups and Team Ids configured, MA-POCA and
+self-play can be used together for training.


Maybe a little image will help?

Added a small diagram of the difference

vincentpierre · 2021-03-11T20:32:48Z

+For an example of how to set up cooperative environments, see the
+[Cooperative PushBlock](Learning-Environment-Examples.md#cooperative-push-block) and
+[Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) example environments.


Suggested change

For an example of how to set up cooperative environments, see the

[Cooperative PushBlock](Learning-Environment-Examples.md#cooperative-push-block) and

[Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) example environments.

Remove until the environments are actually merged.

vincentpierre · 2021-03-11T20:37:46Z

+* If an Agent finished earlier, e.g. completed tasks/be removed/be killed in the game, do not call
+`EndEpisode()` on the Agent. Instead, disable the Agent and re-enable it when the next episode starts,
+or destroy the agent entirely.


Give an explanation:
"This is because calling `EndEpisode()` will call `OnEpisodeBegin()`, hence resetting the Agent immediately. This is usually not the desired behavior when training a group of Agents."
It is possible to call `EndEpisode()`; it will just most likely not be what the user expects.

Added this explanation 👍

vincentpierre · 2021-03-11T20:39:15Z

+`EndEpisode()` on the Agent. Instead, disable the Agent and re-enable it when the next episode starts,
+or destroy the agent entirely.
+
+* If an Agent is disabled in a scene, it must be re-registered to the MultiAgentGroup.


Disabled or destroyed, right?

Destroyed = gone. There is no way it can be re-registered, right?

Then I would say, if a previously disabled agent is re-enabled it must be re-registered
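A rough sketch of the disable and re-register pattern discussed in this thread, assuming an environment controller owns the group. The class name `GroupRespawnController` and the methods `RemoveFromPlay` and `ResetScene` are hypothetical; `SimpleMultiAgentGroup` and `RegisterAgent` are the ML-Agents API.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical controller; only the SimpleMultiAgentGroup calls are ML-Agents API.
public class GroupRespawnController : MonoBehaviour
{
    public Agent[] agents;  // assigned in the Inspector
    SimpleMultiAgentGroup m_AgentGroup;

    void Start()
    {
        m_AgentGroup = new SimpleMultiAgentGroup();
        foreach (var agent in agents)
        {
            m_AgentGroup.RegisterAgent(agent);
        }
    }

    // Call when an agent finishes early (e.g. it is removed from play).
    public void RemoveFromPlay(Agent agent)
    {
        // Disable instead of calling agent.EndEpisode(), which would reset it immediately.
        agent.gameObject.SetActive(false);
    }

    // Call at the start of the next group episode.
    public void ResetScene()
    {
        foreach (var agent in agents)
        {
            if (!agent.gameObject.activeSelf)
            {
                agent.gameObject.SetActive(true);
                // A previously disabled agent must be registered with the group again.
                m_AgentGroup.RegisterAgent(agent);
            }
        }
    }
}
```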

vincentpierre · 2021-03-11T20:39:33Z

+
+* Group rewards are meant to reinforce agents to act in the group's best interest instead of
+individual ones, and are treated differently than individual agent rewards during
+training. So calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent


Suggested change

training. So calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent

training. So calling `AddGroupReward()` is not equivalent to calling `agent.AddReward()` on each agent

vincentpierre · 2021-03-11T20:40:05Z

      - Recommended Minimum: 1
      - Recommended Maximum: 20
  - Benchmark Mean Reward: Depends on the number of tiles.
+


Add these alongside the environment addition PRs.

Removed and moved to the environment PR

vincentpierre · 2021-03-11T20:41:11Z

+makes learning what to do as an individual difficult - you may get a win
+for doing nothing, and a loss for doing your best.
+
+In ML-Agents, we provide MA-POCA (MultiAgent POsthumous Credit Assignment), which


I think it is fine to not say anything. Although I am worried someone will coin the name.

vincentpierre · 2021-03-11T22:41:51Z

 Cooperative behavior in ML-Agents can be enabled by instantiating a `SimpleMultiAgentGroup`,
 typically in an environment controller or similar script, and adding agents to it
-using the `RegisterAgent(Agent agent)` method. Using `MultiAgentGroup` enables the
+using the `RegisterAgent(Agent agent)` method. Note that all agents added to the same `MultiAgentGroup`


I think we only have SimpleMultiAgentGroup and IMultiAgentGroup, no MultiAgentGroup. (To verify)

Confirmed and fixed

Ervin Teng and others added 30 commits December 21, 2020 15:34

Make comms one-hot

d2e315d

Fix S tag

5cf76e3

Merge branch 'master' into develop-centralizedcritic-mm

8708f70

Additional changes

44fb8b5

Some more fixes

56f9dbf

Self-attention Centralized Critic

a468075

separate entity encoder and RSA

db184d9

clean up args in mha

32cbdee

more cleanups

c90472c

fixed tests

d429b53

Merge branch 'develop-attention-refactor' into develop-centralizedcritic-mm

44093f2

Merge branch 'develop-attention-refactor' into develop-centralizedcritic-mm

1dc0059

entity embeddings work with no max

2b5b994

Integrate into CC

remove group id

cd84fe3

very rough sketch for TeamManager interface

eed2fce

One layer for entity embed

fe41094

Use 4 heads

3822b18

add defaults to linear encoder, initialize ent encoders

3f4b2b5

Merge branch 'master' into develop-centralizedcritic-mm

c7c7d4c

Merge branch 'develop-lin-enc-def' into develop-centralizedcritic-mm

f391b35

add team manager id to proto

f706a91

team manager for hallway

cee5466

add manager to hallway

195978c

send and process team manager id

10f336e

remove print

f0bf657

Merge branch 'develop-centralizedcritic-mm' into develop-cc-teammanager

e03c79e

small cleanup

1118089

default behavior for baseTeamManager

13a90b1

add back statsrecorder

36d1b5b

update

376d500

Ervin T added 7 commits March 10, 2021 21:00

Move common loss functions for PPO and POCA (#5079)

20c8759

Turn on the SimpleMultiAgentGroup

2ed7f46

Add dungeon escape screenshot

bb04d14

[poca] Remove add_groupmate_rewards from settings (#5082)

8511f9f

Merge branch 'main' into develop-coma2-trainer

445c1f0

Untrack PB Collab Config

f98c615

Update comment and fix reporting of group dones

65af6ff

andrewcoh reviewed Mar 11, 2021

andrewcoh approved these changes Mar 11, 2021

Merge branch 'develop-coma2-trainer' into develop-coma2-docs

adbe1b2

vincentpierre reviewed Mar 11, 2021

Ervin Teng added 6 commits March 11, 2021 15:43

Address comments

4c0986d

Fix coma reference

3ed9702

Remove mention of envs

4134d54

Add diagram, correct capitalizations

a500410

correct some more capitalizations

ea7914e

Address some comments

d972351

dongruoping approved these changes Mar 11, 2021

vincentpierre reviewed Mar 11, 2021

Remove dungeon escape

ed462aa

vincentpierre approved these changes Mar 11, 2021

Ervin Teng added 2 commits March 11, 2021 19:24

Clean up docs a bit

92ad505

Fix references to MultiAgentGroup

12fdc1d

delete-merged-branch Bot deleted the branch main March 12, 2021 01:48

Merge branch 'main' into develop-coma2-docs

0ff8ac8

ervteng changed the base branch from develop-coma2-trainer to main March 12, 2021 02:04

ervteng merged commit 847d723 into main Mar 12, 2021

delete-merged-branch Bot deleted the develop-coma2-docs branch March 12, 2021 02:05

github-actions Bot locked as resolved and limited conversation to collaborators Mar 12, 2022

4 participants
