[docs] Documentation for POCA and cooperative behaviors #5056
Integrate into CC
| * Agents within groups should always set the `Max Steps` parameter the Agent script to 0, meaning | ||
| they will never reach a max step. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire |
| * Agents within groups should always set the `Max Steps` parameter the Agent script to 0, meaning | |
| they will never reach a max step. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire | |
| * Agents within groups should always set the `Max Steps` parameter in the Agent script to 0. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire |
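For reference, a minimal sketch of handling the step limit at the group level rather than per Agent. It assumes `SimpleMultiAgentGroup` exposes `GroupEpisodeInterrupted()`; the class name `GroupEnvController`, the `MaxEnvironmentSteps` field, and `ResetScene()` are hypothetical names used only for illustration.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical environment controller; all names here are illustrative.
public class GroupEnvController : MonoBehaviour
{
    // Assumed group-level step limit, counted in FixedUpdate ticks.
    public int MaxEnvironmentSteps = 5000;

    // Agents assigned in the Inspector; each has Max Step set to 0.
    public Agent[] Agents;

    SimpleMultiAgentGroup m_AgentGroup;
    int m_StepCount;

    void Start()
    {
        m_AgentGroup = new SimpleMultiAgentGroup();
        foreach (var agent in Agents)
        {
            m_AgentGroup.RegisterAgent(agent);
        }
    }

    void FixedUpdate()
    {
        m_StepCount += 1;
        if (MaxEnvironmentSteps > 0 && m_StepCount >= MaxEnvironmentSteps)
        {
            // Time ran out: interrupt the episode for the whole group at once.
            m_AgentGroup.GroupEpisodeInterrupted();
            ResetScene();
        }
    }

    void ResetScene()
    {
        m_StepCount = 0;
        // Reposition agents and objects here.
    }
}
```

Counting steps in one controller keeps every Agent in the group on the same episode boundary, which is the point of setting the per-Agent limit to 0.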
| makes learning what to do as an individual difficult - you may get a win | ||
| for doing nothing, and a loss for doing your best. | ||
|
|
||
| In ML-Agents, we provide MA-POCA (MultiAgent POsthumous Credit Assignment), which |
Should we say "paper coming soon" or something?
I think it is fine to not say anything. Although I am worried someone will coin the name.
| } | ||
|
|
||
| // if the team scores a goal | ||
| m_AgentGroup.AddGroupReward(score); |
| m_AgentGroup.AddGroupReward(score); | |
| m_AgentGroup.AddGroupReward(rewardForGoal); |
| ResetScene(); | ||
| ``` | ||
|
|
||
| Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train |
| Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train | |
| Multi Agent Groups can only be trained with the MA-POCA trainer, which is explicitly designed to train |
Hmm, this isn't exactly true - Multi Agent Groups will run and try to train with PPO but their behaviors won't be very collaborative. I changed it to the stronger-but-not-as-hard "should be trained with".
| ``` | ||
|
|
||
| Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train | ||
| cooperative environments. This can be enabled by using the `coma` trainer - see the |
| cooperative environments. This can be enabled by using the `coma` trainer - see the | |
| cooperative environments. This can be enabled by using the `poca` trainer - see the |
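As a rough illustration of the suggested wording, a trainer configuration fragment selecting the `poca` trainer might look like the following. The behavior name `CooperativeBehavior` and the `max_steps` value are placeholders, not taken from this PR.

```yaml
behaviors:
  CooperativeBehavior:   # hypothetical; must match the Behavior Name set on the Agents
    trainer_type: poca   # selects the MA-POCA trainer
    max_steps: 5000000   # placeholder value
```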
| Team Id. If this playing field is duplicated many times in the Scene (e.g. for training | ||
| speedup), there should be two Groups _per playing field_, and two unique Team Ids | ||
| _for the entire Scene_. In environments with both Groups and Team Ids configured, MA-POCA and | ||
| self-play can be used together for training. |
Maybe a little image will help?
Added a small diagram of the difference
| For an example of how to set up cooperative environments, see the | ||
| [Cooperative PushBlock](Learning-Environment-Examples.md#cooperative-push-block) and | ||
| [Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) example environments. |
| For an example of how to set up cooperative environments, see the | |
| [Cooperative PushBlock](Learning-Environment-Examples.md#cooperative-push-block) and | |
| [Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) example environments. |
Remove until the environments are actually merged.
| * If an Agent finished earlier, e.g. completed tasks/be removed/be killed in the game, do not call | ||
| `EndEpisode()` on the Agent. Instead, disable the Agent and re-enable it when the next episode starts, | ||
| or destroy the agent entirely. |
Give an explanation:
"This is because calling `EndEpisode()` will call `OnEpisodeBegin()`, hence resetting the Agent immediately. This is usually not the desired behavior when training a group of Agents."
It is possible to call `EndEpisode()`; it just will most likely not be what the user expects.
Added this explanation 👍
| `EndEpisode()` on the Agent. Instead, disable the Agent and re-enable it when the next episode starts, | ||
| or destroy the agent entirely. | ||
|
|
||
| * If an Agent is disabled in a scene, it must be re-registered to the MultiAgentGroup. |
Disabled or destroyed, right?
Destroyed = gone. There is no way it can be re-registered, right?
Then I would say: if a previously disabled agent is re-enabled, it must be re-registered.
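A small sketch of what re-registering could look like when an episode resets. It assumes disabled Agents are reactivated with `SetActive(true)` before `RegisterAgent` is called; the class `GroupResetController` and its members are hypothetical.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical controller showing re-registration after re-enabling Agents.
public class GroupResetController : MonoBehaviour
{
    // Agents assigned in the Inspector.
    public Agent[] Agents;

    SimpleMultiAgentGroup m_AgentGroup;

    void Start()
    {
        m_AgentGroup = new SimpleMultiAgentGroup();
        foreach (var agent in Agents)
        {
            m_AgentGroup.RegisterAgent(agent);
        }
    }

    // Call at the start of each new group episode.
    public void ResetScene()
    {
        foreach (var agent in Agents)
        {
            if (!agent.gameObject.activeSelf)
            {
                // Re-enable the Agent that was disabled mid-episode,
                // then add it back to the group.
                agent.gameObject.SetActive(true);
                m_AgentGroup.RegisterAgent(agent);
            }
        }
    }
}
```

Destroyed Agents are not covered here; a replacement would have to be instantiated and registered as a new Agent.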
|
|
||
| * Group rewards are meant to reinforce agents to act in the group's best interest instead of | ||
| individual ones, and are treated differently than individual agent rewards during | ||
| training. So calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent |
| training. So calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent | |
| training. So calling `AddGroupReward()` is not equivalent to calling `agent.AddReward()` on each agent |
| - Recommended Minimum: 1 | ||
| - Recommended Maximum: 20 | ||
| - Benchmark Mean Reward: Depends on the number of tiles. | ||
|
|
Add these alongside the environment addition PRs.
Removed and moved to the environment PR
| Cooperative behavior in ML-Agents can be enabled by instantiating a `SimpleMultiAgentGroup`, | ||
| typically in an environment controller or similar script, and adding agents to it | ||
| using the `RegisterAgent(Agent agent)` method. Using `MultiAgentGroup` enables the | ||
| using the `RegisterAgent(Agent agent)` method. Note that all agents added to the same `MultiAgentGroup` |
I think we only have SimpleMultiAgentGroup and IMultiAgentGroup, no MultiAgentGroup. (To verify)
Confirmed and fixed
Proposed change(s)
Documentation for COMA2 and MultiAgentGroup.