[rllib] example and docs on how to use parametric actions with DQN / PG algorithms #3384

Merged: ericl merged 22 commits into ray-project:master from ericl:pa-model on Nov 28, 2018
ericl commented Nov 22, 2018

What do these changes do?

Add examples of how to work with parametric action spaces (e.g., OpenAI Five style).

Related issue number

Closes #3364

cc @zegerhoogeboom


def transform(self, observation):
    if not isinstance(observation, OrderedDict):
        observation = OrderedDict(sorted(list(observation.items())))

ericl (Author):

Oops, this is kind of an important bug fix.

ericl (Author), replying to a contributor's comment:

No. Note that that is a check against the space spec; this is sorting the observation dict.
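
To illustrate the distinction drawn in this thread, here is a minimal sketch in plain Python of why sorting the observation dict can matter. It assumes a hypothetical `flatten` helper that concatenates values in iteration order. `sort_observation` mirrors the `transform` snippet above, while `flatten`, `obs_a`, and `obs_b` are illustrative names that do not come from the PR.

```python
from collections import OrderedDict


def sort_observation(observation):
    """Return the observation with keys in sorted order.

    Mirrors the check in the snippet above: an observation that is
    already an OrderedDict is passed through unchanged.
    """
    if not isinstance(observation, OrderedDict):
        observation = OrderedDict(sorted(observation.items()))
    return observation


def flatten(observation):
    """Hypothetical flattener: concatenates values in iteration order."""
    flat = []
    for value in sort_observation(observation).values():
        flat.extend(value)
    return flat


# The same observation built with two different key insertion orders.
obs_a = {"cart": [0.1, 0.2], "action_mask": [1, 0]}
obs_b = {"action_mask": [1, 0], "cart": [0.1, 0.2]}

# Sorting makes the flattened layout independent of insertion order.
assert flatten(obs_a) == flatten(obs_b) == [1, 0, 0.1, 0.2]
```

Without the sort, the two dicts would flatten to different layouts, so a model reading fixed positions could see the mask and the other features swapped.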

AmplabJenkins commented (6 times):

Test FAILed.
Refer to these links for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9538/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9536/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9539/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9540/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9541/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9544/

ericl changed the title from "[rllib] example and docs on how to use parametric actions with pg algorithms" to "[rllib] example and docs on how to use parametric actions with DQN / PG algorithms" on Nov 22, 2018

AmplabJenkins commented (9 times):

Test FAILed.
Refer to these links for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9546/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9548/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9549/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9550/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9551/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9552/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9554/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9555/
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9556/

ericl added the "tests-ok" label (The tagger certifies test failures are unrelated and assumes personal liability.) on Nov 23, 2018

zegerhoogeboom commented Nov 23, 2018

Very sorry for my earlier incorrect comment; the issue was simply that I hadn't installed Ray from your branch. Both DQN and PPO are working beautifully with the masking. I'm not using the action embeddings, but those are at least working in the example you created.

ericl commented Nov 23, 2018

That should be fixed in the latest update -- try pulling.

AmplabJenkins commented:

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9576/

ericl commented Nov 27, 2018

Ping @richardliaw

richardliaw left a comment

tf.boolean_mask(x, tf.logical_not(tf.is_inf(x))) could be cleaner (for future ref)



ericl left a comment

is_inf won't work because inf is numerically unstable, so we use tf.float32.min instead.
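
As one illustration of how an infinite mask value can cause trouble, the sketch below uses plain Python floats rather than TensorFlow. It assumes invalid actions are masked by overwriting their logits, and it computes a softmax entropy, where `0 * -inf` yields NaN. `FLOAT32_MIN` stands in for `tf.float32.min`; the helper names are hypothetical, and this is not necessarily the exact failure seen in this PR.

```python
import math

# Most negative finite float32 value, standing in for tf.float32.min.
FLOAT32_MIN = -3.4028234663852886e38


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def entropy(logits):
    """Entropy -sum(p * log p) of the softmax distribution."""
    log_probs = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in log_probs)


def apply_mask(logits, mask, fill):
    """Replace the logits of invalid actions (mask == 0) with `fill`."""
    return [x if m else fill for x, m in zip(logits, mask)]


logits = [1.0, 2.0, 0.5]
mask = [1, 0, 1]  # action 1 is invalid

with_inf = apply_mask(logits, mask, -math.inf)
with_min = apply_mask(logits, mask, FLOAT32_MIN)

# With -inf, the masked term is 0 * -inf, which is NaN and poisons the sum.
entropy_inf = entropy(with_inf)
# With a large finite value, the masked term is 0 * finite = 0.
entropy_min = entropy(with_min)

assert math.isnan(entropy_inf)
assert math.isfinite(entropy_min)
```

In both cases the masked action gets probability zero; the difference only shows up in downstream terms that multiply by the log-probability.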

ericl merged commit f0df97d into ray-project:master on Nov 28, 2018