-
-
Notifications
You must be signed in to change notification settings - Fork 215
[ENH] V1 → V2 API Migration - Tasks #1611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ported functions to APIv1
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1611 +/- ##
==========================================
+ Coverage 52.04% 54.16% +2.12%
==========================================
Files 36 63 +27
Lines 4333 5092 +759
==========================================
+ Hits 2255 2758 +503
- Misses 2078 2334 +256 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
geetu040
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a high-level review, I noticed a few points that need adjustment:
- Caching can likely be removed from the SDK, since these concerns should be handled by the base client.
- I don't see the
api_contextbeing used intasks/functions, so it's not clear to me how the SDK is actually using the new API interface here. - Instead of moving entire methods out of
tasks/functions.py, it would be better to stick to the goal of minimal SDK changes while enabling v2 support. - API calls should be updated at the specific root functions (for example
_get_task_description,OpenMLTask._download_split). - For listing tasks, please follow the approach discussed in #1575 comment.
for more information, see https://pre-commit.ci
geetu040
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left some comments, please take a look and make sure the signature of all methods in TasksAPI, TasksV1 and TasksV2 stay same.
for more information, see https://pre-commit.ci
- removing this since it was not part of the sdk previously - some tests fail because of the timeout in stacked PRs - this option can easily be added if needed in future
|
|
||
| return pd.DataFrame.from_dict(tasks, orient="index") | ||
|
|
||
| def _get_estimation_procedure_list(self) -> builtins.list[dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geetu040 We already have implemented this function in the estimation_procedure resource, so I think we don't need it to be here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checked, once we merge your PR we can replace it with the _get_details function yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's the plan
openml/_api/resources/task.py
Outdated
| task_type: TaskType | int | None = None, | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: | ||
| raise NotImplementedError(self._not_supported(method="list")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just use self._not_supported(method="list")
no need for raise NotImplementedError()
| assert isinstance(procs, list) | ||
| assert len(procs) > 0 | ||
| assert "id" in procs[0] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to add TestTasksV2 class, and also add v1 and v2 matching tests and fallback tests as mentioned here: #1575 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont need to add TestTasksV2 given those tests are a subset of TestTasksV1. Matching tests are in the shared class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you still need TestTasksV2 and check if they raise the not implemented error
i think you should use this design then, since there are many common utilities
tests/test_api/test_versions.py
| task_type: TaskType | int | None = None, | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geetu040
Should we all use the same style or not?
for me, I have splitted this this into 3 functions for more readability and less coupling
- list()
- _build_url()
- _parse_list_xml()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need private functions to match each other in terms of nomenclature or functionality as long as we share the same common function names (which we do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same nomenclature is preferred
I would say you should do this, will make it better anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, but I don't necessarily agree. Something similar happened in another project where we were working on 2 algorithms and I suggested something similar but Franz mentioned it is not something that can consistently be maintained across the entire codebase hence does not need change. Given we have 8-9 resources it does not feel fruitful to have people share the same nomenclature for private functions across all PRs, I don't have a strong opinion on this though and I will be making the requested change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah if your function is lengthy then it's always better to break it into components, otherwise agreed this is not necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah I agree if the function is lengthy, over here (eman's suggestion) we are just removing 15 lines of code and replacing with 1, hence the hesitance. Anyways, I did change it so the pr is ready for review again :D
geetu040
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- rename
taskstotask, inTestTasksV1, `test_tasks.py just for consistency - and other comments from @EmanAbdelhaleem's review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\
| @pytest.mark.uses_test_server() | ||
| def test_get_task(self): | ||
| task = openml.tasks.get_task(1, download_data=True) # anneal; crossvalidation | ||
| assert isinstance(task, OpenMLTask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say you keep this test and only remove the lines that check for cache, basically remove all the lines after these 2
Since all tests that were testing get function used cache so all were removed and there was no unit-test left for get_task method, therefore, keep this test as said above
Metadata