Skip to content

Make pandas an optional dependency#1831

Open
zeel2104 wants to merge 1 commit intoonline-ml:mainfrom
zeel2104:remove-hard-pandas-dependency
Open

Make pandas an optional dependency#1831
zeel2104 wants to merge 1 commit intoonline-ml:mainfrom
zeel2104:remove-hard-pandas-dependency

Conversation

@zeel2104
Copy link
Copy Markdown

Summary

This PR removes pandas as a hard dependency.

Changes made:

  • move pandas from core dependencies to an optional extra
  • lazy-load pandas in batch-related code paths
  • raise a clear ImportError when learn_many, predict_many, transform_many, and related pandas-dependent operations are used without pandas
  • remove unnecessary internal pandas usage in places where it was not required
  • add tests covering behavior when pandas is not installed

Validation

Tested without pandas installed:

  • python -m pytest river\utils\test_pandas.py

Tested with pandas and scikit-learn installed:

  • python -m pytest river\compose\test_.py river\compose\test_product.py river\preprocessing\test_scale.py river\multiclass\test_ovr.py river\naive_bayes\test_naive_bayes.py river\proba\test_gaussian.py river\covariance\test_emp.py river\compat\test_sklearn.py river\anomaly\test_lof.py

Copy link
Copy Markdown
Member

@MaxHalford MaxHalford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly there! I just have a couple of questions.

Comment thread river/anomaly/svm.py
super().learn_one(x, y=1)

def learn_many(self, X):
pd = utils.pandas.import_pandas()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good pattern, I like it

def predict_proba_many(self, X):
pd = utils.pandas.import_pandas()
try:
return pd.Series(self.estimator.predict_proba(self._align_df(X)), columns=self.classes)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that fixes a bug I guess, I'm surprised it didn't get caught before

Comment thread river/utils/pandas.py
Comment on lines +22 to +28
def requires_pandas(method):
@functools.wraps(method)
def wrapper(*args, **kwargs):
import_pandas()
return method(*args, **kwargs)

return wrapper
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually used?

Comment thread pyproject.toml
]

[project.optional-dependencies]
pandas = [
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some instructions to the installation docs to explain that mini-batch is an opt-in feature and requires pandas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants