7 changes: 7 additions & 0 deletions docs/survey/README.md
@@ -0,0 +1,7 @@
This survey compares the capabilities of SystemDS with those of other popular languages and libraries. It is divided
into three components, one for each phase of the ML lifecycle: source, build, and deploy. Within each phase, the material is
further grouped by the DSLs that SystemDS supports: DML and Python.

# References
1. Boehm, Matthias, Arun Kumar, and Jun Yang. "Data management in machine learning systems." Synthesis Lectures on Data
Management 11.1 (2019): 1-173.
9 changes: 9 additions & 0 deletions docs/survey/build/README.md
@@ -0,0 +1,9 @@
# Overview

The build phase of the ML lifecycle includes the following tasks [1]:
1. Model training
2. Model tuning and analysis
3. Model validation

# References
1. Hapke, Hannes, and Catherine Nelson. Building machine learning pipelines. O'Reilly Media, 2020.
9 changes: 9 additions & 0 deletions docs/survey/deploy/README.md
@@ -0,0 +1,9 @@
# Overview

The deploy phase of the ML lifecycle includes the following tasks [1, 2]:
1. Model deployment
2. Monitoring

# References
1. Hapke, Hannes, and Catherine Nelson. Building machine learning pipelines. O'Reilly Media, 2020.
2. https://www.tensorflow.org/tfx
9 changes: 9 additions & 0 deletions docs/survey/source/README.md
@@ -0,0 +1,9 @@
# Overview

The source phase of the ML lifecycle includes the following tasks [1]:
1. Data ingestion
2. Data validation
3. Data preprocessing

# References
1. Hapke, Hannes, and Catherine Nelson. Building machine learning pipelines. O'Reilly Media, 2020.
6 changes: 6 additions & 0 deletions docs/survey/source/dml/R.md
@@ -0,0 +1,6 @@
# Data ingestion
# Data validation
# Data preprocessing

# References
1. https://rpubs.com/prtk/900512
6 changes: 6 additions & 0 deletions docs/survey/source/dml/pandas.md
@@ -0,0 +1,6 @@
# Data ingestion
# Data validation
# Data preprocessing

# References
1. https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html
6 changes: 6 additions & 0 deletions docs/survey/source/dml/sklearn.md
@@ -0,0 +1,6 @@
# Data ingestion
# Data validation
# Data preprocessing

# References
1. https://scikit-learn.org/stable/modules/classes.html