diff --git a/docs/survey/README.md b/docs/survey/README.md
new file mode 100644
index 00000000000..8ea5b7f3bf7
--- /dev/null
+++ b/docs/survey/README.md
@@ -0,0 +1,7 @@
+This is a survey of the capabilities of SystemDS compared to other popular languages and libraries. The survey has been
+broken down into 3 components based on the different phases of the ML lifecycle: source, build, and deploy. A further
+grouping is based on the DSLs supported by SystemDS: DML and Python.
+
+# References
+1. Boehm, Matthias, Arun Kumar, and Jun Yang. "Data management in machine learning systems." Synthesis Lectures on Data
+Management 11.1 (2019): 1-173.
diff --git a/docs/survey/build/README.md b/docs/survey/build/README.md
new file mode 100644
index 00000000000..c6196d2ee5f
--- /dev/null
+++ b/docs/survey/build/README.md
@@ -0,0 +1,9 @@
+# Overview
+
+The build phase of the ML lifecycle includes the following tasks [1]:
+1. Model training
+2. Model tuning and analysis
+3. Model validation
+
+# References
+1. Hapke, Hannes, and Catherine Nelson. Building machine learning pipelines. O'Reilly Media, 2020.
diff --git a/docs/survey/deploy/README.md b/docs/survey/deploy/README.md
new file mode 100644
index 00000000000..90e3b26435d
--- /dev/null
+++ b/docs/survey/deploy/README.md
@@ -0,0 +1,9 @@
+# Overview
+
+The deploy phase of the ML lifecycle includes the following tasks [1, 2]:
+1. Model deployment
+2. Monitoring
+
+# References
+1. Hapke, Hannes, and Catherine Nelson. Building machine learning pipelines. O'Reilly Media, 2020.
+2. https://www.tensorflow.org/tfx
diff --git a/docs/survey/source/README.md b/docs/survey/source/README.md
new file mode 100644
index 00000000000..874d3ea8738
--- /dev/null
+++ b/docs/survey/source/README.md
@@ -0,0 +1,9 @@
+# Overview
+
+The source phase of the ML lifecycle includes the following tasks [1]:
+1. Data ingestion
+2. Data validation
+3. Data preprocessing
+
+# References
+1. Hapke, Hannes, and Catherine Nelson. Building machine learning pipelines. O'Reilly Media, 2020.
diff --git a/docs/survey/source/dml/R.md b/docs/survey/source/dml/R.md
new file mode 100644
index 00000000000..9a43edd1af8
--- /dev/null
+++ b/docs/survey/source/dml/R.md
@@ -0,0 +1,6 @@
+# Data ingestion
+# Data validation
+# Data preprocessing
+
+# References
+1. https://rpubs.com/prtk/900512
diff --git a/docs/survey/source/dml/pandas.md b/docs/survey/source/dml/pandas.md
new file mode 100644
index 00000000000..9e57fcccdca
--- /dev/null
+++ b/docs/survey/source/dml/pandas.md
@@ -0,0 +1,6 @@
+# Data ingestion
+# Data validation
+# Data preprocessing
+
+# References
+1. https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html
diff --git a/docs/survey/source/dml/sklearn.md b/docs/survey/source/dml/sklearn.md
new file mode 100644
index 00000000000..4911a3ce32a
--- /dev/null
+++ b/docs/survey/source/dml/sklearn.md
@@ -0,0 +1,6 @@
+# Data ingestion
+# Data validation
+# Data preprocessing
+
+# References
+1. https://scikit-learn.org/stable/modules/classes.html