The kafka-spark-notebooks folder contains 4 different notebooks:
- `ks-connection-simple-err.ipynb` (scala): connects to the `errors-simple` topic and continuously saves the content of the topic in parquet format
- `ks-connection-simple-war.ipynb` (scala): connects to the `warnings-simple` topic and continuously saves the content of the topic in parquet format
- `data-preparation.ipynb` (scala): joins the errors and warnings streams and performs simple data preparation tasks
- `pivot.ipynb` (scala): pivots the result of the stream-stream join
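
The notebooks themselves are written in Scala; as a rough illustration of the "read a topic, keep saving it as parquet" pattern, here is a minimal PySpark-style sketch. The broker address, the `startingOffsets` setting, and the function names are assumptions made for this example, not taken from the notebooks, and `spark` is expected to be an existing SparkSession with the Kafka connector package available.

```python
def kafka_source_options(topic, bootstrap_servers="localhost:9092"):
    """Options for Spark's Kafka source (broker address is an assumed default)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",  # assumption: read the topic from the start
    }


def start_parquet_sink(spark, topic, output_dir, checkpoint_dir):
    """Start a streaming query that appends the topic's messages to parquet files."""
    stream = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(topic))
        .load()
        # Kafka delivers values as bytes; keep them as text plus the record timestamp.
        .selectExpr("CAST(value AS STRING) AS value", "timestamp")
    )
    return (
        stream.writeStream
        .format("parquet")
        .option("path", output_dir)
        # The file sink needs a checkpoint location to track progress.
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A call such as `start_parquet_sink(spark, "errors-simple", "/tmp/errors", "/tmp/errors-ckpt")` (hypothetical paths) would return a running streaming query that can later be stopped with `.stop()`.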
The pm-spark-notebooks folder contains 3 different notebooks:
- `Notebook_1_DataCleansing_FeatureEngineering.ipynb` (python): data exploration, data cleansing and feature engineering
- `Notebook_2_FeatureEngineering_RollingCompute.ipynb` (python): computation of rolling features on different time periods
- `Notebook_3_Labeling_FeatureSelection_Modeling.ipynb` (python): machine learning operations
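
To give an idea of what "rolling features on different time periods" means, the following is a small pure-Python sketch that computes a rolling mean over several window sizes. It is not the code used in the notebooks; the window sizes and the feature naming scheme are assumptions chosen for illustration.

```python
from collections import deque


def rolling_mean(values, window):
    """Return, for each position, the mean of the last `window` values seen so far."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)
    running_sum = 0.0
    means = []
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # this value is evicted by the next append
        recent.append(value)
        running_sum += value
        means.append(running_sum / len(recent))
    return means


def rolling_features(values, windows=(3, 6)):
    """Compute one rolling-mean feature per window size (sizes are illustrative)."""
    return {f"rolling_mean_{w}": rolling_mean(values, w) for w in windows}
```

The first `window - 1` positions use the shorter prefix that is available, so every input value gets a feature value.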
The data used in the notebooks can be found here
The folder contains the avro schema for the warnings and errors messages.
The folder contains the producer scripts:
- `simple-producer.py` (python): sends messages to simple kafka topics
- `avro-producer.py` (python): sends messages to avro kafka topics
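
For orientation, a producer for a simple (non-Avro) topic could look roughly like the sketch below. It assumes the `kafka-python` package and a broker at `localhost:9092`; the message fields (`level`, `message`, `timestamp`) and both function names are hypothetical and may differ from what `simple-producer.py` actually sends.

```python
import json
import time


def build_message(level, text, timestamp=None):
    """Serialize one message as UTF-8 JSON bytes (field names are illustrative)."""
    payload = {
        "level": level,
        "message": text,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    return json.dumps(payload).encode("utf-8")


def send_messages(topic, messages, bootstrap_servers="localhost:9092"):
    """Send already-serialized messages to `topic` using kafka-python."""
    from kafka import KafkaProducer  # imported lazily: requires kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for message in messages:
        producer.send(topic, value=message)
    producer.flush()  # block until buffered messages have been sent
    producer.close()
```

For example, `send_messages("errors-simple", [build_message("error", "disk full")])` would publish a single JSON message, provided a broker is reachable at the assumed address.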