Water Quality Testing Data Analysis

Statistical analysis and predictive modeling of water quality parameters using Python.

Overview

This project analyzes a dataset of 500 water samples across five quality parameters to explore relationships between water quality indicators and build predictive models for conductivity.

Key findings:

  • Strong positive correlation (r = 0.705) between pH and dissolved oxygen levels
  • Multi-parameter linear regression model predicts conductivity from pH, temperature, turbidity, and dissolved oxygen
  • OLS regression confirms statistically significant relationships between several parameter pairs (p < 0.05)
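For readers unfamiliar with the r value quoted above, the following is a minimal sketch of how a Pearson correlation coefficient is computed. It uses only the standard library; the function name `pearson_r` and the sample values are illustrative assumptions, not taken from the notebook or the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, NOT taken from the dataset.
ph = [6.9, 7.0, 7.1, 7.2, 7.3]
dissolved_oxygen = [6.5, 7.4, 7.1, 8.6, 9.0]
r = pearson_r(ph, dissolved_oxygen)
print(f"r = {r:.3f}")
```

A value near +1 indicates that the two parameters rise together; a value near 0 indicates no linear relationship.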

Dataset

The dataset (data/water_quality_testing.csv) contains 500 samples with the following parameters:

Parameter          Unit       Range
pH                 pH units   6.83 - 7.48
Temperature        °C         20.3 - 23.6
Turbidity          NTU        3.1 - 5.1
Dissolved Oxygen   mg/L       6.0 - 9.9
Conductivity       µS/cm      316 - 370
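As a quick sanity check, the documented ranges can be compared against the rows of the CSV. The sketch below is one possible way to do that with the standard library. The column names in `DOCUMENTED_RANGES` are assumptions (the actual CSV headers may differ), and the two sample rows are invented so that the example runs without the data file.

```python
import csv
import io

# Ranges from the table above. The column names are ASSUMED; adjust them
# to match the real headers in data/water_quality_testing.csv.
DOCUMENTED_RANGES = {
    "pH": (6.83, 7.48),
    "Temperature": (20.3, 23.6),
    "Turbidity": (3.1, 5.1),
    "Dissolved Oxygen": (6.0, 9.9),
    "Conductivity": (316.0, 370.0),
}

def out_of_range_rows(rows, ranges=DOCUMENTED_RANGES):
    """Return (row_index, column, value) for every value outside its range."""
    problems = []
    for i, row in enumerate(rows):
        for column, (low, high) in ranges.items():
            value = float(row[column])
            if not low <= value <= high:
                problems.append((i, column, value))
    return problems

# Tiny in-memory stand-in for the CSV (values invented for illustration).
sample_csv = io.StringIO(
    "pH,Temperature,Turbidity,Dissolved Oxygen,Conductivity\n"
    "7.10,21.5,4.0,8.0,340\n"
    "7.60,22.0,4.2,7.5,350\n"
)
rows = list(csv.DictReader(sample_csv))
print(out_of_range_rows(rows))  # the second row's pH (7.6) is flagged
```

To run the check against the real file, replace `sample_csv` with `open("data/water_quality_testing.csv", newline="")`, after confirming the header names.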

Project Structure

water-quality-testing-data-analysis/
├── data/
│   └── water_quality_testing.csv       # Water quality dataset (500 samples)
├── notebooks/
│   └── water_quality_analysis.ipynb    # Main analysis notebook
├── .gitignore
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
└── requirements.txt

Getting Started

Prerequisites

  • Python 3.8+
  • pip

Installation

git clone https://github.com/ChanMeng666/water-quality-testing-data-analysis.git
cd water-quality-testing-data-analysis
pip install -r requirements.txt

Usage

jupyter notebook notebooks/water_quality_analysis.ipynb

Run all cells (Kernel > Restart & Run All) to reproduce the full analysis.

Analysis Contents

The notebook covers the following topics:

  1. Data Loading and Inspection - Load dataset, examine structure and summary statistics
  2. Distribution Analysis - Histograms with KDE for all parameters
  3. Correlation Analysis - Correlation matrix heatmap and pair plots
  4. pH vs Dissolved Oxygen - Deep dive into the strongest correlation
  5. Parameter Relationships - Regression plots for multiple parameter pairs
  6. Predictive Modeling - Linear regression for conductivity prediction (two-feature and multi-parameter models)
  7. Statistical Modeling (OLS) - Ordinary least squares regression with statsmodels for statistical inference
  8. Conclusions - Summary of key findings
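The multi-parameter model in step 6 can be illustrated with a short, self-contained sketch. This is not the notebook's code: it fits ordinary least squares with `numpy.linalg.lstsq` on synthetic data, and the helper names (`fit_linear_model`, `r_squared`), the coefficients, and the noise level are all assumptions chosen for demonstration.

```python
import numpy as np

def fit_linear_model(features, target):
    """Fit target ≈ intercept + features @ coefs by ordinary least squares."""
    # Prepend a column of ones so the first solved weight is the intercept
    X = np.column_stack([np.ones(len(features)), features])
    solution, *_ = np.linalg.lstsq(X, target, rcond=None)
    return solution[0], solution[1:]

def r_squared(features, target, intercept, coefs):
    """Coefficient of determination of the fitted model on the given data."""
    predictions = intercept + features @ coefs
    ss_res = np.sum((target - predictions) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data drawn within the documented parameter ranges.
rng = np.random.default_rng(0)
n = 200
features = np.column_stack([
    rng.uniform(6.83, 7.48, n),  # pH
    rng.uniform(20.3, 23.6, n),  # temperature (°C)
    rng.uniform(3.1, 5.1, n),    # turbidity (NTU)
    rng.uniform(6.0, 9.9, n),    # dissolved oxygen (mg/L)
])
# Arbitrary "true" relationship, invented for this example only.
true_coefs = np.array([10.0, 2.0, -3.0, 1.5])
target = 200.0 + features @ true_coefs + rng.normal(0.0, 0.1, n)

intercept, coefs = fit_linear_model(features, target)
r2 = r_squared(features, target, intercept, coefs)
print("coefficients:", np.round(coefs, 2), "R^2:", round(float(r2), 4))
```

Because the synthetic target is built from known coefficients, the fit should recover values close to them. On the real dataset the coefficients and fit quality will differ, and the statsmodels OLS summary in step 7 adds standard errors and p-values that this sketch does not compute.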

Built With

License

This project is licensed under the MIT License. See LICENSE for details.

Author

Chan Meng - GitHub · LinkedIn · Website
