A command-line data analysis tool written in C++ that processes CSV files and reports summary statistics, frequency distributions, and correlations.
- Dataset Overview: Display basic information about your data
- Statistical Analysis: Calculate mean, median, standard deviation, min/max for numeric columns
- Categorical Analysis: Frequency distribution and percentage breakdowns
- Correlation Analysis: Find relationships between numeric variables
- Data Preview: View first N rows of your dataset
- Search Functionality: Find specific values across all columns
- Data Filtering: Export subsets based on conditions
- Formatted Display: Clean, tabular output for easy reading
- Export Capability: Save filtered results to new CSV files
- Interactive Menu: User-friendly command-line interface
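As an illustration of the categorical analysis feature listed above, here is a minimal sketch of a frequency table with percentage breakdowns. The names `CategoryShare` and `frequencyTable` are hypothetical and are not taken from the project's source; the actual implementation may be organized differently.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type: how often a category occurs and its share of all rows.
struct CategoryShare {
    std::size_t count = 0;
    double percent = 0.0;
};

// Counts each distinct value in a column, then converts the counts to percentages.
// std::map keeps the categories sorted alphabetically for stable output.
std::map<std::string, CategoryShare> frequencyTable(const std::vector<std::string>& values) {
    std::map<std::string, CategoryShare> table;
    for (const std::string& value : values) {
        ++table[value].count;  // operator[] creates a zeroed entry on first sight
    }
    for (auto& entry : table) {
        // values is non-empty whenever this loop body runs, so the division is safe.
        entry.second.percent = 100.0 * static_cast<double>(entry.second.count)
                               / static_cast<double>(values.size());
    }
    return table;
}
```

An empty column simply yields an empty table, so callers do not need a special case for it.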
# Compile using the Makefile
make
# Or compile manually
g++ -std=c++17 -Wall -Wextra -O2 -o data_analyzer main.cpp
Run the program:
./data_analyzer
Enter your CSV filename when prompted
Use the interactive menu to explore your data:
- View dataset information
- Analyze specific columns
- Find correlations
- Search for specific values
- Export filtered data
The project includes sample_data.csv with employee information to test the analyzer. You can use your own CSV files as well.
=== ANALYSIS FOR: Salary ===
Count: 10
Mean: 56800.00
Median: 56500.00
Min: 45000.00
Max: 70000.00
Standard Deviation: 8234.56
Sum: 568000.00
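The figures in a column report like the one above could be computed along the following lines. This is a sketch only: `ColumnStats` and `summarize` are hypothetical names, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the output does not say which variant the tool reports.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical summary type; the field names are illustrative only.
struct ColumnStats {
    std::size_t count;
    double sum;
    double mean;
    double median;
    double min;
    double max;
    double stddev;
};

// Computes summary statistics for one numeric column.
// The vector is taken by value because the median needs a sorted copy.
ColumnStats summarize(std::vector<double> values) {
    if (values.empty()) {
        throw std::invalid_argument("column has no numeric values");
    }
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    const double sum = std::accumulate(values.begin(), values.end(), 0.0);
    const double mean = sum / static_cast<double>(n);
    // Middle element for odd n, average of the two middle elements for even n.
    const double median = (n % 2 == 1)
        ? values[n / 2]
        : (values[n / 2 - 1] + values[n / 2]) / 2.0;
    double squaredDeviations = 0.0;
    for (double v : values) {
        squaredDeviations += (v - mean) * (v - mean);
    }
    // Assumption: sample standard deviation (n - 1); defined as 0 for a single value.
    const double stddev = (n > 1)
        ? std::sqrt(squaredDeviations / static_cast<double>(n - 1))
        : 0.0;
    return {n, sum, mean, median, values.front(), values.back(), stddev};
}
```

Sorting once gives the median, minimum, and maximum together, at the cost of copying the column.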
=== CORRELATION ANALYSIS ===
Age <-> Salary: 0.743
Age <-> Years_Experience: 0.856
Salary <-> Performance_Score: 0.621
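Pairwise values like these are typically Pearson correlation coefficients. The sketch below shows one way to compute that coefficient for two equally long numeric columns. The function name `pearson` is hypothetical, and returning 0 when a column is constant is an assumption about how the degenerate case might be handled, not a description of the tool's behavior.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Pearson correlation coefficient between two columns of equal length.
// The result lies in [-1, 1]; values near 0 indicate no linear relationship.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2) {
        throw std::invalid_argument("columns must have equal length of at least 2");
    }
    const std::size_t n = x.size();
    double meanX = 0.0;
    double meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    double sxy = 0.0;  // sum of products of deviations
    double sxx = 0.0;  // sum of squared deviations of x
    double syy = 0.0;  // sum of squared deviations of y
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    // Assumption: a constant column has no defined correlation; report 0 here.
    if (sxx == 0.0 || syy == 0.0) {
        return 0.0;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

Computing the means first and then the deviations takes two passes over the data but avoids the precision loss of the one-pass sum-of-products formula.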
- Object-Oriented Programming: Clean class design with encapsulation
- File I/O: CSV parsing and data export
- STL Usage: Vectors, maps, algorithms, and iterators
- Mathematical Computing: Statistical calculations and correlations
- Memory Management: Efficient data structures
- Error Handling: Robust input validation
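To make the CSV parsing item concrete, here is a minimal sketch of splitting one CSV line into fields, including double-quoted fields that contain commas or escaped quotes. The function name `splitCsvLine` is hypothetical, and the quoting rules follow common CSV conventions rather than anything confirmed about this project's parser. Fields that span multiple lines are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a single CSV line into fields.
// Commas inside double quotes do not split, and "" inside quotes yields one ".
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';  // escaped quote inside a quoted field
                    ++i;             // skip the second quote of the pair
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                current += c;
            }
        } else if (c == '"') {
            inQuotes = true;  // opening quote
        } else if (c == ',') {
            fields.push_back(current);  // field boundary
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // last field (possibly empty)
    return fields;
}
```

For example, `splitCsvLine("a,,b")` yields three fields with an empty string in the middle, which lets later stages treat missing values explicitly.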
- Business data analysis
- Research data processing
- Financial data insights
- Performance metrics analysis
- Quality assurance reporting
- Data structure selection for performance
- Algorithm implementation for statistics
- User interface design for CLI applications
- Code organization and maintainability
- Testing with real datasets
Next Steps to Make It Even More Impactful:
- Add data visualization (ASCII charts)
- Implement machine learning algorithms (linear regression)
- Support for JSON and other data formats
- Multi-threading for large datasets
- Database connectivity
- Web interface using C++ web frameworks
dhrumil246 - Created on 2025-05-24
This project demonstrates practical C++ skills while solving real data analysis problems.