DataAnalysisToolkit is a Python-based data analysis tool designed to streamline various data analysis tasks. It provides the ability to load data from CSV files, perform statistical calculations, detect outliers, clean data, and visualize data.
DataAnalysisToolkit is a Python package that bundles common data analysis steps behind a single class: loading CSV data, computing summary statistics, cleaning data, and visualizing results. It is aimed at data analysts, data scientists, and anyone getting started with data exploration or preparing data for machine learning.
The toolkit is intended for preliminary data analysis and can also be used as one step in a larger data processing workflow.
Here's how you can get started with DataAnalysisToolkit:
from data_analysis_toolkit import DataAnalysisToolkit
# Initialize the analyzer with the path to a CSV file
analyzer = DataAnalysisToolkit('../data/test.csv')
# Calculate the mean, median, mode, and trimmed mean of a column
statistics = analyzer.calculate_statistics('column_name')
print(statistics)
# Detect outliers in a column using the z-score method
outliers = analyzer.detect_outliers('column_name')
print(outliers)
# Handle missing values in a column
analyzer.handle_missing_values('column_name', strategy='fill', fill_value=0)
# Drop duplicate rows in the DataFrame
analyzer.drop_duplicates()
# Encode categorical features in the DataFrame
analyzer.encode_categorical_features()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = analyzer.split_data('target_column')
# Plot a histogram of a column
analyzer.plot_data('column_name')
# Export the data to a CSV file
analyzer.export_data('new_file.csv')
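To make the statistics and outlier steps above more concrete, here is a minimal standard-library sketch of what a trimmed mean and z-score outlier detection compute. It is not the toolkit's implementation: the function names (`trimmed_mean`, `summarize`, `zscore_outliers`), the 10% trim proportion, the z-score threshold, and the use of the population standard deviation are all assumptions chosen for illustration.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the lowest and highest `proportion` of values.

    The 10% default is an assumption for this sketch.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(trimmed)


def summarize(values):
    """Return mean, median, mode, and trimmed mean for a list of numbers."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "trimmed_mean": trimmed_mean(values),
    }


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    Uses the population standard deviation; this is an assumption,
    a sample standard deviation would be an equally valid choice.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs((v - mean) / std) > threshold]


# Hypothetical column of values with one obvious outlier
data = [10, 12, 12, 13, 12, 11, 14, 13, 12, 11, 95]
print(summarize(data))
print(zscore_outliers(data, threshold=2.0))
```

Note how the single extreme value pulls the plain mean well above the median, while the trimmed mean stays close to the bulk of the data. This is why a trimmed mean is often reported alongside the mean when outliers are suspected.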
Install DataAnalysisToolkit using pip:
pip install dataanalysistoolkit
For detailed documentation, examples, and usage guides, please visit DataAnalysisToolkit Documentation.
Contributions are welcome! For guidelines on how to contribute, please refer to our Contribution Guide.
DataAnalysisToolkit is open-sourced under the MIT License. For more details, see the LICENSE file.
Developed with ❤ by the DataAnalysisToolkit Team.