Data-science

Collection of useful data science topics along with articles, videos, and code

The Data Scientist's Toolkit: 100+ Essential Tools for Modern Analytics

To receive a condensed overview of these tools and additional resources, sign up for CodeCut's free PDF guide. This comprehensive 264-page document covers over 100 essential data science tools, providing you with a valuable reference for your work.

How to Download the Code in This Repository to Your Local Machine

To download the code in this repo, use git clone:

git clone https://github.com/khuyentran1401/Data-science

Contents

  1. MLOps
  2. Data Management Tools
  3. Testing
  4. Productive Tools
  5. Python Helper Tools
  6. Tools for Deployment
  7. Speed-up Tools
  8. Math Tools
  9. Machine Learning
  10. Natural Language Processing
  11. Computer Vision
  12. Time Series
  13. Feature Engineering
  14. Visualization
  15. Mathematical Programming
  16. Scraping
  17. Python
  18. Logging and Debugging
  19. Linear Algebra
  20. Data Structure
  21. Statistics
  22. Web Applications
  23. Share Insights
  24. Cool Tools
  25. Learning Tips
  26. Productive Tips
  27. VSCode
  28. Book Review
  29. Data Science Portfolio

MLOps

Title Article Repository Video
Stop Hard Coding in a Data Science Project: Use Configuration Files Instead
Poetry: A Better Way to Manage Python Dependencies
Git for Data Scientists: Learn Git through Practical Examples
Introduction to Weights & Biases: Track and Visualize your Machine Learning Experiments in 3 Lines of Code
Kedro: A Python Framework for Reproducible Data Science Project
Orchestrate a Data Science Project in Python With Prefect
Orchestrate Your Data Science Project with Prefect 2.0
DagsHub: a GitHub Supplement for Data Scientists and ML Engineers
4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python
BentoML: Create an ML Powered Prediction Service in Minutes
How to Structure a Data Science Project for Maintainability (with DVC)
How to Structure an ML Project for Reproducibility and Maintainability (with Prefect)
GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model
Create Robust Data Pipelines with Prefect, Docker, and GitHub
Create a Maintainable Data Pipeline with Prefect and DVC
Build a Full-Stack ML Application With Pydantic And Prefect
Streamline Code Updates with DVC and GitHub Actions
Create Observable and Reproducible Notebooks with Hex
Build Reliable Machine Learning Pipelines with Continuous Integration
Automate Machine Learning Deployment with GitHub Actions
How to Build a Fully Automated Data Drift Detection Pipeline

Data Management Tools

Title Article Repository Video
Introduction to DVC: Data Version Control Tool for Machine Learning Projects
Great Expectations: Always Know What to Expect From Your Data
Validate Your pandas DataFrame with Pandera
Introduction to Schema: A Python Library to Validate your Data
How to Create Fake Data with Faker
Hypothesis and Pandera: Generate Synthetic Pandas DataFrame for Testing
What is dbt (data build tool) and When should you use it?
Streamline dbt Model Development with Notebook-Style Workspace

Testing

Title Article Repository Video
Pytest for Data Scientists
4 Lesser-Known Yet Awesome Tips for Pytest
DeepDiff: Recursively Find and Ignore Trivial Differences Using Python
Checklist: Behavioral Testing of NLP Models
Detect Defects in a Data Pipeline Early with Validation and Notifications
Write Readable Tests for Your Machine Learning Models with Behave

Productive Tools

Title Article Repository
3 Tools to Track and Visualize the Execution of your Python Code
2 Tools to Automatically Reload when Python Files Change
3 Ways to Get Notified with Python
How to Create Reusable Command-Line
How to Strip Outputs and Execute Interactive Code in a Python Script
Sending Slack Notifications in Python with Prefect

Python Helper Tools

Title Article Repository Video
Pydash: A Kitchen Sink of Missing Python Utilities
Write Clean Python Code Using Pipes
Introducing FugueSQL: SQL for Pandas, Spark, and Dask DataFrames
Fugue and DuckDB: Fast SQL Code in Python
Simplify Data Science Workflows on BigQuery with Fugue and Python

Tools for Deployment

Title Article Repository
How to Effortlessly Publish your Python Package to PyPI Using Poetry
Typer: Build Powerful CLIs in One Line of Code using Python

Speed-up Tools

Title Article Repository
Cython: A Speed-Up Tool for your Python Function
Train your Machine Learning Model 150x Faster with cuML

Math Tools

Title Article Repository
SymPy: Symbolic Computation in Python

Machine Learning

Title Article Repository Video
How to Monitor And Log your Machine Learning Experiment Remotely with HyperDash
How to Efficiently Fine-Tune your Machine Learning Models
How to Learn Non-linear Dataset with Support Vector Machines
Introduction to IBM Federated Learning: A Collaborative Approach to Train ML Models on Private Data
3 Steps to Improve your Efficiency when Hypertuning ML Models
human-learn: Create a Human Learning Model by Drawing
Patsy: Build Powerful Features with Arbitrary Python Code
SHAP: Explain Any Machine Learning Model in Python
Predict Movie Ratings with User-Based Collaborative Filtering
River: Online Machine Learning in Python
Human-Learn: Rule-Based Learning as an Alternative to Machine Learning

Natural Language Processing

Title Article Repository Video
Sentiment Analysis of LinkedIn Messages
Find Common Words in Article with Python Module Newspaper and NLTK
How to Tokenize Tweets with Python
How to Solve Analogies with Word2Vec
What is PyTorch?
Convolutional Neural Network in Natural Language Processing
Supercharge your Python String with TextBlob
pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know
Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge
Build a Robust Conversational Assistant with Rasa
I Analyzed 2k Data Scientist and Data Engineer Jobs and This is What I Found
Checklist: Behavioral Testing of NLP Models
PRegEx: Write Human-Readable Regular Expressions in Python
Texthero: Text Preprocessing, Representation, and Visualization for a pandas DataFrame

Computer Vision

Title Article Repository
How to Create an App to Classify Dogs Using fastai and Streamlit

Time Series

Title Article Repository
Kats: a Generalizable Framework to Analyze Time Series Data in Python
How to Detect Seasonality, Outliers, and Changepoints in Your Time Series
4 Tools to Automatically Extract Data from Datetime in Python

Feature Engineering

Title Article Repository Video
3 Ways to Extract Features from Dates with Python
Similarity Encoding for Dirty Categories Using dirty_cat
Snorkel: A Human-In-The-Loop Platform to Build Training Data

Visualization

Title Article Repository Video
How to Embed Interactive Charts on your Articles and Personal Website
What I Learned from Scraping 15k Data Science Articles on Medium
How to Create Interactive Plots with Altair
How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool
I Scraped more than 1k Top Machine Learning Github Profiles and this is what I Found
Top 6 Python Libraries for Visualization: Which one to Use?
Introduction to Yellowbrick: A Python Library to Visualize the Prediction of your Machine Learning Model
Visualize Gender-Specific Tweets with Scattertext
Visualize Your Team's Projects Using Python Gantt Chart
How to Create Bindings and Conditions Between Multiple Plots Using Altair
How to Sketch your Data Science Ideas With Excalidraw
Pyvis: Visualize Interactive Network Graphs in Python
Build and Analyze Knowledge Graphs with Diffbot
Observe The Friend Paradox in Facebook Data Using Python
What skills and backgrounds do data scientists have in common?
Visualize Similarities Between Companies With Graph Database
Visualize GitHub Social Network with PyGraphistry
Find the Top Bootcamps for Data Professionals From Over 5k Profiles
floWeaver: Turn Flow Data Into a Sankey Diagram In Python
atoti: Build a BI Platform in Python
Analyze and Visualize URLs with Network Graph
statannotations: Add Statistical Significance Annotations on Seaborn Plots

Mathematical Programming

Title Article Repository
How to choose stocks to invest in with Python
Maximize your Productivity with Python
How to Find a Good Match with Python
How to Solve a Staff Scheduling Problem with Python
How to Find Best Locations for your Restaurants with Python
How to Schedule Flights in Python
How to Solve a Production Planning and Inventory Problem in Python

Scraping

Title Article Repository
Web Scrape Movie Database with Beautiful Soup
top-github-scraper: Scrape Top Github Users and Repositories Based On a Keyword in One Line of Code

Python

Title Article Repository Video
6 Common Mistakes to Avoid in Data Science Code
5 Steps to Transform Messy Functions into Production-Ready Code
Numpy Tricks for your Data Science Projects
Timing for Efficient Python Code
How to Use Lambda for Efficient Python Code
Python Tricks for Keeping Track of Your Data
Boost Your Efficiency With Specialized Dictionary Implementations in Python
Dictionary as an Alternative to If-Else
How to Use Zip to Manipulate a List of Tuples
Get the Most out of Your Array With These Four Numpy Methods
3 Python Tricks to Read, Create, and Run Multiple Files Automatically
How to Exclude the Outliers in Pandas DataFrame
Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable
3 Techniques to Effortlessly Import and Execute Python Modules
Simplify Your Functions with Functools Partial and Singledispatch

Logging and Debugging

Title Article Repository Video
How to Create and View Interactive Cheatsheets on the Command-line
Understand CSV Files from your Terminal with XSV
Prettify your Terminal Text With Termcolor and Pyfiglet
Loguru: Simple as Print, Flexible as Logging
Stop Using Print to Debug in Python. Use Icecream Instead
Rich: Generate Rich and Beautiful Text in the Terminal with Python
Create a Beautiful Dashboard in your Terminal with Wtfutil
3 Tools to Monitor and Optimize your Linux System
Ptpython: A Better Python REPL
fd: a Simple but Powerful Tool to Find and Execute Files on the Command Line
Speed Up your Command-Line Navigation with These 3 Tools
Python and Data Science Snippets on the Command Line

Statistics

Title Article Repository
Can Datasets of a Dinosaur and a Circle have Identical Statistics?
Introduction to One-Way ANOVA: A Test to Compare the Means between More than Two Groups
Bayes' Theorem, Clearly Explained with Visualization
Detect Change Points with Bayesian Inference and PyMC3
Bayesian Linear Regression with Bambi
Earn More Salary as a Coder: Higher Degree or More Years of Experience?

Linear Algebra

Title Article Repository
How to Build a Matrix Module from Scratch
Linear Algebra for Machine Learning: Solve a System of Linear Equations

Data Structure

Title Article Repository
Convex Hull: An Innovative Approach to Gift-Wrap your Data
How to Visualize Social Network With Graph Theory
How to Search Data with KDTree
How to Find the Nearest Hospital with a Voronoi Diagram

Web Applications

Title Article Repository
How to Create an Interactive Startup Growth Calculator with Python
Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge
PyWebIO: Write Interactive Web App in Script Way Using Python
PyWebIO 1.3.0: Add Tabs, Pin Input, and Update an Input Based on Another Input
Create an App to Deal with Boredom Using PyWebIO
Build a Robust Workflow to Visualize Trending GitHub Repositories in Python

Share Insights

Title Article Repository
Introduction to Datapane: A Python Library to Build Interactive Reports
Datapane's New Features: Create a Beautiful Dashboard in Python in a Few Lines of Code
Introduction to Datasette: Explore and Publish Your Data in One Line of Code
How to Share your Python Objects Across Different Environments in One Line of Code
How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok
Introduction to Deepnote: Real-time Collaboration on Jupyter Notebook

Cool Tools

Title Article Repository
Simulate Real-life Events in Python Using SimPy
How to Create Mathematical Animations like 3Blue1Brown Using Python

Learning Tips

Title Article Repository
How to Learn Data Science when Life does not Give You a Break
How to Accelerate your Data Science Career by Putting yourself in the Right Environment
To become a Better Data Scientist, you need to Think like a Programmer
How not to be Overwhelmed with Data Science

Productive Tips

Title Article Repository
How to Organize your Data Science Articles with Github
5 Reasons why you should Switch from Jupyter Notebook to Scripts
7 Reasons Why you Should Start Documenting your Code

VSCode

Title Article Repository
How to Leverage Visual Studio Code for your Data Science Projects
Top 4 Code Viewers for Data Scientists in VSCode
Incorporate the Best Practices for Python with These Top 4 VSCode Extensions
Boost Your Efficiency with Customized Code Snippets on VSCode
Top 9 Keyboard Shortcuts in VSCode for Data Scientists

Book Review

Title Article Repository
Python Machine Learning: A Comprehensive Handbook for Machine Learning

Data Science Portfolio

Title Article Repository
How to Create an Elegant Website for your Data Science Portfolio in 10 minutes
Build an Impressive Github Profile in 3 Steps