A Kaggle Downloader Python Package
KaggleDownloader is a Python class designed to interact with Kaggle, enabling users to authenticate, search, download, and extract datasets. The class can be used either interactively in Jupyter Notebooks or from the command line.
Before using the KaggleDownloader class, make sure you have:
You can import and use KaggleDownloader directly within a Jupyter Notebook. Below is a step-by-step guide to authenticating and downloading datasets with the class methods.
import kaggle_downloader as kd # Assuming you've saved the class in kaggle_downloader.py
import pandas as pd
# Initialize KaggleDownloader
downloader = kd.KaggleDownloader(api_token_path="./kaggle.json")
# Authenticate with Kaggle API
downloader.authenticate_kaggle()
# Search for datasets related to a theme
downloader.search_datasets("netflix")
# Download a specific dataset by its slug
downloader.download_dataset("shivamb/netflix-shows")
df = pd.read_csv("./netflix_titles.csv", delimiter=',', encoding="utf-8", encoding_errors="replace")
df.head()
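The example above assumes the extracted file is named netflix_titles.csv and sits in the current directory. When the file name is not known in advance, a small helper can list the CSV files that ended up in the download directory. This is a minimal sketch using only the standard library; find_csv_files is a hypothetical helper and is not part of KaggleDownloader.

```python
from pathlib import Path


def find_csv_files(directory="."):
    """Return every CSV file under `directory` (searched recursively), sorted by path.

    Hypothetical helper for illustration; KaggleDownloader does not provide it.
    """
    return sorted(Path(directory).rglob("*.csv"))
```

The first match could then be passed to pandas, for example pd.read_csv(find_csv_files(".")[0]), provided the list is not empty.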
get_api_token_path(): Returns the path to the primary Kaggle API token file.
get_alternative_token_path(): Returns the path to the alternative Kaggle API token file.
get_path_downloads(): Returns the download directory path.
set_api_token_path(new_path): Sets a new path for the Kaggle API token.
set_alternative_token_path(new_path): Sets a new path for the alternative Kaggle API token.
set_path_downloads(new_path): Sets a new path for downloaded datasets.
authenticate_kaggle(): Authenticates with the Kaggle API by loading credentials from the token file.
authenticate_with_credentials(): Prompts the user to manually enter Kaggle credentials and saves them to a file.
search_datasets(dataset_theme): Searches Kaggle for datasets matching a given keyword or theme.
download_dataset(dataset_slug): Downloads a dataset from Kaggle to the specified directory.
extract_zip(zip_file): Extracts a downloaded zip file to the download directory.
check_kaggle_json(): Checks if the Kaggle API token file exists at either the primary or alternative path.
create_download_directory(path): Creates the directory where datasets will be saved, if it doesn't already exist.

Alternatively, you can use the KaggleDownloader class via the command line. The main() method allows users to run the class and download datasets by specifying the dataset slug as an argument.
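To illustrate how a command-line entry point can accept the dataset slug as a positional argument, here is a minimal sketch built on argparse. The function name parse_args and the owner/dataset-name format check are assumptions made for this example; the actual main() in kaggle_downloader.py may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line and return a namespace with `dataset_slug`.

    Hypothetical sketch; not taken from kaggle_downloader.py.
    """
    parser = argparse.ArgumentParser(
        description="Download a Kaggle dataset by its slug."
    )
    parser.add_argument(
        "dataset_slug",
        help="Dataset identifier in the form owner/dataset-name",
    )
    args = parser.parse_args(argv)

    # Assumed sanity check: a slug has exactly one "/" with text on both sides.
    owner, sep, name = args.dataset_slug.partition("/")
    if not (owner and sep and name) or "/" in name:
        parser.error("dataset_slug must look like owner/dataset-name")
    return args


# Passing an explicit list stands in for sys.argv[1:] here.
args = parse_args(["benroshan/ecommerce-data"])
print(args.dataset_slug)  # prints: benroshan/ecommerce-data
```

In a real script, the parsed slug would then be handed to download_dataset() after authentication.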
First, make sure your script is executable:
chmod +x kaggle_downloader.py
Use the following command to download a dataset from Kaggle:
python kaggle_downloader_package/kaggle_downloader.py benroshan/ecommerce-data
This will authenticate with Kaggle (based on your kaggle.json token file) and download the dataset to the directory specified in path_downloads (or the current working directory by default).
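The two behaviours described here, looking up the token file and falling back to the current working directory for downloads, can be sketched roughly as follows. Both function names are hypothetical and only approximate what check_kaggle_json() and create_download_directory(path) are documented to do; the real implementations may differ.

```python
from pathlib import Path


def find_token_file(primary, alternative):
    """Return the first existing token file among the two paths, or None.

    Hypothetical stand-in for the documented check_kaggle_json() behaviour.
    """
    for candidate in (primary, alternative):
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None


def resolve_download_dir(path_downloads=None):
    """Return the download directory, creating it if it does not exist.

    Falls back to the current working directory when no path is given,
    mirroring the default described in the text (an assumption about how
    the class implements it).
    """
    target = Path(path_downloads) if path_downloads else Path.cwd()
    target.mkdir(parents=True, exist_ok=True)
    return target
```

For example, find_token_file("./kaggle.json", "~/.kaggle/kaggle.json") would prefer a token in the project directory and otherwise fall back to the user's home directory; the second path is the location the Kaggle API client conventionally uses, not necessarily the alternative path configured in this class.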
dataset_slug: The Kaggle dataset identifier (slug) that you want to download, e.g., benroshan/ecommerce-data.

[…] (kaggle.json).
[…] kaggle_downloader.py with the actual file name if different.
[…] (from kaggle_downloader import KaggleDownloader) if you organize your code differently.

Feel free to contribute to this project by submitting issues, feature requests, or pull requests on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.
Mariano Gobea Alcoba