Python module for exploring Open Data Portal metadata.
This repository contains the files, notebooks, scripts, and data output related to metadata about the New York City Open Data Portal. The techniques and scripts here are extensible to any Socrata open data portal.
For further reference, read the accompanying blog post.
notebooks/Socrata Portal Dataset Counting.ipynb: Definitions of various Socrata dataset-related terms, and a ...
notebooks/NYC Open Data Analysis.ipynb: Presentation and analysis of metadata about the New York City open data ...
notebook/JSON-to-Catalog API Match.ipynb: Notebook generating the datasets.json file used in the portal get_datasets(domain, token) method.
src/portal.py: Module for working with Socrata portal metadata. Read the docstrings!
src/load_catalog.py: Runnable Python script which generates a set of metadata about endpoints (everything ...
src/load_datasets.py: Runnable Python script which generates a set of metadata about "datasets" (at ...
src/get_dataset_counts.py: Runnable Python script which generates counts of entities classifiable as ...
src/load_datasets_using_json_endpoint.py: Auxiliary runnable Python script which generates a set of ... load_datasets.py, which provides richer matched catalog API output (see the docs), is more useful.
Thomas Levine did a longitudinal study of Socrata portal instances that is well worth reading if you're into this sort of thing (there is also a post on Socrata's blog about the aftermath of that effort).
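To give a rough feel for the kind of counting that src/get_dataset_counts.py is described as doing, here is a minimal sketch. It is not taken from this repository's code: the function name count_by_type and the sample records are hypothetical, and the record shape (a "resource" object carrying a "type" field) is an assumption modeled on catalog-style API output.

```python
from collections import Counter


def count_by_type(records):
    """Count catalog records by resource type.

    Assumes each record is a dict with a nested "resource" dict that may
    carry a "type" key; records without one are counted as "unknown".
    """
    return Counter(
        record.get("resource", {}).get("type", "unknown") for record in records
    )


# Hypothetical sample records for illustration only, not real portal output.
sample = [
    {"resource": {"id": "aaaa-0001", "type": "dataset"}},
    {"resource": {"id": "aaaa-0002", "type": "map"}},
    {"resource": {"id": "aaaa-0003", "type": "dataset"}},
    {"resource": {"id": "aaaa-0004"}},  # no type given
]

counts = count_by_type(sample)
print(dict(counts))
```

Counting with a Counter keeps the sketch independent of any particular set of type names, so whatever entity types a portal reports are tallied as they appear.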