🌍 Earth Engine Public Repository Stats

This project contains notebooks that were used to scrape, parse, and analyze every public Google Earth Engine repository. It also contains most of the raw data that was created in the process.

Background

Google Earth Engine (GEE) is a cloud-computing platform for remote sensing and geospatial analysis that is widely used for scientific research and natural resource management. Some of the code that users have written in GEE (probably a small fraction) is publicly accessible, so I decided to download all ~11 gigabytes of it to see what I could learn about how people are using the platform.

The stats and data here are a snapshot of public repositories in March 2022.

📈 Stats

🌐 Earth Engine Stats

11,175 public repositories
Contributed by 8,344 users
57 million lines of code

🛰️ Image Collection Stats

The most frequently imported platform is Sentinel-2, with 68k imports.
Together, Landsat platforms have been imported 134k times, with half of those imports being Landsat 8.

🖼️ Image Stats

Overall, the most frequently imported image is NASA SRTM 30m elevation data.
The most frequently imported Landsat image is ee.Image("LANDSAT/LC08/C01/T1_TOA/LC08_044034_20140318").
For obvious reasons, this incredible scene over the northern Australian coast is the most frequently imported Sentinel-2 image: ee.Image("COPERNICUS/S2/20180422T012719_20180422T012714_T52LHM").

🖥️ Module Stats

External modules have been imported 35k times.
The most popular import is ee-palettes, followed by GEET and GEE Tools.

📋 Other Stats

The most frequently used coordinate reference systems are Mercator and Web Mercator, which together make up 91% of all CRSs used.
2019 is the most frequently used year.

Usage

Most of the tabular data is already accessible in the data folder, but if you want to modify the analysis, you can follow the steps below to run it yourself.

Use notebooks/001_scraping_repositories to clone all Earth Engine repositories to local storage.
Use notebooks/002_summarizing_code to count lines of code.
Use notebooks/003_parsing_code to parse all of the source code and build lists of commonly used image collections, modules, points, etc. Export those lists for future use (you can find most of them pre-made in data).
Use notebooks/004_dataset_stats to analyze which datasets are used most commonly.
Use notebooks/005_module_stats to analyze which modules are imported most frequently.
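
To give a rough idea of what the line-counting and parsing steps involve, here is a minimal sketch. It is not the code from the notebooks: the regular expression, the function names, and the three sample scripts are assumptions made up for illustration, and a real parser would need to handle more import styles than this single pattern covers.

```python
import re
from collections import Counter

# Hypothetical pattern: matches ee.ImageCollection("ID") or ee.ImageCollection('ID').
COLLECTION_PATTERN = re.compile(
    r"""ee\.ImageCollection\(\s*["']([^"']+)["']\s*\)"""
)


def count_lines(source: str) -> int:
    """Count the lines in one script, blank lines included."""
    return len(source.splitlines())


def count_collections(sources) -> Counter:
    """Tally image collection IDs across an iterable of script strings."""
    counts = Counter()
    for source in sources:
        counts.update(COLLECTION_PATTERN.findall(source))
    return counts


# Made-up sample scripts standing in for files from the cloned repositories.
scripts = [
    'var s2 = ee.ImageCollection("COPERNICUS/S2");\nMap.addLayer(s2);',
    "var l8 = ee.ImageCollection('LANDSAT/LC08/C01/T1_TOA');",
    'var s2b = ee.ImageCollection("COPERNICUS/S2").filterDate("2019-01-01", "2019-12-31");',
]

counts = count_collections(scripts)
total_lines = sum(count_lines(s) for s in scripts)

# Print collections from most to least frequently imported.
for collection_id, n in counts.most_common():
    print(collection_id, n)
print("lines of code:", total_lines)
```

In practice you would read each script from the cloned repositories on disk instead of using hard-coded strings, and write the resulting counts out to the data folder.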
