A simple project to explore the number of GCs when doing basic ORM work.
MIT License
Imagine you need to query a hefty number of records back from the database.
The sample app in this repo will create a database of 100,000 small records (see importer.py) and then run a query that results in 20,000 being loaded into a single list (see test_app.py).
That's a lot, but not entirely out of the bounds of reasonable for certain problem sets.
During the execution of that single query against either SQLAlchemy or MongoEngine, without adjusting Python's GC settings, we get an extreme number of garbage collections (1,859 GCs for a single SQLAlchemy query). Yet clearly none of these records are garbage, because they haven't even been fully realized from the DB.
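If you want to see this effect without setting up a database, a rough sketch like the one below counts collections while a batch of objects is built. It uses the standard `gc.callbacks` hook; the `count_collections` helper and the `Record` class are hypothetical stand-ins for illustration, not the code in test_app.py, and the exact count will vary by Python version.

```python
import gc
from contextlib import contextmanager


@contextmanager
def count_collections():
    """Count GC runs (any generation) that start inside the with-block."""
    counter = {"collections": 0}

    def on_gc(phase, info):
        # The callback fires twice per run: once with "start", once with "stop".
        if phase == "start":
            counter["collections"] += 1

    gc.callbacks.append(on_gc)
    try:
        yield counter
    finally:
        gc.callbacks.remove(on_gc)


class Record:
    """Hypothetical stand-in for an ORM row (not a real SQLAlchemy model)."""

    def __init__(self, i):
        self.id = i
        self.name = f"record-{i}"


# Every instance is a container allocation, and all of them stay alive in the
# list, so the generation-0 counter keeps climbing past the threshold.
with count_collections() as stats:
    records = [Record(i) for i in range(20_000)]

print(f"{len(records):,} records, {stats['collections']} collections")
```

None of the `Record` objects are garbage while the list is being built, so every collection counted here is work that frees nothing.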
Our fix at Talk Python has been to increase the number of surviving allocations required to force a GC from 700 to 50,000. Interestingly, this results in LESS, not more memory used.
The stats below are from Python 3.9.9 running on macOS with Apple Silicon.
SQLAlchemy - 20,000 records in one query
MongoEngine - 20,000 records in one query
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
python importer/importer.py
python test_app.py
"gc: done"
appears when running with diagnostics on.

What can be done to improve this?
Maybe someday Python will have an adaptive GC: if it runs a collection and finds zero cycles, it backs off, and if it starts finding more cycles, it ramps back up. Until then, we have a few knobs.
In Python, we have the gc module. This code will allow us to turn down the frequency of GCs:
import gc

allocations, gen1, gen2 = gc.get_threshold()
allocations = 50_000  # Start a GC after every 50K, not 700, surviving container allocations.
gc.set_threshold(allocations, gen1, gen2)
Our experience running this code in production over at Talk Python (podcast and training site) has only been positive: Nearly unchanged memory usage and significant performance improvements for just three lines of code.
Of course, this is a specific use-case: Web pages and APIs that return a non-trivial number of DB objects (MongoEngine documents in our case) that are short-lived. Please test, profile, and monitor your code if you want to try this in your app.
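As a starting point for that kind of testing, here is a minimal sketch that compares the two thresholds on a synthetic workload. The `gen0_threshold` and `measure` helpers and the `Row` class are hypothetical names invented for this example; swap in your real query in place of the list comprehension and treat the numbers as a rough signal only.

```python
import gc
import time
from contextlib import contextmanager


@contextmanager
def gen0_threshold(allocations):
    """Temporarily change the generation-0 threshold, restoring it on exit."""
    original = gc.get_threshold()
    gc.set_threshold(allocations, *original[1:])
    try:
        yield
    finally:
        gc.set_threshold(*original)


def total_collections():
    # gc.get_stats() returns one dict per generation with a running count.
    return sum(gen["collections"] for gen in gc.get_stats())


class Row:
    """Hypothetical stand-in for an ORM object."""

    def __init__(self, i):
        self.id = i
        self.tags = [i]


def measure(allocations, n=20_000):
    """Return (collections, seconds) for building n rows under a threshold."""
    with gen0_threshold(allocations):
        before = total_collections()
        started = time.perf_counter()
        rows = [Row(i) for i in range(n)]
        elapsed = time.perf_counter() - started
        return total_collections() - before, elapsed


for allocations in (700, 50_000):
    collections, elapsed = measure(allocations)
    print(f"threshold {allocations:>6,}: {collections} collections, {elapsed * 1000:.1f} ms")
```

Because the context manager restores the original thresholds, the comparison can run inside a test suite without changing GC behavior for the rest of the process.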