👨🏻💻 Load/dump Django models from/to Excel 2007+ workbooks 📊
MIT License
django-xlsx-serializer
is a Django application designed to handle
the data serialization and deserialization between Django models and Microsoft
Excel 2007+ workbooks. Utilizing the OpenPyXL engine, this tool
provides robust methods to export data from Django databases into XLSX files and
import data from the files back into the databases. This functionality is
essential for applications that require data exchange between Django-based
systems and Excel, facilitating such tasks as data migration, reporting, and
backups.
The app allows you to:
dumpdata
command.loaddata
openpyxl.Workbook
objects)Python | Django | Database engines |
---|---|---|
3.9 | 3.2, 4.0, 4.1, 4.2 | SQLite3, PostgreSQL |
3.10 | 3.2, 4.0, 4.1, 4.2, 5.0, 5.1 | SQLite3, PostgreSQL |
3.11 | 4.1, 4.2, 5.0, 5.1 | SQLite3, PostgreSQL |
3.12 | 4.2, 5.0, 5.1 | SQLite3, PostgreSQL |
All setups require OpenPyXL < 4.
The fastest way to add the package to your Python environment is to download and
install it directly from PyPI. Use pip
:
pip install django-xlsx-serializer
or any other dependency manager of your preference.
As soon as the installation is completed, all the app's functionalities can be
accessed from the xlsx_serializer
module:
import xlsx_serializer
The app is compatible with Excel 2007+ XLSX workbooks only. Adding support for the older XLS format is not planned.
The app utilities can be incorporated into your Django project by following one of the approaches listed below:
All of them associate the app's serializer with the xlsx
format.
In your project settings module add xlsx_serializers
to INSTALLED_APPS
:
INSTALLED_APPS = [
# ...
"xlsx_serializer",
# ...
]
In your project settings module update the SERIALIZATION_MODULES
dictionary:
SERIALIZATION_MODULES = {
# ...
"xlsx": "xlsx_serializer",
# ...
}
In any of the apps installed in your projects (let us call it myapp
), register
the xlsx_serializer
manually in the app's ready
hook:
# myapp/apps.py
from django.apps import AppConfig
from django.core import serializers
class MyAppConfig(AppConfig):
name = "myapp"
def ready(self) -> None:
super().ready()
# ...
# Register serializers.
serializers.register_serializer("xlsx", "xlsx_serializer")
There are many Django projects using a "core" app for defining project-wide utilities (e.g., custom commands, template tags, etc.). The configuration class of such an app is a good place to apply the code snippet above.
The app adopts quite intuitive correspondence between Excel workbooks (i.e., the collections of worksheets) and Django models:
Serialization can be run either by the built-in dumpdata
Django management command:
python manage.py dumpdata --format xlsx --output dump.xlsx
or from Django interactive shell:
>>> from django.core import serializers
>>> from polls.models import Question
>>> serializers.serialize("xlsx", Question.objects.all(), output="dump.xlsx")
# Prints: <openpyxl.workbook.workbook.Workbook object at ...>
Both the command and expression shown above save dump.xlsx
workbook file. The
latter additionally returns an openpyxl.Workbook
object, which can be used
later if necessary (e.g., in development or maintenance scripts).
When serializing, the app creates worksheets named using fully qualified model
labels. For example, the Question
model defined in the polls
app is
serialized to the "polls.Question" worksheet. Excel does not accept worksheet
names longer than 31 characters. If the model's label is longer, it's truncated.
A useful feature allowing you to circumvent this issue is that the output
worksheet names can be customized using the model_sheet_names
option. So, the
command:
>>> workbook = serialize(
"xlsx",
Question.objects.all(),
model_sheet_names={"polls.Question": "Questions"},
)
>>> workbook
# Prints: <openpyxl.workbook.workbook.Workbook object at ...>
results in the polls.Question
model data serialized in the "Questions"
worksheet. Note that this option is not available when using the app via the
dumpdata
command.
The app inspects each key and value of the
model_sheet_names
dictionary. For the keys, it validates whether they represent valid model identifiers. The values, in turn, are checked to see if they are unique, are not too long, and do not contain invalid characters (?
,*
,:
,\
,/
,[
,]
).
Other key points:
DateField
, DateTimeField
, and TimeField
values are serialized asJSONField
values are serialized as JSON strings returned by the respectiveManyToManyField
values are serialized as stringified lists of foreign keys.--natural-primary
/--natural-foreign
flags), the natural keysThe recommended way of employing the app to load the model data from an Excel
fixture to the database is to call it via the loaddata
command:
python manage.py loaddata fixture.xlsx
Deserialization requires the input workbook's worksheets to have names that are
either the fully qualified labels or model names (case-insensitive). The latter
can be applied if the model name is unique. For example, if the project uses
models polls.Question
and exams.Question
, the worksheet named "Question"
will not be deserialized.
Within a worksheet, ensure that the column headers correspond to the field names of the respective model. The app ignores a column if it does not represent a field. Empty rows and columns surrounding the data range are ignored as well. However, the app does not check the data for the missing or invalid values.
Other key points:
DateField
, DateTimeField
, and TimeField
with timezone supportJSONfield
requires values in a format compatible with the JSONManyToManyField
provide string representations of Pythonast.literal_eval
.This is an open-source project that embraces contributions of all types. We require all contributors to adhere to our Code of Conduct. For comprehensive instructions on how to contribute to the project, please refer to our Contributing Guide.
Created and maintained by Kamil Paduszyński (@paduszyk).
Released under the MIT license.