Application component to generate and read Excel files using Apache POI
MIT License
Excesiv is a polyglot application component built using Python, JRuby, MongoDB, and Apache POI to generate and read Excel files off of templates.
During a project, I came accross a problem where I needed to build a small business application to work with data, using Excel as a front-end. I didn't want to write an Excel plugin because I wanted the flexibility and frequent update capabilities of an app running on a server. So as a side project, I wrote Excesiv, an application component that would handle generating and reading Excel files on the server.
At the time, and maybe still today, the most complete library to work with Excel files (and the only one to handle named ranges, which I needed) is Apache POI, written in Java. I don't write Java, and I'd rather not have to learn. More importantly, I wanted to use Python as the main application language, both for my familiarity with it, as well as for its powerful data manipulation and scientific computing libraries (Numpy, Pandas). So I decided to split up the Excesiv component into two modules. The main module is in Python, and speaks to the second smaller module written in JRuby (instead of Java) that uses Apache POI.
This might not be the best solution, and in the future new ways of solving that problem might appear, but it fitted my needs and skills.
To better understand what Excesiv does, feel free to check out the demo.
Excesiv is a polyglot application, so it is split up into two modules:
The Python server is the main module. It offers the web interface to the user, handles user requests, optionally pulls data from a database, manipulates it, and sends tasks to the worker.
The JRuby worker receives tasks from the server and processes them by either writing data to an Excel file and sending the file back to the server, or reading data from an Excel file and sending the data back.
Tasks from the server, and results from the worker, are sent to the other module using MongoDB's capped collections and tailable cursors.
Excel files generated by Excesiv are not created from scratch. They are generated by using a "template" (not in Excel's meaning of templates), which is just the result Excel file emptied of all data, with only column headers, formatting, etc. These template Excel files make heavy use of named ranges to tell Excesiv where to write (or read) the data.
To pass generated Excel files from the worker to the server, Excesiv uses MongoDB's GridFS as a "file system".
The server module is an application component, i.e. it is meant to be included in another application.
To do this you need to install the package:
$ pip install git+git://github.com/nicolahery/excesiv
Later we'll see how to import it into your project and set it up.
The worker module is standalone and runs seperately. In any directory (except your project directory), just clone the GitHub repo:
$ git clone https://github.com/nicolahery/excesiv-worker yourproject-worker
And install dependencies:
$ cd yourproject-worker/
$ jruby -S bundle install
If a new version of Excesiv is released, update both modules.
Update the server module by running:
$ pip install --upgrade git+git://github.com/nicolahery/excesiv
Update the worker module with Git, and install dependencies:
$ cd yourproject-worker/
$ git pull origin master
$ jruby -S bundle install
The server module uses Flask blueprints to allow you to easily add Excesiv to your Python Flask application.
Note: If you're not using Flask, you can always use Excesiv's functions directly with from excesiv import Excesiv
and xs = Excesiv()
. Here we'll only explain how to use with Flask, so please refer to the source if that's the case.
Add Excesiv to your Flask application with register_blueprint
:
# app.py
import os
from flask import Flask
from excesiv import excesiv_blueprint, xs
app = Flask(__name__)
# Add Excesiv's routes and initialization to the app
app.register_blueprint(excesiv_blueprint)
@app.route('/')
def index():
return 'Hello World!'
if __name__ == '__main__':
port = int(os.environ.get('PORT', 5000))
app.run(host='0.0.0.0', port=port)
(See Flask's documentation for more details on blueprints.)
Before running it, make sure you fire up MongoDB. In a new console:
$ mongod
Then you can run the app:
$ python app.py
When you navigate to http://localhost:5000/
you will see it connect to MongoDB.
Right now the app doesn't do much! Read on to learn how to fully use Excesiv.
As mentioned before, Excesiv populates an Excel file used as a template (i.e. an Excel file containing column headers, formatting, etc., but no data).
Let's create a basic template to illustrate how this works. We'll put our template in the directory yourproject/excel
and call it fibonacci.xlsx
. Below is a screenshot of what the sheet looks like:
A couple things to note here:
Excesiv can populate the header of an Excel sheet (in our example, replacing the content of cell B2), as well as the row data (filling in B5, B6, etc.).
An Excesiv Excel template is actually not empty of all data: the first row needs to have some arbitraty values (in this example, cell B5). This is because Excesiv will use that first row to copy the formatting (centered and red in the example above) down to the other rows it creates.
Before Excesiv can use this file, we need to tell it where to populate data in the worksheet. This is where Excel's named ranges come in. Using the Name Manager, we define two named ranges (we will be using their names in our Python application):
What's important here, is the Comment of the named ranges. This is where we tell Excesiv to use that named range when inserting data in the Excel sheet and also what type of named range it is.
In our example, we type header w
in the comment of our header-cell B2 so Excesiv will write to that cell, and just that cell. We type data w
for our row-cell B5 so Excesiv will write to that cell, and iterate to the next row, etc., filling in rows of data.
There are different combinations of comments that have different effects (header
, data
, formula
combined with w
, r
, or w r
). We put r
in the comment when we want to use Excesiv to read an Excel file. The comment formula
works the same as data
(Excesiv will iterate over the rows), except instead of inserting data, it will copy the formula contained in the cell of the first row. The easiest way to understand how all of this works is to study the workbook in the demo, which contains all the different combinations possible.
The last thing to note in the Name Manager is that we defined a "constant" called first_row
and equal to the number of the first row of data in our worksheet. This is very important, as it tells Excesiv where to begin filling in rows of data, and where to look for the cells containing arbitrary values that have the formatting it needs to copy to other rows.
(Note: You don't have to, but I defined fib_sequence
as a relative named range by removing the $ in front of the 5 and making sure that my active cell was on row 5 when defining it. This allows me to use this name in formulas, and whichever row I'm on, the name will always point to the cell in column B of that same row. It is not very useful in this example, but the demo workbook uses it.)
When you are done, save and close the Excel workbook.
Before we move on, we need to let Excesiv know which directory we are putting the Excel templates in. We do so by defining the Flask configuration value 'EXCEL_DIR'
. In app.py
add:
app.config['EXCEL_DIR'] = os.path.join(
os.path.dirname(os.path.abspath(__file__)),
'excel')
Before we can use this template, we need to define Excesiv task methods in our application. Excesiv executes two types of tasks: write
and read
. Each Excel template needs a task method defined for one or both types to be usable.
A write
task method has the following form:
def write_task_method(request):
# Optionally do something with HTTP request
data = do_something_to_generate_data(...)
return {'data': data}
Where request
is a Flask Request object, and data
is a dictionary of the form {'header': {}, 'rows': [{}, {}, ...]}
. Each sub-dictionary of 'header'
and 'rows'
contains keys pointing to the Excel named ranges defined earlier.
A read
task method has the following form:
def read_task_method(result):
response_dict = do_something_to_interpret_result(result['data'])
return response_dict
Where result['data']
is a dictionary of the form {'header': {}, 'rows': [{}, {}, ...]}
containing the data read from the Excel sheet, and response_dict
is a dictionary sent back to the web client as JSON.
In our example, we'll only use a write
task since the read
task would require setting up a file upload function. You can always look into the code of the demo for an example read
task.
First, let's create in app.py
a Fibonacci generator:
def fib():
a, b = 0, 1
while 1:
yield a
a, b = b, a + b
Then, let's define a helper function to generate our task data:
def generate_task_data(n):
data = {}
data['header'] = {'fib_n': n}
data['rows'] = []
fib_generator = fib()
for i in range(n):
data['rows'].append({'fib_sequence': fib_generator.next()})
return data
Finally, we define our task method following the pattern described earlier:
def write_task_method(request):
n = int(request.args.get('n', 10))
data = generate_task_data(n)
return {'data': data}
The last thing we need to do is to tell Excesiv about this task method. We do so by using the register_task_method
function:
from excesiv import excesiv_blueprint, xs
# ...
xs.register_task_method('write', 'fibonacci', write_task_method)
The second argument is the name of the template, and has to match the name of the Excel file (without the extension).
Now we can test it out. Excesiv comes with 3 routes:
'/api/write/<template>', methods=['GET', 'POST']
'/api/read/<template>', methods=['POST']
'/api/files/<id>', methods=['GET']
We'll use the first route to generate our Excel file, and the third one to download it. First, make sure you launched MongoDB in a seperate console:
$ mongod
Then fire up the app:
$ python app.py
And launch the JRuby worker in a new console:
$ cd yourproject-worker/
$ jruby worker.rb
Navigate to http://localhost:5000/api/write/fibonacci?n=11
and you should get a response that looks like:
{file_url: "/api/files/504fa137ba9232aa912204a9"}
Now if we go to http://localhost:5000/api/files/504fa137ba9232aa912204a9
it will download the Excel file just created:
Congratulations, you just built your first Excesiv app! For more advanced usage, please take a look at the demo.
You need to deploy the main application containing the server module as well as delpoy the worker module. Let's start with the main application.
In your project, add a Procfile
containing:
web: python app.py
Also add a requirements.txt
file with the following entry:
git+git://github.com/nicolahery/excesiv
Alternatively, you can use a more production-ready server like gunicorn
by changing the Procfile
to:
web: gunicorn app:app -b 0.0.0.0:$PORT -w 3
And adding gunicorn
to the requirements.txt
file:
git+git://github.com/nicolahery/excesiv
gunicorn==0.15.0
Create the app on Heroku:
$ cd yourproject/
$ heroku create yourproject
Provision the MongoLab add-on that will create a MongoDB instance for the app:
$ heroku addons:add mongolab
And deploy the application:
$ git push heroku master
Now let's deploy the worker module. It is a JRuby app, which requires the use of a custom buildpack when creating it on Heroku:
$ cd yourproject-worker/
$ heroku create --buildpack https://github.com/jruby/heroku-buildpack-jruby.git yourproject-worker
We need both modules to share the same database. We do so by copying the MONGOLAB_URI
config environment variable from the main application to the worker module:
$ heroku config:set -a yourproject-worker `heroku config -a yourproject -s | grep MONGOLAB_URI`
Warning: The MongoDB instance is tied to your main application. If you destroy the main app, it will also destroy the MongoDB instance, even though the worker module uses it.
Finally, deploy the worker app and launch the worker process:
$ git push heroku master
$ heroku ps:scale worker=1
Both modules are now deployed, running, and connected to the database.
The following will assume we are a deploying to a private Stackato VM that can be reached on the host stackato.local
.
Let's first deploy our main application, the one containing the server module. Like above for Heroku, make sure you have a requirements.txt
file with dependencies (we'll also use gunicorn
in this example):
git+git://github.com/nicolahery/excesiv
gunicorn==0.15.0
Then create a stackato.yml
configuration file, by copying the contents of the stackato-sample.yml file from this repository. Make sure you change the name
of the app at the top.
Connect to the Stackato instance:
$ stackato target api.stackato.local
$ stackato login [email protected] --passwd userpass
And push the application using the config file:
$ cd yourproject/
$ stackato push -n
Now let's deploy the worker module. This is a little different than Heroku because at the time of writing, Stackato v2.4 does not support JRuby worker processes out-of-the-box. To work around this, we will package our worker into a .jar
file using Warbler.
First, install Warbler:
$ jruby -S gem install warbler
Then run the command:
$ cd yourproject-worker/
$ jruby -S warble
This will create the yourproject-worker.jar
file.
Next, make a copy of the stackato-sample.yml
file:
$ cp stackato-sample.yml stackato.yml
And change the name
and command
entries to reflect the name of your project and jar
file.
Finally, push the worker app:
$ stackato push -n
Excesiv is still in a state of development. It works, but if you use it in an application, you do so at your own risk.
I also haven't written tests. I know, it's bad. It's on the todo list.
This article written by Ben Wen from MongoLab served as inspiration for using MongoDB as a message queue between two modules of an application, and as a guide for deploying the modules to Heroku: Building a Real-time, Polyglot Application with Node.js, Ruby, MongoDB and Socket.IO.