excesiv

Application component to generate and read Excel files using Apache POI

MIT License

Stars
8

Excesiv

Excesiv is a polyglot application component built using Python, JRuby, MongoDB, and Apache POI to generate and read Excel files off of templates.

Why?

During a project, I came accross a problem where I needed to build a small business application to work with data, using Excel as a front-end. I didn't want to write an Excel plugin because I wanted the flexibility and frequent update capabilities of an app running on a server. So as a side project, I wrote Excesiv, an application component that would handle generating and reading Excel files on the server.

At the time, and maybe still today, the most complete library to work with Excel files (and the only one to handle named ranges, which I needed) is Apache POI, written in Java. I don't write Java, and I'd rather not have to learn. More importantly, I wanted to use Python as the main application language, both for my familiarity with it, as well as for its powerful data manipulation and scientific computing libraries (Numpy, Pandas). So I decided to split up the Excesiv component into two modules. The main module is in Python, and speaks to the second smaller module written in JRuby (instead of Java) that uses Apache POI.

This might not be the best solution, and in the future new ways of solving that problem might appear, but it fitted my needs and skills.

Demo

To better understand what Excesiv does, feel free to check out the demo.

Overview

Modules

Excesiv is a polyglot application, so it is split up into two modules:

The Python server is the main module. It offers the web interface to the user, handles user requests, optionally pulls data from a database, manipulates it, and sends tasks to the worker.

The JRuby worker receives tasks from the server and processes them by either writing data to an Excel file and sending the file back to the server, or reading data from an Excel file and sending the data back.

Message queue

Tasks from the server, and results from the worker, are sent to the other module using MongoDB's capped collections and tailable cursors.

Excel templates

Excel files generated by Excesiv are not created from scratch. They are generated by using a "template" (not in Excel's meaning of templates), which is just the result Excel file emptied of all data, with only column headers, formatting, etc. These template Excel files make heavy use of named ranges to tell Excesiv where to write (or read) the data.

File system

To pass generated Excel files from the worker to the server, Excesiv uses MongoDB's GridFS as a "file system".

Installation

Requirements

Installing the server module

The server module is an application component, i.e. it is meant to be included in another application.

To do this you need to install the package:

$ pip install git+git://github.com/nicolahery/excesiv

Later we'll see how to import it into your project and set it up.

Installing the worker module

The worker module is standalone and runs seperately. In any directory (except your project directory), just clone the GitHub repo:

$ git clone https://github.com/nicolahery/excesiv-worker yourproject-worker

And install dependencies:

$ cd yourproject-worker/
$ jruby -S bundle install

Updating

If a new version of Excesiv is released, update both modules.

Update the server module by running:

$ pip install --upgrade git+git://github.com/nicolahery/excesiv

Update the worker module with Git, and install dependencies:

$ cd yourproject-worker/
$ git pull origin master
$ jruby -S bundle install

Usage

Flask application

The server module uses Flask blueprints to allow you to easily add Excesiv to your Python Flask application.

Note: If you're not using Flask, you can always use Excesiv's functions directly with from excesiv import Excesiv and xs = Excesiv(). Here we'll only explain how to use with Flask, so please refer to the source if that's the case.

Add Excesiv to your Flask application with register_blueprint:

# app.py
import os
from flask import Flask
from excesiv import excesiv_blueprint, xs

app = Flask(__name__)

# Add Excesiv's routes and initialization to the app
app.register_blueprint(excesiv_blueprint)

@app.route('/')
def index():
    return 'Hello World!'

if __name__ == '__main__':
    port = int(os.environ.get('PORT', 5000))
    app.run(host='0.0.0.0', port=port)

(See Flask's documentation for more details on blueprints.)

Before running it, make sure you fire up MongoDB. In a new console:

$ mongod

Then you can run the app:

$ python app.py

When you navigate to http://localhost:5000/ you will see it connect to MongoDB.

Right now the app doesn't do much! Read on to learn how to fully use Excesiv.

Excel templates

As mentioned before, Excesiv populates an Excel file used as a template (i.e. an Excel file containing column headers, formatting, etc., but no data).

Let's create a basic template to illustrate how this works. We'll put our template in the directory yourproject/excel and call it fibonacci.xlsx. Below is a screenshot of what the sheet looks like:

A couple things to note here:

  • Excesiv can populate the header of an Excel sheet (in our example, replacing the content of cell B2), as well as the row data (filling in B5, B6, etc.).

  • An Excesiv Excel template is actually not empty of all data: the first row needs to have some arbitraty values (in this example, cell B5). This is because Excesiv will use that first row to copy the formatting (centered and red in the example above) down to the other rows it creates.

Before Excesiv can use this file, we need to tell it where to populate data in the worksheet. This is where Excel's named ranges come in. Using the Name Manager, we define two named ranges (we will be using their names in our Python application):

What's important here, is the Comment of the named ranges. This is where we tell Excesiv to use that named range when inserting data in the Excel sheet and also what type of named range it is.

In our example, we type header w in the comment of our header-cell B2 so Excesiv will write to that cell, and just that cell. We type data w for our row-cell B5 so Excesiv will write to that cell, and iterate to the next row, etc., filling in rows of data.

There are different combinations of comments that have different effects (header, data, formula combined with w, r, or w r). We put r in the comment when we want to use Excesiv to read an Excel file. The comment formula works the same as data (Excesiv will iterate over the rows), except instead of inserting data, it will copy the formula contained in the cell of the first row. The easiest way to understand how all of this works is to study the workbook in the demo, which contains all the different combinations possible.

The last thing to note in the Name Manager is that we defined a "constant" called first_row and equal to the number of the first row of data in our worksheet. This is very important, as it tells Excesiv where to begin filling in rows of data, and where to look for the cells containing arbitrary values that have the formatting it needs to copy to other rows.

(Note: You don't have to, but I defined fib_sequence as a relative named range by removing the $ in front of the 5 and making sure that my active cell was on row 5 when defining it. This allows me to use this name in formulas, and whichever row I'm on, the name will always point to the cell in column B of that same row. It is not very useful in this example, but the demo workbook uses it.)

When you are done, save and close the Excel workbook.

Before we move on, we need to let Excesiv know which directory we are putting the Excel templates in. We do so by defining the Flask configuration value 'EXCEL_DIR'. In app.py add:

app.config['EXCEL_DIR'] = os.path.join(
                            os.path.dirname(os.path.abspath(__file__)), 
                            'excel')

Task methods

Before we can use this template, we need to define Excesiv task methods in our application. Excesiv executes two types of tasks: write and read. Each Excel template needs a task method defined for one or both types to be usable.

A write task method has the following form:

def write_task_method(request):
    # Optionally do something with HTTP request
    data = do_something_to_generate_data(...)
    return {'data': data}

Where request is a Flask Request object, and data is a dictionary of the form {'header': {}, 'rows': [{}, {}, ...]}. Each sub-dictionary of 'header' and 'rows' contains keys pointing to the Excel named ranges defined earlier.

A read task method has the following form:

def read_task_method(result):
    response_dict = do_something_to_interpret_result(result['data'])
    return response_dict

Where result['data'] is a dictionary of the form {'header': {}, 'rows': [{}, {}, ...]} containing the data read from the Excel sheet, and response_dict is a dictionary sent back to the web client as JSON.

In our example, we'll only use a write task since the read task would require setting up a file upload function. You can always look into the code of the demo for an example read task.

First, let's create in app.py a Fibonacci generator:

def fib():
    a, b = 0, 1
    while 1:
        yield a
        a, b = b, a + b

Then, let's define a helper function to generate our task data:

def generate_task_data(n):
    data = {}
    data['header'] = {'fib_n': n}
    data['rows'] = []
    fib_generator = fib()
    for i in range(n):
        data['rows'].append({'fib_sequence': fib_generator.next()})
    return data

Finally, we define our task method following the pattern described earlier:

def write_task_method(request):
    n = int(request.args.get('n', 10))
    data = generate_task_data(n)
    return {'data': data}

The last thing we need to do is to tell Excesiv about this task method. We do so by using the register_task_method function:

from excesiv import excesiv_blueprint, xs
# ...
xs.register_task_method('write', 'fibonacci', write_task_method)

The second argument is the name of the template, and has to match the name of the Excel file (without the extension).

Now we can test it out. Excesiv comes with 3 routes:

  • '/api/write/<template>', methods=['GET', 'POST']
  • '/api/read/<template>', methods=['POST']
  • '/api/files/<id>', methods=['GET']

We'll use the first route to generate our Excel file, and the third one to download it. First, make sure you launched MongoDB in a seperate console:

$ mongod

Then fire up the app:

$ python app.py

And launch the JRuby worker in a new console:

$ cd yourproject-worker/
$ jruby worker.rb

Navigate to http://localhost:5000/api/write/fibonacci?n=11 and you should get a response that looks like:

{file_url: "/api/files/504fa137ba9232aa912204a9"}

Now if we go to http://localhost:5000/api/files/504fa137ba9232aa912204a9 it will download the Excel file just created:

Congratulations, you just built your first Excesiv app! For more advanced usage, please take a look at the demo.

Deployment

Heroku

You need to deploy the main application containing the server module as well as delpoy the worker module. Let's start with the main application.

In your project, add a Procfile containing:

web: python app.py

Also add a requirements.txt file with the following entry:

git+git://github.com/nicolahery/excesiv

Alternatively, you can use a more production-ready server like gunicorn by changing the Procfile to:

web: gunicorn app:app -b 0.0.0.0:$PORT -w 3

And adding gunicorn to the requirements.txt file:

git+git://github.com/nicolahery/excesiv
gunicorn==0.15.0

Create the app on Heroku:

$ cd yourproject/
$ heroku create yourproject

Provision the MongoLab add-on that will create a MongoDB instance for the app:

$ heroku addons:add mongolab

And deploy the application:

$ git push heroku master

Now let's deploy the worker module. It is a JRuby app, which requires the use of a custom buildpack when creating it on Heroku:

$ cd yourproject-worker/
$ heroku create --buildpack https://github.com/jruby/heroku-buildpack-jruby.git yourproject-worker

We need both modules to share the same database. We do so by copying the MONGOLAB_URI config environment variable from the main application to the worker module:

$ heroku config:set -a yourproject-worker `heroku config -a yourproject -s | grep MONGOLAB_URI`

Warning: The MongoDB instance is tied to your main application. If you destroy the main app, it will also destroy the MongoDB instance, even though the worker module uses it.

Finally, deploy the worker app and launch the worker process:

$ git push heroku master
$ heroku ps:scale worker=1

Both modules are now deployed, running, and connected to the database.

Stackato

The following will assume we are a deploying to a private Stackato VM that can be reached on the host stackato.local.

Let's first deploy our main application, the one containing the server module. Like above for Heroku, make sure you have a requirements.txt file with dependencies (we'll also use gunicorn in this example):

git+git://github.com/nicolahery/excesiv
gunicorn==0.15.0

Then create a stackato.yml configuration file, by copying the contents of the stackato-sample.yml file from this repository. Make sure you change the name of the app at the top.

Connect to the Stackato instance:

$ stackato target api.stackato.local
$ stackato login [email protected] --passwd userpass

And push the application using the config file:

$ cd yourproject/
$ stackato push -n

Now let's deploy the worker module. This is a little different than Heroku because at the time of writing, Stackato v2.4 does not support JRuby worker processes out-of-the-box. To work around this, we will package our worker into a .jar file using Warbler.

First, install Warbler:

$ jruby -S gem install warbler

Then run the command:

$ cd yourproject-worker/
$ jruby -S warble

This will create the yourproject-worker.jar file.

Next, make a copy of the stackato-sample.yml file:

$ cp stackato-sample.yml stackato.yml

And change the name and command entries to reflect the name of your project and jar file.

Finally, push the worker app:

$ stackato push -n

Limitations

Excesiv is still in a state of development. It works, but if you use it in an application, you do so at your own risk.

I also haven't written tests. I know, it's bad. It's on the todo list.

Acknowledgements

This article written by Ben Wen from MongoLab served as inspiration for using MongoDB as a message queue between two modules of an application, and as a guide for deploying the modules to Heroku: Building a Real-time, Polyglot Application with Node.js, Ruby, MongoDB and Socket.IO.