row

It's a reference implementation of the new tabular file standard discussed at dataprotocols repository.

It looks like a CSV but have stronger rules, lowering ambiguation and complexity of processing (see "Specification" below).

Specification

Blank lines or lines starting with # are discarded by the parser;
The first valid line must be the header;
The second valid line must be the field types;
Rows must be separated by character 0x0A (new line, often represented as
\n). Any 0x0A in the file are row separators (no exceptions);
Fields must be separated by 0x09 (tabular space, often represented as
0x09). Any 0x09 in the file are field separators (no exceptions);
Field values must be encoded in UTF-8 without BOM (byte-order-marker).
Binary data should be encoded as base64 or any other format that uses UTF-8
(or a subset of it, like ASCII) as output;
Escape sequences for use inside field data:
- \n for new line (0x0A),
- \t for tabular space (0x09),
- \N for null (absence of data),
- \# to start a new row with # as the first character of the first field,
- \\ for back slash (0x5C).
The file could be or could not be gzip-compressed (the parser should detect
automatically)

Field Types

bool
int
float
date
datetime
text
binary

Example of Usage

The interface looks like Python's csv module: has DictReader and DictWriter classes.

Giving the file brazilian-cities.row, we can read it like this:

# coding: utf-8

import row

cities = row.DictReader(open('brazilian-cities.row', 'rb'))
for city in cities:
    if city['state'] != 'RJ':
        continue
    area = city['area']
    inhabitants = city['inhabitants']
    density = inhabitants / area
    print('{}:'.format(city['city']))
    print('  area        = {:8.2f} km²'.format(area))
    print('  inhabitants = {:8d} citizens'.format(inhabitants))
    print('  density     = {:8.2f} citizens/km²'.format(density))

TODO

Specification

Support for variable (unknown) metadata
Support for format versioning metadata
May add optional RDF metadata
Add 'binary' type (base64-encoded)

Python Library

Python3 support

Tests

First be sure you installed all dependencies:

pip install -r requirements-development.txt

Then, to run the tests, just execute:

make test

Related Projects

pyeconlab

Python Library for Economists

03 Jul 2014 1

vertica-python

Official native Python client for the Vertica Analytics Database.

07 Aug 2013 378

lcdio

Lowest Common Denominator IO. Everything is a list of dictionaries!

04 Sep 2024 2

rows

A common, beautiful interface to tabular data, no matter the format

07 May 2014 865

records

SQL for Humans™

24 Dec 2014 7,057

brasil

Python library to access Brazilian data through Brasil.IO API

12 Jul 2018 12

treetable

Print in ascii art a table with a tree-like structure

01 Feb 2019 19

meza

A Python toolkit for processing tabular data

19 Aug 2015 414

csvs-to-sqlite

Convert CSV files into a SQLite database

13 Nov 2017 859

tablite

multiprocessing enabled out-of-memory data analysis library for tabular data.

27 Jun 2020 36

python-tabulate

Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated fr...

02 Sep 2019 2,137

datatable

A Python package for manipulating 2-dimensional tabular data structures

03 Mar 2017 1,809

turu-py

Simple Database API for Typed Python

12 Dec 2023 6

pyexcel

Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

22 Aug 2014 1,202

rasterio

Rasterio reads and writes geospatial raster datasets

04 Nov 2013 2,153