Reference implementation of a new tabular data format created by dataprotocols
This is a reference implementation of the new tabular file standard discussed in the dataprotocols repository.
It looks like CSV but has stricter rules, which reduces ambiguity and processing complexity (see "Specification" below).
Specification

- Rows whose first character is `#` are discarded by the parser;
- Row separator: `0x0A` (new line, often represented as `\n`). Any `0x0A` in the file is a row separator (no exceptions);
- Field separator: `0x09` (tab character, often represented as `\t`). Any `0x09` in the file is a field separator (no exceptions);
- Escape sequences: `\n` for new line (`0x0A`), `\t` for tab (`0x09`), `\N` for null (absence of data), `\#` to start a new row with `#` as the first character of the first field, `\\` for backslash (`0x5C`);
- Field types: `bool`, `int`, `float`, `date`, `datetime`, `text`, `binary`.
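The separator and escape rules above can be illustrated with a small Python 3 sketch. It is not the parser shipped in this repository: the names `unescape_field` and `parse_rows` are hypothetical, and several details are assumptions (a field is null only when it consists of exactly `\N`, escapes are decoded anywhere in a field, a trailing new line at the end of the data does not create an extra row, and type conversion is left out).

```python
import re

# Escape sequences listed in the specification (null, \N, is handled apart).
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", "#": "#"}
NULL_MARKER = r"\N"


def unescape_field(raw):
    """Decode one raw field; returns None for the null marker (assumption:
    the marker must be the entire field)."""
    if raw == NULL_MARKER:
        return None
    # Scan left to right so an escaped backslash is never re-read
    # as the start of another escape sequence.
    return re.sub(r"\\([nt\\#])", lambda m: _ESCAPES[m.group(1)], raw)


def parse_rows(data):
    """Yield each row of `data` (a str) as a list of decoded fields."""
    lines = data.split("\n")          # every 0x0A is a row separator
    if lines and lines[-1] == "":     # assumption: ignore the final new line
        lines.pop()
    for line in lines:
        if line.startswith("#"):      # comment rows are discarded
            continue
        # every 0x09 is a field separator
        yield [unescape_field(field) for field in line.split("\t")]
```

Because separators never appear unescaped inside a field, splitting on `0x0A` and then on `0x09` is enough; no quoting state machine is needed, which is the main simplification over CSV.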
The interface resembles Python's `csv` module: it provides `DictReader` and `DictWriter` classes.
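For readers unfamiliar with that interface, the sketch below shows how the standard library's `csv.DictWriter` and `csv.DictReader` are used in Python 3. It exercises only the stdlib `csv` module; whether the classes in `row` accept the same arguments (for example `fieldnames`) is an assumption, and the sample record is purely illustrative.

```python
import csv
import io

# Write one record to an in-memory buffer using the stdlib csv module.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "state"])
writer.writeheader()
writer.writerow({"city": "Niterói", "state": "RJ"})

# Read it back: each row comes out as a mapping keyed by the header names.
buffer.seek(0)
records = list(csv.DictReader(buffer))
```

Each element of `records` maps column names to string values, which is the access pattern (`record['state']`) that a `csv`-like reader offers.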
Given the file `brazilian-cities.row`, we can read it like this:
    # coding: utf-8
    import row

    cities = row.DictReader(open('brazilian-cities.row', 'rb'))
    for city in cities:
        # Skip cities outside the state of Rio de Janeiro (RJ)
        if city['state'] != 'RJ':
            continue
        area = city['area']
        inhabitants = city['inhabitants']
        # Population density in citizens per square kilometre
        density = inhabitants / area
        print('{}:'.format(city['city']))
        print(' area = {:8.2f} km²'.format(area))
        print(' inhabitants = {:8d} citizens'.format(inhabitants))
        print(' density = {:8.2f} citizens/km²'.format(density))
First, make sure all dependencies are installed:

    pip install -r requirements-development.txt

Then, to run the tests, execute:

    make test