A tiny library for reading con files
MIT License
** Rationale
A common obstacle in visualization is the need to read specific file formats
which may not map cleanly to delimited file specifications like CSV. Additionally,
languages which are meant for interactive visuals, like Python or R, often have
sub-par file I/O capabilities for arbitrary formats.
To this end, a core C++ library with an Apache Arrow compatibility layer is a
reasonable solution which allows minimal, expressive bindings to supported Arrow
languages, while also providing simple enough interoperability with unsupported
languages like Fortran.
*** Existing solutions
It is tempting to reach for or recreate an entire ecosystem of
discipline-specific code, for example the Atomic Simulation Environment (ASE) or
AtomsBase.jl, both of which are excellent pedagogical aids with a rich
machinery of helpers. However, these are often not designed for interoperability
or speed.
** Design Decisions
Currently this implements the con format specification as written out by eON,
so some assumptions are made about the input files. Not all of these assumptions
are currently tested, and violating them is not guaranteed to throw
(contributions are welcome for additional sanity checks).
*** Single Frames
That is, we expect:
#+begin_src bash
Random Number Seed
Time
15.345600 21.702000 100.000000
90.000000 90.000000 90.000000
0 0
218 0 1
2
2 2
63.546000 1.007930
Cu
Coordinates of Component 1
   0.63940000000000108    0.90450000000000019    6.97529999999999539 1 0
   3.19699999999999873    0.90450000000000019    6.97529999999999539 1 1
H
Coordinates of Component 2
   8.68229999999999968    9.94699999999999740   11.73299999999999343 0 2
   7.94209999999999550    9.94699999999999740   11.73299999999999343 0 3
Random Number Seed
Time
15.345600 21.702000 100.000000
90.000000 90.000000 90.000000
0 0
218 0 1
2
2 2
63.546000 1.007930
Cu
Coordinates of Component 1
   0.63940000000000108    0.90450000000000019    6.97529999999999539 1 0
   3.19699999999999873    0.90450000000000019    6.97529999999999539 1 1
H
Coordinates of Component 2
   8.85495714285713653    9.94699999999999740   11.16538571428571380 0 2
   7.76944285714285154    9.94699999999999740   11.16538571428571380 0 3
#+end_src
Nothing else: no blank lines or extra whitespace between the con entries.
**** Why?
We read the entire file into memory and then map it to a collection of strings,
one per line. The first 9 lines are parsed to figure out how many lines are
needed for the rest of the (first) frame, and then this logic is repeated
en masse until the lines run out. Better memory management, streaming of files,
and more errors and sanity checks are all welcome as pull requests.
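The frame-splitting logic described above can be sketched in Python. This is an
illustrative sketch only, not the library's C++ implementation: ~Frame~ and
~parse_frames~ are hypothetical names. The meaning given to the header lines
(line 7 as the number of atom types, line 8 as the atoms per type, line 9 as the
masses) and to the last two columns of each atom row (a fixed flag and an atom
index) is an assumption read off the example file above.

#+begin_src python
from __future__ import annotations

from dataclasses import dataclass

HEADER_LINES = 9  # header length, as stated in the text above


@dataclass
class Frame:
    """One con frame; a hypothetical container, not the library's own type."""

    box_lengths: list[float]
    box_angles: list[float]
    masses: list[float]
    # One (symbol, x, y, z, is_fixed, atom_id) tuple per atom.
    atoms: list[tuple[str, float, float, float, bool, int]]


def parse_frames(text: str) -> list[Frame]:
    """Split con text into frames by reading each 9-line header in turn."""
    lines = text.splitlines()
    frames = []
    pos = 0
    while pos < len(lines):
        header = lines[pos:pos + HEADER_LINES]
        if len(header) < HEADER_LINES:
            raise ValueError(f"truncated header at line {pos + 1}")
        # Assumed layout: lines 3-4 hold the box, line 7 the number of atom
        # types, line 8 the atoms per type, line 9 the mass of each type.
        n_types = int(header[6])
        counts = [int(tok) for tok in header[7].split()]
        masses = [float(tok) for tok in header[8].split()]
        if len(counts) != n_types or len(masses) != n_types:
            raise ValueError(f"inconsistent header at line {pos + 1}")
        # Each type adds a symbol line, a label line, and one line per atom.
        end = pos + HEADER_LINES + sum(2 + n for n in counts)
        if end > len(lines):
            raise ValueError(f"truncated frame at line {pos + 1}")
        atoms = []
        cursor = pos + HEADER_LINES
        for n in counts:
            symbol = lines[cursor].strip()
            cursor += 2  # skip the symbol and "Coordinates of Component" lines
            for row in lines[cursor:cursor + n]:
                x, y, z, fixed, atom_id = row.split()
                atoms.append((symbol, float(x), float(y), float(z),
                              fixed != "0", int(atom_id)))
            cursor += n
        frames.append(Frame(
            box_lengths=[float(tok) for tok in header[2].split()],
            box_angles=[float(tok) for tok in header[3].split()],
            masses=masses,
            atoms=atoms,
        ))
        pos = end  # the next frame starts right after this one
    return frames
#+end_src

For the two-frame example above, each frame spans 9 + (2 + 2) + (2 + 2) = 17
lines. In this sketch a stray blank line between frames is read as the start of
a header and fails, usually with a ~ValueError~, which is one reason the format
is kept strict.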
** Development
*** Developing locally
A pre-commit job is set up on CI to enforce consistent styles, so it is best to
set it up locally as well (using [[https://pypa.github.io/pipx][pipx]] for isolation):
#+begin_src sh
pipx run pre-commit run --all-files
pipx run pre-commit install
#+end_src