zoa

serialized structured data and it's textual representation

UNLICENSE License

Stars
4

[!]##########################################################################[/] [h1]zoa: serialized data made easy[/]

["]Most content has moved to https://github.com/civboot/civlua/tree/main/zoa["]

["]This document is written using @cxt in README.cxt[/]

zoa is a set of standards related to serialized structured data and it's textual representation and processing. It is part of the @Civboot project, specifically @fngi.

["] zoa is named after "protozoa", which is itself a nod to @protobuf. Yes, I'm aware protozoa are not fungi, but the name was too good.[/]

zoa encompases the following technologies:[+]

  • [b]zty[b] .zty: specification for structured types similar to @protobuf
  • [b]zoab[b] .zoa: a binary structured data composed of only data and array.
  • [b]zoat[b] .zoa: an extensible text format for representing zoab in a
    human readable way. Zoat intentionally has little functionality on it's
    own but allows extensibility and a standardized syntax for the different
    tools built on top of it.
  • [b]zoac[b] .zc: a configuration lanugage built on zoat and and another
    langauge (i.e. @fngi), allowing execution of arbitrary (but
    stateless) user-defined functions.
  • [b]zoash[b] .zh: a text-based shell built on zoac allowing creating of
    processes, mutation of files or anything else a computer user might want
    to do. zoac functions are still callable to process data, but other
    functions are available as well perform side effects. File extension:
    .zh
    [/]

["]Yes, zoab and zoat have the same file extension. This is intentional and you will see how below![/]

Both zoab and zoat are extremely simple.

zoab is about the simplest possible general purpose structured data that can exist. Having only two types allows it to be compact and fast. The use of single byte signals alows it to be trivially deserialized by even embedded system.

zoat is simpler than yaml but can achieve the same functionaly but with far less ambiguitity. The only thing it is missing is integer/float/literal-map/[ ]none types, which are not actually necessary for serializing or deserializing arbitrary structured data. Zoat also has massively increased functionality over yaml with features like variable storage and concatenation.

[t set=protobuf r=https://developers.google.com/protocol-buffers]protobuf[/] [t set=cxt r=https://github.com/civboot/cxt]cxt[/] [t set=Civboot r=https://civboot.org]Civboot[/] [t set=fngi r=http://github.com/civboot/fngi]fngi[/]

[!]##########################[/] [h2]zoat[/]

zoat (like zoab) has only two data types: data and array. Data start at the first non-whitespace character (except for | or {, which we will get to) and continue until a pipe. For example:

[###] This is a zoat string. It ends with a pipe| [###]

An array is specified using curly braces and can be nested, like this: [###] { first string in array| second string in array| { nested array element 1| nested array element 2| } } [###]

Leading whitespace is always ignored until the first character, after which newlines are treated as spaces and special characters use c-like escapes. Here are the valid escapes in text: [###] \n newline \t tab | a literal '|' character { a literal '{' character } a literal '}' character \s \ a literal space character \xHH hex byte of value 0xHH <newline> line continuation. When \ is used at the end of a line no space will be insert when going to the next line.

  An example,
  there is one space after the comma above. \
  \  However there are three spaces after the period since
  an escape was used both before and after the newline.|

[###]

The following are how to use commands. Commands must be specified immediately following either a pipe or curly brace:

[###] |X Where X is a non-whitespace character. Runs a command, some of which are defined below. Note that multiple '|' together are ignored, i.e. '|| | | ||' is the same as '|'.

{ Starts an array. Characters following are always treated as text (never a command), but whitespace is still ignored. Example: {array item 1| array item 2}

} Ends an array. Characters following are always treated as text (never a command), but whitespace is still ignored. Example above.

          Note: {{}} will create an empty array inside a array. However,
          adding '|' does nothing. i.e. '|{| {||} |}|' is exactly the same.
          This allows for trailing pipes. If you've ever used JSON you know
          the pain of not being able to use trailing commas, this avoids
          that pain.

| A set of data until the next '|', '{' or '}'. There must be whitespace (newline, tab or space) after | otherwise the character will be treated as a command. Whitespace will be ignored until the first non-whitespace character.

|\ Starts text immediately using the escape. Example: |\nThis text started on a new line.|

|* Line comment. Text is ignored until a newline.

{* Block comment {nesting allowed}, it must be closed with: } After a closing block comment a new command can start immediately. Example: { comment *} |h1 header

|''' Plain-text/code block. Any number of ' can be used, and must be escaped with the same number of ', which ends the item (but will not execute a command. As a general rule only '|' executes commands). The raw text will ignore escapes (i.e. \n) and preserve newlines, except for the first newline if it immediately follows the opening ' block or the last newline if it immediately preceeds the ' block. Also, the text will be de-indented by the indent at the start of the command. Example: |''' This is some raw text. It will have no indent. I can use { | } \n etc and they are interpreted literally. I can even end with a space. ''' This is a new item, as if the previous had ended with pipe|

|+ Concatenate (join) this item to the last item of the same type. Note that text starts immediately. Example: all one |+single item. ==> all one single item.| {1| 2} |+ {3| 4} ==> {1| 2| 3| 4}

|. Begins the variable compiler, similar to fngi's. When assigning variables, values are not added until referenced. Examples:

            |.foo = fngi |              |* set foo equal to "fngi "
            { I think |.foo|is great } |* { I think |fngi |is great }

          An array can be accessed by index using @<index>

            |.myArray @0 @3|            |* C equivalent: myArray[0][3]

          If the data is an array of two-length arrays with strings as the
          first item, you can access by key with '.'. Whitespace or special
          characters in the keys are not supported.

            |.myDict.foo.bar|

|.+ Joins the variable to the last item. Example: { I think |.+foo|+is great } |* { I think fngi is great }

|[0-9] An integer (big-endian). 0x (hex), 0b (binary) and 0c (char) are supported. (see Type Specification Dyn)

|-[0-9] A negative integer (big-endian, see Type Specification Dyn) [###]

[!]##########################[/] [h2]zoab: structured array and byte data[/]

The binary representation of zoa is called zoab. Like zoat it is composed of only two types: data and array. Both can only have sections of length 63, but a join bit may be used to make longer values.

The start of a zoab item has a single byte which specifies it's type, join status and length. it looks like:

[###] Bitmap Description JTLL LLLL : J=join bit T=type bit L=length bits (0-63) [###]

[+]

  • J can be 1, which will cause it to be "joined" to the next item (which
    must be the same type). This allows the true length to be longer than 63
    (any length is possible with multiple joins).
  • T can be 0 for "data" or 1 for "array".
  • L contains a length (0-63) of the data or array.
    [/]

[!]##########################[/] [h2]Binary Runtime Type Selection[/]

There is one byte that is invalid for both zoab and utf8 (and ascii): J=1 T=0 L=0, which in binary is 1000 0000 or in hex is 0x80.

Instead of making this byte illegal, we will require it in the zoa(...) function, which is intended to deserialize either zoab or zoat from user inputs, data stored in file systems or data transferred over the wire. If 0x80 is the first character then it is assumed this is zoab data. If 0x80 is [i]not[i] the first byte then it should be assumed that the file/etc is human written zoat.

["] 0x80 will still be illegal for pure-zoab related functions, which expect the data to already be deserialzed.[/]

0x80 will signal that the next byte "steam type". The supported ones are:[+]

  1. pure unstructured data with no special meaning.
  2. mostly human readable data. i.e. struct fields are names, integers are in
    text format, etc. This is typically what should be passed to dbg logs, human
    readable compressed configs, etc.
  3. protozoa data, which is similar to protobuf. Fields are represented by
    user-assigned integers, data is in binary format, etc. The deserializer
    obviously needs access to a backwards-compatible reference (i.e. a struct
    definition) to know how to deserialize this data.
  4. log event, used for zoa's own core logging.
    [/]

We also reserve J=1 T=1 L=0 to mean that the next data value is actually a pointer to more data. This allows fngi (or other languages) to use a maximum block size of 4kB while still permitting arbitrary length zoab data.

[h2]Type Specification[/] ["][b]WARNING:[b] This section is currently being worked on so is not yet polished.[/]

Zoa also has a default type protocol and text specification for generating serializers and deserializers for zoab data. These are typically contained in .zt files.

Zoa types are specified in a language similar to protobufs, with a few key differences. For an overview:

Native types include:[+]

  • Data, Int, Num, Time (integers: seconds, nanoseconds), Path
    (array of components)
  • Arr(generic)
  • IntMap(Int key, generic value), DataMap(Data key, generic value)
  • Dyn: dynamic (runtime) type
    [/]

Users can create their own struct types with the format below. Struct fields can be positional or have an id. Default values can be provided, otherwise all arguments are required.

[###] \ Comments use '' character struct Foo [ a: Int; \ positional required argument. b: Int = 0; \ positional argument with default. c: Data = ||; \ data argument with default. d:0 Int = 0; \ indexed argument with default. d:1 Arr[Int] = 0; ] [###]

The struct protocol is:[+]

  • The first items contains the number of positional args P.
  • The next P items contains the first P specified positional arguments.
    Any unspecified positional args must have a default value.
  • Indexed args can be specified in any order.
  • Any unspecified args must have a default.
    [/]

Similarily, enumerated values can be defined. Enums must always specify their id. All ids from [[0 - max specified]] must be declared (in any order).

[#] struct Cheese [ isCheddar: Int ] struct Cake [ isChocolate: Int ] struct Pizza [ cheese: Cheese = { isCheddar|1| }; hasPeperoni: Int = 0; ]

enum Food [ none:0; \ note: empty type, no data someNuts:3 Int; \ note: declared out of order (OK) pizza:1 Pizza; cake:2 Cake; cheeses:3 Arr[Cheese]; ] [#]

The enum protocol is:[+]

  • The first item contains the enum variant, i.e. noaccount, user, admin
  • If the variant is not an Arr or *Map, the next item is the value.
  • else, the next N values are the data for the array.
    [/]

[!]##########################[/] [h2]Contributing[/]

To build the README.md and run the tests, simply run make.

When opening a PR to submit code to this repository you must include the following disclaimer in your first commit message:

[###] I assent to license this and all future contributions to this project under the dual licenses of the UNLICENSE or MIT license listed in the UNLICENSE and README.md files of this repository. [###]

[!]##########################[/] [h2]LICENSING[/]

This work is part of the @Civboot project and therefore primarily exists for educational purposes. Attribution to the authors and project is appreciated but not necessary.

Therefore this body of work is licensed using the UNLICENSE unless otherwise specified at the beginning of the source file.

If for any reason the UNLICENSE is not valid in your jurisdiction or project, this work can be singly or dual licensed at your discression with the MIT license below.

[###] Copyright 2022 Garrett Berg

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. [###]