avrora

A convenient Elixir library to work with Avro messages, schemas and Confluent® Schema Registry

MIT License

avrora - Configurable schemas auto-registration

Published by Strech about 4 years ago

It's been a while since the last release. This one brings many good changes.

New configuration option

Starting with this version, you can explicitly split the schema reader from the schema writer. This means you no longer need local schema files if you want to rely only on schemas already registered in the registry.

config :avrora,
  # ...
  registry_schemas_autoreg: false

If you disable auto-registration and have the schema registry configured, two major behavior changes follow:

  1. Local files are completely ignored for schema resolution
  2. For encoding and decoding, the schema is retrieved from the registry (see item 1)
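
The combinations described here can be summarized as a small decision table. This is only a sketch: the `ResolutionSketch` module, its function and the returned atoms are hypothetical names used for illustration and are not part of Avrora.

```elixir
defmodule ResolutionSketch do
  # Hypothetical decision table; not Avrora's implementation.
  # `nil` as the first argument means the registry is not configured.

  # No registry: schemas come from local files, exactly as before.
  def schema_source(nil, _autoreg?), do: :local_files

  # Registry configured, auto-registration disabled: local files are ignored.
  def schema_source(_registry_url, false), do: :registry_only

  # Registry configured, auto-registration enabled: unchanged behavior.
  def schema_source(_registry_url, true), do: :local_files_and_registry
end
```

Calling `ResolutionSketch.schema_source("http://localhost:8081", false)` would return `:registry_only`, which is the new reader-only mode.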

If the schema registry is not configured, behavior should remain the same 😉

New Avrora.Codec interface

The Encoder module got some attention and was refactored. It is now split into several submodules which implement the same behavior, so the code can be reused and tested separately. Everything comes at a cost, but hopefully the pros and cons balance each other out.

Happy coding everyone, and any feedback is welcome 🤗

avrora - Extract and Register schemas like a boss

Published by Strech over 4 years ago

This release brings two new features: one in the public API of the library and one in the CLI.

Avrora.extract_schema/1

Extracts a schema from an encoded message, which is useful when you want some metadata about the schema used to encode the message. All retrieved schemas are cached according to the settings.

{:ok, pid} = Avrora.start_link()
message =
  <<79, 98, 106, 1, 3, 204, 2, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101, 99,
    8, 110, 117, 108, 108, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101, 109, 97,
    144, 2, 123, 34, 110, 97, 109, 101, 115, 112, 97, 99, 101, 34, 58, 34, 105,
    111, 46, 99, 111, 110, 102, 108, 117, 101, 110, 116, 34, 44, 34, 110, 97, 109,
    101, 34, 58, 34, 80, 97, 121, 109, 101, 110, 116, 34, 44, 34, 116, 121, 112,
    101, 34, 58, 34, 114, 101, 99, 111, 114, 100, 34, 44, 34, 102, 105, 101, 108,
    100, 115, 34, 58, 91, 123, 34, 110, 97, 109, 101, 34, 58, 34, 105, 100, 34, 44,
    34, 116, 121, 112, 101, 34, 58, 34, 115, 116, 114, 105, 110, 103, 34, 125, 44,
    123, 34, 110, 97, 109, 101, 34, 58, 34, 97, 109, 111, 117, 110, 116, 34, 44,
    34, 116, 121, 112, 101, 34, 58, 34, 100, 111, 117, 98, 108, 101, 34, 125, 93,
    125, 0, 84, 229, 97, 195, 95, 74, 85, 204, 143, 132, 4, 241, 94, 197, 178, 106,
    2, 26, 8, 116, 120, 45, 49, 123, 20, 174, 71, 225, 250, 47, 64, 84, 229, 97,
    195, 95, 74, 85, 204, 143, 132, 4, 241, 94, 197, 178, 106>>

{:ok, schema} = Avrora.extract_schema(message)
{:ok,
 %Avrora.Schema{
   full_name: "io.confluent.Payment",
   id: nil,
   json: "{\"namespace\":\"io.confluent\",\"name\":\"Payment\",\"type\":\"record\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"}]}",
   lookup_table: #Reference<0.146116641.3853647878.152744>,
   version: nil
 }}

Many thanks to @apellizzn for the help!

mix avrora.reg.schema

A separate mix task to register a specific schema or all schemas found in the schemas folder.

For instance, if you configure the Avrora schemas folder to be ./priv/schemas and you want to register the schema io/confluent/Payment.avsc, you can use this command:

$ mix avrora.reg.schema --name io.confluent.Payment
schema `io.confluent.Payment` will be registered

NOTE: It will look for the schema at ./priv/schemas/io/confluent/Payment.avsc

If you would like to register all schemas found under ./priv/schemas, you can simply execute this command:

$ mix avrora.reg.schema --all
schema `io.confluent.Payment` will be registered
schema `io.confluent.Wrong' will be skipped due to an error `argument error'

I hope you enjoy it ❤️

avrora - Basic auth for Confluent Schema Registry

Published by Strech over 4 years ago

Starting with version 0.11.0, a new configuration option is available to accompany the registry_url option.

config :avrora,
  # ...
  registry_url: "http://...",
  registry_auth: {:basic, ["username", "password"]}
  # ...

avrora - Schema evolution (from 🐸 into 👨‍🚀)

Published by Strech over 4 years ago

This release is a fix for the schema evolution process.

Before v0.10.0, if you had Schema Registry enabled, the very first schema version was used forever, even if you updated the schema files and restarted the service.

Starting with v0.10.0, a few major changes happen.

The flow

The schema resolution flow has changed. Now, if a schema has never been resolved (i.e. it was not found in Avrora.Storage.Memory), the schema file is always read, whether or not the name contains a version.

Then, if the name does contain a version (for instance, when decoding a message), the registry is checked and the schema is looked up there.

If the name does not contain a version (for instance, when encoding a message), Avrora tries to register the schema in the registry. Luckily, Confluent Schema Registry allows you to register the same schema many times because it verifies the schema's hash sum (i.e. this is an idempotent operation).

All of the above leads to the next change, which I consider breaking.

The generic name resolution

The names cache TTL was changed from 5 minutes to infinity. Why? Simply because a name will always be resolved to the latest available schema, and if that schema is compatible, we are good. If it's not, you will have to re-deploy your code anyway (yes, hot-reload is still an open question; if you have suggestions or problems, feel free to create an issue).

If you want periodic disk reads of your schemas, set it to something lower than infinity. Nonetheless, it's a public interface change, so I call it breaking! Boooo 👻

Happy coding 👨‍💻 and don't forget to wash your hands ✋
P.S. Thanks @coryodaniel for the issue report and collaboration 🤗

avrora - Better erlavro compliance

Published by Strech over 4 years ago

This is a very minor release; the changes are internal only.

To avoid discrepancies between the erlavro and Avrora libraries, the ETS host was renamed and re-implemented to use the :avro_schema_store.new/0 call instead of :ets.new/2.

Under the hood erlavro still uses :ets.new/2, but this responsibility has now shifted from Avrora to erlavro.

Happy coding, everyone 👍

avrora - Fixed broken ETS references for short-living processes

Published by Strech almost 5 years ago

It turns out that when Avrora is used inside Phoenix, whose controllers are short-lived processes, all lookup tables generated by Avrora.Schema (which are in fact Erlang term stores, i.e. ETS tables) are cleaned up once the controller finishes processing the request.

But all the resolved schemas are stored in Avrora.Storage.Memory, which means that after the Phoenix controller process dies, all the references to ETS inside the Memory module are broken.

This release introduces an ETS host process that will own all the generated stores.
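
The idea behind such a host can be sketched in a few lines. An ETS table is deleted when its owner process exits, so creating the table inside a long-lived process keeps it alive. The `EtsHostSketch` module below is a hypothetical illustration under that assumption and not Avrora's actual host implementation.

```elixir
defmodule EtsHostSketch do
  # Hypothetical module: a long-lived process that owns ETS tables on
  # behalf of short-lived callers.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # The table is created inside the host, so the host (not the caller) owns it.
  def new(host), do: GenServer.call(host, :new)

  @impl true
  def init(:ok), do: {:ok, []}

  @impl true
  def handle_call(:new, _from, tables) do
    # :public lets other processes read and write via the returned reference.
    table = :ets.new(:schema_lookup, [:public])
    {:reply, table, [table | tables]}
  end
end

{:ok, host} = EtsHostSketch.start_link()

# A short-lived process (think of a request handler) fills a table and exits.
table =
  Task.await(
    Task.async(fn ->
      table = EtsHostSketch.new(host)
      :ets.insert(table, {"some", 42})
      table
    end)
  )

# The data is still reachable because the host owns the table.
[{"some", 42}] = :ets.lookup(table, "some")
```

Had the task called `:ets.new/2` directly, the table would have been removed as soon as the task finished.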

New functionality

A test helper module was added. Since Avrora.Storage.Memory can share state between tests, it's always better to either mock it or clean the state. Here is an example:

defmodule MyTest do
  use ExUnit.Case, async: true

  import Avrora.TestCase
  setup :cleanup_storage!

  test "memory storage was filled" do
    assert Avrora.Storage.Memory.get("some") == nil

    Avrora.Storage.Memory.put("some", 42)
    assert Avrora.Storage.Memory.get("some") == 42
  end

  test "memory storage is clean" do
    assert Avrora.Storage.Memory.get("some") == nil
  end
end

Minor changes

  • Some documentation improvements
  • Avrora.Name was renamed to Avrora.Schema.Name

avrora - Performance and memory improvements

Published by Strech almost 5 years ago

Thanks to @ananthakumaran, who spotted an issue with the new inter-schema references feature.

The issue became visible with big schema files. The reference collection process contained a bug that led to massive memory allocations during schema traversal. Now it has been fixed, yay 🙌

Happy 2020 🎉

avrora - Inter-schema references

Published by Strech almost 5 years ago

This is a feature release 🎊

From the very beginning, this library was heavily inspired by the simplicity and features of avro_turf. Now Avrora moves one step closer to the feature set avro_turf provides.

The must-have inter-schema references feature comes to Avrora. Now you can split a huge schema into smaller pieces and glue them together via references.

What is a reference?

A reference is the canonical full name of a schema. According to Avrora's name-to-location rules, if you have a schema under io/confluent/Message.avsc, its full name (namespace + name) will be io.confluent.Message.

How do references work?

Technically, the Avro specification doesn't support inter-schema references, only local-schema references. Because of this limitation, inter-schema references are implemented by embedding the referenced schema into the schema that contains the reference and replacing all other references within that schema with local references.
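
The embedding step can be sketched on already-decoded schemas (plain maps, lists and strings). This is a simplified illustration, not Avrora's implementation: `ReferenceSketch` and its functions are hypothetical, every schema is assumed to share one namespace (so a short name is a valid local reference), and the embedded definition keeps its `namespace` key.

```elixir
defmodule ReferenceSketch do
  # Hypothetical illustration only; not Avrora's actual implementation.

  # Keys whose values are type positions in a decoded Avro schema.
  @type_keys ["type", "items", "values", "fields"]

  # `known` maps a full name such as "io.confluent.Message" to its definition.
  def expand(schema, known) do
    {expanded, _seen} = walk(schema, known, MapSet.new())
    expanded
  end

  defp walk(%{} = node, known, seen) do
    Enum.reduce(node, {%{}, seen}, fn {key, value}, {acc, seen} ->
      if key in @type_keys do
        {value, seen} = walk(value, known, seen)
        {Map.put(acc, key, value), seen}
      else
        {Map.put(acc, key, value), seen}
      end
    end)
  end

  defp walk(list, known, seen) when is_list(list) do
    Enum.map_reduce(list, seen, fn item, seen -> walk(item, known, seen) end)
  end

  defp walk(name, known, seen) when is_binary(name) do
    cond do
      # Already embedded earlier: point at it with a local reference.
      MapSet.member?(seen, name) ->
        {local_name(name), seen}

      # First occurrence: embed the referenced definition in place.
      Map.has_key?(known, name) ->
        walk(Map.fetch!(known, name), known, MapSet.put(seen, name))

      # Primitive type or an unknown name: leave untouched.
      true ->
        {name, seen}
    end
  end

  defp walk(other, _known, seen), do: {other, seen}

  defp local_name(full_name), do: full_name |> String.split(".") |> List.last()
end

message = %{
  "type" => "record",
  "name" => "Message",
  "namespace" => "io.confluent",
  "fields" => [%{"name" => "text", "type" => "string"}]
}

array_of_messages = %{"type" => "array", "items" => "io.confluent.Message"}

messenger = %{
  "type" => "record",
  "name" => "Messenger",
  "namespace" => "io.confluent",
  "fields" => [
    %{"name" => "inbox", "type" => array_of_messages},
    %{"name" => "archive", "type" => array_of_messages}
  ]
}

expanded = ReferenceSketch.expand(messenger, %{"io.confluent.Message" => message})
[inbox, archive] = expanded["fields"]

IO.inspect(inbox["type"]["items"]["name"])  # prints "Message" (embedded record)
IO.inspect(archive["type"]["items"])        # prints "Message" (local reference)
```

The first reference is replaced by the full definition and every later one by the short name, which is the shape the compiled schema takes.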

How to use references?

For example, you have a Messenger schema which contains references to
the Message schema:

priv/schemas/io/confluent/Messenger.avsc

{
  "type": "record",
  "name": "Messenger",
  "namespace": "io.confluent",
  "fields": [
    {
      "name": "inbox",
      "type": {
        "type": "array",
        "items": "io.confluent.Message"
      }
    },
    {
      "name": "archive",
      "type": {
        "type": "array",
        "items": "io.confluent.Message"
      }
    }
  ]
}

priv/schemas/io/confluent/Message.avsc

{
  "type": "record",
  "name": "Message",
  "namespace": "io.confluent",
  "fields": [
    {
      "name": "text",
      "type": "string"
    }
  ]
}

The final compiled schema, which will be stored and registered in the Confluent
Schema Registry, will look like this:

{
  "type": "record",
  "name": "Messenger",
  "namespace": "io.confluent",
  "fields": [
    {
      "name": "inbox",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "Message",
          "fields": [
            {
              "name": "text",
              "type": "string"
            }
          ]
        }
      }
    },
    {
      "name": "archive",
      "type": {
        "type": "array",
        "items": "Message"
      }
    }
  ]
}

💢 In the case of avro_turf, the archive field keeps its canonical items
type reference io.confluent.Message instead of the local reference Message.

avrora - Documentation improvements

Published by Strech about 5 years ago

In this minor release, the documentation was greatly improved by @reachfh: clearer descriptions, more precise statements and better consistency.

Thanks ❤️

avrora - Fixed sub-type name resolution

Published by Strech about 5 years ago

When you define a complex schema with, for instance, an array of records and you want to re-use a type you created in that array definition, you can simply use the name of that record.

{
  "type": "record",
  "name": "Messenger",
  "namespace": "io.confluent",
  "fields": [
    {
      "name": "inbox",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "Message",
          "fields": [
            {
              "name": "text",
              "type": "string"
            }
          ]
        }
      }
    },
    {
      "name": "archive",
      "type": {
        "type": "array",
        "items": "io.confluent.Message"
      }
    }
  ]
}

In Avrora v0.6, however, this throws an error because the erlavro schema was stored and used as-is. Meanwhile, the Avro specification allows a record type to be defined once and then re-used during schema resolution.

Since v0.7, schema resolution is done through erlavro's built-in schema storage mechanism, which takes care of all the type names you define in the schema.

Thanks ❤️

avrora - Store better, encode faster 🚀

Published by Strech about 5 years ago

This release adds a new configuration option, names_cache_ttl. It controls how long a schema name without a version or id is resolved from the memory storage.

It can be set either to a numeric value (in milliseconds) or to the atom :infinity.

config :avrora, names_cache_ttl: :timer.minutes(5)

In the example above, schemas resolved by name are removed after 5 minutes and, if the registry is used, fetched from it once more.

This is done not only to boost performance, but also to avoid situations where your service needs to be re-deployed or restarted to receive a new schema version.
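
To make the expiry rule concrete, here is a minimal sketch of a TTL lookup. `TtlCacheSketch` is a hypothetical module, not Avrora.Storage.Memory, and the current time is passed in explicitly (in milliseconds) so the example stays deterministic.

```elixir
defmodule TtlCacheSketch do
  # Hypothetical cache kept in a plain map; for illustration only.

  # Store a value together with the moment it stops being valid.
  def put(cache, key, value, ttl, now) do
    expires_at = if ttl == :infinity, do: :infinity, else: now + ttl
    Map.put(cache, key, {value, expires_at})
  end

  # Return the value while it is fresh, `nil` once it expired or if missing.
  def get(cache, key, now) do
    case Map.get(cache, key) do
      {value, :infinity} -> value
      {value, expires_at} when now < expires_at -> value
      _expired_or_missing -> nil
    end
  end
end

cache = TtlCacheSketch.put(%{}, "io.confluent.Payment", :schema, :timer.minutes(5), 0)

:schema = TtlCacheSketch.get(cache, "io.confluent.Payment", 299_999)
nil = TtlCacheSketch.get(cache, "io.confluent.Payment", 300_000)
```

A `nil` result is the point at which the name would be resolved again, so a newer schema version can be picked up without a restart.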

avrora - Fix schema resolution by name via schema registry

Published by Strech about 5 years ago

The documentation of the Confluent Schema Registry makes false statements in its examples section. A new attribute :id was added to the result of Avrora.Storage.Registry.get/1 (when called with a binary).

From now on, the registry storage always returns the id and version fields filled in when getting a schema by name.

avrora - Night programming brings night fixes

Published by Strech about 5 years ago

Unfortunately, night programming is quite error-prone and, as we know, the devil is in the details 👿

This release fixes a mistake in encoding and decoding for the schema registry. I misread the Confluent documentation and was using the schema version with the magic byte instead of the schema id with the magic byte.
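
For reference, the Confluent wire format puts a zero magic byte and a 4-byte big-endian schema id in front of the Avro body. The sketch below shows that framing with binary pattern matching; `WireFormatSketch` is a hypothetical helper and not part of Avrora.

```elixir
defmodule WireFormatSketch do
  # Hypothetical helper; illustrates the framing only.

  # Prefix an already Avro-encoded body with the magic byte (0) and the
  # global schema id as a 32-bit big-endian integer.
  def wrap(schema_id, body) when is_integer(schema_id) and is_binary(body) do
    <<0, schema_id::unsigned-big-integer-size(32), body::binary>>
  end

  # Split a framed message back into the schema id and the Avro body.
  def unwrap(<<0, schema_id::unsigned-big-integer-size(32), body::binary>>) do
    {:ok, schema_id, body}
  end

  def unwrap(_other), do: {:error, :no_magic}
end

# The body is the plain-encoded payment used in other examples of these notes.
body = <<8, 116, 120, 45, 49, 123, 20, 174, 71, 225, 250, 47, 64>>
framed = WireFormatSketch.wrap(42, body)

{:ok, 42, ^body} = WireFormatSketch.unwrap(framed)
```

The number after the magic byte is the global schema id, not the per-subject version, which is exactly the distinction this release corrects.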

As part of the fix, the caching behaviour was changed, mainly the caching keys:

  • name:version – for schemas with a known version
  • gid – for schemas with a known global id
  • name – for schemas with no magic and no registry enabled

Despite the mistake, Avrora.decode/1 should work as expected even for previous releases.

Sorry 🙇

avrora - Encoding with full control of the format

Published by Strech about 5 years ago

From now on, Avrora.encode/2 supports an additional option :format, which can be set to:

  • :ocf - embeds the schema using the Object Container Files format
  • :registry - embeds the Confluent Schema Registry magic version
  • :plain - only encodes the message with nothing embedded
  • :guess - falls back to :ocf if it can't behave like :registry (default)

With this option, you can format the encoded message with either the special version magic or an entire schema.

💢 IMPORTANT: Object Container Files implies a multiple-message format. This means that once you encode your message, it will be wrapped in a list.

Breaking changes

This library aims for convenience when working with Avro messages. Initially, encoding a message without the schema registry produced a plain bitstring with nothing embedded.

But what if you have a single place with various messages in it? How are you going to tell which message is encoded with which schema? As you can see, plain encoding could be a problem in certain cases.

To avoid that, the interface of Avrora.Encoder.encode/2 was slightly changed to always return a decodable message, even if you don't know the schema name. The plain format is now an explicit option.

before

message = %{"id" => "tx-1", "amount" => 15.99}

{:ok, pid} = Avrora.start_link()
{:ok, encoded} = Avrora.encode(message, schema_name: "io.confluent.Payment")
<<8, 116, 120, 45, 49, 123, 20, 174, 71, 225, 250, 47, 64>>

Avrora.decode(encoded)
{:error, :undecodable}

after

message = %{"id" => "tx-1", "amount" => 15.99}

{:ok, pid} = Avrora.start_link()
{:ok, encoded} = Avrora.encode(message, schema_name: "io.confluent.Payment")
<<79, 98, 106, 1, 3, 204, 2, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101, 99,
  8, 110, 117, 108, 108, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101, 109, 97,
  144, 2, 123, 34, 110, 97, 109, 101, 115, 112, 97, 99, 101, 34, 58, 34, 105,
  111, 46, 99, 111, 110, 102, 108, 117, 101, 110, 116, 34, 44, 34, 110, 97, 109,
  101, 34, 58, 34, 80, 97, 121, 109, 101, 110, 116, 34, 44, 34, 116, 121, 112,
  101, 34, 58, 34, 114, 101, 99, 111, 114, 100, 34, 44, 34, 102, 105, 101, 108,
  100, 115, 34, 58, 91, 123, 34, 110, 97, 109, 101, 34, 58, 34, 105, 100, 34,
  44, 34, 116, 121, 112, 101, 34, 58, 34, 115, 116, 114, 105, 110, 103, 34, 125,
  44, 123, 34, 110, 97, 109, 101, 34, 58, 34, 97, 109, 111, 117, 110, 116, 34,
  44, 34, 116, 121, 112, 101, 34, 58, 34, 100, 111, 117, 98, 108, 101, 34, 125,
  93, 125, 0, 138, 124, 66, 49, 157, 51, 242, 3, 33, 52, 161, 147, 221, 174,
  114, 48, 2, 26, 8, 116, 120, 45, 49, 123, 20, 174, 71, 225, 250, 47, 64, 138, 
  124, 66, 49, 157, 51, 242, 3, 33, 52, 161, 147, 221, 174, 114, 48>>

Avrora.decode(encoded)
{:ok, [%{"id" => "tx-1", "amount" => 15.99}]}

avrora - Fixed Avro.Encoder exception handling

Published by Strech about 5 years ago

When erlavro throws an error, it is now handled gracefully.

MISC: Also, the library's dev configuration schemas_path was set to test/fixtures/schemas

avrora - Application.start/2 callback

Published by Strech about 5 years ago

A callback for automatic application start was implemented.

UPDATE: It was removed in v0.3.0 ⚠️

avrora - Add debug output

Published by Strech about 5 years ago

Useful debug output was added to some operations, for instance:

20:44:30.005 [debug] reading schema `io.confluent.Payment` from the file /Volumes/SSDCard/Development/github/avrora/test/fixtures/schemas/io/confluent/Payment.avsc

Also, the default value of Avrora.Config.schemas_path was fixed.

avrora - Object Container Files and agnostic decoding

Published by Strech about 5 years ago

Avro messages can be self-sufficient when they are encoded together with their schema. To decode them, you don't need to know anything about the message.

Now it is possible to decode them either via the new agnostic Avrora.decode/1 or the old Avrora.decode/2. Check it out:

message = <<79, 98, 106, 1, 3, 204, 2, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101, 99, 8, 110, 117,108, 108, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101, 109, 97, 144, 2, 123, 34, 110, 97,109, 101, 115, 112, 97, 99, 101, 34, 58, 34, 105, 111, 46, 99, 111, 110, 102, 108, 117, 101,110, 116, 34, 44, 34, 110, 97, 109, 101, 34, 58, 34, 80, 97, 121, 109, 101, 110, 116, 34,44, 34, 116, 121, 112, 101, 34, 58, 34, 114, 101, 99, 111, 114, 100, 34, 44, 34, 102, 105,101, 108, 100, 115, 34, 58, 91, 123, 34, 110, 97, 109, 101, 34, 58, 34, 105, 100, 34, 44,34, 116, 121, 112, 101, 34, 58, 34, 115, 116, 114, 105, 110, 103, 34, 125, 44, 123, 34, 110,97, 109, 101, 34, 58, 34, 97, 109, 111, 117, 110, 116, 34, 44, 34, 116, 121, 112, 101, 34,58, 34, 100, 111, 117, 98, 108, 101, 34, 125, 93, 125, 0, 50, 8, 86, 136, 188, 182, 153, 91,143, 129, 0, 45, 200, 112, 4, 192, 2, 90, 72, 48, 48, 48, 48, 48, 48, 48, 48, 45, 48, 48,48, 48, 45, 48, 48, 48, 48, 45, 48, 48, 48, 48, 45, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48,48, 48, 123, 20, 174, 71, 225, 250, 47, 64, 50, 8, 86, 136, 188, 182, 153, 91, 143, 129, 0,45, 200, 112, 4, 192>>

Avrora.decode(message)
{:ok, [%{"amount" => 15.99, "id" => "00000000-0000-0000-0000-000000000000"}]}

IMPORTANT: Please keep in mind that messages encoded with OCF are always lists.

The new Avrora.decode/1 also supports the schema registry, but requires the registry_url configuration to be set to a non-nil value.

config :avrora,
  registry_url: "http://localhost:8081"      # defaults to `nil`
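
Agnostic decoding has to tell the formats apart somehow. One way to picture it is a check on the leading bytes: an Object Container File starts with the ASCII bytes "Obj" followed by the byte 1, while a registry-framed message starts with a zero byte and a 4-byte schema id. The sketch below is only a heuristic (a plain message may also begin with a zero byte), and `FormatSketch` is a hypothetical name, not Avrora's implementation.

```elixir
defmodule FormatSketch do
  # Hypothetical helper; a heuristic, not Avrora's implementation.

  # Object Container Files begin with "Obj" and the version byte 1.
  def detect(<<"Obj", 1, _rest::binary>>), do: :ocf

  # Confluent framing: magic byte 0, then a 4-byte schema id.
  def detect(<<0, _schema_id::32, _rest::binary>>), do: :registry

  # Anything else carries no embedded hint about its schema.
  def detect(binary) when is_binary(binary), do: :plain
end

:ocf = FormatSketch.detect(<<79, 98, 106, 1, 3, 204, 2>>)
:registry = FormatSketch.detect(<<0, 0, 0, 0, 42, 8, 116>>)
:plain = FormatSketch.detect(<<8, 116, 120, 45, 49>>)
```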

Breaking changes

I'm sorry for misunderstanding the concept of Application.start/2. Auto-start of the Avrora application was removed; please start it manually or via a supervision tree instead 💔
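
A supervision-tree setup could look roughly like the sketch below. It relies only on `Avrora.start_link/0`, which appears in the examples of these notes; the `MyApp.Application` and `MyApp.Supervisor` names are hypothetical, and the explicit child spec is used so that nothing else about Avrora is assumed.

```elixir
defmodule MyApp.Application do
  # Hypothetical host application; only Avrora.start_link/0 is assumed.
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Explicit child spec: start Avrora through its start_link/0 function.
      %{id: Avrora, start: {Avrora, :start_link, []}}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

The manual alternative is the `{:ok, pid} = Avrora.start_link()` call already shown in the examples.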

avrora - Out of beta, yaw 🎉

Published by Strech over 5 years ago

One of the main differences from the first version: no more AvroEx dependency. erlavro is used instead. This change allows for fewer dependencies and better performance.

Also, the internal structure of Avrora.Schema was changed and it now consumes less memory.

avrora - Welcome Avrora!

Published by Strech over 5 years ago

The first release of a library for encoding and decoding Avro messages. It can use both locally stored schemas and schemas uploaded to a Confluent Schema Registry.