A convenient Elixir library to work with Avro messages, schemas and Confluent® Schema Registry
MIT License
Published by Strech about 4 years ago
It's been a while since our last release. In this one, you will find many good changes.
Starting with this version, you can have an explicit split between schema reader and schema writer. This means that you don't need local schema files if you would like to rely only on the schemas already registered in the registry.
config :avrora,
# ...
registry_schemas_autoreg: false
If you disable auto registration and have schema registry configured, two major behavior changes will happen:
If the schema registry is not configured, behavior should remain the same.
Avrora.Codec interface
The Encoder module got some attention and was refactored. It was split into several submodules that implement the same behaviour, which allows the code to be reused and tested separately. Everything comes at a cost, but hopefully the pros and cons balance each other out.
Happy coding everyone, and any feedback is welcome.
Published by Strech over 4 years ago
This release adds two new features: one in the public API of the library and one in the CLI.
Avrora.extract_schema extracts a schema from an encoded message, which is useful when you would like to have some metadata about the schema used to encode the message. All retrieved schemas will be cached according to the settings.
{:ok, pid} = Avrora.start_link()
message =
<<79, 98, 106, 1, 3, 204, 2, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101, 99,
8, 110, 117, 108, 108, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101, 109, 97,
144, 2, 123, 34, 110, 97, 109, 101, 115, 112, 97, 99, 101, 34, 58, 34, 105,
111, 46, 99, 111, 110, 102, 108, 117, 101, 110, 116, 34, 44, 34, 110, 97, 109,
101, 34, 58, 34, 80, 97, 121, 109, 101, 110, 116, 34, 44, 34, 116, 121, 112,
101, 34, 58, 34, 114, 101, 99, 111, 114, 100, 34, 44, 34, 102, 105, 101, 108,
100, 115, 34, 58, 91, 123, 34, 110, 97, 109, 101, 34, 58, 34, 105, 100, 34, 44,
34, 116, 121, 112, 101, 34, 58, 34, 115, 116, 114, 105, 110, 103, 34, 125, 44,
123, 34, 110, 97, 109, 101, 34, 58, 34, 97, 109, 111, 117, 110, 116, 34, 44,
34, 116, 121, 112, 101, 34, 58, 34, 100, 111, 117, 98, 108, 101, 34, 125, 93,
125, 0, 84, 229, 97, 195, 95, 74, 85, 204, 143, 132, 4, 241, 94, 197, 178, 106,
2, 26, 8, 116, 120, 45, 49, 123, 20, 174, 71, 225, 250, 47, 64, 84, 229, 97,
195, 95, 74, 85, 204, 143, 132, 4, 241, 94, 197, 178, 106>>
{:ok, schema} = Avrora.extract_schema(message)
{:ok,
%Avrora.Schema{
full_name: "io.confluent.Payment",
id: nil,
json: "{\"namespace\":\"io.confluent\",\"name\":\"Payment\",\"type\":\"record\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"}]}",
lookup_table: #Reference<0.146116641.3853647878.152744>,
version: nil
}}
Many thanks to @apellizzn for the help!
A separate mix task registers a specific schema or all schemas found in the schemas folder.
For instance, if you configure Avrora schemas folder to be at ./priv/schemas
and you want to register a schema io/confluent/Payment.avsc
then you can use this command
$ mix avrora.reg.schema --name io.confluent.Payment
schema `io.confluent.Payment` will be registered
NOTE: It will search for schema ./priv/schemas/io/confluent/Payment.avsc
If you would like to register all schemas found under ./priv/schemas
then you can simply execute this command
$ mix avrora.reg.schema --all
schema `io.confluent.Payment` will be registered
schema `io.confluent.Wrong' will be skipped due to an error `argument error'
I hope you enjoy it ❤️
Published by Strech over 4 years ago
Starting with version 0.11.0, a new configuration option accompanies the registry_url option.
config :avrora,
# ...
registry_url: "http://...",
registry_auth: {:basic, ["username", "password"]}
# ...
Published by Strech over 4 years ago
This release is a fix for the schema evolution process.
Before v0.10.0, if you have Schema Registry enabled, the very first schema version will be used forever, even if you update schema files and restart the service.
Starting with v0.10.0, a few major changes happen.
The schema resolution flow has changed. If a schema has never been resolved (i.e. it was not found in Avrora.Storage.Memory), we will always read the schema file, whether or not the name contains a version.
Then, if the name does contain a version (for instance, when you decode a message), we will check the registry and find the schema there.
If the name does not contain a version (for instance, when you encode a message), we will try to register the schema in the registry. Luckily, Confluent Schema Registry allows you to register the same schema many times, since it verifies the schema's hash sum (i.e. this is an idempotent operation).
All of the above leads to the next change, which I consider breaking:
The names cache TTL was changed from 5 minutes to infinity. Why? Simply because a name will always be resolved to the latest available schema, and if that schema is compatible we are good. If it's not, you will have to re-deploy your code anyway (yes, hot-reload is still an open question; if you have suggestions or problems, feel free to create an issue).
If you want periodic disk reads of your schemas, set it to something lower than infinity. Nonetheless, it's a public interface change, so I call it breaking! Boooo
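To illustrate the last point, here is a minimal configuration sketch. It assumes the option is still called names_cache_ttl (the name used in the release that introduced it) and that it takes milliseconds or :infinity; the 5-minute value is only an example.

```elixir
# config/config.exs
import Config

# Re-resolve schema names every 5 minutes instead of caching them forever.
# :timer.minutes/1 returns the interval in milliseconds.
config :avrora, names_cache_ttl: :timer.minutes(5)
```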
Happy coding and don't forget to wash your hands.
P.S. Thanks @coryodaniel for the issue report and collaboration.
Published by Strech over 4 years ago
This is a very minor release, changes are internal only.
To avoid discrepancy between the erlavro and Avrora libraries, the ETS host was renamed and re-implemented to use the :avro_schema_store.new/0 call instead of :ets.new/2.
Under the hood erlavro still uses :ets.new/2, but this responsibility has now shifted from Avrora to erlavro.
Happy coding, everyone.
Published by Strech almost 5 years ago
It turns out that when Avrora is used inside Phoenix, whose controllers are short-lived processes, all lookup tables generated by Avrora.Schema (which are in fact Erlang Term Storage tables) are cleaned up once the controller finishes request processing.
But all the resolved schemas are stored in Avrora.Storage.Memory, which means that after the Phoenix controller dies, all the references to ETS inside the Memory module are broken.
This release introduces an ETS host process that will own all the generated stores.
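The idea can be sketched in plain Elixir. This is not Avrora's implementation: the module name EtsHostSketch and its functions are hypothetical, and the sketch only shows why a table created inside a long-lived process survives the short-lived process that asked for it.

```elixir
defmodule EtsHostSketch do
  @moduledoc "Hypothetical long-lived owner of ETS tables (not Avrora's actual module)."
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # The table is created inside the host process, so the host (not the caller) owns it.
  def new_table(host), do: GenServer.call(host, :new_table)

  @impl true
  def init(:ok), do: {:ok, []}

  @impl true
  def handle_call(:new_table, _from, tables) do
    table = :ets.new(:lookup_table, [:set, :public, read_concurrency: true])
    {:reply, table, [table | tables]}
  end
end

{:ok, host} = EtsHostSketch.start_link()

# A short-lived process (standing in for a Phoenix controller) requests a table and exits.
task =
  Task.async(fn ->
    table = EtsHostSketch.new_table(host)
    :ets.insert(table, {"io.confluent.Payment", :schema})
    table
  end)

table = Task.await(task)

# The table is still readable because the host process owns it.
[{"io.confluent.Payment", :schema}] = :ets.lookup(table, "io.confluent.Payment")
```

An ETS table is deleted when its owner process terminates, so moving ownership to a process that outlives request handlers is what keeps the stored references valid.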
A test helper module was added. Since Avrora.Storage.Memory can share state between tests, it's always better to either mock it or clean its state. Here is an example:
defmodule MyTest do
  use ExUnit.Case, async: true
  import Avrora.TestCase

  # Clean the shared memory storage before each test
  setup :cleanup_storage!

  test "memory storage was filled" do
    assert Avrora.Storage.Memory.get("some") == nil
    Avrora.Storage.Memory.put("some", 42)
    assert Avrora.Storage.Memory.get("some") == 42
  end

  test "memory storage is clean" do
    assert Avrora.Storage.Memory.get("some") == nil
  end
end
Avrora.Name was renamed to Avrora.Schema.Name
Published by Strech almost 5 years ago
Thanks to @ananthakumaran, who spotted an issue with the new inter-schema references feature.
The issue became visible with big schema files. The reference collection process contained a bug that led to massive memory allocations during schema traversal. It has now been fixed.
Happy 2020
Published by Strech almost 5 years ago
This is a feature release.
From the very beginning, this library was heavily inspired by the simplicity and features of avro_turf. Now Avrora moves one step closer to the feature set avro_turf provides.
The must-have feature, inter-schema references, comes to Avrora. Now you can split your huge schema into smaller pieces and glue them together via references.
A reference is the canonical full name of a schema. According to Avrora's name-to-location rules, if you have a schema under io/confluent/Message.avsc, its full name (namespace + name) will be io.confluent.Message.
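As a rough illustration of the name-to-location rule, a full name can be mapped to a file path by turning dots into directory separators. The helper below is hypothetical (it is not part of Avrora's API) and assumes the schemas folder is priv/schemas.

```elixir
defmodule NameToPathSketch do
  # Hypothetical helper, not part of Avrora: maps a full schema name to its file path.
  def schema_path(full_name, schemas_dir \\ "priv/schemas") do
    relative = String.replace(full_name, ".", "/") <> ".avsc"
    Path.join(schemas_dir, relative)
  end
end

path = NameToPathSketch.schema_path("io.confluent.Message")
# path is "priv/schemas/io/confluent/Message.avsc"
```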
Technically, the Avro specification doesn't support inter-schema references, only local-schema references. Because of this limitation, inter-schema references are implemented by embedding the referenced schema into the schema that contains the reference and replacing all other references within this schema with local references.
For example, you have a Messenger schema which contains references to the Message schema:
priv/schemas/io/confluent/Messenger.avsc
{
"type": "record",
"name": "Messenger",
"namespace": "io.confluent",
"fields": [
{
"name": "inbox",
"type": {
"type": "array",
"items": "io.confluent.Message"
}
},
{
"name": "archive",
"type": {
"type": "array",
"items": "io.confluent.Message"
}
}
]
}
priv/schemas/io/confluent/Message.avsc
{
"type": "record",
"name": "Message",
"namespace": "io.confluent",
"fields": [
{
"name": "text",
"type": "string"
}
]
}
The final compiled schema, which will be stored and registered in the Confluent Schema Registry, will look like this:
{
"type": "record",
"name": "Messenger",
"namespace": "io.confluent",
"fields": [
{
"name": "inbox",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "Message",
"fields": [
{
"name": "text",
"type": "string"
}
]
}
}
},
{
"name": "archive",
"type": {
"type": "array",
"items": "Message"
}
}
]
}
In the case of avro_turf, the field archive will keep its canonical items type reference io.confluent.Message instead of the local reference Message.
Published by Strech about 5 years ago
In this minor release, documentation was greatly improved by @reachfh: clearer descriptions, more precise statements and better consistency.
Thanks ❤️
Published by Strech about 5 years ago
When you define a complex schema with an array of records (for instance) and you want to re-use a type you created in that array definition, you can simply use the name of that record.
{
"type": "record",
"name": "Messenger",
"namespace": "io.confluent",
"fields": [
{
"name": "inbox",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "Message",
"fields": [
{
"name": "text",
"type": "string"
}
]
}
}
},
{
"name": "archive",
"type": {
"type": "array",
"items": "io.confluent.Message"
}
}
]
}
But in Avrora v0.6 this throws an error, because the erlavro schema was stored and used as-is. Meanwhile, the Avro specification allows a record type to be defined once and then re-used in the schema resolution process.
Since version v0.7, resolution of the schema is done through the erlavro built-in schema storage mechanism, which takes care of all the type names you define in the schema.
Thanks ❤️
Published by Strech about 5 years ago
In this release, a new configuration option appears: names_cache_ttl. It controls for how long a schema name without version or id should be resolved from the memory storage.
It can be set either to a numeric value (measured in milliseconds) or to the atom :infinity.
config :avrora, names_cache_ttl: :timer.minutes(5)
In the example above, schemas resolved by name will be removed after 5 minutes and, when the registry is used, fetched one more time.
This is done not only to boost performance, but also to avoid situations where your service needs to be re-deployed or restarted to receive a new schema version.
Published by Strech about 5 years ago
The documentation of the Confluent Schema Registry makes false statements in its examples section. A new attribute :id was added to the result of Avrora.Storage.Registry.get/1 (when binary).
From now on, the registry storage will always return the id and version fields filled in when getting a schema by name.
Published by Strech about 5 years ago
Unfortunately, night programming is quite error-prone and, as we know, the devil is in the detail.
This release will fix a mistake with encoding and decoding of schemas for schema registry. I misread the documentation from Confluent and was using schema version with magic instead of schema id with magic.
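For context, the Confluent wire format puts a magic byte 0 in front of the Avro body, followed by the 4-byte big-endian schema id (not the version). The sketch below illustrates that layout with binary pattern matching; WireFormatSketch and its functions are hypothetical names and not Avrora's code.

```elixir
defmodule WireFormatSketch do
  # Hypothetical helper illustrating the Confluent wire format; not Avrora's code.
  # Layout: 1 magic byte (0), 4-byte big-endian schema id, then the Avro body.
  def split(<<0, schema_id::unsigned-big-integer-size(32), body::binary>>),
    do: {:ok, schema_id, body}

  def split(_other), do: {:error, :no_magic}

  def wrap(schema_id, body) when is_integer(schema_id) and is_binary(body),
    do: <<0, schema_id::unsigned-big-integer-size(32), body::binary>>
end

# The plain-encoded payment body that appears elsewhere in these notes.
payload = <<8, 116, 120, 45, 49, 123, 20, 174, 71, 225, 250, 47, 64>>
framed = WireFormatSketch.wrap(42, payload)

# Splitting the framed message gives back the global schema id and the original body.
{:ok, 42, ^payload} = WireFormatSketch.split(framed)
```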
As part of the fix, the caching behaviour was changed, mainly the caching keys:
name:version - for schemas with a known version
gid - for schemas with a known global id
name - for schemas with no magic and no registry enabled
Despite the mistake, Avrora.decode/1 should work as expected even for previous releases.
Sorry
Published by Strech about 5 years ago
From now on, Avrora.encode/2 supports an additional option :format, which can be set to:
:ocf - embeds schema with Object Container Files format
:registry - embeds Confluent Schema Registry magic version
:plain - only encodes message with nothing embedded
:guess - falls back to :ocf if it can't behave like :registry (default)
With this option, you can format the encoded message with either the special version magic or an entire schema.
IMPORTANT: Object Container Files implies a multiple messages format. This means once you encode your message it will be wrapped in a list.
This library aims at convenience while working with Avro messages. Initially, encoding a message without a schema registry produced a plain bitstring with nothing embedded.
But what if you have a single place with various messages in it? How will you know which message is encoded with which schema? As you can see, plain encoding can be a problem in certain cases.
To avoid that, the interface of Avrora.Encoder.encode/2 was slightly changed to always return a decodable message, even if you don't know the schema name. The plain format becomes an explicit option.
message = %{"id" => "tx-1", "amount" => 15.99}
{:ok, pid} = Avrora.start_link()
{:ok, encoded} = Avrora.encode(message, schema_name: "io.confluent.Payment")
<<8, 116, 120, 45, 49, 123, 20, 174, 71, 225, 250, 47, 64>>
Avrora.decode(encoded)
{:error, :undecodable}
message = %{"id" => "tx-1", "amount" => 15.99}
{:ok, pid} = Avrora.start_link()
{:ok, encoded} = Avrora.encode(message, schema_name: "io.confluent.Payment")
<<79, 98, 106, 1, 3, 204, 2, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101, 99,
8, 110, 117, 108, 108, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101, 109, 97,
144, 2, 123, 34, 110, 97, 109, 101, 115, 112, 97, 99, 101, 34, 58, 34, 105,
111, 46, 99, 111, 110, 102, 108, 117, 101, 110, 116, 34, 44, 34, 110, 97, 109,
101, 34, 58, 34, 80, 97, 121, 109, 101, 110, 116, 34, 44, 34, 116, 121, 112,
101, 34, 58, 34, 114, 101, 99, 111, 114, 100, 34, 44, 34, 102, 105, 101, 108,
100, 115, 34, 58, 91, 123, 34, 110, 97, 109, 101, 34, 58, 34, 105, 100, 34,
44, 34, 116, 121, 112, 101, 34, 58, 34, 115, 116, 114, 105, 110, 103, 34, 125,
44, 123, 34, 110, 97, 109, 101, 34, 58, 34, 97, 109, 111, 117, 110, 116, 34,
44, 34, 116, 121, 112, 101, 34, 58, 34, 100, 111, 117, 98, 108, 101, 34, 125,
93, 125, 0, 138, 124, 66, 49, 157, 51, 242, 3, 33, 52, 161, 147, 221, 174,
114, 48, 2, 26, 8, 116, 120, 45, 49, 123, 20, 174, 71, 225, 250, 47, 64, 138,
124, 66, 49, 157, 51, 242, 3, 33, 52, 161, 147, 221, 174, 114, 48>>
Avrora.decode(encoded)
{:ok, [%{"id" => "tx-1", "amount" => 15.99}]}
Published by Strech about 5 years ago
When erlavro throws an error, it is now handled gracefully.
MISC: Also, the library dev configuration schemas_path was set to test/fixtures/schemas
Published by Strech about 5 years ago
A callback for automatic application start was implemented
UPDATE: It was removed in v0.3.0 ⚠️
Published by Strech about 5 years ago
Useful debug output was added to some operations, for instance:
20:44:30.005 [debug] reading schema `io.confluent.Payment` from the file /Volumes/SSDCard/Development/github/avrora/test/fixtures/schemas/io/confluent/Payment.avsc
Also, the Avrora.Config.schemas_path default value was fixed
Published by Strech about 5 years ago
Avro messages can be self-sufficient by encoding them together with a schema. To decode them you don't need to know anything about the message.
Now it is possible to decode them either via the new schema-agnostic Avrora.decode/1 or the old Avrora.decode/2. Check it out:
message = <<79, 98, 106, 1, 3, 204, 2, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101, 99, 8, 110, 117,108, 108, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101, 109, 97, 144, 2, 123, 34, 110, 97,109, 101, 115, 112, 97, 99, 101, 34, 58, 34, 105, 111, 46, 99, 111, 110, 102, 108, 117, 101,110, 116, 34, 44, 34, 110, 97, 109, 101, 34, 58, 34, 80, 97, 121, 109, 101, 110, 116, 34,44, 34, 116, 121, 112, 101, 34, 58, 34, 114, 101, 99, 111, 114, 100, 34, 44, 34, 102, 105,101, 108, 100, 115, 34, 58, 91, 123, 34, 110, 97, 109, 101, 34, 58, 34, 105, 100, 34, 44,34, 116, 121, 112, 101, 34, 58, 34, 115, 116, 114, 105, 110, 103, 34, 125, 44, 123, 34, 110,97, 109, 101, 34, 58, 34, 97, 109, 111, 117, 110, 116, 34, 44, 34, 116, 121, 112, 101, 34,58, 34, 100, 111, 117, 98, 108, 101, 34, 125, 93, 125, 0, 50, 8, 86, 136, 188, 182, 153, 91,143, 129, 0, 45, 200, 112, 4, 192, 2, 90, 72, 48, 48, 48, 48, 48, 48, 48, 48, 45, 48, 48,48, 48, 45, 48, 48, 48, 48, 45, 48, 48, 48, 48, 45, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48,48, 48, 123, 20, 174, 71, 225, 250, 47, 64, 50, 8, 86, 136, 188, 182, 153, 91, 143, 129, 0,45, 200, 112, 4, 192>>
Avrora.decode(message)
{:ok, [%{"amount" => 15.99, "id" => "00000000-0000-0000-0000-000000000000"}]}
IMPORTANT: Please keep in mind that messages encoded with OCF are always lists.
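An OCF payload can be recognised by its header: per the Avro specification, an Object Container File starts with the four bytes "Obj" followed by 1, which is exactly how the message above begins (79, 98, 106, 1). The sketch below shows such a check; OcfSketch is a hypothetical module and not necessarily how Avrora decides internally.

```elixir
defmodule OcfSketch do
  # Hypothetical helper, not Avrora's API: checks for the OCF magic "Obj" followed by byte 1.
  def ocf?(<<"Obj", 1, _rest::binary>>), do: true
  def ocf?(_other), do: false
end

# The first bytes of the OCF message above match the magic header.
true = OcfSketch.ocf?(<<79, 98, 106, 1, 3, 204, 2>>)

# A plain-encoded body has no such header.
false = OcfSketch.ocf?(<<8, 116, 120, 45, 49>>)
```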
Also, the new Avrora.decode/1 supports schema registry, but requires the registry_url configuration to be set to a non-nil value.
config :avrora,
registry_url: "http://localhost:8081" # defaults to `nil`
I'm sorry for misunderstanding the concept of Application.start/2. Auto-start of the Avrora application was removed; please start it manually or add it to a supervision tree instead.
Published by Strech over 5 years ago
One of the main differences from the first version: no more AvroEx dependency. Instead, erlavro is used. This change allows having fewer dependencies and better performance.
Also, the internal structure of Avrora.Schema was changed and it now consumes less memory.
Published by Strech over 5 years ago
The first release of a library for encoding/decoding with Avro schemas. It can use both locally stored schemas and schemas uploaded to a Confluent Schema Registry.