Scala macros for compile-time generation of safe and ultra-fast JSON codecs + circe booster
MIT License
Scala macros for compile-time generation of safe and ultra-fast JSON codecs.
Latest results of benchmarks on JVMs that compare parsing and serialization performance of jsoniter-scala with: borer, circe, circe with jsoniter-scala booster, jackson-module-scala, json4s-jackson, json4s-native, play-json, play-json with jsoniter-scala booster, smithy4s-json, spray-json, uPickle, weePickle, zio-json libraries using different JDK and GraalVM versions on the following environment: Intel® Core™ i9-13900K CPU @ 3.0GHz (max 5.8GHz, performance-cores only), RAM 64Gb DDR5-4800, Ubuntu 24.04 (Linux 6.8), and latest versions of JDK 17/21/24-ea, GraalVM Community JDK 17/21/24-ea, and GraalVM JDK 17/21/24-ea*.
Latest results of benchmarks on browsers that compare performance of jsoniter-scala with: circe, circe with jsoniter-scala booster, play-json, play-json with jsoniter-scala booster, smithy4s-json, uPickle, zio-json compiled by Scala.js 1.17.0 to ES 2015 with GCC v20220202 optimizations applied on Intel® Core™ i7-11800H CPU @ 2.3GHz (max 4.6GHz), RAM 64Gb DDR4-3200, Ubuntu 23.10 (Linux 6.6).
This library started from macros that reused the jsoniter (json-iterator) for Java reader and writer, but it then evolved to have its own core mechanics for parsing and serialization.
The idea to generate codecs by Scala macros and main details were borrowed from Kryo Macros (originally developed by Alexander Nemish) and adapted for the needs of the JSON domain.
Other Scala macro features were borrowed from the AVSystem Commons and magnolia libraries.
Ideas for the most efficient parsing and serialization of java.time.* values were inspired by DSL-JSON's implementation for java.time.OffsetDateTime.
Other projects and a blog post that have helped deliver unparalleled safety and performance characteristics for parsing and serialization of numbers:
BigInt and BigDecimal values with the O(n^1.5) complexity instead of O(n^2) using Java's implementations, where n is the number of digits
A bunch of SWAR technique tricks for the JVM platform are based on the following projects and a blog post:
Big kudos to all contributors:
null Scala values and parsing immediately to them in generated codecs
String, Array[Byte], java.nio.ByteBuffer, java.io.InputStream/java.io.FileInputStream
String, Array[Byte], java.nio.ByteBuffer, java.io.OutputStream/java.io.FileOutputStream
Array[Byte] or java.nio.ByteBuffer by specifying of position and
java.io.InputStream/java.io.FileInputStream without the need
String (while this is much less efficient)
String, BigInt, BigDecimal, Option, Either, java.util.UUID, java.time.* (to/from ISO-8601 representation only), Scala collections, arrays
String, BigInt, BigDecimal, java.util.UUID, java.time.*, literal types, and value classes for any of them
Ordering[K] instances for keys that are available at
make macro call
java.time.*
null values is prohibited by throwing of NullPointerException errors
null values allowed only for optional or collection types (that means the None value or an empty
equals method (it mostly concerns non-case classes or other types that have custom codecs)
Array[Byte], java.nio.ByteBuffer, java.io.InputStream/java.io.FileInputStream where it occurs, and an optional hex dump affected by error part of an internal byte buffer
java.io.InputStream/java.io.FileInputStream or serializing to java.io.OutputStream/java.io.FileOutputStream
CodecMakerConfig.PrintCodec type in a scope of
scala-library (all platforms) and scala-java-time (replacement of JDKs java.time._ types for Scala.js and Scala Native)
UTC then you should follow the
java.io.Serializable for easier usage in distributive computing
There are configurable options that can be set in compile-time:
BigDecimal values
BigInt values
null field values and missing fields as Some(None) and None values
Option[Option[_]]
List of options that change parsing and serialization in runtime:
java.io.InputStream or java.nio.DirectByteBuffer
java.io.InputStream or java.nio.DirectByteBuffer
java.io.OutputStream or java.nio.DirectByteBuffer
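As a minimal sketch of how a runtime option is passed, the example below serializes the same value with the default WriterConfig and with an indentation step of 2. The Point type and the names pointCodec and runtimeConfigExample are hypothetical and exist only for this illustration; the dependency versions are the ones used elsewhere in this README.

```scala
//> using dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-core::2.30.15"
//> using compileOnly.dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-macros::2.30.15"

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._

// Hypothetical data type used only for this sketch
case class Point(x: Int, y: Int)

given pointCodec: JsonValueCodec[Point] = JsonCodecMaker.make

@main def runtimeConfigExample(): Unit = {
  // Default runtime configuration: compact output
  val compact = writeToString(Point(1, 2))
  // Runtime configuration is passed as the second argument of the write call
  val pretty = writeToString(Point(1, 2), WriterConfig.withIndentionStep(2))
  println(compact)
  println(pretty)
  // Both representations parse back to the same value
  assert(readFromString[Point](compact) == Point(1, 2))
  assert(readFromString[Point](pretty) == Point(1, 2))
}
```

The codec itself is unchanged between the two calls; only the configuration instance passed at the call site differs.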
The v2.13.5.2 release is the last version that supports JDK 8+ and native image compilation with earlier versions of GraalVM.
The v2.13.3.2 release is the last version that supports Scala 2.11.
The v2.30.2 release is the last version that supports Scala Native 0.4+.
For upcoming features and fixes see the Commits and Issues pages.
Let's assume that you have the following data structures:
case class Device(id: Int, model: String)
case class User(name: String, devices: Seq[Device])
Add the core library with a "compile" scope and the macros library with "compile-internal" or "provided" scopes to your list of sbt dependencies:
libraryDependencies ++= Seq(
  // Use the %%% operator instead of %% for Scala.js and Scala Native
  "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core" % "2.30.15",
  // Use the "provided" scope instead when the "compile-internal" scope is not supported
  "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % "2.30.15" % "compile-internal"
)
At the beginning of a Scala CLI script use the "dep" scope for the core library and the "compileOnly.dep" scope for the macros library:
//> using dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-core::2.30.15"
//> using compileOnly.dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-macros::2.30.15"
Derive a codec for the top-level type that needs to be parsed or serialized:
import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._
given userCodec: JsonValueCodec[User] = JsonCodecMaker.make
That's it! You have generated an instance of com.github.plokhotnyuk.jsoniter_scala.core.JsonValueCodec for the whole nested data structure. No need to derive intermediate codecs for inner nested classes like Device if you are not going to parse/serialize them from/to JSON in isolation (not as a part of User) and use the default or the same derivation configuration for their codecs.
Now use it for parsing and serialization from/to String:
val user = readFromString[User]("""{"name":"John","devices":[{"id":1,"model":"HTC One X"}]}""")
val json = writeToString(User("John", Seq(Device(2, "iPhone X"))))
When your input comes from the network or disks, much more efficient ways are to parse and serialize from/to:
byte arrays using readFromArray/writeToArray
byte sub-arrays using readFromSubArray/writeToSubArray
java.nio.ByteBuffer instances using readFromByteBuffer/writeToByteBuffer
java.io.InputStream/java.io.OutputStream instances using readFromStream/writeToStream
Also, parsing from bytes will check UTF-8 encoding and throw an error in case of malformed bytes.
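A minimal sketch of the byte array variant, reusing the Device and User data structures from above. It assumes the same library version as in the dependency snippets; the name byteArrayRoundTrip is hypothetical.

```scala
//> using dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-core::2.30.15"
//> using compileOnly.dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-macros::2.30.15"

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import java.nio.charset.StandardCharsets.UTF_8

case class Device(id: Int, model: String)
case class User(name: String, devices: Seq[Device])

given userCodec: JsonValueCodec[User] = JsonCodecMaker.make

@main def byteArrayRoundTrip(): Unit = {
  val user = User("John", Seq(Device(2, "iPhone X")))
  // Serialize directly to UTF-8 bytes without an intermediate String
  val bytes: Array[Byte] = writeToArray(user)
  // Parse directly from UTF-8 bytes
  val parsed: User = readFromArray[User](bytes)
  assert(parsed == user)
  println(new String(bytes, UTF_8))
}
```

Working on bytes avoids the extra encoding and decoding step that the String based calls need when the data ultimately goes to or comes from a socket or a file.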
To print generated code for codecs add the following line to the scope of the codec derivation before the make call.
given CodecMakerConfig.PrintCodec with {}
For more use cases of jsoniter-scala, please, check out tests:
All Scala 3 only features are tested by specs in this directory.
NOTE: Until official docs are published, please use all these tests as tutorials and how-tos to help in your journey to become happy users. It is also worth skimming through them to check your expectations before choosing this library over others.
You can use the following on-line services to generate an initial version of your data structures from JSON samples:
Also, if you have JSON Schema the following on-line service can generate corresponding data structures for you:
And the following library can generate JSON Schema for your existing data structures:
Samples for its integration with different web frameworks and HTTP servers:
Usages of jsoniter-scala in OSS libraries:
java.time._ and BigInt types
play.api.libs.json.JsValue to byte array (or byte buffer, or output stream) and read it back
Also, for usages in other OSS projects see the Dependents section of the project's Scala Index page.
For all dependent projects it is recommended to use the sbt-updates plugin or the Scala Steward service to keep up with the latest releases.
If your system can accept too long untrusted input, then check the input length before parsing with readFromStream or other read... calls.
Also, if you have an input that is an array of values or whitespace-separated values, then consider parsing it by scanJsonArrayFromStream or scanJsonValuesFromStream instead of readFromStream.
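The two recommendations above can be sketched together as follows. This is only an illustration under stated assumptions: the limit maxInputBytes and the helper parseDevices are hypothetical names, the limit value is arbitrary, and the scanning call is scanJsonValuesFromStream as used in the reentrant parsing example of this README, with a callback that returns true to continue scanning.

```scala
//> using dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-core::2.30.15"
//> using compileOnly.dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-macros::2.30.15"

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ArrayBuffer

case class Device(id: Int, model: String)

given deviceCodec: JsonValueCodec[Device] = JsonCodecMaker.make

// Hypothetical limit for this sketch; choose a value that fits your system
val maxInputBytes: Int = 1024 * 1024

// Hypothetical helper: rejects too long input, then scans whitespace-separated values one by one
def parseDevices(input: Array[Byte]): Seq[Device] = {
  require(input.length <= maxInputBytes, s"input is too long: ${input.length} bytes")
  val devices = ArrayBuffer.empty[Device]
  scanJsonValuesFromStream[Device](new ByteArrayInputStream(input)) { device =>
    devices += device
    true // return true to continue scanning, false to stop early
  }
  devices.toSeq
}

@main def scanExample(): Unit = {
  val input = """{"id":1,"model":"HTC One X"} {"id":2,"model":"iPhone X"}""".getBytes(UTF_8)
  println(parseDevices(input))
}
```

Each value is handed to the callback as soon as it is parsed, so a real stream does not need to be held in memory as a whole; the in-memory byte array here only keeps the sketch self-contained.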
make macro is evaluated in compile-time. It requires no dependency
[error] Cannot evaluate a parameter of the 'make' macro call for type 'full.name.of.YourType'. It should not depend on
        code from the same compilation module where the 'make' macro is called. Use a separated submodule of the project
        to compile all such dependencies before their usage for generation of codecs.
Sometimes the Scala 2 compiler can fail to compile the make macro call with the same error message for a configuration that has no clear dependencies on other code. For those cases workarounds can be simpler than the recommended usage of a separated submodule:
make or make... macro calls without parameters
make macro call in the separated object, like in this PR
sbt clean compile stage or sbt clean test stage instead of just sbt clean stage, like in
mill clean if mill's native BSP support is used in IntelliJ IDEA
The workaround is the same for both cases: don't enclose ADT definitions into outer classes, traits or functions, use the outer object (not a class) instead.
make calls in Scala 3 have limited support of possible expressions for name mapping.
Please use examples of CodecMakerConfig usage from unit tests.
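For orientation, here is a minimal sketch of passing a CodecMakerConfig with a predefined name mapper to the make call. The Sensor type and the names sensorCodec and snakeCaseExample are hypothetical; the mapper used is the predefined JsonCodecMaker.enforce_snake_case, and more complex mapping expressions may not be accepted by the Scala 3 macro.

```scala
//> using dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-core::2.30.15"
//> using compileOnly.dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-macros::2.30.15"

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._

// Hypothetical data type with camelCase field names
case class Sensor(sensorId: Int, modelName: String)

// The configuration is evaluated in compile-time, so it is written inline in the make call
given sensorCodec: JsonValueCodec[Sensor] =
  JsonCodecMaker.make[Sensor](CodecMakerConfig.withFieldNameMapper(JsonCodecMaker.enforce_snake_case))

@main def snakeCaseExample(): Unit = {
  val json = writeToString(Sensor(1, "HTC One X"))
  println(json) // field names are written in snake_case
  assert(readFromString[Sensor](json) == Sensor(1, "HTC One X"))
}
```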
JsonReader or JsonWriter is
scanJsonValuesFromStream[String](in) { s =>
  readFromString[String](s)
}
The workaround is using reentrant parsing or serialization routines for all except the most nested call. That will create a new instance of JsonReader or JsonWriter on each reentrant call:
scanJsonValuesFromStreamReentrant[String](in) { s =>
  readFromString[String](s)
}
[error] Referring to non-existent class com.github.plokhotnyuk.jsoniter_scala.macros.Level
[error] called from private com.github.plokhotnyuk.jsoniter_scala.macros.JsonCodecMakerSpec.$anonfun$new$24()void
[error] called from private com.github.plokhotnyuk.jsoniter_scala.macros.JsonCodecMakerSpec.$anonfun$new$1()void
[error] called from constructor com.github.plokhotnyuk.jsoniter_scala.macros.JsonCodecMakerSpec.<init>()void
[error] called from static constructor com.github.plokhotnyuk.jsoniter_scala.macros.JsonCodecMakerSpec.<clinit>()void
[error] called from core module analyzer
The workaround for Scala 2 is to split sources for JVM and other platforms and use Java enum emulation for Scala.js and Scala Native.
Code for JVM:
public enum Level {
    HIGH, LOW;
}
Code for Scala.js and Scala Native:
object Level {
  val HIGH: Level = new Level("HIGH", 0)
  val LOW: Level = new Level("LOW", 1)
  val values: Array[Level] = Array(HIGH, LOW)
  def valueOf(name: String): Level =
    if (HIGH.name() == name) HIGH
    else if (LOW.name() == name) LOW
    else throw new IllegalArgumentException(s"Unrecognized Level name: $name")
}
final class Level private (name: String, ordinal: Int) extends Enum[Level](name, ordinal)
For Scala 3 the workaround can be the same for all platforms:
enum Level extends Enum[Level] {
  case HIGH
  case LOW
}
case class DeResult[T](isSucceed: Boolean, data: T, message: String)
case class RootPathFiles(files: List[String])
given JsonValueCodec[DeResult[Option[String]]] = JsonCodecMaker.make
given JsonValueCodec[DeResult[RootPathFiles]] = JsonCodecMaker.make
Current 3.2.x versions of scalac fail with a duplicate definition error like this:
[error] 19 | given JsonValueCodec[DeResult[RootPathFiles]] = JsonCodecMaker.make
[error] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[error] |given_JsonValueCodec_DeResult is already defined as given instance given_JsonValueCodec_DeResult
The workaround is using named instances of codecs:
given codecOfDeResult1: JsonValueCodec[DeResult[Option[String]]] = JsonCodecMaker.make
given codecOfDeResult2: JsonValueCodec[DeResult[RootPathFiles]] = JsonCodecMaker.make
or private type aliases with given definitions gathered in some trait:
trait DeResultCodecs:
  private type DeResult1 = DeResult[Option[String]]
  private type DeResult2 = DeResult[RootPathFiles]
  given JsonValueCodec[DeResult1] = JsonCodecMaker.make
  given JsonValueCodec[DeResult2] = JsonCodecMaker.make
end DeResultCodecs
object DeResultCodecs extends DeResultCodecs
import DeResultCodecs.given
Currently, the JsonCodecMaker.make call cannot derive codecs for Scala 3 opaque and union types. The workaround is using a custom codec for these types defined with implicit val before the JsonCodecMaker.make call, like here and here.
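A minimal sketch of that workaround for an opaque type. All names here (Ids, UserId, Account, userIdCodec, accountCodec) are hypothetical, and a Scala 3 given is used in place of implicit val; the custom codec only needs to be in the implicit scope of the make call so that the derived codec can delegate to it.

```scala
//> using dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-core::2.30.15"
//> using compileOnly.dep "com.github.plokhotnyuk.jsoniter-scala::jsoniter-scala-macros::2.30.15"

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._

object Ids {
  // Hypothetical opaque type backed by Int
  opaque type UserId = Int
  object UserId {
    def apply(value: Int): UserId = value
    extension (id: UserId) def value: Int = id
  }
}
import Ids.UserId

case class Account(id: UserId, name: String)

// Hand-written codec for the opaque type, defined before the derived codec that uses it
given userIdCodec: JsonValueCodec[UserId] = new JsonValueCodec[UserId] {
  def decodeValue(in: JsonReader, default: UserId): UserId = UserId(in.readInt())
  def encodeValue(x: UserId, out: JsonWriter): Unit = out.writeVal(x.value)
  def nullValue: UserId = UserId(0)
}

// The derived codec picks up userIdCodec from the implicit scope for the id field
given accountCodec: JsonValueCodec[Account] = JsonCodecMaker.make

@main def opaqueTypeExample(): Unit = {
  val json = writeToString(Account(UserId(42), "main"))
  println(json)
  assert(readFromString[Account](json) == Account(UserId(42), "main"))
}
```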
If ADT leaf classes/objects contain dots in their simple names, the default name mapper will strip names up to the last dot character. The workaround is to use the @named annotation like here:
sealed abstract class Version(val value: String)
object Version {
  @named("8.10") case object `8.10` extends Version("8.10")
  @named("8.09") case object `8.09` extends Version("8.9")
}
java.time.* values escaped encoding of ASCII characters is not supported.
implicit val customCodecOfOffsetDateTime: JsonValueCodec[OffsetDateTime] = new JsonValueCodec[OffsetDateTime] {
  private[this] val defaultCodec: JsonValueCodec[OffsetDateTime] = JsonCodecMaker.make[OffsetDateTime]
  private[this] val maxLen = 44 // should be enough for the longest offset date time value
  private[this] val pool = new ThreadLocal[Array[Byte]] {
    override def initialValue(): Array[Byte] = new Array[Byte](maxLen + 2)
  }
  def nullValue: OffsetDateTime = null
  def decodeValue(in: JsonReader, default: OffsetDateTime): OffsetDateTime = {
    val buf = pool.get
    val s = in.readString(null)
    val len = s.length
    if (len <= maxLen && {
      buf(0) = '"'
      var bits, i = 0
      while (i < len) {
        val ch = s.charAt(i)
        buf(i + 1) = ch.toByte
        bits |= ch
        i += 1
      }
      buf(i + 1) = '"'
      bits < 0x80
    }) {
      try {
        return readFromSubArrayReentrant(buf, 0, len + 2, ReaderConfig)(defaultCodec)
      } catch {
        case NonFatal(_) => ()
      }
    }
    in.decodeError("illegal offset date time")
  }
  def encodeValue(x: OffsetDateTime, out: JsonWriter): Unit = out.writeVal(x)
}
implicit def and inline given methods for generation of custom codecs.
New anonymous class definition will be duplicated at each inline site
inline given cases, but for other use cases the compiler will silently generate duplicated codec instances.
def and explicitly derive custom codecs, like here:
object Tags {
  opaque type Tagged[+V, +T] = Any
  type @@[+V, +T] = V & Tagged[V, T]
  def tag[T]: [V] => V => V @@ T = [V] => (v: V) => v
}
object Graph {
  import Tags.{@@, tag}
  def tagJsonValueCodec[V, T](codec: JsonValueCodec[V]): JsonValueCodec[V @@ T] = new JsonValueCodec[V @@ T]:
    //println("+1")
    override def decodeValue(in: JsonReader, default: V @@ T): V @@ T = tag[T](codec.decodeValue(in, default: V))
    override def encodeValue(x: V @@ T, out: JsonWriter): Unit = codec.encodeValue(x, out)
    override def nullValue: V @@ T = tag[T](codec.nullValue)
  trait NodeIdTag
  type NodeId = Int @@ NodeIdTag
  case class Node(id: NodeId, name: String)
  case class Edge(node1: NodeId, node2: NodeId)
  given JsonValueCodec[Graph.NodeId] = Graph.tagJsonValueCodec(JsonCodecMaker.make)
  given JsonValueCodec[Graph.Node] = JsonCodecMaker.make
  given JsonValueCodec[Graph.Edge] = JsonCodecMaker.make
}
Feel free to ask questions in chat, open issues, or contribute by creating pull requests (improvements to docs, code, and tests are highly appreciated).
Currently, the gh-pages branch contains a lot of historical data of benchmark results, so to avoid cloning 10Gb of them use the --single-branch option to fetch sources only.
If developing on a fork, make sure to download the git tags (required by the sbt build):
git remote add upstream [email protected]:plokhotnyuk/jsoniter-scala.git
git fetch --tags upstream
Prerequisites for building of Scala.js and Scala Native modules are Clang 18.x and Node.js 16.x. The following sequence of commands works for me:
sudo apt install clang libstdc++-12-dev libgc-dev
curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
source ~/.bashrc
nvm install 16
node -v
sbt ";dependencyUpdates; reload plugins; dependencyUpdates; reload return"
sbt -java-home /usr/lib/jvm/jdk-11 ++2.13.15 clean coverage jsoniter-scala-coreJVM/test jsoniter-scala-circeJVM/test jsoniter-scala-macrosJVM/test jsoniter-scala-benchmarkJVM/test coverageReport
sbt -java-home /usr/lib/jvm/jdk-11 clean +test +mimaReportBinaryIssues
BEWARE: jsoniter-scala is included into Scala Community Build for Scala 2 and Scala Open Community Build for Scala 3.
Before running benchmarks, check that your CPU works in performance mode (not a powersave one). On Linux use the following commands to print the current mode and set the performance mode:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
for i in $(ls /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor); do echo performance | sudo tee $i; done
Then view your CPU frequency with:
cat /proc/cpuinfo | grep -i mhz
Stop un-needed applications and services. List of running services can be printed by:
sudo service --status-all | grep '\[ + \]'
sudo systemctl list-units --state running
Then clear cache memory to improve system performance. One way to clear cache memory on Linux without having to reboot the system:
sudo su
free -m -h && sync && echo 3 > /proc/sys/vm/drop_caches && free -m -h
The sbt plugin for the JMH tool is used for benchmarking. To see all its features and options, please check the Sbt-JMH docs and the JMH tool docs.
Learn how to write benchmarks in JMH samples and JMH articles posted in Aleksey Shipilёv’s and Nitsan Wakart’s blogs.
List of available options can be printed by:
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -h'
Results of benchmark can be stored in different formats: *.csv, *.json, etc. All supported formats can be listed by:
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -lrf'
JMH allows running benchmarks with different profilers. To get a list of supported ones use (may require entering the user password):
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -lprof'
Help for profiler options can be printed by the following command (<profiler_name> should be replaced by the name of a supported profiler from the command above):
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -prof <profiler_name>:help'
For parametrized benchmarks the constant value(s) for parameter(s) can be set by the -p option:
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -p size=1,10,100,1000 ArrayOf.*'
To see throughput with the allocation rate of generated codecs run benchmarks with GC profiler using the following command:
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -prof gc .*Reading.*'
Results that are stored in JSON can be easily plotted in JMH Visualizer by dragging & dropping your file to the drop zone or using the source parameter with an HTTP link to your file in the URL like here.
On Linux the perf profiler can be used to see CPU event statistics normalized per ops:
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -prof perfnorm TwitterAPIReading.jsoniterScala'
Also, it can be run with a specified list of events. Here is an example of benchmarking using 16 threads to check for CPU stalls:
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -t 16 -prof "perfnorm:event=cycles,instructions,uops_executed.core,uops_executed.stall_cycles,cache-references,cache-misses,cycle_activity.stalls_total,cycle_activity.stalls_mem_any,cycle_activity.stalls_l3_miss,cycle_activity.stalls_l2_miss,cycle_activity.stalls_l1d_miss" .*'
List of available events for the perf profiler can be retrieved by the following command:
perf list
To get a result for some benchmarks with an in-flight recording file from JFR profiler use command like this:
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -prof "jfr:dir=target/jfr-reports" -wi 10 -i 60 TwitterAPIReading.jsoniterScala'
You will get the profile in the jsoniter-scala-benchmark/jvm/target/jfr-reports directory.
To run benchmarks with recordings by Async profiler, extract binaries to the /opt/async-profiler directory and set the following runtime variables to capture kernel frames:
sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.kptr_restrict=0
Then use command like this:
sbt -java-home /usr/lib/jvm/jdk-21 jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -prof "async:dir=target/async-reports;interval=1000000;output=flamegraph;libPath=/opt/async-profiler/lib/libasyncProfiler.so" -jvmArgsAppend "-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints" --p size=128 -wi 5 -i 10 jsoniterScala'
Now you can open direct and reverse flame graphs in the jsoniter-scala-benchmark/jvm/target/async-reports directory.
Beware that -XX:+DebugNonSafepoints can lead to an incorrect report due to a bug which is currently fixed only for JDK 21.
To see the list of available events, start your app or benchmark and run the jps command. It will show the list of PIDs and names of currently running Java processes. While your Java process is still running, launch the Async Profiler with the list option and the ID of your process like here:
$ ~/Projects/com/github/jvm-profiling-tools/async-profiler/profiler.sh list 6924
Basic events:
cpu
alloc
lock
wall
itimer
Perf events:
page-faults
context-switches
cycles
instructions
cache-references
cache-misses
branches
branch-misses
bus-cycles
L1-dcache-load-misses
LLC-load-misses
dTLB-load-misses
mem:breakpoint
trace:tracepoint
The following command can be used to profile and print assembly code of the hottest methods, but it requires a setup of the hsdis library to enable the PrintAssembly feature:
sbt jsoniter-scala-benchmarkJVM/clean 'jsoniter-scala-benchmarkJVM/jmh:run -prof perfasm -wi 10 -i 10 -p size=128 BigIntReading.jsoniterScala'
For more info about extras, options, and the ability to generate flame graphs see the Sbt-JMH docs.
Other benchmarks with results for jsoniter-scala:
Use JDK 11+ for building the jsoniter-scala-benchmarkJS module for Scala 2.13 and Scala 3:
sbt -DassemblyJSBenchmarks -java-home /usr/lib/jvm/jdk-11 +jsoniter-scala-benchmarkJS/fullOptJS
Then open the list of benchmarks in a browser:
cd jsoniter-scala-benchmark/js
open scala-3-fullopt.html
open scala-2.13-fullopt.html
Then select the batch mode with storing results in a .zip file.
Use the following command for merging unpacked results from browsers: jq -s '[.[][]]' firefox/*.json >firefox.json
The released version of Scala.js benchmarks is available here.
Use the circe-argonaut-compile-times project to compare compilation time of jsoniter-scala for deeply nested product types with other JSON parsers like argonaut, play-json, and circe in 3 modes: auto, semi-auto, and derivation.
For Scala 3 use the scala3-compile-tests project to compare compilation time of jsoniter-scala for Scala 3 enumerations (sum types) with circe in semi-auto mode.
Publish to the local Ivy repo:
sbt clean +publishLocal
Publish to the local Maven repo:
sbt clean +publishM2
For version numbering use Recommended Versioning Scheme that is used in the Scala ecosystem.
Double-check binary and source compatibility, including behavior, and release using the following command on the environment with 16+GB of RAM:
sbt -java-home /usr/lib/jvm/jdk-11 -J-Xmx12g clean release
Do not push changes to GitHub until promoted artifacts for the new version are available for download on the Maven Central Repository, to avoid binary compatibility check failures in triggered Travis CI builds.
The last step is updating the tag info in the release list.