TOML-Tiny

A minimal TOML parser and serializer for perl 5

Stars
8
Committers
4

=pod

=encoding UTF-8

=head1 NAME

TOML::Tiny - a minimal, pure perl TOML parser and serializer

=head1 VERSION

version 0.18

=head1 SYNOPSIS

use TOML::Tiny qw(from_toml to_toml);

binmode STDIN, ':encoding(UTF-8)'; binmode STDOUT, ':encoding(UTF-8)';

Decoding TOML

my $toml = do{ local $/; }; my ($parsed, $error) = from_toml $toml;

Encoding TOML

say to_toml({ stuff => { about => ['other', 'stuff'], }, });

Object API

my $parser = TOML::Tiny->new; my $data = $parser->decode($toml); say $parser->encode($data);

=head1 DESCRIPTION

=for html

CTOML::Tiny implements a pure-perl parser and generator for the L<TOML|https://github.com/toml-lang/toml> data format. It conforms to TOML v1.0 (with a few caveats; see L).

CTOML::Tiny strives to maintain an interface compatible to the L and LTOML::Parser modules, and could even be used to override C<$TOML::Parser>:

use TOML; use TOML::Tiny;

local $TOML::Parser = TOML::Tiny->new(...); say to_toml(...);

=head1 EXPORTS

CTOML::Tiny exports the following to functions for compatibility with the L module. See L<TOML/FUNCTIONS>.

=head2 from_toml

Parses a string of C-formatted source and returns the resulting data structure. Any arguments after the first are passed to LTOML::Tiny::Parser's constructor.

If there is a syntax error in the C source, C<from_toml> will die with an explanation which includes the line number of the error.

my $result = eval{ from_toml($toml_string) };

Alternately, this routine may be called in list context, in which case syntax errors will result in returning two values, C and an error message.

my ($result, $error) = from_toml($toml_string);

Additional arguments may be passed after the toml source string; see L.

=head3 GOTCHAS

=over

=item Big integers and floats

C supports integers and floats larger than what many perls support. When CTOML::Tiny encounters a value it may not be able to represent as a number, it will instead return a LMath::BigInt or LMath::BigFloat. This behavior can be overridden by providing inflation routines:

my $toml = TOML::Tiny->new( inflate_float => sub{ return do_something_else_with_floats( $_[0] ); }; );

=back

=head2 to_toml

Encodes a hash ref as a C-formatted string.

my $toml = to_toml({foo => {'bar' => 'bat'}});

[foo]

bar="bat"

=head3 mapping perl to TOML types

=head4 table

=over

=item C ref

=back

=head4 array

=over

=item C ref

=back

=head4 boolean

=over

=item C<\0> or C<\1>

=item LJSON::PP::Boolean

=item LTypes::Serializer::Boolean

=back

=head4 numeric types

These are tricky in perl. When encountering a CMath::Big[Int|Float], that representation is used.

If the value is a defined (non-ref) scalar with the C<SVf_IOK> or C<SVf_NOK> flags set, the value will be emitted unchanged. This is in line with most other packages, so the normal hinting hacks for typed output apply:

number => 0 + $number, string => "" . $string,

=over

=item LMath::BigInt

=item LMath::BigFloat

=item numerical scalars

=back

=head4 datetime

=over

=item RFC3339-formatted string

e.g., C<"1985-04-12T23:20:50.52Z">

=item L

Ls are formatted as C, as expected by C. However, C supports the concept of a "local" time zone, which strays from C by allowing a datetime without a time zone offset. This is represented in perl by a C with a B.

=back

=head4 string

All other non-ref scalars are treated as strings.

=head1 OBJECT API

=head2 new

=over

=item inflate_datetime

By default, CTOML::Tiny treats TOML datetimes as strings in the generated data structure. The C<inflate_datetime> parameter allows the caller to provide a routine to intercept those as they are generated:

use DateTime::Format::RFC3339;

my $parser = TOML::Tiny->new( inflate_datetime => sub{ my ($dt_string) = @_; # DateTime::Format::RFC3339 will set the resulting DateTime's formatter # to itself. Fallback is the DateTime default, ISO8601, with a possibly # floating time zone. return eval{ DateTime::Format::RFC3339->parse_datetime($dt_string) } || DateTime::Format::ISO8601->parse_datetime($dt_string); }, );

=item inflate_boolean

By default, boolean values in a C document result in a C<1> or C<0>. If LTypes::Serialiser is installed, they will instead be CTypes::Serialiser::true or CTypes::Serialiser::false.

If you wish to override this, you can provide your own routine to generate values:

my $parser = TOML::Tiny->new( inflate_boolean => sub{ my $bool = shift; if ($bool eq 'true') { return 'The Truth'; } else { return 'A Lie'; } }, );

=item inflate_integer

TOML integers are 64 bit and may not match the size of the compiled perl's internal integer type. By default, CTOML::Tiny coerces numbers that fit within a perl number by adding C<0>. For bignums, a LMath::BigInt is returned. This may be overridden by providing an inflation routine:

my $parser = TOML::Tiny->new( inflate_integer => sub{ my $parsed = shift; return sprintf 'the number "%d"', $parsed; }; );

=item inflate_float

TOML floats are 64 bit and may not match the size of the compiled perl's internal float type. As with integers, floats are coerced to numbers and large floats are upgraded to LMath::BigFloats. The special strings C and C may also be returned. You can override this by specifying an inflation routine.

my $parser = TOML::Tiny->new( inflate_float => sub{ my $parsed = shift; return sprintf '"%0.8f" is a float', $parsed; }; );

=item strict

C imposes some miscellaneous strictures on C input, such as disallowing trailing commas in inline tables and failing on invalid UTF8 input.

BNote: C was previously called C<strict_arrays>. Both are accepted for backward compatibility, although enforcement of homogenous arrays is no longer supported as it has been dropped from the spec.

=back

=head2 decode

Decodes C and returns a hash ref. Dies on parse error.

=head2 encode

Encodes a perl hash ref as a C-formatted string.

=head2 parse

Alias for C to provide compatibility with CTOML::Parser when overriding the parser by setting C<$TOML::Parser>.

=head1 DIFFERENCES FROM L AND LTOML::Parser

CTOML::Tiny differs in a few significant ways from the L module, particularly in adding support for newer C features and strictness.

L defaults to lax parsing and provides C<strict_mode> to (slightly) tighten things up. CTOML::Tiny defaults to (somewhat) stricter parsing, enabling some extra strictures with L.

CTOML::Tiny supports a number of options which do not exist in L: L</inflate_integer>, L</inflate_float>, and L.

CTOML::Tiny ignores invalid surrogate pairs within basic and multiline strings (L may attempt to decode an invalid pair). Additionally, only those character escapes officially supported by TOML are interpreted as such by CTOML::Tiny.

CTOML::Tiny supports stripping initial whitespace and handles lines terminating with a backslash correctly in multilne strings:

TOML input

x=""" foo"""

y=""" how now brown bureaucrat. """

Perl output

{x => 'foo', y => 'how now brown bureaucrat.'}

CTOML::Tiny includes support for integers specified in binary, octal or hex as well as the special float values C and C.

=head1 SEE ALSO

=over

=item LTOML::Tiny::Grammar

Regexp scraps used by CTOML::Tiny to parse TOML source.

=back

=head1 ACKNOWLEDGEMENTS

Thanks to L<ZipRecruiter|https://www.ziprecruiter.com> for encouraging their employees to contribute back to the open source ecosystem. Without their dedication to quality software development this distribution would not exist.

A big thank you to those who have contributed code or bug reports:

=over

=item L<ijackson|https://github.com/ijackson>

=item L<noctux|https://github.com/noctux>

=item L<oschwald|https://github.com/oschwald>

=item L<jjatria|https://metacpan.org/author/JJATRIA>

=back

=head1 AUTHOR

Jeff Ober [email protected]

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2024 by Jeff Ober.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

=cut