goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support

MIT License

Stars
1.9K
Committers
13

goawk - v1.28.0 Latest Release

Published by benhoyt about 1 month ago

What's Changed

New Contributors

Full Changelog: https://github.com/benhoyt/goawk/compare/v1.27.0...v1.28.0

goawk - v1.27.0: allow redirection to /dev/stderr on Windows, more

Published by benhoyt 6 months ago

What's changed

Full Changelog: https://github.com/benhoyt/goawk/compare/v1.26.0...v1.27.0

goawk - v1.26.0

Published by benhoyt 8 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/benhoyt/goawk/compare/v1.25.0...v1.26.0

goawk - v1.25.0: PGO, better exit codes, \u escape

Published by benhoyt about 1 year ago

This release includes several minor changes:

  • Build binaries with PGO on Go 1.21. GoAWK on Go 1.21 is about 10% faster than on Go 1.20, and PGO makes it another 5-6% faster.
  • Bump the minimum supported Go version from 1.15 to 1.16.
  • Make the return values of system() and close() for pipes more closely match Gawk's. This required an internal refactoring of the input and output stream implementation. See issues #203, #204, and #205, and thanks @juster!
  • Support the \u Unicode string escape that's been added to onetrueawk and Gawk recently. #212

goawk - Version 1.24.0: --csv and bug fixes

Published by benhoyt over 1 year ago

This release contains several minor fixes and the addition of --csv (it's an alias for -i csv). This is for compatibility with onetrueawk and Gawk, which are both adding that option soon.

It also contains a minor backwards-incompatible change: in CSV input mode (-i csv), the two-argument form of split() now splits using CSV parsing rather than FS. This is also in line with the upcoming --csv feature of onetrueawk and Gawk. It is very unlikely to affect anyone, as CSV mode is relatively recent and few scripts are likely to use the two-argument form of split() in CSV mode.
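To see why the choice between FS splitting and CSV splitting matters, here is a minimal Go sketch using only the standard library. It is not GoAWK's implementation; the helper names splitFS and splitCSV are hypothetical and only illustrate how the two approaches disagree on a quoted field that contains a comma.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// splitFS (hypothetical helper) splits on a literal separator with no
// awareness of quoting, similar in spirit to splitting on FS=",".
func splitFS(line, sep string) []string {
	return strings.Split(line, sep)
}

// splitCSV (hypothetical helper) parses the line as one CSV record, so a
// quoted field may itself contain commas.
func splitCSV(line string) ([]string, error) {
	return csv.NewReader(strings.NewReader(line)).Read()
}

func main() {
	line := `1,"Smith, Jane",NZ`

	// Naive splitting cuts the quoted name in two and keeps the quotes.
	fmt.Printf("FS:  %q\n", splitFS(line, ","))

	// CSV-aware splitting keeps the quoted name as a single field.
	fields, err := splitCSV(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("CSV: %q\n", fields)
}
```

A script that called split($0, a) in CSV mode and relied on FS-style results would see a different field count after this change, which is the incompatibility described above.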

Full list of changes:

goawk - Version 1.23.3

Published by benhoyt over 1 year ago

This is a patch version that fixes panics when compiling AWK code that uses complex, mutually-recursive functions (fixed in PR #187). Thanks @xonixx for the test case.

goawk - Version 1.23.2

Published by benhoyt over 1 year ago

This is a patch release to fix a bug where mutually-recursive functions would cause an "undefined function" error in the resolver (PR #184). Thanks @xonixx for the bug report (#183).

goawk - Version 1.23.1

Published by benhoyt over 1 year ago

This is a patch release that fixes a bug in 1.23.0 that caused a panic in the resolver step with code like function f1(A) {} function f2(x, A) { x[0]; f1(a); f2(a) }. Fixed in #178.

While we're at it, also fix a panic with certain obscure regexes (#179) and limit ARGC to a reasonable maximum (#180).

All three of these issues were found by fuzzing: go test ./interp -fuzz=FuzzSource

goawk - Version 1.23.0

Published by benhoyt over 1 year ago

This release adds a single new feature: support for length(array) in addition to length(string) (#176). Calling length() on an array is quite useful and is supported by all other awk versions (onetrueawk, Gawk, mawk, busybox awk, frawk). In addition, it's been accepted for inclusion into POSIX, though not yet added to the main spec (which seems to take forever).

This release also includes a complete rewrite of the type resolver (#175). Before, the code was quite messy and hard to read; now, with the two passes, I think it's easier to understand and work with. It was certainly easier to add the length(array) feature with the rewritten resolver than before.

goawk - Version 1.22.0

Published by benhoyt over 1 year ago

A fairly minor release, fixing some edge cases and adding support for nextfile:

  • Make constructs like $++lvalue not an error (#168)
  • Optimize constant integer array indexes to avoid toString() at runtime (#169)
  • Allow parsing of cond && var=value and similar expressions (#170)
  • Add GroupingExpr to fix (a)++b parsing issue; pretty-print precedence (#172)
  • Add support for nextfile (#173)

See full list of commits.

goawk - Version 1.21.0

Published by benhoyt almost 2 years ago

Significant changes in this release:

goawk - Version 1.19.0

Published by benhoyt over 2 years ago

Notable changes in this release:

In other news, check out awk-demo, an amazing "old skool demo" written in AWK by @patsie75. It now works under GoAWK, at least on Linux. Clone that repo and run it with awk=goawk ./demo.sh!

Thanks to @ko1nksm for several bug reports.

See full list of commits since v1.18.0.

goawk - Version 1.18.0

Published by benhoyt over 2 years ago

Relatively minor release with the following changes:

See the list of commits.

goawk - Version 1.17.1

Published by benhoyt over 2 years ago

goawk - Version 1.17.0

Published by benhoyt over 2 years ago

Now with proper CSV input and output support! Here is a simple example showing CSV input parsing and the new @"named-field" syntax:

$ goawk -i csv -H '{ print @"Abbreviation" }' testdata/csv/states.csv
AL
AK
AZ
...

Read the full documentation.

This feature was sponsored by the library of the University of Antwerp -- many thanks!

goawk - Version 1.16.0

Published by benhoyt over 2 years ago

goawk - Version 1.15.0

Published by benhoyt over 2 years ago

This release adds no new features. Instead, it brings a significant performance improvement from switching the internals of the interpreter from a tree-walking interpreter to a bytecode compiler with a virtual machine interpreter.

Results show that it's 18% faster overall on microbenchmarks, 13% on more real-world benchmarks. It should be fully backwards compatible -- please file an issue if you find a regression!
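For readers unfamiliar with the two designs, the following is a toy Go sketch of the difference, not GoAWK's actual code. All names in it (Expr, evalTree, compile, run, the opcodes) are hypothetical. It evaluates a small arithmetic expression once by walking the syntax tree and once by compiling it to a flat stack-machine bytecode and running that in a loop.

```go
package main

import "fmt"

// Expr is a tiny arithmetic AST: a number literal or a binary operation.
type Expr struct {
	Op          byte // 0 for a literal, otherwise '+' or '*'
	Value       float64
	Left, Right *Expr
}

// evalTree is the tree-walking approach: recurse over the AST on every run.
func evalTree(e *Expr) float64 {
	switch e.Op {
	case '+':
		return evalTree(e.Left) + evalTree(e.Right)
	case '*':
		return evalTree(e.Left) * evalTree(e.Right)
	default:
		return e.Value
	}
}

// Opcodes for the toy stack machine.
const (
	opPush byte = iota // followed by a one-byte index into Consts
	opAdd
	opMul
)

// Program is the compiled form: flat bytecode plus a constants table
// (limited to 256 constants in this sketch).
type Program struct {
	Code   []byte
	Consts []float64
}

// compile flattens the AST into postfix bytecode once, ahead of execution.
func compile(e *Expr, p *Program) {
	switch e.Op {
	case '+', '*':
		compile(e.Left, p)
		compile(e.Right, p)
		if e.Op == '+' {
			p.Code = append(p.Code, opAdd)
		} else {
			p.Code = append(p.Code, opMul)
		}
	default:
		p.Consts = append(p.Consts, e.Value)
		p.Code = append(p.Code, opPush, byte(len(p.Consts)-1))
	}
}

// run executes the bytecode in a simple loop over a value stack.
func run(p *Program) float64 {
	var stack []float64
	for i := 0; i < len(p.Code); i++ {
		switch p.Code[i] {
		case opPush:
			i++ // the next byte is the constant index
			stack = append(stack, p.Consts[p.Code[i]])
		case opAdd, opMul:
			r, l := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			if p.Code[i] == opAdd {
				stack = append(stack, l+r)
			} else {
				stack = append(stack, l*r)
			}
		}
	}
	return stack[0]
}

func main() {
	// The expression 1 + 2 * 3.
	ast := &Expr{Op: '+',
		Left:  &Expr{Value: 1},
		Right: &Expr{Op: '*', Left: &Expr{Value: 2}, Right: &Expr{Value: 3}},
	}
	prog := &Program{}
	compile(ast, prog)
	fmt.Println(evalTree(ast), run(prog))
}
```

Both paths compute the same value. The bytecode version pays the tree traversal once at compile time and then runs a tight loop over a byte slice, which is the general reason such a switch tends to help; the actual gains depend on the real instruction set and workload.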

Read the details here.

goawk - Version 1.14.0

Published by benhoyt over 2 years ago

This reverts the feature from v1.11.0 which changed the builtin functions length, substr, index, and match to use character indexes instead of byte indexes (as per the POSIX spec). The reason is that it changed those functions from O(1) to O(N), which created "accidentally quadratic" behavior in scripts that expected these functions to be O(1).

For example, @xonixx's gron.awk script on a relatively large JSON input file took about 1s in bytes mode (goawk -b), but 8 minutes (!) in the new unicode char default mode. That's extremely problematic.
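The cost difference comes from UTF-8 being a variable-width encoding. The Go sketch below illustrates the idea only and is not taken from GoAWK; byteAt and charAt are hypothetical helpers. Looking up a byte offset is a single slice access, while finding the Nth character means decoding from the start of the string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAt returns the byte at 0-based byte offset i: one O(1) slice access.
func byteAt(s string, i int) byte {
	return s[i]
}

// charAt returns the character at 0-based character index i. It has to
// decode UTF-8 from the start of the string, so it is O(N) in the index.
// It returns utf8.RuneError if i is past the last character.
func charAt(s string, i int) rune {
	for _, r := range s {
		if i == 0 {
			return r
		}
		i--
	}
	return utf8.RuneError
}

func main() {
	s := "na\u00efve caf\u00e9" // contains two 2-byte characters

	// Byte length and character count differ for non-ASCII text.
	fmt.Println(len(s), utf8.RuneCountInString(s))

	// The same index means different things in the two modes.
	fmt.Printf("byte 3: %#x, char 3: %q\n", byteAt(s, 3), charAt(s, 3))
}
```

A script that walks a long string with substr(s, i, 1) for every i performs N lookups. With O(1) lookups that is linear overall; with O(N) lookups it becomes quadratic, which matches the kind of slowdown reported above.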

Like v1.11.0, this release is a small breaking change, but it shouldn't affect many scripts (it only affects scripts that use constant indexes for substr on non-ASCII strings). I hope not many people are using interp.Config.Bytes or the goawk -b option yet, as those are gone again. Seeing as v1.11.0 was only introduced a few weeks ago, I think it's worth the breakage for a performance problem of this magnitude.

Fixes https://github.com/benhoyt/goawk/issues/93: "Major speed regression for gron.awk in goawk 1.11.0+".

goawk - Version 1.13.0

Published by benhoyt almost 3 years ago

Support multi-character RS and regular expression RS (#86), allowing significantly more powerful text processing. This is a Gawk extension to POSIX, which says, "If RS contains more than one character, the results are unspecified."
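As a rough illustration of what a regular expression record separator does, here is a small Go sketch using the standard regexp package. It is not how GoAWK reads records (a real implementation streams its input), and splitRecords is a hypothetical helper. Note also that Go's regexp package uses RE2 syntax, which differs in places from the POSIX extended regular expressions AWK uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitRecords (hypothetical helper) splits input into records wherever the
// separator regex matches. A trailing empty record, left by a separator at
// the very end of the input, is dropped.
func splitRecords(input, rs string) ([]string, error) {
	re, err := regexp.Compile(rs)
	if err != nil {
		return nil, err
	}
	records := re.Split(input, -1)
	if n := len(records); n > 0 && records[n-1] == "" {
		records = records[:n-1]
	}
	return records, nil
}

func main() {
	// Records are separated by a comma or semicolon with optional
	// surrounding whitespace, which a single-character RS cannot express.
	records, err := splitRecords("alpha, beta;gamma ,delta", `\s*[,;]\s*`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, rec := range records {
		fmt.Printf("record %d: %q\n", i+1, rec)
	}
}
```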