scipipe

Robust, flexible and resource-efficient pipelines using Go and the commandline

MIT License

Stars: 1.1K
Committers: 10

scipipe - v0.9.4: New ParamCombiner component and Important fix for a deadlock bug

Published by samuell over 5 years ago

This release contains important improvements and bugfixes kindly contributed by @dwmunster (see the release title for the highlights: the new ParamCombiner component and a deadlock fix).

On a smaller note, the CircleCI configuration is now updated to their new 2.0 syntax.

Upgrade is highly recommended, as usual with:

go get -u github.com/scipipe/scipipe/...

This release fixes issue #77, which caused intermittent deadlocks (and, seemingly, occasional test timeouts).

As usual, update scipipe with:

go get -u github.com/scipipe/scipipe/...

(and don't forget the ...)

scipipe - v0.9.2: Fix bug in FileCombinator on more than 16 files

Published by samuell over 5 years ago

This is a small but important fix for a bug in the FileCombinator component, introduced in 0.9.1, that caused a deadlock when sending more than 16 files through it.
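
The class of bug described here can be illustrated with a plain buffered Go channel. The sketch below is not SciPipe's code: the capacity of 16 is an assumption chosen to mirror the file count mentioned above, and trySendAll is a hypothetical helper. It shows that once a buffer is full and nothing is receiving, the next send cannot make progress.

```go
package main

import "fmt"

// trySendAll attempts to send n items on a channel with the given buffer
// capacity while nothing is receiving. It returns how many sends succeeded
// before a send would have blocked.
func trySendAll(n, capacity int) int {
	ch := make(chan string, capacity)
	sent := 0
	for i := 0; i < n; i++ {
		select {
		case ch <- fmt.Sprintf("file%02d.txt", i):
			sent++
		default:
			// A plain `ch <- x` would block forever here: the buffer is
			// full and no goroutine is draining it.
			return sent
		}
	}
	return sent
}

func main() {
	fmt.Println(trySendAll(16, 16)) // 16: everything fits in the buffer
	fmt.Println(trySendAll(17, 16)) // 16: the 17th send would block
}
```

The usual remedy is to make sure a receiver is running concurrently before the sender starts, rather than relying on the buffer being large enough.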

scipipe - v0.9.1: New component: FileCombinator

Published by samuell over 5 years ago

This release adds a new component: FileCombinator.

See the how-to on creating file combinations for usage details.

Often in workflows, we need to generate a list of parameters on the commandline to drive the workflow. For example, we might have a list of target proteins for which we want to build predictive models for drug molecules, out of a large combined dataset. Ideally this should be doable with any shell command, so that the parameter list can be generated from existing data. We recently realized that there was no easy way to do this in SciPipe. Until now.

Now there is the new CommandToParams component.

We will update the docs shortly, but in the meantime, the test contains a small (quite dumb) mini example of how to use it.

Note: We have plans to create a more integrated solution for reading data into parameter streams, so that this can be done in the normal shell commands created with workflow.NewProc(), but this dedicated component is created to serve the basic need for this functionality while we get that in place.
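
To make the idea concrete, here is a minimal standard-library sketch of turning the output of a shell command into a list of parameter values, one per line. It does not use the CommandToParams component itself; the helper names splitParams and commandToParams, the use of `sh -c`, and the example identifiers are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// splitParams turns command output into one parameter value per
// non-empty line, with surrounding whitespace removed.
func splitParams(output string) []string {
	var params []string
	for _, line := range strings.Split(output, "\n") {
		if v := strings.TrimSpace(line); v != "" {
			params = append(params, v)
		}
	}
	return params
}

// commandToParams runs shellCmd through `sh -c` (an assumption: a POSIX
// shell must be on the PATH) and returns its output lines as parameters.
func commandToParams(shellCmd string) ([]string, error) {
	out, err := exec.Command("sh", "-c", shellCmd).Output()
	if err != nil {
		return nil, err
	}
	return splitParams(string(out)), nil
}

func main() {
	// The identifiers printed by this command are made up for the example.
	params, err := commandToParams(`printf 'targetA\ntargetB\n'`)
	if err != nil {
		fmt.Println("command failed:", err)
		return
	}
	for _, p := range params {
		fmt.Println("param:", p)
	}
}
```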

Small breaking change: components.{FileReader -> FileToParamsReader}

This release also renames the FileReader component to FileToParamsReader, as we realized that it was not functioning properly in its previous role, where the inclusion of line breaks caused trouble. It is now properly unit tested for reading individual rows of a file into a stream of parameter values, to send on the parameter ports of other processes.

This small breaking change means we're ticking the (still pre 1.0) version from 0.8.x to 0.9.x.

scipipe - v0.8.3: New path formatting modifier: basename

Published by samuell over 5 years ago

Now one can use basename in path formatters, to remove everything from a path up to the actual filename.

So, say that you have an input path that is: /some/folder/file.txt

... and that you want to use it, via the in-port named infile, to set the path of outfile using SetOut(), like this:

aProcess.SetOut("outfile", "{i:infile}.some.extension")

which would result in the filename:

/some/folder/file.txt.some.extension

Then, to remove /some/folder/ from the input string, you can do:

aProcess.SetOut("outfile", "{i:infile|basename}.some.extension")

With this, the output file will instead be named just:

file.txt.some.extension
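
The effect of the modifier corresponds to what Go's filepath.Base does to a path. The sketch below reproduces the two results above with the standard library only; outPath is a hypothetical helper, not part of SciPipe.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// outPath appends suffix to inPath. When useBasename is true, everything
// up to the actual filename is dropped first, as the basename modifier
// described above does.
func outPath(inPath, suffix string, useBasename bool) string {
	if useBasename {
		inPath = filepath.Base(inPath)
	}
	return inPath + suffix
}

func main() {
	in := "/some/folder/file.txt"
	fmt.Println(outPath(in, ".some.extension", false)) // /some/folder/file.txt.some.extension
	fmt.Println(outPath(in, ".some.extension", true))  // file.txt.some.extension
}
```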

The FileGlobber component, if created with the new components.NewFileGlobberDependent() constructor, can take an upstream out-port as a dependency via its in-port InDependency().

See these lines in the test for an example of how to use it in a workflow.

scipipe - v0.8.1: IMPORTANT Bug fix: Properly detect existing temporary directories

Published by samuell about 6 years ago

This is an important bug fix, and anybody using the previous version, 0.8, is strongly recommended to upgrade.

The release contains primarily a fix for a bug that was apparently introduced in 0.8, with the move to folder-based temporary paths, where existing temporary paths were not properly detected, and unfinished files could potentially be mixed with finished ones.

scipipe -

Published by samuell about 6 years ago

This is the wrong link ... see the v0.8.0 release.

This release contains a very large number of improvements, too many to list individually here, but a few selected ones are covered further below. This release brings in another contributor, @jonalv, who did fantastic work on the TeX template for the audit report conversion feature.

Notable new features

A simplified API

Each task is now executed in its own isolated temporary folder, so that extra files generated by commands are properly captured and handled in an atomic way (to avoid mixing up non-finished and finished files).

One improvement is that setting output paths is no longer required. If you still want to set the file extension of an output, you can do so with the following syntax in an out-port placeholder in the command: {o:portname|.csv}, for the .csv extension.

Furthermore, the many different Process.SetPath... methods are now unified to only two: Process.SetOut(portName string, pattern string) and Process.SetOutFunc(portName string, pathFunc func(Task) string).

SetOut() takes placeholders similar to those used to define the command pattern, such as {i:portname} for input files and {p:param1} for parameters. It also allows certain modifiers after the port name, separated by | characters, such as for trimming the end of a string, which is done like so: {i:bamfile|%.bam}, given that we have an in-port named "bamfile", for which we want to re-use its filename, but without the .bam file extension.
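
To show what such a placeholder with a trimming modifier amounts to, here is a small standalone expander written with regexp and strings. It only handles {i:port} placeholders and the %suffix modifier, and it is a sketch of the behaviour described above rather than SciPipe's own parser; the expand function and the example paths are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches {i:port} optionally followed by |modifier parts.
var placeholder = regexp.MustCompile(`\{i:([^|}]+)((?:\|[^|}]+)*)\}`)

// expand replaces each {i:port|mods} in pattern with the path registered
// for that port in inPaths, applying %suffix modifiers by trimming the
// given suffix from the end of the path.
func expand(pattern string, inPaths map[string]string) string {
	return placeholder.ReplaceAllStringFunc(pattern, func(m string) string {
		parts := placeholder.FindStringSubmatch(m)
		val := inPaths[parts[1]]
		for _, mod := range strings.Split(parts[2], "|") {
			if strings.HasPrefix(mod, "%") {
				val = strings.TrimSuffix(val, mod[1:])
			}
		}
		return val
	})
}

func main() {
	in := map[string]string{"bamfile": "data/sample1.bam"}
	fmt.Println(expand("{i:bamfile}.sorted.bam", in))       // data/sample1.bam.sorted.bam
	fmt.Println(expand("{i:bamfile|%.bam}.sorted.bam", in)) // data/sample1.sorted.bam
}
```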

As always, for more information about this, see the documentation.

Graph plotting

SciPipe can now plot the graph of a workflow to a .dot file, which can be converted to PDF with the GraphViz dot command (See the documentation for this feature).

This can be done by adding this line in the workflow Go file:

myWorkflow.PlotGraph("myworkflow.dot")

One can also let SciPipe execute the dot command to convert to PDF in one go (requires having GraphViz installed):

myWorkflow.PlotGraphPDF("myworkflow.dot")

An example plot can be seen here (image: selection_851).
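
For readers unfamiliar with the .dot format, the sketch below builds the kind of text such a file contains from a list of process-to-process connections. It is not what PlotGraph() emits verbatim; toDot and the process names are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// toDot renders a directed graph in GraphViz DOT syntax. Each edge is a
// pair of node names: {upstream, downstream}.
func toDot(name string, edges [][2]string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "digraph %q {\n", name)
	for _, e := range edges {
		fmt.Fprintf(&b, "  %q -> %q;\n", e[0], e[1])
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	edges := [][2]string{
		{"download", "align"},
		{"align", "report"},
	}
	fmt.Print(toDot("myworkflow", edges))
}
```

The resulting text can be saved to a file and converted with the GraphViz command `dot -Tpdf myworkflow.dot -o myworkflow.pdf`.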

Convert Audit report to TeX / PDF

This is an experimental feature. (See the documentation for this feature).

Usage:

scipipe audit2tex somefile.audit.json
pdflatex somefile.audit.tex
open somefile.audit.pdf

How it looks currently: (image: selection_850)

Convert Audit report to HTML

This is an experimental feature. (See the documentation for this feature).

Usage:

scipipe audit2html somefile.audit.json

How it looks currently: (image: selection_771)

Convert Audit report to Bash

This is an experimental feature. (See the documentation for this feature).

Usage:

scipipe audit2bash somefile.audit.json

How it looks currently: (image: selection_852)

scipipe - v0.7: API naming improvements

Published by samuell over 6 years ago

This release brings some breaking changes in the API, as we try to fix some parts of it that did not feel fluent when writing a lot of workflow code, before we reach 1.0.

The primary changes are:

  • d92065a289964177b2e40994fd590ea93cdb0390 Connect() -> From()/To()
  • 75512dd20c5e39217f4ecc390a170a4c1df15e80 Connected() -> Ready()
  • 0424377d607f43b00fd9bbc9cec56703ba884fcb Param{In,Out}Port -> {In,Out}ParamPort (+ Short-hand: {In,Out}Param, on Shell Processes)
  • 9dc83104a3924a6030b0a81967da656966b7fcee Keys -> Tags

(See also issue #65)

The release also brings a new FileGlobber component, which is useful when wanting to send existing files as input to the workflow.

scipipe - v0.6.4: Improved logging and usability on missing (temporary) output files

Published by samuell over 6 years ago

This release brings a number of improvements around logging. The changes include:

  • By default writing logs to a file in addition to stdout
  • Issuing warnings on missing (temporary) output files after a task is executed, with a back-off-and-try-again three times before failing with an error message. (Previously the workflow would hang silently unless debug output was on).
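
The retry behaviour described in the last item can be sketched as follows. This is an illustration under assumptions, not SciPipe's code: waitForFile is a hypothetical helper, and doubling the delay between attempts is one possible back-off policy that the release note does not specify.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// waitForFile checks for path up to attempts times, printing a warning and
// sleeping with a doubling delay after each miss. It returns an error if
// the file is still missing after the last attempt.
func waitForFile(path string, attempts int, delay time.Duration) error {
	for i := 1; i <= attempts; i++ {
		if _, err := os.Stat(path); err == nil {
			return nil
		}
		fmt.Printf("warning: %s not found (attempt %d of %d)\n", path, i, attempts)
		time.Sleep(delay)
		delay *= 2 // back off before trying again
	}
	return fmt.Errorf("output file %s still missing after %d attempts", path, attempts)
}

func main() {
	err := waitForFile("does-not-exist.tmp", 3, 10*time.Millisecond)
	fmt.Println("result:", err)
}
```
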

scipipe - v0.6.3: Important bugfix, and two new components

Published by samuell over 6 years ago

This release brings an important bugfix, for an error that prevented receiving a stream of parameters and a stream of input IPs at the same time.

It also brings two new components in the components package: FileSource and ParamSource.

See the releases page for other recent releases.

This is a minor release that mainly consists of a lot of smaller fixes and improvements in the error reporting, handling of FIFO files (named pipes) and more.

It also includes a clean-up of large parts of the source code for improved clarity.

See the releases page for earlier releases.

scipipe - v0.6.1: New RunToRegex() method + fixes and improvements

Published by samuell over 6 years ago

This is a smaller release, following up on the 0.6 release where we removed all external dependencies.

New features

  • New wf.RunToRegex("process_name_pattern.*"). See 2d231ff74532659aa3d3c9d3d5759834c2b192c2
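
The selection step behind such a method can be illustrated with Go's regexp package. The sketch below only shows how a pattern picks out process names; selectProcs and the process names are hypothetical, and whether SciPipe anchors the pattern to the whole name is not stated here, so the unanchored matching used below is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// selectProcs returns the process names that match pattern, in their
// original order. The match is unanchored unless the pattern itself
// uses ^ and $.
func selectProcs(pattern string, procNames []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, name := range procNames {
		if re.MatchString(name) {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	names := []string{"download_data", "align_reads", "align_index", "make_report"}
	matched, err := selectProcs("align_.*", names)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(matched) // [align_reads align_index]
}
```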

Minor fixes

  • Add start/stop time stamps to audit log (081576c21ecc13c49ba1b051c5780418b40390fe)
  • Ensure directories are created for outputs also in custom components (ea2b56d24c6c0d9a6b358a1b5cbee72790ddd33a)
  • Fail completely on existing .tmp file, for clearer behavior and error messages (57ddb33b836a3f09a9f839073e26b34163745a62)
  • Add process names in audit logs (not just commands) (7f8db5c573a8f815f75b64eb87ae5b7c2e7714bc)

scipipe - v0.6: All external dependencies removed

Published by samuell over 6 years ago

This release mostly contains a lot of small clean-ups and fixes, including #51, but a larger thing is this:

We removed all external dependencies outside of the Go standard library. This is to make it as easy as possible to use Go's vendoring support when depending on scipipe, so that you can, despite Go's lack of an official package manager with versioning support, easily pin a specific version of scipipe and include it in your own code and source code repositories, for maximum reproducibility (SciPipe is only a little more than 1k lines of code, so should not be a big deal).

scipipe - v0.5.1: Cleanups related to new v0.5 API

Published by samuell over 6 years ago

This release mainly fixes various issues related to missing error messages, and unnecessarily complicated naming in the API, in the new features of the 0.5 release. (See that release for more info).

Most notably, the main API method for running partial workflows is now along the lines of:

myWorkflow.RunTo("process1", "process2")

scipipe - v0.5: Run parts of workflows. API clean-up. More GoDocs

Published by samuell over 6 years ago

This release contains a pretty deep refactoring to allow an important new feature: running parts of workflows.

An example of running parts of a workflow is running up to a specified process, skipping all downstream processes, which can be important when developing or modifying workflows interactively. This new feature is documented here.
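
Conceptually, running up to a process means keeping that process plus everything upstream of it, and dropping the rest. The following sketch computes that set on a toy dependency map; it illustrates the idea only, and upstreamOf together with the process names are hypothetical, not SciPipe internals.

```go
package main

import (
	"fmt"
	"sort"
)

// upstreamOf returns the targets plus every process they depend on,
// directly or transitively. deps maps a process name to the names of its
// direct upstream processes.
func upstreamOf(deps map[string][]string, targets ...string) map[string]bool {
	needed := map[string]bool{}
	var visit func(name string)
	visit = func(name string) {
		if needed[name] {
			return
		}
		needed[name] = true
		for _, up := range deps[name] {
			visit(up)
		}
	}
	for _, t := range targets {
		visit(t)
	}
	return needed
}

func main() {
	deps := map[string][]string{
		"align":  {"download"},
		"sort":   {"align"},
		"report": {"sort"},
	}
	needed := upstreamOf(deps, "align")
	names := make([]string, 0, len(needed))
	for n := range needed {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println(names) // [align download]
}
```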

There were also various clean-ups of the API, including:

  • It is no longer required to connect the last out-port of the workflow with the wf.ConnectLast(outPort) method, as this is now taken care of automagically.
  • Getters were renamed to not use Get in names, as per Go best practices.
  • IsConnected() was renamed to Connected()

Also, test coverage was improved slightly.

scipipe - v0.4.1: Improve error message on non-existent in-port

Published by samuell over 6 years ago

This is a small fix release that mainly improves error messages when trying to access an in-port that does not exist, something that we see happens easily.

It also introduces a new helper component, IpToParamConverter, for reading a single value in a file into a parameter port, that can be connected to another component's parameter port.

See the releases page for more info on recent releases and new functionality.

scipipe - v0.4.0: Improvements in audit log and resource constraints

Published by samuell almost 7 years ago

Some highlights from this (pre) release:

  • Improvements and fixes of various limitations in the audit log feature.
    • Audit log files for restarted workflows are now properly prepended with the audit logs from upstream tasks, even if those are not re-run, but only extracted from disk.
  • A new struct field process.CoresPerTask, that can be set after a process is created, to let it occupy more than one slot in the workflow's maxConcurrentTasks setting. This can be useful e.g. for multithreaded programs that use more than one core, or for programs using a lot of memory.
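
One way to picture a task occupying several slots is a counting semaphore built on a buffered channel. The sketch below is a standalone illustration under assumptions, not SciPipe's scheduler: slotPool and runTasks are hypothetical, and every task is assumed to request no more slots than the pool holds.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// slotPool is a counting semaphore with maxConcurrentTasks slots.
type slotPool struct {
	acquireMu sync.Mutex // lets only one task collect slots at a time
	slots     chan struct{}
}

func newSlotPool(maxConcurrentTasks int) *slotPool {
	return &slotPool{slots: make(chan struct{}, maxConcurrentTasks)}
}

// acquire blocks until n slots are held. Serialising acquisition avoids two
// tasks each holding part of what they need and waiting on each other.
func (p *slotPool) acquire(n int) {
	p.acquireMu.Lock()
	defer p.acquireMu.Unlock()
	for i := 0; i < n; i++ {
		p.slots <- struct{}{}
	}
}

func (p *slotPool) release(n int) {
	for i := 0; i < n; i++ {
		<-p.slots
	}
}

// runTasks runs one goroutine per entry in coresPerTask and returns the
// highest number of slots that were in use at the same time.
func runTasks(maxConcurrentTasks int, coresPerTask []int) int {
	pool := newSlotPool(maxConcurrentTasks)
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		inUse int
		peak  int
	)
	for _, cores := range coresPerTask {
		wg.Add(1)
		go func(cores int) {
			defer wg.Done()
			pool.acquire(cores)
			mu.Lock()
			inUse += cores
			if inUse > peak {
				peak = inUse
			}
			mu.Unlock()
			time.Sleep(time.Millisecond) // stand-in for real work
			mu.Lock()
			inUse -= cores
			mu.Unlock()
			pool.release(cores)
		}(cores)
	}
	wg.Wait()
	return peak
}

func main() {
	// With 4 slots, the reported peak never exceeds 4.
	fmt.Println("peak slots in use:", runTasks(4, []int{2, 2, 1, 3}))
}
```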