Last updated:
0 purchases
porerefiner 0.9.5
PoreRefiner
To help you manage your pores
Free software: MIT license
Documentation: https://porerefiner.readthedocs.io.
Introduction
PoreRefiner is a software tool to watch Nanopore runs in progress and attach sample information to them, as well as provide an interface for integration with LIMS services and other online systems. It supports both push and pull modalities for data exchange with those systems - push, via a series of configurable notifiers, and pull, via a simple Flask webservice and a Protobuf RPC service. It also includes a command-line interface for working with the run database.
What is it?
You install it on your GridION. It watches the data output directory and notes the creation and modification of sequencing output files. You can attach sample sheets to runs, for instance when you’re multiplexing samples - the MinKnow software is agnostic to sample membership in libraries, so maybe you want something to keep track of that. Then when the run completes (is idle for a period of time) a series of configurable events are triggered. That’s often pretty useful for, let’s say, sending data to your LIMS or through some kind of biosurveillance pipeline.
Installation
PoreRefiner is available as a Python package:
pip install porerefiner
Porerefiner can be started as a simple UNIX daemon:
porerefinerd start --daemonize
but also includes service unit files for administration via systemctl.
Copy the files porerefiner.service and porerefiner.app.service from the package to systemd:
cp /usr/local/lib/python3.7/dist-packages/porerefiner/porerefiner.service /lib/systemd/system
cp /usr/local/lib/python3.7/dist-packages/porerefiner/porerefiner.app.service /lib/systemd/system
systemctl enable porerefiner.service
systemctl enable porerefiner.app.service
Once the package is installed, porerefinerd and prfr should be on your path. You can use porerefinerd init to set up the config file for the porerefiner service, it will prompt you for the save locations of the database, the local socket, nanopore’s output directory, and where the config file should be saved:
$ porerefinerd init
create PoreRefiner config at /etc/porerefiner/config.yaml? [y/N]: y
location of porerefiner RPC socket? [/etc/porerefiner/porerefiner.sock]:
location of database? [/etc/porerefiner/database.db]:
nanopore data output location?: /data
export POREREFINER_CONFIG="/etc/porerefiner/config.yaml"
Then you can start the porerefiner services:
systemctl start porerefiner.service
systemctl start porerefiner.app.service
If you wish to enable the PoreRefiner web interface, you should ensure that port 8844 is reachable from remote hosts.
Writing Plugins
PoreRefiner has a plugin architecture; pip-installable Python packages can make themselves known to PoreRefiner using entry_points in setup.py. The easiest way to write your own plugin notifiers, jobs, and submitters for PoreRefiner is to use the cookiecutter template:
$ cookiecutter https://github.com/CFSAN-Biostatistics/new-porerefiner-plugin
project_name [My Porerefiner Plugin]:
project_slug [my_porerefiner_plugin]:
project_short_description [This is a plugin for Porerefiner, a tool for managining Nanopore sequencing.]:
See the Cookiecutter docs: https://cookiecutter.readthedocs.io/en/1.7.0/
Cookiecutter will create a full project repo and stub classes for your plugin. Open <project_slug>/<project_slug>/<project_slug>.py and you can fill in the method code blocks to implement the various functions of the necessary interfaces.
Notifiers
Notifiers are “fire and forget” handlers for “end-of-run” events; when an hour has elapsed since the last modification of a file in a run (or whatever idle time is configured in config.yaml, the configured notifiers will be fired off with the run event. Out of the box, PoreRefiner comes with three notifiers - a notifier to send OS-based popup “toast” notifications (if pynotifier is installed), a notifier to make an HTTP request to a defined endpoint, and a notifier to send a message into an Amazon Web Services Simple Queue Service (SQS) queue. Notifiers differ from jobs in that they’re assumed to run quickly/instantly and therefore they’re executed synchronously. As a result a long-running notifier can hang the software. For tasks that can’t execute quickly (copying files, etc), use a job.
Jobs
Jobs are processes that are assumed to take longer to execute and thus should execute asynchronously. As a result the job handler interface is more complex, and jobs require submitters to execute to (described below.) Jobs can be triggered either on the idle timeout of an individual file, or of the entire run, simply by extending the appropriate superclass - FileJob and RunJob. The PoreRefiner software will dispatch the correct configured job type, collect any type of process or job ID that is returned, and periodically poll the job’s submitter for completion status. A run’s in-progress jobs can be viewed through the prfr tool.
Submitters
Submitters are the interface between jobs and the execution system. For instance, the HpcSubmitter knows how to use SSH to execute commands on a typical HPC using qsub. PoreRefiner has an additional LocalSubmitter which simply runs commands locally, in a subprocess.
Here’s an example of a simple post-run workflow configuration using the generic file job and the local submitter:
submitters:
- class: LocalSubmitter
jobs:
- class: GenericFileJob
config:
command: cp {file.path} /network/output/{run.name}/{file.name}
More examples to come in the Porerefiner Config Cookbook:
https://github.com/crashfrog/porerefiner-config-cookbook
If you develop a useful or interesting config, please consider contributing it to the cookbook using a pull request.
Using this software
prfr is the end-user client; Minion users should use this tool to monitor runs in progress, load sample sheets, and tag runs and samples.
$ prfr --help
Usage: prfr [OPTIONS] COMMAND [ARGS]...
Command line interface for PoreRefiner, a Nanopore run manager.
Options:
--help Show this message and exit.
Commands:
info Return information about a run, historical or in progress.
load Load a sample sheet to be attached to a run, or to the next run...
ps Show runs in progress, or every tracked run (--all), or with a...
tag Add one or more tags to a run.
template Write a sample sheet template to STDOUT.
untag Remove one or more tags from a run.
If the web service is enabled, users can also upload and attach sample sheets using the web interface.
Administration
When the PoreRefiner service porerefinerd is stopped, it has a number of administrative functions:
$ porerefinerd --help
Usage: porerefiner.py [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
init Find the Nanopore output directory and create the config file.
list List job system stuff.
reset Utility function to reset various state.
start Start the PoreRefiner service.
verify Run various checks.
$ porerefinerd init --help
Usage: porerefiner.py init [OPTIONS]
Find the Nanopore output directory and create the config file.
Options:
--config TEXT
--nanopore_dir TEXT
--help Show this message and exit.
$ porerefinerd list --help
Usage: porerefiner.py list [OPTIONS] COMMAND [ARGS]...
List job system stuff.
Options:
--help Show this message and exit.
Commands:
jobs List the configurable and configured jobs.
notifiers List the configurable and configured notifiers.
submitters List the configureable and configured submitters.
$ porerefinerd reset --help
Usage: porerefiner.py reset [OPTIONS] COMMAND [ARGS]...
Utility function to reset various state.
Options:
--help Show this message and exit.
Commands:
config Reset config to defaults.
database Reset database to empty state.
jobs Reset all jobs to a particular status.
runs Reset all runs to in-progress status.
samplesheets Clear samplesheets that aren't attached to any run.
$ porerefinerd verify --help
Usage: porerefiner.py verify [OPTIONS] COMMAND [ARGS]...
Run various checks.
Options:
--help Show this message and exit.
Commands:
notifiers Verify notifiers by sending notifications.
submitters Verify configuration of job submitters by running their tests.
Features
Automatic detection of runs in progress
Sample sheet and sample tracking through the flowcell/run context, and beyond
Schedule automatic analysis of runs and files in AWS or your HPC
How it works
PoreRefiner uses fsevents to detect filesystem events during a Nanopore run, including the creating of new directories in the Nanopore output folder. Flowcells, runs, and run files can be detected this way. PoreRefiner will update a SQLite database with run information, including what it’s able to pull out of Minknow.
If all of the files of a run have not been modified in an hour, PoreRefiner will mark a completion time for that run. If any of the files in a run have not been modified in an hour, they may be picked up by the Job runner for some subsequent processing.
PoreRefiner presents many interfaces to address integration challenges:
A CLI interface for both human use and simple scripting
A simple HTTP service for communication with LIMS and other services
A Protobuf-RPC service for inter-process communication (Protobuf bindings are available in Python, C, JavaScript, Java, and many other languages)
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.9.5 (2021-06-22)
prfr CLI client now no longer tries to load service config.yaml; it has its own or you can give socket/host/ssl arguments at the command line.
0.9.4 (2021-03-18)
Option in prfr to connect to remote Porerefiner hosts.
0.9.3 (2021-03-15)
Bugfixes to tagging, other improvements
0.9.2 (2021-03-08)
Bugfixes, small improvements
0.9.1b (2021-02-19)
Bugfix - package properly requires Python 3.8 now.
0.9.1 (2020-12-16)
Added triple tags - namespace, name, value - to data model
Protobuf API now accepts triple-tags along with new samples
Added a way for plugins to define and register their own sample sheet parsers
0.9.0 (2020-09-14)
Job system re-work. Plugins will need to be updated for compatibility.
Jobs now expect generators (yield) instead of methods (return). This is to enable multi-step workflows.
Generic job now should be configured with a list of commands, rather than a single command. (it can be a single-item list.)
0.8.21 (2020-08-24)
Bugfix for remote directory rerooting, new tests
0.8.20 (2020-08-20)
Bugfix and tests for config
0.8.19 (2020-08-20)
more convenient access of config from plugins
0.8.18 (2020-08-18)
new command to run jobs on sample data
0.8.17 (2020-08-17)
bugfixes for the bugfixes and more tests
0.8.16 (2020-08-17)
bunch of bug fixes in job system
0.8.15 (2020-08-06)
fixed bug where plugins wouldn’t be imported
0.8.14 (2020-07-29)
fixed bug in job spawn/db fields of Path type
0.8.13 (2020-07-28)
changing log levels for some events; DEBUG now no longer jammed up by SQL spam
0.8.12 (2020-07-24)
bugfixes in sample sheet loading
0.8.11 (2020-07-15)
More platform-universal MD5 hashing of files on completion
0.8.10 (2020-07-15)
Bugfixes in model, barcode_id type
0.8.9 (2020-07-13)
Bugfixes in prfr load, sample sheet handling for v1.0.1 samplesheets (barcode kit id’s)
0.8.8 (2020-07-01)
Improved behavior on job creation; buxfixes in job submit step
0.8.7 (2020-05-21)
Fixed a bug in porerefinerd verify
0.8.6 (2020-05-18)
Fixed a bug that would prevent the watchdog from starting
0.8.5 (2020-04-15)
Samplesheet and data model now distinguish between sequencing and barcoding kits
0.8.4 (2020-03-23)
New plugin architecture, combined with a cookiecutter definition for creating new ones
Removed several in-progress job/submitter types to plugins
0.8.3 (2020-03-13)
prfr now recognizes site config if user config doesn’t exist
0.8.2 (2020-03-12)
Fixed tests, general bugfixes
0.8.1 (2020-03-11)
Improved service files.
0.8.0 (2020-03-09)
First release on PyPI.
For personal and professional use. You cannot resell or redistribute these repositories in their original state.
There are no reviews.