Bioreactor: bioinformatics analysis framework

Weirauch Transcription Factor Research Lab
Center for Autoimmune Genomics and Etiology
Cincinnati Children's Hospital Medical Center

bit.ly/bioreactor-slides

The Team

Ben Albert • Brad Arvin
Juanita Dickhaus • Kevin Ernst

We are collaborating with Dr. Matt Weirauch at Cincinnati Children's Hospital

Our faculty advisor is Dr. Karen Davis of CEAS

Project Goal #1

Create a web-based functional genomics analysis tool for investigating interactions between human and viral DNA sequences and proteins

"Functional genomics"

It's a field of molecular biology that draws insight from whole-genome sequencing data (e.g., the Human Genome Project) to describe gene and protein interactions and their functions within an organism.

Background

Information flow: DNA → protein

Most generally, DNA makes RNA, and RNA makes protein
...but many, many other interactions exist, including reverse flow of information (e.g., reverse transcription of RNA into DNA)

Source: Central dogma of molecular biology
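To make "DNA makes RNA, and RNA makes protein" concrete, here is a minimal Python sketch of transcription and translation using the standard genetic code. It is an illustration only, not part of Bioreactor: `transcribe` and `translate` are hypothetical names, and the sketch assumes the input is the coding (sense) strand, read in frame from its first base.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order:
# index = 16 * first + 4 * second + third ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA makes RNA: the mRNA matches the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA makes protein: read codons in frame until the first stop codon."""
    as_dna = mrna.upper().replace("U", "T")  # the table is keyed on T, not U
    protein = []
    for pos in range(0, len(as_dna) - 2, 3):
        residue = CODON_TABLE[as_dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


mrna = transcribe("ATGGCCATTGTAATGGGCCGCTGA")
print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
print(translate(mrna))  # MAIVMGR
```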

Project Goal #2

Develop a reusable, open-source framework which could be used as a basis for new bioinformatics analysis tools

Developing a reusable solution

Bioinformaticians often want web interfaces for their tools, so that non-Unixy collaborators can use them.

… so we intend to produce a modular, reusable framework for quickly adding a web interface on top of an existing analysis pipeline.

Main Impact

direct benefit to ongoing Weirauch Lab research on viral transcription factors (TFs)
facilitates discovery of disease-related links between human and viral DNA

Side Benefits

open source: researchers can build web UIs for their own tools
possible code contributions from other developers
feedback from users (e.g., on GitHub issue tracker)

Architecture

Database-backed web application, written in Python.
Front-end UI fetches data from a REST API back end.

Figure: analysis overview flowchart

Validation / submission

Each analysis has input form elements for its tunable parameters; the choices offered in those elements are queried from REST endpoints on the server.

When the form is submitted, the server validates the parameters and returns any problems in a JSON response to the POST request.
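A minimal, framework-agnostic sketch of that server-side check, assuming each analysis declares its tunable parameters as a simple schema. The `PARAMETERS` dict, the parameter names, and the `{"errors": {...}}` response shape are hypothetical, not Bioreactor's actual API.

```python
import json

# Hypothetical parameter schema for one analysis: name -> (type, min, max).
PARAMETERS = {
    "window_size": (int, 1, 1000),
    "p_value_cutoff": (float, 0.0, 1.0),
}


def validate(form: dict) -> dict:
    """Check submitted form values; return {"errors": {name: message}}."""
    errors = {}
    for name, (kind, low, high) in PARAMETERS.items():
        raw = form.get(name)
        if raw is None or raw == "":
            errors[name] = "This field is required."
            continue
        try:
            value = kind(raw)  # form fields arrive as strings
        except (TypeError, ValueError):
            errors[name] = f"Expected a value of type {kind.__name__}."
            continue
        if not low <= value <= high:
            errors[name] = f"Must be between {low} and {high}."
    return {"errors": errors}


# A POST body with one out-of-range value yields a JSON error report for the UI.
print(json.dumps(validate({"window_size": "50", "p_value_cutoff": "2"})))
# {"errors": {"p_value_cutoff": "Must be between 0.0 and 1.0."}}
```

In a Flask view, the same dict could be returned with `flask.jsonify`, using a 400 status whenever `errors` is non-empty, so the front end can show each message next to its form element.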

Retrieval / visualization

Results of the analysis can be displayed in tabular form, downloaded, or visualized graphically as a sequence logo.
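A sequence logo stacks the bases at each position to a height proportional to that position's information content. This sketch computes the per-position values for aligned DNA sequences, assuming equal background base frequencies and omitting the small-sample correction that logo tools commonly apply; `information_content` is a hypothetical name and the input sites are made-up toy data.

```python
from collections import Counter
from math import log2


def information_content(aligned: list[str]) -> list[dict]:
    """Per-position base frequencies and information content (bits).

    R = log2(4) - H, where H = -sum(f * log2 f) over the bases observed at
    that position. Each base's letter height in the logo is f * R.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    columns = []
    for pos in range(length):
        counts = Counter(seq[pos].upper() for seq in aligned)
        total = sum(counts.values())
        freqs = {base: n / total for base, n in counts.items()}
        entropy = -sum(f * log2(f) for f in freqs.values())
        columns.append({"freqs": freqs, "bits": log2(4) - entropy})
    return columns


sites = ["TGACTCA", "TGAGTCA", "TGACTAA", "TGAGTCA"]  # toy aligned sites
for pos, col in enumerate(information_content(sites), start=1):
    print(pos, round(col["bits"], 2))
```

Fully conserved columns score the maximum of 2 bits, while a column split evenly between two bases scores 1 bit; a plotting layer would then draw each letter at height f * R.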

Plumbing

Python + Flask web and console application
Responsive HTML/CSS web framework
Vue.js for form submission / API communication
Celery + RabbitMQ deferred task queue
SQLite database(s)
- database calls go through an ORM, so "upgrading" to MySQL or Postgres should need only a configuration change
Slurm / LSF batch-processing systems
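A deferred task queue lets the web server hand an analysis off to a worker and reply at once with a job ID, which the UI then polls. This sketch shows that submit-then-poll pattern using the standard library's `concurrent.futures` thread pool as a stand-in for Celery + RabbitMQ, so it runs without a message broker. `run_analysis`, `submit_job`, and `job_status` are hypothetical names; the state strings are borrowed from Celery's task states.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for Celery workers: a small thread pool in the same process.
executor = ThreadPoolExecutor(max_workers=2)
jobs: dict[str, Future] = {}


def run_analysis(sequence: str) -> dict:
    """Hypothetical long-running analysis step."""
    time.sleep(0.1)  # pretend to do real work
    return {"length": len(sequence), "gc": sum(b in "GC" for b in sequence)}


def submit_job(sequence: str) -> str:
    """Queue the analysis and return a job ID immediately (as a POST would)."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(run_analysis, sequence)
    return job_id


def job_status(job_id: str) -> dict:
    """What a polling GET endpoint might report for a job."""
    future = jobs[job_id]
    if not future.done():
        return {"state": "PENDING"}
    if future.exception() is not None:
        return {"state": "FAILURE", "error": str(future.exception())}
    return {"state": "SUCCESS", "result": future.result()}


job = submit_job("ATGCGC")
print(job_status(job))  # may still be PENDING right after submission
jobs[job].result()      # block here only for the demo
print(job_status(job))
```

With Celery, `submit_job` would instead call the task's `.delay(...)` and return the resulting task ID, and a separate worker process, not a thread inside the web server, would do the work.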

Management Interface

Command-line interface for database / user management provided by the Click library
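The slide names Click for this; to keep the sketch dependency-free, it uses the standard library's `argparse` to show the same shape: one `bioreactor` command with management subcommands. The `initdb` and `adduser` subcommands and their handlers are hypothetical placeholders. With Click, each subcommand would be a function decorated with `@cli.command()` under a `@click.group()`.

```python
import argparse


def init_db(args: argparse.Namespace) -> str:
    # Placeholder: a real command would create tables through the ORM.
    return "Initialized the database."


def add_user(args: argparse.Namespace) -> str:
    # Placeholder: a real command would insert a row into a users table.
    role = "admin" if args.admin else "user"
    return f"Added {role} {args.username}."


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bioreactor")
    commands = parser.add_subparsers(dest="command", required=True)

    initdb = commands.add_parser("initdb", help="create database tables")
    initdb.set_defaults(handler=init_db)

    adduser = commands.add_parser("adduser", help="create a user account")
    adduser.add_argument("username")
    adduser.add_argument("--admin", action="store_true", help="grant admin rights")
    adduser.set_defaults(handler=add_user)
    return parser


def main(argv=None) -> str:
    """Parse argv (default: sys.argv[1:]) and run the chosen subcommand."""
    args = build_parser().parse_args(argv)
    message = args.handler(args)
    print(message)
    return message
```

Calling `main(["adduser", "alice", "--admin"])` prints `Added admin alice.`; wiring `main` up as the package's console script (an assumption) would make that `bioreactor adduser alice --admin` from a shell.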

Responsive Web UI

A big improvement over Bioreactor's predecessor, CressInt, which did not adapt to small-form-factor screens

REST API

REST: Representational State Transfer
Example: autocomplete / search functionality for a sample "organisms" table
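The query behind an autocomplete box can be as simple as a case-insensitive prefix match. This sketch uses the standard library's `sqlite3` with an in-memory "organisms" table; the table layout, sample rows, the `search_organisms` function, and the `/api/organisms?q=...` URL it imagines serving are all hypothetical, and Bioreactor itself would go through its ORM rather than raw SQL.

```python
import json
import sqlite3

# In-memory stand-in for the sample "organisms" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organisms (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO organisms (name) VALUES (?)",
    [("Homo sapiens",), ("Epstein-Barr virus",), ("Mus musculus",),
     ("Human immunodeficiency virus 1",)],
)


def search_organisms(q: str, limit: int = 10) -> str:
    """Return a JSON list of organisms whose name starts with q."""
    # Escape LIKE wildcards so user input is matched literally.
    pattern = q.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") + "%"
    rows = conn.execute(
        "SELECT id, name FROM organisms WHERE name LIKE ? ESCAPE '\\' "
        "ORDER BY name LIMIT ?",
        (pattern, limit),
    ).fetchall()
    return json.dumps([{"id": row[0], "name": row[1]} for row in rows])


print(search_organisms("hom"))  # what GET /api/organisms?q=hom might return
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so typing "hom" finds "Homo sapiens"; the `LIMIT` keeps responses small while the user is still typing.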

Progress so far…

Getting a common development environment for the team


    $ git clone git@github.uc.edu:Bioreactor/bioreactor-vm.git
    $ cd bioreactor-vm
    $ vagrant box add bioreactor http://url.to/bioreactor.box
    $ vagrant up

We chose Vagrant to manage the VM environment, and Ansible for automated software installation and configuration

VM dev environment

The VM "base box" is created from a Debian 8.0 "Jessie" ISO

Includes only essential packages

Vagrant creates a new VirtualBox VM from the base box
- Handles SSH key generation / exchange and port forwarding
Ansible installs remaining packages for a Python development environment
Includes RabbitMQ, Celery, Apache + mod_wsgi

Shake 'n Bake

The provided setup.sh script handles all initial setup tasks, including making the Bioreactor server available on host port 9980.

Running Bioreactor

The VM boots with the Flask app already running, but a local development server can also be started like this:


    cd /path/to/cloned/bioreactor

    # create a "virtual environment" for our dependencies
    virtualenv venv && source venv/bin/activate

    # install dependencies and 'bioreactor' script
    pip install -e .

    # Initialize the database tables and launch server
    export FLASK_APP=bioreactor/app.py
    bioreactor run

And here's what that looks like...

(with help from PythonAnywhere)

Future Directions

Management UI with logging and benchmarks
Sequence visualizations
Multiple analysis front-ends hosted from a single Bioreactor instance (multi-tenancy)
Automated unit / integration tests

Credits

The Weirauch Lab Team

The following people contributed to the development of Bioreactor's predecessor, CressInt:

Dr. Matthew Weirauch - Principal Investigator
Dr. Xiaoting Chen - Lead Programmer / Analyst
Frances Soman - code contributor
Dr. Michael Borowczak - BedInt author; technical advisor

THE END

Thanks.
