bioinformatics analysis framework

Weirauch Transcription Factor Research Lab
Center for Autoimmune Genomics and Etiology
Cincinnati Children's Hospital Medical Center

bit.ly/bioreactor-slides

The Team

Ben Albert, Brad Arvin
Juanita Dickhaus, Kevin Ernst

We are collaborating with Dr. Matt Weirauch
at Cincinnati Children's Hospital

Our faculty advisor is Dr. Karen Davis of CEAS

Project Goal #1

Create a web-based functional genomics analysis tool for investigating interactions between human and viral DNA sequences and proteins

"Functional genomics"

It's a field of molecular biology that draws on genome-wide data, such as whole-genome sequences
(e.g., from the Human Genome Project)

…to describe the functions of genes and proteins, and the interactions among them, within an organism.

Background

Information flow: DNA → proteins

  • most generally: DNA makes RNA and RNA makes protein
  • but many other interactions exist, including reverse flow of information (e.g., retroviruses reverse-transcribe their RNA genomes into DNA)
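To make "DNA makes RNA and RNA makes protein" concrete, here is a minimal Python sketch of transcription and translation using the standard genetic code (NCBI translation table 1). It assumes the input is the coding strand of an open reading frame that starts at the first base, and it ignores splicing and all other regulation; the function names are illustrative only, not part of Bioreactor.

```python
BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons in the order UUU, UUC, UUA, ..., GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from the first base until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCTGGTAA")))  # AUG GCC UGG UAA -> "MAW"
```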

Project Goal #2

Develop a reusable, open-source framework that can serve as the basis for new bioinformatics analysis tools

Developing a reusable solution

Bioinformaticians often want web interfaces for their tools, so that collaborators who don't work at the Unix command line can use them.

… so we intend to produce a modular, reusable framework for quickly adding a web interface on top of an existing analysis pipeline.

Main Impact

  • direct benefit to ongoing Weirauch Lab research on viral TFs
  • facilitates discovery of disease-related links between human / viral DNA

Side Benefits

  • open source: researchers can build web UIs for their own tools
  • possible code contributions from other developers
  • feedback from users (e.g., on GitHub issue tracker)

Architecture

Database-backed web application, written in Python.
Front-end UI fetches its data from a REST API backend.

(Figure: analysis overview flowchart)

Validation / submission

Each analysis has input form elements for its tunable parameters; the options for these elements are fetched from REST endpoints on the server.

When the form is submitted, the server validates the parameters and reports any problems in the JSON response to the POST request.
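A minimal sketch of the validate-then-respond step, written in plain Python so it runs without a web server. The parameter names (pvalue_cutoff, genome, window), their limits, the status codes, and the response shape are all hypothetical; in the real app this logic would sit behind a Flask POST handler.

```python
import json

# Hypothetical parameter spec for one analysis.
PARAM_SPEC = {
    "pvalue_cutoff": {"type": float, "min": 0.0, "max": 1.0, "required": True},
    "genome": {"type": str, "choices": ["hg19", "hg38"], "required": True},
    "window": {"type": int, "min": 0, "max": 10000, "required": False},
}


def validate(form: dict) -> dict[str, str]:
    """Return a mapping of field name -> error message (empty if all valid)."""
    errors = {}
    for name, rule in PARAM_SPEC.items():
        raw = form.get(name)
        if raw in (None, ""):
            if rule["required"]:
                errors[name] = "This field is required."
            continue
        try:
            value = rule["type"](raw)  # form values usually arrive as strings
        except (TypeError, ValueError):
            errors[name] = f"Expected a value of type {rule['type'].__name__}."
            continue
        if "choices" in rule and value not in rule["choices"]:
            errors[name] = f"Must be one of: {', '.join(rule['choices'])}."
        elif "min" in rule and not rule["min"] <= value <= rule["max"]:
            errors[name] = f"Must be between {rule['min']} and {rule['max']}."
    return errors


def handle_submission(form: dict) -> tuple[str, int]:
    """Build the JSON body and HTTP status code a POST handler might return."""
    errors = validate(form)
    if errors:
        return json.dumps({"status": "invalid", "errors": errors}), 400
    return json.dumps({"status": "accepted"}), 202
```

For example, submitting pvalue_cutoff="2" and genome="mm10" yields a 400 response whose "errors" object names both fields, which the front end can display next to the corresponding inputs.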

Retrieval / visualization

Results of the analysis can be displayed in tabular form, downloaded, or visualized graphically as a sequence logo.

see also: Sequence logo on Wikipedia
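Letter heights in a sequence logo come from the information content at each aligned position. A minimal sketch, assuming aligned DNA sites of equal length: each position's total height is R_i = log2(4) - H_i bits, where H_i is the Shannon entropy of the letter frequencies, and each letter's height is its frequency times R_i. The small-sample correction used by some logo tools is omitted, and this is not necessarily how Bioreactor draws its logos.

```python
import math
from collections import Counter


def logo_heights(sequences: list[str], alphabet: str = "ACGT") -> list[dict[str, float]]:
    """Per-position letter heights (in bits) for a sequence logo.

    Height of letter b at position i = f(b, i) * R_i, where
    R_i = log2(len(alphabet)) - H_i and H_i is the Shannon entropy.
    No small-sample correction is applied.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must be aligned to the same length")
    max_bits = math.log2(len(alphabet))
    heights = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sequences)
        total = sum(counts[b] for b in alphabet)  # ignores gaps / other symbols
        if total == 0:
            heights.append({b: 0.0 for b in alphabet})
            continue
        freqs = {b: counts[b] / total for b in alphabet}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = max_bits - entropy
        heights.append({b: freqs[b] * info for b in alphabet})
    return heights
```

A fully conserved position reaches the 2-bit maximum for DNA, while a position where all four bases are equally common has zero height.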

Plumbing

  • Python + Flask web and console application
  • Responsive HTML/CSS web framework
  • Vue.js for form submission / API communication
  • Celery + RabbitMQ deferred task queue
  • SQLite database(s)
    • database calls go through an ORM, so "upgrading" to MySQL or Postgres should be largely a config change
  • Slurm / LSF - batch processing systems
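Celery + RabbitMQ lets a web request hand a long-running analysis to a worker and return immediately, while the client polls for the result. The sketch below illustrates that submit-then-poll pattern using the standard library's concurrent.futures as a stand-in for Celery workers (no message broker is involved). run_analysis, submit, and status are hypothetical names; the state strings only mirror Celery's PENDING / SUCCESS / FAILURE task states.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the Celery worker pool; in the real stack, RabbitMQ would
# carry the job message to a separate worker process.
executor = ThreadPoolExecutor(max_workers=2)
jobs: dict[str, Future] = {}


def run_analysis(params: dict) -> dict:
    """Hypothetical long-running analysis step."""
    time.sleep(0.1)  # pretend to do real work
    return {"n_hits": len(params.get("sequences", []))}


def submit(params: dict) -> str:
    """What a POST handler might do: enqueue the job and return an ID at once."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(run_analysis, params)
    return job_id


def status(job_id: str) -> dict:
    """What a polling GET endpoint might return for a job."""
    future = jobs[job_id]
    if not future.done():
        return {"state": "PENDING"}
    if future.exception() is not None:
        return {"state": "FAILURE", "error": repr(future.exception())}
    return {"state": "SUCCESS", "result": future.result()}
```

The design point is that the HTTP request never waits on the analysis itself; it only records a job ID, so slow jobs (or jobs farmed out to Slurm / LSF) don't tie up web server workers.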

Management Interface

Command-line interface for database / user management provided by the Click library
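A minimal sketch of what a Click-based management command group can look like. The group, command, option, argument, pass_context, pass_obj, and echo calls are Click's real API; the command names (initdb, adduser), the users table, and the default database path are hypothetical, and the sketch uses the standard sqlite3 module directly rather than the app's ORM, for brevity.

```python
import sqlite3

import click

DB_PATH = "bioreactor.db"  # hypothetical default location


@click.group()
@click.option("--db", default=DB_PATH, show_default=True, help="SQLite database file.")
@click.pass_context
def cli(ctx, db):
    """Bioreactor management commands (illustrative sketch)."""
    ctx.obj = sqlite3.connect(db)
    ctx.call_on_close(ctx.obj.close)  # close the connection when the command finishes


@cli.command()
@click.pass_obj
def initdb(conn):
    """Create the tables if they don't already exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, email TEXT NOT NULL)"
    )
    conn.commit()
    click.echo("Initialized the database.")


@cli.command()
@click.argument("name")
@click.option("--email", required=True, help="Contact address for the user.")
@click.pass_obj
def adduser(conn, name, email):
    """Add a user account."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
    click.echo(f"Added user {name}.")


if __name__ == "__main__":
    cli()
```

Saved as, say, manage.py, this would be run as "python manage.py initdb" followed by "python manage.py adduser kevin --email kevin@example.org".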

Responsive Web UI

Big improvement over Bioreactor's predecessor, CressInt, which did not adapt to small-form-factor screens

REST API

REST: Representational State Transfer
Autocomplete / search functionality for a sample "organisms" table:
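A minimal sketch of the kind of query that can sit behind such an autocomplete box, using the standard library's sqlite3 module with an in-memory database. The table columns and sample rows are hypothetical, and the real app goes through an ORM and serves results from a REST endpoint; the point is the case-insensitive prefix match, with LIKE wildcards escaped so user input is matched literally.

```python
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the sample "organisms" table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE organisms (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO organisms (name) VALUES (?)",
        [("Homo sapiens",), ("Human herpesvirus 4",), ("Hepatitis B virus",), ("Mus musculus",)],
    )
    return conn


def search_organisms(conn: sqlite3.Connection, term: str, limit: int = 10) -> str:
    """Prefix search returned as a JSON array (SQLite's LIKE ignores ASCII case)."""
    # Escape the LIKE wildcards so characters like '%' and '_' match literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id, name FROM organisms WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return json.dumps([{"id": row[0], "name": row[1]} for row in rows])
```

With the sample rows, typing "hu" returns only "Human herpesvirus 4", while "h" returns the three names starting with H or h.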

Progress so far…

Getting a common development environment for the team


    $ git clone git@github.uc.edu:Bioreactor/bioreactor-vm.git
    $ vagrant box add bioreactor http://url.to/bioreactor.box
    $ vagrant up
            

We chose Vagrant to manage the VM environment, and Ansible for automated software installation and configuration

VM dev environment

  • The VM "base box" is created from a Debian 8.0 "Jessie" ISO
  • Vagrant creates a new VirtualBox VM from the base box
  • Ansible installs remaining packages for a Python development environment
  • Includes RabbitMQ, Celery, Apache + mod_wsgi

Shake 'n Bake

The provided setup.sh script handles all initial setup tasks, including making the Bioreactor server available at port 9980 on the host:

Running Bioreactor

The VM starts the Flask app automatically at boot, but you can also start a local development server like this:


    cd /path/to/cloned/bioreactor

    # create a "virtual environment" for our dependencies
    virtualenv venv && source venv/bin/activate

    # install dependencies and 'bioreactor' script
    pip install -e .

    # Initialize the database tables and launch server
    export FLASK_APP=bioreactor/app.py
    bioreactor run
            

And here's what that looks like...

(with help from PythonAnywhere)

Future Directions

  • Management UI with logging and benchmarks
  • Sequence visualizations
  • Multiple analysis front-ends hosted from a single Bioreactor instance (multi-tenancy)
  • Automated unit / integration tests

Credits

The Weirauch Lab Team

The following people contributed to the development of Bioreactor's predecessor, CressInt:

Links

THE END

Thanks.