"Our project goal is to build the first open and viable PaaS for biobanking" with Jim Dowling
Jim Dowling is the coordinator of the BiobankCloud project, started in December 2012. BiobankCloud is an EU project that will deliver a platform-as-a-service for the storage and analysis of digitized genomic data.
It will provide solutions to the problems of secure storage and efficient analysis of massive amounts of biomedical data, as well as the interconnection of biobanks.
The storage infrastructure for human biological material is generally known as a biobank. One of the main tenets of biobanking is the digitization of human genomic information for its archival and analysis. That is, in the future, vast amounts of genomic data will be derived from biomaterials stored in biobanks.
The storage requirements for genomic data are huge: a single human genome comprises about three billion base pairs. Beyond storage, analyzing this data will require both massively parallel computing infrastructure and data-intensive computing tools and services to complete analyses in reasonable time.
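To give a rough sense of scale, the three billion base pairs mentioned above can be turned into a back-of-the-envelope storage estimate. The sketch below is illustrative only and not part of the BiobankCloud project: it assumes an idealized encoding of 2 bits per base, and the function names and the `coverage` multiplier are hypothetical. Real sequencing output, which carries quality scores and repeated reads, is considerably larger.

```python
# Back-of-the-envelope storage estimate for one human genome.
# Assumption: each base (A, C, G, T) is packed into 2 bits. Real sequencing
# files also carry quality scores and repeated reads, so they are far larger.

BASE_PAIRS_PER_GENOME = 3_000_000_000  # figure quoted in the article
BITS_PER_BASE = 2                      # 4 possible bases -> log2(4) = 2 bits


def packed_genome_bytes(base_pairs: int = BASE_PAIRS_PER_GENOME) -> int:
    """Return the bytes needed to store `base_pairs` bases at 2 bits each."""
    total_bits = base_pairs * BITS_PER_BASE
    return (total_bits + 7) // 8  # round up to whole bytes


def cohort_terabytes(genomes: int, coverage: int = 1) -> float:
    """Estimate packed storage in TB (10**12 bytes) for a cohort.

    `coverage` is a hypothetical multiplier for how often each base is read.
    """
    return genomes * coverage * packed_genome_bytes() / 10**12


if __name__ == "__main__":
    print(packed_genome_bytes())         # 750000000 bytes, i.e. 750 MB
    print(cohort_terabytes(10_000, 30))  # 225.0 TB for 10,000 genomes at 30x
```

Even under this optimistic encoding, a cohort of ten thousand genomes read thirty times each already reaches hundreds of terabytes, which is why storage and computation have to be planned together.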
A huge wave
As of 2013, a massive wave of big data is approaching, driven by the decreasing cost of sequencing genomic data, which has been halving every four months since 2004. Biobanks store and catalogue human biological material, but they are not prepared to handle this wave of data - there is a biobank bottleneck: a lack of platform support for the storage, analysis and interconnection of massive amounts of human genomic data.
— We are working together with several partners in the project, including the University of Lisbon, Karolinska Institute in Sweden, and Humboldt University and Charité University Hospital, both in Germany, says Jim Dowling, PhD, the project coordinator at KTH.
To realize the goal of building the first open platform-as-a-service (PaaS) for biobanking, a project team has been assembled with deep competencies in different fields of research, spanning biobanking, bioinformatics, large-scale systems and security. The team includes biobanking experts from Karolinska and Charité, bioinformatics experts from Humboldt University, and systems and security experts from KTH and the University of Lisbon.
— In this project, we will develop a cloud-computing PaaS for the secure storage and analysis of sequenced genomic data, as well as a framework for the interconnection of such digital biobanks for the purpose of sharing data, says Jim Dowling.
Private cloud platforms
The platform will provide security, storage, data-intensive computing tools and analysis algorithms, and will allow digital biobanks to share data with one another, all within the existing regulatory frameworks for the storage and usage of genomic data.
The PaaS framework will be designed to run primarily on private cloud platforms. It will typically be installed on one or more racks that store large amounts of data and support parallel local computation, so that analysis runs on the nodes storing the data. Such racks can be attached to next-generation sequencing machines to store sequenced data directly, and they will also provide platform services for the analysis of sequence data and for the interconnection of biobanks for data sharing.
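The idea of running analysis on the nodes that store the data can be illustrated with a toy map-and-merge sketch. This is not the BiobankCloud implementation and it does not use Hadoop's API. The node names, the sequence fragments and both function names are made up for illustration, and threads stand in for separate machines. Each node summarizes only its own data, so only small summaries have to travel over the network.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for storage nodes: node name -> fragments it holds.
NODE_SHARDS = {
    "node-1": ["ACGTAC", "GGTA"],
    "node-2": ["TTGACA"],
    "node-3": ["CCGG", "ATAT"],
}


def count_bases_on_node(fragments):
    """Local step: runs next to the data and returns only a small summary."""
    counts = Counter()
    for fragment in fragments:
        counts.update(fragment)  # counts each character of the fragment
    return counts


def total_base_counts(shards):
    """Merge step: combine the per-node summaries into one result."""
    with ThreadPoolExecutor() as pool:
        # Threads simulate nodes working in parallel on their own shards.
        partials = list(pool.map(count_bases_on_node, shards.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    print(sorted(total_base_counts(NODE_SHARDS).items()))
    # [('A', 7), ('C', 5), ('G', 6), ('T', 6)]
```

The design choice being illustrated is that the raw sequences never leave their node. Only the per-node counts are moved and merged, which is what keeps this style of analysis practical as data volumes grow.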
— Our project goal is to build the first open and viable PaaS for biobanking. We will build on open-source projects for big data, such as Hadoop, and provide added features to those projects, says Jim Dowling.
280 participating organizations
The platform will be designed in cooperation with BBMRI (www.bbmri.eu), the international biobanking network that is Europe's largest infrastructure of its kind, currently comprising 51 institutions and more than 280 participating organizations from 23 countries. The goal is for our PaaS to become part of the BBMRI informatics infrastructure for next-generation sequencing (NGS) data storage and analysis.