Author Archives: Gurvinder Singh

Genomics, Proteomics, Neuroinformatics.. *ics and Big Data

In a previous post I described our Data Analysis as Service offering along with its motivation and goals. In this post, I will describe what we have been doing in this area and our plans ahead. We have been working on Big Data at Uninett for more than a year now. We have a cluster with 18 […]

Evolving Infrastructure for Data Intensive Computing

In an earlier post I talked about Data Analytics as Service (DaaS). Here we will discuss how we can provide such a service with modern data center technologies. TL;DR: The container movement has taken the whole data center world by surprise. The main reasons for this have been performance benefits, flexibility and reproducibility. […]

Data Analysis as Service

The goals for Data Analysis as Service (DaaS): a service to enable processing of large data sets, so-called Big Data, in parallel using the data locality principle; and the ability to share research data as well as processing pipelines with the corresponding research communities. Imagine you are reading a scientific paper and think of some modifications which you would […]

Hadoop yarn (2.2.0) setup with Ceph

In this post we will try to make Yarn use Ceph rather than HDFS as its file system. We assume that you already have a Yarn cluster set up; if not, you can follow the nice guide from Cloudera, and you don’t need to install the HDFS components. We also assume that you have a Ceph cluster up and running, if […]