The goals of Data Analysis as a Service (DaaS) are:
- To enable parallel processing of large data sets, so-called Big Data, using the data locality principle, i.e. processing the data where it is stored.
- To make it possible to share research data, as well as the processing pipeline, with the relevant research communities.
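To make the data locality principle concrete, here is a minimal sketch that simulates it in a single process. The cluster layout, node names (`node-a`, ...), block names and the helper functions are all hypothetical and are not the DaaS scheduler. Each task is assigned to the node that already holds its data block, and only a small partial result per node, not the raw data, is combined at the end.

```python
from collections import defaultdict

# Hypothetical cluster layout: which node stores which data block.
BLOCK_LOCATIONS = {
    "block-0": "node-a",
    "block-1": "node-b",
    "block-2": "node-a",
    "block-3": "node-c",
}

# Hypothetical block contents (in practice: large files on each node's local disk).
BLOCK_DATA = {
    "block-0": [1, 2, 3],
    "block-1": [4, 5],
    "block-2": [6],
    "block-3": [7, 8, 9, 10],
}


def schedule_local(block_locations):
    """Assign each block's task to the node that already stores that block."""
    plan = defaultdict(list)
    for block, node in block_locations.items():
        plan[node].append(block)
    return dict(plan)


def run_on_node(blocks):
    """Simulates work on one node: read only local blocks, return a small partial result."""
    return sum(sum(BLOCK_DATA[b]) for b in blocks)


def locality_sum(block_locations):
    plan = schedule_local(block_locations)
    # Only one number per node would cross the network, not the raw blocks.
    partials = {node: run_on_node(blocks) for node, blocks in plan.items()}
    return sum(partials.values()), plan


total, plan = locality_sum(BLOCK_LOCATIONS)
print(plan)   # {'node-a': ['block-0', 'block-2'], 'node-b': ['block-1'], 'node-c': ['block-3']}
print(total)  # 55
```

The same pattern (schedule work next to the data, ship back only aggregates) is what frameworks built on data locality do at much larger scale, with real storage and scheduling layers instead of these dictionaries.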
Imagine you are reading a scientific paper and think of some modifications you would like to test against the published analysis. It would be great if the author provided a link to a portal where you could rerun the whole analysis with your modifications, on a subset of the data or on the full datasets. Research would progress at a faster pace, results would be reproducible, and collaboration between researchers would become much easier.
Currently, researchers focus on sharing data with other researchers for collaboration or publication, as this enables others to reproduce their results and to use their data in future research. The challenge is that preparing data for analysis is a large part of the analysis itself and poses a significant challenge on its own. DaaS will enable researchers to share not only their datasets but also the whole processing pipeline, allowing them to collaborate at a rate and with an ease that is not currently possible.
In recent years, the amount of data generated has been increasing exponentially. The data comes from many different sources, such as gene sequencing, protein sequencing, neuroscience, sensor networks, network flows, social media, machine logs and satellite images. Researchers would like to analyze these datasets with advanced algorithms, e.g. machine learning, without worrying about their scale. The current approach of using one or a few high-end machines, or traditional supercomputing, struggles to handle big data: simply moving 100 TB of data over a 10 Gbps link takes more than 22 hours, even at full bandwidth. So we need a service that processes data where it is stored, enabling efficient and scalable processing of large datasets.
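The transfer-time figure is simple arithmetic: size in bits divided by link speed. The sketch below (the function name is hypothetical) assumes an ideal link running at full bandwidth with no protocol overhead, and shows that the result depends slightly on whether decimal (TB, Gbps) or binary (TiB, Gibit/s) units are used.

```python
def transfer_hours(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal transfer time in hours, assuming full bandwidth and no protocol overhead."""
    return size_bytes * 8 / link_bits_per_s / 3600


# Decimal (SI) units: 1 TB = 10**12 bytes, 1 Gbps = 10**9 bit/s
decimal = transfer_hours(100 * 10**12, 10 * 10**9)
# Binary units: 1 TiB = 2**40 bytes, 1 Gibit/s = 2**30 bit/s
binary = transfer_hours(100 * 2**40, 10 * 2**30)

print(f"SI units:     {decimal:.1f} h")  # 22.2 h
print(f"binary units: {binary:.1f} h")   # 22.8 h
```

Either way, the transfer alone takes close to a full day before any computation starts, and real links rarely sustain full bandwidth, which is why moving computation to the data pays off.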
Of course, you could use Amazon or Google Cloud for this. But problems arise: the data leaves the country, and with it our jurisdiction and control (if you think this is not an issue, we recommend reading ), and there is the question of how the costs will be covered. Moreover, plain Amazon/Google cloud services are not that easy to use either; a large number of startups are built around making them user-friendly. As part of the NREN (National Research and Education Network) and national supercomputing in Norway, we are working on providing such a service to our researchers. We are also collaborating with our Nordic partners under the Glenna project. This will enable us to share resources among the Nordic countries and build knowledge together.