System infrastructure

Modern online ICT services all run on top of some system infrastructure. UNINETT's innovation program studies and experiments with advanced and innovative solutions for system infrastructure. Low total cost, high reliability and scalability are core parameters.

Live Audio and Video over WebRTC’s datachannel

UNINETT IoU has over the summer developed a WebRTC demonstrator which attempts something “naughty”…

As part of our work on WebRTC, and our work on low-latency collaboration tools, we decided to find an answer to the following research questions:

Is it possible to transfer live audio and video over the data-channel in WebRTC?
If yes, can we achieve lower latency with data-channels than with WebRTC media-channels?

Our demonstrator, titled WebRTC data-media, is now available (also on GitHub). In short, the demonstrator

  • consists of a node.js based server and an HTML+CSS+JavaScript based WebRTC client,
  • applies the socket.io framework to provide "rooms" in which peers exchange basic signaling,
  • sets up separate, independent data channels for audio and video content,
  • applies "getUserMedia" to grab live audio and video from the microphone and camera,
  • applies the "ScriptProcessorNode" class to grab, transfer, and play out raw audio samples (a sketch of this follows the list),
  • applies the canvas "drawImage" and "toDataURL" methods to grab, compress and send video frames.
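
To make the audio path concrete, here is a minimal sketch (not the actual demonstrator code) of how raw samples can be grabbed with a ScriptProcessorNode and pushed over a data channel. The variable audioChannel is assumed to be an already opened RTCDataChannel; signaling and peer connection setup are omitted.

    const audioCtx = new AudioContext();

    navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
      const source = audioCtx.createMediaStreamSource(stream);

      // 256 samples is the smallest buffer size ScriptProcessorNode accepts;
      // smaller buffers mean lower latency but, as discussed below, more glitches.
      const processor = audioCtx.createScriptProcessor(256, 1, 1);

      processor.onaudioprocess = (event) => {
        const samples = event.inputBuffer.getChannelData(0); // raw PCM as Float32Array
        if (audioChannel.readyState === 'open') {
          // Copy the block and ship it as an ArrayBuffer over the data channel.
          audioChannel.send(new Float32Array(samples).buffer);
        }
      };

      source.connect(processor);
      processor.connect(audioCtx.destination); // keeps the node processing (it outputs silence)
    });

On the receiving side the demonstrator plays the incoming buffers back out through a ScriptProcessorNode in the same fashion.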

The implementation of the demonstrator is a success. Both live audio and video are transferable over WebRTC data-channels. Hence the answer to our first question is a definitive "yes".

However, measurements (to be published in our Multimedia Delay Database) show no significant improvement in delay compared to what "vanilla" WebRTC multimedia channels can offer.

For audio, delay is at best similar, but raw data-channel audio degrades in quality when buffer lengths are reduced to the supported minimum for ScriptProcessorNode, i.e. 256 samples. The packet loss/jitter is probably caused by the fact that ScriptProcessorNode's JavaScript code is executed in the web page's main thread. Utilizing the upcoming AudioWorklet API will potentially improve upon this, since separate threads for audio processing will become available. However, AudioWorklets are (at the time of writing) not yet supported by any browser. (Only a developer's patch seems to exist for Chromium.)
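
For reference, this is roughly what moving the audio grabbing off the main thread could look like with the AudioWorklet API. It is a sketch based on the Web Audio specification draft and, as noted above, not yet runnable in any shipping browser; audioChannel is again an assumed, already open RTCDataChannel.

    // audio-sender-processor.js (runs on the audio rendering thread)
    class AudioSenderProcessor extends AudioWorkletProcessor {
      process(inputs, outputs, parameters) {
        const channel = inputs[0][0]; // Float32Array with the current block of samples
        if (channel) {
          // Copy the block and hand it to the main thread via the message port;
          // the data channel itself is not reachable from the audio thread.
          this.port.postMessage(new Float32Array(channel));
        }
        return true; // keep the processor alive
      }
    }
    registerProcessor('audio-sender-processor', AudioSenderProcessor);

    // Main thread: load the worklet module and forward each block to the data channel.
    async function startWorkletAudio(audioChannel) {
      const audioCtx = new AudioContext();
      await audioCtx.audioWorklet.addModule('audio-sender-processor.js');

      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const source = audioCtx.createMediaStreamSource(stream);
      const sender = new AudioWorkletNode(audioCtx, 'audio-sender-processor');

      sender.port.onmessage = (event) => {
        if (audioChannel.readyState === 'open') {
          audioChannel.send(event.data.buffer); // event.data is the Float32Array copy
        }
      };

      source.connect(sender);
    }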

For video, delay is also very similar, at best slightly better with data-channel transfer. The most significant limiting factor in this case seems to be a combination of the maximum frame rate provided by the available cameras and the necessary buffering (and buffer copying) of video frames in the code. A maximum of 30 frames per second implies an added 33 ms of delay for each frame buffered.
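
For reference, here is a minimal sketch of the canvas-based video path described in the list above. videoChannel is assumed to be an already open RTCDataChannel, and videoElement a <video> element playing the local getUserMedia stream.

    const canvas = document.createElement('canvas');
    const canvasCtx = canvas.getContext('2d');

    function sendFrame() {
      canvas.width = videoElement.videoWidth;
      canvas.height = videoElement.videoHeight;

      // Grab the current camera frame and compress it; toDataURL returns a
      // base64-encoded JPEG string that fits comfortably on the data channel.
      canvasCtx.drawImage(videoElement, 0, 0, canvas.width, canvas.height);
      const frame = canvas.toDataURL('image/jpeg', 0.5); // 0.5 is an illustrative quality setting

      if (videoChannel.readyState === 'open') {
        videoChannel.send(frame);
      }

      // The camera delivers at most ~30 fps, so every frame that has to be
      // buffered (here or at the receiver) adds roughly 33 ms of delay.
      requestAnimationFrame(sendFrame);
    }

    requestAnimationFrame(sendFrame);

A receiver can assign each received data URL to an image and draw it onto a canvas to play the stream back frame by frame.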

Attempts were made (in an early version of the demonstrator) to minimize buffering by pushing raw, uncompressed video frames across the data channel. But as the data channel capacity was limited to ~150 Mbps, only very low resolution video (less than VGA) could be transmitted; raw VGA at 30 frames per second amounts to roughly 640 × 480 × 24 bits × 30 ≈ 220 Mbps, well above that limit. Hence no measurements were performed for this version. Whether data-channel capacity can be increased and/or buffer handling can be made more efficient by applying multi-threading via Worklets is currently an open question.

A future version of the demonstrator will aim to implement and utilize Worklets for both audio and video processing.

(Note: This blog will be updated with diagrams and explicit results soon…)

Clean Sky and NetSys 2017

In week 11 (March 13-17) of 2017, both Clean Sky's (an EU ITN) annual conference and the NetSys 2017 conference took place in Göttingen, Germany. UNINETT visited both events.

The Clean Sky fellows (PhD students) are all progressing steadily with their SDN-NFV topics. A majority of the work focuses on optimizing different aspects of a future edge/fog computing environment. Among the topics presented (some by keynote speakers) this time were:

  • ClusPR: An algorithm for optimized placement of both flows and VNFs in a topology
  • Profiling the edge network: Work in progress on anonymizing web logs so that they may be used for user-interest analysis
  • Multihop middle-box selection: A new DNS record is suggested to enable a client to influence how a chain of middle-boxes is composed
  • NFV state migration: "Statelets" (small state update packets) are introduced to enable close to seamless migration of a VNF.
  • VNF placement in the edge-cloud: Network cost and processing cost, together with energy parameters, are included in a placement algorithm. IoT is the target domain.
  • Deploying distributed applications: A VNF is just a high-performance (low delay and/or high throughput) micro-service. Software developers need to supply quantitative information (from code profiling) to deployment engineers. New deployment templates are suggested.

UNINETT is currently hosting one of the Clean Sky fellows and supporting him in his work on profiling user behavior to optimize data caching and computation in fog-computing contexts. Web server logs will (hopefully) be made available, after being anonymized, for profiling analysis (ref. point 2 above).

NetSys 2017 presented work from a fairly broad range of networking research topics. "Single line" summaries of the more relevant presentations, seen from a backbone operator's point of view, follow below.

  • Sufian Hameed et al (NUCES) presented a lightweight protocol which may utilize SDN equipment in multiple domains (ASes) to block DDoS attacks efficiently.
  • Nicholas Gray et al (University of Würzburg) suggested a hot-standby regime for L4 firewalls.
  • Robert Bauer et al (Karlsruhe Institute of Technology) showed how "flow load" distribution can be realized in an SDN network. A switch with a full FIB may be offloaded by having entries moved to neighboring switches.
  • Leonhard Nobach et al (Technische Universität Darmstadt) presented how the balance between applying FPGA or COTS hardware for NFV can be optimized.
  • Keynote speaker Henning Schulzrinne (Columbia University) emphasized that IoT exposes all the security deficiencies of the internet. There is currently little incentive for producers and consumers to change this, since none of them are directly affected when IoT devices are exploited for e.g. DDoS attacks. Large-scale management (enrollment, updates, …) of IoT devices will be crucial in the future.
  • Cristina Muñoz et al (University of Cambridge) explained how iterative Bloom filters may be applied to reduce FIB size in a named data networking (or information centric networking, ICN) node.
  • Keynote speaker Wieland Holfelder (Google Germany GmbH) recommends Google's tensorflow.org project for machine learning.
  • Keynote speaker Rolf Stadler (KTH) showed how a prediction engine can be trained to predict QoE parameters from system KPI values only (e.g. from statistics in Linux servers' /proc or just statistics from network switches).
  • Claas Lorenz (genua GmbH) suggested how complex firewall rule sets may be analysed and verified efficiently.

In search of "the meaning of SDN"

UNINETT's 2015 innovation project on SDN technology has continued the search for "the meaning of SDN" for an IP backbone network operator. A growing number of vendors and communities (both commercially driven and more idealistic) keep enthusiastically announcing SDN as the way to go, while also posting warnings against believing SDN is the panacea for network management challenges.

What is clear so far is that the big players providing cloud services, e.g. Amazon, Facebook, Microsoft Azure and Google, have made great advancements within the data center management domain by introducing SDN-controlled switching hardware and centralized control and orchestration software. Several papers reporting such success were presented at e.g. SIGCOMM 2015. Strict top-down control enables (not surprisingly) configurations which push utilization of resources close to 100%. Near-optimal reconfiguration under dynamic demands is also achievable.

When it comes to SDN applied in an average backbone network (e.g. like UNINETT's), a growing number of options seem to emerge, much due to advances made in data center networking.

  • Capacities of open networking SDN hardware have increased to 10/40 Gbps
  • SDN controllers have matured. Several open source, license-free alternatives are now relevant.

But when it comes to orchestration of an overall SDN-based infrastructure, the options available are fewer. Most open frameworks, e.g. OpenStack, are tuned towards data center resource management. It is not obvious how such frameworks can be reapplied in a backbone context. The big cloud service players do, to a large extent, have their inter-data-center backbone networks operated by SDN infrastructures. However, their orchestration systems are "home grown", potentially not general enough, and not (yet) publicly available.

UNINETT's SDN2015 innovation project has built knowledge in the SDN domain through a collection of activities.

  • Aryan TaheriMonfared, partly funded by UNINETT, completed his PhD on October 26th 2015. His thesis is titled "Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments"
  • A half-day workshop on SDN was successfully held in August 2015.
  • A re-initiation of UNINETT's SDN lab has been started. Due to cost and timing factors, a locally located variant of the lab is now in progress (instead of the first inter-city variant). The main aim now is to better enable experimentation with inter-datacenter traffic management.
  • UNINETT has participated in the InaaS task of the 1-year Geant4 JRA2. Due to limited overlap with activities directly relevant to UNINETT, an observer role was taken.
  • UNINETT has contributed in applications for research project funding within the SDN-domain.
  • Analysis of controller organization and placement in a futuristic SDN-based UNINETT backbone has been initiated, but not yet concluded.
  • A presentation on datacentre backbone networks was held at the EU FP7 Clean Sky Summer School in Göttingen in September 2015.


Workshop on SDN, Summer 2015

UNINETT invited to another workshop in our series of half-day workshops on SDN at the end of the summer, on August 27, 2015. Eight people attended, from Transpacket, the Department of Telematics at NTNU, and UNINETT. Two participants attended remotely from Oslo.

The workshop program was the following

Presentation slides will soon become available.

Discussions were lively throughout the workshop, and many aspects of and challenges with SDN were addressed. The participants were in general satisfied with the workshop (even though attendance was somewhat lower than expected). Hence UNINETT will strive to offer another workshop in the spring of 2016.

SDN at SIGCOMM 2015

Close to 1/3 of all main track presentations at SIGCOMM 2015 in London, August 18-20, addressed challenges and experiences related to data centres. Software Defined Networking was often the actual or assumed underlying technology.

All SIGCOMM 2015 papers are available online via the conference web site.

A general impression is that most accepted work at SIGCOMM is funded by "the big players", e.g. Google, Facebook, Microsoft and Cisco. A majority of the work presented reports results from mature research, often already deployed in pilot (and even production) infrastructures. Hence few "crazy" new ideas are introduced.

Fortunately the poster sessions did give room for some novel and surprising ideas, among them free space optics based intra-data centre networks with physical multicast capabilities.

This post summarises a selection of the papers presented.

  • Best paper award: Stefano Vissicchio et al from UCLouvain presented their SDN concept added on top of a link-state routed network. A central controller introduces fake nodes by communicating tailored link-state announcements to routers in the network, enabling traffic engineering on a source-destination level. If the controller fails, the system defaults back to standard link-state behaviour.
  • Keynote: Albert Greenberg from Microsoft explained how the Azure infrastructure is running close to 100% on SDN technology. 40 Gbps four-level Clos networks interconnect servers in the data centres. Data centre resources are now also applied to operate the data centre itself, e.g. fairly intense active monitoring of end-to-end paths by running traffic generators and sinks.
  • Policy languages: Prakash et al from the University of Wisconsin-Madison presented a graph-based system for better policy conflict management. Set theory is applied. It seems to scale well, but results are non-deterministic.
  • Resource management: Several papers presented techniques to optimize placement of and access to data centre resources. Scheduling challenges were addressed. Google gave a historical summary of their data centre activities, explaining what they have learned is important for being able to scale up their installations.
  • Wireless aspects: A set of papers looked into utilizing backscatter, i.e. superimposing signals on top of reflected or transit waves from other sources, in new ways. High-accuracy positioning with off-the-shelf WiFi equipment was also addressed by several groups.
  • Video streaming: Work on optimization of content placement in content delivery networks (CDNs) was presented, as well as advanced, control-theory-driven rate control in video players.
  • Physical internet: Ramakrishnan Durairajan et al from the University of Wisconsin-Madison presented work on mapping the physical infrastructure of US-based ISPs. Results show that ducts are shared frequently, and as many as 80% share at least one duct. Hence care is needed to ensure true resilience when multi-homing to different ISPs.

Otto’s personal notes are available on request.

Evolving Infrastructure for Data Intensive Computing

In an earlier post I talked about the Data Analysis as a Service (DaaS) concept; here we will discuss thoughts on how we can provide such a service with modern data center technologies.

TL;DR

The container movement has taken the whole data center approach by surprise. The main reasons for this have been performance benefits, flexibility and reproducibility. The management of containers is an ongoing challenge, and this article tries to cover the main technologies in this area, ranging from resource management and storage to networking and service discovery.

Long Story

To provide any service in the computing world (maybe in the real universe too [1], a story for another time) you need storage, computation, networking and discoverability. The question becomes how we are going to arrange these components. On top of that, we have to support multiple users from multiple tenants, so we end up creating a virtualized environment for each tenant. The question then becomes at which level we create the virtualization. There is an interesting talk from Bryan Cantrill of Joyent discussing virtualization at the different levels

Hardware -> OS -> Application 

It turns out that virtualization provided at the OS level results in better flexibility as well as better performance.

So if we decide to use the OS as the abstraction level, then KVM, Xen or any other kind of hypervisor is out of the question, as they result in multiple OSes running separately, not sharing an underlying OS and each being led by the hypervisor to believe it runs alone. The picture [2] below shows the different types of hypervisors.

Hypervisor types

The benefits of using multiple OSes are clear isolation among tenants and security. But you suffer in terms of performance, as you cannot utilize unused resources from the underlying host, and not having a single OS restricts the scheduling optimizations for CPU and memory. Joyent has done a comparison [3] of running containers in their cloud service, based on OS-level virtualization, against hypervisor-level virtualization on Amazon AWS. There was no surprise in the results, as OS-level virtualization wins by all measures. The issue with current Linux containers is security concerns. Work has been done on Docker, Rocket and in the kernel itself to harden it and to drop unnecessary capabilities from containers by default.

So now that we agree that OS-level virtualization is the way to go, the question becomes how we are going to arrange the components to provide an elastic, scalable, fault-tolerant service with low administration effort, i.e. a high system-to-administrator ratio (there is a nice article about such principles [4]). It is becoming clear that unless you work at Google/Twitter/Facebook/Microsoft, your private cluster will not have the capacity to handle your needs all the time. So you will need technologies which allow you to combine resources from public clouds into a hybrid cloud. I work at the Norwegian NREN, and to provide the DaaS service we need to be flexible and able to combine resources from other providers.

Resource Management

Four technologies are considered in this article: OpenStack, SmartDataCenter, Mesos and Kubernetes. These technologies manage data center resources and provide easier access to them for users. Some of them (all except OpenStack) make the whole datacenter appear more like a single large system. I will compare them on the following metrics:

Elasticity, scalability, fault tolerance, multi-tenancy, hybrid cloud support, and system-to-administrator (S/A) ratio


OpenStack provides resources to users in terms of virtual machines, using a hypervisor, e.g. KVM, underneath. (I should perhaps not include OpenStack in the comparison because of its hypervisor use, but recent efforts on adding Docker support make me add it to our modern datacenter comparison.)

OpenStack
Elasticity: OK for scaling VMs up and down, but not as quickly as others like Mesos/Kubernetes
Scalability: OK for hundreds of machines, but for larger clusters it becomes harder to manage and scale
Fault tolerance: Possible to set up HA OpenStack, but it requires a larger effort; it also covers only OpenStack itself and not the VMs: they cannot be relocated automatically to other compute nodes, which is a big drawback in a large data center
Hybrid cloud: Not supported, as the abstraction level is not right
Multi-tenancy: Good support
S/A ratio: Low; it requires relatively many administrators and has been notorious for being hard to upgrade


Mesos is a resource manager for your whole cluster. The motto of Mesos is "the kernel of your data center". It exposes the whole data center as one single computer and provides access to resources in terms of CPU, RAM and disk. It thus asks users to specify how many resources they need rather than how many VMs, since in the end users care about the resources needed to run their applications, not about VMs. It is used extensively by many companies, e.g. Twitter and Airbnb. It uses frameworks, e.g. Apache Spark and Marathon, which talk to the Mesos master to ask for resources.
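
As a small, hypothetical illustration of asking for resources rather than VMs, the sketch below shows roughly what submitting an application definition to the Marathon framework can look like from node.js. The application id, command and Marathon host are made up for illustration; /v2/apps is Marathon's REST endpoint for creating applications.

    // Hypothetical Marathon app definition: we declare how much CPU and memory
    // we need and how many instances to keep running, and let Mesos/Marathon
    // decide where the containers actually land.
    const appDefinition = {
      id: '/analytics/wordcount',   // made-up application id
      cmd: 'python wordcount.py',   // made-up command
      cpus: 0.5,                    // half a CPU core per instance
      mem: 512,                     // MB of RAM per instance
      instances: 4                  // Marathon keeps 4 copies running
    };

    // Submit it to Marathon's REST API (assuming a reachable Marathon master
    // at marathon.example.org:8080).
    const http = require('http');
    const body = JSON.stringify(appDefinition);
    const request = http.request({
      host: 'marathon.example.org',
      port: 8080,
      path: '/v2/apps',
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
    }, (response) => console.log('Marathon responded with status', response.statusCode));
    request.end(body);

If the application later needs more capacity, scaling is a matter of increasing the instances (or cpus/mem) numbers, not of provisioning new VMs.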

Mesos
Elasticity: Easy to scale your applications, as you can simply ask for more resources
Scalability: Known to run on 10000+ machines at Twitter
Fault tolerance: Uses ZooKeeper to maintain a quorum of masters for HA support; it can also reschedule user applications automatically upon node failure
Hybrid cloud: Possible to combine resources from multiple clouds, as you can have multiple framework instances talking to different Mesos masters in private/public clouds
Multi-tenancy: Not well supported yet; it is possible, but isolation and security of the underlying containers are still in question
S/A ratio: High; 2-3 people can manage 10000+ machines

Kubernetes is another project to manage a cluster running containers as pods. It uses principles similar to Mesos in providing resources not in terms of VMs. It is a new project, based on experiences from Google's Borg [5]. The main difference between Kubernetes and Mesos is that in Kubernetes everything is a container, whereas in Mesos you have frameworks, e.g. Apache Spark and Marathon, which use containers underneath. So Mesos adds one more layer of abstraction. Check out the interesting talk from Tim Hockin on Kubernetes.

Kubernetes
Elasticity: Easy to scale your applications, as you can simply ask for more resources for your pods or create more pods with a replication controller
Scalability: A new project, but if it resembles Borg then it is going to support thousands of machines
Fault tolerance: Kubernetes is currently not an HA system, but work is going on to add support for it; it does support rescheduling user applications automatically upon node failure
Hybrid cloud: Possible to combine resources from multiple clouds; supporting private, public and hybrid clouds is even part of its front-page motto
Multi-tenancy: Not well supported yet; looking at the roadmap, things will change in this area
S/A ratio: High; 2-3 people can manage a large number of machines

SmartDataCenter (SDC) is from Joyent and based on SmartOS. It provides the container-based service "Triton" in public/private clusters using Zones, which are production ready. The main benefit is that it uses a technology which is battle tested and ready for production. The downside is that it is based on the OpenSolaris kernel, so if you need kernel modules that are available only for Linux then you are out of luck. But running Linux userspace applications is fine, as SDC has LX-branded Zones to run Linux applications.

SmartDataCenter
Elasticity: Easy to scale applications; online resizing of resources is claimed
Scalability: Known to run on large clusters (1000+) and to power a public cloud service
Fault tolerance: The SDC architecture uses a head node from which all machines in the cluster boot; as SmartOS is a RAM-only OS, the head node can be a SPoF in case of a power failure
Hybrid cloud: Not yet clear from their documentation how well combining a private cluster with their public cloud is supported
Multi-tenancy: Good support, as Zones are well isolated and provide good security for multiple tenants
S/A ratio: High; 2-3 people can manage a large number of machines

Storage

For storage we need a scale-out storage system which enables processing of data where it is stored. As the data becomes "big data", moving it around is not really a solution. The principle for storing a large data set is to divide it into smaller blocks and store them on multiple servers for performance and reliability. Using this principle, you can have an object storage service, e.g. Amazon S3, or you can add layers on top of the underlying objects to provide block or file storage too, e.g. HDFS for a file system.

In open source, Ceph is currently one of the main viable candidates for scale-out storage. Ceph provides object, block and file system storage, which satisfies the above-mentioned criteria. The maturity of Ceph varies with the type of storage: object and block are stable (Yahoo's object store), whereas the file system is not. It will take maybe a year more to get the file system stable enough to be useful. Mesos and Kubernetes are both trying to add support for a persistence layer in their resource management parts. This will enable running the storage components themselves inside normal resource managers and lowers the administration effort of managing storage.

Another option is SDC's object storage component, called Manta. Manta is better than Amazon S3 in that you can also run computation on the data where it is stored. Manta embraces data locality to handle big data but does not have Hadoop bindings (embrace Hadoop, guys, like you did with LX-branded zones for Linux). With Hadoop bindings, the world of Hadoop would be open to Manta as well, because you do need support for distributed machine learning algorithms to go beyond basic word counting. Of course we can run HDFS on top of the plain ZFS which SDC uses, but then we have to duplicate our big data! Another issue with SDC is the lack of a POSIX-like distributed file system; it has an NFS interface to Manta, but the NFS implementation does not have locking support. So it is hard to mount it in multiple containers and run MPI applications or other tools which need a distributed file system. We could again run Ceph on ZFS, but CephFS needs a kernel module which is not yet available on SmartOS.

Providing distributed, scalable storage is not an easy challenge. It is all about trade-offs; remember the CAP theorem. There is a lot of work going on in this area, but there is no clear winner here yet.

Networking

To build a scale-out architecture, we need to connect these containers to each other. The default model of Docker is that each container gets an IP from a bridge, docker0. As this bridge is local to each machine, containers running on different machines can get the same IP address, which makes communication between containers hard.

To solve this we need to give each container a unique IP. To do that, there are mainly two design options:

  • Overlay network: Create a VLAN overlay on the physical network and give each container a unique IP address. A few projects have chosen this path: Open vSwitch, Weave and Flannel all use this method. For example, Weave makes it easy to combine containers running at different cloud providers by creating an overlay between them. The main concern is that we take a performance hit by doing so. You can also use multicast in these overlay networks, e.g. Weave Multicast.
  • No overlay: A project named Calico is trying to solve container networking using BGP and Linux ACLs. It does not create an overlay network but simply uses L3 routing and ACLs to isolate traffic between containers.

With support for IPv6 in Docker and other container runtimes (Rocket from CoreOS, Zones from SDC), there is going to be an IP address for each container; the question that remains is which method to use. There has not been much documentation on the performance of either method, and at the same time the approaches are evolving quickly towards maturity. Software Defined Networking (SDN) is also something to watch, as in how we can combine SDN with the above-mentioned projects to configure the network according to demand.

Service Discovery

You have got your cluster with containers dynamically assigned to nodes. Now, how are you going to find service X? You need to know on which server service X is running and on which port. You will also need to configure your load balancer with the IP address of each new service X instance.

Enter the world of service discovery for the modern data center. Most of the projects in this area support DNS for discovery, and a few additionally support an HTTP-based key/value interface.

  • Single datacenter: Mesos-DNS is a DNS-based service discovery mechanism to be used together with Mesos. SkyDNS is a service discovery mechanism based on etcd.
  • Multi-datacenter: Consul is a service discovery tool that works across clusters. It offers functionality similar to etcd and also supports service health checks.

There might be more projects doing service discovery, as the container area is very hot and evolving. But Consul seems to be a very good candidate for realizing service discovery in a hybrid cloud infrastructure.
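
As a small illustration of DNS-based discovery, a node.js client can look up instances of a service through Consul's DNS interface. This is only a sketch under a couple of assumptions: the host resolver forwards *.consul queries to a local Consul agent (which serves DNS on port 8600, e.g. via a dnsmasq rule), and "web" is a made-up service name.

    const dns = require('dns');

    // Consul exposes each registered service as <name>.service.consul.
    // The SRV records carry the node name and port of every healthy instance.
    dns.resolveSrv('web.service.consul', (err, records) => {
      if (err) throw err;
      records.forEach((record) => {
        console.log(`web instance available at ${record.name}:${record.port}`);
      });
    });

A load balancer (or the client itself) can simply repeat this lookup to pick up new instances as containers are rescheduled.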

Summary

To handle the challenges of the coming years, with big data from genomes, sensors and video, we need a scale-out infrastructure with fault tolerance and automatic healing. Technologies running on such an infrastructure need to support data locality, as moving data around is not a solution.

There is no clear winner when it comes to resource managers. OpenStack is bogged down in legacy thinking (VMs), with no support for hybrid clouds and a low system-to-administrator ratio. Mesos embraces modern thinking in terms of resources and applications rather than VMs, but started before the container movement went bananas; therefore it has another layer of abstraction called frameworks. Although the frameworks use cgroups underneath for isolation, the container movement is not only about isolation. Kubernetes might get the container movement right, as lessons are learned from Google's Borg (running containers for 5+ years in production). Joyent's SDC has been around containers for 8+ years and has a production-ready system when it comes to resource management and networking, but the storage layer still leaves you with a feeling of missing some crucial parts. For us, being an NREN and supercomputing organization, a distributed file system is a must, as researchers will need to run a variety of workloads and will require a scale-out data storage layer to store and process data with data locality support.

I have not covered new container OSes (CoreOS, RancherOS, Atomic, Photon) to run Kubernetes/Mesos in this post. Also this post did not cover PaaS solutions (Deis, Openshift, Marathon, Aurora).

Thanks to my colleagues Morten Knutsen & Olav Kvittem for interesting discussion and feedback on the article.

References:

[1]: Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos

[2]: http://en.wikipedia.org/wiki/Hypervisor

[3]: https://www.joyent.com/blog/docker-bake-off-aws-vs-joyent

[4]: On Designing and Deploying Internet-Scale Services

[5]: Large-scale cluster management at Google with Borg

Data Analysis as a Service

The goals for Data Analysis as a Service (DaaS) are:

  • A service to enable processing of large data sets, so-called big data, in parallel using the data locality principle
  • The ability to share research data, as well as processing pipelines, with the corresponding research communities.

Imagine you are reading a scientific paper and think of some modifications you would want to test against the published results. It would be great if the author provided a link to a portal where you could rerun the whole analysis with your modifications on a subset of the data or on the full dataset. Research would then progress at a faster pace, results would be reproducible, and collaboration between researchers would become much easier.

Currently, researchers focus on sharing data with other researchers for collaboration or publications, as this enables others to reproduce their results and to use their data for future research. The challenge with data analysis is that preparing the data is a big part of the work and poses a significant challenge in itself. DaaS will enable researchers to share not only their datasets but also the whole processing pipeline. This will enable researchers to collaborate with a speed and ease which is not possible today.

In recent years, the amount of data generated has been increasing exponentially. The data comes from different sources such as gene sequencing, protein sequencing, neuroscience, sensor networks, network flows, social media, machine logs, satellite images, etc. Researchers would like to analyze these data sets with advanced algorithms, e.g. machine learning, without worrying about the scale of the data sets. The current method of using one or a few high-end machines, or traditional supercomputing, has problems handling big data. Simply moving 100 TB of data over a 10 Gbps link takes 22.7 hours at full bandwidth. So we need a service where we can process data where it is stored, and thus enable an efficient and scalable way of processing large data sets.

Of course you can use the Amazon or Google cloud for this. But problems arise with data going out of the country, jurisdiction/control (if you think this is not an issue, I recommend reading [1]) and how the cost will be covered. Moreover, the plain Amazon/Google clouds are not so easy to use either; there are a large number of startups built around making them user friendly. Being part of the NREN [2] and national supercomputing in Norway, we are working on providing such a service to our researchers. We are also collaborating with our Nordic partners in the Glenna project [3]. This will enable us to share resources among the Nordic countries and build knowledge together.

References:

[1] Who Owns the Future?

[2] National Research and Education Network

[3] Glenna NeIC Project

Hadoop YARN (2.2.0) setup with Ceph

In this post we will try to make YARN use Ceph rather than HDFS as its file system. We assume that you already have a YARN cluster set up; if not, you can follow the nice guide from Cloudera (you don't need to install the HDFS components). We also assume that you have a Ceph cluster up and running; if not, you can follow the nice how-to guide from the Ceph docs. You should install the YARN NodeManagers on each host in your cluster where Ceph OSDs are running. Once you have the YARN and Ceph clusters up and running, follow these steps to make YARN start using Ceph.

  • Install the following packages on each Ceph/NodeManager node:
  • apt-get install libcephfs-java libcephfs-jni
  • Once installed on each host, copy the Ceph Java libraries to the Hadoop lib folders:
  • cp /usr/share/java/libcephfs-0.72.2.jar /usr/lib/hadoop/lib/
    cp /usr/lib/jni/libcephfs_jni.so* /usr/lib/hadoop/lib/native
  • You will also need to install the cephfs-hadoop plugin jar by compiling this git repo, including the patch which adds support for Hadoop 2. You can download the compiled jar from here; it works with CDH5 beta 2.
  • Once you have built or downloaded the cephfs-hadoop plugin jar, copy it to the Hadoop lib folder:
  • cp ~/cephfs-hadoop-1.0-SNAPSHOT.jar /usr/lib/hadoop/lib/
  • Now we have all the libraries we need for Hadoop to run with Ceph. It is time to set up the configuration in the core-site.xml file to make Hadoop use Ceph. Make sure you have the admin.secret file at the path you mention in the config options. Add the following options to the config file:
  • <property>
     <name>fs.defaultFS</name>
     <value>ceph://mon-host:6789/</value>
    </property>
    <property>
    <name>ceph.conf.file</name>
     <value>/etc/ceph/ceph.conf</value>
    </property>
    
    <property>
     <name>ceph.auth.id</name>
     <value>admin</value>
    </property>
    
    <property>
     <name>ceph.auth.keyfile</name>
     <value>/etc/hadoop/conf/admin.secret</value>
    </property>
    
    <property>
     <name>fs.ceph.impl</name>
     <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>
    
    <property>
     <name>fs.AbstractFileSystem.ceph.impl</name>
     <value>org.apache.hadoop.fs.ceph.CephHadoop2FileSystem</value>
    </property>
    
    <property>
     <name>ceph.object.size</name>
     <value>67108864</value>
    </property>
  • You do need to create the YARN and MapReduce history directories with the correct permissions, similar to what you would have done for HDFS. Now you can run the following command and see the contents of the Ceph "/" path:
  • # hdfs dfs -ls /
    Found 3 items
    drw-r--r-- - yarn 209715222774 2014-04-04 15:24 /benchmarks
    drwx------ - yarn 40390852613 2014-04-04 13:59 /user
    drwxrwxrwt - yarn 20298878 2014-04-04 13:59 /var
  • Now you have Hadoop Yarn running with Ceph. Enjoy :-)

Data Analysis as a Service: Project motivation and abstract

In recent years, the amount of data generated has been increasing exponentially. The data comes from different sources such as machine logs, gene sequencing, sensor networks, network flows and social media. Researchers in the education and research sector, from areas such as bioinformatics, computer science, astronomy and environmental science, have huge data sets and would like to analyze this data without worrying about its scale. Thus there is an increasing demand to put this data to work by storing and processing it in a horizontally scalable way. In recent years there has been a rise of commercially backed, distributed open source software from the main global actors, e.g. Google, Yahoo, Twitter and Facebook. This distributed software utilizes commodity hardware to store big data and provides the ability to process it locally, thus providing good economies of scale.

The Data Analysis as a Service (DaaS) project will investigate the possibility of providing a common infrastructure where researchers can store and process their data using advanced algorithms at scale. In this way, we can contribute to building an ecosystem where researchers can analyze their big data sets and share not just the data but also the whole processing pipelines. This will provide a great opportunity to collaborate with researchers across different institutions and nations. Moreover, researchers can help each other evolve the ecosystem by adding new functionality, thus improving both research and the DaaS platform.