Hadoop Platform as a Service (PaaS) is here!

Over the past few years, businesses have increasingly turned to the cloud to meet their Hadoop and Big Data needs. Cloud offerings provide varying levels of virtualization support to organizations, from simple virtual machine hosting to full-blown cloud solutions. One way to explain the various cloud models is the "Pizza as a Service" analogy described by eipserver.com. In this analogy, an on-premises solution is making your own pizza from scratch, Infrastructure as a Service (IaaS) is buying a frozen pizza, Platform as a Service (PaaS) is ordering pizza delivery, and Software as a Service (SaaS) is simply eating out.

Cloudera has made significant advances in making Hadoop Platform as a Service (PaaS) a reality. Earlier this year, Cloudera announced Cloudera Altus, which provides its Hadoop distribution as a service on Amazon Web Services (AWS), with Microsoft Azure support currently in beta.

The journey of Hadoop to the cloud has been a long one. In the beginning, cloud clusters were simply exact replicas of on-site Hadoop clusters. As cloud platforms matured, Dynamic Hadoop Clusters were introduced so that clusters could grow and shrink to match computation needs. The main benefit of a Dynamic Hadoop Cluster is cost: organizations pay for compute resources only when they need them.

Dynamic Hadoop Clusters were made possible by decoupling data from the worker nodes. Data was moved off the nodes into centralized, durable storage services such as Amazon S3 or Microsoft Azure Data Lake Store. Because data no longer sits on the same nodes that process it, this move reduced performance compared to traditional Hadoop clusters, but it allowed significant cost savings for the organizations that adopted it.

Over the past few years, both Amazon Web Services (through Amazon EMR) and Microsoft Azure (through HDInsight) have offered mature solutions for Dynamic Hadoop Clusters in the cloud. These offerings allow for dynamic clusters, but still require the client to actively manage the cluster deployment (e.g. scheduling when to add or remove nodes, and writing and running scripts to change the cluster configuration).

From a client perspective, the ideal situation would be to specify what the organization plans to achieve (i.e. the workload the cluster should perform) and let the environment take care of the rest. That is what a true Platform as a Service (PaaS) approach looks like. Most providers understand that this is the end goal and are actively moving in this direction.

Cloudera Altus takes the dynamic cloud to the next level. It currently supports creating temporary, on-demand clusters for scheduled ETL/ELT processes. As the first step of an ETL/ELT process, Cloudera Altus creates and deploys the required cluster in the cloud, then submits the ETL/ELT process to run on the newly created cluster. Once the process completes, the cluster is automatically removed, so no further charges accrue.
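The transient-cluster lifecycle described above (create a cluster, run the job, remove the cluster) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cloudera Altus code: `HypotheticalCloud`, `create_cluster`, `submit_job`, `delete_cluster` and `run_transient_job` are hypothetical names backed by an in-memory fake provider. The key design point is the `try`/`finally`, which removes the cluster even when the job fails, so billing stops either way.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical stand-in for a cloud cluster (not an Altus API object)."""
    name: str
    workers: int
    running: bool = True


class HypotheticalCloud:
    """In-memory fake provider, used only to illustrate the lifecycle."""

    def __init__(self):
        self.active = {}  # clusters that would currently be billed
        self.log = []     # record of lifecycle actions, in order

    def create_cluster(self, name, workers):
        cluster = Cluster(name, workers)
        self.active[name] = cluster
        self.log.append(f"create {name} ({workers} workers)")
        return cluster

    def submit_job(self, cluster, job):
        self.log.append(f"run {job.__name__} on {cluster.name}")
        return job()

    def delete_cluster(self, cluster):
        cluster.running = False
        del self.active[cluster.name]
        self.log.append(f"delete {cluster.name}")


def run_transient_job(cloud, job, workers):
    """Create a cluster for one job, run it, and always tear the cluster down."""
    cluster = cloud.create_cluster(f"etl-{job.__name__}", workers)
    try:
        return cloud.submit_job(cluster, job)
    finally:
        # Runs on success and on failure, so no cluster is left running.
        cloud.delete_cluster(cluster)


def nightly_etl():
    # Hypothetical ETL step; a real job would read from and write to cloud storage.
    return "etl complete"


if __name__ == "__main__":
    cloud = HypotheticalCloud()
    print(run_transient_job(cloud, nightly_etl, workers=4))
    print(cloud.log)
```

After `run_transient_job` returns, `cloud.active` is empty: the only time a cluster exists is while its job is running, which is the cost model the article describes.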

With the release of Cloudera Altus, we expect more businesses to take advantage of Hadoop in the cloud, helping them cost-effectively drive business transformation.

– Eyal Edelman, Big Data Practice Lead