This article is especially for beginners who are entering Big Data, who are eager to start a career in Big Data, Data Science, and ML, and who don't know where or how to start. In this post, we discuss how to start learning Apache Spark with some hands-on practice. I will provide a guide to open an account with Databricks and create a cluster-like interface using the Databricks Community Edition, which is free. We can also set up Spark on our local Windows machine with a Jupyter notebook; to do so, follow the link to Setup PySpark in Windows.
Beginner's Guide:
Data Engineering and Data Analytics are trending fields in today's technology world. Many people in IT decide to switch careers and move into the Big Data space, but are not sure how to make this big move. In this tutorial, I will help you take the first step towards your next career move.

Most of us have gone through plenty of study material on Apache Spark using Python or Scala, yet don't know how to set up Apache Spark on our own machine for free so that we can gain hands-on knowledge and real experience of working with Big Data. We can install Spark on our own personal computer and run applications in local mode, but we also need exposure to a cluster-like environment to understand how a Spark application works in cluster mode. Databricks is one of the best cloud platforms to help students and new techies learn for free with a minimal cluster configuration. Without any further delay, let us go ahead and create an account in Databricks to start our career move.
Databricks Community Edition:
Databricks is a cloud platform that lets developers work effectively on both data engineering and data science. Databricks is a unified analytics platform in which one can:
- Create a cluster with n nodes, depending on the application's requirements,
- Create a database and Databricks tables to store data in a structured format,
- Code in the language of your choice, as it supports Scala, Python, and SQL,
- Easily schedule applications and trigger a mail to the user on success or failure of a Spark application,
- Develop in a notebook that looks like a Jupyter notebook, making it easy to build Spark applications; we can also use MLflow to implement our algorithms.
Let us move ahead and start creating our Databricks Community Edition account.
Create an Account:
To create a Databricks account, we need a valid e-mail id. Please follow the steps below.
Step 1: Click on the given link to navigate to the Databricks Community Edition login page, then proceed with the Sign Up button as shown in the figure below.
Step 2: Fill in all the details in the given form and submit it.
Step 3: Once you have submitted the form, you will receive a verification mail from Databricks; click on the verification link and set a password for your account. Please refer to the screenshot below for reference.
Now your account is created and you can sign in to your Databricks Community Edition. The Databricks home page looks like the screenshot below.
Video Tutorial:
Cluster Setup:
Now that we have created an account, we still need a cluster to develop a Spark application. We can create a free cluster that gives us a driver with 15 GB of memory and 2 cores. If you want to upgrade your subscription later, you can check the pricing details from here.
Step 1: Choose the cluster icon in the left-side pane and click on the Create Cluster button.
Step 2: Fill in the cluster name and click on the Create button.
You can see that it is a free cluster with a 15.3 GB driver and 2 cores. Note that the cluster will automatically shut down after 2 hours of inactivity. Once the cluster is up and running, its page also shows details about the libraries, notebooks, Spark UI, etc. Refer to the screenshot below.
First program in Databricks:
Our setup is ready, and now we can start with our first program in Databricks. You can watch the video in the section above to get a clear picture.
Step 1: Click on the Notebook icon in the left pane.
Step 2: Right-click on your user and select Create --> Notebook.
Step 3: Provide a name for the notebook, choose a language and a cluster, and click on the Create button.
Now we are ready with the setup to learn Spark using Databricks. I hope you enjoyed learning how to set up a Databricks account for learning Spark. Please post your doubts in the comment box below, and support me by subscribing to my channel.
Happy Learning!!!