In the previous post we got a Spark development environment up and running. Now I'll explain the basic concepts of Apache Spark, which provide insight into how Spark works. Spark Architecture: A Spark application consists of a driver process and a set of executor processes. The driver program is the one that contains the user's main() … Continue reading Apache Spark Fundamentals
Category: Technical
Posts related to technical topics
Setting up Spark development environment
Now that you know what Spark is for, we'll show you how to set up and test a Spark development environment on Windows, Linux (Ubuntu), and macOS. Whichever of these common operating systems you use, this article should give you what you need to start developing Spark applications. What is Spark … Continue reading Setting up Spark development environment
Apache Spark Introduction
In this first Spark article, I'll try to answer the question "What is Spark?" and give you an in-depth overview of what makes it special. I'll outline the main features, including some of the advanced functionality. I'll also show you some of the main building blocks. What is Spark? Apache Spark is an open-source cluster-computing … Continue reading Apache Spark Introduction
Getting started with Django
I've been using Django for a while now, but when I started it wasn't an easy journey. Still, I believe Django is a very easy web framework to work with. Initially I was concerned about the look and feel of Django pages, but this is easily overcome by plugging in Bootstrap, which is again a great … Continue reading Getting started with Django
Big Data Introduction
Big data is the IT industry's hottest buzzword. Everyone from developers to decision makers, from small startups to big names, is dealing with it. Many resources available online present complex theories, but in simple terms, what is big data, and what is its use because of which industry is … Continue reading Big Data Introduction
Cassandra Quick Basics
Cassandra is a decentralized NoSQL database. It runs on a multi-node cluster in which every node is identical to every other node (server symmetry: all nodes have the same features). Unlike Hadoop, it has no master node, hence there is no single point of failure. A few features/terms. Elastic scalability: able to scale, up or … Continue reading Cassandra Quick Basics
Mockito: Java Unit Testing with Mock Objects
Mockito is an open-source testing framework for Java. The framework allows the creation of test double objects, called "mock objects", in automated unit tests for the purpose of Test-Driven Development (TDD) or Behavior-Driven Development (BDD). Compared to EasyMock, Mockito seems easier to use and more flexible. First, it's able to mock … Continue reading Mockito: Java Unit Testing with Mock Objects