Introduction to Big Data


To all the data nerds out there: you surely know the game of data science, right? Anyone who has studied relational database management systems such as MySQL, MariaDB and PostgreSQL, and has taken a glimpse at non-relational database systems like MongoDB and Cassandra, will know how to set up and operate these systems to query data sets. But did you know that traditional relational database systems are inefficient at processing, and often incapable of handling, huge data sets (sizes running into gigabytes and terabytes)? Data that has outgrown the megabyte range, and with it these traditional systems, is called ‘Big Data’. A widely used framework to manage, monitor and query big data is called “Hadoop”.


Apache Hadoop is an open-source software framework for handling and processing huge data sets using the MapReduce programming model. The framework also offers facilities such as data security, data locality, data flexibility, fast querying of big data and power-packed analytics. Its core consists of a storage part, a file system named ‘HDFS’ (short for Hadoop Distributed File System), and a processing part, the MapReduce programming model.

Apart from HDFS and the Hadoop Common libraries, the other modules of Hadoop are Hadoop YARN and Hadoop MapReduce. YARN (Yet Another Resource Negotiator) manages the computing resources in a cluster and uses them to schedule users’ applications, while the MapReduce model is used for large-scale data processing.
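To make the MapReduce model a little more concrete, here is a minimal sketch in plain Python that imitates its three phases (map, shuffle, reduce) in memory on a single machine. It is not Hadoop code: the function names, the "year,temperature" record layout and the sample readings are all made up for illustration, and a real cluster would spread each phase across many nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def run_job(records, mapper, reducer):
    """Chain the three phases together (single process, illustration only)."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


# Hypothetical job: find the highest temperature recorded in each year.
def year_temperature_mapper(line):
    year, temperature = line.split(",")
    yield year, int(temperature)


def max_reducer(year, temperatures):
    return max(temperatures)


if __name__ == "__main__":
    # Made-up sample input in "year,temperature" form.
    readings = ["2015,31", "2016,35", "2015,28", "2016,33", "2017,30"]
    print(run_job(readings, year_temperature_mapper, max_reducer))
    # prints {'2015': 31, '2016': 35, '2017': 30}
```

The point of the split is that the mapper and reducer only ever see one record or one key at a time, which is what lets a framework like Hadoop run many copies of them in parallel.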

These core modules are not all there is to Hadoop. Other plugins and modules can be installed on top of HDFS and Hadoop Common to enhance data processing by providing simplified features for querying and real-time processing of data. Packages that can be installed on top of or alongside Hadoop include Apache Pig, Apache Sqoop, Apache Flume, Apache ZooKeeper, Apache Kafka, Apache Oozie, Apache Spark and many more. The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts. Though MapReduce code is commonly written in Java, any programming language can be used with “Hadoop Streaming” to implement the “map” and “reduce” parts of the user’s program.
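As a rough illustration of the Hadoop Streaming idea, the sketch below writes the “map” and “reduce” parts of a word count in Python. Hadoop Streaming exchanges plain text lines with your program: the mapper emits tab-separated key/value lines, the framework sorts them by key, and the reducer reads the sorted lines. The function names and the sample sentences are made up for this example, and to keep it runnable on its own the functions take any iterable of lines, with Python's sorted() standing in for Hadoop's sort step. In a real job each function would typically live in its own script and read from sys.stdin.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; assumes the lines arrive sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Made-up input; in a streaming job these lines would come from HDFS.
    text = ["Hadoop stores big data", "Hadoop processes big data"]
    mapped = sorted(mapper(text))  # stands in for the framework's sort step
    for line in reducer(mapped):
        print(line)
```

With the two functions split into separate scripts, they would be handed to the streaming jar through its -mapper and -reducer options. The exact jar location depends on your installation, so it is left out here.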

[Figure: Packages and plugin modules that can be installed on top of Hadoop]

I hope this information helps to introduce the basic concepts of Hadoop. In our next post, we’ll share all the steps to install Apache Hadoop (newest version, v2.7.3) and a few of its extra packages on your system (Linux/Windows). Stay tuned!


