“How large companies manage and manipulate Big Data using Hadoop”

What is Big Data?

| Let’s understand what Big Data is

Simply put, big data is a term that describes the large volumes of data that a business deals with on a daily basis.

| Features Of ‘Big Data’

Big data can be defined by one or more of the following characteristics:

  • Data that grows so quickly in volume it can’t be processed with conventional means
  • The mining, storage, analysis, sharing and visualisation of data.

| The Vs of Big Data

Big data is often characterized by a set of terms that all begin with V:

  • Volume: the amount of data generated each second. Volume is often discussed in reference to data sources such as social media, credit cards, phones and photographs.
  • Value: this refers to the worth of the extracted data. Large amounts of data are useless unless they are used correctly.
  • Variety: this describes the different types of data generated. The term is largely used in reference to unstructured data such as images or social media posts.
  • Veracity: this refers to how trustworthy the data is. If the data is inaccurate or of poor quality, it is of little use.
  • Validity: like veracity, this tells us how accurate the data is for its intended use.
  • Volatility: this refers to the age of the data. As fresh data is generated every hour or even every minute, stored data can quickly become irrelevant or historic. Volatility also refers to how long data needs to be kept before it can be discarded or archived.
  • Visualisation: this describes how challenging data can be to use. Limitations such as poor scalability or functionality can hamper visualisation. Additionally, data sets can be vast, which makes them complicated to use or visualise in a meaningful way.

| What is ‘Hadoop’?

| Hadoop Architecture

Hadoop has a master-slave architecture: HDFS handles data storage, and MapReduce handles distributed data processing.
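To make the MapReduce side of this architecture more concrete, here is a minimal in-memory sketch of the map, shuffle and reduce steps applied to a word count. It is plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real job would run these steps in parallel across the cluster over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data on hadoop", "hadoop stores big data"]))
```

Each mapper only sees its own line and each reducer only sees one key, which is what allows the work to be spread over many machines.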

NameNode: The NameNode represents every file and directory used in the namespace.
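As a rough illustration of what "representing files and directories in the namespace" means, the toy class below keeps only metadata: a table of paths, where a file entry lists the IDs of its blocks. Everything here is hypothetical (the ToyNameNode class, its methods, and block labels such as "blk_1"). It is a sketch under those assumptions, not how HDFS is implemented.

```python
import posixpath


class ToyNameNode:
    """Hypothetical in-memory stand-in for a NameNode's namespace table."""

    def __init__(self):
        # path -> None for a directory, or a list of block IDs for a file
        self.namespace = {"/": None}

    def mkdir(self, path):
        """Register a directory and any missing parent directories."""
        current = "/"
        for part in (p for p in path.split("/") if p):
            current = posixpath.join(current, part)
            if self.namespace.get(current) is not None:
                raise ValueError(f"{current} is a file, not a directory")
            self.namespace[current] = None

    def create_file(self, path, block_ids):
        """Register a file; only its metadata (block IDs) is stored here."""
        self.mkdir(posixpath.dirname(path))
        self.namespace[path] = list(block_ids)

    def list_dir(self, directory):
        """Return the sorted direct children of a directory."""
        return sorted(
            p for p in self.namespace
            if p != directory and posixpath.dirname(p) == directory
        )


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.create_file("/logs/app.log", ["blk_1", "blk_2"])
    print(nn.list_dir("/"))
    print(nn.namespace["/logs/app.log"])
```

The file contents themselves never appear in this table; in this sketch the block IDs are just labels for data that would live elsewhere.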

| Features Of ‘Hadoop’

• Suitable for Big Data Analysis

| Network Topology In Hadoop

The topology (arrangement) of the network affects the performance of a Hadoop cluster as the cluster grows. Besides performance, one also needs to care about high availability and the handling of failures. To achieve this, Hadoop cluster formation makes use of network topology.

Two nodes in a cluster can be at different distances from each other on the network:

  • Different nodes on the same rack
  • Nodes on different racks of the same data center
  • Nodes in different data centers
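One common way to put numbers on these cases is to model the network as a tree and take the distance between two nodes as the sum of their distances to their closest common ancestor. The sketch below does this for locations written as "/datacenter/rack/node". That path format and the network_distance function name are assumptions made for illustration, not Hadoop's own API.

```python
def network_distance(a, b):
    """Sum of hops from each location up to their closest common ancestor.

    Locations are slash-separated paths such as "/d1/r1/n1"
    (data center / rack / node); this format is an illustrative assumption.
    """
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")

    # Count how many leading path components the two locations share.
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1

    return (len(path_a) - common) + (len(path_b) - common)


if __name__ == "__main__":
    print(network_distance("/d1/r1/n1", "/d1/r1/n2"))  # same rack
    print(network_distance("/d1/r1/n1", "/d1/r2/n3"))  # different racks, same data center
    print(network_distance("/d1/r1/n1", "/d2/r3/n4"))  # different data centers
```

With this measure the distance grows as nodes move from the same rack, to different racks, to different data centers, matching the ordering of the list above.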

| Here is a list of some of the large and small scale companies using Hadoop:
