How big MNCs like Google, Facebook, Instagram etc. store, manage, and manipulate thousands of terabytes of data with high speed and high efficiency…

Irfan
6 min read · Mar 17, 2021

Ever wondered how big MNCs like Google, Flipkart, YouTube, Facebook, etc. handle that much data?

Big Data analytics has helped organizations double their revenue in no time. Intelligent analysis of data is what you need if you wish to succeed in the coming years, which is why almost all the top MNCs have adopted and started implementing big data practices for their databases.

Today we will see how these MNCs are using Big Data to their advantage. The blog covers the why, what, where, when, and who of Big Data.

WHY BIG DATA?

To handle complex varieties of data we need an engineered mechanism, and Big Data tooling helps simplify working with complex data structures
It is needed to derive insights from complex and huge volumes of data; data can be enormous, but analyzing it requires a system, and that is where a Big Data system helps
It helps reduce costs, as Big Data systems can be installed at affordable prices as well
It helps in better decision making, as the analytics and algorithms involved provide accurate and appropriate analysis in most cases
It is also scalable and can run on anything from a single machine to many servers

WHAT IS BIG DATA?

Big data is a term that describes the large volume of data — both structured and unstructured — that inundates a business on a day-to-day basis.
Big Data philosophy encompasses unstructured, semi-structured, and structured data; however, the main focus is on unstructured data.
Big Data represents information assets characterized by such high volume, velocity, and variety that specific technology and analytical methods are required to transform them into value.
It is all about finding the needle of value in a haystack of data.

WHERE IS BIG DATA USED?

Big data is used in industries that generate high volumes of unstructured data
Facebook, Amazon, Microsoft, IBM, and all the other big companies are using Big Data
It can also be used in smaller companies, as the software is open source and can be installed on commodity hardware as well

WHEN IS BIG DATA USED?

When there is a high volume of unstructured data, big data is used in almost every case in the world
Also, when there are large amounts of structured or semi-structured data, big data helps derive insights with analytics models, so it is used there as well
Big data also helps structure data and answer questions through queries, so even in querying data, big data is being used.

WHO IS USING/USES BIG DATA?

All industry segments, from social media to health services, are using it
Hospitality / Hotel / Travel — applications and websites use it to understand customer needs and set their pricing models and travel packages accordingly
Health Industry — from predicting ailments to medication, making health kits and health insurance packages, and providing the necessary health care, the health industry uses big data
Retail businesses like Amazon, Walmart, and many FMCG companies use big data to understand customer behavior and build suitable offers for customers to increase their sales
Banking and Financial Services — understanding patterns in customers and their transactions to provide loans and credit cards, and predicting fraudulent transactions to stop them in real time
Government — with Aadhaar and now a huge database on the population, one can see that the government also uses big data for census calculations, providing subsidies, and planning government schemes

How much data is stored by the various big tech companies in a day? Let us take the example of Google to understand how they handle this much data.
How Google Applies Big Data

A data center normally holds petabytes to exabytes of data. Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters.
How much data does Google handle?
This is one of those questions whose answer can never be exact. On a funnier note, asking it is a bit like a child asking which came first, the hen or the egg. A typical PC holds about 1 TB of storage and a smartphone about 64 GB, but as days pass, newer PCs and smartphones ship with even bigger storage than this. We all joke that Google is the only one who can answer any kind of question, and we simply conclude that Google knows everything. Now you must be wondering how much data Google handles to answer all these questions!
Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.
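As a quick sanity check, the per-day and per-year figures follow directly from the per-second rate (the published numbers are rounded up slightly):

```python
# Back-of-the-envelope check of the search-volume figures above.
queries_per_second = 40_000
per_day = queries_per_second * 60 * 60 * 24   # 86,400 seconds in a day
per_year = per_day * 365

print(f"{per_day / 1e9:.2f} billion searches/day")      # ~3.46 billion
print(f"{per_year / 1e12:.2f} trillion searches/year")  # ~1.26 trillion
```

That works out to roughly 3.5 billion searches per day and about 1.26 trillion per year, consistent with the figures quoted above.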

Characteristics of Big Data

Big Data is characterized by four key properties:

• VOLUME — Scale of data

• VELOCITY — Analysis of streaming data

• VARIETY — The Different form of data

• VERACITY — Uncertainty of data

What is Distributed Storage?

A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
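One simple way to picture that splitting is hash-based sharding: each key is hashed, and the hash decides which server stores it. The sketch below is a minimal illustration of the idea, with hypothetical node names; real systems typically use consistent hashing plus replication so data survives node failures.

```python
import hashlib

# Minimal sharding sketch: route each key to one of several storage
# nodes by hashing the key. Node names here are hypothetical.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key: str) -> str:
    # A stable hash ensures the same key always maps to the same node.
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(node_for("user:42"))  # always routes to the same node
print(node_for("user:43"))  # may land on a different node
```

Because every node applies the same function, any node (or client) can locate a key's data without a central lookup table — the coordination mechanism mentioned above handles synchronization, not routing.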

There are several tools for crunching Big Data; one of them is Hadoop.

Apache Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data. Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel more quickly.

Hadoop consists of four main modules:

  • Hadoop Distributed File System (HDFS) — A distributed file system that runs on standard or low-end hardware. HDFS provides better data throughput than traditional file systems, in addition to high fault tolerance and native support of large datasets.
  • Yet Another Resource Negotiator (YARN) — Manages and monitors cluster nodes and resource usage. It schedules jobs and tasks.
  • MapReduce — A framework that helps programs do the parallel computation on data. The map task takes input data and converts it into a dataset that can be computed in key value pairs. The output of the map task is consumed by reduce tasks to aggregate output and provide the desired result.
  • Hadoop Common — Provides common Java libraries that can be used across all modules.
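The MapReduce flow described above can be sketched with the classic word-count example. This is a toy, single-machine illustration of the map → shuffle → reduce model in plain Python, not the actual Hadoop API (real Hadoop jobs are usually written in Java or run via Hadoop Streaming):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: turn each input line into (key, value) pairs.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into the final result.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big clusters", "big jobs"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 1, 'clusters': 1, 'jobs': 1}
```

In a real cluster, the map tasks run in parallel on different nodes against HDFS blocks, the framework performs the shuffle over the network, and the reduce tasks aggregate in parallel — which is exactly how the thousands of daily MapReduce jobs mentioned earlier chew through petabytes.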

Thank you for reading!
