Anna University Notes | Big Data Analytics

UNIT 01 UNDERSTANDING BIG DATA

Introduction to big data – convergence of key trends – unstructured data – industry examples of big
data – web analytics – big data applications – big data technologies – introduction to Hadoop – open
source technologies – cloud and big data – mobile business intelligence – crowdsourcing analytics
– inter and trans firewall analytics.

UNIT 02 NOSQL DATA MANAGEMENT

Introduction to NoSQL – aggregate data models – key-value and document data models –
relationships – graph databases – schemaless databases – materialized views – distribution models
– master-slave replication – consistency – Cassandra – Cassandra data model – Cassandra
examples – Cassandra clients.

UNIT 03 MAPREDUCE APPLICATIONS

MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of MapReduce
job run – classic MapReduce – YARN – failures in classic MapReduce and YARN – job scheduling
– shuffle and sort – task execution – MapReduce types – input formats – output formats.

UNIT 04 BASICS OF HADOOP

Data format – analyzing data with Hadoop – scaling out – Hadoop streaming – Hadoop pipes –
design of Hadoop distributed file system (HDFS) – HDFS concepts – Java interface – data flow –
Hadoop I/O – data integrity – compression – serialization – Avro – file-based data structures –
Cassandra – Hadoop integration.

UNIT 05 HADOOP RELATED TOOLS

HBase – data model and implementations – HBase clients – HBase examples – praxis.
Pig – Grunt – Pig data model – Pig Latin – developing and testing Pig Latin scripts.
Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL
queries.