Big Data Analytics with Hadoop and Spark

Regular price $39.95
Tax included. Shipping calculated at checkout.
Type: Paperback
In stock (100 units), ready to be shipped

ISBN: 9789365894745
eISBN: 9789365896893
Author: Shikha Mehta
Rights: Worldwide
Edition: 2026
Pages: 384
Dimensions: 7.5 × 9.25 inches

In today's data-driven world, technologies such as Hadoop and Spark, commonly deployed through platforms such as Cloudera, have become essential for storing, processing, and analyzing big data across industries including finance, healthcare, e-commerce, and research.

This book systematically navigates the entire ecosystem. It starts with big data fundamentals, security, and HDFS architecture, then covers MapReduce in depth through weather and stock data case studies. Readers gain hands-on experience with the Cloudera framework, learning high-level scripting with Pig Latin and structured data warehousing with HiveQL, the Hive Metastore, and partitions. The book then explores the versatility of NoSQL with HBase and MongoDB, including the CAP theorem, followed by Scala programming and Spark's fast in-memory engine. Readers learn to optimize queries with the Catalyst optimizer and to process complex Parquet and JSON files using Spark SQL DataFrames. Finally, the book covers machine learning pipelines with spark.ml for professional-grade classification and clustering applications.

By the end of this book, readers will have strong conceptual clarity and practical expertise in big data analytics. They will be able to confidently design, implement, and manage scalable data processing solutions, solve real-world data challenges, and take on professional roles in big data engineering and analytics.

WHAT YOU WILL LEARN
● Understand big data concepts, architecture, ethics, and applications.
● Build scalable storage and processing using HDFS and MapReduce.
● Perform data analysis using Pig and Hive.
● Develop NoSQL solutions using HBase and MongoDB.
● Process large datasets using Apache Spark.
● Analyze data using Spark SQL and DataFrames.
● Implement machine learning using PySpark.

WHO THIS BOOK IS FOR
This book is ideal for students, researchers, and academicians, and it supports aspiring big data engineers, data scientists, and software engineers. Readers should have basic programming knowledge and a grasp of database fundamentals; with these, they can master Hadoop and Spark for professional data science work and for faculty-level teaching.

TABLE OF CONTENTS
1. Exploring Big Data
2. Introduction to Hadoop
3. Hadoop Distributed File System and MapReduce
4. Big Data Analysis with Cloudera
5. Stock Data Analysis with Cloudera
6. Understanding Pig for Big Data Processing
7. Operators in Pig Latin
8. Functions in Apache Pig
9. Hive: Data Warehousing and SQL-like Queries
10. Data Analysis Using Hive
11. Data Storage and Processing Using HBase
12. MongoDB
13. Introduction to Spark for Big Data Processing
14. Getting Started with Scala Programming
15. Data Analysis with Spark SQL
16. Machine Learning Application Using PySpark

ABOUT THE AUTHOR
Dr. Shikha Mehta is an academician, researcher, and technology professional with extensive experience in computer science education, data analytics, and big data technologies. She is the Head of the Department of Computer Science and Engineering and Information Technology at Jaypee Institute of Information Technology, Noida. She has been actively involved in teaching, curriculum design, and faculty development, with a strong focus on bridging theoretical foundations and real-world industry practice.

With deep expertise in big data analytics, distributed computing, and machine learning, Dr. Mehta has designed and delivered advanced training programs on Hadoop, Apache Spark, Python, and data-driven decision-making frameworks for students, faculty members, and working professionals. Her academic contributions include the development of structured, application-oriented learning modules and executive-level instructional content tailored to emerging industry demands.

Dr. Mehta has played a key role in mentoring faculty and learners in adopting modern data analytics tools and scalable data processing frameworks. Her work emphasizes hands-on learning, real-world case studies, and ethical considerations in data usage, enabling learners to build industry-ready skills. Through her teaching, research, and academic leadership, she continues to contribute to the advancement of big data education and analytics-driven innovation.