
e-DBDA prepares aspirants who want to make an impact in the corporate and academic worlds in the domain of big data analytics, as data scientists and researchers, big data leads/administrators/managers, business analysts, and data visualization specialists. Students will learn to work with big data platforms, apply big data analysis techniques to business applications, design efficient algorithms for mining large volumes of data, analyze the Hadoop and MapReduce technologies associated with big data analytics, and explore big data applications.
- Engineering graduate in IT, Computer Science, Electronics, Telecommunications, Electrical, or Instrumentation, OR
- Postgraduate degree in Computer Science, IT, Electronics, Mathematics, Statistics, or Physics, OR
- 4-year graduate degree in Bioinformatics, OR
- MCA
Introduction to Business Analytics using case studies, Summary Statistics, Making the Right Business Decisions Based on Data, Statistical Concepts, Descriptive Statistics and Its Measures, Probability Theory, Probability Distributions and Data (continuous and discrete: Normal, Binomial, and Poisson distributions), Sampling and Estimation, Statistical Inference, Predictive Modeling and Analysis, Bayes' Theorem, Central Limit Theorem, Data Exploration and Preparation, Concepts of Correlation, Covariance, and Outliers, Regression Analysis, Forecasting Techniques, Simulation and Risk Analysis, Optimization (Linear, Nonlinear, Integer), Overview of Factor Analysis, Directional Data Analytics, Functional Data Analysis, Parametric and Non-parametric Tests (ANOVA, chi-square, t-test, U-test); Predictive Modeling (From Correlation to Supervised Segmentation): Identifying Informative Attributes, Segmenting Data by Progressive Attribute Selection, Models, Induction and Prediction, Supervised Segmentation, Visualizing Segmentations, Trees as Sets of Rules, Probability Estimation; Overfitting and Its Avoidance: Generalization, Holdout Evaluation vs. Cross-Validation; Decision Analytics: Evaluating Classifiers, Analytical Framework, Evaluation, Baseline Performance, and Implications for Investments in Data; Evidence and Probabilities: Explicit Evidence Combination with Bayes' Rule, Probabilistic Reasoning.
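To make the Bayes' Theorem item concrete, here is a worked example with purely hypothetical numbers. It assumes a condition with 1% prevalence and a test with 90% sensitivity and a 5% false-positive rate. Here D means "has the condition" and + means "tests positive". These figures are illustrative assumptions, not course data.

```latex
\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
\]
\[
P(D \mid +) = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.05 \times 0.99}
            = \frac{0.009}{0.0585} \approx 0.154
\]
```

Even with a fairly accurate test, a positive result implies only about a 15% chance of the condition. This is because true cases are rare compared with false positives.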
Python Libraries: Pandas, NumPy, SciPy, Scrapy, Plotly, Beautiful Soup
Linux Programming: Installation (Ubuntu and CentOS), Basics of Linux, Configuring Linux, Shells, Commands, and Navigation, Common Text Editors, Administering Linux, Introduction to Users and Groups, Linux Shell Scripting, Shell Computing, Introduction to Enterprise Computing, Remote Access
Cloud Computing: Cloud Computing Basics, Understanding Cloud Vendors (AWS/Azure/GCP), Definition, Characteristics, Components, Cloud Providers, SaaS, PaaS, IaaS, and Other Organizational Scenarios of Clouds, Administering and Monitoring Cloud Services, Benefits and Limitations, Deploying Applications over the Cloud, Comparison among SaaS, PaaS, and IaaS, Cloud Products and Solutions, Cloud Pricing, Compute Products and Services, Elastic Compute Cloud (EC2), Dashboard.
Database Concepts (File System and DBMS), OLAP vs. OLTP, Database Storage Structures (Tablespaces, Control Files, Data Files), Structured and Unstructured Data, SQL Commands (DDL, DML, and DCL), Stored Functions and Procedures in SQL, Conditional Constructs in SQL, Data Collection, Designing Database Schemas, Normal Forms and ER Diagrams, Relational Database Modelling, Tools for Gathering Data in a Systematic Fashion, Data Warehousing Concepts, NoSQL, Data Models (XML), Working with MongoDB.
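The SQL command categories listed above can be tried without a database server by using Python's built-in `sqlite3` module. The sketch below uses a hypothetical `employee` table with made-up rows and a hypothetical function name, `department_summary`. It shows DDL (`CREATE TABLE`), DML (`INSERT`, `SELECT`) and a conditional `CASE` expression. SQLite has no `GRANT`/`REVOKE` (DCL) and no stored procedures, so those topics need a server database such as Oracle, MySQL, or PostgreSQL.

```python
import sqlite3


def department_summary():
    """Create a throwaway table, load hypothetical rows, and summarize them by department."""
    conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close
    try:
        cur = conn.cursor()
        # DDL: define the schema, with a CHECK constraint on salary
        cur.execute(
            """
            CREATE TABLE employee (
                emp_id INTEGER PRIMARY KEY,
                name   TEXT NOT NULL,
                dept   TEXT NOT NULL,
                salary REAL CHECK (salary >= 0)
            )
            """
        )
        # DML: insert made-up rows using ? placeholders instead of string formatting
        cur.executemany(
            "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
            [("Asha", "Analytics", 75000), ("Ravi", "Analytics", 68000), ("Meera", "Cloud", 82000)],
        )
        conn.commit()
        # DML query: aggregate per department and label each with a CASE expression
        return cur.execute(
            """
            SELECT dept,
                   COUNT(*) AS headcount,
                   CASE WHEN AVG(salary) >= 75000 THEN 'high' ELSE 'standard' END AS band
            FROM employee
            GROUP BY dept
            ORDER BY dept
            """
        ).fetchall()
    finally:
        conn.close()
```

With these sample rows, the Analytics department averages 71,500 and is labelled `'standard'`. Cloud averages 82,000 and is labelled `'high'`.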
Python Programming: Python Basics, if, if-else, and Nested if-else, Looping (for, while, Nested Loops), Control Structures (break, continue, pass), Strings and Tuples, Accessing Strings, Basic Operations, String Slices, Working with Lists, Accessing Lists, Operations, Functions and Methods, Files, Modules, Dictionaries, Functions and Functional Programming, Declaring and Calling Functions, Declaring, Assigning, and Retrieving Values from Lists, Introducing Tuples, Accessing Tuples, Visualization using Matplotlib and Seaborn, OOP Concepts, Classes and Objects, Attributes, Inheritance, Overloading, Overriding, Data Hiding, Operations, Exceptions, Exception Handling, the except Clause, the try-finally Clause, User-Defined Exceptions, Data Wrangling, Data Cleaning
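As an illustration of the exception-handling and data-cleaning topics above, the sketch below defines a user-defined exception and uses `try`/`except`/`finally` while cleaning raw values. The names `InvalidRecordError`, `parse_age`, and `clean_ages` are hypothetical, chosen only for this example.

```python
class InvalidRecordError(ValueError):
    """Hypothetical user-defined exception for a malformed input record."""


def parse_age(raw: str) -> int:
    """Convert one raw string to a non-negative integer age."""
    try:
        age = int(raw.strip())
    except ValueError as exc:
        # Wrap the built-in error in our own type; 'from exc' keeps the original as __cause__
        raise InvalidRecordError(f"not a number: {raw!r}") from exc
    if age < 0:
        raise InvalidRecordError(f"negative age: {age}")
    return age


def clean_ages(raw_values):
    """Data-cleaning pass: keep valid ages and collect messages for rejected values."""
    cleaned, rejected = [], []
    processed = 0
    for raw in raw_values:
        try:
            cleaned.append(parse_age(raw))
        except InvalidRecordError as err:
            rejected.append(str(err))
        finally:
            # Runs whether or not an exception was raised
            processed += 1
    return cleaned, rejected, processed
```

For example, `clean_ages([" 42", "abc", "-3", "7"])` keeps `[42, 7]` and rejects two values. It still counts all four as processed, because the `finally` block runs on every iteration.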
R Programming: Reading and Getting Data into R, Exporting Data from R, Data Objects (Data Types and Data Structures), Viewing Named Objects, Structure of Data Items, Manipulating and Processing Data in R (creating, accessing, sorting, extracting, combining, merging, and reshaping data frames), Control Structures, Functions in R (numeric, character, statistical), Working with Objects, Viewing Objects within Objects, Constructing Data Objects, Packages (tidyverse, dplyr, tidyr, etc.), Queuing Theory, Interactive Reporting with R Markdown, Introduction to R Shiny.
OOP Concepts, Data Types, Operators and Language Constructs, Inner Classes and Inheritance, Interfaces and Packages, Exceptions, Collections, Threads, java.lang, java.util, Java Virtual Machine.
Introduction to Big Data: Big Data Beyond the Hype, Big Data Skills and Sources of Big Data, Big Data Adoption, Research and the Changing Nature of Data Repositories, Data Sharing and Reuse Practices and Their Implications for Repository Data Curation
Hadoop: Introduction to Big Data Programming with Hadoop, The Ecosystem and Stack, The Hadoop Distributed File System (HDFS), Components of Hadoop, Design of HDFS, Java Interfaces to HDFS, Architecture Overview, Development Environment, Hadoop Distributions and Basic Commands, Eclipse Development, The HDFS Command Line and Web Interfaces, The HDFS Java API (lab), Analyzing the Data with Hadoop, Scaling Out, Hadoop Event Stream Processing, Complex Event Processing, MapReduce Introduction, Developing a MapReduce Application, How MapReduce Works, Anatomy of a MapReduce Job Run, Failures, Job Scheduling, Shuffle and Sort, Task Execution, MapReduce Types and Formats, MapReduce Features, Real-World MapReduce.
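The map, shuffle-and-sort, and reduce phases named above can be simulated in a single Python process. This is a conceptual sketch of the classic word-count job, not Hadoop's Java MapReduce API, and the function names are hypothetical. A real cluster would run many mappers and reducers in parallel over HDFS blocks, with the framework performing the shuffle between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Group intermediate values by key and sort by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle/sort -> reduce over an iterable of text lines."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(intermediate)]
```

For instance, `word_count(["the quick fox", "The lazy dog", "the end"])` returns one `(word, count)` pair per distinct word, sorted by word, with `("the", 3)` last.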
Hadoop Environment: Setting up a Hadoop Cluster, Cluster Specification, Cluster Setup and Installation, Hadoop Configuration, Security in Hadoop, Administering Hadoop, HDFS Monitoring and Maintenance, Hadoop Benchmarks
Apache Airflow: Introduction to Data Warehousing and Data Lakes, Designing Data Warehouses for an ETL Data Pipeline, Designing Data Lakes for an ETL Data Pipeline, ETL vs. ELT
Introduction and Programming with Hive: Data Warehouse System for Hadoop, Optimizing with Combiners and Partitioners (lab), Bucketing, More Common Algorithms: Sorting, Indexing, and Searching (lab), Relational Manipulation: Map-Side and Reduce-Side Joins (lab), Evolution, Purpose and Use, Case Studies on Ingestion and Warehousing
HBase: Overview, Comparison and Architecture, Java Client API, CRUD Operations and Security
Apache Spark APIs for Large-Scale Data Processing: Overview, Linking with Spark, Initializing Spark, Resilient Distributed Datasets (RDDs), External Datasets, RDD Operations, Passing Functions to Spark, Job Optimization, Working with Key-Value Pairs, Shuffle Operations, RDD Persistence, Removing Data, Shared Variables, EDA using PySpark, Deploying to a Cluster, Spark Streaming, Spark MLlib and ML APIs, Spark DataFrames/Spark SQL, Integration of Spark and Kafka, Setting up a Kafka Producer and Consumer, Kafka Connect API, MapReduce, Connecting Databases with Spark
Business Intelligence: Requirements, Content, and Management, Information Visualization, Data Analytics Life Cycle, Analytic Processes and Tools, Analysis vs. Reporting, MS Excel (Functions, Formulas, Charts, Pivots, and Lookups), Analysis ToolPak (Descriptive Summaries, Correlation, Regression), Introduction to Power BI, Modern Data Analytic Tools, Visualization Techniques.
Supervised and Unsupervised Learning, Uses of Machine Learning, Clustering, K-Means, Hierarchical Clustering, Decision Trees, Classification Problems, Bayesian Analysis and the Naïve Bayes Classifier, Random Forest, Gradient Boosting Machines, Association Rule Learning, PCA, Apriori, Support Vector Machines, Linear and Nonlinear Classification, ARIMA, XGBoost, CatBoost, Neural Networks and Their Applications, TensorFlow 2.x Framework, Deep Learning Algorithms, KNN, NLP, BERT in NLP, NLTK
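As an example of the clustering topics above, the following is a minimal K-means sketch (Lloyd's algorithm) in plain Python, with no scikit-learn dependency. The function name, the random-sample initialization, and the toy points are assumptions for illustration. Production libraries typically use k-means++ initialization and several restarts.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D (or n-D) tuples into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's old centroid
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    return centroids, clusters
```

On two well-separated groups, such as three points near (1, 1) and three near (8.5, 8.5), the algorithm settles on one centroid per group. Each cluster then holds three points.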
| Topic | Reference Book | No. of Questions |
|---|---|---|
| English | Any high school grammar book (e.g., Wren & Martin) | 35 |
| Quantitative Aptitude & Reasoning | Quantitative Aptitude Fully Solved (R. S. Aggarwal); Quantitative Aptitude (M. Tyra); Barron's New GRE 2016 | |
| Computer Fundamentals | Foundations of Computing (Pradeep Sinha & Priti Sinha) | 40 |
| Operating Systems | Operating System Principles (Silberschatz, Galvin, Gagne) | |
| C Programming | The C Programming Language (Kernighan & Ritchie); Let Us C (Yashavant Kanetkar) | |
| Data Structures | Data Structures Through C in Depth (S. K. Srivastava) | |
| OOP Concepts | Test Your C++ Skills (Yashavant Kanetkar) | |