PG Diploma in Big Data Analytics (PG-DBDA)

NSQF level: 8

The theoretical and practical mix of the Post Graduate Diploma in Big Data Analytics (PG-DBDA) programme has the following focus:

  • To explore the fundamental concepts of big data analytics.
  • To develop in-depth knowledge and understanding of the big data analytics domain.
  • To learn to analyze big data using intelligent techniques.
  • To understand the various search methods and visualization techniques.
  • To learn to use various techniques for mining data streams.
  • To understand applications that use MapReduce concepts.
  • To analyze and solve problems conceptually and practically from diverse industries, such as government, manufacturing, retail, education, banking/finance, healthcare and pharmaceuticals.
  • To undertake consulting projects with a significant data analysis component for a better understanding of the theoretical concepts from statistics, economics and related disciplines.
  • To undertake industrial research projects for the development of future solutions in the domain of data analytics that make an impact on technological advancement.
  • To use advanced analytical tools, decision-making tools and operations research techniques to analyze complex problems and to prepare to develop new techniques of this kind in the future.
  • To learn cloud computing: accessing the resources and services needed to perform functions with dynamically changing needs.
  • To understand cloud privacy and security concepts in order to create a secure cloud environment, and to explore various cloud platforms for implementing real-time cloud applications.
  • To learn clustering, parallel programming and HPC solutions and their applications.

Eligibility: candidates must hold any one of the following degrees:

  • Graduate in Engineering or equivalent (e.g. BE / BTech / 4-year BSc Engg / AMIE / DoEACC B Level) in IT / Computer Science / Electronics / Telecommunications / Electrical / Instrumentation, OR
  • Post Graduate Degree in Engineering Sciences with corresponding basic degree (e.g. MSc in Computer Science, IT, Electronics), OR
  • Graduate in any discipline of Engineering or equivalent, OR
  • Post Graduate Degree in Mathematics / Statistics / Physics / MBA Systems, OR
  • MCA

Note: Candidates must have secured a minimum of 55% marks in their qualifying examination.



The total fee for the course is Rs. 1,00,000/- plus 18% GST.

The course fee is to be paid in two installments as per the schedule.
  • First installment: Rs. 10,000/- plus 18% GST.
  • Second installment: Rs. 90,000/- plus 18% GST.
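
As a rough check of the fee arithmetic, the sketch below assumes that 18% GST is charged on the base amount of each installment. The names `GST_PERCENT`, `INSTALLMENTS` and `gst_on` are illustrative only and do not come from the brochure.

```python
# Illustrative sketch only: assumes 18% GST is applied to each
# installment's base amount (an assumption, not a stated rule).
GST_PERCENT = 18  # GST rate stated in the brochure

# Base installment amounts in rupees, as listed above.
INSTALLMENTS = {"first": 10_000, "second": 90_000}


def gst_on(amount: int, percent: int = GST_PERCENT) -> int:
    """Return the GST due on a base amount, using integer rupee arithmetic."""
    return amount * percent // 100


# Amount payable per installment, including GST.
payable = {name: base + gst_on(base) for name, base in INSTALLMENTS.items()}
total_payable = sum(payable.values())

for name, amount in payable.items():
    print(f"{name} installment incl. GST: Rs. {amount}")
print(f"total incl. GST: Rs. {total_payable}")
```

Under this assumption the two installments come to Rs. 11,800 and Rs. 1,06,200, for a total of Rs. 1,18,000. The schedule in the Admission Booklet remains the authoritative source for the amounts actually due.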
  

Probability & Statistics: Introduction to Statistics - Descriptive Statistics, Summary Statistics, Basic probability theory, Statistical Concepts (univariate and bivariate sampling, distributions, re-sampling, statistical inference, prediction error), Probability Distributions (continuous and discrete: Normal, Bernoulli, Binomial, Negative Binomial, Geometric and Poisson distributions), Bayes’ Theorem, Central Limit Theorem, Data Exploration & preparation, Concepts of Correlation, Regression, Covariance, Outliers, etc.

R Programming: Introduction & Installation of R, R Basics, Finding Help, Code Editors for R, Command Packages, Manipulating and Processing Data in R, Reading and Getting Data into R, Exporting Data from R, Data Objects - Data Types & Data Structures, Viewing Named Objects, Structure of Data Items, Manipulating and Processing Data in R (Creating, Accessing, Sorting data frames, Extracting, Combining, Merging, Reshaping data frames), Control Structures, Functions in R (numeric, character, statistical), Working with Objects, Viewing Objects within Objects, Constructing Data Objects, Building R Packages, Running and Manipulating Packages, Non-parametric Tests - ANOVA, Chi-Square, t-Test, U-Test, Introduction to Graphical Analysis, Using Plots (Box Plots, Scatter Plots, Pie Charts, Bar Charts, Line Charts), Plotting Variables, Designing Special Plots, Simple Linear Regression, Multiple Regression

  

Introduction to Python, Basic Syntax, Data Types, Variables, Operators, Input/output, Flow of Control (Modules, Branching), If, If- else, Nested if-else, Looping, For, While, Nested loops, Control Structure, Break, Continue, Pass, Strings and Tuples, Accessing Strings, Basic Operations, String slices, Working with Lists, Introduction, Accessing list, Operations, Function and Methods, Files, Modules, Dictionaries, Functions and Functional Programming, Declaring and calling Functions, Declare, assign and retrieve values from Lists, Introducing Tuples, Accessing tuples

Advanced Python: Object Oriented, OOPs concept, Class and object, Attributes, Inheritance, Overloading, Overriding, Data hiding, Operations Exception, Exception Handling, Except clause, Try-finally clause, User Defined Exceptions

Python Libraries

Introduction to Machine Learning packages like NumPy, SciPy, pandas, etc.

  

Linux History and Operation, Installing and Configuring Linux, Shells, Commands, and Navigation, Common Text Editors, Administering Linux, Introduction to Users and Groups, Linux shell scripting

Java with Scala

80 Hours  
  

Data Types, Operators and Language Constructs, Inner Classes and Inheritance, Interface and Package, Exceptions, Threads

Introduction, Unified Types, Classes, Traits, Mixin Class Composition, Anonymous Function Syntax, Higher-order Functions, Nested Functions, Currying, Case Classes, Pattern Matching, Singleton Objects, XML Processing, Regular Expression Patterns, Extractor Objects, Sequence Comprehensions, Generic Classes

  

Introduction to Cloud Computing: Definition, Characteristics, Components, Cloud provider, SaaS, PaaS, IaaS and other Organizational scenarios of clouds, Administering & Monitoring cloud services, benefits and limitations, Deploying applications over cloud, Comparison among SaaS, PaaS and IaaS, Cloud computing platforms: Infrastructure as a Service: Amazon EC2, Platform as a Service: Google App Engine, Microsoft Azure, Utility Computing, Elastic Computing, SLA, clusters, cloud analytics, challenges of cloud environment, HPC in the cloud

Parallel Processing Concepts: Physical Organization and building blocks of High Performance Computing Systems, Processors and Multi-Core Architectures, Vector processing, Super-scalar, In-order execution, Instruction-Level Parallelism etc., FMA, 32 and 64 bit types, ISA, Accelerators such as GPGPUs and Xeon Phi. Threads and Processes, Multi-processing OS, Parallel I/O, General concepts

Parallel Programming Models and Parallel Algorithms Design: Application domains of HPC, Decomposition Techniques: Data parallelism, Functional parallelism, Divide and Conquer etc., Characteristics of Tasks and Interactions, Mapping Techniques for Load Balancing, Methods for Containing Interaction Overheads, Granularity of parallelism, Programming OpenMP

  

Database Concepts (File System and DBMS), Database Storage Structures (Tablespace, Control files, Data files), Structured and Unstructured data, SQL Commands (DDL, DML & DCL), Data Warehousing concept, NoSQL, Data Models - XML, Working with MongoDB

Tools - OLTP and OLAP, data preparation and cleaning techniques

  

Introduction to Big Data- Big data definition, enterprise / structured data, social / unstructured data, unstructured data needs for analytics, What is Big Data, Big Deal about Big Data, Big Data Sources, Industries using Big Data, Big Data challenges.

Hadoop: Introduction to Big Data programming - Hadoop, History of Hadoop, The ecosystem and stack, The Hadoop Distributed File System (HDFS), Components of Hadoop, Design of HDFS, Java interfaces to HDFS, Architecture overview, Development Environment, Hadoop distribution and basic commands, Eclipse development, The HDFS command line and web interfaces, The HDFS Java API (lab), Analyzing the Data with Hadoop, Scaling Out, Hadoop event stream processing, complex event processing, MapReduce Introduction, Developing a MapReduce Application, How MapReduce Works, Anatomy of a MapReduce Job Run, Failures, Job Scheduling, Shuffle and Sort, Task execution, MapReduce Types and Formats, MapReduce Features, Real-World MapReduce

Hadoop ETL: Hadoop ETL Development, ETL Process in Hadoop, Discussion of ETL functions, Data Extractions, Need of ETL tools, Advantages of ETL tools.

Hadoop Reporting Tools: Jaspersoft (reporting and analytics server), Pentaho (data integration and business analytics), Splunk (platform for IT analytics), Talend (big data integration, data management and application integration)

Introduction to Pig and Hive - Programming Pig: Engine for executing data flows in parallel on Hadoop, Programming with Hive: Data warehouse system for Hadoop, Optimizing with Combiners and Partitioners (lab), More common algorithms: sorting, indexing and searching (lab), Relational manipulation: map-side and reduce-side joins (lab), evolution, purpose and use, HDFS – Overview and concepts, data flow (read and write), interface to HDFS (HTTP, CLI and Java API), high availability and Name Node federation, MapReduce developing and deploying programs, optimization techniques, MapReduce Anatomy, Data flow framework programming MapReduce best practices and debugging, Introduction to Hadoop ecosystem, integrating R with Hadoop

Hadoop Environment: Setting up a Hadoop Cluster, Cluster specification, Cluster Setup and Installation, Hadoop Configuration, Security in Hadoop, Administering Hadoop, HDFS – Monitoring & Maintenance, Hadoop benchmarks, Hadoop in the cloud.

Introduction to Apache Spark and Use Cases

Apache Spark APIs for large-scale data processing: Overview, Linking with Spark, Initializing Spark, Resilient Distributed Datasets (RDDs), External Datasets, RDD Operations, Passing Functions to Spark, Working with Key-Value Pairs, Shuffle operations, RDD Persistence, Removing Data, Shared Variables, Deploying to a Cluster

Apache Phoenix: Apache Phoenix Overview, Need of Phoenix, Features, Installation and Configurations, Views and Multi Tenancy, Secondary Indexes, Joins, Query Optimizations, Roadmap of Phoenix.

  

Information Visualization, Data analytics Life Cycle, Analytic Processes and Tools, Analysis vs. Reporting, Modern Data Analytic Tools, Visualization Techniques, Visual Encodings, Visualization algorithms, Data collection and binding, Cognitive issues, Interactive visualization, Visualizing big data – structured vs unstructured, Visual Analytics, Geomapping, Dashboard Design 

  

Introduction to Business Analytics using some case studies, Making Right Business Decisions based on data, Exploratory Data Analysis - Visualization and Exploring Data, Descriptive Statistical Measures, Probability Distribution and Data, Sampling and Estimation, Statistical Inference, Predictive modeling and analysis, Regression Analysis, Forecasting Techniques, Simulation and Risk Analysis, Optimization (Linear, Non-linear, Integer), Decision Analysis, Strategy and Analytics

Overview of Factor Analysis, Directional Data Analytics, Functional Data Analysis

  

Supervised and Unsupervised Learning, Uses of Machine Learning, Clustering, K-means, Hierarchical Clustering, Decision Trees, Oblique trees, Classification problems, Bayesian analysis and Naïve Bayes classifier, Random forest, Gradient Boosting Machines, Association rule learning, Apriori and FP-growth algorithms, Support Vector Machines, Linear and Non-linear classification, ARIMA, ML in real time, Neural Networks and their applications

     
     

Project

120 Hours  
Software: A Process, Various Phases in Software Development, Software life cycle: agile model (self-study of other models), Introduction to Coding Standards, Software Quality Assurance

After completing this course, students shall be expert in the following:

  • Big Data tools and Big Data querying tools, such as Pig, Hive, Impala and Spark
  • Integration of data from multiple data sources
  • NoSQL databases, such as HBase and MongoDB
  • Knowledge of various ETL techniques and implementing ETL processes; monitoring performance and advising on any necessary infrastructure changes
  • Developing reports and dashboards using D3 and Tableau
  • Big Data Development
  • Business Analyst

C-DACs - Advanced Computing Training School
Address: No.1, Old Madras Road Above Bank of Mysore, Near NGEF Bengaluru, Karnataka 560038
Telephone: 66116400 /01/02/03
Contact Person: Binu George & M Savithri
Fax: 080-25247724
e-Mail: actsb[at]cdac[dot]in
Courses: PG-DAC, PG-DESD, PG-DMC, PG-DSSD, PG-DBDA, PG-DIoT, PG-DHPCSA

C-DACs - Advanced Computing Training School
Address: Plot E 2/1, Block - GP, Sector - V, Saltlake Electronics Complex, Bidhannagar, Kolkata, West Bengal 700091
Telephone: 033 2357 5989 / 9846
Contact Person: Dr. Amit Chaudhuri
Fax: 033 23575141
e-Mail: amit[dot]chaudhuri[at]cdac[dot]in
Courses: PG-DAC, PG-DBDA

C-DACs - Advanced Computing Training School
Address: B-30, Sector 62, Institutional Area, Noida, Uttar Pradesh 201307
Telephone: 0120-3063371-73
Contact Person: Mr. V.K. Sharma
Fax: 0120-3063374
e-Mail: cdacacts-noida[at]cdac[dot]in
Courses: PG-DAC, PG-DVLSI, PG-DGi, PG-DESD, PG-DMC, PG-DITISS, PG-DBDA

C-DAC Advanced Computing Training School
Address: C-DAC ACTS, 5th Floor, Innovation Park, Sr. No. 34/B/1, Panchvati, Pashan, Pune, Maharashtra 411008
Telephone: 18008430222
Contact Person: Mr. Parimal Wagh
Fax: NA
e-Mail: acts[at]cdac[dot]in
Courses: PG-DAC, PG-DVLSI, PG-DESD, PG-DMC, PG-DITISS, PG-DBDA, PG-DIoT, PG-DHPCSA

Institute for Advanced Computing and Software Development
Address: Dr. D.Y. Patil Educational Complex, Sector 29, Near Akurdi Railway Station, Pradhikaran, Nigdi Pune, Maharashtra 411044
Telephone: 020 – 27659509, 27652794
Contact Person: Wg. Cdr. (Retd.) P.V.C. Patil
Fax: 020 – 27650229
e-Mail: iacsd[at]vsnl[dot]com
Courses: PG-DAC, PreDAC, DASDM, PG-DBDA

Knowledge-Divine Information Technology Pvt. Ltd.
Address: 1278, Second Floor, Suyog Plaza, Above MacDonalds Near Sai Petrol Pump, J.M. Road, Deccan. Pune, Maharashtra 411004
Telephone: 020 20250898
Contact Person: Mr. Nachiketas Bhatkar, CEO
Fax: 020 41051819
e-Mail: contact[at]know-it[dot]co[dot]in
Courses: PG-DAC, PreDAC, DASDM, PG-DBDA

Sunbeam Institute of Information Technology
Address: Sunbeam, Plot No.R/2, Behind Hotel Fulora Market Yard Road, Gultekadi Pune, Maharashtra 411037
Telephone: 020-24260308, 24272383
Contact Person: Mr. Nitin Kudale, C.E.O.
Fax: 020 – 24260308
e-Mail: siit[at]sunbeaminfo[dot]com
Courses: PG-DAC, PG-DESD, PG-DMC, PG-DBDA

Q: What is the eligibility for the PG Diploma in Big Data Analytics?
A: Candidates for PG-DBDA must hold any one of the following degrees:

  • Graduate in Engineering or equivalent (e.g. BE / BTech / 4-year BSc Engg / AMIE / DoEACC B Level) in IT / Computer Science / Electronics / Telecommunications / Electrical / Instrumentation, OR
  • Post Graduate Degree in Engineering Sciences with corresponding basic degree (e.g. MSc in Computer Science, IT, Electronics), OR
  • Graduate in any discipline of Engineering or equivalent, OR
  • Post Graduate Degree in Mathematics / Statistics / Physics / MBA Systems, OR
  • MCA

Note: Candidates must have secured a minimum of 55% marks in their qualifying examination.

Q: What is the selection criterion?  

A: The selection process consists of a C-DAC Common Admission Test (C-CAT).

Q: What is the course fee?

A: The fee for the PG-DBDA course is Rs. 1,00,000/- (Rupees One Lakh only) plus 18% GST.

Q: When does the course commence?

A: The course commences twice a year, in February and August. The admission process starts in November and May for the respective batches.

Q: What is the duration of the course?

A: It is a 24-week full-time course.

Q: What infrastructure facilities are available?
A: Fully equipped classrooms with adequate capacity to accommodate students, and state-of-the-art labs in which to explore your computing skills.

Q: Are hostel and canteen facilities available?
A: Accommodation for outstation candidates is facilitated by some of the centres. Please refer to the Admission Booklet.

Q: Is bank loan assistance available at the other centres?
A: Educational loans are available for selected candidates and are provided by nationalized banks only.

Q: Are the course contents revised every six months?
A: The course contents are revised according to real-world needs and when found relevant to emerging trends.

Q: Do you have a centralized placement cell?
A: Yes. We have a Centralized Placement Programme in which the respective centres actively participate in organizing campus interviews for all students.

Q: What is the value of the course in the international market?
A: The course has been a trend-setter due to its unique curriculum and the opportunities it generates, which gives students an edge internationally.