Course Outline


Pre-requisites

  • The participants must be comfortable with programming constructs in Python
  • Basic algebraic concepts are a pre-requisite; knowledge of statistics is preferable

Duration

  • 5 Days

Lab Setup

  • Hardware Configuration
    • All participants must have a laptop with Internet connectivity
    • At least 4 GB of RAM and 20 GB of free hard-disk space
  • Software Configuration
    • Install Python and R before the start of the training

Course Outline for Data Science and Machine Learning

Some Pre-requisites
  • Linear Algebra
  • Probability
  • Probability Distributions
Overview of Data Science and Machine Learning
  • What is Data Science and Data Analysis?
    • Cleaning up the data before analysis
    • Data Visualization
  • What is Machine Learning?
    • Supervised vs. unsupervised learning
    • Artificial Intelligence
    • Deep Learning
Some Basic Machine Learning Concepts
  • Features, Labels and Classifiers
  • Supervised Learning Algorithms
    • Naive Bayes
    • Decision Trees
    • Support Vector Machines
    • Kernel Trick
  • Unsupervised Learning
    • Clustering algorithms
    • K-means
    • Principal Component Analysis
  • Neural Networks
Applications of Machine Learning
  • Spam Detection
  • Recommendation Systems
  • Handwriting Recognition
  • Face Recognition
Natural Language Processing
  • Working with unstructured data
  • Working with HTML data, scraping
  • Token
  • Introduction to BNF (Backus-Naur Form)
  • Regular Expression concepts
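
As a taste of how tokens and regular expressions fit together, here is a minimal sketch of a regex-based tokenizer using only Python's standard `re` module. The pattern and the `tokenize` name are illustrative assumptions, not the tokenizer used by nltk or textblob.

```python
import re

# Hypothetical pattern: a word (optionally with an internal apostrophe),
# a number (optionally with a decimal part), or one punctuation character.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]")


def tokenize(text):
    """Split raw text into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Don't pay $3.50 for spam!")
print(tokens)
```

Real-world tokenizers handle many more cases (URLs, hyphenated words, Unicode), so this is only a starting point for discussion.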
Technologies used in Machine Learning
  • Octave
  • MATLAB
  • Python
  • R
  • Julia
Python - Machine Learning Libraries
  • numpy
  • sklearn
  • scipy
  • textblob
  • nltk
  • matplotlib
  • pandas
  • Jupyter
  • TensorFlow
  • Keras
Overview of sklearn
  • Naive Bayes
  • Decision Trees
  • Support Vector Machines
  • Kernel Trick
  • Principal Component Analysis
  • Clustering algorithms
  • K-means
  • Neural Networks
  • Persisting the models using pickle
Overview of Pandas
  • Data Structures - Series, Data Frames
  • Importing, Analysing and Exporting data with Pandas
  • Descriptive Statistics
  • Aggregation APIs
  • Transform APIs
  • Iteration on data structures
  • Working with text data
Introduction to nltk and textblob
  • Tokenizer
  • POS tagger
  • Text classification
  • Vectorizer
  • tf-idf
  • Sentiment Analysis with textblob - a case study
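
To make the tf-idf topic concrete, the sketch below computes weights by hand with the standard library. It assumes one common textbook variant (tf = count / document length, idf = log(N / df)) and whitespace tokenization; the `tf_idf` function name and the sample documents are hypothetical. Library vectorizers typically add smoothing and normalisation, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["free prize inside", "meeting notes inside", "free free offer"]
scores = tf_idf(docs)
for doc, weights in zip(docs, scores):
    print(doc, {term: round(w, 3) for term, w in weights.items()})
```

Terms that occur in only one document ("prize", "offer") receive a higher idf than terms shared across documents ("free", "inside"), and a term present in every document would get a weight of zero under this variant.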
Introduction to R
  • Installing R and RStudio
  • Atomic Data Types
  • Control Structures in R
  • Functions in R
  • Vectors and Lists
  • Matrices
  • Data Frames
  • Operations on Vectors, Lists, Matrices and Data Frames
  • Loading, analysing and storing your data
  • Basic statistical functions in R
  • Data Transforms in R using dplyr
  • Machine Learning in R
  • Plotting your analysis
Neural Networks
  • Perceptrons
  • Sigmoid Neurons
  • Gradient Descent
  • Backpropagation
  • Introduction to Deep Learning
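
The sigmoid neuron and gradient descent topics can be illustrated with a single neuron trained in plain Python. This is a minimal sketch under stated assumptions: batch gradient descent on cross-entropy loss, zero initial weights, and a toy logical-AND dataset. The function names, learning rate and epoch count are illustrative choices. With only one neuron there are no hidden layers, so backpropagation is not needed here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Output of one sigmoid neuron for a single input vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, learning_rate=0.5, epochs=5000):
    """Fit one sigmoid neuron with batch gradient descent on cross-entropy loss."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for inputs, target in samples:
            # For sigmoid + cross-entropy, dLoss/dz simplifies to (output - target).
            error = predict(weights, bias, inputs) - target
            for i, x in enumerate(inputs):
                grad_w[i] += error * x
            grad_b += error
        # Step against the average gradient over all samples.
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias


# Toy data (illustrative only): the logical AND function.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(AND_DATA)
for inputs, target in AND_DATA:
    print(inputs, target, round(predict(weights, bias, inputs), 3))
```

After training, the neuron's output should fall below 0.5 for the three negative cases and above 0.5 for the input (1, 1).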
Jupyter
  • Installing Jupyter Notebook
  • Running a Notebook server
  • Notebook basics
  • Using Jupyter with R (R Kernel)
  • Securing your Jupyter notebook
Data Visualization
  • Need for visualization
  • Distribution of one variable - Histogram, Density plot
  • Distribution of multiple variables - Heat map, Surface plots
  • Distribution summary using Box plot, Violin plot
  • Visualization libraries in Python
    • matplotlib
    • Seaborn
    • ggplot and ggplot2
    • Altair
  • Visualization with R

The classroom training will be provided in Bangalore (Bengaluru), Chennai, Hyderabad or Mumbai and will be conducted at the client's premises. The client must provide all the necessary hardware and software infrastructure.