Get started with this applied course focusing on the building blocks of analytics and statistics. The multidimensional course provides 12 hands-on data cases across different domains and techniques. Beginner Level, Approx. 61 Hours to complete
Linear Regression is one of the most widely used techniques for building predictions and forecasts. The applied course focuses on the different aspects of building and evaluating a robust linear regression model on real-world datasets. The course provides 6 hands-on data cases with a guided approach for building appropriate solutions. Advance Level, Approx. 23 Hours to complete
Logistic Regression is one of the most widely applied statistical techniques across businesses and verticals. Successful completion of the applied course will equip the learner to build and evaluate a robust logistic model on a real-world dataset to solve a business objective. Advance Level Approx. 20 Hours to complete (5 Hours per week)
The applied course focuses on the different concepts of segmentation via some of the most commonly used supervised and unsupervised algorithms. The data cases discussed within the course are based on real-world business problems with a guided approach for building appropriate solutions. Advance Level Approx. 20 Hours to complete (5 Hours per week)
This applied course focuses on the fundamentals of text data mining, such as cleansing, treatment, and visualization of text data, which are must-know skills for text data analytics. Real-world datacases based on unstructured text data familiarize you with the concepts through a guided approach.
This applied course, the first in the series, focuses on the building blocks of any time series analysis, exposing the learner to the application of different visual and statistical techniques on real-world datasets. Upon successful completion of the course, the learner will have a thorough understanding of the different concepts and experience in applying them to data. Beginner Level Approx. 15 Hours to complete (5 Hours per week)
Building a deeper understanding of the data is a crucial step in any Data Science project. Without understanding the data well, you cannot draw actionable and impactful insights from it or build Predictive Solutions. This course discusses the different data mining techniques through which you can build a better understanding of your data and use it effectively in problem solving.
Today, as many people use social media and electronic channels to convey their opinions, it has become crucial for businesses to assess the sentiment behind those opinions and act accordingly. As a result, Sentiment Analysis has gained a lot of popularity in recent times due to its wide-ranging application across domains. In this course, along with other basic text analytics concepts, you will be introduced to Sentiment Analysis and how one can draw interpretations by analyzing the sentiments identified in a text corpus.
NLP and Text Analytics are two of the new-age domains arising from the different forms of data generated in the digital world. This applied course gives the learner a thorough understanding of Text Clustering and Classification techniques, with hands-on application of those techniques on real-world datasets. The structure of the course teaches the learner not only to identify underlying themes in a text dataset through clustering methods, but also to build predictive text algorithms through classification techniques. Advance Level Approx. 25 Hours to complete (5 Hours per week)
The first of the 3 applied courses on Application of Analytics in Banking and Financial Services, this course covers the vital importance of Data Analytics in the Customer Acquisitions function. Through its 6 Datacases, it will introduce you to the application of Analytics to some of the key problems in Acquisitions.
The second of the 3 applied courses on Application of Analytics in Banking and Financial Services, this course covers the vital importance of Data Analytics in the Customer Engagement Management function. Through its 5 Datacases, it will introduce you to the application of Analytics to some of the key problems in the Customer Engagement function.
The third of the 3 applied courses on Application of Analytics in Banking and Financial Services, this course covers the vital importance of Data Analytics in the Customer Retention Management function. Through its 5 Datacases, it will introduce you to the application of Analytics to some of the key problems in the Customer Retention function.
Hyperparameter tuning refers to the task of choosing the optimal combination of hyperparameters, that is, parameters that must be set before the training process begins rather than learned from the data. When the performance of the algorithm is not satisfactory, hyperparameter tuning is used to adjust these settings for a performance boost. In this course, we will focus on hyperparameter tuning for Linear and Logistic Regression through a hands-on approach and will take a deep dive into the various hyperparameters associated with the algorithms.
Get started with this applied course focusing on the introduction to data preprocessing. This quick course provides opportunities for learning through reading material, a business case, and a quiz to test your understanding. Beginner Level Approx. 1 Hour to complete
This course will help you understand the need for explainable AI (XAI) and introduce you to LIME, which is one of the techniques for XAI. Learn how LIME can be applied to a real-world datacase and how the results can be presented in a human-interpretable format. Take the quiz to validate your understanding and then solve a datacase based on the learnings.
This course helps you understand the basic types of summary and distribution charts, which are mainly used to represent data visually to reveal trends and patterns. Learn how and when to use different types of charts to present actionable insights.
Analysis of Variance (ANOVA) is a statistical method for testing whether the means of two or more groups differ. This applied course focuses on the concepts of ANOVA, its types, and its applications using business cases. Basic Level
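As a rough illustration of the test described above, the sketch below computes the one-way ANOVA F statistic by hand as the ratio of between-group to within-group variance. The function name and the three small groups are made up for illustration and are not taken from the course material.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples (hypothetical helper)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Made-up data: three groups of three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large F value suggests that at least one group mean differs from the others; turning it into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom.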
This course will help you understand the need for explainable AI (XAI) and introduce you to SHAP, which is one of the techniques for XAI. Learn how SHAP can be applied to a real-world datacase and how the results can be presented in a human-interpretable format. Take the quiz to validate your understanding and then solve a datacase based on the learnings.
Gradient Boosting Classification is a machine learning technique that has attracted attention for its predictive accuracy. From Machine Learning competitions to data science businesses, gradient boosting is providing best-in-class results. It uses an ensemble of weak prediction models to build a classification model. The model is built in a sequential process, where each new step tries to minimize the errors of the previous step. This course will give you an in-depth understanding of the algorithm along with its practical use case.
Extreme Gradient Boosting (XGBoost) is an optimized machine learning algorithm designed for better speed and performance. It is an extension of gradient boosted decision trees. Since its inception in 2014, the algorithm has achieved strong performance on structured data and has often outperformed other algorithms because of its flexibility. This course provides an end-to-end walkthrough of the algorithm.
CatBoost (short for Categorical Boosting) is a recently developed machine learning algorithm that helps data scientists and machine learning engineers in automating their column transformations and provides best-in-class accuracy. It can easily integrate with different deep learning frameworks and can work on diverse data types. It solves a variety of business problems that earlier required extensive data preparation.
This course will give you a brief introduction to the concept and applications of one of the most important and commonly used model validation techniques: Gain and Lift Charts.
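To make the idea concrete, here is a minimal sketch of how the numbers behind a gain and lift chart can be computed: records are ranked by predicted score, split into equal-sized bins, and the share of positives captured so far is compared with the share of records contacted. The function name, the bin count, and the toy labels and scores are assumptions for illustration only.

```python
def cumulative_gain_and_lift(y_true, y_score, n_bins=10):
    """Return (gains, lifts) per bin after ranking records by descending score."""
    ranked = [y for _, y in sorted(zip(y_score, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    total_positives = sum(ranked)
    gains, lifts = [], []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)               # records in the top b bins
        gain = sum(ranked[:cutoff]) / total_positives  # share of positives captured
        gains.append(gain)
        lifts.append(gain / (cutoff / n))            # gain relative to random targeting
    return gains, lifts

# Made-up labels (1 = responder) and model scores
y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
gains, lifts = cumulative_gain_and_lift(y_true, y_score, n_bins=5)
print(gains)
print(lifts)
```

Plotting `gains` against the cumulative share of records gives the gain chart, and plotting `lifts` gives the lift chart; a lift of 1.0 corresponds to random selection.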
At present, when it comes to domains such as machine learning, data science, web development and others, Python seems to be the most favored programming language. In these domains, efficient storage and accessibility of data play an important role. This is where Python Data Structures come in handy. Data Structures help us organize, manage, and process data efficiently.
NumPy (Numerical Python) is a library for scientific computation in the Python programming language. It provides easy-to-use multi-dimensional arrays designed for fast numerical computation. Other important libraries such as Pandas, matplotlib, and scikit-learn are built on top of NumPy, making it an essential tool in the Data Science domain. This course is a deep dive into the structure and workings of NumPy arrays and helps you build a solid foundation in the library.
Pandas is a Python library that is currently one of the most important tools for data analysis and manipulation. Its key data structure is the DataFrame, which lets you represent your data in a tabular format. It provides several essential functionalities that make working with data simpler. Whether you are an entry-level or an experienced professional in Data Science, Pandas is likely to be part of your solution to data analysis problems.
Get started with this applied course focusing on the introduction to data preprocessing. This quick course provides opportunities for learning through reading material, a business case using Python, and a quiz to test your understanding.
Linear Regression is one of the most widely used techniques for building predictions and forecasts. The applied course focuses on writing Python code for building and evaluating a robust simple linear regression model on a real-world dataset. The course provides a hands-on case with a guided approach for building appropriate solutions.
Understanding and cleaning the dataset before starting the analysis is an important step in solving any problem. Through this case study, learn how to apply data cleaning and data treatment techniques to a real business problem.
Selecting the right features and building a model is an iterative process and part of any model building activity. As part of this case study, we will learn how to scale the features, perform residual analysis, avoid multicollinearity, and pick only the significant features to build a linear regression model.
For any business, understanding the attrition behavior of its customers is essential to devising new strategies for improving customer retention. Through this case study, learn how to apply Artificial Neural Networks to classification problems.
The first step in the machine learning life cycle is to understand and process the data so that it can be passed on to a machine learning algorithm. In this case study, we will learn how to understand the data, visualize it, clean it, and create a few derived variables.
This case study deals with the comparison of marks of students in a particular school using the statistical concept of hypothesis testing.
For any business, understanding the expenditure habits of its customers is essential to devising new strategies for improving spend growth. Through this case study, learn how to apply Artificial Neural Networks to regression problems.
Proper hyperparameter tuning is essential for the successful implementation of Support Vector Machines for both classification and regression tasks. SVM has a few key hyperparameters that must be fixed before the training process and that have a large effect on the performance of the resulting predictive models. That being said, hyperparameter tuning for SVM is not a trivial task. In this course, we will study the concepts of hyperparameter tuning for SVM in detail along with extensive hands-on practice.
Tree-Based Models are considered as key algorithms in supervised machine learning. They are simple to train and easy to visualize. They provide high flexibility in training as there are many hyperparameters to tune. In this course, our focus will be to enhance the performance of tree-based models by the use of hyperparameter tuning. We will deep dive into the understanding of the hyperparameters and methodologies for hyperparameter tuning.
One of the most essential phases of the classification machine learning algorithms is to understand the ways to evaluate the performance of the model built. This is one of the tasks that are often difficult to interpret and sometimes not paid attention to. In this course, we will see each of the performance measures that will help you to quantify the performance of the model and decide which algorithm will be most suited according to the problem at hand.
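As an illustrative sketch of the kind of measures such a course covers, the snippet below derives accuracy, precision, recall, and F1 from the confusion-matrix counts of a binary classifier. The choice of these four measures, the function name, and the toy labels are assumptions for illustration; the course itself may cover a different set.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary classification metrics from true and predicted labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels and predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which measure matters most depends on the problem: recall is usually prioritized when missing a positive case is costly, precision when false alarms are costly.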
Machine Learning algorithms have proven extremely beneficial for regression tasks. However, two questions are often asked: which algorithm is suited to a particular problem, and is the model that was built performing well? In this course, we will learn different performance measures that will help you answer these questions. We will look at each performance measure used to evaluate a regression model in detail, which will help us select the top-performing model for a particular problem.
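For a sense of what such measures look like in practice, here is a minimal sketch computing three widely used ones: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). The selection of measures, the function name, and the toy values are assumptions for illustration rather than the course's own list.

```python
from math import sqrt
from statistics import mean

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = mean(abs(e) for e in errors)
    rmse = sqrt(mean(e ** 2 for e in errors))
    y_bar = mean(y_true)
    ss_res = sum(e ** 2 for e in errors)              # unexplained variation
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Made-up targets and predictions
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAE and RMSE are in the units of the target (RMSE penalizes large errors more heavily), while R² is unitless and compares the model against always predicting the mean.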
Neural Networks are a family of algorithms composed of artificial neurons that are capable of recognizing underlying relationships in a set of data in a way loosely inspired by the human brain. In the past decade, neural networks have laid the foundation for a new domain called Deep Learning, which has been used to develop some of the most intelligent machines, capable of performing complex tasks such as object recognition, speech translation and recognition, robotics, and many more. This course is designed to give a thorough understanding of the most fundamental aspects of neural networks.
The first of the 3 applied courses on Application of Optimization in Analytics, this course covers the vital importance of General Optimization Problems in business functions. Through its 3 Datacases, it will introduce you to the application of three different optimization algorithms to some of the key problems in finding optimal parameters for business settings.
The second of the 3 applied courses on Application of Analytics by Optimization, this course covers the vital importance of the Shortest Path Problem and the Minimum Spanning Tree in Network Flow Optimization Problems. Through its 2 Datacases, it will introduce you to the application of Analytics to some of the key problems in Network Flow Optimization.
The third of the 3 applied courses on Application of Analytics by Optimization, this course covers the vital importance of several specialized optimization problems such as the transportation problem, the knapsack problem, and the travelling salesman problem. Through its 5 datacases, it will introduce you to the application of analytics to some of the key optimization problems.
Decision Trees are classic non-parametric supervised learning algorithms used for both classification and regression tasks. A decision tree splits the dataset multiple times based on a series of decisions, efficiently representing the data and making predictions. Decision trees are among the most widely used algorithms for data mining because the decisions they make are easy to interpret, which has also made them a favourite of Data Science professionals.
Naive Bayes Classifiers are a family of probabilistic algorithms based on Bayes' Theorem, with an assumption of conditional independence between the features of the dataset. Their robustness and fast execution are widely known. Despite their simplicity, they have wide-ranging applications such as text classification, spam detection, medical diagnosis, and robot sensing, to name a few.
Training a model on a dataset with a large number of features can be significantly slower with classical machine learning or deep learning algorithms. Principal Component Analysis, or PCA, is an unsupervised machine learning algorithm that summarizes the information in a large dataset into a smaller set of components, making the data easier to train on and visualize. This course provides an in-depth understanding and hands-on application of PCA.
When we picture Artificial Intelligence, we often picture ourselves talking to a machine. In fact, we interact with personal assistants (Siri, Cortana, Alexa) on a daily basis and give them instructions to perform tasks. This functionality has been enabled through the domain of NLP. NLP is concerned with the interactions between computer and human languages. Machine Learning algorithms are applied to text and speech data in NLP.
Raw or source data is often inconsistent, vague, and duplicative in nature. Therefore, Data transformation is the process that helps convert this raw structure to another structure that can help you summarize and discover knowledge. The Data Transformation process is essential to any business, especially when there is a need to maintain data quality to deliver recommended Business Intelligence and a requirement to change the structure to a more readable format, enabling effective data analysis.
Data grows rapidly every day and is becoming an essential factor in driving businesses. Huge amounts of data are being generated from many different sources. Hence, it is essential to perform data aggregation to identify patterns and trends that would not be recognizable in the data from a single source. This course will enable you to understand the different data aggregation methods that are used on a regular basis in data-based decision-making.
Feature Engineering is a way of extracting features from data and transforming them into formats that are suitable for Machine Learning algorithms. The performance of an ML algorithm can be improved considerably with appropriate feature engineering. In this course, we will understand some of the common feature engineering techniques and learn to implement them in Python.
Multiple Linear Regression is an extension of Simple Linear Regression that uses more than one predictor variable to predict the response variable. The applied course focuses on writing Python code for building and evaluating a robust multiple linear regression model on a real-world dataset. The course provides a hands-on case with a guided approach for building appropriate solutions.
Exploratory data analysis (EDA) is done by data scientists to analyze, explore, summarise and visualize some of the key characteristics in the data. Univariate analysis is a kind of EDA that involves analyzing one variable at a time. In this course, we will explore some of the commonly applied univariate analysis on both continuous data and categorical data. We will also understand how to implement it in Python and interpret the results.
The primary data visualization library in Python is Matplotlib, which was developed to mimic the plots and capabilities of MATLAB. Matplotlib can produce a wide variety of plots and gives fine-grained control over every aspect of the plotting surface. In addition, libraries such as Pandas and seaborn rely on matplotlib for their plotting needs. This course will give you theoretical as well as practical knowledge for working with matplotlib.
Predicting the risk of heart disease can prove useful if the predictions help prevent it. In this case study, we will learn the steps to build a binary classification model using logistic regression with data collected from different patients. We will explore data visualization, data normalization, and feature importance on the way to building the optimal model.
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Seaborn provides an API on top of Matplotlib that offers more choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas DataFrames. When it comes to building advanced and engaging plots, seaborn will be your best choice.
Control Flow in Python is the order in which the program code executes. It is managed by conditional statements, loops, and function calls. Control flow techniques are essential in Python because real-world problems involve many different situations, and solving them through programming requires your code to mirror those situations closely. This requires you to control the program's execution sequence and hence control the flow.
Functions are an essential part of the Python programming language. Functions are used to bundle a set of instructions that can be used repeatedly when required. There are three major types of functions: built-in functions, user-defined functions, and anonymous functions. This course is a deeper dive into Python functions. As a Data Scientist, you’ll constantly need to write your own functions to solve problems that your data poses to you.
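The three kinds of functions mentioned above can be seen side by side in a short example. The names `square`, `values`, and the sample numbers are made up for illustration.

```python
values = [3, -1, 2]

# 1. Built-in function: shipped with Python, no definition needed
largest = max(values)

# 2. User-defined function: declared with `def` and reusable by name
def square(x):
    """Return x multiplied by itself."""
    return x * x

squares = [square(v) for v in values]

# 3. Anonymous function: a one-off `lambda`, here used as a sort key
by_magnitude = sorted(values, key=lambda v: abs(v))

print(largest, squares, by_magnitude)
```

A `lambda` is handy for short throwaway logic such as a sort key, while anything reused or longer than one expression usually reads better as a named `def`.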
The Python programming language is extremely powerful and widely used to perform complex tasks in a simpler manner. This makes Python a good skill to have for any job that may require working with data files or Excel spreadsheets, or scraping data from web pages. By developing simple and modular scripts, you can understand how to use common scripting language constructs to build useful applications. In this course, you will learn how to develop a fully functional program using industry-relevant tools.
Multivariate data analysis techniques normally show the relationship between two or more variables. It is important to understand how the variables are correlated with each other before entering into the feature engineering phase. In this course, we will understand some of the commonly applied EDA techniques with multiple variables and also learn to implement them in Python.
Feature engineering is an integral step before model building, as it can improve model performance and helps the algorithm interpret the variables with less bias and error. This case study will apply such techniques to a credit card fraud dataset.
Feature Engineering is the process of using domain knowledge to identify or create the important features that have the strongest predictive power. It is considered one of the fundamental tasks for improving machine learning model performance and prediction accuracy. In this course, you’ll learn a number of techniques for selecting the most important features for model prediction.
Hypothesis testing is one of the fundamental aspects of statistics, as well as data science. It is widely used to draw conclusions about a population based on a much smaller sample. It is also used to test certain assumptions of several machine learning techniques. This course focuses on writing Python code for various inferential and statistical tests on a real-world dataset. The course provides a hands-on case with a guided approach for building appropriate solutions.
The course on ‘Model Selection Techniques’ introduces various techniques for selecting the better predictive model from among the available candidate models. It helps in differentiating the models based on the different criteria available in data science theory. Model selection uses important principles of data analytics and helps in selecting a better model, thereby improving predictive performance.
Support Vector Machines (SVM) fall in the category of supervised learning algorithms in machine learning. Implementing the algorithm from scratch in the Python programming language is a difficult task, so most data science professionals use packages such as sklearn to implement it with ease. This course will use the Python programming language to implement the SVM algorithm and its concepts for classification and regression tasks.
Rooted in statistical learning, the Support Vector Machine (SVM) is an exciting algorithm with relatively simple concepts. It is usually considered a classification algorithm but can be employed for both classification and regression tasks. Widely known for its kernel trick, it can offer better generalization to unseen data than logistic regression or decision trees.
Logistic Regression is one of the most widely applied statistical techniques across businesses and verticals. The successful completion of the applied course will equip the learner to build and evaluate a robust logistic model on a real-world dataset to solve a business objective.
Naïve Bayes is a supervised machine learning algorithm that is probabilistic in nature. It is based on the famous Bayes theorem and uses conditional probability to make predictions. This allows the algorithm to be very fast while remaining simple. The Naïve Bayes algorithm is well suited to high-dimensional or large datasets. This course is a deep dive into the understanding and application of the algorithm.
Overfitting is a common phenomenon in models trained on real-world data. Regularization is a technique that helps deal with the overfitting problem in machine learning. It works by adding a penalty term to the loss function of the model, which constrains the model's parameters. This introduces some bias by reducing model complexity, in exchange for lower variance. In this course, the concepts of regularization and the bias-variance tradeoff are discussed to explain the fit of a machine learning model.
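To illustrate how a penalty on the loss function constrains a model, the sketch below fits a one-feature linear model without an intercept using an L2 (ridge) penalty, chosen here as one common form of regularization. Minimizing sum((y - w*x)^2) + lam * w^2 gives the closed form w = sum(x*y) / (sum(x^2) + lam). The function name `ridge_slope`, the symbol `lam`, and the toy data are assumptions for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form slope of a no-intercept linear fit with an L2 penalty of strength lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Made-up data lying exactly on y = 2x
xs = [1, 2, 3]
ys = [2, 4, 6]

w_unregularized = ridge_slope(xs, ys, lam=0)    # ordinary least squares
w_regularized = ridge_slope(xs, ys, lam=14)     # penalty shrinks the weight
print(w_unregularized, w_regularized)
```

Increasing `lam` pulls the weight toward zero: the fit becomes slightly biased on the training data, but less sensitive to noise in it.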
Tree-based models are a family of supervised machine learning algorithms that offer high predictive power and good stability. Unlike linear models, they can be used efficiently with non-linear data. Tree-based models adapt to many kinds of data science problems, and you will find them being used in most organizations that leverage machine learning. This course discusses the fundamental concepts of tree-based models.
Bagging is a technique in which the predictions of multiple machine learning models, each trained on a bootstrap sample of the data, are aggregated to make the final prediction, either by majority vote or by an aggregation function such as averaging. One of the most useful algorithms based on the bagging ensemble technique is Random Forest. Through this course, let us understand the components of bagging and random forest and apply ensemble algorithms to find solutions to real-world problems.
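A minimal sketch of the bagging idea follows: several copies of a simple base model are each trained on a bootstrap sample (drawn with replacement), and their predictions are combined by majority vote. The toy threshold "stump", the helper names, the seed, and the data are all assumptions for illustration; Random Forest itself uses decision trees with additional feature randomization, which is not shown here.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Toy base learner: threshold halfway between the two class means.
    Assumes class 1 tends to have larger x than class 0."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:
        # Degenerate sample with one class only: always predict that class
        only_class = 0 if zeros else 1
        return lambda x: only_class
    threshold = (mean(zeros) + mean(ones)) / 2
    return lambda x: int(x > threshold)

def majority_vote(votes):
    """Return the most common prediction among the base models."""
    return Counter(votes).most_common(1)[0][0]

# Made-up 1-D data: class 0 at small x, class 1 at large x
data = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(11, 16)]
rng = random.Random(0)
models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]

def bagged_predict(x):
    return majority_vote([model(x) for model in models])

print(bagged_predict(0), bagged_predict(20))
```

An odd number of base models is used so that a two-class vote cannot tie. For regression, the vote would typically be replaced by an average of the base predictions.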
A drug is any chemical substance that causes a change in an organism's physiology or psychology when consumed. Pharmaceutical drugs are often grouped into drug classes based on their chemical structures.
Drinking water is essential to maintaining the healthy functioning of the human body. The quality of drinking water in some underdeveloped countries is below the standards recommended by the World Health Organization. Water that is not potable can lead to serious health effects and may impair the functioning of the human body.
Carbon dioxide (CO2) is a colourless, odourless and non-poisonous gas formed by the combustion of carbon. Road transport is responsible for about 16% of man-made CO2 emissions. The CO2 emission of a vehicle depends on various factors, and it can be reduced through proper maintenance of the vehicle.
AdaBoost classifiers are supervised machine learning algorithms based on the concept of boosting ensemble techniques. This course explains and implements Adaptive Boosting (AdaBoost) for both classification and regression tasks. It explains how the algorithm works, why it works, and when it works. It will set a strong foundation in AdaBoost algorithm concepts and their use cases for the learners.
Most machine learning approaches rely on a single strong model for prediction. Boosting algorithms are different in principle, as they combine several weak learners to create a strong learner with high prediction accuracy. They fall under the category of ensemble techniques. With the development of various boosting techniques and their ease of use, these algorithms have become some of the most used techniques in practical solutions and hackathons.
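As one hedged illustration of combining weak learners, the sketch below implements a bare-bones gradient-boosting-style loop for squared-error regression: each round fits a one-split "stump" to the residuals left by the rounds before it and adds a damped copy to the ensemble. Regression is used only to keep the example short; the helper names, the learning rate, and the toy data are assumptions, and real boosting libraries are considerably more elaborate.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Weak learner: best single-threshold split by squared error.
    Returns (threshold, left_value, right_value)."""
    best = None
    # The largest x is skipped so the right side is never empty
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = mean(left), mean(right)
        sse = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Sequentially add stumps, each fitted to the current residuals."""
    preds = [mean(ys)] * len(ys)      # start from a constant prediction
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]
    return preds

def mse(ys, preds):
    return mean((y - p) ** 2 for y, p in zip(ys, preds))

# Made-up data with a step between x = 3 and x = 4
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

baseline_error = mse(ys, [mean(ys)] * len(ys))
error_1 = mse(ys, boost(xs, ys, n_rounds=1))
error_20 = mse(ys, boost(xs, ys, n_rounds=20))
print(baseline_error, error_1, error_20)
```

Each stump on its own is a poor model, but the training error falls as more rounds are added, which is the sense in which weak learners combine into a stronger one.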
Most machine learning algorithms are based on supervised learning and therefore require labelled data to function. Clustering, on the other hand, can help make use of unlabelled data. Clustering finds applications in data analysis, customer segmentation, recommender systems, search engines, dimensionality reduction and so on. The goal of clustering is to group similar instances into clusters. This course is a deep dive into some of the most important clustering algorithms.
Many machine learning problems involve thousands or millions of features in the data. With so many features, not only is processing slow, but finding the optimal solution is also challenging. This problem is handled using dimensionality reduction algorithms. One such technique is Principal Component Analysis, or PCA, which captures most of the information in a dataset in a small number of principal components. This course is focused on explaining the fundamental concepts of PCA.
Scammers send fake messages to try to steal your personal data and gain access to your bank accounts, emails, etc. It is important to identify spam texts and avoid them.
Online reviews reveal a lot about a product. People look at reviews to confirm that the product is making customers happy and that it is not misleadingly advertised.
With a plethora of startups in the new digital age, finding the next big startup can be daunting. Using machine learning, predict whether a startup will get acquired or not with classification algorithms.