Applied Data Science & ML Certification Program
10 Months | 40+ Live Lectures | 50+ Practice Datacases
Experience the hands-on data science virtual lab and courses, designed by industry experts and powered by our patented, award-winning platform ATH Precision
Immerse yourself in the courses with contextualized content designed by industry experts from diverse industry domains, to gain on-the-job problem-solving experience
Experience the data science virtual lab powered by our patented, award-winning data science platform ATH Precision
Apply your data science quotient to solve real-world business problems
70%+ of participants finish their enrolled courses and get certified, driven by interactive problem-solving on actual data
Get started with this applied course focusing on the building blocks of analytics and statistics. This multidimensional course provides 12 hands-on datacases across different domains and techniques. Beginner Level Approx. 61 Hours to complete
Linear Regression is one of the most widely used techniques for building predictions and forecasts. This applied course focuses on the different aspects of building and evaluating a robust linear regression model on real-world datasets. The course provides 6 hands-on datacases with a guided approach for building appropriate solutions. Advanced Level Approx. 23 Hours to complete
Logistic Regression is one of the most widely applied statistical techniques across businesses and verticals. Successful completion of this applied course will equip the learner to build and evaluate a robust logistic model on a real-world dataset to solve a business objective. Advanced Level Approx. 20 Hours to complete (5 Hours per week)
The applied course focuses on the different concepts of segmentation via some of the most commonly used supervised and unsupervised algorithms. The datacases discussed within the course are based on real-world business problems, with a guided approach for building appropriate solutions. Advanced Level Approx. 20 Hours to complete (5 Hours per week)
This applied course focuses on the fundamentals of text data mining, such as cleansing, treatment, and visualization of text data, which are essential for text analytics. Real-world datacases based on unstructured text data familiarize you with the concepts through a guided approach.
This applied course, the first in the series, focuses on the building blocks of any time series analysis, exposing the learner to the application of different visual and statistical techniques on real-world datasets. Upon successful completion of the course, the learner will have a thorough understanding of the different concepts and experience applying them to data. Beginner Level Approx. 15 Hours to complete (5 Hours per week)
Building a deep understanding of the data is a crucial step in any data science project. Without understanding the data well, you can never draw actionable and impactful insights from it or build predictive solutions. This course discusses the different techniques of data mining through which you can build a better understanding of your data and use it effectively in problem solving.
Today, as many people use social media and electronic channels to convey their opinions, it has become crucial for businesses to assess the sentiment behind those opinions and act accordingly. Sentiment analysis has therefore gained a lot of popularity in recent times due to its wide-ranging applications across domains. In this course, along with other basic text analytics concepts, you will be introduced to sentiment analysis and how to draw interpretations from the identified sentiments of a text corpus.
NLP and Text Analytics are two new-age domains sprouting from the different forms of data generated in the digital world. This applied course provides the learner a thorough understanding of text clustering and classification techniques, with hands-on application of those techniques on real-world datasets. The structure of the course not only exposes the learner to identifying underlying themes in a text dataset through clustering methods, but also enables the learner to build predictive text algorithms through classification techniques. Advanced Level Approx. 25 Hours to complete (5 Hours per week)
The first of three applied courses on the application of analytics in Banking and Financial Services, this course covers the vital role of data analytics in customer acquisition functions. Through its 6 datacases, it will introduce you to the application of analytics to some of the key problems in acquisitions.
The second of three applied courses on the application of analytics in Banking and Financial Services, this course covers the vital role of data analytics in customer engagement management functions. Through its 5 datacases, it will introduce you to the application of analytics to some of the key problems in the customer engagement function.
The third of three applied courses on the application of analytics in Banking and Financial Services, this course covers the vital role of data analytics in customer retention management functions. Through its 5 datacases, it will introduce you to the application of analytics to some of the key problems in the customer retention function.
Hyperparameter tuning refers to the task of choosing the optimal combination of parameters that must be set before the training process begins. When the performance of the algorithm is not satisfactory, hyperparameter tuning is used to tweak these parameters for a performance boost. In this course, we will focus on hyperparameter tuning for Linear and Logistic Regression through a hands-on approach and will deep-dive into the various hyperparameters associated with these algorithms.
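As a minimal sketch of this idea, the snippet below tunes logistic regression's regularization strength C with scikit-learn's GridSearchCV; the synthetic dataset and the grid values are purely illustrative, not taken from the course datacases.

```python
# Sketch: choosing the best value of a hyperparameter (C) that must be
# fixed before training, via cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Candidate hyperparameter values, set before training begins
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # the C value with the best cross-validated score
```

The same pattern extends to any estimator: list the candidate values, let the search fit one model per combination, and keep the best scorer.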
Get started with this applied course focusing on the introduction to data preprocessing. This quick course provides opportunities for learning through reading material, a business case, and a quiz to test your understanding. Beginner Level Approx. 1 Hour to complete
This course will help you understand the need for explainable AI (XAI) and introduce you to LIME, one of the techniques for XAI. Learn how LIME can be applied to a real-world datacase and how the results can be presented in a human-interpretable format. Take the quiz to validate your understanding and then solve a datacase based on what you have learned.
This course helps you understand the basic types of summary and distribution charts, which are widely used to represent data visually and reveal trends and patterns. Learn how and when to use different types of charts to present actionable insights.
Analysis of Variance (ANOVA) is a statistical method to test differences among two or more means. This applied course focuses on the concepts of ANOVA, its types, and its applications through business cases. Basic Level
This course will help you understand the need for explainable AI (XAI) and introduce you to SHAP, one of the techniques for XAI. Learn how SHAP can be applied to a real-world datacase and how the results can be presented in a human-interpretable format. Take the quiz to validate your understanding and then solve a datacase based on what you have learned.
Gradient Boosting Classification is a machine learning technique that has attracted attention for its predictive accuracy. From machine learning competitions to data science businesses, gradient boosting is delivering best-in-class results. It uses an ensemble of weak prediction models to build a classification model. The model is built in a sequential process, where each new step tries to minimize the errors of the previous step. This course will provide you an in-depth understanding of the algorithm along with a practical use case.
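For a flavour of what the sequential ensemble looks like in code, here is a small sketch using scikit-learn's GradientBoostingClassifier on synthetic data (the dataset and settings are illustrative only):

```python
# Sketch: 100 shallow trees trained in sequence, each one fitted to
# correct the errors of the ensemble built so far.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                 learning_rate=0.1, random_state=0)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))   # held-out accuracy of the boosted ensemble
```

The learning_rate shrinks each step's contribution, trading more estimators for better generalization.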
Extreme Gradient Boosting is an optimized machine learning algorithm designed for better speed and performance. It is an extension of gradient boosted decision trees. Since its inception in 2014, the algorithm has achieved high performance on structured data and dominated other algorithms because of its flexibility. This course covers an end-to-end walkthrough of the algorithm.
CatBoost (short for Categorical Boosting) is a recently developed machine learning algorithm that helps data scientists and machine learning engineers automate their column transformations and provides best-in-class accuracy. It can easily integrate with different deep learning frameworks and can work on diverse data types. It solves a variety of business problems that earlier required extensive data preparation.
This course will give you a brief introduction to the concept and applications of one of the most important and commonly used model validation techniques: Gain and Lift Charts.
At present, when it comes to domains such as machine learning, data science, and web development, Python seems to be the most favored programming language. But in these domains, efficient storage and accessibility of data play an important role. This is where Python data structures come in handy. Data structures help us organize, manage, and process data in an effortless manner.
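A tiny sketch of the four built-in data structures the course is built around (the values are made up for illustration):

```python
# list: ordered and mutable; tuple: ordered and immutable;
# dict: key-value mapping; set: unique, unordered elements.
scores = [72, 88, 95]                 # list
point = (3, 4)                        # tuple
ages = {"asha": 29, "ravi": 34}       # dict
tags = {"python", "data", "python"}   # set: the duplicate collapses

scores.append(61)                     # lists grow in place
ages["mira"] = 41                     # dicts insert by key

print(len(scores), len(tags))         # 4 2
```

Choosing the right structure (e.g. a set for membership tests, a dict for lookups) is what makes data access efficient.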
NumPy (Numerical Python) is a library for scientific computation in the Python programming language. It provides easy-to-use multi-dimensional arrays designed for performing high-level computations at high speed. Other important libraries such as Pandas, matplotlib, and scikit-learn are built on top of NumPy, making it an essential tool in the data science domain. This course is a deep dive into the structure and workings of NumPy arrays, through which a solid foundation in the NumPy library can be developed.
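As a quick taste of those multi-dimensional arrays, the sketch below shows vectorized operations on a small 2x3 array (values are illustrative):

```python
# Sketch: NumPy arrays compute elementwise and along axes with no
# explicit Python loops.
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # a 2x3 array

print(a.shape)                 # (2, 3)
print(a * 10)                  # elementwise multiplication
print(a.sum(axis=0))           # column sums -> [5 7 9]
print(a.mean())                # 3.5
```

Vectorized operations like these are what give NumPy its speed advantage over plain Python lists.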
Pandas is a Python library that is currently one of the most important tools for data analysis and manipulation. Its key data structure is the DataFrame, which enables you to represent your data in a tabular format. It provides several essential functionalities that make working with data simpler. Whether you are an entry-level or an experienced professional in data science, Pandas may always be your solution to data analysis problems.
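A minimal sketch of the DataFrame in action; the column names and figures here are invented for illustration:

```python
# Sketch: tabular data, boolean filtering, and grouped aggregation
# with a pandas DataFrame.
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi"],
    "sales": [120, 90, 150],
})

high = df[df["sales"] > 100]                  # boolean filtering
by_city = df.groupby("city")["sales"].sum()   # aggregation per city

print(high.shape[0])     # 2 rows pass the filter
print(by_city["Delhi"])  # 270
```

Filtering and grouping like this cover a surprising share of day-to-day data analysis work.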
Get started with this applied course focusing on the introduction to data preprocessing. This quick course provides opportunities for learning through reading material, the Business case using Python, and a quiz to test your understanding.
Linear Regression is one of the most widely used techniques for building predictions and forecasts. This applied course focuses on writing Python code for building and evaluating a robust simple linear regression model on a real-world dataset. The course provides a hands-on case with a guided approach for building appropriate solutions.
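A minimal sketch of a simple linear regression fit, using NumPy's least-squares polyfit on a tiny made-up dataset (the course itself works on a real-world one):

```python
# Sketch: fit y ~ b0 + b1*x by ordinary least squares and compute R^2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])   # roughly y = 2x

b1, b0 = np.polyfit(x, y, deg=1)           # slope and intercept
pred = b0 + b1 * x
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()

print(round(b1, 2), round(r2, 3))          # 1.99 0.999
```

The high R^2 reflects how closely the fitted line tracks the (nearly linear) data; evaluating that fit is a core part of the course.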
In this quick byte, visualize and analyse the pattern of gold prices based on past data.
Analyse the stationarity of the time series, and detrend the data
Use the ARIMA time series model to obtain predictions for the given dataset.
Use the Holt-Winters forecasting model to obtain predictions for the gold prices
Use the General Additive Model to obtain predictions for the gold prices
In this quick byte, use sorting operation and bar chart to find and visualize which countries send the most tourists to India
In this quick byte, use difference of columns, sorting operation and visualization to find which country has the highest rise in tourists over the last 5 years
In this quick byte, use filtering, sorting operations and 3d visualization to find and visualize which countries send the most tourists to India
In this quick byte, use percent difference, sort and visualization to see which countries saw the highest percent growth in tourist inflow
In this quick byte, use sort operations, filtering and a 3D bar chart to visualize the ratio of tourist to business visas for various countries.
Summarize the data and visualize which kind of Australian bowlers are most successful against VK
Compare performance of bowlers against Smith based on wickets taken
Calculate frequency and visualize Australia's win percentage
Use hypothesis testing to determine if winning the toss leads to winning the match
Compare the legendary batsmen one year at a time
In this quick byte, use grouped bar charts and chi-square independence tests to test whether all the schools of a district have similar proportions of boys and girls
In this quick byte, use two-sample t-tests to find out if there is a significant difference in marks obtained by girls and boys
In this quick byte, use paired t-tests to find whether girls are better in arts or science
In this quick byte, use bar charts and analysis of variance (ANOVA) to test whether there is a significant difference between the final marks obtained by students studying in four different schools
In this quick byte, use box plots and F-tests to compare the consistency of marks obtained by girls and boys
Group the match scores by venue and innings, and plot a bar chart to project the score for the upcoming test matches.
Visualize the most impactful batsman in T20Is using the sum of average and strike rate.
Visualize the most impactful bowler in T20Is using the inverse sum of average and strike rate.
Filter out the data and plot a chart to find the player with the most ODI runs in Australia.
Filter out the data and plot a chart to find the player with the most ODI wickets in Australia.
Understand and impute your data
Visualize your data and gather insights
Normalize the numerical data using Standard Scaling
Feature Encode the categorical data using One Hot Encoding
Use Linear Regression to estimate the prices of diamonds using various attributes
Apply Descriptive Statistics on the first innings total score of IPL matches won while defending a total
Use pie plots to compare the impacts of Virat Kohli and Rohit Sharma
This quick byte introduces some of the steps essential in cleaning textual data, an integral part of text analytics.
In this quick byte, the user will be required to obtain the most common words in spam messages, using word clouds.
Plot the total volume of biomedical waste (BMW) generated across all the Indian cities/UTs over the last 5 months (June 2020 to October 2020) using an interactive 3D bar chart.
Filter and compare data of product sales during Diwali with those during Dussehra across two years, to understand the variation in average sales.
Plot the Air Quality Index (AQI) recorded a day after Diwali in different cities in India for the last 5 years and analyze the trend across all the cities.
Plot whether men or women shop more for kids during Diwali.
Plot the sales of various products over a year to understand trends around holidays. Identify and validate if sales of all products go up during Diwali.
Create a grouped bar chart to compare the Average Annual AQI with the AQI recorded during Diwali across 6 different cities in the year 2019.
Understanding and cleaning the dataset before starting the analysis is an important step in solving any problem. Through this case study, learn how to apply data cleaning and data treatment techniques to a real business problem.
Selecting the right features and building a model is an iterative process and part of any model building activity. As part of this case study, we will learn how to scale the features, perform residual analysis, avoid multicollinearity, and pick only the significant features to build a linear regression model.
Understanding the attrition habits of its customers is essential for a business to devise new strategies for improving customer retention. Through this case study, learn how to apply Artificial Neural Networks to classification problems.
The first step in the machine learning life cycle is to understand and process the data so that it can be passed on to a machine learning algorithm. In this case study, we will learn how to understand the data, visualize it, clean it, and create a few derived variables.
This case study deals with the comparison of marks of students in a particular school using the statistical concept of hypothesis testing.
Understanding the expenditure habits of its customers is essential for a business to devise new strategies for improving spend growth. Through this case study, learn how to apply Artificial Neural Networks to regression problems.
Summarize & treat your data
Create an interactive pie chart to analyse the distribution of the number of matches won by each of the teams.
Perform the one-hot encoding (dummy variables creation) of categorical data
Identify the players who have won the most Man of the Match awards this season.
Identify the number of times that teams winning the toss have won the match.
Plot the trend of the total number of confirmed, recovered and deceased cases over time. Identify the cases reported daily for the last 7 days and plot the last one week's trend.
Analyze the cases distribution across different continents. Identify the continents that are most impacted and ones that are least impacted.
Plot the cases distribution across different states in an India map. Analyze through a bubble plot to know if population and the cases reported are related.
Plot the percentage distribution of cases based on age groups. Plot the distribution based on gender for each of these age groups.
Proper hyperparameter tuning is essential for the successful implementation of Support Vector Machines for both classification and regression tasks. SVM offers some important hyperparameters that must be fixed before the training process and have an immense effect on the performance of its predictive models. That being said, hyperparameter tuning for SVM is a non-trivial task. In this course, we will study the concepts of hyperparameter tuning for SVM in detail, along with extensive hands-on practice.
Tree-based models are considered key algorithms in supervised machine learning. They are simple to train and easy to visualize, and they provide high flexibility in training as there are many hyperparameters to tune. In this course, our focus will be to enhance the performance of tree-based models through hyperparameter tuning. We will deep-dive into the hyperparameters and the methodologies for tuning them.
One of the most essential phases of working with classification machine learning algorithms is understanding how to evaluate the performance of the model built. This task is often difficult to interpret and sometimes not given enough attention. In this course, we will see each of the performance measures that help you quantify the performance of the model and decide which algorithm is best suited to the problem at hand.
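The measures above can be sketched in a few lines with scikit-learn's metrics module; the true and predicted labels below are hand-made for illustration:

```python
# Sketch: common classification metrics on a tiny set of labels.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # one false negative, one false positive

print(confusion_matrix(y_true, y_pred))   # rows: actual, cols: predicted
print(accuracy_score(y_true, y_pred))     # 6 correct of 8 -> 0.75
print(precision_score(y_true, y_pred))    # 3 true positives of 4 predicted
print(recall_score(y_true, y_pred))       # 3 true positives of 4 actual
```

Which metric matters most depends on the problem: recall for missed fraud, precision for costly false alarms.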
Machine Learning algorithms have proven extremely beneficial for regression tasks. However, one question is often asked: which algorithm is suited to the particular problem, and is the model built performing well? In this course, we will learn different performance measures that help you answer this question. We will see each performance measure used to evaluate a regression model in detail, which will help us select the top-performing model for a particular problem.
Neural networks are a series of algorithms composed of artificial neurons that are capable of recognizing the underlying principles in a set of data in a way similar to the human brain. In the past decade, neural networks have laid the foundation for a new domain called Deep Learning, which has been used to develop some of the most intelligent machines, capable of performing complex tasks such as object recognition, speech translation and recognition, and robotics. This course is designed to give a complete understanding of the most fundamental aspects of neural networks.
The first of three applied courses on the application of optimization in analytics, this course covers the vital role of general optimization problems in business functions. Through its 3 datacases, it will introduce you to the application of three different optimization algorithms to some of the key problems in finding optimal parameters for business settings.
The second of three applied courses on the application of optimization in analytics, this course covers the vital role of the shortest path problem and the minimum spanning tree in network flow optimization problems. Through its 2 datacases, it will introduce you to the application of analytics to some of the key problems in network flow optimization.
The third of three applied courses on the application of optimization in analytics, this course covers several specialized optimization problems such as the transportation problem, the knapsack problem, and the travelling salesman problem. Through its 5 datacases, it will introduce you to the application of analytics to some of the key optimization problems.
Decision Trees are classic non-parametric supervised learning algorithms used for both classification and regression tasks. The decision tree splits the dataset multiple times based on a series of decisions, efficiently representing the data and making predictions. They are among the most widely used algorithms for data mining because the decisions made by the algorithm are easy to interpret, which has also made them a favourite of data science professionals.
Naive Bayes Classifiers are a family of probabilistic algorithms based on the Bayes Theorem, with an assumption of conditional independence between the features of the dataset. Their robustness and fast execution are widely known. Despite their simplicity, they have wide-ranging applications such as text classification, spam detection, medical diagnosis, and robot sensing, to name a few.
Training on a dataset with a large number of features can be significantly slower with classical machine learning or deep learning algorithms. Principal Component Analysis (PCA) is an unsupervised machine learning algorithm that summarizes the information in large datasets into a smaller set of components, making the data easier to train on and visualize. This course provides an in-depth understanding of the hands-on application of PCA.
In this quick byte, analyse the cultures of characters in GOT on the basis of their nobility.
In this quick byte, analyse the different types of battles that have been fought in GOT.
In this quick byte, analyse the troop sizes of different kings and the war outcomes to find out the strongest king.
In this quick byte, make a pie chart to analyse which king has won more battles.
In this quick byte, analyse the relation between a character's popularity and their dead/alive status.
When we picture Artificial Intelligence, we often picture ourselves talking to a machine. In fact, we interact with personal assistants (Siri, Cortana, Alexa) on a daily basis and give them instructions to perform tasks. This functionality has been enabled through the domain of NLP. NLP is concerned with the interactions between computers and human languages. Machine Learning algorithms are applied to text and speech data in NLP.
In this quick byte, clean the data by finding and treating missing values
In this quick byte, clean the reviews by applying text preprocessing functions.
In this quick byte, analyse the reviews and perform sentiment analysis on them.
In this quick byte, make word clouds to visually understand the sentiments of the users.
In this quick byte, make a pie chart to understand the distribution of sentiments over review texts.
Raw or source data is often inconsistent, vague, and duplicative. Data transformation is the process of converting this raw structure into another structure that helps you summarize and discover knowledge. The data transformation process is essential to any business, especially when data quality must be maintained to deliver reliable business intelligence, or when the structure must be changed to a more readable format that enables effective data analysis.
Data grows exponentially every day and is becoming an essential factor in driving businesses. Huge amounts of data are being generated from many different sources. Hence, it is essential to perform data aggregation to identify patterns and trends that would not be recognizable in data from a single source alone. This course will enable you to understand the different data aggregation methods used regularly in any data-based decision-making.
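A minimal sketch of aggregating across sources with pandas; the "online" and "retail" tables and their figures are invented for illustration:

```python
# Sketch: pool two data sources, then aggregate per product.
import pandas as pd

online = pd.DataFrame({"product": ["A", "B"], "units": [10, 5]})
retail = pd.DataFrame({"product": ["A", "B"], "units": [7, 12]})

combined = pd.concat([online, retail])                  # pool both sources
totals = combined.groupby("product")["units"].agg(["sum", "mean"])

print(totals.loc["A", "sum"])    # 17 units of A across both channels
print(totals.loc["B", "mean"])   # 8.5 average units of B per channel
```

The pattern "concatenate, group, aggregate" is the backbone of most multi-source reporting.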
Do you think you are a Cricket Stats Champ? Is popcorn popping in your kitchen for every match? Warm up to try your hand at the innings of the Quiz Battle!
Do you believe in making history? Let's explore the history players created in IPL
Feature Engineering is a way of extracting features from data and transforming them into formats that are suitable for machine learning algorithms. The performance of an ML algorithm can be improved considerably with correct feature engineering. In this course, we will understand some of the common feature engineering techniques and learn to implement them in Python.
Ready to face some bouncers of Quiz Questions this IPL season? Play the Quiz and test your IPL Stats
Analyse the money flow in IPL and feel the cash prize in your hand.
Want to experience the biggest rivalry of all time? Let's watch some sixes from Hitman and stumpings from Captain Cool!
Multiple Linear Regression is an extension of simple linear regression, as it takes more than one predictor variable to predict the response variable. This applied course focuses on writing Python code for building and evaluating a robust multiple linear regression model on a real-world dataset. The course provides a hands-on case with a guided approach for building appropriate solutions.
Exploratory data analysis (EDA) is done by data scientists to analyze, explore, summarise and visualize some of the key characteristics in the data. Univariate analysis is a kind of EDA that involves analyzing one variable at a time. In this course, we will explore some of the commonly applied univariate analysis on both continuous data and categorical data. We will also understand how to implement it in Python and interpret the results.
The primary data visualization library in Python is Matplotlib, which was developed to mimic the plots and capabilities of MATLAB. Matplotlib can produce a wide variety of plots and gives tremendous control over every aspect of the plotting surface. Moreover, libraries such as Pandas and seaborn rely on matplotlib for their plotting needs. This course will provide you theoretical as well as practical knowledge for working with matplotlib.
Predicting the risk of heart disease can prove useful if it helps prevent it. In this case study, we will learn the steps to build a binary classification model using logistic regression on data collected from different patients. We will explore data visualization, data normalization, and feature importance on the way to building the optimal model.
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Seaborn provides an API on top of Matplotlib that offers more choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas DataFrames. When it comes to building advanced and engaging plots, seaborn will be your best choice.
Control flow in Python is the order in which the program code executes. It is managed by conditional statements, loops, and built-in function calls. Control flow techniques are essential in Python because real-world problems are full of situations, and to solve a problem through programming you need to mimic those situations closely. This requires you to control the program's execution sequence, and hence, control the flow.
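A tiny sketch of the three control-flow building blocks the course covers, on a made-up example:

```python
# Sketch: a conditional, a for loop with early exit, and a while loop.
n = 7

# conditional statement
if n % 2 == 0:
    parity = "even"
else:
    parity = "odd"

# for loop with an early exit via break
first_divisor = None
for d in range(2, n):
    if n % d == 0:
        first_divisor = d
        break                 # stop at the first divisor found

# while loop repeating until a condition fails
countdown = []
while n > 4:
    countdown.append(n)
    n -= 1

print(parity, first_divisor, countdown)   # odd None [7, 6, 5]
```

Since 7 is prime, the for loop never finds a divisor; that kind of branching on data is exactly what "mimicking the situation" means in code.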
Functions are an essential part of the Python programming language. Functions are used to bundle a set of instructions that can be reused whenever required. There are three major types of functions: built-in functions, user-defined functions, and anonymous functions. This course is a deeper dive into Python functions. As a data scientist, you'll constantly need to write your own functions to solve the problems your data poses to you.
The Python programming language is extremely powerful and widely used to perform complex tasks in a simple manner. This makes Python a good skill to have for any job that requires working with data files or Excel spreadsheets, or scraping data from web pages. By developing simple and modular scripts, you can understand how to use common scripting-language constructs to build useful applications. In this course, you will learn how to develop a fully functional program using industry-relevant tools.
Multivariate data analysis techniques normally show the relationship between two or more variables. It is important to understand how the variables are correlated with each other before entering into the feature engineering phase. In this course, we will understand some of the commonly applied EDA techniques with multiple variables and also learn to implement them in Python.
Feature engineering is an integral step before model building as it improves the model performance and the algorithm can interpret the variables without bias or error. This case study will leverage such techniques on a credit card fraud dataset.
Feature Engineering is the process of using domain knowledge to identify or create the important features that have the strongest predictive power. It is considered one of the fundamental tasks for improving machine learning model performance and prediction accuracy. In this course, you’ll learn a number of techniques for selecting the most important features for model prediction.
Hypothesis testing is one of the fundamental aspects of statistics, as well as data science. It is widely used to make conclusions about a population based on a much smaller sample. It is also used to test certain assumptions of several machine learning techniques. This course focuses on writing Python code for various inferential and statistical tests on a real-world dataset. The course provides a hands-on case with a guided approach for building appropriate solutions.
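As a minimal sketch of the sample-to-population reasoning, here is a pooled two-sample t-statistic computed by hand on made-up marks (in practice a library call such as scipy.stats.ttest_ind does this in one line):

```python
# Sketch: pooled two-sample t-test comparing the means of two groups.
import math

girls = [78, 85, 80, 90, 88, 76, 84]
boys  = [72, 70, 81, 68, 75, 74, 79]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):                       # sample variance (n - 1 denominator)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(girls), len(boys)
sp2 = ((n1 - 1) * var(girls) + (n2 - 1) * var(boys)) / (n1 + n2 - 2)
t_stat = (mean(girls) - mean(boys)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(round(t_stat, 2))   # 3.35: a large |t|, so the means likely differ
```

Comparing the statistic against the t-distribution with n1 + n2 - 2 degrees of freedom yields the p-value used to accept or reject the null hypothesis.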
The course on ‘Model Selection Techniques’ introduces various techniques for selecting better predictive models from a set of candidate models. It helps differentiate models based on the different criteria available in data science theory. Model selection uses important principles of data analytics and helps in selecting a better model, thereby improving its predictability.
Support Vector Machines (SVM) fall into the category of supervised learning algorithms in machine learning. Implementing the algorithm from scratch in the Python programming language is a difficult task, so most data science professionals use packages such as sklearn to implement it with ease. This course will use the Python programming language to implement the SVM algorithm and its concepts for classification and regression tasks.
Rooted in statistical learning, the Support Vector Machine (SVM) is an exciting algorithm with relatively simple concepts. It is usually considered a classification algorithm but can be employed for both classification and regression tasks. Widely known for its kernel trick, it often generalizes better than logistic regression or decision trees on unseen data.
Logistic Regression is one of the most widely applied statistical techniques across businesses and verticals. The successful completion of the applied course will equip the learner to build and evaluate a robust logistic model on a real-world dataset to solve a business objective.
Naïve Bayes is a supervised machine learning algorithm that is probabilistic in nature. It is based on the famous Bayes theorem and uses conditional probability to make predictions. This allows the algorithm to be very fast while remaining simple. The Naïve Bayes algorithm is well suited to high-dimensional or large datasets. This course is a deep dive into the understanding and application of the algorithm.
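As an illustration, a minimal sklearn sketch of the algorithm in action (using the bundled iris dataset as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GaussianNB applies Bayes' theorem with a conditional-independence
# assumption across features ("naive"), which makes training very fast
model = GaussianNB()
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
print(f"Accuracy: {acc:.2f}")
```

For text or count data, `MultinomialNB` is the usual variant instead of `GaussianNB`.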
Overfitting is a common phenomenon with real-world data, and regularization is a technique that helps deal with it in machine learning. Regularization puts additional constraints on the loss function of the model: it introduces bias by reducing complexity, but lowers variance as well. In this course, the concepts of regularization and the bias-variance tradeoff are discussed to explain how well a machine learning model fits the data.
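A brief sketch of the idea, assuming sklearn's Ridge (L2) and Lasso (L1) penalties on synthetic regression data: the penalty constrains the loss function and shrinks the coefficients relative to plain least squares.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Few samples, many features: a setting prone to overfitting
X, y = make_regression(n_samples=50, n_features=20, noise=10.0, random_state=1)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty can zero some out entirely

print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge.coef_))
```

The shrunken coefficients trade a little bias for lower variance, which is exactly the tradeoff the course discusses.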
Perform outlier analysis to eliminate skewness in house allocation data
Evaluate the relationship between the features of the house allocation data through a correlation analysis
Create Train and Test data using Stratified splitting
Create an SVM classifier model to predict the chances of a house being sold
Use Hyperparameter tuning to find the most optimal classifier which can predict the chances of a house being sold
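The split, model, and tuning steps above can be sketched as follows (a hedged outline using sklearn, with synthetic data standing in for the house allocation dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the house allocation dataset
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Stratified split keeps the class ratio the same in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameter tuning: cross-validated grid search over C and kernel
grid = GridSearchCV(
    SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=3
)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Test accuracy:  ", grid.score(X_test, y_test))
```

`grid.best_estimator_` is then the "most optimal classifier" referred to above, ready to predict the chances of a house being sold.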
Tree-based models are a family of supervised machine learning algorithms in Machine Learning that enable higher predictive power and better stability. Unlike the linear models, they can be used efficiently with non-linear data. Tree-based models are adaptive to all sorts of data science problems and you will find these models being used in almost all organizations leveraging machine learning. This course is meant to discuss the fundamental concepts of tree-based models.
Bagging is a technique in which predictions of multiple machine learning models are aggregated to make the final prediction by either majority rule or aggregation function. One of the most useful algorithms based on the bagging ensemble technique is Random Forest. Through this course, let us understand the components of bagging and random forest and apply ensemble algorithms to find solutions to real-world problems.
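A minimal sketch of bagging via Random Forest in sklearn (synthetic data as a stand-in for a real-world problem):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample of the data;
# the forest aggregates their votes by majority rule (bagging)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
acc = forest.score(X_test, y_test)
print(f"Test accuracy: {acc:.2f}")
```

Random Forest adds one twist on plain bagging: each split also considers only a random subset of features, which decorrelates the trees further.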
A drug is any chemical substance that causes a change in an organism's physiology or psychology when consumed. Pharmaceutical drugs are often classified into drug classes, groups based on their chemical structures.
Drinking water is essential to the healthy functioning of the human body. The quality of drinking water in some underdeveloped countries is below the standards recommended by the World Health Organization. Water that is not potable can lead to serious health effects and may impair the functioning of the human body.
Carbon dioxide (CO2) is a colourless, odourless and non-poisonous gas formed by the combustion of carbon. Road transport is responsible for about 16% of man-made CO2 emissions. The CO2 emission of a vehicle depends on various factors, and can be reduced through proper maintenance of the vehicle.
AdaBoost classifiers are supervised machine learning algorithms based on the concept of boosting ensemble techniques. This course explains and implements Adaptive Boosting (AdaBoost) for both classification and regression tasks. It explains how the algorithm works, why it works, and when it works. It will set a strong foundation in AdaBoost algorithm concepts and their use cases for the learners.
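A minimal sklearn sketch of the algorithm for classification (synthetic data standing in for a real use case):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost fits a sequence of shallow trees, re-weighting the samples
# each round so that later learners focus on earlier mistakes
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)
acc = ada.score(X_test, y_test)
print(f"Test accuracy: {acc:.2f}")
```

The regression counterpart, `AdaBoostRegressor`, follows the same fit/score pattern.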
Summarise the data and remove duplicate/repeated values from the dataset
Sort the data by the candies with the highest winning percentage
Find out which columns have missing values and which values lie far from the rest, and treat them before going further
Visualize the information present in the data with the help of various bar plots and charts
Remove correlated variables and apply Linear Regression model
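The cleaning and modelling steps above can be outlined in pandas and sklearn. This is only a sketch: the tiny DataFrame below is a hypothetical stand-in for the candy dataset, and the 0.9 correlation cutoff is an illustrative choice.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Tiny hypothetical stand-in for the candy ranking dataset
df = pd.DataFrame({
    "candy":   ["A", "B", "B", "C", "D"],
    "sugar":   [0.7, 0.4, 0.4, None, 0.9],
    "price":   [0.5, 0.3, 0.3, 0.6, 0.8],
    "win_pct": [60.0, 45.0, 45.0, 70.0, 55.0],
})

df = df.drop_duplicates()                        # remove repeated rows
df = df.sort_values("win_pct", ascending=False)  # highest winners first
df["sugar"] = df["sugar"].fillna(df["sugar"].median())  # treat blanks

# Drop one of any highly correlated feature pair, then fit a linear model
corr = df[["sugar", "price"]].corr().iloc[0, 1]
features = ["sugar"] if abs(corr) > 0.9 else ["sugar", "price"]
model = LinearRegression().fit(df[features], df["win_pct"])
print("R^2:", model.score(df[features], df["win_pct"]))
```

Outlier treatment (values "at a far away distance") would slot in alongside the `fillna` step, e.g. by clipping values outside the interquartile range.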
Most machine learning algorithms use a single high-quality model for prediction. Boosting algorithms differ in principle: they combine several weak learners to create a strong learner with high prediction accuracy, and fall under the category of ensemble techniques. With the development of various boosting techniques and their ease of use, these algorithms have become some of the most widely used in practical solutions and hackathons.
Most machine learning algorithms are based on supervised learning and require labelled data. Clustering, on the other hand, can make use of unlabelled data. It finds applications in data analysis, customer segmentation, recommender systems, search engines, dimensionality reduction, and more. The goal of clustering is to group similar instances into clusters. This course is a deep dive into some of the most important clustering algorithms.
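A minimal sketch of the idea with k-means in sklearn: note that no labels are passed to the algorithm, only the instances themselves.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabelled data: three synthetic blobs standing in for, say, customer segments
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # each instance is assigned to a cluster
print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```

Choosing the number of clusters (here fixed at 3) is itself part of the analysis, e.g. via the elbow method or silhouette scores.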
Many machine learning problems involve thousands or millions of features in the data. With that many features, not only is processing slow, but finding an optimal solution is also challenging. This problem is handled by dimensionality reduction algorithms. One such technique is Principal Component Analysis (PCA), which captures most of the information in a dataset in terms of its principal components. This course focuses on explaining the fundamental concepts of PCA.
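A short sklearn sketch of the technique, projecting the 4-feature iris dataset onto its first two principal components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

# Project onto the 2 directions of maximum variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print("Reduced shape:", X_reduced.shape)
print("Variance explained:", pca.explained_variance_ratio_.sum())
```

The `explained_variance_ratio_` attribute quantifies how much of the dataset's information each component captures, which guides how many components to keep.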
Scammers send fake messages to steal your personal data and gain access to your bank accounts, emails, etc. It is important to identify spam texts and avoid them.
Online reviews reveal a lot about a product. People look at reviews to confirm that the product makes customers happy and is not misleading as advertised.
With a plethora of startups in the new digital age, finding the next big startup can be daunting. Using machine learning, predict whether a startup will be acquired using classification algorithms.
Natural Language Processing (NLP) is one of the most widely used concepts in Artificial Intelligence. These days it powers practical solutions in almost all major sectors, such as finance, retail, and marketing. NLP provides a way to manipulate human language (text or speech) to extract information and perform further analysis. However, before applying any advanced NLP task, the data must be preprocessed. This is handled by the basic text processing techniques we will look at in this course.
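A minimal sketch of such preprocessing in plain Python (the stop-word list here is a tiny illustrative one; real pipelines use fuller lists from libraries such as NLTK or spaCy):

```python
import re

text = "NLP provides a way to manipulate human language!!  Visit https://example.com"

# Typical basic steps: lowercase, strip URLs and punctuation, tokenize,
# then remove stop words
text = text.lower()
text = re.sub(r"https?://\S+", " ", text)  # remove URLs
text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only
tokens = text.split()

stop_words = {"a", "to", "the", "of"}       # tiny illustrative list
tokens = [t for t in tokens if t not in stop_words]
print(tokens)
```

The resulting token list is what advanced NLP techniques (vectorizers, classifiers, sentiment models) actually consume.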
This course will help you understand the need for Explainable AI (XAI) for text and introduce you to LIME and SHAP, two XAI techniques. We will see how LIME and SHAP can be applied to text cases to understand the importance of every word and its contribution to predicting the target class. Take the quiz to validate your understanding and then solve a data case based on the learnings.
Sentiment Analysis is a use case of NLP that discovers the sentiment associated with textual data. It has helped businesses understand customers' attitudes towards them. The text can be extracted from customer reviews, blogs, emails, surveys, etc., and processed for sentiment analysis. This course is a deep dive into sentiment analysis, explaining the end-to-end implementation process.
Text classification is the process of categorizing a text document based on a model learnt from similar documents that have an assigned categorical response. It assigns predefined categories or grouping labels to raw text documents. It is applicable to many real-life problems, as it provides a conceptual view of documents.
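A minimal sketch of that learn-from-labelled-documents process in sklearn, using a tiny hypothetical corpus (TF-IDF features feeding a Naive Bayes classifier is one common, but not the only, choice):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical labelled corpus
docs = [
    "great product, very happy",
    "terrible quality, broke fast",
    "love it, works perfectly",
    "awful, waste of money",
]
labels = ["positive", "negative", "positive", "negative"]

# Learn TF-IDF features from the documents, then fit the classifier
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(docs, labels)
prediction = clf.predict(["very happy, great"])
print(prediction)
```

Once trained, the pipeline assigns one of the predefined category labels to any new raw text document.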
Explore the data and find the top 5 resolutions made around New Year
Visualize the number of resolution tweets made by men & women
Text processing helps structure the text data, and word clouds are used to visualize it
The Roll Up function enables the user to aggregate data from multiple columns into levels of subtotals
A USA map helps in visualizing the distribution of tweets across various states
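The roll-up idea in the steps above can be sketched with pandas (the tweet counts below are hypothetical; `pivot_table` with `margins=True` produces the subtotal "All" rows and columns):

```python
import pandas as pd

# Hypothetical tweet counts by state and gender
df = pd.DataFrame({
    "state":  ["NY", "NY", "CA", "CA"],
    "gender": ["M", "F", "M", "F"],
    "tweets": [10, 15, 20, 25],
})

# Roll up: per-state/per-gender totals plus "All" subtotal levels
rollup = pd.pivot_table(df, values="tweets", index="state",
                        columns="gender", aggfunc="sum", margins=True)
print(rollup)
```

The "All" row and column hold the subtotals for each level, with the grand total at their intersection.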
A crash is defined as an accident in which two or more vehicles collide on the road, usually causing damage and often killing or injuring people inside the vehicles and on the road
No person may drive a vehicle in public unless it is registered with the respective government. In a wealthy nation, more people mean more vehicles. The United States is one of the world's largest automobile markets by number of new vehicle registrations
Sentiment analysis is a technique that has gained a lot of popularity due to its wide range of applications across domains. It identifies the emotions and opinions expressed towards a person, product, or topic
To gain hands-on experience of the data science virtual lab, powered by our innovative platform ATH Precision
Join webinars led by data science experts and practitioners
7th January 2022 (Finished)
10th December 2021 (Finished)
26th November 2021 (Finished)
8th October 2021 (Finished)
16th September 2021 (Finished)
9th September 2021 (Finished)
20th August 2021 (Finished)
8th January 2021 (Finished)
5th December 2020 (Finished)
25th September 2020 (Finished)
28th August 2020 (Finished)
14th August 2020 (Finished)
17th July 2020 (Finished)
3rd July 2020 (Finished)
19th June 2020 (Finished)
5th June 2020 (Finished)
22nd May 2020 (Finished)
8th May 2020 (Finished)
1st May 2020 (Finished)
Lakshmi E | Assistant Vice President, Client Solutions
Analyttica Datalab
Nikhil Nene | Principal Analyst, Client Solutions | Analyttica Datalab
Rajeev Baphna | Founder and CEO | Analyttica Datalab