
Machine Learning Algorithms


In our previous article, we gave an overview of the difference between artificial intelligence (AI) and its subsets known as machine learning (ML) and deep learning (DL).

Both ML and DL have a wide range of uses across several industries and whilst the applications might be different, the algorithms and neural networks tend to have the same foundations.

This article talks about some of the most popular technical algorithms used in machine learning and what they do as well as examples of neural networks used for deep learning.

Machine Learning Algorithms

When you start learning data science and begin to research programming platforms like R and Python, it can be intimidating. Many of the techniques come with multi-page definitions and descriptions, and it is difficult to connect the detail to a practical use case. If you dream of becoming a true expert and earning the big salaries, you will need a handle on the ins and outs of everything, but as a starter, this brief guide attempts to define a realistic beginning.

  1. Linear Regression

This is one of the quickest algorithms for a machine learning beginner to master. Essentially, if we have a set of input variables (x) that are used to determine an output variable (y), the goal is to quantify the relationship between the two. This would be used in something like sales forecasting or risk assessment. It shows what happens to a dependent variable, e.g. sales, when changes are made to the independent variables.
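
As a minimal sketch of the idea, the following fits a straight line to one input variable using ordinary least squares. The `fit_line` helper and the advertising and sales figures are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (x) and resulting sales (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 6.0 + intercept  # forecast for a spend of 6.0
print(slope, intercept, predicted_sales)
```

The slope is the quantified relationship: how much the output is expected to change for each unit change in the input.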

  1. K-means Clustering

K-means is used in applications such as grouping images together or detecting activity types in motion sensors, as well as structured data use cases like customer segmentation. This unsupervised learning algorithm takes unlabelled data and separates it into ‘K’ groups. It assigns each data point to the group whose centre (the centroid) it is closest to, based on specific features.
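
The grouping step can be sketched in a few lines. This is a simplified illustration, not a production implementation: the `k_means` function is hypothetical, the points are made up, and the first K points are used as starting centroids (real implementations usually pick starting points more carefully).

```python
import math


def k_means(points, k, iterations=10):
    # Simplifying assumption: use the first k points as the starting centroids.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of the points assigned to it.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters


# Hypothetical customers described by two features.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```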

  1. Logistic Regression

Logistic regression makes predictions by applying a transformation function (the logistic, or sigmoid, function) to a weighted combination of the input values. Unlike linear regression, the output is the likelihood of an event occurring, a value between 0 and 1, rather than a precise number like a sales figure. This could be whether a student will pass a test or whether an employee is likely to be off sick, for example.
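
To illustrate the transformation, here is a rough sketch that trains a one-variable logistic regression with plain gradient descent. The function names, learning rate, number of epochs and the study-hours data are all assumptions made for this example.

```python
import math


def sigmoid(z):
    # Squashes any number into the range 0 to 1.
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


# Hypothetical data: hours studied and whether the student passed (1) or failed (0).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]

weight, bias = train_logistic(hours, passed)
p_low = sigmoid(weight * 1.0 + bias)   # probability of passing after 1 hour
p_high = sigmoid(weight * 6.0 + bias)  # probability of passing after 6 hours
print(p_low, p_high)
```

The output is a probability, so a threshold (commonly 0.5) is needed to turn it into a pass or fail decision.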

  1. Support Vector Machine (SVM)

This is a classification algorithm used for category assignment, such as detecting spam emails, and for sentiment analysis projects. It is a form of supervised learning that looks for a ‘hyperplane’, the boundary that separates and classifies a set of data, and positions it using the ‘support vectors’, the data points that lie closest to that boundary. It tends to suit small and medium-sized datasets and is often memory efficient, given that it uses only a subset of the training points to make its decisions.
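
The sketch below trains a linear SVM on two features by repeatedly nudging the boundary away from any point that falls inside the margin (sub-gradient descent on the hinge loss). This is only one simple way to fit an SVM; the function names, the hyperparameters and the centred toy data are assumptions for illustration.

```python
def train_linear_svm(points, labels, learning_rate=0.01, regularisation=0.01, epochs=200):
    w1, w2, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w1 * x1 + w2 * x2 + bias)
            # Regularisation always shrinks the weights slightly, which favours a wider margin.
            w1 -= learning_rate * regularisation * w1
            w2 -= learning_rate * regularisation * w2
            if margin < 1:
                # The point is misclassified or inside the margin: move the boundary.
                w1 += learning_rate * y * x1
                w2 += learning_rate * y * x2
                bias += learning_rate * y
    return w1, w2, bias


def classify(point, w1, w2, bias):
    # The sign of the score says which side of the hyperplane the point is on.
    return 1 if w1 * point[0] + w2 * point[1] + bias >= 0 else -1


# Hypothetical, already-centred features; labels are -1 (not spam) and +1 (spam).
points = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
labels = [-1, -1, -1, 1, 1, 1]

w1, w2, bias = train_linear_svm(points, labels)
predictions = [classify(p, w1, w2, bias) for p in points]
print(predictions)
```

Only the points near the boundary trigger updates once training settles, which mirrors the role of the support vectors.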

  1. Decision Trees

A supervised learning method used in classification type problems. A decision tree is probably best explained using an example. Imagine we have 30 students with gender (boy/girl), height and class variables. 15 out of 30 of them play soccer in their spare time. A decision tree will segregate the students based on the values of the three variables and identify which variable creates the most homogeneous groups of students.
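
One common way to measure how homogeneous a split is uses Gini impurity (0 means a group is perfectly pure). The sketch below scores each variable on a small, made-up set of eight students; the data, the field names and the choice of Gini impurity are assumptions for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 0.0 is pure, 0.5 is an even mix."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1 - p ** 2 - (1 - p) ** 2


def weighted_gini(rows, feature):
    """Impurity after splitting the rows by one feature, weighted by group size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row["plays_soccer"])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


# Hypothetical students (height already bucketed into tall/short).
students = [
    {"gender": "boy", "height": "tall", "school_class": "A", "plays_soccer": 1},
    {"gender": "boy", "height": "tall", "school_class": "B", "plays_soccer": 1},
    {"gender": "boy", "height": "short", "school_class": "A", "plays_soccer": 1},
    {"gender": "boy", "height": "short", "school_class": "B", "plays_soccer": 0},
    {"gender": "girl", "height": "tall", "school_class": "A", "plays_soccer": 0},
    {"gender": "girl", "height": "tall", "school_class": "B", "plays_soccer": 0},
    {"gender": "girl", "height": "short", "school_class": "A", "plays_soccer": 0},
    {"gender": "girl", "height": "short", "school_class": "B", "plays_soccer": 1},
]

features = ["gender", "height", "school_class"]
scores = {f: weighted_gini(students, f) for f in features}
best_feature = min(scores, key=scores.get)  # lowest impurity is the best first split
print(scores, best_feature)
```

A full decision tree repeats this choice within each resulting group until the groups are pure enough or a depth limit is reached.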

  1. Naive Bayes

A probability algorithm that outputs the chance of an event occurring given that another event has already occurred. For example, if a student fails one test, what is the probability of them passing or failing their next test? It assumes that all input variables are independent of each other given the outcome, hence the term ‘naïve’.
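
A minimal sketch of the calculation for categorical data follows. The `train` and `predict_proba` helpers, the add-one (Laplace) smoothing and the student records are assumptions made for this example; each feature is assumed to take one of two values.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    class_counts = Counter(labels)
    # (feature index, class label) -> how often each feature value was seen
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts


def predict_proba(row, class_counts, feature_counts):
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(row):
            # Naive step: multiply per-feature likelihoods as if they were independent.
            # Add-one smoothing, assuming two possible values per feature.
            score *= (feature_counts[(i, label)][value] + 1) / (count + 2)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical records: (previous test result, attended revision) -> next test result.
rows = [("fail", "no"), ("fail", "no"), ("fail", "yes"), ("pass", "yes"),
        ("pass", "yes"), ("pass", "no"), ("pass", "yes"), ("fail", "no")]
labels = ["fail", "fail", "pass", "pass", "pass", "pass", "pass", "fail"]

class_counts, feature_counts = train(rows, labels)
probabilities = predict_proba(("fail", "no"), class_counts, feature_counts)
print(probabilities)
```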

  1. Random Forest

Taking decision trees to the next level, a random forest algorithm builds a number of trees, each trained on a random sample of the data. The output will take a majority vote from the trees, so to speak, or take the average if the trees are producing numerical values.
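
The sketch below shows only the final aggregation step. The three "trees" are hand-written, hypothetical rules standing in for trained decision trees, and the numeric estimates are made up; a real random forest would learn each tree from the data.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    # The class predicted by the largest number of trees wins.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical stand-ins for three trained decision trees.
def tree_a(student):
    return "plays" if student["gender"] == "boy" else "does not play"


def tree_b(student):
    return "plays" if student["height_cm"] > 150 else "does not play"


def tree_c(student):
    return "plays" if student["school_class"] == "A" else "does not play"


student = {"gender": "girl", "height_cm": 160, "school_class": "A"}

# Classification: each tree votes and the forest returns the most common answer.
votes = [tree(student) for tree in (tree_a, tree_b, tree_c)]
forest_class = majority_vote(votes)

# Regression: the forest returns the average of the trees' numerical outputs.
tree_estimates = [10.0, 12.0, 14.0]
forest_estimate = mean(tree_estimates)

print(votes, forest_class, forest_estimate)
```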

  1. Principal Component Analysis (PCA)

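This algorithm is used in applications such as stock market prediction and pattern classification tasks. Principal component analysis reduces the number of variables in a dataset by combining correlated variables into a smaller set of new, uncorrelated variables (the principal components) that capture as much of the variation in the data as possible. This makes patterns in the data easier to identify.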
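
For two variables the calculation is small enough to write out by hand. The following sketch finds the first principal component from the 2x2 covariance matrix using the closed-form eigenvalue formula; the `first_component` function and the data points are hypothetical.

```python
import math


def first_component(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    largest = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)

    # Matching eigenvector, normalised to length 1.
    if sxy == 0:
        axis = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    else:
        length = math.hypot(sxy, largest - sxx)
        axis = (sxy / length, (largest - sxx) / length)

    explained = largest / (sxx + syy)  # share of total variance on this axis
    projected = [x * axis[0] + y * axis[1] for x, y in centred]
    return axis, explained, projected


# Hypothetical data: two strongly correlated variables.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
axis, explained, projected = first_component(points)
print(axis, explained, projected)
```

Here the two original variables are replaced by a single projected value per point, which keeps almost all of the variation.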

  1. K-Nearest Neighbour

As the term neighbour suggests, this algorithm classifies an item by looking for the items most similar to it. Given a new input, it finds the ‘K’ closest examples with known labels and assigns the most common class among them. It can work with unstructured data like images, where the algorithm needs to make a “best guess” in some cases about the likely output or classification of an input. It may not be accurate at first, but it can become very powerful as more labelled examples are added.
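
A short sketch of the idea, using straight-line (Euclidean) distance as the measure of similarity. The `knn_predict` function, the value of K and the labelled examples are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(query, examples, k=3):
    # Sort the labelled examples by distance to the query and keep the k closest.
    nearest = sorted(examples, key=lambda example: math.dist(query, example[0]))[:k]
    # The "best guess" is the most common label among those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples: (two numeric features, label).
examples = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.0), "small"),
    ((6.0, 6.0), "large"),
    ((7.0, 6.5), "large"),
    ((6.5, 7.0), "large"),
]

guess_a = knn_predict((2.0, 2.0), examples)
guess_b = knn_predict((6.0, 7.0), examples)
print(guess_a, guess_b)
```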

  1. Recommender System

A recommender system filters and predicts user ratings and preferences by using collaborative and content-based techniques. The most popular real-world examples are Netflix, Spotify and Amazon. In essence, it makes recommendations based on how different pieces of data have been classified, e.g. genre in Netflix, and on what users with similar tastes have liked.
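
As a rough sketch of the collaborative technique, the code below predicts a missing rating from the ratings of other users, weighting each user by how similar their tastes are (cosine similarity over the items both have rated). The function names, users, films and ratings are all made up for this example.

```python
import math


def cosine_similarity(a, b):
    shared = set(a) & set(b)  # items rated by both users
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = math.sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, item, ratings):
    weighted_sum = 0.0
    total_similarity = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        total_similarity += similarity
    # None means nobody comparable has rated the item yet.
    return weighted_sum / total_similarity if total_similarity else None


# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "ann": {"Film A": 5, "Film B": 4, "Film C": 1},
    "bob": {"Film A": 5, "Film B": 5, "Film C": 1, "Film D": 5},
    "cat": {"Film A": 1, "Film B": 1, "Film C": 5, "Film D": 1},
}

predicted = predict_rating("ann", "Film D", ratings)
print(predicted)
```

Because Ann's tastes are much closer to Bob's than to Cat's, Bob's high rating for the unseen film carries more weight in the prediction.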