How Do Machines Actually Learn?

The logic behind how machine learning algorithms work

Astha Sharma
5 min read · Jun 5, 2021
(Google-PhonlamaiPhoto via iStock by Getty Images)

As we all know, machine learning is the latest way of approaching and solving problems, and by now you have probably heard many explanations of “What is machine learning?”

The general idea behind machine learning is this: humans learn from past experience, while machines simply follow the instructions humans give them. But what if humans could train machines to learn from past data and let them improve over time as they are exposed to new data?

Great! So machine learning gives computers the ability to learn from past experience and to improve their performance as they are fed more and more data.

We got it… but wait: “How does it learn?”

Most of the time we disregard this question. We get so overwhelmed and fascinated by the trendy libraries that we start to think: why do I even need to learn the mathematics and logic behind the learning algorithms? Everything is already there; I just need to learn some syntax to import modules and call functions to get the output. Done!

But in reality, real-world problems need appropriate and feasible solutions, which come from approaching the problem the right way, and to validate that approach we need a clear understanding of the fundamentals.

So in this article I’d like to give a simple explanation of how machine learning works under the hood.

The ML task here is to learn the relationship between the input data and the output data.

Let us express this relationship as a mathematical function like the one given below. We can call this function a “model”.

y = f(x) = Wx + b

The equation above says that W times x plus b gives y. In the context of ML, x is our input, y is the expected output, and W and b are the parameters.

The goal of the ML algorithm is to find values for W and b such that the output of W times x plus b is as close as possible to the observed value y.
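To make this concrete, here is a minimal Python sketch of such a model. The function name predict and the parameter values used here are purely illustrative; the point is only that the model is a function whose behaviour is fixed by its parameters.

```python
# A minimal sketch of the linear model y = f(x) = W*x + b.
# W and b are the parameters the learning algorithm has to find;
# the values used below are arbitrary guesses, not learned ones.

def predict(x, W, b):
    """Return the model's prediction for input x."""
    return W * x + b

print(predict(4, W=2.0, b=1.0))  # 9.0 with these guessed parameters
```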

The function has two parts:

  • The form of the function (chosen by the user). The function can take any form, depending on the algorithm.
  • The parameters/weights of the function (found by the algorithm).

So basically the entire ML process is about gathering lots of examples of inputs (x) and outputs (y), and based on those examples the algorithm tries to figure out the values of W and b by making educated guesses and improving them.

Learning the Parameters via Feedback

To learn the parameters, we follow these steps (a small code sketch putting them together appears after the list):

(Figure: mathematical representation of an ML algorithm)
  • We start by collecting lots of input data x and the corresponding output data y, and pass them to the model.
  • Define the form of the function that the model has to follow. (In this article I am using a simple linear function for the explanation.) For example: y = f(x) = Wx + b
  • Make some arbitrary guess for W, which gives us a predicted outcome ŷ. We will usually find some difference between the actual outcome (y) and the predicted outcome (ŷ).
  • Then define the cost function: a function that measures the performance of our ML model on the given data. It quantifies the error between the predicted values and the expected values and expresses it as a single real number. Depending on the problem, the cost function can take many different forms; the most common cost function for a linear model is Mean Squared Error (MSE).

The goal is to find the values of the model parameters (in this case W and b) for which the cost function (MSE) returns as small a number as possible.

  • To minimise the cost function and improve W and b, we use an optimisation algorithm such as gradient descent. Gradient descent is a first-order iterative optimisation algorithm for finding the minimum of a function. It is the most commonly used optimisation algorithm (let’s not get into the details here), but again, depending on the problem, we can use different optimisation algorithms.
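Putting these steps together, here is a rough Python sketch under simple assumptions: a handful of made-up (x, y) pairs, MSE as the cost, and plain gradient descent with an arbitrarily chosen learning rate. It is an illustration of the procedure above, not a prescribed implementation.

```python
import numpy as np

# Toy data generated from y = 3x + 2; the model should recover W ≈ 3, b ≈ 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 8.0, 11.0, 14.0])

W, b = 0.0, 0.0          # arbitrary initial guesses for the parameters
learning_rate = 0.05     # illustrative step size

for step in range(2000):
    y_pred = W * x + b               # predicted outcomes
    error = y_pred - y               # difference from the actual outcomes
    cost = np.mean(error ** 2)       # Mean Squared Error

    # Gradients of the MSE with respect to W and b
    dW = 2 * np.mean(error * x)
    db = 2 * np.mean(error)

    # Move each parameter a small step against its gradient
    W -= learning_rate * dW
    b -= learning_rate * db

print(W, b)  # close to 3 and 2 after training
```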

Let’s understand the above steps with the help of a simple example to get a clear idea:

Let’s say we have a training set which contains a single data point (x = 4, y = 14). We know that if x is 4, y should be 14. We need to find the pair a, b (here a plays the role of W) that fits these numbers, as in the equation below.

a*4 + b = 14

So before the model starts its search for the a and b values, we need to assign them some random numbers, since the model does not start with any prior knowledge: no mathematics, no assumptions, nothing.

Let’s say, in our example, we initialized a as 7 and b as 10. This would give us:

7*4 + 10 = 38

The error is 38 − 14 = 24. This tells us that our a and b values are too high.

So the next time, the model tries new values guided by how far it missed the actual value and in which direction: was the guess higher or lower than the expected value?

Now our model might try 2 and 4 next. This would give us:

2*4 + 4 = 12

This is a much better guess. Our error is only 12 − 14 = −2. This error means that we have now underestimated the y value. After going on like this for a while, let’s say our model tries 3 and 2:

3*4 + 2 = 14

OK, we are done! We were able to figure out a and b values, but this is not the only pair that satisfies the given equation; there might be many such pairs, and in reality the process is not this fast. The whole process is a trial-and-error approach, and in the end the model comes up with optimal values with as little error as possible.
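For completeness, the guesses from this example can be checked with a few lines of Python; the list of guesses simply replays the ones used above.

```python
# Replaying the trial-and-error guesses for the single data point x = 4, y = 14.
x, y = 4, 14

for a, b in [(7, 10), (2, 4), (3, 2)]:
    prediction = a * x + b
    error = prediction - y
    print(f"a={a}, b={b}: prediction={prediction}, error={error}")

# a=7, b=10: prediction=38, error=24
# a=2, b=4:  prediction=12, error=-2
# a=3, b=2:  prediction=14, error=0
```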

So this is how an ML model actually learns, in principle.

Of course, not all machine learning algorithms follow the linear model. The mathematical equation to be solved is different for each algorithm, depending on the problem, and the linear model is just one of them, but the general logic behind the learning is almost the same. What we are doing, at the end of the day, is trying to come up with a mathematical description that closely estimates what happens in the real world.

That’s all. This basic idea of how a machine learns behind the scenes helped me a lot in developing my interest in learning more about the different algorithms and in understanding what is being optimized and what value we are trying to estimate.

I hope someone finds this useful :)

Thank you!
