Let's say two friends, Joe and Bob, meet for drinks at a local bar after a year of not talking. When Bob and Joe lock eyes, Joe walks over to Bob, and they both immediately, from the subconscious mind, perform the signature handshake that they've trained over the years. For us humans, this kind of reflex is usually formed over a lifetime of experience, as it has been for Joe and Bob.
Let's also take the example of listening to a song. Once you've heard it, your subconscious mind picks up that rhythm and trains, schools, and houses it, instilling it in the back of your head. When you revisit it months later, it's as if it never left 🤯.
Computers, on the other hand, can do this in a slightly different yet mind-blowing way. The incredible thing is that a computer can predict your next interaction in less than a second, without taking into account the external circumstances and paradigms that we humans must use to connect with one another 🧠.
So how does it work?
From the ground up, this is accomplished through machine learning and deep learning. Humans' complex knowledge, intuitions, and impulses are difficult for machines to comprehend, so instead of conforming to that limitation, machines use data instead. In doing so, the implications of what a computer can predict skyrocket into something uncanny. Predictive computer systems would open up new possibilities, ranging from robots that better navigate human environments, to emergency response systems that predict falls, to Google Glass-style headsets that feed you suggestions for what to do in various situations.
MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) has made a vast, possibly uncanny breakthrough in developing an algorithm that can anticipate interactions more accurately than ever before 😳.
The system, trained on YouTube videos and TV shows like "The Office" and "Desperate Housewives," can predict whether two people will hug, kiss, shake hands, or slap five. In a second scenario, it can also predict which object will appear in a video five seconds later.
All jokes aside, this comically humorous video is just the beginning of something great. Then again, as stated earlier, the implications of this are crazy.
Well, how do machines predict these types of things?
This is done through artificial intelligence, which refers to systems or machines that perform tasks by mimicking human intelligence and can iteratively improve themselves based on the data they collect.
AI comes in two flavors: machine learning and deep learning.
Deep learning 😎
Deep learning is a subset of machine learning that employs artificial neural networks to mimic the human brain's learning process. In practice, it relies on neural networks with three or more layers.
Machine Learning 🤖
Machine learning employs two techniques: supervised learning, which involves training a model on known input and output data in order to predict future outputs, and unsupervised learning, which consists of discovering hidden patterns or intrinsic structures in input data. In a nutshell, machine learning is AI that can adapt automatically with minimal human intervention.
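As a quick, minimal sketch of the two techniques (using scikit-learn; the numbers are invented purely for illustration):

```python
# A minimal sketch of the two techniques, using scikit-learn.
# All data here is made up purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised learning: known inputs AND known outputs.
# We fit house sizes (m^2) to prices (k$) and predict a new one.
sizes = np.array([[50], [80], [120], [200]])
prices = np.array([150, 240, 360, 600])
model = LinearRegression().fit(sizes, prices)
print(model.predict([[100]]))  # predicted price for a 100 m^2 house

# Unsupervised learning: inputs only, no labels.
# KMeans discovers structure (clusters) hidden in the data by itself.
points = np.array([[1, 1], [1.2, 0.8], [9, 9], [8.8, 9.1]])
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(points)
print(clusters)  # e.g. [0 0 1 1]: two groups found without labels
```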
Neural Networks 🧠
A neural network is a set of algorithms that attempts to recognize underlying relationships in a set of data using a process similar to how the human brain works. In this context, a neural network is a system of neurons, organic or artificial in nature; its simplest artificial building block is known as the perceptron.
Just like the name suggests, neural networks are built from neurons. A typical biological neuron, for example, receives signals from other neurons via a network of fine structures known as dendrites; an artificial neuron mimics this pattern of incoming connections.
Photo demonstrating machine learning and deep learning distinctions.
This structure can power things like image detection through convolutional neural networks.
To create this, machines "learn" by finding patterns in similar data. Think of data as information you acquire from the world. The more data given to a machine, the "smarter" it gets.
To do so, there is a lot of math!
I won't bore you, but a few good book recommendations for diving deep into machine learning are Pattern Recognition and Machine Learning by Christopher Bishop, The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, and Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.
In this article, though, we won't go too in-depth, but we will go deep enough for you to understand and apply this knowledge in a project (stay tuned for the next article 🤫).
Let's go over the basic architectural overview of deep learning 😁
Note: this is not all the math behind machine learning; it covers just what is required to create your first deep learning model!
1. The "Neuron" 🧠
It is a collection of mathematical operations that connects entities.
Consider the following problem: estimating the price of a house based on its size. It can be modeled as shown below. But before that, guess which type of learning model this is!
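(If you guessed supervised learning, you're right: we have both a known input, the size, and a known output, the price.) As a minimal sketch of what that single "neuron" computes, here is a tiny gradient-descent fit of price = w · size + b, on numbers invented for illustration:

```python
# A single "neuron" for house pricing: price ≈ w * size + b.
# The numbers below are invented purely for illustration.
import numpy as np

sizes = np.array([50.0, 80.0, 120.0, 200.0])     # input x (m^2)
prices = np.array([150.0, 240.0, 360.0, 600.0])  # real output y (k$)

w, b = 0.0, 0.0          # zero-initialized parameters
lr = 1e-5                # learning rate
for epoch in range(10000):
    pred = w * sizes + b              # forward pass
    error = pred - prices             # prediction error
    w -= lr * (error * sizes).mean()  # gradient step on w
    b -= lr * error.mean()            # gradient step on b

print(w, b)              # w ≈ 3, b ≈ 0: price ≈ 3 * size
```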
In general, deep learning models, also known as MLPs (Multilayer Perceptrons), are feedforward neural networks organized into several layers, with information flowing only from the input layer to the output layer. Each layer is made up of a specific number of neurons, and we distinguish between:
- The input layer
- The hidden layers
- The output layer
The following graph represents a neural network with 5 neurons in the input layer, 3 in the first hidden layer, 3 in the second hidden layer, and 2 in the output layer.
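As a sketch, that exact 5-3-3-2 architecture can be declared in a few lines of Keras (the ReLU activations are my assumption; the graph does not specify them):

```python
# A 5-3-3-2 network like the one in the graph, in Keras.
# The ReLU activations are an assumption made for illustration.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),                    # input layer: 5 neurons
    tf.keras.layers.Dense(3, activation="relu"),   # first hidden layer: 3
    tf.keras.layers.Dense(3, activation="relu"),   # second hidden layer: 3
    tf.keras.layers.Dense(2),                      # output layer: 2 neurons
])
model.summary()
```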
Some variables in the hidden layers can be interpreted in terms of the input features: in the house-pricing example, if the first neuron of the first hidden layer pays more attention to the variables x1 and x2, it can be interpreted as quantifying the house's family size.
The universal approximation theorem
In real life, deep learning is the approximation of a given function f. The following theorem makes this approximation possible and accurate:
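Here is a sketch of the classical statement, after Cybenko (1989) and Hornik (1991); the exact hypotheses vary slightly between versions:

```latex
% Universal approximation theorem (one classical form).
Let $\sigma$ be a continuous, bounded, non-constant activation function,
and let $K \subset \mathbb{R}^n$ be a compact set (*). Then for every
continuous function $f : K \to \mathbb{R}$ and every $\varepsilon > 0$,
there exist an integer $N$ and parameters $v_i, b_i \in \mathbb{R}$,
$w_i \in \mathbb{R}^n$ such that the one-hidden-layer network
\[
  g(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right)
\]
satisfies $\sup_{x \in K} \lvert f(x) - g(x) \rvert < \varepsilon$.
```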
(*) A set is said to be compact in finite dimensions if it is closed and bounded. The main takeaway from this theorem is that, in principle, deep learning can approximate any problem that can be expressed mathematically as a function.
Data Preprocessing
In any machine learning project, we generally divide our data into 3 sets:
- Train set: used to train the algorithm and construct batches
- Dev set: used to fine-tune the algorithm and evaluate bias and variance
- Test set: used to estimate the generalization error/precision of the final algorithm
The following table sums up a common split of the three sets according to the size m of the data set:
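A common rule of thumb, following the convention popularized by Andrew Ng's deep learning courses (the exact ratios are indicative):
- Small data sets (m up to roughly 10,000 rows): 60% train / 20% dev / 20% test
- Very large data sets (m around 1,000,000 rows or more): 98% train / 1% dev / 1% test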
Standard deep learning algorithms require a large data set with a very large number of sample rows. Before splitting the data, we usually also normalize the inputs; a quick sketch of both steps is shown below. With the data ready, we will then look at the training algorithm in the following section.
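A minimal sketch, assuming made-up data and the 60/20/20 rule of thumb above:

```python
# Splitting a data set into train/dev/test sets and normalizing the inputs.
# X and y are made-up placeholders; the 60/20/20 split is the rule of thumb above.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)   # 1000 rows, 5 features (invented)
y = np.random.rand(1000)      # 1000 targets (invented)

# First carve out 20% for test, then split the rest 75/25 -> 60/20 overall.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_dev, y_train, y_dev = train_test_split(X_train, y_train, test_size=0.25)

# Normalize inputs using statistics computed on the training set only,
# so no information leaks from dev/test into training.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_dev = (X_dev - mu) / sigma
X_test = (X_test - mu) / sigma
```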
2. Learning Algorithm
Learning in neural networks is the process of calculating the weights of the parameters associated with the network's various regressions. In other words, we want to find the best parameters that give the best prediction/approximation of the real value starting from the input. For this, we define a loss function, denoted J, that quantifies the difference between the real and predicted values over the overall training set; typically it is the average J = (1/m) Σ L(ŷ, y) of a pointwise cost L over the m training rows (*). We reduce J by taking two major steps:
- Forward Propagation: We propagate the data through the network either entirely or in batches, and we calculate the loss function on each batch, which is simply the sum of the errors committed at the predicted output for the various rows.
- Backpropagation: this involves calculating the gradients of the cost function with respect to the various parameters and then updating them using a descent algorithm.
We iterate the same process a number of times, called the number of epochs. After defining the architecture, the learning algorithm is written as shown below:
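Schematically, in Python-style pseudocode (the names model, batches, loss_fn, and update are illustrative placeholders, not a real library API):

```python
# The learning loop, schematically. `model`, `batches`, `loss_fn`, and
# `update` are illustrative placeholders, not a real library API.
for epoch in range(num_epochs):          # repeat for a number of epochs
    for X_batch, y_batch in batches:     # propagate the data batch by batch
        y_pred = model.forward(X_batch)                 # forward propagation
        J = loss_fn(y_pred, y_batch)                    # loss on this batch
        grads = model.backward(J)                       # backpropagation
        update(model.parameters, grads, learning_rate)  # descent step
```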
(*) The cost function L evaluates the distance between the real and predicted values at a single point.
So, now that you have a bit of an understanding of gradient descent, forward propagation, and so on, let's take a look at how a neural network is first created. This will show how to load the libraries and give a small example of forward propagation, backpropagation, and the like.
3. Parameters' initialization
The first step after defining the architecture of the neural network is parameter initialization. It is equivalent to injecting initial noise into the model's weights.
- Zero initialization: one can think of initializing the parameters with 0's everywhere, i.e., W=0 and b=0. Using the forward propagation equations, however, we note that all the hidden units will then be symmetric, which penalizes the learning phase.
- Random initialization: a commonly used alternative, which consists of injecting random noise into the parameters. If the noise is too large, some activation functions might get saturated, which can later affect the computation of the gradient.
Two of the most famous initialization methods are:
- Xavier's: it consists of filling the parameters with values randomly sampled from a centered normal distribution whose variance depends on the size of the previous layer: W ~ N(0, 1/n^[i-1]).
- Glorot's: the same approach with a different variance: W ~ N(0, 2/(n^[i-1] + n^[i])).
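A sketch of these schemes in NumPy (the layer sizes are invented for illustration):

```python
# Three ways to initialize the weights of a layer with n_in inputs
# and n_out outputs (the sizes here are invented for illustration).
import numpy as np

n_in, n_out = 5, 3

W_zero = np.zeros((n_out, n_in))              # zero init: symmetric units
W_rand = np.random.randn(n_out, n_in) * 0.01  # small random noise

# Xavier: variance 1 / n_in
W_xavier = np.random.randn(n_out, n_in) * np.sqrt(1.0 / n_in)

# Glorot: variance 2 / (n_in + n_out)
W_glorot = np.random.randn(n_out, n_in) * np.sqrt(2.0 / (n_in + n_out))

b = np.zeros((n_out, 1))                      # biases can safely start at zero
```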
4. Forward and Backpropagation
Before diving into the algebra behind deep learning, we will first set the notation that will be used in writing out the equations of both forward propagation and backpropagation.
Neural Network's representation
The neural network is a sequence of regressions of the form z = Wx + b, each followed by an activation function a = g(z). Together, these define what we call forward propagation; W and b are the learned parameters at each layer. Backpropagation is likewise a sequence of algebraic operations, carried out from the output towards the input.
Forward propagation
- Algebra through the network
Let us consider a neural network having L layers. At each layer [i], the forward pass computes:
z^[i] = W^[i] a^[i-1] + b^[i],   a^[i] = g(z^[i])
where a^[0] = x is the input row and the prediction is ŷ = a^[L].
- Algebra through the training set
These equations give the prediction of the output for a single row of data passed through the neural network. When dealing with an m-row data set, repeating these operations separately for each row is very costly. Instead, we stack the m rows as the columns of matrices, and we have, at each layer [i]:
Z^[i] = W^[i] A^[i-1] + b^[i],   A^[i] = g(Z^[i])
The parameter b^[i] uses broadcasting to repeat itself through the m columns.
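A sketch of one such vectorized layer in NumPy (the shapes are invented for illustration); note how b broadcasts across the m columns:

```python
# Vectorized forward propagation for one layer over m rows at once.
# Shapes invented for illustration: 5 inputs, 3 hidden units, m = 4 rows.
import numpy as np

m = 4
A_prev = np.random.randn(5, m)   # activations of the previous layer
W = np.random.randn(3, 5)        # weights of the current layer
b = np.zeros((3, 1))             # bias: one column, broadcast over m

Z = W @ A_prev + b               # b repeats itself across the m columns
A = np.maximum(0, Z)             # ReLU activation (an assumed choice)
print(A.shape)                   # (3, 4): 3 units for each of the m rows
```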
Backpropagation
Backpropagation is the second step of learning, which consists of injecting the error committed in the prediction (forward) phase back into the network and updating its parameters to perform better on the next iteration. It is, hence, an optimization of the function J, usually through a descent method.
Computational graph
Most of the descent methods require the computation of the gradient of the loss function, denoted ∇J(θ).
In a neural network, the operation is carried out using a computational graph which decomposes the function J into several intermediate variables.
Let us consider the following function: f(x, y, z) = (x + y) · z
We carry out the computation using two passes:
- Forward propagation: computes the value of f from inputs to output: f(−2, 5, −4) = −12
- Backpropagation: recursively applies the chain rule to compute the gradients from the output back to the inputs. Writing q = x + y, we get ∂f/∂z = q = 3 and ∂f/∂q = z = −4, hence ∂f/∂x = ∂f/∂y = −4.
These derivatives can be summarized node by node in the corresponding computational graph.
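This is exactly what the automatic differentiation engines of modern libraries do. A minimal check of the example above in PyTorch (one of the two libraries discussed below):

```python
# Checking the computational-graph example with PyTorch's autograd.
import torch

x = torch.tensor(-2.0, requires_grad=True)
y = torch.tensor(5.0, requires_grad=True)
z = torch.tensor(-4.0, requires_grad=True)

f = (x + y) * z        # forward pass: f = -12
f.backward()           # backward pass: chain rule from output to inputs

print(f.item())                # -12.0
print(x.grad, y.grad, z.grad)  # df/dx = z = -4, df/dy = z = -4, df/dz = x+y = 3
```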
How to load and begin a neural network
TensorFlow (developed by Google) and PyTorch (developed by Facebook) are the two main libraries for building neural networks. They are capable of performing similar tasks, but the former is more production-ready, whereas the latter is better for building rapid prototypes due to its ease of learning. These two libraries are popular among the community and businesses because they can take advantage of the power of NVIDIA GPUs. This is very useful, and sometimes required, when processing large data sets such as a corpus of text or an image gallery.
pip install tensorflow
If you want to enable GPU support, you can read the official documentation or follow this guide. After setting it up, your Python instructions will be translated into CUDA by your machine and processed by the GPUs, so your models will run incredibly faster.
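From there, a minimal end-to-end sketch looks like this (the architecture and the random data are my own assumptions, purely to show the mechanics):

```python
# A minimal "hello world" neural network in TensorFlow/Keras.
# The data is random noise, purely to show the mechanics.
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 5)    # 100 made-up rows with 5 features
y = np.random.rand(100, 1)    # 100 made-up targets

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")   # descent method + loss J
model.fit(X, y, epochs=5, batch_size=16)      # forward + backprop loop
print(model.predict(X[:3]))                   # predictions for 3 rows
```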
So that is how you would load the basics for a neural network. Whew, that was a lot! Since I'm new to the field, I have not created my first neural network (yet), but one day we both will! In the spirit of the example above, we would load our libraries to drive the GPU, then set up our parameters, and then get into the actual math of it all.
Now, what are the pros and cons of predicting the future, from my take? 🤔
Pros 👍
To some extent, machine learning and data science can predict future events, trends, and customer behavior. These forecasts can help businesses make better decisions about where to allocate resources and how to respond to market changes. I honestly believe that machine learning can even save lives!
Cons 👎
- Data collection: to train on, machine learning requires massive data sets that are inclusive/unbiased and of high quality.
- Time and resources: training and iterating on models takes considerable time and computing power.
- Interpretation of results: a model's outputs can be difficult to interpret correctly.
- High susceptibility to errors.
Overall, I do think there are parts of machine learning that are bad, but ultimately the good can outweigh the cons if everything is monitored appropriately!
Well, what is the future of machine learning? 🤔
Because machine learning algorithms have the potential to make more accurate predictions and business decisions, many companies have already begun to use them. Machine learning companies received $3.1 billion in funding in 2020. Machine learning has the potential to transform entire industries.
With machine learning being so prevalent in our lives today, it's difficult to envision a world without it.