Deep Learning notation

2021-10-15 | aprates.dev

Read this post in Portuguese

A computer would deserve to be called intelligent if it could deceive a human into believing that it was human. - Alan Turing

Deep maths, ufff…

In mid-2021 I started diving into a machine learning course I thought I should do. A long time ago, when I graduated, my graduation paper was about chatbots with emotions and how humans would react to them. I wanted to better understand how the techniques had evolved since then, in 2006, and found something a bit different from what I was expecting.

In the current state of the field, you simply cannot avoid some basic knowledge of Python libraries (such as NumPy), linear algebra, and a good dose of mathematical notation when reading descriptions of machine learning methods. And it can be very frustrating at times.

One bit of notation in an equation that you don't grasp completely might prevent you from implementing the concept you are trying to learn. Coming in as an experienced developer, I had that beginner-like feeling while facing modern machine learning basics.

So here I have collected some mathematical notation that I came across while doing the deep learning course, along with some notes on concepts that felt mysterious to me, like cost and derivatives.

I wrote these notes mostly for my personal use, but posted them because I wish I had found something like this when searching the Internet. I must also say that notation varies a lot from author to author, and that I am still learning, so take my notes with a grain of salt.

Principle

The activation of a node in a neural network is something of the form:

output = activation_function(dot_product(weights, inputs) + bias)
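As a rough illustration of the formula above, here is a minimal sketch in plain Python. The sigmoid activation, the function names (sigmoid, node_output) and the sample numbers are my own assumptions for the example, not something prescribed by the course.

```python
import math

def sigmoid(z):
    """Assumed activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(weights, inputs, bias, activation=sigmoid):
    """Compute activation(dot_product(weights, inputs) + bias) for one node."""
    # Dot product: multiply each weight by its input and add everything up.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values, chosen so the weighted sum is exactly zero.
weights = [0.5, -0.25]
inputs = [2.0, 4.0]
bias = 0.0

a = node_output(weights, inputs, bias)
print(a)  # sigmoid(0) is 0.5
```

With NumPy the dot product would typically be written as np.dot(weights, inputs), but the idea is the same.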

General Notation

As per Andrew Ng, from the deeplearning.ai specialization on Coursera [2]

Hyperparameters

Hyperparameters are the settings that control how the parameters w and b are learned.

Concepts

Cost

The loss function measures the difference between the actual output and the model's predicted output for a single example, that is, y versus ŷ.

Although loss is sometimes also referred to as cost, they are not the same thing. The cost function is the average loss over the complete training dataset, that is, over all the labels in Y.
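To make the loss versus cost distinction concrete, here is a small sketch. I am assuming a squared-error loss only because it is easy to follow by hand; the course may use a different loss (such as cross-entropy), and the names and numbers below are made up for illustration.

```python
def loss(y, y_hat):
    """Loss for ONE example (assumed here: squared error)."""
    return (y - y_hat) ** 2

def cost(ys, y_hats):
    """Cost: the average of the per-example losses over the whole dataset."""
    losses = [loss(y, y_hat) for y, y_hat in zip(ys, y_hats)]
    return sum(losses) / len(losses)

# Hypothetical labels and predictions for four training examples.
ys = [1.0, 0.0, 1.0, 0.0]
y_hats = [0.5, 0.5, 1.0, 0.0]

single = loss(ys[0], y_hats[0])   # loss of the first example only
average = cost(ys, y_hats)        # average loss over all four examples
print(single, average)            # 0.25 0.125
```

So the loss talks about one prediction, while the cost summarizes how the model is doing on the whole training set.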

Derivatives (dx)

Collected from a note I found useful, posted on the forum by BurntCalcium (nickname), another student:

Basically if f is a function of x, you're taking a ratio of the *change in f* to the *change in x*, given that the latter is an infinitesimally small quantity. The 'd' that is used while writing the notation represents the Greek letter Δ (Delta), which is commonly used to show change in a quantity in physics and math. So basically dx would mean the change in x, df(x) would mean the change in f(x), and df(x)/dx as a whole is called the derivative of f(x) with respect to x. And of course, in the course the instructors have adopted the notation that dx represents df(x)/dx, however outside the context of this course dx would simply mean change in x.
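The "ratio of a small change in f to a small change in x" idea can be checked numerically. This is only a sketch: the function f(x) = x², the step size h and the helper name numerical_derivative are my own choices for the example, not part of the quoted note.

```python
def f(x):
    """Example function: f(x) = x squared, whose exact derivative is 2x."""
    return x ** 2

def numerical_derivative(func, x, h=1e-6):
    """Approximate df/dx at x as (change in f) / (change in x).

    Uses a small step h on both sides of x (a central difference).
    """
    change_in_f = func(x + h) - func(x - h)
    change_in_x = 2 * h
    return change_in_f / change_in_x

slope = numerical_derivative(f, 3.0)
print(slope)  # very close to 6, since the exact derivative 2x is 6 at x = 3
```

The smaller the change in x, the closer this ratio gets to the true derivative, up to the limits of floating-point precision.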

Reference

Deep Learning on Coursera

© aprates.dev, 2021 - content on this site is licensed under

Creative Commons BY-NC-SA 4.0 License
