💾 Archived View for aprates.dev › log › 2021-10-15-ml-notation.gmi captured on 2022-03-01 at 15:04:59. Gemini links have been rewritten to link to archived content
2021-10-15 | aprates.dev
A computer would deserve to be called intelligent if it could deceive a human into believing that it was human. - Alan Turing
In mid-2021 I started a machine learning course I thought I should take. A long time ago, when I graduated, my graduation paper was about chatbots with emotions and how humans would react to them. I wanted to better understand how the techniques had evolved since then, in 2006, and found something a bit different from what I was expecting.
As things currently stand, you cannot avoid some basic knowledge of Python libraries (such as NumPy), linear algebra, and a good dose of mathematical notation when reading descriptions of machine learning methods. And it can be very frustrating at times.
A single bit of notation in an equation that you don't completely grasp might prevent you from implementing the concept you are trying to learn. Coming in as an experienced developer, I had that beginner-like feeling while facing modern machine learning basics.
So here I have collected some mathematical notation that I came across while doing the deep learning course, along with some notes on concepts that felt mysterious to me, like cost and derivatives.
I wrote these notes mostly for my personal use, but I am posting them because I wish I had found something like this when searching the Internet. I must also say that notation varies a lot from author to author, and that I am still learning, so take my notes with a grain of salt.
The activation of a node in a neural network is something of the form:
output = activation_function(dot_product(weights, inputs) + bias)
as per Andrew Ng, in the deeplearning.ai specialization on Coursera [2]
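As a rough sketch, the formula above could look like this in NumPy. The sigmoid activation, the function names, and the example values are assumptions picked for illustration; the notes do not say which activation function is used.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, one common choice of activation function (assumed here)."""
    return 1.0 / (1.0 + np.exp(-z))


def node_output(weights, inputs, bias):
    """Activation of a single node: activation_function(w . x + b)."""
    z = np.dot(weights, inputs) + bias  # weighted sum of the inputs, plus the bias
    return sigmoid(z)


# Hypothetical values, chosen so the arithmetic is easy to follow.
w = np.array([0.5, -0.25])
x = np.array([2.0, 4.0])
b = 0.0

# z = 0.5 * 2.0 + (-0.25) * 4.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
a = node_output(w, x, b)
print(a)
```

Swapping `sigmoid` for another function (tanh, ReLU, and so on) changes only the last step; the weighted sum plus bias stays the same.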
Hyperparameters (settings such as the learning rate) actually control how the parameters w and b are learned.
The loss function measures the difference between the actual output and the output predicted by the model, that is, y vs. ŷ (y-hat).
Although loss is sometimes also referred to as cost, they are not the same thing. The loss is computed for a single example, while the cost function is the average loss over the complete training dataset, that is, over all of Y.
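To make the distinction concrete, here is a minimal sketch, assuming a squared-error loss (the course may well use a different loss, such as cross-entropy). The names and the tiny dataset are made up for illustration.

```python
import numpy as np


def squared_error_loss(y, y_hat):
    """Loss for each example: squared difference between actual and predicted."""
    return (y - y_hat) ** 2


def cost(y, y_hat):
    """Cost: the average of the per-example losses over the whole dataset."""
    return np.mean(squared_error_loss(y, y_hat))


# Hypothetical labels Y and predictions Y_hat for three training examples.
Y = np.array([1.0, 0.0, 1.0])
Y_hat = np.array([0.9, 0.2, 0.6])

losses = squared_error_loss(Y, Y_hat)  # one number per example
J = cost(Y, Y_hat)                     # a single number for the dataset

print(losses)
print(J)
```

The loss gives one value per training example, while the cost collapses them into the single number that training tries to minimize.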
Collected from a note I found useful, posted on a forum by another student, BurntCalcium (nick):
Basically if f is a function of x, you're taking a ratio of the *change in f* to the *change in x*, given that the latter is an infinitesimally small quantity. The 'd' that is used while writing the notation represents the Greek letter Δ (Delta), which is commonly used to show change in a quantity in physics and math. So basically dx would mean the change in x, df(x) would mean the change in f(x), and df(x)/dx as a whole is called the derivative of f(x) with respect to x. And of course, in the course the instructors have adopted the notation that dx represents df(x)/dx, however outside the context of this course dx would simply mean change in x.
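The "ratio of changes" idea can be checked numerically. The sketch below assumes an example function f(x) = x², whose exact derivative is 2x, and uses a small but finite dx in place of an infinitesimal one; the function names are hypothetical.

```python
def f(x):
    """Example function; its exact derivative is 2 * x."""
    return x ** 2


def approx_derivative(func, x, dx=1e-6):
    """Ratio of the change in func to a small change dx in x."""
    df = func(x + dx) - func(x)  # change in f caused by nudging x by dx
    return df / dx


slope = approx_derivative(f, 3.0)
print(slope)  # close to the exact value 2 * 3 = 6
```

The smaller dx gets, the closer the ratio comes to the true derivative, until floating-point rounding starts to dominate.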
Comment on one of my posts, talk to me, say: hello@aprates.dev
or /msg aprates on irc.libera.chat
© aprates.dev, 2021 - content on this site is licensed under