💻 Technical
Series: AI Crash Course

Intro to AI - Part 1: Features, Embeddings, and Losses

Amol Kapoor

Module Description:
Ubiquity Extended Team member Amol Kapoor begins this Intro to AI Series with an explanation of 3 foundational concepts of machine learning: features, embeddings, and losses.

Full Transcript:
Hi, my name is Amol Kapoor. I'm currently CTO and co-founder of a startup called Soot, where we are working on building a visual file system to better help everyone organize their data. Before that, I was at Google Research. I worked on a couple of things. I worked on turning 2D into 3D. I worked on large scale graph mining and one of my more fun projects was I was working on making robot dogs learn how to dance using YouTube videos of dogs. I've been doing AI for about nine years now. Before that, I spent four years doing connectomics, that's the study of putting human brain circuits onto silicon and then before that I was doing oncology.

A few quick notes. So in this talk I'll be using the terms AI, deep learning, and ML somewhat interchangeably. These are technically different things, but for the purposes of this talk, we can treat them as synonyms. At its core, an ML model has three parts: features, embeddings, and losses. The features are just the input values. These form the baseline, or the limit, of what information a model can access. So on the right hand side as an example, these values labeled X, these colors and these shapes, that's the data that the model gets. The model can't extrapolate beyond that because it doesn't have additional information. Embeddings are the intermediate states in a model. You can think of them as compressed stores of information. It's the same information just in a different format. In this case, it's numeric vectors. And so here you have two different models. They're actually the same architecture and you can see that these intermediate layers, which we call embeddings, have different kinds of information. The information that's stored in those embeddings is a composite of what information is passed in as features and what information is selected for by the loss. So a loss function, you can think of that as the goal of the model. It carves out an embedding space, acting as a magnetic attractor for information. In other words, the loss function determines what information is important and how we wanna compress it. So again, going back to the right hand side, we have two models with the exact same architecture, the exact same inputs, but they have two different loss functions. The one on the left is trying to select for data based on color and so the embeddings, the intermediate information, end up losing that shape data and select for the color data. On the right hand side, you have a model that's trying to select for shape and so the embeddings, that intermediate representation, lose the color information and begin to select only on shape.
So that's the very high level overview.

Let's dive into the linear algebra of this and start from the beginning of how these models are actually put together. These days, a lot of advances in AI are because of neural networks, and the neuron is the building block of a neural network. A neuron is a learnable function. It takes in a single number and transforms it into some other number. So as an example, you might have a single neuron, which here we define as f of x. It takes in some number x and spits out some number y and you could plot that on a chart. So you might get this blue line as an example. A neural network is a combination of many neurons. It adds and subtracts the outputs of many, many different kinds of learnable functions. So if you have a blue function and a green function and a red function, you might get many different kinds of outputs. Generally speaking, a neural network with more neurons is more expressive because each new neuron in a neural network is a different function that the neural network can then work with. And importantly, a neural network can combine these neurons to represent intermediary outputs, so if I have one neuron that represents this blue function and another one that represents this red one, I can combine them and get this nice little purple squiggly line. One thing that I think is really important, and this has actually been theoretically proven to be true, is that a neural network with enough neurons can approximate any continuous function as closely as you like, which actually speaks to why these things are so powerful. If you can imagine some sort of a continuous representation of data, well, a neural network can actually approximate that. So far we've only talked about neural networks with single inputs, single outputs, and no layers, but in practice, neural networks have many inputs and many outputs and many layers. We call the outputs of these intermediate layers embeddings, that's these columns of spheres here that each represent numbers.
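As a rough illustration of the idea above, here is a minimal Python sketch of a neuron as a function from one number to another, and of a tiny network that adds and subtracts the outputs of several neurons. The tanh shape, the function names, and all of the parameter values are assumptions picked for illustration; the talk does not specify any of them, and in a real model the parameters would be learned during training.

```python
import math


def neuron(x, weight, bias):
    """One neuron: takes in a single number and returns another number.

    The tanh squashing function is an assumed choice for this sketch.
    """
    return math.tanh(weight * x + bias)


def tiny_network(x, neurons, output_weights):
    """Combine several neurons by adding and subtracting their scaled outputs."""
    return sum(
        out_w * neuron(x, w, b)
        for (w, b), out_w in zip(neurons, output_weights)
    )


# Hypothetical hand-picked (weight, bias) pairs, one per neuron.
# Think of them as the "blue", "green", and "red" functions.
neurons = [(1.0, 0.0), (3.0, -1.5), (-2.0, 0.5)]
# Positive entries add a neuron's output, negative entries subtract it.
output_weights = [1.0, -0.5, 0.25]

for x in (-1.0, 0.0, 1.0):
    print(x, round(tiny_network(x, neurons, output_weights), 3))
```

Adding more (weight, bias) pairs gives the network more functions to mix, which is the sense in which more neurons means more expressive.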

Most ML models have two phases. There's a training phase and an inference phase. During training, the network updates the strength of different neurons based on patterns provided in some dataset and then during inference, when the data is passed through a layer of neurons, some information is selected, some of it might be inverted, and some is filtered out entirely. So let's, as an example, say we were trying to create a neural network that predicts housing prices. We might feed in a lot of different kinds of information like the area code or the tax code or the school district or the color of the house, and during training, the model is gonna up sample some of that data, down sample others, and then remove some entirely. The output of this layer of neurons, this embedding, ends up becoming a representation of the data we care about, but this representation is very abstract and it may not be interpretable by a human because of the way the information is mixed in together. So again, going back to our housing example, this intermediate representation is just gonna be a list of numbers that somehow mixes together the area code and the tax bracket and the school district and the color all into one. It makes sense to the model. It might not make sense to humans. In my opinion, embeddings are the secret sauce behind neural networks and I think if you have a good intuition for how embeddings work, you'll have a pretty good intuition for how most neural networks work. So I wanna take a moment and dive into what an embedding is exactly and just talk a little bit about embeddings. An embedding is a vector representation of data. It's literally just a list of numbers that represents something, and you probably are already familiar with some kinds of embeddings. For example, RGB. RGB is a way of representing colors as lists of numbers. Or ASCII, which is a way of representing characters as numbers.
So as an example, let's say in the top right I have an image, which is this blue and white checkerboard. I could represent that as a series of numbers where blue is (0, 0, 128) and white is (255, 255, 255). It's just points in space. Embeddings allow us to put concepts into geometries. Going back to this RGB example, we can actually represent every single color as a point within a 3D cube and then if I wanted to do mathematical operations on those colors, I could. I could, say, combine red and blue to get purple. As long as two things have embeddings of the same length, they can be compared. You can calculate distances between them or add and subtract them, which is really, really powerful because it allows us to do math on concepts.
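To make the RGB example concrete, here is a small Python sketch that treats colors as points in the RGB cube, mixes two of them, and compares distances. It assumes the usual 8-bit convention where each channel runs from 0 to 255; the helper names `mix` and `distance` are made up for this illustration.

```python
import math

# Colors as embeddings: each one is a point (R, G, B), channels in 0..255.
RED = (255, 0, 0)
BLUE = (0, 0, 255)
DARK_BLUE = (0, 0, 128)
WHITE = (255, 255, 255)


def mix(a, b):
    """Average two colors channel by channel (integer division)."""
    return tuple((x + y) // 2 for x, y in zip(a, b))


def distance(a, b):
    """Euclidean distance between two points in the RGB cube."""
    return math.dist(a, b)


purple = mix(RED, BLUE)
print(purple)  # (127, 0, 127)

# Dark blue sits closer to blue than to white in this geometry.
print(distance(DARK_BLUE, BLUE) < distance(DARK_BLUE, WHITE))  # True
```

Both operations only work because every color has an embedding of the same length, three numbers.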

So as an example, if we could embed different words, say we had two axes, one was royalty and one was gender, we could do mathematical operations like king minus man plus woman equals queen, and this is actually a result from a very famous paper called word2vec, which basically kickstarted this whole field of representation learning. Every embedding layer in a neural network represents an embedding space, a geometry, like our RGB cube. You can pass in data and get a point that lives somewhere on that geometry. Neural network weights are basically doing the same kind of embedding math we just talked about. The weights in the model are just moving points around, adjusting the underlying shape. So as an example, if I had this yellow vector, again, it's just an embedding that we're passing in to a neural network, which we're representing as this blue matrix, I'll get out a green vector, and in the geometry this is all just the equivalent of moving this yellow dot over to where this green dot is. Now what's really cool about this is you can do this across multiple dimensions. So if we started with our RGB cube, as an example, we have three inputs, our R and our G and our B axes. We pass those into a neural network and the neural network will redefine the shape. It'll project it all into a two dimensional space so that we might be able to, say, identify blue and red colors really easily. The caveat to all of this is our embedding geometry needs to make sense. Similar things actually need to be near each other. We can average, add, and mix colors in RGB without a neural network because we know exactly how that space is organized, but this is a lot harder to do with words or images because we don't have an obvious way of creating a useful geometry. I need things like food, for example, to be in the same space near each other, potentially far away from a series of points that are labeled city.
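The two ideas in this part of the talk, arithmetic on word embeddings and weights that move points around, can be sketched in a few lines of Python. Everything numeric here is an assumption for illustration: the 2D word vectors on (royalty, gender) axes are made up rather than taken from a trained word2vec model, and the 2x3 weight matrix is hand-picked rather than learned.

```python
import math

# Toy 2D word embeddings on hypothetical (royalty, gender) axes.
# Real learned word vectors typically have hundreds of dimensions.
words = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
}


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def nearest(point, vocab):
    """Return the word whose embedding is closest to the given point."""
    return min(vocab, key=lambda w: math.dist(vocab[w], point))


result = add(sub(words["king"], words["man"]), words["woman"])
print(nearest(result, words))  # queen


def apply_weights(matrix, vector):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return tuple(sum(w * v for w, v in zip(row, vector)) for row in matrix)


# Hypothetical weights projecting the 3D RGB cube onto two axes:
# the first row keeps only red, the second row keeps only blue.
weights = [
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0),
]

print(apply_weights(weights, (255, 0, 0)))      # red    -> (255.0, 0.0)
print(apply_weights(weights, (0, 0, 255)))      # blue   -> (0.0, 255.0)
print(apply_weights(weights, (127, 0, 127)))    # purple -> (127.0, 127.0)
```

In the projected 2D space, red and blue land on separate axes, which is one simple sense in which a new geometry can make some colors easy to tell apart.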
One way to think about neural networks and AI as a whole is these things are geometry generators. They find hidden structure relative to some problem or goal. When you change the features or you change the loss, different information ends up being stored in the embeddings in a model, which means the geometry changes and different things end up close to or far from each other. So going back to our initial example where we had two models with two different loss functions. On the left hand side, when we're selecting for color, we might have an embedding space where all the things that are yellow are close to each other and all of the things that are blue are close to each other, but if we were selecting for shapes, then we might end up with a completely different embedding space, one that emphasizes shapes and makes sure that those things are close to each other. So one way to build an intuition around neural networks is to ask yourself, what data do I want next to each other? What two things do I want to be neighbors in this high dimensional geometry that we're constructing? And I think all of this speaks to why ML models are really, really powerful. Deep ML models can create good embeddings from just about anything. All you need is the right data and the right loss function.
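Here is a small Python sketch of the color-versus-shape example: the same input features produce two different geometries, and therefore different neighbors. Nothing is trained in this sketch. The two weight vectors are hand-picked stand-ins for what a model might end up with under a color-focused loss and a shape-focused loss, and the feature encoding (is_blue, is_square) is likewise an assumption.

```python
# Every item has the same two input features: (is_blue, is_square).
items = {
    "yellow circle": (0.0, 0.0),
    "yellow square": (0.0, 1.0),
    "blue circle": (1.0, 0.0),
    "blue square": (1.0, 1.0),
}

# Hypothetical weights: stand-ins for two models with the same
# architecture and inputs but different losses. Not learned here.
COLOR_WEIGHTS = (1.0, 0.0)  # keeps color, drops shape
SHAPE_WEIGHTS = (0.0, 1.0)  # keeps shape, drops color


def embed(features, weights):
    """Project the features down to a one-number embedding."""
    return sum(f * w for f, w in zip(features, weights))


def neighbors(name, weights):
    """Other items that land on the same point as the named item."""
    target = embed(items[name], weights)
    return sorted(
        other
        for other in items
        if other != name and embed(items[other], weights) == target
    )


print(neighbors("yellow circle", COLOR_WEIGHTS))  # ['yellow square']
print(neighbors("yellow circle", SHAPE_WEIGHTS))  # ['blue circle']
```

The yellow circle's neighbor changes from the yellow square to the blue circle purely because the embedding keeps different information.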

9 minutes
Startup Stage:
Pre-seed, Seed, Series A