### Vladimir Vapnik: Statistical Learning | MIT Artificial Intelligence (AI) Podcast

The following is a conversation with Vladimir Vapnik. He is the co-inventor of support vector machines,

support vector clustering, VC theory, and many foundational ideas in statistical learning. He was born in the Soviet Union and worked

at the Institute of Control Sciences in Moscow. Then in the United States, he worked at AT&T,

NEC Labs, Facebook Research, and now is a professor at Columbia University. His work has been cited over 170,000 times. He has some very interesting ideas about artificial

intelligence and the nature of learning, especially on the limits of our current approaches

and the open problems in the field. This conversation is part of the MIT course

on Artificial General Intelligence and the Artificial Intelligence Podcast. If you enjoy it, please subscribe on YouTube or

rate it on iTunes or your podcast provider of choice or simply connect with me on Twitter

or other social networks at Lex Fridman, spelled F-R-I-D. And now, here’s my conversation with Vladimir Vapnik. Lex: Einstein famously said that God

doesn’t play dice. Vladimir: Yeah. Lex: You have studied the world through the eyes

of statistics, so let me ask you in terms of the nature of reality–fundamental

nature of reality. Does God play dice? Vladimir: We don’t know some factors. And because we don’t know some factors, which could be important, it looks like God plays dice, but we should describe it. In philosophy, they distinguish between two

positions: the position of instrumentalism, where you’re creating theories for prediction, and the position of realism, where you’re trying to understand what God did. Lex: Can you describe instrumentalism

and realism a little bit? Vladimir: For example, if you have some mechanical laws, what is that? Is it a law which is true always and everywhere, or is it a law which allows you to predict the position of moving elements? What do you believe? Do you believe that it is God’s law, that

God created the world which obeys this physical law, or is it just a law for predictions? Lex: And which one is instrumentalism? Vladimir: For predictions. If you believe that this is the law of God

and it is always true everywhere, that means that you’re a realist. You’re trying to understand God’s thought. Lex: So the way you see the world is as an instrumentalist? Vladimir: You know, I’m working with some models, models of machine learning. So in this model, you can see the setting and you try to solve the problem. And you can do it in two different ways. One is from

the point of view of the instrumentalist, and that’s what everybody does now, because they say

the goal of machine learning is to find the rule for classification. That is true, but it is an

instrument for prediction. But I can say, the goal of machine learning

is to learn about conditional probability, so how God plays. Does he play? What is the probability for one

and what is the probability for another in a given situation? But for prediction, I don’t need this. I need the rule. But for understanding, I need conditional probability. Lex: So let me just step back a little bit first

to talk about something you mentioned, and which I read parts of last night, the 1960 paper by

Eugene Wigner, Unreasonable Effectiveness of Mathematics in the Natural Sciences. It’s such a beautiful paper, by the way. To be honest, to confess, my own work in the

past two years on deep learning has been heavily applied, and it made me feel that I was missing out on

some of the beauty of nature in the way that math can uncover. So let me just step away from the poetry of

that for a second. How do you see the role of math in your life? Is it a tool? Is it poetry? Where does it sit? And does math, for you, have limits? Vladimir: Some people are saying that math

is the language which God uses. Lex: Speak to God or use God? Vladimir: Use God. Lex: Use God. Vladimir: I believe that this article about the Unreasonable Effectiveness

of Math says that if you look at mathematical structures, they know something about reality. And most scientists from natural science,

they look at equations in trying to understand reality, and it is the same with machine learning. If you try to look very carefully at all the

equations which define conditional probability, you can understand something about reality

more than from your fantasy. Lex: So math can reveal the simple underlying

principles of reality, perhaps. Vladimir: You know, they may seem simple, but it is very

hard to discover them. But then, when you discover them and look

at them, you see how beautiful they are. And it is surprising that people did not see

that before. You look at the equations and derive it from the equations. For example, I talked yesterday about the

least squares method, and people have had a lot of fantasies about improving the

least squares method. But if you look, going step by step by solving

some equations, you suddenly get some term which, after thinking, you understand

describes the position of the observation points. In the least squares method, we throw out a lot

of information. We don’t look at the composition of the points

of observation. We’re looking only at the residuals. But when you understand that, it is a very simple

idea, though not too simple to understand, and you can derive it just from equations. Lex: So some simple algebra, a few steps, will

take you to something surprising that when you think about– Vladimir: Absolutely, yes. And that is proof that human intuition is not too rich and very primitive, and it does not see very simple situations. Lex: So let me take a step back, in general, yes. What about human ingenuity as opposed to intuition,

the moments of brilliance? Do you have to be so hard on human intuition? Are there moments of brilliance in human intuition

that can leap ahead of math, and then the math will catch up? Vladimir: I don’t think so. I think the best of human intuition is putting

in axioms; after that, it is technical work to see where they lead. Lex: See where the axioms take you. Vladimir: Yeah. But only if the axioms are chosen correctly. Axioms are polished over generations of

scientists and this is integral wisdom. Lex: That’s beautifully put. When you think of Einstein and especially,

relativity, what is the role of imagination coming first there in the moment of

discovery of an idea? So, that’s obviously a mix of math and out

of the box imagination there. Vladimir: That, I don’t know. Whatever I did, I exclude any imagination

because whatever I saw in machine learning that comes from imagination, like features,

like deep learning, is not relevant to the problem. When you look very carefully at the mathematical

equations, you arrive at a very simple story which goes far beyond, theoretically,

whatever people can imagine, because their fantasies are not good. It is just interpretation. It is just fantasy, but it is not what you need. You don’t need any imagination to derive the main

principle of machine learning. Lex: When you think about learning and intelligence,

maybe thinking about the human brain in trying to describe mathematically the process of

learning that is something like what happens in the human brain, do you think we have the

tools, currently? Do you think we will ever have the tools to

try to describe that process of learning? Vladimir: It is not a description of what’s going on. It is interpretation. It is your interpretation. Your vision can be wrong. You know, the guy who invented the microscope,

Leeuwenhoek, for the first time, only he had this instrument and he kept it secret. But he wrote a report to

the London Academy of Science. In his report, when he was looking at the blood,

he looked everywhere–on the water, on the blood on those film, but he described blood

like a fight between queens and kings. So he saw blood cells, red cells, and he imagined

it was like an army fighting each other. And it was his interpretation of the situation. And he sent it as a report to the Academy

of Science. They very carefully looked because they believe

that he is right. He saw something, but he gave a wrong interpretation. And I believe the same can happen with the brain. The most important part, you know, I believe

in human language. In some proverbs, there’s so much wisdom. For example, people say that better

than a thousand days of diligent study is one day with a great teacher. But if you ask what the teacher does,

nobody knows. And that is intelligence. But we know from history, and now

from machine learning, that a teacher can do a lot. Lex: So what from a mathematical

point of view is a great teacher? Vladimir: I don’t know, but we can say

what a teacher can do. He can introduce some invariants, some predicate

for creating invariants. How he does it, I don’t know, because

a teacher knows reality and can describe from his reality a predicate and invariants. But we know that when you’re using invariants, you

can decrease the number of observations a hundred times. Lex: Maybe try to pull that apart a little bit,

but I think you mentioned that like a piano teacher saying to the student,

“Play like a butterfly.” I played piano. I played the guitar for a long time and maybe it’s romantic

and poetic, but it feels like there’s a lot of truth in that statement,

like there’s a lot of instruction to that statement. Can you pull that apart? What is that? The language itself may not contain this information. Vladimir: It’s not blah, blah, blah

because it affects you. It’s what? Affects you, affects your playing. Lex: Yes it does, but what is the information being exchanged there? What is the nature of information? What is the representation in that information? Vladimir: I believe that it is a sort of predicate,

but I don’t know. That is exactly what intelligence in machine

learning should be because the rest is just mathematical technique. I think that what was discovered recently

is that there are two mechanisms of learning. One is called strong convergence mechanism

and the other the weak convergence mechanism. Before, people used only one kind of convergence. In the weak convergence mechanism, you can use predicates. That’s what “play like a butterfly” is, and it

immediately affects your playing. You know, there is an English proverb which

is “If it looks like a duck, swims like a duck, and quacks like a duck, then it is

probably a duck.” But this is exactly about predicates. “It looks like a duck,” what does it mean? So, you saw many ducks–that’s your training data. You have a description of what ducks look like. Lex: Yeah, the visual characteristics of a duck, yeah. Vladimir: Yeah, and you have a model

for recognizing ducks. So you would like the theoretical description

from the model to coincide with the empirical description which you saw. So, “it looks like a duck” is general. But what about “swims like a duck”? You should know that ducks swim. You can say “it plays chess like a duck.” Okay, ducks don’t play chess. It’s a completely legal predicate, but it is

useless. So, how can a teacher recognize a non-useless

predicate? So, up to now, we don’t use these predicates

in existing machine learning, and that’s why we need zillions of data. But this English proverb says to use only three

predicates–looks like a duck, swims like a duck and quacks like a duck. Lex: So you can’t deny the fact that swims like

a duck and quacks like a duck has humor in it, has ambiguity? Vladimir: Let’s talk about “swims like a duck.” It does not say jumps like a duck, why? Lex: It’s not relevant. Vladimir: It means that you know ducks and you know

different birds. You know animals and you derive from this

that it is relevant to say “swims like a duck.” Lex: So in order for us to understand “swims like

a duck,” it feels like we need to know millions of other little pieces of information

we pick up along the way. You don’t think so? There doesn’t need to be this knowledge base?

Those statements carry some rich information that helps us understand the essence of a duck? Vladimir: Yeah. Lex: How far are we from integrating predicates? Vladimir: You know, when you consider the

complete story of machine learning, what it does is this: you have a lot of functions, and then you’re told “it looks like a duck.” You see your training data. From the training data, you recognize what

the expected duck should look like. Then, you remove all functions which do not

look the way you think it should look from the training data. So, you decrease the set of functions from

which you pick one. Then, you give a second predicate and again

you decrease the set of functions. And after that, you pick

the best function you can. It is standard machine learning. So, why do you not need too many examples? Lex: Because your predicates are very good. Vladimir: Yeah, that’s exactly it, because

every predicate is invented to decrease the admissible set of functions. Lex: So you talk about admissible set of functions

and you talk about good functions. So what makes a good function? Vladimir: So an admissible set of functions is a set of

functions which has a small capacity or small diversity, a small dimension, and which contains

good functions inside. Lex: By the way, for people who don’t know VC,

you’re the V in the VC. So how would you describe to a lay person

what VC theory is? How would you describe VC? Vladimir: You have a machine, a machine capable of picking one function from the admissible set of functions. But the set of admissible functions can be big. It can contain all continuous functions, and then it is

useless. You don’t have so many examples to pick up

functions. But it can be small–small in what we call capacity, or maybe diversity–

so not very different functions in the set, an infinite set of functions but not very diverse. That is a small VC dimension, and when

the VC dimension is small, you need a small amount of training data. So the goal is to create an admissible set of

functions which has a small VC dimension and contains good functions. Then, you’ll be able to pick the function

using a small number of observations. Lex: So the task of learning is creating

a set of admissible functions that has a small VC dimension and then you figure out a clever

way of picking out the good one. Vladimir: That is the goal of learning

which I formulated yesterday. Statistical learning theory does not involve

creating an admissible set of functions. In classical learning theory everywhere, in

100% of textbooks, the admissible set of functions is given, but this tells us nothing

because the most difficult problem is to create an admissible set of functions given, say, a lot of functions, a continuous set of functions. To create an admissible set of functions means

that it has a finite VC dimension, a small VC dimension, and contains good functions. So, this was out of consideration. Lex: So what’s the process of doing that,

I mean, that’s fascinating? What is the process of creating this admissible

set of functions? Vladimir: That is invariance. Lex: That’s invariance. Can you describe invariance? Vladimir: Yeah. You have to think of properties of the training

data, and properties means that you have some function and you just compute the average value

of the function on the training data. You have a model, and you ask what the expectation

of this function on the model is, and they should coincide. So, the problem is about how to pick functions. It can be any function. In fact, it is true for all functions, but since a duck doesn’t jump, you don’t ask the question “jumps like a duck”

because it is trivial. It does not jump, so it does not help you at all. But you know something about which questions

to ask, like when you ask “swims like a duck.” But “looks like a duck” is a general situation. “Looks like,” say, “a guy who has this illness,

this disease,” is also legal. So, there is a general type of predicate,

“It looks like,” and a special type of predicate which is related to this specific problem. And that is the intelligence part of this

business and that is where a teacher is involved. Lex: Incorporating the specialized predicates. Vladimir: Yes. Lex: Okay. What do you think about deep learning

as neural networks, these architectures, as helping accomplish some of the tasks

you’re thinking about? Their effectiveness or lack thereof, what are the weaknesses and what are the possible strengths? Vladimir: You know, I think that this is fantasy,

everything like deep learning, like features. Let me give you this example. One of the greatest books is Churchill’s book

about the history of the Second World War. He starts in his book describing that in the

old times when a war is over, the great kings, they gather together–and most of them are

relatives–and they discuss what should be done to create peace and they come to an agreement. And what happened after the First World War? The general public came to power. They were so greedy that they robbed Germany. It was clear to everybody that it was not

peace, that peace would last only 20 years, because they were not professionals. I see the same in machine learning. There are mathematicians looking at the problem

from a very deep mathematical point of view and there are computer scientists that mostly

do not know mathematics. They just have interpretations of that and

they invented a lot of blah, blah interpretations like deep learning. Why did you do deep learning? Mathematics does not know deep learning. Mathematics does not know neurons; it is just

functions. If you like to say piecewise linear function, say that and do it in a class of piecewise linear function. But they invented something and then they

tried to prove the advantage of that through interpretations, which were mostly wrong. And when it is not enough, they appeal to

the brain, which they know nothing about. Nobody knows what’s going on in the brain. So, I think it is more reliable to work on math. This is a mathematical problem; do your best

to solve this problem. Try to understand that there is not only one

way of convergence, which is the strong way of convergence. There is also a weak way of convergence

which requires predicates. And if you will go through all this stuff,

you will see that you don’t need deep learning. Even more, I would say one of the theorems,

which is called the representer theorem, says that the optimal solution of the mathematical problem which describes learning is on a shallow network, not on deep learning. Lex: On a shallow network. Yeah, the problem is there. Absolutely. So, in the end, what you’re saying

is exactly right. The question is, do you see no value in throwing

something on the table, playing with it–not math. It’s like a neural network where you said

throwing something in the bucket or the biological example in looking at kings and queens or

the cells on the microscope, you don’t see value in imagining the cells or the kings

and queens and using that as inspiration, an imagination for where the math

will eventually lead you? Do you think that interpretation basically

deceives you in a way that’s not productive? Vladimir: I think that if you’re trying to analyze this

business of learning and especially, the discussion about deep learning, it is a discussion about

interpretations and not about things, about what you can say about things. Lex: That’s right. But, aren’t you surprised by the beauty of it,

not mathematical beauty but the fact that it works at all? Or, are you criticizing that very beauty,

our human desire to interpret, to find our silly interpretations in these constructs? Like, let me ask you this, are you surprised

or does it inspire you, how do you feel about the success of a system like AlphaGo at beating

the game of Go using neural networks to estimate the quality of a board? Vladimir: That is your interpretation–quality of the board. Lex: Yes. It is not our interpretation. The fact is a neural network system–it doesn’t

matter–a learning system that we don’t, I think, mathematically, understand that well, beats the best human player, that’s something that was thought impossible. Vladimir: That means it’s not a very difficult problem. That’s it. Lex: So we have empirically discovered that

this is not a very difficult problem. That’s true. I can’t argue. Vladimir: Even more, I would say, if they used deep

learning, it is not the most effective way of learning theory. And usually, when people use deep learning,

they’re using zillions of training data, but you don’t need this. So I described a challenge: can we do

some problems that are done well with deep learning methods, with deep nets, using a hundred times less training data? Even more, there are some problems that deep

learning cannot solve, because it does not necessarily create an admissible set of functions. To create a deep architecture means to create an

admissible set of functions. You cannot say that you’re creating a good admissible

set of functions. It’s your fantasy. It does not come from us. But, it is possible to create an admissible set

of functions because you have your training data. Actually, for mathematicians, when you

consider an invariant, you need to use the law of large numbers. When you do training in existing algorithms,

you need a uniform law of large numbers, which is much more difficult. It requires VC dimension and all that stuff. But nevertheless, if you use both the weak and

strong ways of convergence, you can decrease the training data a lot. Lex: Yeah, you could do the three–that swims like

a duck and quacks like a duck. So let’s step back and think about

human intelligence in general. And clearly, that has evolved in a non-mathematical way. As far as we know, God or whoever didn’t come

up with a model and placed in our brain of admissible functions; it kind of evolved. I don’t know your view on this but Alan Turing

in the 50’s in his paper asked and rejected the question: Can machines think? It’s not a very useful question, but can you

briefly entertain this useless question “Can machines think?” So, talk about intelligence and your view of it. Vladimir: I don’t know that. I know that Turing described imitation–if

a computer can imitate a human being, let’s call it intelligent. And he understood

that it is not a thinking computer. He completely understood what he was doing,

but he set up a problem of imitation. So now we understand that the problem is not in

imitation. I’m not sure that intelligence is just inside of us. It may also be outside of us. I have several observations. When I prove some theorem,

it is a very difficult theorem. Yet within a couple of years, in several places, people

prove the same theorem, say, the Sauer lemma: after ours was done, another guy proved

the same theorem. In the history of science, it has happened

all the time. For example, with geometry, it happened simultaneously. First Lobachevsky and then Gauss and Bolyai

and other guys, all within approximately a ten-year period of time, and I saw a lot of examples like that. And when mathematicians think, when they

develop something, they develop something in general which affects everybody. So, maybe our model that intelligence is only

inside of us is incorrect. Lex: It’s our interpretation. Yeah. Vladimir: It may be that there exists some

connection with world intelligence. I don’t know that. Lex: You’re almost like plugging in into… Vladimir: Yeah, exactly. Lex: …and contributing to this. Vladimir: …into a big network. Lex: Into a big, maybe a neural network. On the flip side of that, maybe you can comment

on the big O complexity and how you see classifying algorithms by worst-case running time

in relation to their input. So, that way of thinking about functions,

do you think P equals NP? Do you think that’s an interesting question? Vladimir: Yeah, it is an interesting question. But let me talk about complexity and about

worst-case scenario. There is a mathematical setting. When I came to the United States in 1991,

people did not know this. They did not know statistical learning theory. In Russia, it was published in our monographs,

but in America, they did not know, and then, they learned it. Somebody told me that it was worst-case theory

and they will create real-case theory, but until now, they haven’t. Because it is a mathematical tool, you can

do only what you can do using mathematics, which has clear understanding and clear description. For this reason, we introduced complexity, the VC dimension, and you can prove some theorems. But we also created theory for the case when you

know the probability measure, and that is the best case that can happen. So from a mathematical point of view, you

know the best possible case and the worst possible case. You can derive different models in the middle,

but it’s not so interesting. Lex: Do you think the edges are interesting? Vladimir: The edges are interesting because it is not

so easy to get the exact bounds. In many cases the

bounds are not exact, but that is where the most interesting principles are discovered. Lex: Do you think it’s interesting because it’s

challenging and reveals interesting principles that allow you to get those bounds or do you

think it’s interesting because it’s actually very useful for understanding the essence

of a function of an algorithm? So, it’s like me judging your life as a human

being by the worst thing you did and the best thing you did versus all the stuff in the middle. It seems not productive. Vladimir: I don’t think so because you cannot describe

situations in the middle or it will not be general. So you can describe edge cases and it is

clear it has some models, but you cannot describe a model for every new case. So, you’ll never be accurate when you’re using models. Lex: But, from a statistical point of view, the

way you studied functions and the nature of learning and the world, don’t you think that the real world

has a very long tail, that the edge cases are very far away from the mean, the stuff in the middle, or no? Vladimir: I don’t know that. I think that from my point of view, if you use formal statistics, you need the uniform law of large numbers; if you use this invariance business, you need just the law of large numbers. And there’s a huge difference between the uniform

law of large numbers and the law of large numbers. Lex: Is it useful to describe that a little more

or shall we just take it at… Vladimir: No. For example, when I’m talking about ducks,

I get three predicates and that was enough. But if you try to distinguish formally, you will need a lot of observations. So that means that the information in “looks

like a duck” contains a lot of bits of information, formal bits of information. So we don’t know how much information

is contained in what comes from intelligence, and that is a subject of analysis. Until now, in all this business, I don’t like how people

consider artificial intelligence. They consider it as some codes which imitate

activities of human beings. It is not science. It is applications. You would like to imitate Go. Okay, it’s very useful and a good problem, but you need to learn something more: how people came to develop, say, predicates like “swims like a duck” or “play like a butterfly” or something like that. It’s not that the teacher tells you how it

came to his mind, how he chooses the image. That is a problem of intelligence. Lex: That is the problem of intelligence. And you see that connected to the problem

of learning? Are they? Vladimir: Absolutely, because you immediately give

these predicates, specific predicates like “swims like a duck” or “quacks like a duck.” They were chosen somehow. Lex: So what is the line of work, would you say,

if you were to formulate it as a set of open problems, that will take us there, to “play like

a butterfly,” that will get a system to be able to do that? Vladimir: Let’s separate two stories–one mathematical

story that if you have predicates you can do something, and another story on how to

get predicates. It is an intelligence problem and people have not even

started to understand intelligence. Because to understand intelligence, first of all,

you should try to understand what teachers do, how a teacher teaches, why one teacher is

better than another one. Lex: Yeah. And so, do you think we really even haven’t

started on the journey of generating the predicates? Vladimir: No. We don’t understand. We don’t even understand that this problem exists. Lex: You do. Vladimir: No. I just know a name. I want to understand why one teacher

is better than another and how the teacher affects the student. It is not because he is repeating the problem

which is in the textbooks. He makes some remarks. He makes some philosophy of reasoning. Lex: Yeah, that’s beautiful. It is a formulation of a question

that is the open problem: Why is one teacher better than another? Vladimir: Right. What he does about it. Lex: “Why” at every level. How did they get better? What does it mean to be better? Vladimir: Yeah. From whatever model I have, one teacher can give

a very good predicate. One teacher can say “swims like a duck” and

another can say “jumps like a duck.” And jumps like a duck carries zero information. Lex: So what is the most exciting problem in statistical

learning you ever worked on or are working on now? Vladimir: I just finished this invariance story and I’m happy because I believe that it is the ultimate learning story. At least, I can show that there are no other

mechanisms. There are only two mechanisms, but they separate

the statistical part from the intelligence part, and I know nothing about the intelligence part. And if we come to know this intelligence

part, it will help us a lot in teaching and in learning. Lex: And we’ll know it when we see it? Vladimir: So for example, in my talk, the last slide

was a challenge. So you have the MNIST digit recognition problem, and deep learning claims that they did it very well, say 99.5% correct answers, but they used 60,000 observations. Can you do the same using a hundred times

fewer, but incorporating invariants, what it means, you know, to be digit 1, 2, 3? Just looking at that, explain to me which invariant

I should keep, to use a hundred times fewer examples, to do the same job. Lex: Yeah, that last slide, unfortunately, your

talk ended quickly, but that last slide was a powerful open challenge and a formulation

of the essence there. Vladimir: That is the exact problem of intelligence

because everybody, when machine learning started and it was developed by mathematicians, they

immediately recognized that they use much more training data than humans needed. But now, again, we came to the same story

of how to decrease. That is a problem of learning. It is not like in deep learning, they use

zillions of training data because maybe zillions are not enough if you have a good invariance. Maybe, you’ll never collect

some number of observations. But now, it is a question of intelligence

on how to do that because the statistical part is ready. As soon as you supply us this predicate,

we can do a good job with a small number of observations, and the very first challenge is well-known

digit recognition. You know digits; please tell me the invariants. I’m thinking about that, and I can say that for

digit 3, I would introduce the concept of horizontal symmetry, so digit 3 has horizontal

symmetry more than digit 2 or something like that. But as soon as I get the horizontal symmetry,

I can mathematically invent a lot of measures of horizontal symmetry, or vertical symmetry,

or diagonal symmetry, whatever, if I have the idea of symmetry. But what else? Looking at digits, I see that there is a meta-predicate

which is not shape; it is something like symmetry, like how dark the whole picture is, something like that, which can serve as a predicate. Lex: Do you think such a predicate could arise out

of something that’s not general, meaning, it feels like for me to be able to understand

the difference between the two and the three, I would need to have had a childhood of 10

to 15 years playing with kids, going to school, being yelled at by parents, all of that, walking, jumping, looking at ducks. And now, then, I would be able to generate

the right predicate for telling the difference between a two and a three, or do you think

there’s a more efficient way? Vladimir: I don’t know. I know for sure that you must know something

more than digits. Lex: Yes, and that’s a powerful statement. Vladimir: Yeah, but maybe there are several languages

of description around these elements of digits. So, I’m talking about symmetry, about some

properties of geometry. I’m talking about something abstract. I don’t know about that, but it is a problem

of intelligence. So in one of our articles, it is trivial to

show that every example can carry not more than one bit of information because when you

show an example and you say, this is a one, you can remove the functions which

do not tell you it is a one. The best strategy, if you can do it perfectly,

is to remove half of them. But when you use one predicate, which is “looks

like a duck,” you can remove many more functions than half, and that means it contains a lot of bits

of information from a formal point of view. But when you have a general picture of what you want to recognize and a general picture of the world, can you invent this predicate? And that predicate carries a lot of information. Lex: Beautifully put. Maybe it’s just me, but in all the math you

show in your work, which is some of the most profound mathematical work in the field of

learning AI and just math, in general, I hear a lot of poetry and philosophy. You really kind of talk about philosophy of science. There’s a poetry and music to a lot of the

work you’re doing and the way you’re thinking about it, so where does that come from? Do you escape to poetry? Do you escape to music? Vladimir: I think that there exists a ground truth, and it can be seen everywhere. The smart philosophers, sometimes

I’m surprised how deeply they see. Sometimes I see that some of them are

completely off the subject. But the ground truth, I see in music. Lex: Music is the ground truth? Vladimir: Yeah. And in poetry, many poets, they believe

that they take dictation. Lex: So what piece of music as a piece of empirical

evidence gave you a sense that they are touching something in the ground truth? Vladimir: It is structure. Lex: The structure, the math of music. Vladimir: Because when you’re listening to Bach,

you see the structure–very clear, very classic, very simple. And it is the same when you have axioms in

geometry, you have the same feeling. And in poetry, sometimes, this is the same. Lex: Yeah. And if you look back to your childhood,

you grew up in Russia. You maybe were born as a researcher in Russia,

you developed as a researcher in Russia. You came to the United States and a few places. If you look back, what were some of your happiest

moments as a researcher? Some of the most profound moments, not in terms of their impact on society, but in terms of their impact on how damn good you feel that day and you remember that moment? Vladimir: You know, every time you find something, it is the greatest moment in life, every simple thing. But my general feeling is that most of the time

I was wrong. You should go again and again and again and

try to be honest with yourself, not to make your own interpretation, but try to understand

that it is related to the ground truth and it is not my blah, blah, blah

interpretation or something like that. Lex: But, you’re allowed to get excited at the

possibility of discovery. Vladimir: Oh, yeah. Lex: You have to double check it. Vladimir: No, but how it relates to the ground truth. Is it just temporary or is it forever? You know, you always have a feeling when you

find something. How big is that? So 20 years ago, when we discovered statistical

learning theory, nobody believed it except for one guy, Dudley from MIT. And then, in 20 years, it came into fashion,

and the same with Support Vector Machines. Lex: So, with support vector machines and learning

theory, when you were working on it, you had a sense, a sense of the profundity of it, how this

seems to be right, this seems to be powerful? Vladimir: Right. Absolutely. Immediately, I recognized that it will last forever. And now, when I found this invariant story, I have a feeling that this is complete, because I have proved that there are

no other mechanisms. You can make some cosmetic improvements,

but in terms of mechanisms, you need both invariants and statistical learning,

and they should work together. But also, I’m happy that we can formulate

what intelligence is from that and separate it from the technical part. That is completely different. Lex: Absolutely. Well, Vladimir, thank you so much for talking today. Vladimir: Thank you. Lex: It’s an honor.
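
Two ideas from the closing part of the conversation lend themselves to a small sketch: a "horizontal symmetry" predicate for digits, and the count of how many bits a label or a predicate carries. The Python below is only an illustration under stated assumptions and is not from the interview. The function names, the symmetry formula (one of the many possible measures Vapnik alludes to), and the toy 5x3 bitmaps are all hypothetical.

```python
import math


def horizontal_symmetry(image):
    """Score in [0, 1]; 1.0 means the top half mirrors the bottom half exactly.

    Hypothetical measure chosen for illustration. `image` is a list of rows
    with non-negative pixel values.
    """
    flipped = image[::-1]  # mirror across the horizontal axis
    diff = sum(
        abs(a - b)
        for row, flipped_row in zip(image, flipped)
        for a, b in zip(row, flipped_row)
    )
    total = 2 * sum(sum(row) for row in image)
    if total == 0:
        return 1.0  # a blank image is trivially symmetric
    return 1.0 - diff / total


def bits_of_information(n_before, n_after):
    """Bits gained when the set of admissible functions shrinks."""
    return math.log2(n_before / n_after)


# Toy 5x3 bitmaps made up for this sketch (not MNIST data).
THREE = [[1, 1, 1], [0, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 1]]
TWO = [[1, 1, 1], [0, 0, 1], [1, 1, 1], [1, 0, 0], [1, 1, 1]]

if __name__ == "__main__":
    print(horizontal_symmetry(THREE))  # the toy 3 mirrors itself
    print(horizontal_symmetry(TWO))    # the toy 2 scores lower
    # A labeled example removes at best half of the functions: one bit.
    print(bits_of_information(1024, 512))
    # A predicate that keeps only 16 of 1024 functions carries more.
    print(bits_of_information(1024, 16))
```

In this toy setting the 3 scores higher than the 2, which matches the remark that digit 3 has more horizontal symmetry than digit 2. Whether such a score helps on real handwritten digits is left open here.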

Another great video. Thanks for that amazing content, Lex.

That was a really good interview, thanks for sharing

I appreciate you sharing this with us all Lex. Gratitude.

AGI should make games and enjoy music.

It would be very helpful if you added good closed captions. I really can't understand what Professor Vapnik is saying too much of the time.

Very sad that this only gets 455 views

Isn't it the case that in deep learning we are finding predicates? We don't necessarily go back into the model to see which weights are large, but we can. Is there a branch of ML that is building tools to analyze models?

Very insightful. Learned a lot about ducks

thanks Lex,that was great!

good questions Lex! thanks for sharing!

Great conversation! But I beg to differ with Vladimir Vapnik on the role of imagination in discoveries. Imagination and human intuition play an active role in extending the existing laws and axioms, and in constructing theories to fit observations. What he had worked on might not have required imagination and intuition, but when it comes to theorizing and extending the existing laws, or the language of mathematics itself (or physics), human intuition and imagination will be essential.

Every sub-domain people specialize in will have its own unique demands.

0:00 Introduction by Prof. Lex

1:04 Fundamental nature of reality : Does god play dice ? (Refers Albert Einstein)

1:54 Philosophy of science : Instrumentalism and Realism

4:08 The unreasonable effectiveness of mathematics [1][2]

6:08 Math and simple underlying principles of reality

7:26 Human intuition and ingenuity

8:56 Role of imagination (Refers Einstein's special relativity)

10:00 Do we/ will have tools to describe the process of learning mathematically ? (Refers Hooke's Microscope) [3][4][5]

12:16 From a Mathematical point of view : What is a great Teacher ?

13:48 Mechanism in Learning and Essence of Duck (Bumper sticker material. Quack Quack !!)

16:58 How far are we from integrating the predicates ? (Refer the duck content to understand this question)

18:17 Admissible Set of Functions and Predicates (Talks about VC Theory [6])

23:01 What do you think about deep learning ? (Mentions Churchill's book "The Second World War" [7], Shallow Learning [8])

27:57 AlphaGo and Effectiveness of Neural Networks [9]

30:46 Human Intelligence and Alan Turing

33:34 Big-O Complexity and Worst Case Analysis

38:49 Opinion of how AI is considered as coding to imitate a human being

39:44 Learning and intelligence

42:09 Interesting problems on Statistical Learning (Mentions Digit Recognition problem and importance of intelligence)

48:48 Poetry, Philosophy and Mathematics

50:40 Happiest Moment as a Researcher

References :

[1] Wigner, Eugene P. "The unreasonable effectiveness of mathematics in the natural sciences." In Mathematics and Science, pp. 291-306. 1990.

[2] http://www.hep.upenn.edu/~johnda/Papers/wignerUnreasonableEffectiveness.pdf

[3] https://youtu.be/2gtrkxtsQ2k

[4] https://books.google.com/books?hl=en&lr=&id=ISP_gRwuz94C&oi=fnd&pg=PR1&dq=Micrographia+hook&ots=LF1VWdxjQg&sig=Qca7QzxkynZXc4AGy0YldNdQP_k

[5] Hooke, Robert. "Micrographia: Or Some Physiological Descriptions of Minute Bodies Made by Magnifying Glasses with Observation and Inquiries Thereupon." Royal Society: London, UK 1665.

[6] https://www.cs.cmu.edu/~bapoczos/Classes/ML10715_2015Fall/slides/VCdimension.pdf

[7] https://www.goodreads.com/book/show/25587.The_Second_World_War

[8] https://files.meetup.com/18405165/DLmeetup.pdf

[9] https://www.imdb.com/title/tt6700846/

Great stuff, though the editing somewhat breaks the flow. Why not put the whole conversation as is? I like the stutters and misunderstanding of questions type conversation 🙂 There is something there as well.

Lex congrats on getting the part as the G-Man in the upcoming Half-Life film

http://amlbook.com/ helped me understand his discussion on "expressiveness or diversity of functions" and the VC dimension. "Learning from Data" book

I have a feeling this guy doesn't like the phrase neural network. It's nice seeing someone with opposing views for once.

I can't help but wonder if professor Vapnik could have expressed his thoughts a bit better if the interview was done in Russian.

Thank you for uploading such a beautiful interview! I enjoyed this video so much!

With each interview, I'm getting more interested in the subject. Thank you for the great content!

really interesting conversation, thank you!

His comment about music is similar to the ideas in GEB!

I haven't watched the entire video as of now, but on the same subject can you get Chomsky and Norvig.

It would be great if we had similar lecture also in Russian

Ground Truths guide us all <3

I am not sure if we can derive a theory of intelligence purely from math. In physics the problems are easier, because we can create meaningful equations, which can guide us. Examples could be Max Planck's quantization of energy, Albert Einstein's relativity theory, Dirac's antiparticles, or currently string theory.

On the other hand, in biology, chemistry, … there is less insight from equations. For example, the effects of protein folding are very difficult to deduce from equations and we have to use computation instead. The same could hold for intelligence: it may have a mathematical description that is very messy and does not adhere to our sense of mathematical beauty. This could of course change as we find more connections and build a consistent theory, so initially messy ideas become more and more intuitive and beautiful, but the core does not change.

Using the beauty and elegance of math as a heuristic is a little bit dangerous. For example, geocentric theory at the time had a nicer description than heliocentric theory. The reason was that more correction terms had to be added to heliocentric theory to match the precision of geocentric theory. That was because they didn't use ellipses to describe motion, but instead used compositions of circular motions. Only after the empirical findings of Kepler did we switch to ellipses.

Another more anecdotal example would be the dynamo theory of Walter M. Elsässer describing why planets have magnetic fields. He told his theory to Albert Einstein, but “he didn’t much believe it. He simply could not believe that something so beautiful could have such a complicated explanation” in the words of Einstein's assistant (Einstein preferred not to tell his opinion). The theory was correct, Einstein's intuition was wrong. (Source: top of 3rd page of pdf -> http://www.geosociety.org/documents/gsa/memorials/v24/Elsasser-WM.pdf)

Also currently string theory is getting some backlash, because of a lack of results despite decade-long effort. This theory has some promising connections and seems to be a perfect fit for the missing element in our understanding of physics, but there are also some ugly parts, like the need for more dimensions or too many possible universes.

So we have to be careful not to focus too much on mathematical beauty; nature can just be messy, or we might not have the mathematical tools to appreciate its beauty.

I strongly disagree with Vapnik on his opinion about intuition. He seems dogmatic in his dismissal of the idea, however, through history we have seen a number of human phenotypes that produce significant intellectual achievement. One such phenotype that appears to be convergent in many individuals who have made tremendous achievements and cracked open entire academic disciplines (e.g. Einstein) is that of the visionary. Someone who is able to intimately understand a problem so that they may sufficiently abstract it to allow for giant leaps of progress by using intuition or visualization rather than iterative logical steps. I feel like Vapnik may be more of the literal, autistic type of individual who is very good at specializing and using brute force logic to iterate from axioms to a model within his discipline.

I would not be too quick to discount the role of intuition particularly in the more demanding, technical fields such as pure mathematics and theoretical physics as opposed to machine learning and statistics.

Hey Lex, Thanks for making this content free and accessible online! Very generous and much appreciated.

haha I liked his response to the AlphaGo question!

On the other hand, I think it's misleading. Just like in maths, a problem's difficulty should be gauged by how hard it seems before solving it, not how hard it is in hindsight.

I have to express my gratitude for uploading stuff like this, Thanks so much Lex and thanks to Dr. Vapnik for taking the time to express some of the insights he has gained throughout his life

Beauty and poetry! Again, thanks Lex!

very interesting person

Wow what an interesting conversation, thank you so much Lex for the video, really appreciate it and looking forward to more of such videos, cheers

💖

This was incredible.

Just another day I was thinking about "how come ideas are generated in different parts of the world within a definite time period simultaneously?". Glad to hear that a prominent mathematician thinks the same way (31:34).

It's Platonic and poetic. And I have heard many mathematicians say this sort of thing. Ramanujan is also a great example that makes this theory interesting.

I can't remember the last time I really enjoyed a great conversation like this one. These are good questions by Lex. And I am so excited and thrilled by the intelligence of Vladimir Vapnik.

So in a way, the problem of intelligence or at least the basis regarding the concept of a good teacher hinges on metaphorical truth and linguistic precision.

what is he saying at 1:35 ? "it is ???? described ", what is ???

is he saying @ 3:11 "setting" ?

what does he say at 3:49

"the GOD or GOAL of ML is to learn about conditional probability" ?

I think it's "GOAL" but then the next sentence is about God playing dice.

I think he says GOAL first and then GOD in the following sentence but they sound so similar and they are very close to each other in the dialogue.

Another interesting interview, but I think all of the interviews would be better with fewer leading questions and professing by the interviewer.

Hi Lex, could you please share a link to the presentation mentioned in the dialogue? thanks

God Bless Vladimir Vapnik

Instrumentalism = inductive logic. Realism = deductive logic. The induction to deduction process is pretty much how our brains navigate existence. Induction is basically the gathering and categorization of data (passive). Deduction is basically a conclusion that becomes a principle that dictates reaction (active). Problem is that once a deduction is made, it is hard as pulling nails with your teeth to modify or update the conclusion. Some sort of fight/flight autonomous grip makes our conclusions into a dictatorship incapable of further induction or the addition of new data. This is meme oppression. Me, I'm sticking with the statistical approach as a truer reflection of a changing reality.

Those subtitles should probably read "Weak and strong convergence" not "Big and strong .."

NO IMAGINATION!!! lol

24:29 Lays the smackdown on the dilettante and mathematically deficient.

He shot down neural networks even for a hypothetical scenario, lol

Thanks Lex, this talk was amazeballs!

Anyone got what the MIT guy's name was @52:27?

Dodley? Or something

How many of you came here after reading the book machine learning with python by François Chollet

Think like Duck!

I understand some of what's being said here.

1. "I'm not sure that intelligence is just inside of us. It may also be outside of us."

2. "I know for sure that you must know something more than digits."

3. Invariance theory might be the hope of understanding intelligence?

Gold. This is gold. Very nice to hear others perspectives. This guy is stubborn lol.

Wonderful. Thank you.

Keep revisiting this and slowly understanding more. This may be the best podcast on the channel.

He speaks like Dracula after receiving chemotherapy. I can't understand him. He needs speech therapy or a cup of coffee or something.

25:56 Representer theorem says that optimal solution … is on shallow networks, not on deep learning.

I cannot understand why this holds. Can sb explain or give me a reference?

Thanks