Sam Charrington: Hey Everyone!
Last week was the first week of our TWIMLcon: AI Platforms conference, and what a great first week it was! Following three days of informative sessions and workshops, we concluded the week with our inaugural TWIMLcon Executive Summit, a packed day featuring insightful and inspiring sessions with leaders from companies like BP, Walmart, Accenture, Qualcomm, Orangetheory Fitness, Cruise, and many more. If you’re not attending the conference and would like a sense of what’s been happening, check out twimlcon.com/blog for our daily recaps, and consider joining us for week two!
Before we jump into today’s interview, I’d like to say thanks to our friends at Microsoft for their continued support of the podcast and their sponsorship of this series! Microsoft’s mission is to empower every single person on the planet to achieve more. We’re excited to partner with them on this series of shows, in which we share experiences at the intersection of AI and innovation to inspire customers to reimagine their businesses and the world. Learn more at Microsoft.com/ai and Microsoft.com/innovation

Sam Charrington: [00:01:29] All right, everyone. I am here with Gurdeep Pall. Gurdeep is a corporate vice president with Microsoft.
Gurdeep, welcome to the podcast!

Gurdeep Pall: [00:01:38] Thank you, Sam. Really excited to be here.

Sam Charrington: [00:01:40] I’m super excited for our conversation today! As is our typical flow, I’d love to have you start by introducing yourself. You’ve had quite a career at Microsoft culminating in your work in AI and autonomous systems. Tell us a little bit about your background and how you came to work in this field.

Gurdeep Pall: [00:02:02] Thanks Sam. I’ve had a really nice long run at Microsoft, as you mentioned. And in fact, today is my 31st anniversary at Microsoft.

Sam Charrington: [00:02:11] Wow.

Gurdeep Pall: [00:02:12] So, yeah, it’s been a long career, but I really had a great time. In fact I feel like I’ve been into the candy store like three times. So my career can be divided into three parts.
I worked on networking and operating systems. So that was sort of my first gig at Microsoft. I was very fortunate to work on a lot of the internet technologies when they were first rolled out in operating systems. I worked on VPNs, I’ve worked on remote access. And then I worked up to Windows XP, where I was the general manager for Windows networking and we shipped Wi-Fi for the first time in a general-purpose operating system. And then at that time I moved over to work on communications and I started Microsoft’s communications business. So these are products that you may remember from the past, things like Office Communications Server, which became Lync, which became Skype for Business, which is now Teams.
So I started that business from scratch, and all the way until we announced Teams, in fact until a few days before we announced Teams, I was involved with that business. Though I’d had a stint in the middle on AI, and I came back to work on AI. So it’s been, I would say, roughly three parts to my career, the latest being AI.
And I’ve had lots of fun in all of them.

Sam Charrington: [00:03:30] That’s awesome. I talk to so many people at Microsoft who are working in AI, and a lot of them started their careers working on Bing. You’re maybe one of the outliers in that regard.

Gurdeep Pall: [00:03:43] Well, the funny thing is that the first stint I mentioned on AI was actually in the Bing team, and I was running Microsoft speech. I was running some of the interesting explorations we were doing at Bing, recognizing objects. In fact, some of the image stabilization work that went into HoloLens actually came out of that group. So yeah, I worked on maps and lots of interesting stuff.

Sam Charrington: [00:04:08] That’s awesome. So tell us a little bit about autonomous systems and some of the work you’re doing in that area.

Gurdeep Pall: [00:04:14] Yeah. So, for the last four years or so, I’ve been focused on emerging technology and how it can be applied to interesting business problems. And, in that regard, I’ve worked on some interesting technology in the language space, the language understanding space. I worked on ambient intelligence, where you could actually make sense of a space, sort of make reality computable, if you will.
And then as I was exploring interesting emerging AI which can solve business problems, we started focusing on autonomous systems. That was interesting to us, not just as a very interesting aspect of what AI was enabling, but also because Microsoft didn’t have a lot of focus in that area before. So, when I talked to Satya and, at the time, Harry Shum was here, we decided this was an area we were going to go invest in.

Sam Charrington: [00:05:04] Interesting. And one of those investments was the acquisition of a company called Bonsai. This is a company that I know well. I interviewed one of the founders, Mark Hammond. This was back in 2017. It’s hard to believe it was that long ago. And the company had a really interesting take on using technologies that are still difficult for folks to put to productive use, namely reinforcement learning.
Their take on it was this idea of machine teaching. Maybe you can tell us a little bit about that acquisition, the role that it plays in the way Microsoft thinks about autonomous systems and elaborate on this idea of machine teaching and some of the things that Bonsai brings to the table.

Gurdeep Pall: [00:05:49] Sure. Absolutely. So, when we started focusing on autonomous systems, we were trying to get our hands around this thing. People interpret autonomous systems many different ways. Some people think it’s only about autonomous driving, so let’s build a vertical stack. Some people think about robots, these humanoid robots with arms and joints and so on.
And we’re thinking, what is our point of view? And, at the end of the day, we look at our own capabilities. We’re a software company, so what is a software interpretation of the space? And it was with this sort of point of view that we started thinking about it. There was some work going on in Microsoft Research at the time, which I’ll talk more about. And that’s when I first met Mark and team, and we had a really good discussion. As we finished the first meeting, I remember this thing going through my head, that this is such a great approach. And it really fits into how we are starting to think about this space and makes sense to us.
And then I also thought, God, this feels like just the wrong thing for a startup to do, building platforms and tools. It’s a tough thing. And Mark is such an incredible guy. I think you’ve talked to him, so you know that. When we first finished the acquisition, he shared that with me too.
He said every VC he talked to would say, why are you doing this? This is the kind of thing Microsoft should be doing. So it was a marriage sort of made in heaven, as it were, and we acquired that company. And it’s been really great, actually, working with Mark and picking up from some incredible thinking that he and Keen had done, and the team that was there, and then really expanding on that and helping it realize its potential, and also making it much more of an enterprise-ready offering, because this space is as mission critical and as important as it gets. So that’s been a very fun journey for the last two and a half years.

Sam Charrington: [00:07:52] One of the ways I’ve heard you describe the way you’re approaching autonomous systems or that world broadly, and it’s two words and I still may butcher one of them, but it’s like this marriage of bits, and is it atoms that you say? Or molecules, or something else?
But the idea is that, and this was something that was core to the way Bonsai articulated what they called then industrial AI, it’s a different problem when you’re applying AI solely in a software world, recommendations on a website or looking at customer churn, to when you’re actually trying to move physical goods or devices or systems. Elaborate on what you’ve seen in terms of the different requirements that come up in that world.

Gurdeep Pall: [00:08:43] Absolutely. This is a very important point. When we started focusing on autonomous systems, I had people asking me about half the time, “Oh, you’re talking about RPA, right?”
No, I’m not talking about RPA. Of course, it didn’t help that some of the RPA companies were calling their tech robots, and it could take action and so on. So in some ways, it was just a way for us to be clear about what we are doing. And we said, no, we’re actually focused on atoms, not things where we just deal with bits. Of course, to digitize anything, you have to go from atoms to bits and then reason over it.
But that became sort of the mainstay for us. The biggest difference, I would say, between those two worlds is that the physical world is governed by things like physics. There’s, of course, Newtonian physics, and then you get into some of the multi-joint movements and you get into fluids, and that’s a whole different kind of physics which comes in.
So you have to really think about modeling the real world and how then you can apply the tech towards that. The second thing I would say is that, most of the scenarios in autonomous systems pertain to taking action in the real world. And when you’re taking action in the real world, every time you take an action, the real world changes.
And this is where reinforcement learning becomes a very natural mate as an AI technology for the problems that really apply to the real world, which is great because we have no other science which allows us to take a really sort of an unbounded state space and actually reason within it.
And reinforcement learning becomes this really important piece in it. Lastly, I would say is that, every problem that we’ve looked at from an autonomous system space typically is one where there are experts who exist already. So far we haven’t been called to a problem where this is completely new and completely different and “oh, let’s solve it for the first time,” you know? And so tapping into the human expertise became a very important piece of this equation as well, which sometimes you don’t need to worry about, [inaudible] the data, you throw things at it and then maybe there is judging, certainly, if you want to sort of fine tune the models and so on, but that was another interesting aspect of this.

Sam Charrington: [00:11:11] So we’ll be digging a little bit deeper into some of the technology that makes all this happen, but you started to mention some of the use case scenarios. Can you dig a little bit deeper into some specific scenarios that you’ve been working on?

Gurdeep Pall: [00:11:27] Absolutely. And that’s, one of the things which makes this very, very interesting to me because it’s literally everything you see in the world around you can be a target for some of the technology that we’re building.
Everything from smart climate controls. HVAC control is a field where, for the last 70 years, there’s been very incremental improvement. Things like fuzzy logic have been used. And we’ve seen incredible results using our approach.
There, things had plateaued out in performance, and we were able to bring much better performance, so energy savings or better climate control. We’ve seen oil drilling, horizontal drilling from companies like Shell, where you have these incredibly big machines that look like bazookas, and you’re drilling with them.
And these machines need a pretty high level of precision, so great human experts can do it, but you sometimes need more work than you can actually get that many trained experts on the problem. So being able to guide the drill bits through that.
Cheeto extrusion is a very interesting, complicated process. You know, it’s very easy to eat, very hard to make. I always say, I know there are professional chefs out there, but certainly I cannot make the same kind of eggs every morning. Because even that simple task of heating the oil and getting it just right and putting the eggs in, you cannot replicate it every time. But if you’re Pepsi and you’re making Cheetos, that has to be consistent every time. When you open a bag of Cheetos, everybody’s familiar with the fluffiness and the crispness, and so everybody’s a judge and you have to win that every time. So very hard problem, because you have this corn meal, which was mixed with water. It’s impacted by the age of the machine which is extruding, sometimes impacted by humidity, temperature, all these things.
So it’s a highly dynamical system, and experts today sample and then tweak, and then sample and then tweak. These are really very stressful jobs, trying to keep that quality right. Otherwise the quality folks will come in and reject the material. So this is a problem we’ve been able to apply our tools to, basically consistently tweaking the parameters of this process so that you can have consistent Cheetos coming out on the other side.
Chemical process control and polymer manufacturing: very, very hard problems. Some of these take six months to design the process for producing a polymer of a particular grade. And we’ve been able to apply this to that problem, both in the design and in the actual manufacturing process itself.
Our favorite thing is flying things. Bell Flight is an incredible company; they have all kinds of commercial as well as military applications for their vertical liftoff vehicles and so on. They’re trying to bring autonomous capability to those things, and we’ve been able to apply this towards that as well. So as you can see, anything which has control in the real world, where you’re sensing, picking an action, taking that action, and sensing again, wherever this kind of loop exists, this technology can be applied.
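
The sense, pick an action, act, sense again loop described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Room` plant, its numbers, and the on/off `policy` are hypothetical stand-ins invented for the example, not anything from Bonsai or from the customers mentioned.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """Hypothetical toy plant: a room whose temperature leaks toward the outside."""
    temp: float = 15.0
    outside: float = 5.0

    def sense(self) -> float:
        return self.temp

    def act(self, heater_on: bool) -> None:
        # Heat leaks toward the outside temperature; the heater adds a fixed amount.
        self.temp += 0.1 * (self.outside - self.temp)
        if heater_on:
            self.temp += 2.0


def policy(temp: float, setpoint: float = 20.0) -> bool:
    """Pick an action from the sensed state (a simple on/off rule)."""
    return temp < setpoint


def run_loop(steps: int = 50) -> list:
    room = Room()
    history = []
    for _ in range(steps):
        state = room.sense()          # sense
        action = policy(state)        # pick an action
        room.act(action)              # act, which changes the world
        history.append(room.sense())  # the next pass senses the changed world
    return history
```

In a trained system the hand-written `policy` would be replaced by a learned one; the loop around it stays the same.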

Sam Charrington: [00:14:53] It’s been interesting over the past few years, just reflecting on some of the early conversations I had with Mark and the team at Bonsai around this. There’s kind of this pendulum in the industry, where we started out with rules, like physics and how things work.
And early on in applying AI, we threw all those rules away and leaned heavily on data and statistics. And over the past few years, there have been efforts, both in academia as well as what you’re doing, to incorporate the rules and the human expertise back into the equation, without tossing everything that we’ve gained in applying data. One of the interesting challenges when you layer on the physical world here is simulation: how do you let an agent explore and learn without destroying helicopters and lots of Cheetos? Share a little bit about the challenge of simulation and how that’s evolved to help make some of these problems more tenable.

Gurdeep Pall: [00:16:01] Yeah. Yeah. I think that’s such an important piece of this equation. Reinforcement learning is great, but reinforcement learning requires many, many, many steps, literally just to get a policy to be robust. You can be 60 million cranks in before you start to see your policy develop at the appropriate level.
So the question is, how do you go do that in the real world? This is one of the big insights I think the Bonsai folks came up with, and then there was some work happening at Microsoft Research coming at it from a very different direction, but they sort of merged together. This is AirSim, and I can talk more about that, but the ability to model the appropriate aspects of the real world so that you can actually take action against them, get the right input back, and use that to train the model has been sort of the biggest insight here. Because really, what it says is you’re taking the physical world and you’re creating a mapping of it in the digital world, which then allows you to train the models quickly. And that’s where these simulators come in. Now simulators, depending on what they’re trying to simulate, can be very computationally intensive.
And if you have Navier-Stokes equations and things like that, CFD, these are pretty long-running simulations, and some are, of course, faster. Now, because we are using simulators for training AI, we want to crank this very, very quickly. So sometimes you end up with this problem where the physics, or at least how that physics is approached using these mathematical equations, actually becomes a big piece of the problem.
And so this is an area on how to take simulation, and how do you mate it with the training of the AI in a way that you can do it fast, you can do it cheap and you can frankly do it in parallel because that is one of the things, we have with some of the RL algorithms now is that you can actually take a policy, the last best known policy, you can explore in thousands of machines at the same time, you can take the samples and come back and update the policy. And then you take that, and again, you fan it out and you’ve got learners which are learning very quickly.  Getting all that figured out is actually one of the big things we managed to get done after the acquisition as well. And it’s all running on Azure and really allows us to do stuff efficiently.
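
The fan-out pattern described here (send the last best-known policy to many workers, let them explore, gather the samples, update, repeat) can be sketched as follows. This is a minimal sketch under loud assumptions: threads stand in for the thousands of machines, the "simulator" is a toy reward that peaks at a parameter value of 3.0, and the update is a simple evolution-strategies-style rule swapped in for illustration, not the RL algorithm Bonsai actually runs on Azure.

```python
import random
from concurrent.futures import ThreadPoolExecutor

SIGMA = 0.1  # exploration noise scale (assumed value)


def rollout(args):
    """One worker: explore around the current policy and report what it found."""
    theta, seed = args
    rng = random.Random(seed)
    noise = rng.gauss(0.0, SIGMA)
    candidate = theta + noise
    # Toy stand-in for a simulator: reward peaks when the parameter equals 3.0.
    reward = -(candidate - 3.0) ** 2
    return noise, reward


def train(iterations=200, workers=16, lr=0.05):
    theta = 0.0  # last best-known policy (a single number in this sketch)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for it in range(iterations):
            # Fan out: every worker gets a copy of the current policy.
            jobs = [(theta, it * workers + w) for w in range(workers)]
            samples = list(pool.map(rollout, jobs))
            # Gather: move the policy toward perturbations that scored well.
            baseline = sum(r for _, r in samples) / len(samples)
            step = sum(n * (r - baseline) for n, r in samples) / len(samples)
            theta += lr * step / (SIGMA ** 2)
    return theta
```

Calling `train()` returns a parameter close to 3.0. The point of the sketch is the shape of the loop: exploration is embarrassingly parallel, while the policy update is a single cheap step between fan-outs.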

Sam Charrington: [00:18:33] You mentioned AirSim. What is that, and what’s the role that it plays?

Gurdeep Pall: [00:18:36] Yeah, so AirSim was a project in Microsoft Research, which started off in a team that was exploring drones and how you bring autonomy to drones. And they had a very similar experience. I think they started in 2015. They would go out with their drone in the morning, and they would come back with a broken drone in the evening, and they would have very, very little data. And it’s like, how are we ever going to get enough data to actually get this thing to fly, to do even the basic tasks?
So that’s when they looked at some of the work that is happening in, frankly, the gaming world. And they looked at some of the incredible scenes that could be rendered with Unreal and Unity and those kinds of things. If you’ve seen Forza and stuff like that, I mean, these things start to look pretty real.
And they said, let’s create a simulator for perception-oriented tasks, where you can create a scene and you can integrate physics into that scene for the different objects that are involved. It could be a flying object, it could be something with wheels which is driving, et cetera. And so you integrate the physics, and now you’ve created an environment in which you can train AI.
Now, it could be reinforcement learning, where you model the actual sensors inside this virtual environment, and you are able to use that for sensing and taking actions. Or you can use these sensors that are modeled inside of AirSim itself just to generate lots of data, on which you can do supervised learning offline. AirSim serves both these purposes. They created this tool for themselves, and they realized it’s so powerful that they put it out as an open source utility. Today it has more than 10,000 stars on GitHub. It is really one of the most popular tools, because others are realizing that this idea of being able to simulate reality is a very powerful approach.

Sam Charrington: [00:20:35] So, can you maybe talk us through, for any of the use cases you described, when you go into an environment with a real customer with real problems, what’s the process to actually get something up and running and demonstrate value that they can build on, meaning concrete value as opposed to theoretical POC value?
What does it take to really do that?

Gurdeep Pall: [00:21:02] I think, and this is something that we’ve been working on and we will continue to work on because our goal is to get this to a point where people are able to identify that this is a great tool for the problem that they have. It’s not like some sort of a speculative exploring exercise.
They know that they’ll definitely get the results if they adopt this tool chain, and going from there to actually training the policy, being able to export the brain, and actually starting to use it in the real world, that period is pretty short. So this is a journey for us; it started off fairly long.
And now we are at a point where we are focusing on these so-called solution accelerators, these areas where the problem we are solving is very clear and how to solve it is very clear. And then there are some of the things that you need, like what simulators you need. Sometimes folks already have simulators; in other cases, they need a simulator.
And then the entire thing is stitched together, and all they need to do is come in and create the variations for the problem, create the policy, and then go ahead and use it. This is what is needed to take a customer from, “Hey, I’ve got a problem. I don’t know what this thing does. Maybe I’ll understand that. Okay, now I know the kind of problem, but I don’t know if the problem can be solved with this or not.”
So this is what we’ve been targeting. We’ve gotten our solution explorations to be very crisp, as well as how we talk to customers, because, as you’re alluding to, there’s an education thing here, there is a confidence thing here. So we have to address all those pieces, and we’re bringing the customers along the journey. The great thing is, with customers like Pepsi, the moment one thing proved successful, they looked around the factory and said, “I can put this approach on many things,” and that’s the conversation we’re having right now.
The same thing with Shell, same thing at Bell. So, this is the journey.

Sam Charrington: [00:23:01] I appreciate in that the idea that to the contrary of what you might think if you read popular reporting about AI, it’s not like a silver bullet, particularly in this domain where, you’ve got some tool chain and it applies to every problem that any customer might have.
And it sounds like you’re being strategic and selective, and building expertise and supporting tools around specific areas, so that, to your point, when you are engaging with someone, they can have a high degree of confidence that you’ve done this before, and that you know how it’s going to work and what the process is.

Gurdeep Pall: [00:23:37] Exactly. And the other interesting thing that we found, which is I think a little unique compared to some of the other things we’ve done with AI, is that the experts that we end up talking to in the different industries and these application areas, they have never encountered AI before.
These are folks who went to engineering discipline schools, real engineers, not fake engineers like software engineers, like us. I mean, these are mechanical, chemical, what have you. And when they went through college, they did MATLAB and they learned Simulink and so on. And they have relied on a set of tools that have given them employment, given them career success, and stood the test of time. And here, these five guys walk in with a swagger and say, “Hey, we’ve got AI for you, and it’s called reinforcement learning. It’s really awesome. You’ve got to try it.” I mean, that just doesn’t work. You have to really bring them along.
And then they have some real, real things that we’ve had to go and take in, like safety. Even if this thing worked, they want to be able to assert that this thing is not gonna do something crazy. I mean, when you have that horizontal drilling machine from Shell, this thing can drill through anything.
I mean, it’s this huge thing. There was a Wall Street Journal article about two or three years ago, when we first did this project with them. For them, they want to make sure that this thing actually is going to be safe and is not going to create a new problem while it solves one.
Yeah. So it’s been a learning thing for us, but there’s the need for the education, the need for bringing these folks along. And this is one of the reasons we did Project Moab, which is this very interesting device. It’s like a toy, basically. It has three robotic arms, if you will.
And there’s a clear plate on top. And the task is to balance a ping pong ball on this plate. Now, this problem, of course, you’d imagine the engineers will go to PID, right? I mean, PID control is something they learned in college. And guess what? We said, first, let’s start with PID. It does a pretty good job.
But then we said, okay, well, I’m going to toss the ball onto the plate and see if it catches it. Well, it turns out it doesn’t catch it. So then we said, I’m going to add more complexity. How about we try and make the ball go around the edge of the plate? So as the problem progresses in complexity, you realize that the only way you can solve it is if you had something like our tool chain, which we have with Bonsai: you create a simulator and you have a policy that you’re training, and then you’re able to get to that level of performance.
So we did this solely to bring engineers who are used to a particular way along and to start to believe, and to start to get excited about this. So we created the sort of metaphor in which we could connect together with them.
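
For readers who have not met PID control, here is a minimal sketch of the baseline that does "a pretty good job" on the simple balancing task. Everything specific in it is an assumption made for illustration: the one-axis ball-on-plate model, the small-angle approximation, the gains, and the tilt limit are hypothetical values, not Project Moab’s actual controller or physics.

```python
class PID:
    """Textbook proportional-integral-derivative controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def balance(x0=0.05, steps=2000, dt=0.005):
    """Drive a ball at offset x0 (metres) back to the plate centre along one axis."""
    g = 9.81
    x, v = x0, 0.0
    pid = PID(kp=2.0, ki=0.1, kd=1.0)  # assumed gains, hand-picked for this toy model
    for _ in range(steps):
        tilt = pid.update(0.0 - x, dt)    # error = target - position; output in radians
        tilt = max(-0.2, min(0.2, tilt))  # assumed actuator limit
        a = g * tilt                      # small-angle approximation of the rolling ball
        v += a * dt
        x += v * dt
    return x
```

After ten simulated seconds, `balance()` leaves the ball within a few millimetres of the centre. The harder variants mentioned above, catching a tossed ball or tracing the plate edge, are where a fixed-gain loop like this starts to struggle.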

Sam Charrington: [00:26:37] Interesting. Interesting. It reminds me of this idea of why deep learning is so important, and software 2.0, and how where it’s particularly powerful is in solving problems that we didn’t know how to write the rules for, like in computer vision. How do you identify a cat versus a dog, right? The rules for that, who knows how to do that? But the neural network can figure that out. And similarly, there is a range of problems that PID is easily applied to, but there’s also a level of complexity that it is difficult to apply it to. And that is where you’re finding the value in applying RL.

Gurdeep Pall: [00:27:18] Exactly, exactly. And we’ve seen that either there were just too many moving parts, so the folks had achieved automation but they had not achieved autonomy, and it’s that class of problems where we’re getting traction, or that with the existing methods, they’ve plateaued out in performance.
You know, there is more performance to be had, and this is incredible. Like you would think like, we’ve figured everything out, right? I mean, as a society and with all the advancements that’s happened, but HVAC control in buildings, we’ve been able to get startling results. I mean, this is millions of dollars, like on a campus that you can save.
And then also the green benefits that you get from that. So there’s just tremendous opportunity.

Sam Charrington: [00:28:07] So maybe let’s drill into that example more, because I do want to get to a more concrete understanding of what the process looks like. I’ve got a data center or physical plant or something, my HVAC costs are through the roof, and someone told me about this AI thing on an airplane. And I call Gurdeep. What’s the first thing that I do, and how do I get from there to some cost reduction or greater efficiency, or whatever my goal is, applying some of this?

Gurdeep Pall: [00:28:40] Yeah. So in this particular case, we’re focusing one of our solution accelerators just on this use case.
And so we are able to say with very high confidence that if you can give us this information, which is typically data that you might have collected, because a lot of these are now IoT sort of devices, we’re able to ingest that data.
And in this case, which is sort of another double click on the simulation thing, we’re able to actually create a data-driven simulator, and we are able to start creating a policy. Now they do need to specify, and this is where machine teaching comes in, what behavior they desire. That specification is fairly flexible. So you could say things like, I want it to be really uniform between these times of the day. Or you could say, if the outside temperature, which becomes one of the state variables that goes into creating the brain, is outside of this range, then I want this kind of behavior: in summer I want it to be cooler, and in winter I want it to be warmer.
With all those inputs, now create a policy for me which automatically controls the HVAC system, which means turning on the fan or turning on the heat or turning on the cooling, and does it dynamically, because once the brain is built, all you have to do is connect the inputs and the actions.
So inputs is where we are sampling the state. And actions is where you’re saying, okay, increase heat, decrease heat, increase the fan, turn off the fan, et cetera. And by the way, it’s not just temperature in this case. It’s also the carbon dioxide and nitrogen levels, and so on. All those are being sensed, and then the actions will be taken based on that.
So that is the proposition we would have. And we’re, again, trying to make it as turnkey as possible, but we recognize that every building is different. Every building has its own climate fingerprint, and so there is work required in creating the brains. You could take a brain off the shelf and use it.
You know, I can’t say whether that would work better. It might have better energy consumption, but then the people are not as comfortable. So you have to sort of tweak it, and the more efficient we can make this end-to-end thing, the sooner folks can realize the value.
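
To make the idea of specifying desired behavior concrete, here is one way such a specification could be written down as state variables, actions, and a scoring function. This is a hedged sketch in plain Python, not Bonsai’s own specification language: the field names, the occupied hours, the temperature bands, the CO2 threshold, and the energy weights are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    HEAT = "heat"
    COOL = "cool"
    FAN_ONLY = "fan_only"
    OFF = "off"


@dataclass
class State:
    hour: int            # 0-23
    indoor_temp: float   # degrees C
    outdoor_temp: float  # degrees C; visible to the policy, unused in the score below
    co2_ppm: float


# Assumed relative energy cost of each action.
ENERGY_COST = {Action.HEAT: 1.0, Action.COOL: 1.0, Action.FAN_ONLY: 0.2, Action.OFF: 0.0}


def comfort_band(state: State) -> tuple:
    """Desired indoor range; tighter during the (assumed) occupied hours."""
    if 8 <= state.hour < 18:
        return (21.0, 23.0)
    return (17.0, 26.0)


def reward(state: State, action: Action) -> float:
    """Higher is better: stay in the comfort band, keep CO2 down, use little energy."""
    low, high = comfort_band(state)
    discomfort = max(low - state.indoor_temp, 0.0) + max(state.indoor_temp - high, 0.0)
    stale_air = max(state.co2_ppm - 1000.0, 0.0) / 1000.0
    return -(discomfort + stale_air + 0.1 * ENERGY_COST[action])
```

The trade-off described above, better energy consumption versus occupant comfort, shows up here as the relative weights inside `reward`; tuning a brain for a particular building amounts to adjusting choices like these.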

Sam Charrington: And a brain in this case is essentially a model or an agent or something like that. Is that fair?

Gurdeep Pall: Great question. I have had lots of folks ask me, including Bill Gates, why do you call it a brain? And I think it’s a really good question. So the way we talk about it is, it’s actually a collection of models. Autonomous system tasks can sometimes be decomposed into different parts.
Like, for example, if a robotic hand had to pick up an object and stack it, reach can be one action, pick up can be another action, and then move, and then stack. These are all distinct actions. Now, some are pretty easy. You can almost program them; reaching nowadays is often programmed, depending on the device you have. But some need to be trained.
So now this whole collection of things has to be orchestrated, and the right piece has to be invoked at the right time. Each one of them either is programmed or is a model, a deep learning model, a DNN. And putting all of it together becomes the brain. In fact, that’s how the human brain works.
So the name is actually quite apt. There’s the visual cortex, which has a particular purpose, and then it feeds another piece which then does reasoning. And then you want to take the action, and that invokes a different part of the brain. So that’s why we call it a brain.
And, yeah.
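
The "collection of models, orchestrated" idea can be sketched with the reach, pick up, move, stack example. This is an illustrative sketch only: the skill functions, the dictionary-based state, and the `select_skill` rules are hypothetical, and the "trained" skill is a placeholder rather than a real deep learning model.

```python
from typing import Callable, Dict

Skill = Callable[[dict], dict]


def reach(state: dict) -> dict:
    # Programmed skill: move the gripper straight to the object.
    return {**state, "gripper_at": "object"}


def pick_up(state: dict) -> dict:
    # Placeholder for a trained model; here it just marks the object as held.
    return {**state, "holding": True}


def move(state: dict) -> dict:
    # Programmed skill: carry the object to the stacking location.
    return {**state, "gripper_at": "stack"}


def stack(state: dict) -> dict:
    # Placeholder for a trained model; releases the object onto the stack.
    return {**state, "holding": False, "stacked": True}


SKILLS: Dict[str, Skill] = {"reach": reach, "pick_up": pick_up, "move": move, "stack": stack}


def select_skill(state: dict) -> str:
    """Orchestrator: decide which piece to invoke for the current state."""
    if state.get("stacked"):
        return "done"
    if not state.get("holding"):
        return "pick_up" if state.get("gripper_at") == "object" else "reach"
    return "stack" if state.get("gripper_at") == "stack" else "move"


def run_brain(state: dict, skills: Dict[str, Skill]) -> list:
    """Invoke the right skill at the right time until the task is finished."""
    trace = []
    while (name := select_skill(state)) != "done":
        state = skills[name](state)
        trace.append(name)
    return trace
```

Starting from an empty state, `run_brain({}, SKILLS)` invokes the four skills in order. In this sketch the orchestrator is a handful of rules; it could equally be a learned selector, which is part of what makes the collection more than a single model.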

Sam Charrington: [00:32:33] Okay.
Going back to the HVAC example, you mentioned a data-driven simulation. So I’m imagining you coming to my company, I guess, since this is my scenario and I’ve got the data center. I probably don’t have a simulation that exists for my data center and HVAC. And so that’s immediately a big challenge if I need that to train a brain, but you’ve got a way to generate that just from the data that I’ve collected.

Gurdeep Pall: [00:33:01] Yes. And this is something that we are having to do a lot more of as we are going and talking to customers. Some have a simulator. Interestingly, simulators have been used for designing, modeling, and testing; they’ve existed. But typically there’s been a human on one side of the simulator, driving the simulator for whatever purpose they want.
You know, if it’s a flight simulator, you’re flying it. But in our case, it’s the AI being trained that is sitting on the other end of the simulator. And so in some cases, we were able to take their existing simulators, change the use case, and still make it work. In some cases that worked great.
Now, in some cases it didn’t work great, because their simulator was designed for really different purposes. Like, if you do CFD, the purpose is to model this thing, and you have to model it to high precision. I mean, this is going to be a plane flying through rain, so it has to be very precisely done. They typically have HPC setups for CFD simulation, but each crank can take so much time.
So how are we don’t crack it so fast that we could learn, right. So we said, Well, that doesn’t work or they just don’t have a similar at all, like your case. So that’s where our next step is. Can you give us data? And for many folks, they have the data. If they have the data, then we say, okay, let’s start how we can take data.
And how do we can actually make it into something that we can meet with our system. That worked for certain class of problems. And then we said as a complexity of problems, started increasing, we realized that we need a new trick up our sleeve. there’s a research group as part of my team.
And we started looking at how we can apply deep learning to learn from this data to create simulators. There we ran into the first insight, which is that deep learning is designed for sort of inference, right? So you run one crank, you get a prediction, and you’re done. Well, it turns out the real world is not like that.
You know, the real world is modeled with differential equations. With differential equations, basically, you’ve got time, and you’ve got this thing which continues to change its behavior with time, depending on the previous state and the actions being taken. So there’s some great work that is being done right now.
And we are publishing it right now. In fact, some of it is already out, in deep simulation networks. Basically, it’s like a neural computational fabric. It’s kind of like RNNs, where with every crank you take the output and sort of feed it back into the next time cycle.
Of course, the sampling of time can actually be variable, so that neural computational fabric has to deal with that, which is a pretty big thing in itself. But it also allows you to have many different components inside the simulation, each of which is sort of learning in a different way.
For example, if you’re tossing a ball, the ball has its physics. And then there’s the environment that has its physics, which is Newtonian physics, and it turns out the Newtonian physics doesn’t change. You can toss a ball, you can toss a bottle of water. So if you are training those components, it gives you some of these pre-trained components,
if you will, that can be trained once, and then you can maybe tweak them, since the object will have different physics. So now you have this neural computational fabric, which plays out in time. You are now able to have multiple components, and you train this thing. This new architecture, we believe, is a pretty transformative thing in simulation, because it now allows us to go after any complex simulation space,
which basically has lots of differential equations that are sort of running around inside of it. And we can train it reasonably quickly. Really, it’s kind of like a graph neural network, because you have time and you have space, if you look at the components that actually make up the space. So there’s message passing, which is happening between every stage, and that allows the learning to happen.
And with this backpropagation, which happens in each of the components, eventually you’re able to get a trained model which can run like a simulator. So you start at some state, you take an action, the state changes, and you’re able to crank it. So we’re really excited about it. We think this will be a big accelerant in the approach that we have.
Again, we get the data, we use it, and we can go at it. And similarly, it can also learn from other simulators. So if you have something that is quite inefficient in terms of computation and stuff like that, this thing can learn off of it. And then it can execute very fast, because once it learns the fundamental differential equations that are underlying, this is just inference.
It’s not doing any kind of big computation once it’s trained. So that is an area that we’re really excited about right now.
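
The core loop described above (learn the dynamics from logged data, then crank the learned model forward by feeding each output back in as the next state, with a time step that may vary) can be illustrated with a deliberately tiny stand-in. The sketch below is not the deep simulation network architecture: it assumes a single state variable, swaps the neural network for a least-squares fit of dx/dt = a*x + b*u, and uses simple Euler steps. The function names and the toy data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (state x, action u, observed dx/dt)


def fit_linear_dynamics(samples: List[Sample]) -> Tuple[float, float]:
    """Least-squares fit of dx/dt = a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in samples)
    sxu = sum(x * u for x, u, _ in samples)
    suu = sum(u * u for _, u, _ in samples)
    sxd = sum(x * d for x, _, d in samples)
    sud = sum(u * d for _, u, d in samples)
    det = sxx * suu - sxu * sxu
    a = (sxd * suu - sud * sxu) / det
    b = (sud * sxx - sxd * sxu) / det
    return a, b


def rollout(x0: float, actions: List[float], dts: List[float],
            a: float, b: float) -> List[float]:
    """Crank the learned model forward; each output is fed back as the next state."""
    x = x0
    trajectory = [x]
    for u, dt in zip(actions, dts):
        x = x + dt * (a * x + b * u)  # Euler step with a variable time step
        trajectory.append(x)
    return trajectory


# Toy "logged data" generated from a known system, dx/dt = -0.5*x + 2.0*u.
samples = [(x, u, -0.5 * x + 2.0 * u) for x in (0.0, 1.0, 2.0) for u in (0.0, 1.0)]
a, b = fit_linear_dynamics(samples)
trajectory = rollout(1.0, actions=[0.0, 1.0], dts=[0.1, 0.5], a=a, b=b)
```

Once the coefficients are fitted, each rollout step is just a cheap evaluation of the learned model, which is the sense in which running the trained simulator is "just inference."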

Sam Charrington: [00:38:09] Awesome. So first step is capture some data. Next step, use that to train a simulator using this idea of deep simulation networks, potentially. Then you mentioned kind of using that to create a brain.
It sounds like part of that is, and you corrected me when I said it’s a model, so part of that I’m imagining is figuring out the right level of abstraction for these different components or pieces. And then, I guess one of the questions that I had around that was, when we talk about reinforcement learning in kind of an academic sense and how difficult it is to put it to use in real-world situations,
a lot of it has to do with, like, carefully crafting this objective function or cost function and all of the issues associated with that. You described what the customer has to do as less about describing this objective function and more about constraining what the solution looks like. Am I kind of reading that correctly?
And maybe you can elaborate on that and help us understand.

Gurdeep Pall: [00:39:17] Absolutely. And you’ve hit the nail on the head. With reinforcement learning, the reward function, the specification of that becomes the next problem. In fact, we have a very famous researcher at Microsoft Research,
John Langford, and he’ll tell you that. He says, if you have a problem and you model it as a reinforcement learning problem, you now have two problems. It really gets to the core of this thing, which is getting the reward function right. And there are lots of funny stories about bad reward functions and unintended consequences. We ran into that, and we still allow that in our tool chain; you can specify the reward function. But now we are actually,
with machine teaching, exploring what are other ways for an expert to describe what they want done, and we’ve come to the concept of a goal. So they specify the goal using a particular approach, the semantics of which are contained within the problem and the environment, and we will automatically generate the reward function
under the covers based on the goal. And we found this to be a much more approachable thing for our customers. In fact, in a lot of our new engagements with customers, most of the time we end up using goals. So that’s been, you know, and like I said, we’re on this learning thing ourselves.
And, you know, we’re seeing what’s working, what’s not working, how to enhance it, and moving on from there.
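
To make the goal-to-reward idea concrete, here is a minimal sketch of how a declarative goal could be turned into a reward function automatically. Everything in it is an assumption for illustration: the `DriveGoal` class, the `reward_from_goal` helper, and the particular reward shaping are hypothetical and are not the tool chain's actual goal language or generated rewards.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, float]


@dataclass
class DriveGoal:
    """Hypothetical goal: drive one state variable to a target, within a tolerance."""
    variable: str
    target: float
    tolerance: float


def reward_from_goal(goal: DriveGoal) -> Callable[[State], float]:
    """Generate a reward function from the goal, so the expert never writes one."""

    def reward(state: State) -> float:
        error = abs(state[goal.variable] - goal.target)
        if error <= goal.tolerance:
            return 1.0  # goal satisfied
        return -(error - goal.tolerance)  # penalty grows with distance from the band

    return reward


# The expert states what they want in the terms of the problem...
goal = DriveGoal(variable="temp_c", target=24.0, tolerance=0.5)
# ...and the reward function is derived under the covers.
reward = reward_from_goal(goal)
inside = reward({"temp_c": 24.2})
outside = reward({"temp_c": 27.0})
```

The expert only supplies the variable, target, and tolerance, which are all in the vocabulary of the problem; the numeric shaping stays hidden behind the generator.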

Sam Charrington: [00:40:45] And so some of these, like, classical challenges with reward functions, like delayed attribution and things like that, that you see in reinforcement learning, does goals as an approach sidestep those in some ways, or are those still issues that you see in the autonomous systems world?

Gurdeep Pall: [00:41:06] Yeah. I mean, those are still issues we see, and separately the algorithms are getting pretty good too. So, you know, it’s an active area of research with better algorithms coming up. We stay on top of that, and we are incorporating more and more algorithms now into our tool chain, because some algorithms are
better suited for a certain class of problems. Others are better suited for other types of problems, which then of course moves the problem to the next layer, which is, which one do you select for which kind of problem? And obviously you don’t want to ask folks who’ve never done programming or AI, oh, you tell me, do you want SAC
or do you want this? No idea, right? So we are also trying to put in that intelligence, so that it’s a meta-reasoning thing, which says, you know, given this kind of a goal, given this kind of a problem, this sampling rate, this state space, let’s automatically select the best algorithm. And we will use that for training.
So, you know, nobody ever has to know what craziness is happening under the covers, but staying on top of this has been a really important piece for us. You know, we use this framework called Ray, which has come out of Berkeley. You know, it’s an open source framework. We are one of the
big users of it and contributors to it now. In fact, the Ray team, which is building that, and my team in Berkeley are literally in the same building, one floor apart. So there’s a lot of good intermingling there as well. So because we are using that framework, and RLlib is where people are adding more and more algorithms, you know, we are able to really tap into that. And what we find, of course, is that sometimes people will write an algorithm to publish a paper, but it’s not really production grade. So then we come back and do our own implementation of it and contribute that.

Sam Charrington: [00:42:54] So, kind of in this journey, we started with data, we built a simulation, we built a brain out of that simulation. Then that brain is able to help me control my data center HVAC. I’m imagining in this scenario that, you know, I still care about the safety issue that you mentioned.
Maybe it’s not a drill that’s going to destroy my data center, but, you know, I wouldn’t want the policy that you recommend to decrease the life of my coolers or chillers. And then there’s also maybe explainability issues that arise. Like, why are you telling me to, you know, my HVAC engineer has always set the XYZ at six and you’re saying it should be at eight. Why is that?

Gurdeep Pall: [00:43:40] Yeah, no, it’s such a great topic. And I’ve talked to my team about this, given my experience at Microsoft. I remember when we were building Windows NT and putting networking into it and so on, we had no idea how stuff was going to be attacked when the internet was starting out. In fact, I was the development manager for the TCP/IP stack for Windows from 95 to 2000. I still managed to keep some of my sanity, but I can tell you, there were folks on my team who really were pushing 20 updates a week, because we were starting to get attacked at every layer, from the bottom of the network, moving its way up,
all the way up into sockets, you know, all the Teardrop attacks and all that. And then when they got to the top layer, that’s where the most sophisticated attacks really started. I don’t know if you remember, but after Windows XP shipped, the entire team took one year to harden the system,
because it was no longer just my problem as the networking guy, it was everybody’s problem. People would do buffer overruns and they would insert code and all that. So literally every component had it. So the reason I’m telling this story is that I think that safety is a problem like that. When we came into it, it was, hey, we’ve got really good control and I can show you better performance, but then there’s all this hidden stuff that you have to deal with. That’s been a big realization for us. It’s a multifaceted approach. So the first thing is, you know, you talked about the wear and tear of the machine or breaking it down. In a bunch of our use cases right now with customers, those are factored in, and actually they’re factored in at the time of the teaching.
So when you talk about the state space, that is something that has to be specified so that the policy is taking that into account, so that component gets handled. The hardest safety things are when the brain is operating. Like, are we really at the mercy of a deep learning model, which is going to say, take this action?
And then, you know, the consequences of that are actually out of scope for what we’re doing? And this is where we started, and, you know, this is going to be ongoing work. This is never done. Kind of like with cybersecurity right now, we’re learning it’s never going to be done, but we want to take some pretty concrete steps. So one really important piece of work, and there is a new paper that is published on this, is that you develop a policy, and the policy suggests an action. What you do is you introduce another layer after that to decide if the action is a safe action or not. Now, what goes into deciding whether it is a safe action or not
can be many things. It can be predicate logic. It can be temporal logic, you know? So you can pretty much assert no or yes, because it is outside some range. Or it actually can be a trained thing itself. Like, imagine adversarial models which go into that component. So now, when you are specifying in machine teaching right up front, you can start to insert ways where safety can be specified, and that actually follows a very different path. Some of it will actually follow the path of the policy building itself, because some things can be caught there, but other things are actually more brought to bear at operation time. And that is very important because, you know, you’ve probably heard about some of the discussions on how level five autonomy is going to be rolled out in cities.
And they’re saying, you know, these bus lanes and stuff like that. And I think it’s a wonderful idea, because you’re solving the other side of the equation, which is what you can control. So, you know, I always talk about this example and my team just sort of looks at me strange. So imagine you have a sort of robot arm, and it is working in a space with humans also working.
It is very common. You see this with machines in factories; they will have a red line or dotted red line around the perimeter. And the humans know they’re not going to go there. And now you’ve created a rule which says, regardless of what action the policy tells you, if it is outside of the radius, whatever distance that is,
you will not take that action. So you’ve created an environment in which humans and this robot arm swinging around can actually coexist in the same place. So it’s a very pragmatic approach, but it has to be part of your solution. Otherwise, you know, the engineers are right. I mean, these crazies are showing up with reinforcement learning, and it’s going to create all kinds of issues for us, safety issues and so on.
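
The safety layer described in this answer, a check that sits after the policy and vetoes any proposed action that falls outside an allowed radius, can be sketched as below. This is a minimal illustration under assumptions: the names `make_radius_guard` and `safe_step` are hypothetical, positions are 2D points, the guard is a plain predicate (the simplest of the options mentioned, as opposed to temporal logic or a trained model), and the fallback is simply holding the current position.

```python
import math
from typing import Callable, Tuple

Position = Tuple[float, float]


def make_radius_guard(center: Position, max_radius: float) -> Callable[[Position], bool]:
    """Predicate-style safety rule: a target is safe only inside the allowed circle."""

    def is_safe(target: Position) -> bool:
        return math.dist(center, target) <= max_radius

    return is_safe


def safe_step(policy: Callable[[Position], Position],
              is_safe: Callable[[Position], bool],
              state: Position,
              fallback: Position) -> Position:
    """Ask the policy for an action, then let the safety layer accept or veto it."""
    proposed = policy(state)
    return proposed if is_safe(proposed) else fallback


# Hypothetical policy: always tries to move the arm 0.9 units along x.
def policy(state: Position) -> Position:
    return (state[0] + 0.9, state[1])


guard = make_radius_guard(center=(0.0, 0.0), max_radius=1.0)
accepted = safe_step(policy, guard, state=(0.0, 0.0), fallback=(0.0, 0.0))
vetoed = safe_step(policy, guard, state=(0.5, 0.0), fallback=(0.5, 0.0))
```

Because the guard is independent of the policy, it can be specified, reviewed, and tested separately, whatever the learned model recommends.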

Sam Charrington: [00:48:33] Yeah. I love that analogy, and just taking it one step further, it would be a lot more difficult to build into your kind of motion trajectories, for example, a way for this arm to avoid a human that stepped into the zone, than building something that determines that a human has stepped into the zone and just shuts everything down.
And I think what I’m taking away from what you’re saying here is that safety is a multi-layered problem. And it’s not all about kind of making the neural net responsible for everything. It’s about identifying, you know, how you can enforce safety at these different levels, and thinking about it as a system, like from an engineering perspective, right?

Gurdeep Pall: [00:49:16] Exactly. I think that has been a big learning for us as well, that, you know, it’s not just solve the hardest AI problem and suddenly, you know, they will come, right? No, you have to really think about it that way. And I think this safety layer, which evaluates after every action is recommended, you know, it has to be this amazing thing.
This is where a lot of the new capabilities will come in in the future, the adversarial stuff. But you can imagine a completely separate model, which is basically going to give you a one or a zero. If any human has stepped past the red line, it is going to give you a one, and it shuts off,
right? And that keeps improving with perception and things like that. So, yeah, it is a system thing, as you said. That’s a very good way to think of it.

Sam Charrington: [00:50:03] Right, right. So maybe to help us wrap up. It’s the very beginning of 2021, and autonomous systems is kind of a broad area. Where do you see things going over the next few years? How does this all evolve?

Gurdeep Pall: [00:50:18] Yeah. You know, we believe that we’re entering the era of autonomous systems, and you know, it’s always hard to predict, right? There’s the famous Niels Bohr thing: prediction is hard, especially about the future. But, you know, I remember working on Windows NT, the networking, the internet, you know, these things just, they explode.
And some right elements have to be there for this explosion to happen. And I think with the breakthroughs in AI, with the focus on solving business problems in a complete way, like we talked about with safety, and with the industry coming along, like, you know, we’ve been spending a lot of time on data-driven simulators, but we believe that the simulation industry that is there, you know, we really want to partner with them.
We’ve got great partners like MathWorks, you know, and we want to bring them along, so that together we can create an end-to-end tool chain in which these autonomous systems can be created without, you know, requiring the high level of expertise that, for example, is going into a lot of the autonomous driving.
I mean, the teams that are building these autonomous driving stacks are just super deep. They’re super experts, and they’re building it all in a sort of siloed way, a very vertical way. We want it to be horizontal components. Then you’ll have some of the vendors of autonomous systems where anybody can come in and describe the problem,
and they’re able to create the brain and deploy it. That’s going to explode the number of autonomous systems that are out there. And I think this is great for many different things, including our climate, including, you know, the resilience that we’ve seen during COVID, where logistics and these things just have to continue.
Production has to continue. So I think now’s the time and, you know, I think it’s going to happen.

Sam Charrington: [00:52:05] Awesome. Awesome. Well, Gurdeep, thanks so much for taking the time to chat and sharing a bit about what you’re up to there.

Gurdeep Pall: [00:52:13] Totally my pleasure. And you know, you have a great podcast, so it’s great to be here talking to you about my stuff.

Sam Charrington: [00:52:25] Awesome. Thank you. Thank you. Take care.
All right, everyone. That’s our show for today. To learn more about today’s guest or the topics mentioned in this interview, visit stage.twimlai.net. Of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite pod catcher. Thanks so much for listening and catch you next time.