title One Brain, Any Robot: Skild AI's Skild Brain Explained - Ep. 295

description What if one AI brain could run every robot on the planet—a humanoid, a warehouse arm, and a dog-like inspection bot—all at once?

That's not a thought experiment. That's what Skild AI is building right now.

Deepak Pathak (CEO and Co-Founder) and Abhinav Gupta (President and Co-Founder) of Skild AI join the pod to break down Skild Brain—a universal, general-purpose AI model designed to power robots of any form factor, tackling any task, from a single shared intelligence.

pubDate Wed, 22 Apr 2026 15:45:00 GMT

author NVIDIA

duration 1787000

transcript

Speaker 1:
[00:00] Robotics is a data problem. Unlike language or vision, there is not much data in robotics. There is no internet of robot data. So if that's the scenario, we cannot pick and choose which data we use. We go in the most general fashion: every single instance of our brain that we deploy, for any kind of task or any form factor, contributes to making the brain better for future scenarios.

Speaker 2:
[00:28] Welcome to the NVIDIA AI Podcast. I'm Noah Kravitz. I'm here today with Deepak Pathak and Abhinav Gupta from Skild. Skild is a robotics company that's building Skild Brain, a universal brain that can power robots across any form factor to tackle any task. It's amazing stuff. Very excited to find out about it from the source. So let's get into it. Deepak, Abhinav, welcome. Thank you so much for joining the AI Podcast.

Speaker 1:
[00:55] Thank you so much for having us.

Speaker 2:
[00:56] So Deepak, maybe you can start and tell us a little bit about the company, about Skild, and then you can both talk a little bit about your roles.

Speaker 1:
[01:02] Yeah. So at Skild, as you mentioned, we are building a general purpose brain. We call this Omnibodied Intelligence: any robot, any task, one brain. Think of what GPT is for language—we are building a general brain for any physical device, any kind of robot. It is absurdly general: you can have a humanoid, a dog-like robot, or a robotic arm on a conveyor belt, all controlled by the same shared brain, the same shared intelligence behind the scenes. So why do we go so general? The reason is that robotics is a data problem. Unlike language or vision, there is not much data in robotics—there is no internet of robot data. If that's the scenario, we cannot pick and choose which data we use. We go in the most general fashion: every single instance of our brain that we deploy, for any kind of task or any form factor, contributes to making the brain better for future scenarios. That's the main goal behind this. As for my role—we have both been professors before this, so we are extremely technical. We have been involved in building up these technologies in the robot learning area for the last decade and more. So our role is on the technical side, making sure that these things get built and that they are super general and transferable. But our focus is also very much on deployments.

Speaker 2:
[02:35] Right.

Speaker 1:
[02:36] We do not believe deployment should be an afterthought. For instance, with ChatGPT and language models, folks did research for several years, but once it was ready, you had a million users within days—I don't remember exactly—and maybe 100 million users within a month.

Speaker 2:
[02:54] Right, fastest growing product.

Speaker 1:
[02:55] Physical AI is not like that. Things take time to deploy. So for us, deployment has been our first priority from day one.

Speaker 2:
[03:02] Yeah, makes sense. And you mentioned being a professor. You're at Carnegie Mellon?

Speaker 1:
[03:06] Yeah.

Speaker 2:
[03:06] And the company is based in Pittsburgh?

Speaker 1:
[03:08] The company has its HQ in Pittsburgh, and we have offices there. Now we are also in the Bay Area, in San Mateo.

Speaker 2:
[03:14] Great.

Speaker 1:
[03:15] And one office in India, Bangalore.

Speaker 2:
[03:16] Fantastic. And Abhinav?

Speaker 1:
[03:18] Yeah.

Speaker 3:
[03:20] One thing I want to start with: the reason we are so excited about this is that we are almost rethinking the way robotics has traditionally been done. Traditionally, robotics has been a very vertically oriented field. What that means is, before this AI era, you would first decide what vertical you want to place the robot in. Say I want to build a welding robot. You go and start building hardware that is very specific to welding, and software that is very specific to welding. The problem with these kinds of deployments is that it's very easy to get the first 80% or 90% of the performance, but then you hit a wall: the corner cases of the physical world. There are so many corner cases in the physical world—someone might leave a package in front of the robot, and now that's a corner case. Because you are only at 90% performance, you still cannot get it completely automated; a human still needs to be around to handle the corner cases. And that is why robotics has not really gone mainstream. Now, however, things have changed with AI. Language, before LLMs came in, was also very verticalized: there were different companies building chatbots and different companies building search engines. But once LLMs came in, they became the horizontal platform, and now everyone builds on top of that horizontal LLM platform. That is exactly how we are now thinking about robotics. We are building this horizontal general purpose brain, and that general purpose brain can then be fine-tuned for different verticals. Our thesis is that the corner case of one vertical becomes the central case of another vertical. Now the data comes from everywhere, so the brain can handle these corner cases through that data play across the different verticals. In terms of what Deepak was talking about, we are definitely very similar in profile, because both of us were professors. So we do not divide our work as "I do the business and you do this." We think of it more as an extension of each other's brain—thinking about it together, strategizing about it, and really, really focusing on deployment.

Speaker 1:
[05:38] Humans are limited in the sense that we cannot enter each other's brains. We are fusing omni-bodied intelligence the human way.

Speaker 2:
[05:46] I have a feeling, from talking to you two for five minutes, that you might be closer to fusing brains together than you realize—you seem to be on the same wavelength. What was the inspiration? You've discussed in some ways the inspiration for Skild Brain, building that horizontal platform. But were there deficiencies or gaps you saw in existing robotics foundation models? What was the impetus to say, hey, we need to go do this in a different way?

Speaker 1:
[06:16] If you look at current systems—I think Abhinav already alluded to this—when robots are deployed today, they behave more like machines, right? Everything is measured, like in factory setups. For instance, if you look at a classical automation line, you have a robot, but around the robot you have a big cage, and everything is measured very precisely. The whole setup may cost several times more than the robot itself, and if anything changes, you have to redesign the whole setup. Then people talk about consumer applications, where things change—say, your home. No matter how many sensors you put in, you cannot measure every single thing to 0.1-millimeter accuracy. So the main shift in robotics has been going from programming the behaviors to learning the behaviors, which means you learn them from data. The engineering has gone from "how should my robot move, what failures may occur" to "where will the data come from, how can I make it high quality, how can I get it at scale?" That's where the shift has come. We saw this shift in academia—we began seeing results one after another, where we could get a result today and show a live demo at a conference the next week. So for us it was: either we bring this to the masses, or we are the ones who eventually get replaced by it in some way. It was just a no-brainer for us that this is the future of robotics. I think this realization is happening at the same time across the field—you can see the excitement around physical AI at GTC. We are working with several major players in this space to bring this about. So this is not really a case of "this happened, hence this should happen." This is the way to scale. If you do not do this, it is almost impossible to scale beyond the way things have been done in robotics.

Speaker 2:
[08:30] I noticed on your blog, on the website, I was reading an article about training on video data. Can you talk a little bit about the benefits and why you're training on video data? Is that the primary way, the only way you're training your robots or are you bringing data sources from other places as well?

Speaker 3:
[08:52] When it comes to robotics, we have multiple choices of data. There are three main sources. The first—let's start with robot data itself. You collect data of the robot doing a task, and that data can be used to train the robot. However, this is very hard to scale, because you're collecting data with robots: for every data point, you need a robot and you need humans to control the robot—we call this teleoperation. The good thing about this data is that it's the richest form of data, because the robot itself is doing the task, so you can read all the sensor values and all the motor commands going into the robot. The problem is that it's very hard to scale, and when it's hard to scale, it's very hard to learn large-scale AI models on top of it. The second form of data is videos. In this case, there's huge diversity in the data, because people are collecting videos in the US, in India, in China, everywhere. So you get huge diversity of actions. This is a scalable, highly diverse form of data, but the problem is that it's not rich enough: you do not know what exact actions or forces people are applying. Then there's a third form of data, which is simulation. It's highly scalable—simulation is as scalable as it gets; you can collect trillions of examples in a day—and you can measure all the forces in a simulator. But the problem with a simulator is that there's always what people call the sim-to-real gap. A simulator cannot be an exact replica of the real world; there's always some difference, and you have to bridge that gap either through algorithms or through other data. At Skild, we actually use all three forms of data. We believe every form of data is critical, because each is complementary to the others: videos are scalable and diverse, simulation is scalable but not diverse, and robot data is the richest form of data. But each form has a different quality: video is not as high quality for robot training as, for example, real-world robot data. So what we do is use the video data to pre-train our models—this data is already available in the billions. However, the problem with videos—Deepak has a great example here—is that if we could learn everything from videos, all of us would be Federer, because we would watch Federer and start playing like Federer. So just watching videos is never going to be sufficient.

Speaker 2:
[11:44] If it was sufficient, I could dunk a basketball, but I can't.

Speaker 3:
[11:46] Exactly, we cannot. And that is where, for us, simulation comes into play. We get the idea of what the task is, what the action is, from video, but then we practice it in simulation—we robustify it in simulation. But again, there's still a gap; remember, the sim-to-real gap still exists. So we take this model, which has been pre-trained on videos and simulation, and before deployment we post-train it on a small amount of real-world data that we can collect in the factory, or for whatever task we are trying to solve. That makes it precise. So you get robustness from the pre-training data—those corner cases I was talking about; the videos and the simulation help you robustify—and precision is where the post-training data comes in.

Speaker 1:
[12:35] You can also find the analogy with language. AI has mainly been successful at massive scale with language data, and the same recipe applies: when you are building a general model, you go general first and then you go to a specialized model. The general model trains on all of the internet data, from different sources and different articles. But then, let's say you are OpenAI, you build ChatGPT, and then Amazon comes and says, "I will deploy your model on my amazon.com website." You take that model, fine-tune it, and deploy it. The data from just amazon.com is very high quality for Amazon but very low in amount, so it's used for post-training. Internet data may be low quality—people are saying different things, and there is junk text in many, many places—but it comes at massive scale, so it's used at pre-training time. This separation of pre-training and post-training is how the current AI revolution is governed. Even at NVIDIA, you have chips for inference and chips for pre-training. This is the same separation we are bringing to robotics, and it's why we are seeing this immediate access to a variety of applications that you would not have otherwise.
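To make that recipe concrete, here is a minimal sketch, in Python, of the pre-train/post-train split described above: a policy is first trained on large, diverse, lower-quality sources (video-derived and simulated data), then post-trained on a small amount of precise real-robot data. The tiny network and the synthetic loaders are hypothetical placeholders, not Skild's actual model or pipeline.

```python
# A minimal, hypothetical sketch of the pre-train / post-train recipe described
# above. The tiny policy network and the synthetic data loaders are placeholders,
# not Skild's actual model, data, or training code.
import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Stand-in for a robot policy: maps an observation vector to an action vector."""
    def __init__(self, obs_dim=32, act_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

    def forward(self, obs):
        return self.net(obs)


def fake_loader(n_batches, batch_size=16, obs_dim=32, act_dim=8):
    """Placeholder for a real dataset (video-derived, simulated, or teleoperated)."""
    for _ in range(n_batches):
        yield torch.randn(batch_size, obs_dim), torch.randn(batch_size, act_dim)


def train(policy, loader, lr):
    """Generic imitation-style loop: predict actions, minimize a regression loss."""
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    for obs, target_action in loader:
        loss = nn.functional.mse_loss(policy(obs), target_action)
        opt.zero_grad()
        loss.backward()
        opt.step()


policy = TinyPolicy()
# 1) Pre-train on large, diverse, lower-quality sources: video-derived and simulated data.
train(policy, fake_loader(n_batches=200), lr=1e-4)   # video data: scale and diversity
train(policy, fake_loader(n_batches=200), lr=1e-4)   # simulation: practice / robustification
# 2) Post-train on a small amount of precise, real-world robot data before deployment.
train(policy, fake_loader(n_batches=10), lr=1e-5)    # teleoperated data: precision
```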

Speaker 2:
[13:56] You've talked about this a little bit, but maybe to put a narrative around it for the viewers and listeners, can you talk about what it takes—the process of building, testing, deploying, and bringing to market something like Skild Brain?

Speaker 1:
[14:14] It's a complex question, because it really depends on the scenario, right? In language it's very easy, because you just ask a question—the prompt does everything.

Speaker 2:
[14:22] Oh, sure.

Speaker 1:
[14:23] The general recipe we are going toward is that the behind-the-scenes brain is shared. Any single action the robot takes will improve the brain. Now, how do we orchestrate the deployment of this brain? The idea is: if you have some task and we have seen that task before—say it's moving around, or walking, or jumping over things—we can already do that very well. In that case, you can just take the brain, put it on the robot, and it will work off the shelf.

Speaker 2:
[14:54] Right.

Speaker 1:
[14:55] Then you can build applications on top, like "I want to use the robot for taking a selfie" or for security inspection. That's the second part. But say you now go to a very different task, where the robot is, I don't know, assembling a GPU on a conveyor belt. That's a super different task compared to what people generally do—even humans need training for it. In that scenario, we may collect data on that robot for a few days, or, if you already have the assets, we get data in simulation—either way. Then we use that data to post-train the model, and that model takes over and runs on the robot directly. So in this case you have bridged the gap between what the model saw before and a very different task by adding data from the actual task—what's called domain-specific data. Now, as you deploy more and more of these robots, imagine you are getting a fleet of specialists which all came from a generalist. It's very much like high school, where you know many subjects. I did a PhD; I barely remember any chemistry or physics at this point, but I needed it to get to the knowledge I have now. So when you have these specialists, the data can pool back from all of them into the same brain behind the scenes—which is not what happens in humans, but we can do it in a computer. Now, when you have the next task to go to, you will need less data. This is what we call, in other words, a data flywheel—you may have heard the term from self-driving, where humans driving cars generate the data. We orchestrate this flywheel across verticals. You start with factories; they act as the data flywheel for semi-structured scenarios like hospitals, grocery stores, hotels. The data flywheel from there helps you get to the ultimate challenge, which is homes—consumer robots. So this is basically how we are orchestrating it: a self-sustaining data flywheel loop from every deployment. And you can probably understand now why we have an omni-bodied brain—because you want to take the benefit of every single data point and use it for the next, more complex task.
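The flywheel described here can be sketched schematically: each vertical contributes a few days of domain-specific data, a specialist is post-trained from the shared generalist, and that data flows back into the shared pool so the next vertical starts from a stronger base. Everything below (the vertical names, the toy "corpus" lists) is purely illustrative, not Skild's system.

```python
# A schematic, runnable sketch of the data flywheel described above: one shared
# generalist is post-trained into a specialist per vertical, and every deployment's
# data flows back into the shared pool. Names and data are purely illustrative.
shared_data = ["video_pretraining", "simulation_pretraining"]  # generalist's starting corpus
verticals = ["factory_assembly", "warehouse_picking", "hospital_delivery", "home_tasks"]


def post_train(base_corpus, domain_data):
    """Specialist = shared brain fine-tuned on a small domain-specific dataset."""
    return base_corpus + domain_data


for vertical in verticals:
    domain_data = [f"{vertical}_teleop_day_{i}" for i in range(3)]  # a few days of data
    specialist_corpus = post_train(shared_data, domain_data)
    print(f"{vertical}: specialist trained on {len(specialist_corpus)} data sources")
    # The flywheel: the specialist's data pools back into the shared brain, so the
    # next vertical starts from a stronger generalist and needs less new data.
    shared_data = shared_data + domain_data
```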

Speaker 2:
[17:18] And does the same concept apply to different form factors?

Speaker 1:
[17:23] Yeah. In a factory, it's a robotic arm. In a home, probably a humanoid or some other form factor. For security and inspection, a dog-like robot; for delivery, a different form factor again. So it works across form factors.

Speaker 2:
[17:34] So I want to ask you guys a little bit about how you're using NVIDIA technology and specifically around synthetic data and simulation, as you mentioned. But really, just kind of open-ended. What NVIDIA stuff are you using and how does it fit in?

Speaker 1:
[17:48] Our company is two and a half years old, but I have personally been working with NVIDIA since around 2018—not at NVIDIA, but working with them. There is this whole suite of simulation tools, like Isaac Sim; back in the day, there was PhysX and Isaac Gym. We used the physics component to create these gazillion scenarios on which we could try and practice—the practicing and learning Abhinav was describing. So we are basically the OG users, and we are now working with NVIDIA on Newton as well; in fact, we are co-developing better physics solvers, which we will open-source together. That's one collaboration, on the simulation side. The second side is the video models, like Cosmos and other models. We use them for data augmentation: for every data point, you can create multiple variations with these NVIDIA AI models. So we partner on that front. And I think the biggest of all is the compute platform, because robots are the next-generation device. The solution that worked for LLMs—big GPUs in servers—will look very different for a robot, because a robot that is falling doesn't have time to connect to a server; it has to react immediately. So on-device edge compute is where we are partnering as well.
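As a rough stand-in for the generative augmentation described here—this is not the Cosmos API, just a simple torchvision illustration of the one-to-many idea—each recorded frame can be turned into several varied training examples.

```python
# The one-to-many augmentation idea described above, shown with a simple stand-in:
# from one recorded camera frame, produce several randomized variations. A generative
# video model would produce far richer variations; this snippet only illustrates
# the concept of multiplying one data point into many.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4),              # lighting changes
    transforms.RandomResizedCrop(size=(224, 224), scale=(0.8, 1.0)),   # viewpoint jitter
])

frame = torch.rand(3, 224, 224)                      # stand-in for one recorded camera frame
variations = [augment(frame) for _ in range(5)]      # one data point -> many training examples
print(len(variations), variations[0].shape)
```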

Speaker 3:
[19:21] Excellent.

Speaker 2:
[19:22] So when you're testing Skild Brain—maybe when you're using it with a new partner or developing a new feature—do you have a go-to test case, a go-to scenario that you put it through? Or walk us through what it's like testing something before you're ready to deploy it.

Speaker 3:
[19:39] Yeah, I think that's a great question. It's also very hard, because the problem is that this is something general purpose, right?

Speaker 2:
[19:46] Yeah.

Speaker 3:
[19:46] And that's what Deepak was talking about—a general purpose brain. Now, if you're fine-tuning it for something specialized, building a specialist brain, should it forget the general part? Does the general part matter or not? It probably doesn't matter—until a corner case comes in, and then it matters. Those are the kinds of things we think about, and it's why we have been developing a very specific strategy for testing. The first thing to test, of course, is the task itself. Take the example of the GPU work we have been doing with NVIDIA as a partner: putting a busbar on a GPU rack, on a server. There are two requirements. First, it has to be placed properly—that's the accuracy part. Second, how much time does it take? If it takes one day to place one busbar, that's not good enough for any deployment. So our testing starts with these task-driven KPIs that we are trying to meet. But KPIs alone are not sufficient: 90 or 95 percent is covered by KPIs, but the remaining 5 percent also matters, and that's where we test for generalization. We ask: what if someone left a box here, what if the lights were completely off? We change these conditions, and we have a set of conditions we want to test in. Even if these things happen, the robot should either continue to work or at least remain safe. Safety is the third aspect. In all these conditions, we have to ensure the robot is safe and not doing anything unexpected. So we have a whole pipeline: we start with task metrics, then generalization metrics—things going wrong that you're not expecting, but that you still want your robot to be robust to; we develop a whole list of these before we deploy. And last is safety: in no scenario should the robot violate its safety constraints. Before deployment we put in place what we call safety guardrails. Say somebody cuts the camera wire—now the robot is blind, it doesn't see anything. The guardrails come in and say: if I'm not seeing the camera, either I should stop, or at least I should not cross the boundaries I have been given. These are all the things you have to test for. Again, the problem with the physical world is that it's not like becoming an overnight sensation, where you put something on a web page and suddenly everyone can access it. We have to go through very rigorous tests before we can put anything into deployment.
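The guardrail behavior described here (camera feed lost, so stop or stay inside the given boundary) can be sketched as a last check between the policy and the motors. The names and structure below are hypothetical, purely to illustrate the idea; a real system would enforce this at the controller level with hard limits.

```python
# A toy sketch of the safety-guardrail idea: before any policy action reaches the
# motors, independent checks run, and a failed check (e.g. a lost camera feed)
# overrides the action with a safe stop. Hypothetical names, for illustration only.
from dataclasses import dataclass


@dataclass
class Observation:
    camera_ok: bool            # is the camera feed alive?
    position_in_bounds: bool   # is the robot inside its allowed workspace?


def guardrail(obs, proposed_action):
    """Pass the policy's action through only if every safety check passes; else stop."""
    if not obs.camera_ok:            # e.g. someone cut the camera wire: the robot is blind
        return [0.0] * len(proposed_action)
    if not obs.position_in_bounds:   # never cross the boundary the robot was given
        return [0.0] * len(proposed_action)
    return proposed_action


# Example: the policy proposes a motion, but the camera feed has dropped.
action = guardrail(Observation(camera_ok=False, position_in_bounds=True), [0.2, -0.1, 0.05])
print(action)  # [0.0, 0.0, 0.0] -> the guardrail stops the robot
```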

Speaker 2:
[22:36] This is one of my favorite questions, and I always ask it as we start to wrap up: what do you think the future of robotics looks like? We try to put a time frame on it—next year, the next two years—but things are moving so quickly these days, particularly with physical AI; the embodiment of AI is really this year's story, and I think we're seeing so much more of it. How do you see robotics developing over the next few years, five years, whatever the right time frame is?

Speaker 1:
[23:08] I think on the longer timeline, we will be able to automate every single action that humans can take in the physical world, because we are following an approach that is very similar to how these things actually happen in nature. And in some sense, the longer you go, the more you realize that this is the way to achieve general intelligence. Everything we have so far—all the results in language models and vision models—is what people call digital intelligence. But the digital world, if you think about it, is not more than 50 years old.

Speaker 2:
[23:48] It's a good point.

Speaker 1:
[23:49] Were humans not intelligent before that? So that is the longer-term vision. Now, how does this play out? In our opinion, you will start to see things getting automated with these models on a very short horizon, but in highly repeatable, maybe less variable scenarios first—what we call semi-structured, like industrial tasks and warehouses. They act as a stepping stone, as I was saying earlier, to get to more semi-structured scenarios. It's a spectrum. Structured means everything is mapped—like a microwave: you don't really care what happens inside, you don't put your hand in when it's running, it's a completely separate system. The other end is the home, which is completely unstructured. So this year itself, we will start to see deployments in factories and warehouses, around people. That bootstraps the next stage—hospitals, hotels, the service industry—and that bootstraps the ultimate consumer robots. It's very hard to pin down the timeline for the ultimate home robots, but you will start to see robots for sure, and you're already seeing that happening this year and over the next couple of years.

Speaker 3:
[25:07] I think in the longer run we all agree that robots are going to be everywhere, doing every task—I think everyone agrees on that. In the shorter term, too, at least within the company, we are all in agreement that this year the structured places like factories and warehouses will get more and more automated; the penetration will start to happen by the end of this year. It's the middle that is unclear, and that's where we always have betting pools inside the company—gelato bets and all these kinds of bets we keep going. When will these things come into play? Everyone has a different view. Some people believe home robots might come in two to three years; others argue that two to three years is still very hard. We have to be honest: the kind of uncertainty that can happen in the real world is very, very high. And while you're seeing so much hardware in the humanoid space, is that hardware reliable enough to even be put in homes today? No one has put them there, because safety, again, is a big issue. When you put them in a home, what if one falls and there's a child around? So we have all these pools going on within the company. I think both of us agree on the short term and the long term, but in the middle, no one knows—we are just figuring it out as we go. The interesting part is how surprising the way it's playing out has been. From the AI perspective, when I was doing my PhD in 2008, I would never have guessed where we are in AI today, and it continues to surprise more and more. If you had asked me three years ago where we would be today, that too would have been very surprising. The progress of compute and the drop in hardware costs have made this all so surprising that I would say even experts like us, who have been working on this for 20 years, are scared to commit to any prediction.

Speaker 1:
[27:08] You probably know this quote—I don't remember from whom, but probably Bill Gates mentioned it somewhere: humans are extremely optimistic in the short term and pessimistic in the long term. I think this applies here; it's like a real-world paradox.

Speaker 2:
[27:22] So, my million dollar question is, when am I going to have a robot that can fold my laundry? That's the task I want.

Speaker 1:
[27:29] Well, the thing is, you can have that robot this year—but if it does just that, sitting in a corner where you have to bring the laundry over to it, would you really want it?

Speaker 2:
[27:37] No, fair.

Speaker 1:
[27:38] That's the whole point, I think.

Speaker 2:
[27:39] Yeah, no, absolutely, fair point.

Speaker 1:
[27:40] But if it can do the same thing, and it's doing something maybe more complex in a factory where you can turn the lights out every day, then would you want it? Of course—we have people lined up for that. So it's the same thing, just a different perspective.

Speaker 2:
[27:54] No, absolutely. And so what's next for Skild? What are you guys working on now? Are there new areas you're exploring on the technical side, new industries or business avenues that you're breaking into? What's the company roadmap look like?

Speaker 1:
[28:10] One thing—depending on when this gets released, over these couple of months we have been ultra-focused on how to take this general model and convert it into specialized systems that can be deployed at scale very quickly.

Speaker 2:
[28:28] Right.

Speaker 1:
[28:29] Getting a new system up and running in a couple of days with a small amount of fine-tuning, and using that strategy to scale to as many scenarios as possible. The reason behind that is to really get this general data flywheel started.

Speaker 2:
[28:44] Right.

Speaker 1:
[28:45] A flywheel takes time to set up and takes time to gain momentum, and if these things are to happen on the timeline we want them to happen, we have to start now. This is one of our main focuses. That's not to say we are technologically there and everything is solved, but deployment in robotics is a big technical challenge. Unlike language or other areas—where if you build a thing it will get deployed, because people will use it and figure out how to use it—here, deployment is in itself a big technical challenge. And how do you orchestrate that at scale? It has not been done before. So this is what we are focusing on a lot.

Speaker 2:
[29:24] It's amazing stuff. And as you know, to sort of paraphrase you, it's not going to slow down, it's only going to get more and more amazing, at least in the short term, right? So who knows what the long term has to bring. It's just fascinating stuff. Best of luck to both of you. Again, Deepak and Abhinav, thank you so much for taking the time to join the podcast.

Speaker 1:
[29:43] Thank you so much for having us.