transcript
Speaker 1:
[00:00] If you ask almost any software developer, when you do code review, do you really read the whole PR? Like, do you go through every line and think it through? Do you pull it down and test it out and then leave the good feedback on each line? Agents are very good at that, right? If something goes wrong, that's very human. A thing that's never been good in software development is inter-team communication. And so it's a very interesting UX problem set that I think nobody's really thought through, even now. What does that tool look like in a way that is easy to use and easy to learn? The software developers that will be the best producers of product in the near future are the ones who can communicate, the ones who can write, the ones who can describe. That is, I think, the next superpower. And I think if we could talk to each other in more real time about what we're doing, that's a lot of overhead. That is not a problem that agents have.
Speaker 2:
[00:47] The most widely used developer tool in the world was never designed. Git started as plumbing commands for the Linux kernel team. Unix primitives meant to be wrapped in whatever scripts each developer preferred. A volunteer wrote a unified interface. It got pulled into core. And for 20 years, almost nothing has changed. Now coding agents are the fastest growing users of command line tools, an entirely new persona. They struggle with interactive rebasing. They run status after every command. The assumptions baked into Git's interface no longer hold for humans or machines. The question is whether the tool underpinning nearly all modern software can adapt, or whether something new has to replace it. Matt Bornstein, general partner at a16z, speaks with Scott Chacon, co-founder of GitHub and CEO of GitButler.
Speaker 1:
[01:45] We are here today with Scott Chacon, CEO of GitButler, former co-founder of GitHub. Thank you very much for being here today, Scott. Of course, thanks for having me on. You are a major driving force behind GitHub. You've literally written a book on Git. You could be doing anything in the world with your life right now. What's brought you back to startup land? That's interesting. I feel like if you ask any sort of repeat founders, they probably have similar answers, right? This is the most fun thing to do. So when I started at GitHub, it was a real sort of slog to learn, okay, like it's stressful and it's difficult and stuff. But when you get something working, it's so satisfying and it's so much fun to build and grow and create something that you want to see exist in the world. So I'm sure I'll be doing this when I'm 90. Do you think there's kind of unfinished business for you in version control, or what kind of attracts you back to the same space that you know so well? Yeah, I mean, I did a language-learning startup post-GitHub because I was trying to learn French at the time. And I think this is the other thing that founders do is they leave and then they think they can solve any problem. I couldn't solve that problem. I did try very hard. But did you successfully learn French though? I did not. I successfully learned German because I wanted to start from scratch. And so that was what I dogfooded the product with. And so my German's not bad now. And I married a German after that and live in Berlin now. So it did definitely change the course of my life. So long-term ROI on that. Yeah, 100 percent. Totally worth it. Even though the company didn't quite work out. But when I went to go look for something else to do after a very short stint of doing some woodworking, like I think most of us do at some point when we have some time off, I started building some stuff and realized that the tooling for Git hasn't changed since I left, right?
Since really I started at, started GitHub or wrote the first edition of the book. Like I was approached by Apress to write a third edition of the book and I was like, why? It hasn't, it's exactly the same. Nobody's going to care about updating it with a handful of new commands or capabilities it has. So it became an interesting problem set. What would I want this to look like if I could just sort of scrap the porcelain user interface and have a tool that not only did what Git does better or easier or something, but rethinks it a little bit and said, if we had started from scratch learning everything we had learned in 2008 or 2005, if I'd gotten involved in the Git project and could come in and say, maybe it should work this way, maybe these are the things that it should do for us. I set out to build that because I thought it'd be a really interesting fun thing to do, especially from my background. Is there truth to the story that there was tension between the Git core committer team and the GitHub founding team early on? Because on the face of it, it makes some sense that these teams wouldn't have exactly the same objectives. Yeah, I think they didn't think we were very smart because we couldn't write C code. There was a grudging respect over time because so many projects ended up moving to GitHub. You built the foundational piece of the entire dev stack, so that earned you a little bit of credibility. I think they only like it because it's fast. I don't think they liked anything else. Linus has talked about us where they're like, well, he moved his tree, there was some outage or something, and he moved his tree to GitHub. He's like, they're a good host. I hate PRs and I hate everything else they have, but use them as a host if you want to. I think that was the general sentiment. I had friends on the core teams and stuff, and I still hang out and go to the Git Merge conferences and stuff like that. But I think we always try to be supportive and stay out of their way.
It's one of the interesting things, we might talk about this more later, but how hands-off everybody is to Git itself. It's a very designed-by-committee type thing because it's an open source project where whatever seems to be a relatively good idea comes in, but there's not a drive to say, here's what the product should look like. I think over time, it's just become a Frankenstein where it does lots of things very fast and very well, but it's not designed. It doesn't have an overall arc of taste. And so that's where I wanted to come in because there is a lot of, I mean, this was the root of GitButler is, we don't want to rewrite the whole stack. We don't want to rewrite how it stores data or how it transmits data, the wire protocols or anything like that. That's all very solid. It's very good. It's very smart. It's just the user interface that we want to inject some taste into and say, here's a way that we think people are trying to use Git and make it easy to do. So the world is moving towards agentic coding or AI-assisted coding now. This is obviously a very different set of ergonomics compared to a human writing all code by hand. It sounds like you're making the argument that Git wasn't even optimally configured for humans before. No one really driving taste of the developer interface. Now with agents, it's this compounding problem. What do you think will happen? Just expand on this point of what you think needs to change, what needs to stay the same. What's interesting about the Git project is that they started with essentially a Unix philosophy. I think the listeners that are too young may never have heard of the Unix philosophy, but when you write tools in a Unix environment, you want to pipe the output of one into the input of another so you can chain stuff. It's actually funny now seeing how agents work because they use all of these old Unix tools. They're using like sed and grep and stuff, right? Yeah, the secret weapon.
Where a lot of developers may never have heard of it. They may learn it from their agent and just seeing these things running and they're like, you want to run sed? I guess so. What does that do? So it's very good for that type of thing of saying, okay, I want to do this thing, I want to pipe it into something else, and then have it take the output of that and then I can do this set of filters and stuff. So the original Git plumbing commands, like all of the commands, Linus and the original team were like, we're just going to do things that do all of these very basic things, and then you can write Perl scripts to wrap all of them and do whatever you want. So they had no, I don't even think they had an intention of writing a user interface to it, or making it easy to use. It was completely orthogonal to their goals. They were like, whatever you want it to be, here's the tooling that does it well, and we've solved a lot of the hard problems, the APIs or whatever, you write the interface you want. The hard problems are like the storage layer, like what are some of the hard problems? Compression algorithms, wire transfer protocols, like all of the sort of how the trees are read fast or written fast or stored in a format that's small or can be transmitted quickly. So if you sort of think of the Git history as sort of a fairly complex tree with data attached, like you need sort of an efficient way to represent this. How to move branches around, like what branches even are, that was a very, all of it was so much different than the way Subversion or RCS did it. Like Subversion, CVS, RCS, they were all kind of sort of the next step of a philosophy of how to store data, right? Git completely changed that, right? They just thought of it more as tarballs rather than as like a series of patch files and deltas, like a sort of delta series. So they kind of rewrote it and then had people just do their own Perl scripts, right?
Not thinking a lot of people would be using it and just sort of the Linux core team or whatever. This guy Pasky, he wrote these Perl scripts that gave it a unified interface. So then the lazier people that came along that didn't want to write all of that stuff themselves just started using that and it became popular enough that they just pulled it into core and they're like, if you want to use an interface, here you go, right? Here's the porcelain for it. And that's the CLI tool, basically? That is the CLI. And then most of those commands haven't changed massively, like those original core ones from 2005, 2006, right? Didn't really change a lot. And they rewrote them from Perl into C. They used to just actually send the Perl scripts to everybody. That's another thing the kids listening to this podcast may not be familiar with, Perl. Perl 6 is coming out any day now. So yeah, that was kind of how this started. And the other big philosophy of Git that I really do appreciate, but has added to this problem set, is that they always wanted to be backwards compatible, right? So anything that existed before, they won't take out. They would wait for a major version, take out a handful of things, but for the most part, everything works exactly the same way that it did. When I wrote the first version of Pro Git, which was like 2009, right? There's almost nothing in that book that doesn't work still. I just added more stuff, more commands or whatever. So, yeah, if you start with a Unix philosophy, then what you end up with is a middle ground of something a computer can use, but maybe not super well, and something a human can use, right? So if you run git branch, it's just a list of branches, right? There's no user interface on it by default, and you can add some stuff that makes it slightly more usable.
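That plumbing-versus-porcelain split is easy to see with stock Git commands: the default output is prose for a human, while flags like `--porcelain` switch to a fixed, script-safe format, and plumbing commands emit raw, pipeable records. A quick sketch in a throwaway repo:

```shell
# Set up a throwaway repo to compare the two output styles.
cd "$(mktemp -d)" && git init -q demo && cd demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "init"
echo hello > file.txt && git add file.txt

git status              # human-oriented: prose, hints, colors
git status --porcelain  # machine-oriented: "A  file.txt", stable across versions

# Plumbing in the Unix spirit: raw records you can pipe into sed, grep, etc.
git for-each-ref refs/heads --format='%(refname:short) %(objectname:short)'
```

The `--porcelain` flag name is a running joke of exactly this history: it promises the porcelain commands' output in a plumbing-stable format.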
But the point is they need to solve both of these problems with one interface, which is I need a computer to be able to do this, and I also want a human to be able to sort of interpret this. And so how do we bridge that gap, right? And so it's kind of not great for humans, and not all of the ones are particularly good for computers, but they'll do --porcelain on some of them, if they want to do that, right? Like blame, for example. And so I think that's kind of where the problem set happened. And now it's been so long, and they don't want to break backwards compatibility. And so it's really difficult to go in and be like, okay, Git 3.0 is going to be a complete rewrite of the user interface that takes all these things into account. Because also there's nobody really running the project in that way to have some vision and say, okay, this is what's going to happen. And so we felt, I felt like this is a good opportunity to try that and to be able to put that in as long as it's drop-in. And like with GitButler, you can go back and forth between Git and GitButler. Like we don't want to, it's not Jujutsu, where you have to have some colocated thing that's a really different way of doing stuff. We want it to be a Git compatible tool, right? And that was kind of a design decision. And so what do you think like the machine-optimized version of Git looks like versus the human-optimized version of Git? Like now that we, to your point, you have the flexibility, you can kind of do both. Right. So actually it's been really interesting. So we started as a GUI and the idea there was that I never used a GUI, right? I've used Git for 20 years now. Boy, that sounds like a long time. I've used Git for a long time now and I've never used a GUI. I've never found it valuable because it kind of just wraps Git commands. And it's generally faster for me to just run the Git commands.
And most people that I know, I mean, we did a survey at some point, like 80% of people still use the CLI for Git stuff, even though there are GUIs that exist. Because I can click a button or I can run the command. It's pretty fast to run the command. But the GUIs don't add a lot of functionality that is hard to do in the CLI, right? If you know how to do it. So that was kind of where we wanted to go: start with this GUI and do some drag-and-drop stuff and have multiple branches and rebase by just dragging a file from one commit to another commit. And that type of stuff. I have to admit, I'm not sure I've ever successfully rebased anything. So maybe I'm the target for a GUI on that. I have to admit, I've messed up rebasing recently. And I literally wrote the book on it. So it's very error prone, right? And not really automatable. So like agents can't do rebase -i, right? Like they can't reword commits very easily. They can't squash commits together or whatever, right? Like you have to drop into an editor and do stuff and then have it keep running. And so there's modalities that I think nobody's particularly well served by. What we ended up doing somewhat recently, the last six months or so, is create a CLI. And that became really interesting to us because an agent can't use a GUI, which is really why we started going down that path. But people are using TUIs now for stuff. So now we have a TUI, we have a CLI, we have a GUI. They all operate on the same data structures and we can optimize each one for what's good for that. But even in the case of the CLI, we can do a TUI that's sort of interactive, right? And like very fast to do a bunch of interesting stuff. You can do a CLI where you can run whatever you want and kind of get a nice human sort of output that we know a human is going to read. So do hints or something like that, right? That you wouldn't do if you're piping into another command.
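On the interactive-rebase point: the usual escape hatch, which agents sometimes discover, is Git's `GIT_SEQUENCE_EDITOR` hook. Instead of dropping into an editor, you hand `rebase -i` a script (here a GNU `sed` one-liner) that rewrites the todo list non-interactively. A minimal sketch in a throwaway repo:

```shell
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME=a GIT_AUTHOR_EMAIL=a@example.com \
       GIT_COMMITTER_NAME=a GIT_COMMITTER_EMAIL=a@example.com
echo one  > f.txt && git add f.txt && git commit -q -m "add feature"
echo two >> f.txt && git commit -q -am "oops, fix typo"

# Squash the fixup into the previous commit without ever opening an editor:
# the "editor" is a sed one-liner that turns line 2's "pick" into "fixup".
GIT_SEQUENCE_EDITOR='sed -i -e "2s/^pick/fixup/"' git rebase -i --root

git log --oneline   # a single commit, "add feature", containing both changes
```

This works, but it shows the problem: the agent has to script an editor to drive what is fundamentally an interactive modality.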
Or you can do --json and get the same data but in JSON, right? So that's very easily scriptable now. I can pull it into Python and jq and take stuff out of it or whatever. And we've been talking about doing like a --markdown where it gives you the same information but very specifically for agents, right? Because that's what's good at kind of injecting into context or whatever. Just one simple example: we have some agent loops, some eval loops that we run through all these things of, is an agent good at using our tool. People used to talk about personas, right? Like an agent is now a persona. It's a very, very different persona. It's very hard to guess. It's hard to empathize with, right? Like whatever I guess that an agent is going to be good at or want to do or whatever, is not always what it really wants to be good at or do. Like we did --json and it turns out that the agents actually like just getting the human data, because they would kind of compensate by piping it through jq or writing Python scripts to get the one piece of data that they want out of it, right? Then they would immediately run the status command, and so we added a --status after to all the mutable commands because we're like, you're going to run this next, so we might as well just give you that as the output, right? That's stuff you would never do for scripting, you would never do for Unix philosophy, you would never do for humans really, right? But agents really want it, and so we have to think about them as a persona of the user of the CLI. But I can have it using those and I can open up the GUI and see what it's doing and help it out. It's interesting to have these very persona-focused interfaces for what each persona is good or not good at, in order to accomplish the goal that they had. That's really interesting.
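The compensating pattern described here, where an agent wraps a `--json` flag in a throwaway script to fish out one field rather than read the whole human-oriented dump, looks roughly like this. The payload shape and field names are invented for illustration; they are not GitButler's actual output:

```shell
# Hypothetical JSON a CLI with a --json flag might emit (shape is an assumption):
payload='{"workspace": {"current_branch": "fix-login", "branches": 3}}'

# The jq-style extraction, done with a tiny Python one-liner, exactly the kind
# of compensation script agents tend to write on the fly:
branch=$(echo "$payload" | python3 -c \
  'import json, sys; print(json.load(sys.stdin)["workspace"]["current_branch"])')
echo "$branch"   # fix-login
```

The observation in the transcript is that agents often skip this step entirely when the human-readable output already contains what they need in context.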
Because the agents or the underlying models have sort of flexible input schema, it's actually better for your kind of tool output schema to be kind of flexible or kind of like dump all the information that they need. Yeah, I mean, actually, we're even thinking of like, we've been talking a lot about this --markdown sort of output format because we thought it would love JSON and it doesn't like JSON that much. And so, like, how do you optimize for what an agent really wants, right? Because we can put in stuff like, guess what I think the agent's next step is going to be and give it some extra context that we wouldn't give a human, right? Because we're like, if you want to do this, next run this. If you want to do this, next run this, right? And then it can kind of help lead whatever the next steps that we think it's going to be in that particular case, right? And so it's a very interesting UX problem set that I think nobody's really thought through really even now, right? Like even most CLIs don't have --json or whatever, right? Even ones that have been developed fairly recently, like we're all learning this now, right? And it's not easy to see what it's struggling with; you really have to dig in, you have to start asking it, like look through the last 50 tool calls that you did, the GitButler stuff, like what did you struggle with, right? Like what had errors, what did not do what you expected it to do? And like weirdly it will kind of tell us, right? And we can kind of work on the skill files and figure it out. But it's a new era of trying to figure out usability. That's so interesting. You know, you're picturing the consultants showing up with their slide decks saying, you know, here's your persona, you know, Agent Alice. Here's your persona like a bot. Yeah, exactly. You know, at least, actually, I guess you don't need the consultants because you can just ask the models directly, right? Like they're available everywhere all the time.
So I think that's actually a really interesting observation. Can you go through some of the sort of reasons that GitButler, especially the CLI, is a better fit for agentic workflows than kind of plain Git? I mean, for one, it's because the input and output are built specifically for that, right? So we get to look at what it's trying to do and try to give it feature flags that it's trying to run, right? Or that it asks for or is trying to get around by writing some Python script. And then I can see what the Python script does and be like, okay, here's a new sort of flag that you can give that command so that that's the only output that comes out, right? And sort of optimize for what it's trying to do. Like we can see what it's trying to do. And I find that really interesting. The other thing that GitButler specifically is really good at, one of the early design decisions that I found really valuable and powerful, is doing parallel branches. So a lot of people are using worktrees now to do a lot of stuff in parallel so the agents don't step on each other. And every, I mean, humans have dealt with this for a long time, right? Of like, you're working on some feature and then you see a bug, you notice a bug, right, that you want to fix and you have to decide do I stash everything I'm doing, fix the bug, open a branch, you know, push it to that and then go back to my other branch and unstash. Stashing always felt a little hacky, didn't it? But I mean, it's an artifact of you can only be on one branch at one time, right? There's one HEAD, there's one index, there's one working directory that you can deal with at a time. And the data model just doesn't really support that very well.
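The interruption dance described here, and the worktree workaround people reach for with parallel agents, look like this in plain Git (the paths and branch names are made up for the sketch):

```shell
d=$(mktemp -d) && git init -q "$d/repo" && cd "$d/repo"
export GIT_AUTHOR_NAME=a GIT_AUTHOR_EMAIL=a@example.com \
       GIT_COMMITTER_NAME=a GIT_COMMITTER_EMAIL=a@example.com
echo base > app.txt && git add app.txt && git commit -q -m "init"

# One HEAD, one index, one working directory: interrupting feature
# work for a bug fix means the stash dance.
echo wip >> app.txt        # half-finished feature work
git stash -q               # park it
git switch -qc hotfix      # fix the bug on its own branch...
git switch -q -            # ...then come back
git stash pop -q           # and resume the feature

# The worktree workaround: extra checkouts sharing one object store,
# so each agent gets its own isolated directory.
git worktree add "$d/agent-1" -b agent-1
git worktree add "$d/agent-2" -b agent-2
git worktree list          # three working directories now
```

GitButler's parallel branches are a different answer to the same limitation: keep one working directory but let multiple branches own different changes within it.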
And so we built stuff on top of that where we essentially have like a hidden mega-merge type thing, and you can have multiple branches, and you can take stuff that you've done and assign it to different branches and commit it to different branches orthogonally, and then it doesn't matter what order you merge these branches in, but it's using one working directory. And so what's really nice about doing multi-agent work, which not everybody's doing, I think some of us are like some people that are really trying to push the boundaries of, you know, using a lot of agents constantly. But one thing that's nice about doing that instead of worktrees is that the agents can see each other's work, right? And so it's almost like they're kind of not pair programming, but they're using one working directory. So if one agent modifies a file and then the other agent tries to, it notices that it's been modified. And it can pull how it's been modified and then add on top of that, right? And so, and not create conflicts. And so they don't even, we even experimented with having like a communication channel between agents. We had three agents running at one time, gave them a little chat channel and they could talk to each other about what they were doing. Hey, I'm editing this file now and stuff. And it was super cool. I wanted to ship it so badly. I'm like, this is awesome. We had a little view where you could see them talking to each other and stuff. And I'm like, this is amazing. And we put it through our eval loops and, very sadly, we were all devastated. It does not help, right? Like they will see that something else is happening. I see. They'll figure out what, why, right? Like it'll be like, looks like somebody's working on some other feature because they added this stuff. So I'm going to leave that alone and I'll put my changes somewhere else. And it's faster, right? Because they don't have the overhead of the communication.
So unfortunately we're not shipping that because I really wanted to be able to show that off because it's very, it's very fun. But just to sort of parse what you're saying a little bit, like if I have, you know, five agents operating on the same repo at the same time, same code base, the worktree solution is basically making five copies of the working directory with some sort of smart storage optimizations, basically five separate copies. The idea behind parallel branches with GitButler is everybody's operating on the same code base directly at the same time. And they're surprisingly good at not stepping on each other. And they can't make merge conflicts because they all have the same files to edit, right? That's super interesting. And so that, but when they're done with their loop, right? Or when they have an agent stop and they try to commit stuff, if they have our skill, then they'll look at it and they'll be like, okay, I'm just going to create my own branch. Like each one of them can work in their own branches and they can commit their stuff into their own branches. And now it's sophisticated enough where if one agent really wants to edit something that another agent did, it can see that it's locked and it can stack the branch instead, right? And so now it doesn't really matter if they're stepping on each other or not. I can kind of figure out, my co-founder, Kiril, today was showing me this thing where the two agents had, you know, they were both trying to sort of vie for the same file and edit it in a way that wasn't really compatible. And so one stacked their branch on top of the other one and then they kept working and they kept committing, but they commit to their part of the stack, right? That's cool. And so like, that's the type of thing that you can't, it's really, really difficult to do that with worktrees or really any other type of, like you can't do it with Git, right?
Like, just plain Git doesn't really allow for it at all, right? Like, you can't do rebasing and sort of amending commits or moving commits down the stack or something like, or squashing stuff or, so it's just not possible. So part of it is that we have that solution, but the other part is we're really trying to make the tools accessible for agents in the first place, right? That's super interesting. So I still get the sort of logical isolation of the branch, but the agents can kind of see each other's work, and it kind of makes sense, right? It's like if you completely isolate them each in their own working directory, they by definition don't have any awareness of what the others are doing. And I mean, there's other ways of getting around that as everybody finds out, right? Like if two worktrees create merge conflicts, they won't figure it out, and then they get a merge conflict on GitHub when the second person merges or whatever, and then you can get the agent to pull it down and fix it and try it again or something like that. So it is doable, but it's kind of nicer to just not have them in the first place and be able to kind of review everything sort of from a high-level standpoint and be like, okay, all of these have done what I want, and I have these two stacked branches and then two independent branches, and all three of these stacks can be merged in any order, and we're really going to end up with my working directory now, right? That's kind of where it started, right? And so this functionality is all built into the CLI now, where different types of agents can use this just by calling out to the CLI, basically. That's very cool. So what do you think happens to GitHub in this world, right? Like Git clearly is still the backbone everyone's building on. You're even extending sort of Git. You're not building a whole new thing. But GitHub in some ways seems less relevant than it was before.
I'm curious what you think will happen next. I mean, I think it depends on GitHub, right? I think it's actually mostly framed as what's the next GitHub, because I think people feel like GitHub's not going to be able to pivot fast enough to keep up. I mean, it's a behemoth, right? That is both its advantage and disadvantage: it has everyone in the world using it. And so whatever it does, it does at scale, which is awesome, right? It has the obvious capability of being able to introduce something to most of the world, of all of the agentic users in the world, no matter if they're using Copilot or not. The question is, do they care enough to do that? Or do they have the vision to do that? Do they know? The other question is, do they, do any of us even know what that should be? Right? I feel like we're in an interesting sort of Cambrian explosion of workflows now where it's like, who knows, right? Like, and to put time and resources into doing that in a way that's very hard to pivot around is really difficult. For a startup, you know, we don't have the audience in the same way clearly, but we can, you know, mess around more and kind of follow things faster and try to figure out, okay, this is because, I mean, it changes every month now. Like I was saying, the tools that we're writing, we think work, they don't work. We can ditch them. There's no whatever, right? Like it's fine. So it's kind of an idea of, you know, what are people going to more or less settle on? And then how can you give that to the most people or capitalize on that to some degree? Like I found GitHub's evolution really interesting because people ask, what is the next GitHub? And I find that interesting to look at from the point of view of what was before GitHub. Like what was GitHub the next one of, right? Yeah, yeah. And it wasn't really anything.
Like there's no, there was nothing like a GitHub before GitHub. And so I think whatever is the next GitHub is going to be the same problem set, right? It's not going to look like GitHub, right? Like I think GitHub will be more like a SourceForge or something that maybe you could say was kind of before GitHub, like it didn't really have collaboration tools, right? It wasn't about sharing patches, or like Trac or, you know, the issue tracking systems that were kind of before GitHub. So like it just, the entire programming community changed so fast. And I think GitHub took advantage of that and was able to grow and provide tooling that nobody had, right? And nobody was really thinking about, you know, in a short enough time span to get that audience. And I think there definitely probably will be something. The question is, can GitHub do that like the way that, you know, SourceForge couldn't or Google Code or whatever was kind of before us? Or is there going to be a startup, you know, like us or like somebody else? You know, I'm sure there's a lot of people trying to do this type of thing right now because they see the need. It's just that nobody's agreed on what the solution is, right? Like, where to go? And it doesn't have to be perfect. It just has to be good enough that people don't want to write it themselves and are like, OK, I'll give you some money so that I can use that instead of reinventing that particular wheel. You mentioned before that the original Git developers didn't like the GitHub primitives, PRs and issues and all this. Do you think now the primitives need to be re-evaluated? Do we need issues? Or is there some agent-native issue format that is better than what we have now? I think we've needed it for a long time. And it's just too much friction for too little value, really, to change to... I would really prefer patch-based review rather than branch-based review, right?
PRs made things really easy, but you get a lot of commit slop, right? There's a lot of oops sort of things added on the end because it's the branch that matters in the review context and it's the branch that matters in the merge context. And so the commit message doesn't make any... Nobody really looks at them. So Hale once posted an old commit log and half the messages were like, ah, this is not working. They're like, yes, I made it work. Yeah, I mean, it doesn't provide value that way, right? Like in the mailing list days, it did because that was the way that you reviewed stuff. You had to have a good commit message because that was your PR description, right? Now the PR description is not kept in Git, and so... But people don't care that much, right? Just once it's merged and out there, who cares, right? There's not a lot of sort of, you know, code archaeology or whatever that people really depend on for that. And so I think review changes a lot because we're going to, you know... I mean, just to go back to sort of the problem with the primitives right now, PRs: if you ask, I think, almost any software developer, like when you do code review, do you really read the whole PR, right? Like, do you go through every line and think it through? Do you pull it down and test it out and then leave the good feedback on each line or whatever, say, this is working or not working? Or do you give it a cursory glance and say, yep, fine, right? Like, it doesn't look badly broken, or, you know, you're introducing, you know, API keys or whatever, right? Like, if you can pull it down and compile it and run it and test it, I feel like that's almost a better review. If something doesn't work right, fine, go to that piece of the code and do it. Now, agents can do this for us or can augment us in a very nice way. I think review, it would be nicer if it was patch-based and it was local and you could actually run stuff, right?
Or your agent can run stuff, give you a short list, and then you can look at that. A centralized online URL for a PR was not the greatest thing. It was a better tool than what came before, certainly better than looking at patch files in a tracker, but it's unfortunate that it hasn't evolved more. There was still a lot of room to grow in that space, and it hasn't changed that much; I think there's definitely room for that. We've even talked internally about doing our own forge, our own review system. We actually shipped one and then took it back off because it wasn't solving the problems we really needed solved. But I 100% think we'll take other approaches to the review system, and other people will as well. I'm really fascinated by where it goes. What do we really need as code writers, whatever that means in the near future? What do we really want to accomplish from review of code, by whoever does it? And what does that tool look like in a way that's easy to use and easy to learn? It's a very unsolved problem right now, and for me it's incredibly fun to tinker with. I have lots of opinions here, I want to try out lots of stuff, and I want to see what feels right to me as a software developer, but there are lots of possible answers. I don't think the primitives are the right thing anymore.

It's sort of interesting, because people are reading the code less. There's even an extreme view that JavaScript is the new assembly, right?
It's there, and something compiles down into it, but you're not really reading it. So if we're not writing JavaScript directly, if we're not reading JavaScript directly, should code review actually be code review, or should it be prompt review or something else?

Right now, at the current state of agents and models, the way we approach things is somewhat triage-like. If it's a red wristband, if something going wrong would be bad, then that's very human. We look at it, we handwrite it. It's very important for us to get the APIs right: we know that if we call these APIs, they'll do the right thing and give us the right errors, and we have high confidence that this is good code. Then there are other levels of triage where it's not that important. If the code is just calling those APIs and it's a UX problem, if we're adding a feature flag for an agent, we do refined vibe coding: test it, have the agent write some tests, look at the tests, decide that seems fine, that seems to work. Run it through the eval loop. It doesn't break anything, it makes things 10% faster, ship it. So we do have this middle ground. But even internally at GitButler, and at a lot of companies really trying to push the boundaries of this and use LLMs heavily, it's becoming much more of a good-writing problem. Can you do a good write-up? Not every team is good at that; not every software developer is good at that. Can you communicate? A lot of developers, especially the ones who think they're very smart or legitimately are very smart, feel like they don't have to describe what they're doing. They can live in their heads and it's fine.
I think the software developers who will be the best producers of product in the near future are the ones who can communicate, the ones who can write, the ones who can describe. That, I think, is the next superpower. And you're right: the collaboration around how we write code, what the spec is, what the write-up we want to be true is, becomes more important, and the implementation details become less and less important as the agents get better.

So all of us who were attracted to engineering because we could deal with machines instead of people are now finding out that engineering is an actual human discipline after all.

I clearly have a huge bias, because we started this by saying I wrote a technical book. So I like the idea that I'm going to be the best programmer because I can write technically.

Yeah. We've all had this thought that there are 10x or 100x programmers out there, and all of us have probably thought at some point that we were one of them, which is definitionally challenging. So we've all had this thought of, oh, I don't need to explain what I'm doing, I'm right. But the point you're making is that "right" is not as objective as it was, or there's just so much activity going on that managing state across the team is maybe the most important thing.

Yeah, or consensus. The why, rather than the how, becomes more and more valuable as the how becomes cheaper.

What do you think will happen with coding agents in general next? What's the big hill they'll climb? You guys are pretty deep in this.

As they get better, I think the problem set becomes what to work on next. I think we're all kind of afraid of just giving an agent access to Linear and saying, do it, solve all of them.
Because the other problem is that most Linear tickets, most tickets, have no write-ups. It's not a database of write-ups. It's actually very hard to sit down and write up how you want the entire product to look in every interesting and important aspect. What becomes problematic is that you build something, then you have this downtime where you're using it and trying to figure things out. When I'm doing stuff right now, and again, I'm the CEO, I'm not writing the important code, I'm doing proofs of concept or coming up with ideas, I spend most of my time testing it and writing up the next thing, and there are a lot of wasted cycles. I almost feel, I don't know if there's a phrase for this, like you're not spending enough tokens, not having enough things working at the same time.

You're the German speaker, you should have a compound word for this.

The token shot of coitah. Yeah, the fear of not consuming enough tokens.

I think the problem set becomes training a team to be good at writing, and figuring out what the coordination looks like to decide which of this series of write-ups describes the product we want to have, and probably less so how exactly it's implemented. But that moves the constraint: not "can I produce the code," but "can we agree on what we want." And that's difficult in some ways and easier in others. For instance, I've been working on this metadata system, which we might talk about at some point, where we want to append transcripts to commits or branches. And Git can't really do that very well. So we're trying to figure that out.
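The closest thing stock Git offers here is `git notes`, which attaches mutable metadata to existing commits without rewriting them. This is a minimal sketch of that mechanism, not GitButler's actual design; the `agent-sessions` ref name and the note contents are invented for illustration:

```shell
# Sketch: attaching agent-session metadata to a commit via git notes.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev
echo 'fn main() {}' > main.rs
git add main.rs
git commit -q -m "add main"

# Store a (truncated) transcript against the commit, in a dedicated
# notes ref so it stays out of the default refs/notes/commits namespace.
git notes --ref=agent-sessions add -m 'prompt: "write main"; model: example' HEAD

# Read it back. Caveat Scott is pointing at: notes are an afterthought --
# they only sync if you push/fetch refs/notes/* explicitly, and they
# attach to commits, not branches, so rebases orphan them.
note=$(git notes --ref=agent-sessions show HEAD)
echo "$note"
```

That last caveat is presumably why a purpose-built metadata layer is attractive: per-transcript data needs to survive history rewriting and travel with the branch by default.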
So I've been spending a lot of time writing a proof of concept against a spec, and most of the time goes into the spec: weeks, just so much time on the spec. Then every time I hit a decision, I just make it, build it, and try it out. Then I go back to the spec, fix the spec, and say, okay, do it again. That's really nice, because I don't have to spend all my time implementing it and then seeing what's wrong, and I don't have to hand you a write-up and try to convince you to read it, agree with me, or come up with problems. I can have show-and-tell all the time. I can always show you something and ask, is this what we want to productionize or not? And we get a really good sense of whether it is or isn't. I find that a really interesting, powerful thing. So I'm really curious where things fall down in the long run. But right now I'm using AI to help me write specifications so that we can implement a lot of it by hand if necessary, and we know we're implementing the right thing, because we have a very good idea of what it feels like.

That's really interesting. So not only will AI coding models and coding agents continue to improve, but it sounds like you're making the case that we're not maximally taking advantage of the agents that exist now, especially in a team setting.

There are so many things you could do. Our agents could talk to each other. Inter-team communication is a thing that has actually never been good in software development. If you're working on some project and modifying a file, and I'm working on a project and modifying the same file, neither of us knows that until we merge. We kind of lost that with centralized version control to a degree as well, though the advantages we gained are obviously much bigger.
But we've lost some aspects of coordination, where it would be nicer to know about the overlap early rather than just having whoever merges first win. If we could talk to each other in more real time about what we're doing — but that's a lot of overhead for humans. Agents are very good at that, though; it's not a problem agents have. An agent can take its downtime and talk to the rest of your team's agents: what do I need to look out for, what should I tell Scott about so he's aware of it while he's working on his feature or the next iteration? I feel like that would almost be a better use of the cycle downtime than trying to run five or twenty of them at the same time, which just becomes hard to manage and hard to evaluate: is this doing the right thing, going the right direction? It's almost more interesting to have the agent help you write, suggest things, be a helper that way. Talk to the rest of your team, figure out the world of things in that project that could affect you, and give you that information so you can make decisions. That moves toward what I think would be a really interesting way of writing good product.

So this is the smart, responsible version of using agents, instead of the amphetamine-fueled version of pounding as many agents against the repo at the same time as you can.

Yeah, though I could be biased, in that I've just never been able to keep enough of them running at the same time, review the output, and find useful data in it. I like to have a bit more control, so I take things slower, generally. Maybe some people are really good at that.

But in all of these scenarios, the metadata you're talking about, appending chat transcripts to the changes and so on, it feels like that gets increasingly important, right?
Because the focus of creative control, of engineering insight, is at the metadata layer even more than at the code layer.

Yeah, and it also becomes a big-data problem. It's all text, but you'd be surprised how quickly it balloons if you're really trying to keep every transcript. Every prompt on its own is okay — it's commit-message-ish in length — but keeping all the tool calls, or everything the LLM was thinking, becomes a really big data problem very fast, even on small projects. So now you're into large-file, large-repository territory. Git actually has machinery for working with data at the scale Chrome or the Microsoft Office team uses, stuff Microsoft built into Git a long time ago that most people don't use or know about. We're taking advantage of some of those primitives in Git to build a metadata system that can scale relatively well without our having to worry about it too much. So there are a lot of interesting problem sets that come up just from trying to keep everything. Version control is all about change management, change control, and history. You want to be able to rewind, you want save points, and you have to balance how much data to store. There's a cost-benefit analysis: you think you need all that data, but then it's too much data and you can't find anything you want.

Yeah, it has to be indexed or searchable in some way, presumably.

The first version of GitButler we did was CRDT-based; it was just constantly recording every file buffer save.
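The large-repo machinery Scott alludes to ships in stock Git: partial clone (skip blobs until needed), cone-mode sparse checkout (materialize only some directories), and commit-graph files, much of it contributed upstream by Microsoft for Windows/Office-scale monorepos. A small self-contained sketch, using a tiny local repo standing in for a huge one (which of these primitives GitButler actually uses is not specified in the conversation):

```shell
# Sketch: partial clone + sparse checkout on a toy local "monorepo".
set -e
tmp=$(mktemp -d); cd "$tmp"

# Build a server repo with two top-level directories.
git init -q server
git -C server config user.email dev@example.com
git -C server config user.name Dev
mkdir -p server/app server/docs
echo app-code > server/app/a.txt
echo doc-text > server/docs/d.txt
git -C server add .
git -C server commit -q -m "initial files"
# Partial clone must be enabled on the serving side.
git -C server config uploadpack.allowfilter true

# blob:none clone: commits and trees come down, file contents are
# fetched lazily from the promisor remote only when checked out.
git clone -q --filter=blob:none "file://$tmp/server" client
cd client

# Cone-mode sparse checkout: shrink the worktree to docs/ (+ root files).
git sparse-checkout init --cone
git sparse-checkout set docs

# Commit-graph file: precomputed metadata that speeds up log/merge-base
# style history walks on very large histories.
git commit-graph write --reachable
```

After `sparse-checkout set docs`, the `app/` directory disappears from the working tree while remaining fully present in history.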
It recorded everything you ever changed, so you could take a timeline and scrub back to any point in the working directory. And it was awesome. It actually wasn't even that much data; the problem was the user interface. It was too much information. It's not common that I want to go back 17 minutes. So we scrapped it, because it was just too much complexity for humans. It would almost be interesting to re-implement some of that and let agents use it: if you want to get an entire working directory back to what it was 27 minutes ago, here's a tool to do that. But you still have to weigh usability against what's possible.

I watch some of the thinking logs from agents sometimes, just out of curiosity, and some of them are super unhinged. "I see the problem." Then, "Oh wait, no. Oh God, I can't believe I missed that." The agent starts to berate itself.

Don't be too hard on yourself.

Yeah, but there's super valuable information in there. It's just a bit like extracting it from this junior developer's private freakout.

It's a very difficult problem set because it's so subjective. I'll have conversations with people on my team where they think Codex is good at something, and I think Claude Code is good at exactly the same thing, and we have different reasons why, on a similar problem set. And we ask, what is it? Why am I having that particular feeling? Why does it seem to act that way for me? It's hard to quantify. But I find it interesting to think about — I used to do this for the language-learning company — what the logical end of the tooling eventually becomes, and what's interesting or valuable at that point.
For language learning, Google would come out with some in-ear auto-translation thing.

A Babel fish.

Yeah, a Babel fish, and people would say, okay, language learning is done. And I always thought that was a dumb argument, because it's not the same. Both people have to have a Babel fish, for one, and it has to be very fast and very effective and have cultural context, which is very difficult. I like to think about the logical endpoint of a technology. For language learning, that endpoint is having a human translator. I did a tour for GitHub in Japan at some point and had a translator walk around with me for a week; every interaction went through her. She was amazing, very good, but it's still not a good experience. You're not going to get married through that kind of communication, or start a business. There are a lot of things it just isn't good for. And that's the best possible version of a Babel fish. So I always think about that, because for coding it's interesting: what is the logical extension of how good agents become? Is it having the best engineer you've ever known, who can stop time, work on something as long as they want, then start time again, and now you have the solution? Which is kind of what it is: smarter and smarter people who can write code, faster and faster. But when you have that, what do you want to do with it? How do you manage that time? What tooling do you want to help you figure out what you want to build, and what you're happy with as the end result?
That's a very, very interesting question that we're getting closer to, but I still think we have a while before it's that good. It will get that good, I think, or very close. And then what do we do with that?

That's one of the big questions in the tech industry, and the big question in the world right now: what does the end state of all this look like? I don't claim to know the answer, but I'm glad people smarter than me are working on the problem. Well, thank you very much, Scott, for doing this talk with us today.

Thanks for having me, Matt. And for folks who are listening, try out GitButler, especially with the new command line tooling. Super cool. And for folks building startups, try out Andreessen Horowitz. They're awesome.

Too kind, as a two-time Andreessen Horowitz founder. Hopefully this one is even more successful than the last. Thanks, Matt.

All right, cool. Thanks, Scott.
Speaker 2:
[46:01] Thanks for listening to this episode of the a16z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at a16z and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only. It should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.