Adam Berry, Senior Staff Engineer, Amplitude
Adam Berry, Beyang Liu
Adam Berry has worked on developer tools and infrastructure his entire career, starting with Eclipse plugins for Wolfram Research and later moving into team-oriented tools that drastically lower cycle times, especially in organizations that reach scale in terms of humans and code. Over the years, he has had a front-row seat to the evolution of deployment models from VMs to containers and the impact this has had on tooling, processes, and developer culture. Beyang and he discuss the framework through which he views engineering productivity at two distinct levels: the individual and the organization.
Show Notes
Highlights:
- Building Eclipse plugins at Wolfram Research and the unique working culture there
- Bringing cycle times from days to hours on the team at Yahoo that pioneered Hadoop
- Bringing down code review times at Nutanix and Pinterest, Dunning-Kruger and the peculiar structure of large single-product organizations
- Code as asset management
Transcript
Beyang: All right, welcome back to another edition of the Sourcegraph Podcast. Today we’re here with Adam Berry. Adam has worked his entire career on developer tools and infrastructure, starting with Eclipse plugins for Wolfram Research and later moving into team-oriented tools that drastically lower cycle times, especially in organizations that reach scale in terms of humans and code. Over the years, he has had a front-row seat to the evolution of deployment models from VMs to containers, and the impact this has had on tooling, processes, and developer culture. And he has arrived at a framework through which he views engineering productivity at, sort of, two distinct levels, the individual and the organizational level, which I hope we can get into in this next hour. Adam, thanks so much for taking the time and being on the show.
Adam: Thanks for having me. Excited to be here.
Beyang: Awesome. So, to kick things off, you know, let’s start with your programming background. What was your, you know, journey or entry point into the world of computers?
Adam: Yeah. So, I think slightly atypically, I had computers around as a kid, I was lucky enough, like, from a very young age, but I didn’t actually really do any coding when I was young. Really, the entry for me was I graduated college with a degree in mathematics and moved to the States, and was lucky enough to have, like, moved essentially to the hometown of Wolfram Research, where I got an entry-level position. And that was like my gateway into that, so mathematics, Wolfram. And it just, kind of like, took off from there.
Beyang: Interesting. What sort of math were you studying in university?
Adam: So, my focus was pure mathematics broadly. My master’s thesis was in set theory and graph theory, sort of, very specifically. But I think, in a technical sense, a very long way from what I do for a living now.
Beyang: Was that, you know, like, Wolfram Research, has, like, a very mathematics-oriented use case; is that how you got connected to them? Or, like, how did you get in the door there?
Adam: Um, so back at that point in time, so Wolfram had this sort of like way that they hired new graduates into their technical support department. And so basically, you would spend a couple of years in there working with most of the users of Mathematica—like, the principal product at the time—learning the product, kind of like figuring that out, and then gradually, you would transition into a development role. So, that was, like, very much what I was able to do there. You know, and tech support can sound kind of heinous, but it was very much, like, the level four and up support, right? It’s like, you’re working with researchers and professors who were doing—who were, like, calling in, like, with very serious issues that they’re having with the product.
I mean, you have to do the install stuff, too, but, like, it was actually a really good, kind of like, introduction to the field. And I’ve remained really thankful for that my whole career because there is no better way to build empathy for someone who’s using software than working with them on the phone when they’re stuck. So, like, it’s really sort of, I think, served me in good stead over the years.
Beyang: Oh, absolutely. I feel like [clear throat] support is one of those roles, which maybe gets, like, the short end of the stick, especially in the tech industry, but it’s just so, so valuable. Like, every time that I’ve hopped on a customer call, you just pick up so much context from the way that they’re trying to use the products, you know, they push the boundaries in interesting ways, or sometimes there’s, like, things that are obvious to you that are just completely non-obvious to a user who, you know, isn’t on the product team or isn’t on the support team.
Adam: Yeah, I think that’s dead right. And then I think also when you think about, like, the organizational model that you have then, in that, like, a very high percentage of your developers have come into the organization through this path of working with your users, like that’s a really valuable thing that’s, like, very difficult to capture, right, normally. Like, so you know, a couple of years of that, it’s ingrained. So, as you’re then working on code and working on stuff, like, you’ve got that muscle. It’s really good.
Beyang: I like that maybe every developer should start their career out in support. And, you know—[laugh]—
Adam: Yeah, maybe. That’s not a bad idea.
Beyang: Okay, cool. So, you know, when you transitioned to a development role in there, what sort of things did you work on at Wolfram?
Adam: So, that was really—so the couple of things that I started on there included the Eclipse-based development tool that we had, and that was called Wolfram Workbench. So, like, I started working on that. So, that was like, straightaway I’m into developer tools on one side. And I was also at the time working on a couple of other pieces. One was an AMPL, which is, like, a modeling language, into Mathematica translation tool.
So like, I was working on that at the time. So like, but it very quickly got me into, like, the development tooling end of the scope. It’s like, I don’t really know why I was pulled into that space early on, but I was, like, literally day one.
Beyang: So, talk about that Eclipse plugin. What did it do? What was the thing that it added to Eclipse that was important to the dev team?
Adam: Yeah, so Mathematica has, essentially, a package mechanism for extending its core functionality. It actually dates to, like, the early days of Mathematica as a product where not everything could fit in memory, so like, they built this package system so that you could, like, opt in, like, pieces of functionality. And it just sort of like stayed around as, like, this is actually how we develop extensions and, to a large extent, develop the core of the product. So, what was pulled into Eclipse then was the classic set of an editor that had support for the language, you know, with good code completion, good reference finding, like, all the things you would expect from an IDE editing experience, and also a unit testing framework, so that you could do, like, proper test-driven development for those packages as you were going with it, and, like, properly embedding that in the development cycle inside of Eclipse. And then, like, run and debug, both for, like, single local kernels and remote kernels because one of the use cases of Mathematica is to run on parallel clusters, right?
So, like, you could be developing custom code and, like, hook up to, like, debug across the cluster. Like, so very sort of like classical sets of things inside of the IDE, but just, like, translated to Mathematica.
Beyang: Got it. So, this really, like, tightly coupled Eclipse with the application that you were developing. It is kind of meant to be—
Adam: Yep.
Beyang: Kind of like an end-to-end portal for testing and also, like, spinning things up in cluster environments, that sort of thing?
Adam: Yep. Very much so.
Beyang: How many people were working on this thing?
Adam: So basically, uh, three of us, which included the Director of Kernel Technology. Like, Wolfram—and I don’t know if this is still true; I mean, I still have friends who work there—but, like, when I was working there, ran—[sigh] I don’t want to say ran lean, actually. Like, I think actually the way I want to say is did an incredible amount with a really small number of people. And so, like, I remember there was a guy who was an office mate of mine at the time who was working on the statistical functionality inside of Mathematica. And he was actually essentially the lone developer for all of the statistics functionality in Mathematica, which could do pretty much everything you could do with SPSS. And it was, like, one guy and SPSS had, like, I don’t know, dozens of people doing the equivalent functionality.
So, like, there was a lot of that kind of translation of, like, the numbers of people who are working on things. So yeah, there was three of us who were doing it and we were all also doing other things, including in one of the cases, running the whole of the kernel group. [laugh].
Beyang: That’s incredible. [clear throat]. I feel like that gets at, like, one of the interesting things about software work, where there is indeed, like, such a high level of variance between different individuals and different teams in terms of, like, how many people-hours are required to ship, you know, a particular set of functionality. Like, did you notice that at the time or is that something that you kind of like became aware of? Did that factor into how you, like, developed the tooling at all?
Adam: So, the latter part, no. I didn’t know that we were running like this, right? And I think there was a little bit of—as I think back—of, well, hey, we just run cheaply here, right? Like, that’s just kind of how we do things. I think it’s not until much later, right, as I’ve now worked at a bunch of places, that I can look back and really put that into context and understand the difference, right?
Beyang: Cool. Okay. So, after Wolfram, you left, you went to Yahoo. Is that right?
Adam: Yeah, pretty much. There was a small stint at a little startup in there in the middle, but—
Beyang: Okay cool.
Adam: —I went back to Wolfram in between. But yeah, so my career arc is basically Wolfram and then Yahoo.
Beyang: And so, from Wolfram to Yahoo. It was kind of this journey from, like, this editor plugin, which is very much kind of like a single-player tool, right? It’s like, you know, one developer sitting at your editor. And then in Yahoo, you got involved in more kind of like team-based tools. Can you explain a little bit about, you know, what you were working on there?
Adam: Yeah, so I was on the release engineering team inside of the grid engineering team at Yahoo. So, this was the group of people who developed Hadoop and friends, basically, which, for people who remember this history, Hadoop came out of Yahoo, right? So, like, this family of technology was birthed there, and so this is the group of people who were working on delivering those things. And it was all of them: it was Hadoop, Hive, Oozie, Pig, the whole set. And so, I was on the release engineering team in that group.
My memory is it was something on the order of 50 or 60 developers, give or take. The grid operations team was another 15 to 20 or so, I think. And this is the group of people who are pushing, you know, like, essentially the core of the platform, Yahoo’s data infrastructure, which is 40,000 nodes, give or take. So, yeah, very different kind of world to step into.
Beyang: That’s incredible. And what sort of things were you working on enabling for Yahoo? Like, you know, in terms—this feels like almost, like, the early days of kind of the big data trend? You know, Hadoop was kind of at the leading edge of that. What were some of the, like, the interesting use cases that you encountered there?
Adam: Yeah, so we—so when I joined was right when the team was developing and delivering essentially YARN. So, like, it was Hadoop 0.23 that then became Hadoop 2, which was, like, this new compute framework. So, like, I was, like, there, right as that was going out. And so, when I joined, you know, there was already, like, the idea to start attacking some of the things, essentially, in the delivery lifecycle for the team. And so, my manager, like, explicitly hired me and one other person into the team and was like, “We need a new test framework. The current test suite is a nightmare. There’s no way we can fix it in the current setup—” you know, like, “—so we’re going to have to go tackle that.”
And so, there was a few of these projects where we sort of ended up, like, tackling chunks of what I now know to be the delivery lifecycle, but we were not couching it in those terms at the time. So broadly, we just, like, did that. We, like—each chunk of the delivery lifecycle we went in, we did something, we made it better. I think one of the really big things that I personally worked on in that realm as well, that really informed a bunch, was smoothing the management between the open-source releases and the internal releases, right? There was a really strong tension there because this was the group of people who were developing essentially YARN, or at least split with Hortonworks, right?
Hortonworks did exist by this time, so there was, like, a co-development of the next-gen compute framework. But you want to be able to deliver internal versions of open-source releases with as small a number of fixes as possible. So, like, smoothing that process, getting really clear visibility into what your difference is—internally over open-source—and minimizing it, which was like the first time I started to think about collection of risk in the development process. Yeah, so like, that was one of the things that, like, I sort of personally did there. But yeah, we tackled the full chunk, and I was involved in all of it. It was a lot of fun.
Beyang: That’s interesting. So, I’m trying to paint a picture of, like, what the development environment looked like when you came in. So, you know, 40,000 nodes; that’s, like, a huge cluster. It seems like there’s a—
Adam: 40,000.
Beyang: 40,000.
Adam: 40,000 in prod, we, yeah the—yeah, prod was 40,000 nodes across, like, ugh, I don’t know, roughly ten clusters, I think, is my memory. And then in dev we had, I want to say up to about 10,000 nodes that we always got, like, as hardware aged out of prod, we got them in dev. So, our hardware was super unreliable. [laugh].
Beyang: [laugh].
Adam: It was awesome. [laugh].
Beyang: That’s fascinating. You know, like, usually, when people speak of dev environments, they’re like, oh, you know, my local machine. It’s got, you know, 16 CPUs and—
Adam: Yeah.
Beyang: —you know… but around—but like, 10,000 nodes, that’s uh—what does it look like to have a development environment that’s hooked up to 10,000 nodes?
Adam: Yeah. So basically, what it was, the inner loop there for developers, you know, was like, the classic Linux workstation under the desk, like, as much power as you can throw at it because just compiling Hadoop—all of it—was pretty expensive. So, the compilation and what were broadly called unit tests, but as all large unit test suites in open-source projects go, there was definitely some stuff in there that wasn’t unit tests, you know, so, like, they would get expensive. So like, that was like, the local, like, the very local inner loop. And then pretty much every developer had access to a small number of development clusters that numbered, typically, a couple of dozen nodes, basically, that they could actually kind of like compile Hadoop, fling it out, like, to test something out.
That was, like, basically what they were doing, you know, day in, day out. And then we also had sort of like slightly larger you know, like, build and release clusters, right? So, as well as these, like, personal clusters for the devs, like, we had, like, stuff that occupied spaces in the stages of the pipeline. Which wasn’t really a pipeline at that—like, it was very much, like, hey. So, when I started, we were absolutely on a once-a-day cycle as a team.
Beyang: Once a day meaning, like, you shipped once a day, or, like, it took—like, what was once a day?
Adam: Once-a-day was the team had a once-a-day iteration. So, there was a nightly build, it got deployed onto a set of test clusters, the test suite would run. And so, in essence, the team, collectively, had one iteration per day.
Beyang: That was like your inner loop, like, you couldn’t even, like, try something out and, like, deploy it into a cluster environment. You had to wait a whole day?
Adam: You could do it, but not integrated with anybody else’s changes, right? So, you could be working on something but, like, in an isolated sense. That loop was okay. But the ability to integrate those changes any more than once a day just wasn’t there. And once a day was a little bit in the ideal state because the test suite took, like, my memory is pretty close to a full day, like, it was like a good 20, 22-hour test suite.
And it wasn’t super reliable, so like, you lost a decent chunk of days of, like, was those test failures [unintelligible 00:16:17], or—
Beyang: [laugh].
Adam: —like, is it just the test suite, right? So, that’s why I think the test suite, you know like, the manager at the time—and this was not, like, codified in metrics or any objectives; it was just, like, I’m tired of losing days in my team to figuring out what’s going on with this test suite. It needs to go. Right? Like that was just very much his position.
Beyang: Okay, so what did you do to, like, fix that or improve that situation?
Adam: So, the projects during my time there, that kicked off to get this sorted out, new test framework, translating the test suite. So like, that sort of, like 20-ish hours collapsed into, I want to say, like, 90 minutes to two hours. Like, somewhere in there. Like, it was a huge reduction.
Beyang: What’s your secret? [laugh]. “There’s one weird trick.”
Adam: [laugh]. I think don’t have a massive test suite written in Perl. Like, that might be like—that’s not really—I don’t really feel like that’s a secret at this point, really, in 2022. Like, no one would really do that. I will also say, like, I think it was like an early exposure to the danger of metrics, like, your incentives.
Because I know that test suite came to be because a director had said, “In three months, we will have 1000 tests.” So, they did. [laugh].
Beyang: Wow.
Adam: But, like, you know, two years later, it’s not performing very well, right? So basically, it was like a new test framework was written in Java, which, like, so into JUnit, which also then aligns with the stack of Hadoop and friends that was already in [unintelligible 00:18:01], right? So, like, there’s a good lesson, right, as you’re thinking about productivity tooling. Choose stacks that align with your team. Like, that’s a really important lesson.
So, that was like one effort. As Hadoop was evolving, you know, as the compilation and the unit test time was climbing, so there was an effort to heavily parallelize the unit test runs across, like, essentially a Jenkins cluster using Maven natively. So, that got the unit tests down from, like, what was getting to a few hours to, like, 25, 30 minutes. That’s super important. So like, that was two efforts.
I also actually worked on—so as well as this, like, test suite in Perl, we also had deployment code that was like 10,000 lines of Perl and shell, like, mixed together. I worked on some Chef cookbook code to, like, collapse a bunch of that, which also, like, was much less code and actually deployed quicker. So, like, we really just went after every single piece of this. And at the end of this, before I left, we’d basically gotten down to an iteration cycle that could be done in about two hours. So, we’d gone from, like, the team had an iteration cycle once per day best case scenario to they had four or five in a typical working day. And that’s a huge difference, right? Like if you’ve never felt like that order of magnitude jump in your career, like, it is an amazing thing to be around.
Beyang: Okay, so I have two questions here. One is around the point you made earlier about metrics; I think it was really interesting. But before that, sounds like you were working in what was effectively a sizeable codebase yourself that was composed mainly of Perl and bash. How the heck did you get productive in that environment? You know, like, those two languages are notoriously known for being, like, you know, optimized for write but very difficult to read and understand.
Adam: Yeah, I mean, I think the brutal truth is by doing as much of it in other things as I possibly could, right? Like, that was not a codebase that was, to your point, accessible, right, if you’ve come to it later. You know, and this is also a point in my career, where it’s like, I’m a little bit new to, like, managing infrastructure. Like, this is a bit new to me. And so, you know—but luckily, at the time, things like Chef are growing in popularity. They already had a ramp at Yahoo, we also had, like, the beginnings of an OpenStack deployment internally, like, with VMs.
So, it was like, I was able to just kind of go, “You know what? I’m going to make on-demand clusters for this portion of the test suite work much better,” right? So, it was like—and it was—it’s honestly ridiculous, right, that, like, I could do the equivalent deploy with, my memory is something like 700 lines of Chef and another hundred lines of, like, I think it was like a Python script that I wrapped around this, for managing the OpenStack deployment. Like it, [laugh] like, truly ridiculous how much less code it was.
Beyang: Yeah, it’s almost, like, you got the same functionality, but taking on a lot less liability because you’re able to reduce the amount of code that yielded that functionality.
Adam: Yeah. And then the other, sort of, side effect of this as well was the way the Perl and shell actually deployed code when it was upgrading was really awkward in that the output that you got from the log was actually just telling you what commands it was executing remotely. So, you never actually saw the output of the commands; you just saw what it tried to run.
Beyang: [laugh].
Adam: And then the result.
Beyang: Where it stopped. [laugh].
Adam: Yeah, so, like, not only was Chef less code, but it was also, like, a very, very descriptive output as to what it tried to do, and if it failed, how it failed, right? Like, you just sort of get this by default. So, not only was, like, the cognitive load on the front-end much less, the cognitive load of figuring out what had gone wrong was lightyears ahead as well, right? So yeah, so like, that was like my first sort of like inkling of, like, hey, there’s a modernization happening in this tooling, and if that’s how the old world looks, I’m never ever going to touch it. I’m only going forward.
Beyang: I feel like that certainly puts modern infrastructure pain points into perspective a bit, you know? I still like to complain a lot about, you know, dealing with AWS and whatnot, but you know, it certainly was a lot worse not that long ago. On the question of metrics, like, that is a very, you know, hotly debated topic. And what you’re saying is absolutely true. Like, a lot of organizations, I think, make this mistake of over-optimizing for metrics. I guess my question to you is, like, what is the right way to use metrics?
Because obviously, like, there maybe is a place for them. Like, we’re a lot—we’re very much in a data-driven world, you know? Hadoop played a big part in driving that, so, like, you know, what is the appropriate way to use metrics? Like, do you not use them at all? If you don’t use metrics, like, how do you actually keep yourself honest as to the result that you’re driving?
Adam: Yeah. So. So, my general philosophy on this is what I call data-informed, for both decision-making and, essentially, progress-tracking, right? I think it’s very important to have them, but you must always continue to put them into context. And that actually sort of forces you to go through this, like, renewal process of making sure they’re telling you what you think they’re telling you.
I mean, metrics that you can sort of directly impact will never be valuable because it’s just too easy to game, right? Like, you’ve got to be very careful about proxy metrics that aren’t really measuring the thing that you’re interested in, right? So, there’s a bunch of these, like, things you got to be careful about. But I’m actually, like—I really do like working generally under, like, an OKR system because it forces you to put it in that context, right? Like the objective is, “This is north. This is where we’re going.”
My key results, when designed well, are both, like, “How am I going to measure my progress in that direction?” And the other part that I’ve mostly seen people miss out on is you must also declare constraints, right? You want to declare progress and the things you want to not see go sideways, right? Like you need to do both of those things. And I think if you’re sort of in a good, like, rhythm of going through that process, I think it’s really valuable.
I think tossing up metrics as, like, absolute measures of performance and then using them as sticks across teams is awful. I’ve definitely seen it done like that. So, there’s lots I can tell you about how not to do it. OKRs is broadly, like, how I think it’s good.
Beyang: Can you maybe talk about how you applied that in a specific example? So, for example, you know, when you started working on the Hadoop development environment and, you know, that team had gotten to a bad place, or the, I guess like, testing cycle had gotten to a bad place because of mismanagement of metrics, when you were, you know, trying to fix that situation and get the cycle time from, you know, once a day to multiple times per day, did you have metrics that you were looking at as, like, informative or validators at that point?
Adam: Yeah. So. So, we were, but not in like, that formal of a way, right? Like, we started to track build times, we started to track, like, test times and test failure rates, but in a relatively ad hoc way, right? It was like, it was how we would declare success and move on, is, like, we would do the analysis and be like, “Okay, has this trended, like, where we wanted it to?” And, like, we would move on.
We weren’t doing anything like emitting data and, like, building, you know, dashboards and alerts. Like, none of that was happening. But, like, there was definitely points in time where we would do that analysis to be, like, “Okay, have we tackled the problem we thought we were tackling?” And if yes, we would just move on, right? [laugh]. Like, assume it would stay good from that point.
Beyang: Did you run into any issues of, like—so, inside of a 60-person org, I feel like that’s enough people for there to be communication difficulties, like people interpreting your stated goals in different ways. And sometimes, like, metrics are thrown out as a way to just, like, make it crystal clear, like, what is the number that we care about? Did that occur at all, or were you able to, like, work through, kind of like, the nuances?
Adam: You know, weirdly, we didn’t really have that in that group of people. I agree with you. It is a large enough group of people where it could have happened, but I don’t ever actually remember feeling like that was the thing that we went—
Beyang: Okay.
Adam: Maybe [crosstalk 00:27:21]—
Beyang: So, everyone just got it? Like, they just were—you were on the same page?
Adam: It felt a little bit more the org was big enough and busy enough that everybody was just, like, “I got my stuff. I trust you to have your stuff.”
Beyang: Got it. [laugh].
Adam: Like, “Come back and report progress, and cool,” right? It was a bit more like that dynamic, I think, than sort of like everybody got it.
Beyang: Cool. One of the other things you mentioned was, you know, transitioning over to Chef for deployment. I feel like, you know, over the course of your career, you seem—I guess, now we’re, like, several iterations, you know, past Chef in terms of, you know, the technologies we’re using to provision and orchestrate environments. Can you talk a little bit about just, like, your view of this whole evolution?
Adam: Yeah. So, I think—so, broadly speaking, it’s a really clear place where you can see the effect of tooling in terms of the amount of stuff an individual can manage. You know, and I’ve seen these talks—or, like, I used to watch them back in the day when Chef was still, like, a thing, you know—and I remember some of the folks would sort of put up their slides. It was like, “In the olden days, we had ‘Ye Olde Runbook,’ right?” And then somebody would take Ye Olde Runbook and write Perl and bash, right? Like, this was like, the next step in the evolution was like Perl and bash.
And then after that, someone would be, like, “Okay, for the config management piece of this—” because it’s important to remember that Chef, at that time, didn’t really do provisioning or actually orchestration. It was very strictly, in essence, a config management tool on an individual node, right? Like provisioning and orchestration was like an exercise left to the user, which is, like, terrifying because it’s a huge domain, right? But just getting to that point, you could essentially—you know, an individual sysadmin, who in Ye Olde Runbook days probably handled a couple of dozen nodes. If they got into the scripting stage, they were probably handling a couple of hundred.
If you’ve graduated to, like, config management tools, like Chef or Puppet, or like, any of that, like, generation of tooling, you’d be in a couple of thousand nodes, sort of, per operator, right? So, you’ve got these, like, step shifts in, like, orders of magnitude of, like, stuff that you can manage, right? The stuff hasn’t changed, really, right? Like it’s still, in essence, nodes, you’re installing packages, you’re laying down config files, it’s running some services.
The stuff hasn’t changed. You’re just doing more of it, right? Like you’re just managing more of it because the tooling is reducing your cognitive load to get there. And it definitely carries on from there, right? Like, as we figure out, like, the next set of stuff as we head into immutable [unintelligible 00:30:17] patterns around VMs, like AWS and everything that brought to us around cloud-native and, like, containers, and then into—which was, like, a continuation of that pattern.
And it’s also, like, taking us into a pattern where, like, number of nodes per person is not a meaningful view of this space, right? It hasn’t been for a while, you know? But like, that’s really, I think, the progression that you can see, as an industry.
Beyang: Do you have a guess as to, like, what is next? Like, now we’re in—like, from my point of view, it feels like, okay, the Chef thing is about ensuring the environment in the VM was set up properly. And then it was a little bit about provisioning infrastructure, I think, a bit. And then Docker came along, and it was like, very much, like, a container-based world all of a sudden. And then we got Kubernetes for orchestrating, and now there’s, like, this new, like, serverless thing, which is… there’s a lot of things—that term is doing a lot of work.
Adam: Yes.
Beyang: What, in your view, are you most excited about, I guess is what I’m asking, on the horizon?
Adam: So, actually to me, the exciting thing about, like, what’s happening in, like, that set of infrastructure stuff isn’t actually so much about what’s happening on the prod side. I think there’s, like, a natural—like to broadly categorize—like, we’re done writing something, it’s running, right? Like, that’s sort of like the prod side. With—it’s a really bad term; we need better terms for these things. But I actually think what’s really fascinating right now about this work is that it’s pushing techniques of repeatability further left in the development process. This to me is what’s fascinating about this, right?
To get more specific about that, things like the dev container work inside VS Code—which is, like, relatively recent, right—like, when you sort of unpack what that is, that’s, like, oh, we actually want to make the environment for your inner loop a thing that we define, in code. We share it, it’s repeatable, it’s testable, and it’s used everywhere. So, it’s not so much about, like, unifying that environment to, like, what the production thing is. To me, it’s the application of those techniques that I think is, like, super interesting about how we can grow as an industry.
Beyang: Yeah, that is interesting. It’s like almost, like, pushing... it’s almost, like, you know, if you buy into, like, the DevOps breakdown of the development cycle, it’s almost like there are a lot of innovations that we made on the ops side of things in recent years, and now some of those are starting to be applied to the development side in quickening the iteration loop in the early stages of the development cycle.
Adam: I think Kubernetes, the various serverless products slash frameworks out there, broadly speaking, that’s a “developer doesn’t care where things run” sort of scenario, but that—I’m not going to claim that to be a given at this point because like, that’s still probably, like, early-adopter slash early-majority stage in, like, the adoption curve, but I think we can all see that it’s heading there, right? Like, it’s like, we could definitely see, like, the path to that. So, I don’t think there’s any controversy in just being, like, yeah, like, there’s going to be a time fairly soon, where developers—in the large—are not going to care about it; it’s not going to be their business. That’s great. That’s awesome. With some caveats. But I think, like, the techniques of that, yeah, you know like, bringing those techniques—yeah, shifting left. A technique shift left is a very interesting phase to be in. And I really think that’s what we’re looking at.
Beyang: Yeah. We’ve talked a fair bit about team-based tooling that is focused on the operational aspects of code. So, like, getting things through the, you know, testing pipeline, and deploying, orchestrating them in production. I think you’ve also done a fair bit of work on just, like, the source code level side of things. Like when we were chatting earlier, you had this kind of interesting framework of, like, you know, viewing source code as, like, an asset management task, almost. Can you talk a little bit about that? Like, what is the way in which you view source code and what are some of the, like, tools that you think are good for, you know, being effective in that part of development?
Adam: Yeah, so I think so. The first lens I’ll give you this answer from is, like, from the org level, right, is I think in most tech companies, I don’t think it’s controversial to say that your assets are principally code and data, right? Like, literally the value for, like—this was certainly true of companies, like, Pinterest, it’s certainly true of companies, like, Facebook, right—like, that’s it; the entire value of your business is your code and your data. And the data comes from your user base, right? But, like, the user base therefore generates the asset class, right, that is the data, right?
And what’s really sort of interesting about this, like, when you start to think about, like, the risk management side of, like, okay, so that’s a massive financial asset class that we as a business have that defines our value. Start thinking about if we feel like we’ve got comfortable techniques for dealing with it. And pretty quickly, like, some things start to pop as, like, significant issues. I think, you know, like, I’ve been involved in a couple of, like, earlier acquisitions and stuff, right, in my career, which were interesting because, like, there was, “Oh, let’s scan the open-source licenses that you have in your codebase and in your package repositories and we’ll call that good enough.” And I’m like, “Hold up. If you’re buying a business, that’s not remotely close enough to, like, looking at… like, most of the asset cost of what you’re buying, that just doesn’t cover it, right?”
But the thing is when you start to think about that, we actually as engineering orgs, don’t really tackle this, right, by default. And we only have to look back to, like, very recent examples of things, like, Log4Shell to think, “Oh, we don’t have a handle on this as an industry at all.” We just don’t. We’re not—and so—and that’s just the risk side, right? Because obviously, when you think about an asset class, you want to think about, okay, what’s my risk? What’s my exposure here? But what’s my return on investment and what’s my cost of managing this? Right?
Like, you want to be thinking about these things at the organizational level. And yes, you have human beings that create this code, right, and we know what the cost of those is, but we don’t, sort of like, translate that into this view of the codebase. I don’t really have this figured out; it’s a thing I’m thinking about as, like, “Hey, I think we’ve got to start treating this actually as a financial asset class because you know what? It is.”
Beyang: I think that’s a really good framing. And it reminds me of, kind of, like the 2008 financial crisis, where I actually started my career, working inside large banks that were in the aftermath of that crisis and were building software—
Adam: [crosstalk 00:37:46]—
Beyang: To help them, like, manage the—they thought they had been acquiring these assets in the form of, you know, mortgage portfolios, but, like, to your point, like, they didn’t really peek underneath the hood and understand what—like, they did the surface-level scan and it looked fine, but they didn’t really poke into the code itself. And what they thought was an asset [unintelligible 00:38:08] was actually this giant liability that was almost—basically, existential [laugh] amount of liability. And I feel like code is the same way. Like, you know, if it was something like security, you know, Log4Shell, if you’re pulling in a dependency to add, you know, an incremental piece of functionality, there’s incremental upside there, but, like, a huge catastrophic downside that you’re pulling in. And I don’t feel like, you know, developers, we as an industry, have internalized that view of things.
Adam: Right. And then this is how this starts to transition into the individual view because I think it’s probably unlikely that code represents, like, a cliff liability in the way that those mortgage portfolios did, like, heading into—
Beyang: Sure. [laugh].
Adam: I mean, I hope that statement is true, but, like, I, I, I… and I suspect it is, but you know, I wouldn’t know how to prove that. But I think when you start then to think about, okay, from the individual’s perspective, like, what’s happening here as well, right, is we are only increasing the amount of code an individual developer has to deal with, right? Like, this aforementioned—so even though our tools are providing these step-shift reductions in amount of code per task, the sizes of our codebases actually, like, continue to grow past, like, individuals, right? So, like, we can because we’re generating a lot more stuff, right? So, even though the tools are getting massively better, we’re still creating more, right?
And so—slight organizational detour—we don’t have enough software developers in the world; we can’t grow enough quickly enough to counteract this problem. That’s not going to happen. So, what we have to continue to do is think about, okay, how can I make it so an individual can deal with more code? Like, because this is just going to continue to be weight, it’s going to continue to be risk, it’s going to continue to be, like, all of these, like, other issues, and there’s no economic force pushing in the other direction on this. Like, there just isn’t one, right? So yeah, so from the individual side, I spent a lot of time thinking about how do I make this like so you can actually just navigate this stuff?
Beyang: Yeah, I think to the earlier point too, of, like, the variance in developer productivity, I feel like tooling almost can amplify that, insofar as, like, if you can make, you know, an engineer ten times more effective, that is way better than hiring ten times the amount of engineers, just because, like, if you have one person owning it, it’s like in one mind, they can actually, like… once you have multiple people in the mix at the, like, gap between people, I think that’s where the gaps emerge and that’s where, like, the liability and risk emerges, as well because there’s no, like, strong, like, ownership at the kind of boundaries between individuals’ scopes of responsibility.
Adam: Yeah. Yeah, there’s definitely… there’s definitely a lot in that. I think ownership is, like, a really interesting topic because I think as these codebases, sort of like, continue to grow, that becomes a murkier and murkier sort of way to think about your codebase, right? It becomes far more important that you’re able to move around as an individual—as a team even, right—into different areas. And I think—yeah, I’ve definitely been around organizations, like, operating on this tension, and I don’t think there’s a good sort of like way out of this right now.
I think there’s some interesting things to think about in applying, like, zero-trust security models into, like, a zero-ownership model on top of, like, codebases, but it’s a really tough sell. Developers think they own things, right? Like, so, like, it’s a huge cultural shift for engineers to be, like, “Yeah, no. I want you to operate without that. Like, I’m going to take that entirely away from you.” Like that’s a huge shift.
Beyang: So, I kind of want to just, like, selfishly ask you, so, like, you’ve used Sourcegraph actually, at a couple of companies that you worked with. You actually brought us in, I believe, to both Nutanix and—or Pinterest, and also, perhaps Amplitude. I forget what—yeah, go ahead.
Adam: Yeah. So Amplitude, like, I’ve only been there a few weeks, so like, we’ve not had that conversation yet. I was involved in bringing it into Nutanix, but I wasn’t the only one. I was, in fact, the person who ran point on bringing it into Pinterest because, like, I’d already become a believer and we have the same set of problems. Yeah yeah yeah.
I mean, you’re welcome, but also, like, there was a reason behind that, right? And this is where I’ll actually tell you, like, straight, I think calling Sourcegraph a code search tool is an undersell.
Beyang: [laugh].
Adam: That isn’t to me what it’s about.
Beyang: We have so many discussions about this internally. But yeah.
Adam: Yeah, I know. And I—yeah, and I’ve had some of them with some of you folks. But to me, it’s sort of like a newer breed of what—I think you’ve used this term—a code intelligence tool or a code intelligence platform. And that’s really more of how I think about it because, like, yes, I want to give to an individual developer, a tool that makes it easier to find things, right, and to navigate codebases. And Sourcegraph is best-in-breed, no questions asked, in that space, right?
But when you also then start to think about some of the larger-scale things inside of an org, that start to tie into these asset management questions, like, things, like, there are code patterns that you know are not going to be okay. So, you kind of want to—and yes, you can get some of these things into linters and formatters and testers, like, but if you’re a large tech company and you’ve got several different technology stacks, applying that evenly is really actually quite challenging across your stacks, right? And actually, they tend to drop a little bit late in the process as well, like, as to when to find them. So, Sourcegraph from that point of view, to me is, like, a really interesting, like, point in the lifecycle to be, like, “Yeah, okay. I’m just going to, like, find things that I don’t like, flag them, you know, and we’ll track them.”
I think the other thing that really stands out to me is in doing migrations. So, like, when you’re rolling out, like, a new platform, even internally, in large companies it’s a huge deal, right? One of the things that has to get done is find all of the people who are consuming, in essence, your API and get them to move. That’s like, Sourcegraph does that really well. And particularly compared to the old school method of, “Let’s just build a spreadsheet and go door-to-door.” Which sucks and, like, I’ve been around a bunch. You know, it’s just a far more powerful way of going about it.
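[Illustrative sketch, not part of the conversation: the "find everyone consuming your API" step can be approximated, on a single checked-out tree, with a small script. This is a minimal sketch in Python and is not Sourcegraph's API. The call pattern `legacy_client.connect(` is hypothetical, and treating the top-level directory as a stand-in for the owning team is an assumption.]

```python
import re
from collections import defaultdict
from pathlib import Path


def find_consumers(root, pattern, suffixes=(".py",)):
    """Return {top-level directory: [(relative path, line number), ...]}
    for every line under `root` that matches the regex `pattern`."""
    regex = re.compile(pattern)
    root = Path(root)
    hits = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root)
        # Assumption: the first path component approximates the owning team.
        owner = rel.parts[0] if len(rel.parts) > 1 else "(root)"
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits[owner].append((rel.as_posix(), lineno))
    return dict(hits)


if __name__ == "__main__":
    # Hypothetical deprecated call; replace with the API being migrated.
    for owner, locations in find_consumers(".", r"legacy_client\.connect\(").items():
        print(owner, len(locations), "call sites")
```

[The grouped result plays the role of the spreadsheet: one entry per owning area, with the exact call sites to hand over.]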
Beyang: What don’t we do well and, like, in particular, you know, what you were saying earlier about, like, code as asset management and enabling, you know, teams and leaders to [clear throat] really think about code in this way, think about the liabilities, and address them? I feel like we’re still a long ways from, like, fully realizing that vision. Like, is there anything that you think, like, we should be doing that would be, like, really value-add to engineering organizations?
Adam: That’s a really interesting question. I think the first thing that kind of comes to mind is, to draw an analogy with GitHub Copilot, right, which, whilst there’s some controversy around this, right, but, like, we’ll just leave that to one side for a second—
Beyang: Sure. [laugh]. Leave aside the license questions. Yeah.
Adam: Yeah, yeah, yeah. But, like, in essence, right, if you can get to a world where you’ve got this shared pool of patterns in codebases that are, like, stuff we need to stop doing as an industry, Sourcegraph is a great way to kind of collect and redistribute those, right? Like, as a platform, it could be that, right? Like, now, there’s a bunch of challenging issues there around, like, crossing enterprise lines and, like, a bunch of other stuff, for sure, right? But you’re in kind of a unique position to sort of think about that problem space, I think.
Beyang: This will be essentially like a database of anti-patterns.
Adam: Yeah… yeah… yeah. Because, like, because Copilot is, like, the database of, like, generational patterns, right? Like—
Beyang: Yes. Right.
Adam: —let’s take this code and, like—
Beyang: Generate the boilerplate, or, like, the thing that we think you’re trying to do.
Adam: Right. Right. But if you sort of think of, like, the dark side of that equation of, like, what’s the set of stuff that we have to stop from happening?
Beyang: Yeah, that’s interesting.
Adam: Because over time, right, like, I think customers like me, we’re going to build up this, like, set of code patterns where we’re like, “This must not happen in my org.” And I would be willing to bet the house there’s going to be a lot of similarity across those patterns across different companies. But we’re all going to reinvent them, right?
Beyang: [laugh].
Adam: Like, we’re going to recreate them ourselves, which is both expensive and error-prone.
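[Illustrative sketch, not part of the conversation: one way to picture a shared "database of anti-patterns" is as plain rule data that several organizations could pool and then run against their code. The following Python is a minimal sketch under that assumption. The `Rule` type, the rule IDs, and the two example catalogs are all hypothetical, and regex matching stands in for whatever matching a real platform would use.]

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str    # stable identifier shared across organizations
    pattern: str    # regex describing the disallowed code pattern
    rationale: str  # why the pattern must not appear


def merge_catalogs(*catalogs):
    """Combine rule lists, keeping the first rule seen for each rule_id."""
    merged = {}
    for catalog in catalogs:
        for rule in catalog:
            merged.setdefault(rule.rule_id, rule)
    return list(merged.values())


def scan(source, catalog):
    """Return (rule_id, line number) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in catalog:
            if re.search(rule.pattern, line):
                findings.append((rule.rule_id, lineno))
    return findings


# Hypothetical catalogs from two organizations; note the overlapping rule.
ORG_A = [Rule("no-eval", r"\beval\(", "arbitrary code execution"),
         Rule("no-verify-false", r"verify\s*=\s*False", "disables TLS checks")]
ORG_B = [Rule("no-eval", r"\beval\(", "arbitrary code execution"),
         Rule("no-shell-true", r"shell\s*=\s*True", "shell injection risk")]
SHARED = merge_catalogs(ORG_A, ORG_B)
```

[In this sketch the duplicated `no-eval` rule is the similarity across companies that Adam describes: both organizations wrote it independently, and the merged catalog carries it once.]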
Beyang: What other tools do you think are useful in this domain, you know, aside from Sourcegraph? Like, have you encountered any other tools that are, like, very helpful in kind of thinking about code at that, kind of like, a high level or organizational level?
Adam: You know, I don’t have anything, like, that’s really top-of-mind right now. There’s a couple of things that I’m sort of watching out for to see, like, how they play out a bit more on the individual level. And I know you had—I can’t remember his name, but the—you had one of the co-founders of Zed a few weeks ago.
Beyang: Oh yeah, yeah, yeah.
Adam: And so, Zed as, like, a collaborative editing platform has, like, a lot of interest for me because, like, because it tackles, in one sense to me, like—
Beyang: Have you tried it out? I—
Adam: No, not yet. Not yet.
Beyang: Okay. Yeah, I’m on the waitlist as well, but it seems, like, really interesting. When I spoke with Max, it’s almost like—it’s probably more than a year ago now, it was still, like, quite early stages. But it just seems like the Atom team has, like, you know, got the band back together and is trying to, like, follow through on their original vision for Atom, which is really exciting.
Adam: Yeah. And that’s part of why I’m interested in that team, right, is because like that history of Atom and GitHub, and like, some of the other things that they were tackling through there, right? It’s like if they’re coming out, like, is there going to be, like, a step-shift in how we think about… that end of the problem? That’s a super interesting space, I think. But I don’t—there’s not a lot of other things out there right now that are really standing out to me, that like, “Oh, yeah, this thing is super helpful in, you know, just making Git easier to work with at scale, in essence.”
And there’s, like, an interesting conversation as to… why companies keep descending on, like, these very large repos and using Git for it, which is, like… not actually really very well suited to this.
Beyang: [laugh].
Adam: It does it surprisingly well, but you do definitely start to hit some issues, sort of, along the way.
Beyang: Is there almost, like, a black pill on Git here where it’s like at a certain scale you should be using, like, Perforce or something like that?
Adam: I mean, that would be, like, a really controversial statement, I think.
Beyang: [laugh]. Yeah. I’m trying to push you towards controversy, you know?
Adam: Yeah, sure. No—
Beyang: We’ll get more listeners that way. [laugh].
Adam: I think there’s definitely a pill where we have to go to something else. Now, I know that, like, in Google’s history, they went from Perforce to, they now build their own source control system, right? Like, so that w—
Beyang: Piper. Yeah, yeah, yeah.
Adam: So, that was, like, Google’s journey. I think the issue with Git is that it’s not really designed to do monorepos at scale, right? Like it was built to solve a very particular itch, right? And it does it really well, and it’s a flexible model that’s grown really, very impressively, right? Really has.
I love Git. I love the model. I’ve loved it, I’ve used it for a long time. I love it. But working in large repos in large orgs is just painful, right? Like, there’s always just, like, these weird things that you end up finding. And particularly if you operate, like, a Git hosting service internally, which I’ve now done a couple of times, with a couple of different tools, which just… [Gerrit 00:51:19] and [Phabricator 00:51:20], I’ve had the experience now operating. I don’t ever want to do it again. Really, I don’t ever want to do it again.
Beyang: You know, it is interesting, we deal with a lot of customers with very large codebases. Some of them have their, like, own custom fork of Git where they’ve, you know, forked it and replaced some of the, like, parts that are bottlenecks. Others are—like, we see a bunch of different version control systems, you know, Perforce, CVS, and, you know, other ones. And if you just look at the open-source world, you might get a sense of, like, everyone’s on Git, but, like, I actually think a good deal of the private code in the world is on non-Git-based repositories, especially for large codebases. And you made a point, when we were just, like, chatting earlier before the show about, kind of like, the need to build custom tooling at a certain point, especially in—like, I think the way you put it is, like, large single-product organizations, at a certain point, you reach this point where, like, the off the shelf tooling no longer scales to your needs and you almost have to, like, build the custom stuff.
Adam: Yeah. There’s not a clear line in here, right, but, like, you definitely get into these spaces where the things that are special about you as a culture and an org mean that you’re now far enough in terms of, like, habits and workflows from standards that, like, just off-the-shelf tooling just isn’t going to cut it, right? And you may be able to drop some things into certain pockets, but you’re definitely going to be writing, at minimum, custom glue. At absolute minimum, like, custom glue that maps between them. And that code tends to be, like, dark code. Like, a lot of people don’t—it’s tough for orgs to—
Beyang: Thar be dragons. [laugh].
Adam: Yeah. It’s tough for orgs to see… we don’t necessarily think about proper delivery patterns for it, right? So, you know, and I think honestly, like, the best, the easiest to understand example of this for everyone is, like, essentially, build pipeline code is, like, actually where this manifests first, in every org. Because we don’t build good life cycles around, like, evolving pipelines. You know, like, it’s a really diff—and obviously, like, speaks very close to my heart because I’ve spent a ton of my career working on these things, and they’re really difficult to evolve.
I think it’s getting better. I think GitHub Actions and the like are a massive advance over Jenkins in terms of the developer experience and working on pipelines. I’m very optimistic about Dagger. I don’t know if you’ve picked that up at all, but, like, I’m very optimistic about where that’s heading, as a [unintelligible 00:54:19]. But yeah, there’s a lot of these places where… we don’t recognize that it’s going to be a cost, right? There’s a bit of an assumption.
It’s like, “Oh, we can just pick up three or four tools and we’ll be able to do what we need to do.” And that’s just not how these things work. And particularly when you’ve already got, you know, you’re up to a thousand people, you’ve got ten years behind you, you’re in a weird place now. Like, you will be in a weird place, like, just kind of by definition. And the worst thing is, like, what you really want to be able to do is back out from, like, three of those weird choices, right, so you can start picking up some standard things again, but that’s, like, it’s nearly always cost prohibitive, normally in time, right? Time is normally the cost that you can’t absorb.
Beyang: All right, so we’re kind of at the end of the hour here. As kind of like a parting thought, is there any, like, tool or maybe, like, a blog post, or you know, maybe even like a Twitter account that you would recommend people checking out if they’re interested in just, like, kind of continuing the thread here, like, thinking about code at, like, a high level, like, in terms of, like, asset management, that you think is, like, a particularly good resource for shifting how people view large codebases and how to manage them effectively?
Adam: Um, I don’t know that I have, like, a single point of entry, honestly. I think, you know, if people haven’t read, like, the second edition of The DevOps Handbook, start there. And because, like, that’s really the best of knowledge that we have as an industry right now about how we deliver and develop what we work on. And the set of references that book has, and the people who did that, right, like—and if instead of, like, reading The DevOps Handbook, Gene Kim’s podcast, that would be the listen. Go listen to that.
Because like, that’s, like, the systems thinking: let’s just take systems thinking to everything that we do. So, I think, like, those would be my two things of, like, if you haven’t picked these two things up, go pick them up. They will take you to good places, for sure.
Beyang: I’ll check those out. Adam, thanks so much for taking the time to chat today.
Adam: Thank you. Enjoyed it.