Geeking Out with Adriana Villela

The One Where We Geek Out on HashiCorp Nomad with Luiz Aoqui of HashiCorp

Episode Summary

Join Adriana Villela as she geeks out with Luiz Aoqui of HashiCorp about operators, container orchestration, and OpenTelemetry in HashiCorp's Nomad. They discuss the challenges and complexities of running Kubernetes on Nomad, the value of simplicity in software development, Luiz's exploration of adding OpenTelemetry instrumentation to Nomad core, and the lessons learned in the process. Whether you're interested in container orchestration or observability in distributed systems, this conversation provides insights and valuable perspectives from a Nomad expert.

Episode Notes

About our guest:

Luiz Aoqui (he/him) is a senior software engineer at HashiCorp working on Nomad.

Find our guest on:

Find us on:

Show Links:

Transcript:

ADRIANA: Hey, y'all. Welcome to Geeking Out, the podcast about all geeky aspects of software delivery, DevOps, Observability, reliability, and everything in between. I'm your host, Adriana Villela, coming to you from Toronto, Canada. And geeking out with me today is Luiz Aoqui. Welcome, Luiz.

LUIZ: Hi, Adriana. Nice to be here. Thank you for having me.

ADRIANA: Thanks for coming. Where are you calling in from today?

LUIZ: I'm calling from Toronto as well.

ADRIANA: Woohoo.

LUIZ: Representing.

ADRIANA: Fellow Brazilian in Toronto. Awesome.

LUIZ: Lots of us here.

ADRIANA: Yeah, there are lots of us here. So, we're going to start off with some lightning round questions before we get into the meaty bits. So let's get ready. Be prepared. I swear it's painless. Okay, question number one: are you left-handed or right-handed?

LUIZ: Right-handed.

ADRIANA: All right, question number two: iPhone or Android?

LUIZ: Android.

ADRIANA: Number three: Mac, Linux, or Windows? What's your preference?

LUIZ: Going to say Mac for now.

ADRIANA: For now. Awesome. Favorite programming language?

LUIZ: Go. Pretty big favorite of mine.

ADRIANA: Awesome. Dev or Ops?

LUIZ: I like both, but I have to say dev.

ADRIANA: Cool. No wrong answers. JSON or YAML?

LUIZ: I think I'll pick YAML. Just more friendly.

ADRIANA: Fair enough.

LUIZ: I hate the lack of dangling commas on JSON.

ADRIANA: Yeah, I agree. I hate that too. And then finally, do you prefer to consume content through video or text?

LUIZ: Text. Yeah, I get very distracted watching videos.

ADRIANA: Yes. Same. Awesome. You survived the lightning round. By the way, I should mention we had someone on here who when I asked her JSON or YAML, she said HCL.

LUIZ: Yes. That's why I had it in the back of my mind, like, oh, I only have two options.

ADRIANA: So I thought that was pretty funny when she mentioned that. And then I thought of you, because for those who don't know, Luiz works at HashiCorp, he is a Nomad developer, so HCL fits well into the type of work that you do. So I wanted to start off with you being a Nomad developer. Tell us a little bit like, for folks who aren't familiar with Nomad, tell us a little bit about what Nomad is.

LUIZ: Sure, yeah. So Nomad is a workload orchestrator, and I know that doesn't mean a lot to many people. So the goal of an orchestrator is to basically get your assets...So, like, your developer team builds things like Docker images, binaries, Java JAR files, whatever. They produce some kind of artifact out of source code. And then you have your infrastructure team that is responsible for running your infrastructure, building servers, configuring machines, all of that. And on the far end, you have your users that are trying to access your product, trying to access your application. And the orchestrator sort of sits between your development team and your infrastructure team to make sure that whatever artifact gets produced is running on those machines. So it helps find where to run things, like figuring out what's the best server to run this application on, or doing things like upgrades and deployments and all the sort of lifecycle management of your application. That's the job of the orchestrator. That's what an orchestrator does.

ADRIANA: Awesome. Yeah, and I think that's such a great way of explaining what an orchestrator does, because for folks who are familiar with Kubernetes, I mean, Kubernetes is an orchestrator as well, specifically for containers, whereas Nomad gives you that breadth of, pretty much orchestrate anything, more or less. But it is very easy to kind of forget all of the gnarly things that happen behind the scenes in these orchestrators, like all the hard work that they're doing in order for them to operate seamlessly.

Now, I've played around with both Kubernetes and Nomad, and I have to say, starting on the Kubernetes side and then moving to Nomad, moving to Nomad was actually a lot easier because you'd already dealt with the complexity of Kubernetes. Moving down to Nomad, you're like, "Oh, everything's simpler." And it runs in a single binary. It can run in a single binary on your machine and you can get started easily, whereas Kubernetes is more of a beast. I mean, yes, you can have really complex setups with Nomad, of course, and that's probably how you have it in production. But I think the barrier to entry when it comes to Nomad is very low, which can be very appealing.

LUIZ: Yeah. Complexity is an interesting discussion because complexity sort of means different things for different people. When thinking specifically about this Nomad versus Kubernetes discussion, I think there are a few things to consider when you think about the complexity of adoption. Let's say when you start, if you're starting from a managed service for Kubernetes, like EKS, GKE, AKS, that's like one click, and then you have a cluster. So it's like, oh, there's no complexity there. And that's how most people nowadays consume Kubernetes: through a managed service. So that almost basically removes the barrier of entry, for those that are able to use a managed service, of course.

But then you get the complexity of understanding how to use those systems. Like, okay, it gets provisioned for me. There's some cloud magic happening behind the scenes. I don't have to deal with that. But now you have to run the system. Now you need to think about how you map your team's workflow to that new tool. There are all these different concepts in Kubernetes that I think are part of the complexity. There are all these different tools that you can use. Having a broad ecosystem is great, but it can also lead to some confusion about when do I do this or when do I do that, and how do you bring it all together to eventually reach your end goal?

Which is, I want my users to be able to access my product, right? When I think complexity, I think more about that sort of day-to-day operations, like understanding what's happening with the system. And I think that's where Nomad becomes simpler, just because it has a smaller surface area for people to interact with. You write your job, you run your job, and there you go.

But stepping back a little, the complexity of Nomad comes in on the deployment part, because we don't have a managed service of sorts. Like, okay, now you need to understand what Nomad agents are, what Nomad servers are. Now you need to manage their state, now you need to manage upgrades. And this can get complex in that sense. So I find that discussion of complexity very interesting just because of this duality of, like, okay, what am I trying to do? Am I actually running the cluster? Am I actually just using it, and somebody is provisioning it...nowadays we call them platform teams. Is there a platform team running a cluster for me? And so, yes, it's much simpler than Kubernetes, I guess, depending on what complexity you're talking about.

ADRIANA: Yeah, that's so true. That's a really good point. I just want to go back to a point that you made earlier about the fact that there are all these Kubernetes managed services, but there's no Nomad managed service, which is kind of interesting, because if you look at various Hashi offerings, there are managed services for a bunch of stuff. So why is it that there is no managed Nomad right now, not even offered through Hashi's own cloud?

LUIZ: Yeah, it's definitely something that's part of the plan, something we think about. But there are quite a few challenges for Nomad specifically, compared to other HashiCorp tools, in that we have big competition with Kubernetes. So if you have a managed Vault, Vault is basically the only tool of its kind you have.

You have a managed Terraform, and there are other tools for infrastructure-as-code, but Terraform is one of the big ones. For Nomad, this competition is much stronger, in that sense. So, that's mostly my personal opinion.

But thinking of a managed Nomad service versus a managed Kubernetes service, it gets into that point of how much value that actually adds. Because part of the benefit of Nomad is the flexibility: the flexibility of running in different environments, the flexibility of running different workloads. But for those workloads, you sort of need to have control over the infrastructure. Like, I want to use Podman, so I need machines that have Podman installed.

ADRIANA: Ahh, okay, got it.

LUIZ: Going back to that discussion of managed versus self-hosted, it's a spectrum, right? If you go managed, it's simpler to operate, but it's probably more expensive, and you also lose flexibility. Whereas self-hosted is harder to manage, but you get full flexibility and it's probably cheaper. So there's that aspect that we need to figure out.

There's also something to consider about costs and things like that. Because if you have a managed Kubernetes service on AWS, AWS is sort of taking the heat; they can discount the compute because they run the compute.

But now if you have an external managed Nomad, you need to account for the price of that service plus whatever infrastructure you're using. And so it kind of becomes a pricier solution. So figuring out the business part of that can be a little challenging as well. There are all of these little things to consider, but we'll have to see what happens in the future.

ADRIANA: That's a really great explanation. Thanks for clarifying. Another thing that I wanted to ask you about: I think one of the big things that Kubernetes has that I don't think has a direct translation in Nomad is the whole Kubernetes operator concept, whereas as far as I know, there isn't quite that concept in Nomad. I've seen a few blog posts where people try to replicate it to a certain extent, but it's not quite the same thing. Can you comment a little bit on that?

LUIZ: So that's a very common question. And I think the idea you need to keep in mind is that operators are just a pattern that you can implement in pretty much anything. So the idea of listening for events and responding to them, that's something you can do with Nomad. And actually, there are a few projects that I've seen do that. I was googling for the name. There's a project called nomad-ops that implements the operator pattern in Nomad using some of the functionality we've built.

There's also a company called Koyeb. They are a platform-as-a-service company that uses the operator pattern. And they have a library called kreconciler, I think it's called, that helps you build these sort of operator-paradigm functionalities. But it's also important to remember that in Kubernetes, it's not just operators that are the main thing. When you talk about operators, they're always associated with a CRD, because that's the data. So you have operators being the logic, CRDs being the data. And that sort of helps guide your end goal based on those two concepts.

So in Nomad, you can do the operator. We're building things that can help you do that sort of thing. One of the challenges was, how do I access the Nomad API from my task? And now you have a socket that you can use to talk directly with the API. We're building workload identity, so you don't have to worry about ACL tokens or anything like that. We're building things that help people create this sort of operator paradigm in Nomad.

But the CRD part becomes a bit of a challenge because we don't have a concept that's as extensible as what Kubernetes has. But you sort of have Nomad variables and things like that, so you can kind of get around those challenges. But I think there's another point about CRDs: they're sort of standardized, right? Let's say the Prometheus operator expects or generates specific CRDs. And then a Grafana operator can rely on those CRDs to do stuff automatically. So that type of standardization, we don't quite have yet.

ADRIANA: Right. Interesting. That's really cool. I'll definitely be sure to put those two projects that you mentioned in the show notes for folks to refer back to those. Another thing that I wanted to touch upon, because I believe you and I talked about the fact that there was a new version of Nomad that just came out. So what version just came out? And what are the exciting things that we can expect to see in Nomad?

LUIZ: So Nomad 1.6 just came out, and the main feature is called node pools. It's something that I worked on, so apologies for any bugs. The idea of node pools is that it allows you to sort of segment your clients into groups, into pools.

So, a bit of Nomad background very quickly. You have two types of machines in a Nomad cluster. You have servers; they're like your control plane. They do the scheduling, they store state, they do all the global-view things for your cluster. And then you have clients; they talk to the servers to get information about what that specific client needs to run.

So the client is sort of the data plane; it's the component that actually runs things, so it actually runs your Docker containers, your JAR files, whatever. And one of the challenges we've had in the past with Nomad is that it's very hard to associate a group of...let's say I have a group of clients that I want to run my backend services on.

And you can do something like create a constraint that says, okay, my backend service only runs on machines that have this specific metadata. But doing the opposite is kind of hard, because with constraints, you need to tell a job what the constraint is, but in order to prevent others from accessing those same machines, you need to create a negative rule. So you have to say, okay, this job runs on these machines, but every other job is forbidden from accessing those machines. So you have that dual-constraint type of thing. And it's very hard to manage that to get a consistent scheduling outcome, because if you forget a constraint rule, now your job is running somewhere it wasn't supposed to be.
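
To make the dual-constraint problem concrete, here's a minimal HCL sketch; the `meta.role` tag and job names are made up for illustration. The first job opts in, but every other job needs the inverse rule, or it can still land on those machines:

```hcl
# Positive rule: the backend job opts in to tagged machines.
job "backend" {
  group "api" {
    constraint {
      attribute = "${meta.role}"
      value     = "backend"
    }
    # ...
  }
}

# Negative rule: every OTHER job must carry the inverse constraint,
# or the scheduler may still place it on the backend machines.
job "batch-report" {
  group "report" {
    constraint {
      attribute = "${meta.role}"
      operator  = "!="
      value     = "backend"
    }
    # ...
  }
}
```

Forgetting the second rule in even one job is what breaks the scheduling guarantee Luiz describes.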

So with node pools, you can put a new configuration on each client saying which pool it belongs to, and then on your job you'll say, okay, this job runs in this pool. Now only jobs in that pool will access those clients, and those jobs will only run in those clients. So you kind of create a sort of segmented part of your infrastructure that is reserved for specific jobs.
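
As a sketch of what that looks like in Nomad 1.6+, assuming a pool named `backend` (the names here are illustrative):

```hcl
# In the client agent configuration (one per node):
client {
  enabled   = true
  node_pool = "backend"
}

# In the job spec: the job runs only on clients in that pool,
# and jobs in other pools won't be placed on those clients.
job "backend-api" {
  node_pool = "backend"
  # ...
}
```

One line on each side replaces the pair of positive and negative constraint rules.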

ADRIANA: I want to say what you were describing without the node pool is kind of reminding me of Kubernetes node taints, where you can say what can run where.

LUIZ: The idea is to have a very simple solution. So it's like, this node is in this pool, this job runs in this pool, and that's all you have to do. So there's not a whole lot of configuration. And in addition to giving you this sort of segmented view of infrastructure, node pools can also hold configuration. So they are a first-class concept in Nomad. For example, one of the workarounds people used to do to get around this constraint problem was to use data centers.

So in Nomad, the idea of a data center is just a collection of nodes. And if you have an availability zone, that sort of becomes your data center. And people would kind of hack around the problem using data centers, so they'd have, like, a data center for apps, which doesn't quite match the intention behind it.

But the problem is that a data center is sort of just metadata, so you cannot have specific configurations per data center, things like that.

ADRIANA: Okay.

LUIZ: But with node pools, you can put a description on your node pool, you can put metadata in the pool, to create more information about what the pool is used for. And then in Nomad Enterprise, you can actually have a different scheduler configuration per node pool. So, for example, you can have a pool that uses the spread algorithm and another pool that uses bin packing, so you can adjust how scheduling is done per node pool. So there's a bit of extra customization that you can do per pool, and that can be very helpful in several cases.
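
A sketch of a node pool specification along those lines. The pool name and metadata are made up, and the scheduler block is a Nomad Enterprise feature:

```hcl
# Applied with something like `nomad node pool apply backend.hcl`.
node_pool "backend" {
  description = "Reserved for backend services"

  meta {
    owner = "platform-team"
  }

  # Nomad Enterprise only: per-pool scheduler behavior.
  scheduler_config {
    scheduler_algorithm = "spread"  # or "binpack"
  }
}
```

This is what makes pools more than a label: the description and metadata document intent, and the scheduler configuration attaches behavior to the pool itself.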

ADRIANA: That's awesome. It's interesting because there's this recognition of, like, oh, people were kind of trying to hack together the concept of a node pool by using these data centers. So it sounds like there was this recognition of, oh, users are trying to do this, why don't we formalize it and turn it into a proper solution? What I also like is what you said earlier, which feels like a general philosophy with Nomad: basically going for the overall simpler solution. Like, don't try to overcomplicate. Just go with the base thing that works pretty well, and we don't have to drive ourselves mad.

LUIZ: Mmhmm, yeah, exactly. And that was a big thing during the development phase. A bit of background on how we develop features at HashiCorp: we start with an RFC. So whenever we want to implement a feature, we write down a description of the feature and send it first to our immediate team, then to the whole company for feedback. And during that process I had gigantic ideas, like, oh, maybe node pools should be dynamic, and then you can dynamically add and remove nodes from the pools. But that adds so much complexity for questionable value.

So part of the feedback I got from the team was, let's start simple, let's solve the problem at hand, and then we can expand if the need arises. But yeah, this idea of simplicity, trying to make things easy to use from day zero, is very important to us.

ADRIANA: Awesome. That's very cool. I want to switch gears now because there are, like, two things that I'm hoping we'll have time to talk about still, because there's so much cool stuff to talk about, but I want to switch gears quickly to a collaboration that you and I did, which was really fun. It came out of just me having a wild idea that came out of nowhere, where basically I thought, "Hey, wouldn't it be cool to try to run Kubernetes on Nomad?" Because there's Kelsey Hightower's well-known GitHub repo where he's running Nomad on Kubernetes.

So I posed the question, "What if you can run Kubernetes on Nomad?"

And I thought, "Maybe let's not try to go too crazy here." And so my idea was, I want to find a lightweight Kubernetes distribution that we can run on Nomad, something that's hopefully already containerized, because trying to run Docker-in-Docker is kind of a nightmare if you try to do it yourself. And so there's this distribution of Kubernetes called k0s that comes in a Dockerized version, which seemed relatively straightforward to deploy on Nomad.

And so I reached out to you when I came up with this idea, and then you helped me through a bunch of the troubleshooting, so I just wanted to talk to you, have you share your experience around this collaboration. Yeah, just thoughts.

LUIZ: Yeah, it was pretty fun. A lot of learnings, I think, in terms of just understanding how things work, because k0s is a pretty cool project in the sense of, oh, you just run that image and you get a container with everything sort of there for you. But I found it a nice lesson in debugging when things don't work. Normally you would expect to just "docker run" it and it works. But what happens when it doesn't work, and you have to debug, going through logs and sort of combing through those log lines? I found it a bit challenging, because having everything in one image makes it easier to start things. But then when you need to debug, and you have your etcd logs at the same time as your controller logs, at the same time as the kubelet logs, it all sort of jumbles together, and it was very hard for us to comb through that and understand what's actually failing.

ADRIANA: That's so true. That's so true.

LUIZ: And then you have retries, right? So you see an error message, and then it retries, and then there's an error message again. But is it the thing that is retrying that is the problem, or is it trying to call something else that failed before, and the log just sort of disappeared from the history? That part, I found very interesting.

ADRIANA: Yeah, I totally agree. And it was funny because I made the classic rookie mistake of, like, well, of course this thing works in Docker. Let me just try to deploy it in Nomad. And then I realized I was getting all these error messages where the kubelet was not starting, which you kind of need for Kubernetes. So it looked like it started up in the Nomad job, right? The Nomad job deployed successfully, but the actual thing inside the job was not running correctly.

So I had neglected to try running k0s locally in Docker first, and then discovered a bunch of stuff. Initially we were running into all sorts of issues, too, because on M1 Macs, everything is special. I love having an M1 Mac, but, my God, there are all these little annoying considerations. So that made it extra complicated. But then once we got it running standalone in Docker on the M1, we were able to port it over.

And it was interesting because I always like to try everything in HashiQube, which is this full-fledged Nomad environment where you have Nomad, Vault, and Consul all running together. But it's provisioned using Vagrant. And on the M1 Mac...normally you'd provision using a VM in Vagrant...but on the M1 Mac, Vagrant does not play so well, so it uses Docker as the provisioner. So you're basically running HashiQube, so you're running Nomad in Docker, and then you're deploying Kubernetes as a Docker image, and then deploying our test app, which was Jaeger. It was Docker in Docker in Docker. It was, I don't know, three or four layers of Docker.

And then you took the more pragmatic approach of, like, let me just run this using the Docker binary. Sorry, the Nomad binary. Much easier.

LUIZ: Yeah. That applies in several scenarios. When a GitHub issue shows up, people describe their entire cluster and environment and give us a ginormous job file. And usually my first step is, okay, I need to reduce this, I need to boil it down to...

ADRIANA: It's so true.

LUIZ: What's the actual problem? So, okay, let's try to remove, let's say, half of the job that I don't care about.

ADRIANA: Yeah, I think that is a very sound approach.

LUIZ: Let's just run a dev agent, see if that reproduces the issue. Normally, my first step is to reduce as much as I can and then start adding things back. So, yes, the dev agent for me is like my direct go-to anytime I need something in Nomad: "nomad agent -dev" and start from there. Yeah.

ADRIANA: Reduce the noise as much as possible and then start building back up until you figure it out. This is the wall I actually hit. So, yes, lesson learned to you all. I should know better. I've been doing this long enough that I have found myself in situations where I want to do everything all at once. And then I'm like, strip, strip, strip, strip, strip all the things until you get to the actual problem. But this was a fun little collaboration.

And then there was one component. What was it? The cgroups namespace, where there's a Docker configuration that Nomad did not support. And so this is where it helps knowing somebody who works on Nomad, because Luiz was able to make a little fix to accommodate it. It's not part of the Nomad product, so you will not find this as part of standard Nomad. This was just so that we could see if we could get this running with this configuration.

LUIZ: That part is very...on the surface, it's like, oh, it's a configuration value that gets passed to "docker run"; on the Docker CLI, it's just a flag that you set. But what it actually does goes much deeper into the environment that you're running in. I forgot the exact flag.

I think it's cgroupns, and then you need to set it to host.

ADRIANA: Yeah, I think that's the right one.

LUIZ: But the tricky part is that Docker and cgroups do, like, weird stuff that Nomad needs to work around to make some of the Nomad things work. So, for example, resource isolation: Nomad uses cgroups to enforce that. No matter what task you're running, no matter if it's a Docker container, a binary, a JAR file, we use cgroups because that's the common layer. But the way that Docker does things kind of hides that from you. So you as a developer need to work around all the things that Docker does with cgroups to get that to work.

And so even though that configuration works, it's kind of dangerous in the sense that it can lead people to break other stuff without realizing. And so it's like, yes, we should support this, but not with the naïve approach that I took.
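
For context, a sketch of what a job might have looked like with that patch applied. This is hypothetical: the `cgroupns` option comes from the unmerged PR discussed here and is not part of standard Nomad, and the image tag and job layout are illustrative:

```hcl
job "k0s" {
  group "controller" {
    task "k0s" {
      driver = "docker"
      config {
        image      = "k0sproject/k0s:latest"  # illustrative tag
        privileged = true
        # Hypothetical option from the unmerged PR, equivalent to
        # `docker run --cgroupns=host`. Not in standard Nomad.
        cgroupns   = "host"
      }
    }
  }
}
```

The danger Luiz describes is that sharing the host cgroup namespace bypasses the isolation layer Nomad itself relies on.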

ADRIANA: Yeah.

LUIZ: Luckily, this is such a common problem that we'll probably have better cgroup handling in a future release. And once we get that, we'll be able to support that feature. But for now, it's in my sad, unmerged PR.

ADRIANA: But it's interesting because it was a good learning experience, right? Because it was like, hey, we got this to work with this special unmerged PR, but then it kind of led to more questions, right? And I think this is a really great lesson for anyone: yeah, this might seem like an easy solution, but what are the repercussions? And that's why pull requests exist...

LUIZ: Exactly.

ADRIANA: ...so that we can mitigate against weird things happening, because you simply do not know what the side effects are going to be from, like, oh, I added this little flag, what's the worst that could happen, right? So, anyway, it was a really cool side project. I'll provide a link to the blog post where I detailed our adventures in the show notes. And then the final thing that I wanted to talk about: when you were on On-Call Me Maybe, one of the reasons we brought you on was to talk about how you had played around at one point, as part of a hack week, with trying to add OTel instrumentation to Nomad core. And this was, I guess, over a year ago. So I was wondering if you could talk about what you've learned a year on, and what the status of that is right now.

LUIZ: Yeah, cool. That is still pending. I would say it's a very side project of mine, just an exploration. I think I've tried at least three times now to get some OpenTelemetry into Nomad. And every time I learn something new, which is great, and it builds on top of the previous attempt. So, a bit of history: my first attempt was a very big view of, like, I want to instrument the whole of Nomad. I want to be able to create traces and spans from: I submit a job, that job gets scheduled, it gets picked up by a client, the client starts a task driver, and the task driver calls Docker to start things.

I wanted the whole flow as traces and spans and all of that. That turned out to be a terrible idea just because...I don't want to say it's not doable, but it takes a lot of code changes to get to that point. So that's the first learning: don't try to do everything at once. And then my second attempt was focused not exactly on OpenTelemetry in Nomad per se, but on helping people who use OpenTelemetry and run things in Nomad to get information. Now I forget what the name of that component is, but it's a way, if you have an application that uses the OpenTelemetry SDK, to automatically pick up information from Nomad, like the allocation ID, the job name.

So you use the OpenTelemetry SDK, and there are environment variables that are sort of standardized.

ADRIANA: Yeah.

LUIZ: So provide those things automatically.

ADRIANA: Oh, yeah, because I think there's a similar thing in Kubernetes where you can automatically grab information from your Kubernetes pods.

LUIZ: Yeah, there's a whole spec for that. I forget what it's called, but it's a way to automatically infer information from different sources based on either environment variables or API calls. So I kind of hacked around that, and it sort of works. There's another side branch with this work that I didn't merge, but the challenge there is that it's kind of hard to tell what information is relevant, because you also don't want to shove in a bunch of things: it's going to increase your network packet size, and it's going to generate a lot of extra information that you may not care about. So I'd have to build a way for you to customize which information you want. So that's where I put a pause on that.
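
A workaround you can do by hand today is to pass Nomad's runtime environment variables through the standard `OTEL_RESOURCE_ATTRIBUTES` variable, which OpenTelemetry SDKs pick up automatically. A minimal sketch, assuming a Docker task; the job name, image, and attribute keys are illustrative:

```hcl
job "my-app" {
  group "app" {
    task "server" {
      driver = "docker"
      config {
        image = "my-app:latest"  # illustrative image
      }
      env {
        OTEL_SERVICE_NAME = "my-app"
        # NOMAD_ALLOC_ID and NOMAD_JOB_NAME are injected by Nomad at
        # runtime; the attribute key names here are made up.
        OTEL_RESOURCE_ATTRIBUTES = "nomad.alloc_id=${NOMAD_ALLOC_ID},nomad.job_name=${NOMAD_JOB_NAME}"
      }
    }
  }
}
```

The experiment Luiz describes would do this automatically instead of requiring each job to wire it up.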

And then in my last attempt, after talking with you, Ted, and some other folks in the OpenTelemetry community, I learned: don't try to boil the ocean, don't try to instrument everything at once. Focus on your core business logic. Start there, you're going to get a lot of value from that already, and then you can start building on top of that. And so my latest approach was, okay, I cannot create a whole trace, I cannot create that relationship between traces, but can I use metadata to connect them? I should probably explain this, but one of the challenges with OpenTelemetry in Nomad is that OpenTelemetry, and more specifically the distributed tracing aspect, is focused on microservices and network requests, and on keeping track of those network requests.

But in Nomad, the complexity is built into the Nomad binary. So the complexity comes almost entirely from local function calls rather than network requests. And if you try to create spans for function calls, you get tiny traces of a few milliseconds that are not really useful, and it just generates a huge overhead. But what helped me there was understanding this notion of, oh, I don't actually need to connect the traces per se. If they have the same metadata, then whatever platform you're sending those traces to, like Lightstep, Honeycomb, Zipkin, Jaeger, or whatever, you can start querying traces that have the same metadata. So you don't have an explicit connection between your traces.

But the metadata becomes a way for you to start to understand what happened. And so that was the last attempt that I did, and it was quite successful. It works very nicely in terms of trying to understand the inner workings of, especially, the Nomad scheduler, because that's sort of the magic box. You run a job and suddenly you have a bunch of allocations for who knows what reason. And so my goal was trying to understand what happens in there. Because if you look at the source code for Nomad...people that know Go, who'd like an adventure, search for a function called computeGroup in Nomad's GitHub and try to understand the function.

And then come explain it to me once you understand it, because that's the function that takes a job and generates the allocations. So it looks at the clients, looks at what allocations already exist. And it's sort of the central point of all Nomad features, more or less. I think people don't realize how many features Nomad has, but things like preemption, deployments, disconnected clients, all of this needs to be taken into account when you are scheduling things, and it all comes into that function.

So Compute Group is my nemesis, and every time I need to touch it, it's like I need a fresh cup of coffee to go in there. But yeah, my goal is: okay, can I make this function more understandable using telemetry? And it helps quite a bit in some ways, but parts of this process are just complex. You kind of need to embrace that sometimes.
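The correlation-by-metadata idea Luiz describes can be sketched in a few lines of Go, Nomad's own language. To keep it self-contained, this toy models spans as plain structs rather than using the OpenTelemetry SDK, and the `nomad.eval_id` attribute key is invented for illustration, not something Nomad actually emits:

```go
package main

import "fmt"

// Toy span model (not the OpenTelemetry SDK): just enough to show
// correlation by shared metadata instead of parent-child links.
type Span struct {
	Name  string
	Attrs map[string]string
}

// byAttr returns every span carrying the given attribute value: the
// "query by metadata" step you would do in the tracing backend.
func byAttr(spans []Span, key, val string) []Span {
	var out []Span
	for _, s := range spans {
		if s.Attrs[key] == val {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Hypothetical spans from separate, unlinked traces.
	spans := []Span{
		{Name: "http.job_register", Attrs: map[string]string{"nomad.eval_id": "ev-1"}},
		{Name: "scheduler.process", Attrs: map[string]string{"nomad.eval_id": "ev-1"}},
		{Name: "scheduler.process", Attrs: map[string]string{"nomad.eval_id": "ev-2"}},
	}
	for _, s := range byAttr(spans, "nomad.eval_id", "ev-1") {
		fmt.Println(s.Name)
	}
}
```

The point is that the two `ev-1` spans are never linked parent-to-child; the backend's attribute query is what reassembles the story.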

ADRIANA: It's interesting, right? Because you start projects like this, you're like, of course it's going to be easy to instrument.

LUIZ: Yes, there's an SDK...

ADRIANA: It's like the k0s on Nomad thing. Of course it's going to be easy. And it's like no.

LUIZ: There's tutorials, and there's all these different materials. Just go install this SDK. But no, it's a very different use case, right? Normally you come from these microservice architectures and you're trying to instrument the communication patterns between them. But I'm trying to do something very specific that I don't think would apply to most people using OpenTelemetry. So, yeah, it's almost like...

ADRIANA: And you said a lot of the processes are asynchronous, too, which makes it kind of hard to work with, right?

LUIZ: Yeah, so take the lifecycle of a job, right? A new "nomad job run" generates an HTTP request to whatever client or server you're talking to. That request needs to go to the leader, so there's another request going to the leader. But once it gets to the leader, there's a bunch of asynchronous stuff happening. It creates an evaluation. That evaluation gets picked up by what we call a worker, a scheduler worker that does all the computing. Once it figures out which allocations need to run, a client picks them up. There's no direct network request that covers the whole thing. It's a bunch of "put it in a queue somewhere, put it in a broker somewhere," and then that gets picked up. So that's where you lose your trace a little bit.

But the tricky thing about Nomad is that the network request is the easy part. The complexity is what happens after you receive that request. So that was the thing I wanted to instrument. Network requests, yeah, they happen, it's fine, I know who is talking to whom. But inside each process, that's where the challenge lies. Okay, how do I get visibility into what's happening right now in there? And that's, I don't know, it sounds like a fourth pillar, perhaps. We have metrics, logs, and traces. Maybe there's something new that should exist there. But yeah, that's the challenge: understanding what's happening inside the process.

There are a few tools for debugging, like pprof and things like that, but they're very low-level in a sense, and you don't always want to run that sort of additional instrumentation in production. That was the challenging part.
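That hand-off through queues and brokers is the part that breaks conventional context propagation, and it can be sketched with Go channels. The `Evaluation` and `Allocation` names echo Nomad's vocabulary, but this is a toy model of the idea, not Nomad's actual broker:

```go
package main

import "fmt"

// No network call ties the stages together, so correlation metadata
// has to ride along with the message itself.
type Evaluation struct {
	ID   string
	Meta map[string]string // e.g. job ID, correlation/trace IDs
}

type Allocation struct {
	EvalID string
	Meta   map[string]string
}

// worker drains the queue and produces allocations that inherit the
// evaluation's metadata, so later spans and logs can be joined on it.
func worker(queue <-chan Evaluation, done chan<- Allocation) {
	for ev := range queue {
		done <- Allocation{EvalID: ev.ID, Meta: ev.Meta}
	}
	close(done)
}

func main() {
	queue := make(chan Evaluation, 1)
	done := make(chan Allocation, 1)
	queue <- Evaluation{ID: "ev-1", Meta: map[string]string{"nomad.job_id": "web"}}
	close(queue)
	worker(queue, done)
	alloc := <-done
	fmt.Println(alloc.EvalID, alloc.Meta["nomad.job_id"]) // ev-1 web
}
```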

ADRIANA: Yeah, I did see something that came out this week where there's, I want to say, some sort of Go auto-instrumentation, air quotes, maybe not air quotes, with eBPF that can help give some additional insights. That could be a game-changer.

LUIZ: Yeah, that would be pretty cool, because with eBPF you can hook into anything, any sort of system call or whatever that comes from your program. Yeah, that could be interesting, specifically thinking of the Nomad case, where the scheduler is very complex. But there's also a lot of complexity in the client, because, oh, I need to run a Docker container. Cool.

But it's not just that. Especially in Nomad, we have templates, artifacts, volumes. So you need to mount a volume, download a file, render a template, fetch tokens from Consul and Vault. Before running a simple container, there's a whole lot that needs to happen. And we call those lifecycle hooks. So you can have things that happen before the task starts and things that happen after the task starts. And a lot of those interact with the operating system. So being able to instrument what the Nomad agent is trying to do against the OS could be very nice.
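The lifecycle-hook idea, run an ordered set of prerequisites (template, artifact, volume) before the task itself, can be sketched as a small Go interface. The interface and type names here are illustrative, not Nomad's internal API:

```go
package main

import "fmt"

// PrestartHook is a hypothetical stand-in for the "things that happen
// before the task starts" that Luiz describes.
type PrestartHook interface {
	Name() string
	Prestart() error
}

type renderTemplate struct{}

func (renderTemplate) Name() string    { return "template" }
func (renderTemplate) Prestart() error { fmt.Println("render template"); return nil }

type fetchArtifact struct{}

func (fetchArtifact) Name() string    { return "artifact" }
func (fetchArtifact) Prestart() error { fmt.Println("download artifact"); return nil }

// runPrestart runs hooks in order and stops at the first failure, so
// the task only launches once every prerequisite has succeeded.
func runPrestart(hooks []PrestartHook) error {
	for _, h := range hooks {
		if err := h.Prestart(); err != nil {
			return fmt.Errorf("hook %q failed: %w", h.Name(), err)
		}
	}
	return nil
}

func main() {
	if err := runPrestart([]PrestartHook{renderTemplate{}, fetchArtifact{}}); err != nil {
		fmt.Println("prestart failed:", err)
		return
	}
	fmt.Println("start task")
}
```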

ADRIANA: Yeah, cool. I think there's definitely more work to be done in that area. But I'm glad that you've continued experimenting. Even if it's maybe not gone as far as you would like, I think it's still progress, so, you know...

LUIZ: Yeah, it's all learning.

ADRIANA: ...it's awesome. That's awesome.

LUIZ: Like, I think it helps to do something different, I guess. It's always good to learn something different and keep up to date with what's happening. And a lot of people are starting to adopt OpenTelemetry more. So even if it never happens that OpenTelemetry is integrated into Nomad core, I think it's helpful to at least understand it, because my target audience will maybe use OpenTelemetry on their stuff. And whenever I talk to them, I sort of need to understand what they are doing and how things work. If somebody comes and opens an issue and says, oh, I'm trying to run the OpenTelemetry Collector in Nomad, I need to know what they mean. And having this sort of exploration is very helpful.

ADRIANA: Absolutely. Cool. Well, we have come up on time. We could keep talking about this forever, honestly, so we'll have to have you back again. Thank you so much, Luiz, for joining today for geeking out with me. Y'all don't forget to subscribe, and be sure to check out the show notes for additional resources and connect with us and our guests on social media.

Until next time...

LUIZ: Peace out and geek out.

ADRIANA: Geeking Out is hosted and produced by me, Adriana Villela. I also compose and perform the theme music on my trusty clarinet. Geeking Out is also produced by my daughter, Hannah Maxwell, who, incidentally, designed all of the cool graphics. Be sure to follow us on all the socials by going to bento.me/geekingout.

Episode Transcription

ADRIANA: Hey, y'all. Welcome to Geeking Out, the podcast about all geeky aspects of software delivery, DevOps, Observability, reliability, and everything in between. I'm your host, Adriana Villela, coming to you from Toronto, Canada. And geeking out with me today is Luiz Aoqui. Welcome, Luiz.

LUIZ: Hi, Adriana. Nice to be here. Thank you for having me.

ADRIANA: Thanks for coming. Where are you calling in from today?

LUIZ: I'm calling from Toronto as well.

ADRIANA: Woohoo.

LUIZ: Representing.

ADRIANA: Fellow Brazilian in Toronto. Awesome.

LUIZ: Lots of us here.

ADRIANA: Yeah, there are lots of us here. So, we're going to start off with some lightning round questions before we get into the meaty bits. So let's get ready. Be prepared. I swear it's painless. Okay, question number one: are you left-handed or right-handed?

LUIZ: Right-handed.

ADRIANA: All right, question number two: iPhone or Android?

LUIZ: Android.

ADRIANA: Number three: Mac, Linux, or Windows? What's your preference?

LUIZ: Going to say Mac for now.

ADRIANA: For now. Awesome. Favorite programming language?

LUIZ: Go. Pretty big favorite of mine.

ADRIANA: Awesome. Dev or Ops?

LUIZ: I like both, but I have to say dev.

ADRIANA: Cool. No wrong answers. JSON or YAML?

LUIZ: I think I'll pick YAML. Just more friendly.

ADRIANA: Fair enough.

LUIZ: I hate the lack of trailing commas in JSON.

ADRIANA: Yeah, I agree. I hate that too. And then finally, do you prefer to consume content through video or text?

LUIZ: Text. Yeah, I get very distracted watching videos.

ADRIANA: Yes. Same. Awesome. You survived the lightning round. By the way, I should mention we had someone on here who, when I asked her JSON or YAML, said HCL.

LUIZ: Yes. That's why I had it in the back of my mind, like, oh, I only have two options.

ADRIANA: So I thought that was pretty funny when she mentioned that. And then I thought of you, because for those who don't know, Luiz works at HashiCorp, he is a Nomad developer, so HCL fits well into the type of work that he does. So I wanted to start off with that. For folks who aren't familiar with Nomad, tell us a little bit about what Nomad is.

LUIZ: Sure, yeah. So Nomad is a workload orchestrator, and I know that doesn't mean a lot to many people. The goal of an orchestrator is basically to get your assets...so your developer team builds things like Docker images, binaries, Java JAR files, whatever. They produce some kind of artifact out of source code. And then you have your infrastructure team that is responsible for running your infrastructure, building servers, configuring machines, all of that. And at the far end, you have your users that are trying to access your product, trying to access your application. The orchestrator sits between your development team and your infrastructure team to make sure that whatever artifact gets produced is running on those machines. So it helps find where to run things, like figuring out what's the best server to run this application on, or doing things like upgrades and deployments and all the lifecycle management of your application. That's the job of the orchestrator. That's what an orchestrator does.
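The core matching job Luiz describes, deciding which server should run an artifact, can be reduced to a toy placement function in Go. Real schedulers, Nomad's included, weigh far more than free memory, so treat this purely as a sketch of the idea:

```go
package main

import "fmt"

// Node is a toy view of a server: just a name and how much memory
// it has free.
type Node struct {
	Name   string
	FreeMB int
}

// place returns the first node with enough free memory for the
// workload, or "" if nothing fits: the orchestrator's core question
// in its simplest possible form.
func place(needMB int, nodes []Node) string {
	for _, n := range nodes {
		if n.FreeMB >= needMB {
			return n.Name
		}
	}
	return ""
}

func main() {
	nodes := []Node{{"web-1", 256}, {"web-2", 2048}}
	fmt.Println(place(1024, nodes)) // web-2
}
```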

ADRIANA: Awesome. Yeah, and I think that's such a great way of explaining what an orchestrator does, because for folks who are familiar with Kubernetes, Kubernetes is an orchestrator as well, specifically for containers, whereas Nomad gives you the breadth to orchestrate pretty much anything, more or less. But it is very easy to forget all of the gnarly things that happen behind the scenes in these orchestrators, all the hard work they're doing in order to operate seamlessly.

Now, I've played around with both Kubernetes and Nomad, and I have to say, starting on the Kubernetes side and then moving to Nomad, the move was actually a lot easier, because you've already dealt with the complexity of Kubernetes, so moving down to Nomad you're like, "Oh, everything's simpler." And it runs in a single binary. It can run in a single binary on your machine and you can get started easily, whereas Kubernetes is more of a beast. I mean, yes, you can have really complex setups with Nomad, of course, and that's probably how you have it in production. But I think the barrier to entry when it comes to Nomad is very low, which can be very appealing.

LUIZ: Yeah. Complexity is an interesting discussion, because complexity means different things to different people. Thinking specifically about this Nomad versus Kubernetes discussion, there are a few things to consider when you think about the complexity of adoption. If you're starting from a managed service for Kubernetes, like EKS, GKE, or AKS, that's one click, and then you have a cluster. So, oh, there's no complexity there. And that's how most people nowadays consume Kubernetes: through a managed service. So that almost removes the barrier to entry, for those that are able to use a managed service, of course.

But then there's the complexity of understanding how to use those systems. Okay, it gets provisioned for me, there's some cloud magic happening behind the scenes, I don't have to deal with that. But now you have to run the system. Now you need to think about how you map your team's workflow to that new tool. There are all these different concepts in Kubernetes that I think are part of the complexity. There are all these different tools that you can use. Having a broad ecosystem is great, but it can also lead to some confusion about when do I do this, when do I do that, and how do I bring it all together to eventually reach my end goal?

Which is: I want my users to be able to access my product, right? When I think complexity, I think more of that day-to-day operations side, understanding what's happening with the system. And I think that's where Nomad becomes simpler, just because it has a smaller surface area for people to interact with. You write your job, you run your job, and there you go.

But stepping back a little, the complexity of Nomad comes in on the deployment part, because we don't have a managed service of sorts. Okay, now you need to understand what Nomad agents are, what Nomad servers are. Now you need to manage their state, now you need to manage upgrades. And this can get complex in that sense. So I find that discussion of complexity very interesting just because of this duality: okay, what am I trying to do? Am I actually running the cluster? Am I actually just using it? Is somebody provisioning it for me...nowadays we call them platform teams. Is there a platform team running a cluster for me? And so, yes, it's much simpler than Kubernetes, I guess, depending on what complexity you're talking about.

ADRIANA: Yeah, that's so true. That's a really good point. I just want to go back to a point that you made earlier about the fact that there are all these Kubernetes managed services, but there's no Nomad managed service, which is kind of interesting, because if you look at various Hashi offerings, there are managed services for a bunch of stuff. So why is it that there is no managed Nomad right now, even offered through Hashi's own cloud?

LUIZ: Yeah, it's definitely something that's part of the plan, something we think about. But there are quite a few challenges for Nomad specifically, compared to other HashiCorp tools, in that we have big competition from Kubernetes. If you have a managed Vault, well, Vault is basically the only tool of its kind.

You have managed Terraform, and there are other tools for infrastructure-as-code, but Terraform is one of the big ones. For Nomad, this competition is much stronger in that sense. So, that's mostly my personal opinion.

But thinking of a managed Nomad service versus a managed Kubernetes service, it gets to the point of how much value that actually adds. Because part of the benefit of Nomad is the flexibility: the flexibility of running in different environments, the flexibility of running different workloads. But for those workloads, you sort of need to have control over the infrastructure. Like, I want to use Podman, so I need machines that have Podman installed.

ADRIANA: Ahh, okay, got it.

LUIZ: Going back to that discussion of managed versus self-hosted, it's a spectrum, right? If you go managed, it's simpler to operate, but it's probably more expensive, and you also lose flexibility; whereas with self-hosted, it's harder to manage, but you get full flexibility and it's probably cheaper. So there's that aspect that we need to figure out.

There's also something to consider about costs and things like that. Because if you have a managed Kubernetes service on AWS, AWS is sort of taking the heat of like, they can sort of discount the compute because they run the compute.

But if you have an externally managed Nomad, you need to account for the price of that service plus whatever infrastructure you're using. And so it kind of becomes a pricier solution, and figuring out the business part of that can be a little challenging as well. So there are all of these little things to consider, but we'll have to see what happens in the future.

ADRIANA: That's a really great explanation. Thanks for clarifying. Another thing that I wanted to ask you about: I think one of the big things Kubernetes has that I don't think you see a direct translation for in Nomad is the whole Kubernetes operator concept, whereas, as far as I know, there isn't quite that concept in Nomad. I've seen a few blog posts where people try to replicate it to a certain extent, but it's not quite the same thing. Can you comment a little bit on that?

LUIZ: So that's a very common question to see. And the thing to keep in mind is that operators are just a pattern, one you can implement in pretty much anything. It's the idea of listening for events and responding to them, and that's something you can do with Nomad. And actually, a few projects that I've seen do that. I was googling for the name: there's a project called nomad-ops that implements the operator pattern in Nomad using some of the functionality we've built.

There's also a company called Koyeb. They are a platform-as-a-service company that uses the operator pattern. And they have a library called kreconciler, I think it's called, that helps you build these operator-paradigm functionalities. But it's also important to remember that on Kubernetes, it's not just operators that are the main thing. When you talk about operators, they're always associated with a CRD, because that's the data. So you have operators being the logic and CRDs being the data, and those two concepts together sort of guide your end goal.

So in Nomad, you can do the operator part. We're building things that can help you do that sort of thing. One of the challenges was, how do I access the Nomad API from my task? Now you have a socket that you can use to talk directly with the API. We're building workload identity, so you don't have to worry about ACL tokens or anything like that. We're building things that help people create this sort of operator paradigm in Nomad.

But the CRD part becomes a bit of a challenge, because we don't have a concept as extensible as Kubernetes has. You can sort of get at it with Nomad variables and things like that; you can kind of work around those challenges. But another point about CRDs is that they're sort of standardized, right? Let's say the Prometheus operator expects or generates specific CRDs. Then a Grafana operator can rely on those CRDs to do stuff automatically. That type of standardization, we don't quite have yet.
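The operator pattern itself, independent of Kubernetes or Nomad, boils down to a reconcile step: compare desired state to actual state and emit the actions that converge them. A minimal Go sketch, with desired state as a plain map standing in for, say, Nomad variables, and actual state standing in for what the Nomad API reports:

```go
package main

import "fmt"

// state maps a job name to how many instances of it exist (or should).
type state map[string]int

// reconcile returns the actions needed to move actual toward desired,
// the heart of any operator/controller loop.
func reconcile(desired, actual state) []string {
	var actions []string
	for job, want := range desired {
		if have := actual[job]; have < want {
			actions = append(actions, fmt.Sprintf("scale %s %d->%d", job, have, want))
		}
	}
	return actions
}

func main() {
	desired := state{"web": 3}
	actual := state{"web": 1}
	for _, a := range reconcile(desired, actual) {
		fmt.Println(a) // scale web 1->3
	}
}
```

A real operator would run this in a loop, re-reading both states each pass, which is what libraries like kreconciler help structure.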

ADRIANA: Right. Interesting. That's really cool. I'll definitely be sure to put those two projects that you mentioned in the show notes for folks to refer back to those. Another thing that I wanted to touch upon, because I believe you and I talked about the fact that there was a new version of Nomad that just came out. So what version just came out? And what are the exciting things that we can expect to see in Nomad?

LUIZ: So Nomad 1.6 just came out, and the main feature is called node pools. It's something that I worked on, so apologies for any bugs. The idea of node pools is that it allows you to segment your clients into groups, into pools.

So, a bit of Nomad background very quickly. You have two types of machines in a Nomad cluster. You have servers; they're like your control plane. They do the scheduling, they store state, they do all the global-view things for your cluster. And then you have clients; they talk to the servers to get information about what that specific client needs to run.

So the client is sort of the data plane; it's the component that actually runs things, so it actually runs your Docker containers, your JAR files, whatever. And one of the challenges we've had in the past with Nomad is that it's very hard to associate a group of...let's say I have a group of clients that I want to run my backend services on.

You can do something like create a constraint that says, okay, my backend service only runs on machines that have this specific metadata. But doing the opposite is kind of hard, because with constraints you need to tell a job what the constraint is, and in order to prevent other jobs from accessing those same machines, you need to create a negative rule. So you have to say, okay, this job runs on these machines, but every other job is forbidden from accessing those machines. So you have that dual-constraint type of thing, and it's very hard to manage it to get a consistent scheduling outcome, because if you forget a constraint rule, now your job is running somewhere it wasn't supposed to be.

So with node pools, you can put a new configuration on each client saying which pool it belongs to, and then on your job you'll say, okay, this job runs in this pool. Now only jobs in that pool will access those clients, and those jobs will only run in those clients. So you kind of create a sort of segmented part of your infrastructure that is reserved for specific jobs.
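The contrast with dual constraints is easy to see in code: with pools, eligibility is a single equality check that works in both directions. A hedged Go sketch of the matching idea (not Nomad's actual scheduler code):

```go
package main

import "fmt"

// Client is a toy Nomad client: just a name and the pool it joined.
type Client struct {
	Name string
	Pool string
}

// eligible returns the clients a job may land on, given the job's pool.
// One check covers both sides: jobs in the pool reach only these
// clients, and these clients accept only jobs in the pool.
func eligible(jobPool string, clients []Client) []Client {
	var out []Client
	for _, c := range clients {
		if c.Pool == jobPool {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	clients := []Client{{"c1", "backend"}, {"c2", "default"}, {"c3", "backend"}}
	for _, c := range eligible("backend", clients) {
		fmt.Println(c.Name) // c1, then c3
	}
}
```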

ADRIANA: I want to say what you were describing without the node pool is kind of reminding me of Kubernetes node taints, where you can say what can run where.

LUIZ: The idea is to have a very simple solution. So it's like: this node is in this pool, this job runs in this pool, and that's all you have to do. There's not a whole lot of configuration. And in addition to giving you this segmented view of your infrastructure, node pools can also hold configuration, so they are a first-class concept in Nomad. For example, one of the workarounds people used to do to get around this constraint problem was to use data centers.

So in Nomad, the idea of a data center is just a collection of nodes. If you have an availability zone, that sort of becomes your data center. And people would kind of hack around the problem using data centers, so they'd have, like, a data center for apps, which doesn't quite match the intention behind it.

But the problem is that a data center is sort of just metadata, so you cannot have specific configurations per data center, things like that.

ADRIANA: Okay.

LUIZ: But with node pools, you can attach things: you can put a description on your node pool, you can put metadata in the pool to give more information about what the pool is used for. And then in Nomad Enterprise, you can actually have a different scheduler configuration per node pool. So, for example, you can have a pool that uses the spread algorithm and another pool that uses bin packing, so you can adjust how the scheduling is done per node pool. There's a bit of extra customization you can do per pool, and that can be very helpful in several cases.
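The binpack-versus-spread distinction can be illustrated with a toy scoring function in Go: binpack prefers the fullest node (consolidating workloads onto fewer machines), spread prefers the emptiest (spreading risk). Nomad's real scoring is much richer; this is only the shape of the idea:

```go
package main

import "fmt"

// Node is a toy view of a client: a name and how much memory it is
// already using.
type Node struct {
	Name   string
	UsedMB int
}

// pick chooses a node according to the pool's scheduling algorithm:
// "binpack" favors the most-used node, "spread" the least-used.
func pick(algo string, nodes []Node) string {
	best := nodes[0]
	for _, n := range nodes[1:] {
		switch algo {
		case "binpack":
			if n.UsedMB > best.UsedMB {
				best = n
			}
		case "spread":
			if n.UsedMB < best.UsedMB {
				best = n
			}
		}
	}
	return best.Name
}

func main() {
	nodes := []Node{{"a", 900}, {"b", 100}}
	fmt.Println(pick("binpack", nodes)) // a
	fmt.Println(pick("spread", nodes))  // b
}
```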

ADRIANA: That's awesome. It's interesting because there's this recognition of, oh, people were trying to hack together the concept of a node pool by using data centers. So it sounds like there's this recognition of: users are trying to do this, why don't we formalize it and turn it into a proper solution? What I also liked about what you said earlier, which feels like a general philosophy with Nomad, is basically going for the overall simpler solution. Don't try to overcomplicate. Just go with the base thing that works pretty well, and we don't have to drive ourselves mad.

LUIZ: Mmhmm, yeah, exactly. And that was a big thing during the development phase. A bit of background on how we develop features at HashiCorp: we start with an RFC. So whenever we want to implement a feature, we write down a description of the feature and send it first to our immediate team, and then to the whole company, for feedback. And during that process I had gigantic ideas, like, oh, maybe node pools should be dynamic, and then you can dynamically add and remove nodes from the pools. But that adds so much complexity with questionable value.

So part of the feedback I got from the team was: let's start simple, let's solve the problem at hand, and then we can expand if the need arises. But yeah, this idea of simplicity, trying to make things easy to use from day zero, is very important to us.

ADRIANA: Awesome. That's very cool. I want to switch gears now, because there are two things that I'm hoping we'll still have time to talk about, since there's so much cool stuff to cover. But I want to switch gears quickly to a collaboration that you and I did, which was really fun. It came out of me having a wild idea out of nowhere, where I thought, "Hey, wouldn't it be cool to try to run Kubernetes on Nomad?" Because there's Kelsey Hightower's well-known GitHub repo where he's running Nomad on Kubernetes.

So I posed the question, "What if you can run Kubernetes on Nomad?"

And I thought, "Maybe let's not try to go too crazy here." So my idea was to find a lightweight Kubernetes distribution that we could run on Nomad, something that's hopefully already containerized, because trying to run Docker-in-Docker is kind of a nightmare if you try to do it yourself. And there's this distribution of Kubernetes called k0s that comes in a Dockerized version, which seemed relatively straightforward to deploy on Nomad.

And so I reached out to you when I came up with this idea, and then you helped me through a bunch of the troubleshooting, so I just wanted to talk to you, have you share your experience around this collaboration. Yeah, just thoughts.

LUIZ: Yeah, it was pretty fun. A lot of learnings, I think, in terms of just understanding how things work, because k0s is a pretty cool project in the sense of, oh, you just run that image and you get a container with everything there for you. But I found it a nice lesson in debugging when things don't work. Normally you would expect to just "docker run" and it works. But what happens when it doesn't work and you have to debug, going through logs and combing through those log lines? I found it a bit challenging, because having everything in one image makes it easier to start things, but then when you need to debug and you have your etcd logs at the same time as your controller logs, at the same time as the kubelet logs, it all jumbles together, and it was very hard for us to comb through that and understand what's actually failing.

ADRIANA: That's so true. That's so true.

LUIZ: And then you have retries, right? So you see an error message and then it retries and then there's an error message again. But is it the thing that is retrying that is the problem or is it trying to call something else that failed before and then the log just sort of disappeared from the history? Just because that part, I found it very interesting.

ADRIANA: Yeah, I totally agree. And it was funny, because I made the classic rookie mistake of, well, of course this thing works in Docker, let me just try to deploy it in Nomad. And then I realized I was getting all these error messages where the kubelet was not starting, and you kind of need that for Kubernetes. So it looked like it started up in the Nomad job, right? The Nomad job deployed successfully, but the actual thing inside the job was not running correctly.

So I neglected to try running k0s locally on Docker first, and then discovered a bunch of stuff. Initially we were running into all sorts of issues, too, because running on M1 Macs, everything is special. I love having an M1 Mac, but, my God, there are all these little annoying considerations. So that made it extra complicated. But once we got it running standalone in Docker on the M1, we were able to port it over.

And it was interesting, because I always like to try everything in Hashiqube, which is this full-fledged Nomad environment where you have Nomad, Vault, and Consul all running together. But it's provisioned using Vagrant, and normally you'd provision using a VM in Vagrant. On the M1 Mac, though, Vagrant does not play so well, so it's Docker as the provisioner. So you're basically running Hashiqube, which means you're running Nomad in Docker, and then you're deploying Kubernetes as a Docker image, and then deploying our test app, which was Jaeger. It was Docker in Docker in Docker, like, I don't know, three or four layers of Docker.

And then you took the more pragmatic approach of, like, let me just run this using the Docker binary. Sorry, the Nomad binary. Much easier.

LUIZ: Yeah. That applies in several scenarios. When a GitHub issue shows up, people describe their entire cluster and environment and give us a ginormous job file. And usually my first step is, okay, I need to reduce this, you need to boil it down to...

ADRIANA: It's so true.

LUIZ: What's the actual problem? So, okay, let's try to remove, let's say, half of the job that I don't care about.

ADRIANA: Yeah, I think that is a very sound approach.

LUIZ: Let's just run a dev agent, see if that reproduces the issue. Normally, my first step is to reduce as much as I can and then start adding things back. The dev agent is my direct go-to anytime I need something in Nomad: "nomad agent -dev" and start from there. Yeah.

ADRIANA: Reduce the noise as much as possible and then start building back up until you figure it out. This is the wall I actually hit. So, yes, lesson learned, y'all. I should know better. I've been doing this long enough that I have found myself in situations where I want to do everything all at once, and then I'm like, strip, strip, strip all the things until you get to the actual problem. But this was a fun little collaboration.

And then there was one component, what was it? The cgroups namespace, where there's a Docker configuration that Nomad did not support. And this is where it helps to know somebody who works on Nomad, because Luiz was able to make a little fix to accommodate it. It's not part of the Nomad product, so you will not find this in standard Nomad. This was just so that we could see if we could get this running with this configuration.

LUIZ: That part is very...on the surface, it's just a configuration value that gets passed to "docker run" on the Docker CLI, just a flag that is set. But what it actually does goes much deeper into the environment that you're...I forgot the exact flag.

I think it's cgroupns. And then you need to put host.

ADRIANA: Yeah, I think that's the right one.

LUIZ: But the tricky part is that Docker and cgroups do, like, weird stuff, and Nomad needs to work around that to make some of the Nomad things work. So, for example, resource isolation: Nomad uses cgroups to enforce that. No matter what task you're running, no matter if it's a Docker container, a binary, a JAR file, we use cgroups, because that's the common layer. But the way Docker does things kind of hides that from you. So you as a developer need to work around all the things that Docker does with cgroups to get that to work.

And so even though that configuration works, it's kind of dangerous in the sense that it can lead people to break other stuff without realizing it. So it's like: yes, we should support this, but not with this naïve approach that I did.

ADRIANA: Yeah.

LUIZ: Luckily, this is such a sort of common problem that we'll probably have a better cgroup handling in a future release. And once we got that, then we'll be able to support that feature. But for now, it's on my sad, unmerged PR.

ADRIANA: But it's interesting, because it was a good learning, right? It was like, hey, we got this to work with this special unmerged PR, but then it kind of led to more questions, right? And I think this is a really great lesson for anyone: this might seem like an easy solution, but what are the repercussions? And that's why pull requests exist...

LUIZ: Exactly.

ADRIANA: ...so that we can mitigate against weird things happening, because you simply do not know what the side effects are going to be from, oh, I added this little flag, what's the worst that could happen, right? So, anyway, it was a really cool side project. I'll provide a link to the blog post where I detailed our adventures in the show notes. And then the final thing that I wanted to talk about: when you were on On-Call Me Maybe, one of the reasons we brought you on was to talk about how you had played around at one point, as part of a hack week, with adding OTel instrumentation to Nomad core. And that was, I guess, over a year ago. So I was wondering if you could talk about what you've learned a year on, and what the status of that is right now.

LUIZ: Yeah, cool. Yeah, that's still pending. I would say it's very much a side project of mine, just an exploration. I think I've tried at least three times now to get some OpenTelemetry into Nomad. And every time I learn something new, which is great, and it builds on top of the previous attempt. So, a bit of history: my first attempt was a very big view of, like, I want to instrument the whole of Nomad. I want to be able to create traces and spans from: I submit a job, that job gets scheduled, it gets picked up by a client, the client starts a task driver, and the task driver calls Docker to start.

I wanted the whole flow as traces and spans and all of that. That turned out to be a terrible idea. I don't want to say it's not doable, but it takes a lot of code changes to get to that point. So that's the first learning: don't try to do everything at once. And then my second attempt was focused not exactly on OpenTelemetry in Nomad per se, but on helping people who use OpenTelemetry and run things in Nomad to get information. Now I forget what the name of that component is, but it's a way for an application that uses the OpenTelemetry SDK to automatically pick up information from Nomad, like the allocation ID, the job name.

So if you use the OpenTelemetry SDK, there are environment variables that are sort of standardized.

ADRIANA: Yeah.

LUIZ: So Nomad could provide those things automatically.

ADRIANA: Oh, yeah, because I think there's a similar thing in Kubernetes where you can automatically grab attributes from your Kubernetes pods.

LUIZ: Yeah, there's a whole spec for that. I forget what it's called, but it's a way to automatically infer information from different sources based on either environment variables or API calls. So I kind of hacked around that, and it sort of works. There's another branch with this work that I didn't merge. But the challenge there is that it's kind of hard to tell what information is relevant, because you also don't want to shove in a bunch of things: it's going to increase your network packet size, and it's going to generate a lot of extra information that you may not care about. So I'd have to build a way for you to customize which information you want. That's where I put a pause on that.
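The idea Luiz describes can be sketched in a few lines of Go: read the environment variables Nomad injects into every task (NOMAD_ALLOC_ID, NOMAD_JOB_NAME, and so on) and render them in the comma-separated key=value format that the standardized OTEL_RESOURCE_ATTRIBUTES variable uses, so any OpenTelemetry SDK can pick them up. The attribute keys like `nomad.alloc.id` here are illustrative, not an official semantic convention.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// nomadResourceAttrs builds OpenTelemetry-style resource attributes
// from the environment variables Nomad injects into every task.
// It takes a getenv function so it can be exercised without a real
// Nomad environment. Empty variables are skipped.
func nomadResourceAttrs(getenv func(string) string) map[string]string {
	keys := map[string]string{ // env var -> attribute key (illustrative)
		"NOMAD_ALLOC_ID":  "nomad.alloc.id",
		"NOMAD_JOB_NAME":  "nomad.job.name",
		"NOMAD_TASK_NAME": "nomad.task.name",
		"NOMAD_NAMESPACE": "nomad.namespace",
	}
	attrs := map[string]string{}
	for env, key := range keys {
		if v := getenv(env); v != "" {
			attrs[key] = v
		}
	}
	return attrs
}

// asOtelEnv renders the attributes in the comma-separated key=value
// format used by the OTEL_RESOURCE_ATTRIBUTES environment variable,
// sorted for deterministic output.
func asOtelEnv(attrs map[string]string) string {
	pairs := make([]string, 0, len(attrs))
	for k, v := range attrs {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return strings.Join(pairs, ",")
}

func main() {
	fmt.Println(asOtelEnv(nomadResourceAttrs(os.Getenv)))
}
```

The "what's relevant?" problem Luiz mentions shows up right in that `keys` map: every entry you add inflates every span's payload, which is why some way to customize the list would be needed.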

And then for my last attempt, after talking with you, Ted, and some other folks in the OpenTelemetry community, I learned: don't try to boil the ocean, don't try to instrument everything at once. Focus on your core business logic. Start there, and you're going to get a lot of value from that already, and then you can start building on top of it. And so my latest approach was like, okay, I cannot create a whole trace, I cannot create that relationship between traces, but can I use metadata to connect them? I should probably explain this: one of the challenges with OpenTelemetry in Nomad is that OpenTelemetry, and more specifically the distributed tracing aspect, is focused on microservices and network requests, and keeping track of those network requests.

But in Nomad, the complexity is built into the Nomad binary. So the complexity comes almost entirely from local function calls rather than network requests. And if you try to create spans for function calls, you get tiny traces of a few milliseconds that are not really useful, and it just generates a huge overhead. But what helped me there was understanding this notion of: oh, I don't actually need to connect the traces per se. If they have the same metadata, then whatever platform you're sending those traces to, like Lightstep, Honeycomb, Zipkin, Jaeger, or whatever, lets you start querying traces that have the same metadata. So you don't have an explicit connection between your traces.

But the metadata becomes a way for you to start to understand what happened. And so that was the last attempt that I did, and it was quite successful. It works very nicely for trying to understand the inner workings of, especially, the Nomad scheduler, because that's sort of the magical box. You run a job and suddenly you have a bunch of allocations for who knows what reason. And so my goal was trying to understand what happens in there. Because if you look at the source code for Nomad, people that know Go, who'd like an adventure, search for a function called computeGroup in Nomad's GitHub and try to understand that function.
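The "correlate by metadata instead of parent links" idea can be sketched like this: the scheduler's spans and the client's spans live in different traces, but both carry the same evaluation ID as an attribute, and the query a tracing backend runs is just an attribute filter. The attribute name `nomad.eval_id` and the span names are illustrative, not Nomad's actual instrumentation.

```go
package main

import "fmt"

// span is a stripped-down stand-in for an OpenTelemetry span: a trace
// ID, a name, and a bag of attributes. In the approach described
// above, related spans end up in *different* traces but share metadata.
type span struct {
	TraceID string
	Name    string
	Attrs   map[string]string
}

// relatedBy returns every span carrying the given attribute value,
// regardless of trace ID -- the kind of query a tracing backend runs
// when you filter by attribute instead of following parent links.
func relatedBy(spans []span, key, value string) []span {
	var out []span
	for _, s := range spans {
		if s.Attrs[key] == value {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	spans := []span{
		{TraceID: "t1", Name: "scheduler.compute", Attrs: map[string]string{"nomad.eval_id": "e42"}},
		{TraceID: "t2", Name: "client.run_task", Attrs: map[string]string{"nomad.eval_id": "e42"}},
		{TraceID: "t3", Name: "scheduler.compute", Attrs: map[string]string{"nomad.eval_id": "e99"}},
	}
	for _, s := range relatedBy(spans, "nomad.eval_id", "e42") {
		fmt.Println(s.TraceID, s.Name)
	}
}
```

There's no explicit link between traces t1 and t2, but filtering on the shared evaluation ID reconstructs the story across them.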

And then come explain it to me once you understand it, because that's the function that takes a job and generates the allocations. It looks at the clients, looks at what allocations already exist. And it's sort of the central point of all Nomad features, more or less. I think people don't realize how many features Nomad has, but things like preemption, deployments, disconnected clients, all of this needs to be taken into account when you are scheduling things, and it all comes into that function.

So, like, computeGroup is my nemesis, and every time I need to touch it, I need a fresh cup of coffee before I go there. But yeah, my goal was like, okay, can I make this function more understandable using telemetry? And it helps quite a bit in some ways, but there are parts where the process is just complex. You kind of need to embrace that sometimes.

ADRIANA: It's interesting, right? Because you start projects like this, you're like, of course it's going to be easy to instrument.

LUIZ: Yes, there's an SDK...

ADRIANA: It's like the k0s on Nomad thing. Of course it's going to be easy. And it's like no.

LUIZ: There are tutorials, and there are all these different materials. Just go install this SDK. But no, it's a very different use case, right? Normally you come from these microservice architectures and you're trying to instrument the communication patterns between them. But I'm trying to do something very specific that I don't think would apply to most people using OpenTelemetry. So, yeah, it's almost like...

ADRIANA: And you said a lot of the processes are asynchronous, too, which makes it kind of hard to work with, right?

LUIZ: Yeah, so like when you have...the lifecycle of a job, right? A new nomad job run generates an HTTP request to whatever client or server you're talking to. That request needs to go to the leader, so there's another request going to the leader. But once it gets to the leader, there's a bunch of asynchronous stuff happening. It creates an evaluation. That evaluation gets picked up by what we call a worker, a scheduler worker, that does all the computing. Once it figures out which allocations need to run, a client picks them up. There's no direct network request that covers the whole thing. It's a bunch of things put in a queue somewhere, put in a broker somewhere, that then get picked up. So that's when you lose your trace a little bit.
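The "losing your trace at the queue" problem can be sketched with a channel standing in for Nomad's internal broker: because the scheduler worker picks the evaluation up asynchronously, there's no request to carry trace headers, so any trace context has to ride along inside the message itself. The struct fields and wording are illustrative, not Nomad's actual evaluation type.

```go
package main

import "fmt"

// evaluation mimics the message a server puts on an internal broker.
// The TraceID field is the manual context propagation: without it, the
// worker would have no idea which submission it is working on behalf of.
type evaluation struct {
	JobID   string
	TraceID string
}

// submit enqueues an evaluation, stamping it with the caller's trace ID.
func submit(broker chan evaluation, jobID, traceID string) {
	broker <- evaluation{JobID: jobID, TraceID: traceID}
}

// worker drains n evaluations from the broker and reports which trace
// each one belongs to, so spans created here can share metadata with
// the submitter's trace even though the call chain was broken by the queue.
func worker(broker chan evaluation, n int) []string {
	var seen []string
	for i := 0; i < n; i++ {
		ev := <-broker
		seen = append(seen, fmt.Sprintf("%s handled under trace %s", ev.JobID, ev.TraceID))
	}
	return seen
}

func main() {
	broker := make(chan evaluation, 8)
	submit(broker, "web", "t-123")
	for _, line := range worker(broker, 1) {
		fmt.Println(line)
	}
}
```

This is the same trick message-queue instrumentation uses generally: stuff the context into the payload at enqueue time, pull it back out at dequeue time.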

But the tricky thing about Nomad is that the network request is the easy part. The complexity is what happens after you receive that request. So that was the thing that I wanted to instrument. Network requests, yeah, they happen, it's fine; I know who is talking to whom. But inside each process, that's where the challenge lies. Like, okay, how do I get visibility into what's happening in there? And that's, I don't know, it sounds like a fourth pillar, perhaps. We have metrics, logs, and traces. Maybe there's something new that should exist there. But yeah, that's the challenge: understanding what's happening inside the process.

There are a few tools for debugging, like pprof and things like that, but they're very low-level in a sense, and you don't always want to run that sort of additional instrumentation in production. That was the challenging part.

ADRIANA: Yeah, I did see something that came out this week where there's, I want to say, some sort of Go auto-instrumentation, air quotes, maybe not air quotes, with eBPF that can help give some additional insights. That could be a game-changer.

LUIZ: Yeah, that would be pretty cool, because with eBPF you can hook into anything, any sort of system call or whatever that comes from your program. Yeah, that could be interesting, specifically thinking of the Nomad case, where the scheduler is very complex. But there's also a lot of complexity in the client, because, oh, I need to run a Docker container. Cool.

But it's not just that. Especially in Nomad, we have templates, artifacts, volumes. So you need to mount a volume, download a file, render a template, fetch tokens from Consul and Vault. Before running a simple container, there's a whole lot that needs to happen. And we call those lifecycle hooks. So you can have things that happen before the task starts and things that happen after the task starts. And a lot of those interact with the operating system. So being able to instrument what the Nomad agent is trying to do against the OS could be very nice.

ADRIANA: Yeah, cool. I think there's definitely more work to be done in that area. But I'm glad that you've continued experimenting. Even if it hasn't gone, maybe, as far as you would like, I think it's still progress, so, you know...

LUIZ: Yeah, it's all learning.

ADRIANA: ...it's awesome. That's awesome.

LUIZ: Like, I think it helps to do something different, I guess; to learn something different. It's always good to keep up to date with what's happening. And a lot of people are starting to adopt OpenTelemetry more. So even if it never comes to pass that OpenTelemetry is integrated into Nomad core... I think it's helpful to at least understand it, because my target audience will maybe use OpenTelemetry on their stuff. And whenever I talk to them, I need to understand what they are doing and how things work. If somebody comes and opens an issue and says, oh, I'm trying to run the OpenTelemetry Collector in Nomad, I would need to know what they mean. And having this sort of exploration is very helpful.

ADRIANA: Absolutely. Cool. Well, we have come up on time. We could keep talking about this forever, honestly, so we'll have to have you back again. Thank you so much, Luiz, for joining today for geeking out with me. Y'all don't forget to subscribe, and be sure to check out the show notes for additional resources and connect with us and our guests on social media.

Until next time...

LUIZ: Peace out and geek out.

ADRIANA: Geeking Out is hosted and produced by me, Adriana Villela. I also compose and perform the theme music on my trusty clarinet. Geeking Out is also produced by my daughter, Hannah Maxwell, who, incidentally, designed all of the cool graphics. Be sure to follow us on all the socials by going to bento.me/geekingout.