
SwirlAI founder Aurimas Griciūnas helps tech professionals transition into AI roles and works with organizations to create AI strategy and develop AI systems. Aurimas joins Ben to discuss the changes he’s seen over the past couple of years with the rise of generative AI and where we’re headed with agents. Aurimas and Ben dive into some of the differences between ML-focused workloads and those implemented by AI engineers—particularly around LLMOps and agentic workflows—and explore some of the considerations animating agent systems and multi-agent systems. Along the way, they share some advice for keeping your talent pipeline moving and your skills sharp. Here’s a tip: Don’t dismiss junior engineers.
About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2026, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your organization.
Check out other episodes of this podcast on the O’Reilly learning platform or follow us on YouTube, Spotify, Apple, or wherever you get your podcasts.
Transcript
This transcript was created with the help of AI and has been lightly edited for clarity.
00.44
All right. So today for our first episode of this podcast in 2026, we have Aurimas Griciūnas of SwirlAI. And he was previously at Neptune.ai. Welcome to the podcast, Aurimas.
01.02
Hi, Ben, and thank you for having me on the podcast.
01.07
So actually, I want to start with a little bit of culture before we get into some technical things. I noticed now it seems like you’re back to teaching people some of the latest ML and AI stuff. Of course, before the advent of generative AI, the terms we were using were ML engineer, MLOps. . . Now it seems like it’s AI engineer and maybe LLMOps. I’m assuming you use this terminology in your teaching and consulting as well.
So in your mind, Aurimas, what are some of the biggest distinctions between that move from ML engineer to AI engineer, from MLOps to LLMOps? What are two to three of the most important things that people should understand?
02.05
That’s a great question, and the answer depends on how you define AI engineering. I think how most people today define it is a discipline that builds systems on top of already existing large language models, maybe some fine-tuning, maybe some tinkering with the models. But it’s not about the model training. It’s about building systems or workflows on top of the models that you already have.
So the distinction is quite big because we’re not creating models. We’re reusing models that we already have. And hence the discipline itself becomes a lot more similar to software engineering than actual machine learning engineering. So we’re not training models. We’re building on top of the models. But some of the similarities remain because both the systems that we used to build as machine learning engineers and the ones we now build as AI engineers are nondeterministic in their nature.
So some evaluation and practices of how we would evaluate these systems remain. Actually, I would even go as far as to say that there are more differences than similarities in these two disciplines, and it’s really, really hard to properly distinguish three main ones. Right?
03.38
So I would say software engineering, right. . .
03.42
So, I guess, based on your description there, the personas have changed as well.
So in the earlier incarnation, you had ML teams, data science teams—they were mostly the ones responsible for doing a lot of the building of the models. Now, as you point out, at most, people are doing some form of posttraining or fine-tuning. Maybe the more advanced teams are doing some form of RL, but that’s really limited, right?
So the persona has changed. But on the other hand, at some level, Aurimas, it’s still a model, so then you still need the data scientist to interpret some of the metrics and the evals, correct? In other words, if you run with just “Here’s a bunch of software engineers; they’ll do everything,” obviously you can do that, but is that something you recommend without having any ML expertise on the team?
04.51
Yes and no. A year ago or two years ago, maybe a year and a half ago, I would say that machine learning engineers were still the best fit for AI engineering roles because we were used to dealing with nondeterministic systems.
They knew how to evaluate something the output of which is a probabilistic function. So it’s more of a mindset of working with these systems and the practices that come from actually building machine learning systems before. That’s very, very helpful for dealing with these systems.
05.33
But nowadays, I think already many people—many specialists, many software engineers—have already tried to upskill on this nondeterminism and learn a lot [about] how you would evaluate these kinds of systems. And the most valuable specialist nowadays, [the one who] can actually, I would say, bring the most value to the companies building these kinds of systems is someone who can actually build end-to-end, and so has all kinds of skills, starting from being able to figure out what kind of product to build and actually implementing some POC of that product, shipping it, exposing it to the users, and being able to react [to] the feedback [from] the evals that they built out for the system.
06.30
But the eval part can be learned. Right. So you have to spend some time on it. But I wouldn’t say that you need a dedicated data scientist or machine learning engineer specifically dealing with evals anymore. Two years ago, probably yes.
06.48
So based on what you’re seeing, people are beginning to organize accordingly. In other words, the recognition here is that if you’re going to build some of these modern AI systems or agentic systems, it’s really not about the model. It’s a systems and software engineering problem. So therefore we need people who are of that mindset.
But on the other hand, it’s still data. It’s still a data-oriented system, so you might still have pipelines, right? Data pipelines that data engineers on data teams typically maintain. . . And there’s always been this lamentation even before the rise of generative AI: “Hey, these data pipelines maintained by data engineers are great, but they don’t have the same software engineering rigor that, you know, the people building web applications are used to.” What’s your sense in terms of the rigor that these teams are bringing to the table in terms of software engineering practices?
08.09
It depends on who’s building the system. AI engineers [comprise an] extremely wide range. An engineer can be an AI engineer. A software engineer could be an AI engineer, and a machine learning engineer can be an AI engineer. . .
08.31
Let me rephrase that, Aurimas. In your mind, [on] the best teams, what’s the typical staffing pattern?
08.39
It depends on the size of the project. If it’s just a project that’s starting out, then I would say a full stack engineer can quickly actually start off a project, build A, B, or C, and continue expanding it. And then. . .
08.59
Primarily relying on some kind of API endpoint for the model?
09.04
Not necessarily. So it can be a REST API-based system. It can be a stream processing-based system. It can be just a CLI script. I would never encourage [anyone] to build a system which is more complex than it needs to be, because quite often when you have an idea, just to prove that it works, it’s enough to build out, you know, an Excel spreadsheet with a column of inputs and outputs and then just give the outputs to the stakeholder and see if it’s useful.
So it’s not always needed to start with a REST API. But in general, when it comes to who should start it off, I think it’s people who are very generalist. Because at the very beginning, you need to understand end to end—from product to software engineering to maintaining these systems.
10.01
But once this system evolves in complexity, then very likely the next person you’ll be bringing on—again, depending on the product—very likely would be someone who is good at data engineering. Because as you mentioned before, most of the systems are relying on a very high, very strong integration of these already existing data systems [that] you’re building for an enterprise, for example. And that’s a hard thing to do right. And the data engineers do it quite [well]. So definitely a very useful person to have on the team.
10.43
And maybe eventually, once these evals come into play, depending on the complexity of the product, the team might benefit from having an ML engineer or data scientist in between. But then this is more kind of targeting those cases where the product is complex enough that you really need some allowances for judges, and then you need to evaluate those LLMs as judges so that your evals are evaluated as well.
If you just need some simple evals—because some of them can be real assertion-based evals—those can just be done, I think, by someone who doesn’t have prior machine learning experience.
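The assertion-based evals Aurimas mentions can be plain deterministic checks on model output, with no ML background needed. A minimal sketch in Python—the function names and the specific checks are hypothetical examples, not any standard eval suite:

```python
# Assertion-based evals: deterministic pass/fail checks on a generated
# reply. The checks below are invented examples for a support-bot reply.

def eval_reply(reply: str) -> dict:
    """Run simple pass/fail assertions against a generated reply."""
    return {
        "non_empty": len(reply.strip()) > 0,
        "no_apology_loop": reply.lower().count("sorry") <= 1,
        "within_length_budget": len(reply) <= 500,
        "mentions_ticket_id": "TICKET-" in reply,
    }

def passed(results: dict) -> bool:
    """An eval passes only if every assertion holds."""
    return all(results.values())

if __name__ == "__main__":
    good = "Your issue TICKET-1042 has been escalated."
    print(passed(eval_reply(good)))  # True
    print(passed(eval_reply("")))    # False
```

Because every check is a plain boolean, these evals run in CI like ordinary unit tests; the LLM-as-judge setups discussed above only become necessary when quality can’t be captured by assertions like these.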
11.36
Another cultural question I have is the following. I would say two years ago, 18 months ago, most of these AI projects were done. . . Basically, it was a little more decentralized, in other words. So here’s a group here. They’re going to do something. They’re going to build something on their own and then maybe try to deploy that.
But now recently I’m hearing, Aurimas, and I don’t know if you’re hearing the same thing, that, at least in some of these big companies, they’re starting to have much more of a centralized team that can help other teams.
So in other words, there’s a centralized team that somehow has the right skills and has built a few of these things. And then now they can kind of consolidate all these learnings and then help other teams. If I’m in one of these organizations, then I approach these experts. . . I guess in the old, old days—I hate this term—they’d use some center of excellence kind of thing. So you’d get some form of playbook and they will help you get going. Kind of like in your earlier incarnation at Neptune.ai. . . It’s almost like you had this centralized tool and experiment tracker where someone can go in and learn what others are doing and then learn from each other.
Is this something that you’re hearing, that people are going for more of this kind of centralized approach?
13.31
I do hear about these kinds of situations, but naturally, it’s always a big enterprise that’s managed to pull that off. And I believe that’s the right approach because that’s also what we have been doing before GenAI. We had these centers of excellence. . .
13.52
I guess for our audience, explain why you think this is the right approach.
13.58
So, two things why I think it’s the right approach. The first thing is that we used to have these platform teams that would build out a shared pool of software that can be reused by other teams. So we kind of defined the standards of how these systems should be operated, in production and in development. And they would decide what kind of technologies and tech stack should be used within the company. So I think it’s a good idea to not spread too widely in the tools that you’re using.
Also, have template repositories that you can just pull and reuse. Because then not only is it easier to kick off and start your buildout of the project, but it also helps control how well this knowledge can actually be centralized, because. . .
14.59
And also there’s security, then there’s governance as well. . .
15.03
For example, yes. The platform side is one of those—just use the same stack and help others build easier and faster. And the second piece is that obviously GenAI systems are still very young. So [it’s] very early and we really don’t have, as some would say, enough reps in building these kinds of systems.
So we learn as we go. With regular machine learning, we already had everything figured out. We just needed some practice. Now, if we learn in this distributed way and then we don’t centralize learnings, we suffer. So basically, that’s why you would have a central team that holds the knowledge. But then it should, you know, help other teams implement some new kind of system and then bring those learnings back into the central core and then spread those learnings back to other teams.
But this is also how we used to operate in these platform teams in the old days, three or four years ago.
16.12
Right, right. But then, I guess, what happened with the release of generative AI is that the platform teams might have moved too slow for the rank and file. And so hence you started hearing about what they call shadow AI, where people would use tools that weren’t exactly blessed by the platform team. But now I think the platform teams are starting to arrest some of that.
16.42
I wonder whether it is platform teams who are kind of catching up, or is it the tools that [are] maturing and the practices that are maturing? I think we’re getting more and more reps in building these systems, and now it’s easier to catch up with everything that’s happening. I would even go as far as to say it was impossible to be on top of it, and maybe it wouldn’t even make sense to have a central team.
17.10
A lot of these demos look impressive—generative AI demos, agents—but they fail when you deploy them in the wild. So in your mind, what’s the single biggest hurdle or the most common reason why a lot of these demos or POCs fall short or become unreliable in production?
17.39
That, again, depends on where we’re deploying the system. But one of the main reasons is that it is very easy to build a POC, and then it targets a very specific and narrow set of real-world scenarios. And we kind of believe that it solves [more than it does]. It just doesn’t generalize well to other types of scenarios. And that’s the biggest problem.
18.07
Of course there are security issues and all kinds of stability issues, even with the biggest labs and the biggest providers of LLMs, because those APIs are also not always stable, and you need to manage that. But that’s an operational issue. I think the biggest issue is not operational. It’s actually evaluation-based, and sometimes even use case-based: Maybe the use case is not the correct one.
18.36
You know, before the advent of generative AI, ML teams and data teams were just starting to get going on observability. And then obviously generative AI comes into the picture. So what changes as far as LLMs and generative AI when it comes to observability?
19.00
I wouldn’t even call observability of regular machine learning systems and [of] AI systems the same thing.
Going back to a previous parallel, generative AI observability is a lot more similar to regular software observability. It’s all about tracing your application and then, on top of those traces that you collect in the same way as you would collect from a regular software application, you add some additional metadata so that it’s useful for performing evaluation activities on your agentic AI kind of system.
So I would even contrast machine learning observability with GenAI observability because I think these are two separate things.
19.56
Especially when it comes to agents, and the agents that involve some form of tool use, then you’re really getting into kind of software traces and software observability at that point.
20.13
Exactly. Tool use is just a function call. A function call is just a regular software span, let’s say. Now what’s important for GenAI is that you also know why that tool was chosen to be used. And that’s where you trace outputs of your LLMs. And why that LLM call, that generation, has decided to use this and not the other tool.
So things like prompts, token counts, and how much time to first token it took for which generation, those kinds of things are what’s additional to be traced compared to regular software tracing.
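The difference Aurimas describes—an ordinary software span enriched with generation metadata—can be sketched as a small data structure. The field names here are invented for illustration; they are not the schema of any particular tracing library:

```python
import time
from dataclasses import dataclass, field

@dataclass
class GenAISpan:
    """An ordinary software span plus LLM generation metadata."""
    name: str
    start: float = field(default_factory=time.monotonic)
    # Regular software-span fields (parent id, status, duration, ...)
    # would also live here.
    # GenAI-specific metadata, as discussed above:
    prompt: str = ""
    completion: str = ""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    time_to_first_token_s: float = 0.0
    tool_choice_reason: str = ""  # why this tool was chosen by the LLM

# A hypothetical span recorded around one tool-selection generation:
span = GenAISpan(
    name="choose_tool",
    prompt="User asks for the weather; tools: search, weather_api",
    completion='{"tool": "weather_api"}',
    prompt_tokens=42,
    completion_tokens=9,
    time_to_first_token_s=0.31,
    tool_choice_reason="query matches weather intent",
)
print(span.name, span.prompt_tokens + span.completion_tokens)
```

Everything above the "GenAI-specific" comment is what a regular tracer already captures; the fields below it are the extra metadata that makes the trace usable for evals.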
20.58
And then, obviously, there’s also. . . I guess one of the main changes probably this year will be multimodality, if there’s different types of modes and data involved.
21.17
Right. For some reason I didn’t touch upon that, but you’re right. There’s a lot of difference here because inputs and outputs, it’s hard. First of all, it’s hard to trace these kinds of things like, let’s say, audio input and output [or] video images. But I think [an] even harder kind of problem with this is how do you make sure that the data that you trace is useful?
Because those observability systems that are being built out, like LangSmith, Langfuse, and all the others, you know, how do you make it so that it’s convenient to actually look at the data that you trace, which is not text and not regular software spans? How [do] you build, [or] even correlate, two different audio inputs to each other? How do you do that? I don’t think that problem is solved yet. And I don’t even think that we know what we want to see when it comes to evaluating this kind of data next to each other.
22.30
So let’s talk about agents. A friend of mine actually asked me yesterday, “So, Ben, are agents real, especially on the consumer side?” And my friend was saying he doesn’t think it’s real. So I said, actually, it’s more real than people think in the following sense: First of all, deep research, that’s agents.
And then secondly, people may be using applications that involve agents, but they don’t realize it. So, for example, they’re interacting with a system and that system involves some form of data pipeline that was written and is being monitored and maintained by an agent. Sure, the actual application is not an agent. But underneath there’s agents involved in the application.
So to that extent, I think agents are definitely real in the data engineering and software engineering space. But I think there may be more consumer apps where underneath there’s some agents involved that consumers don’t know about. What’s your sense?
23.41
Pretty similar. I don’t think there are real, full-fledged agents that are exposed.
23.44
I think when people think of agents, they think of it as like they’re interacting with the agent directly. And that may not be the case yet.
24.04
Right. So then, it depends on how you define the agent. Is it a fully autonomous agent? What’s an agent to you? So, GenAI in general can be very useful on many occasions. It doesn’t necessarily need to be a tool-using fully autonomous agent.
24.21
So like I said, the canonical example for consumers would be deep research. Those are agents.
24.27
Those are agents, that’s for sure.
24.30
If you think of that example, it’s a bunch of agents searching across different data collections, and then maybe a central agent unifying and presenting it to the user in a coherent way.
So from that perspective, there probably are agents powering consumer apps. But they may not be the actual interface of the consumer app. So the actual interface might still be rule-based or something.
25.07
True. Like data processing. Some automation is happening in the background. And a deep research agent, this is exposed to the user. Now that’s relatively easy to build because you don’t have to very strongly evaluate this kind of system. Because you expect the user to eventually evaluate the results.
25.39
Or in the case of Google, you can present both: They have the AI summary, and then they still have the search results. And then based on the user signals of what the user is actually consuming, they can continue to improve their deep research agent.
25.59
So let’s say the disasters that can happen from incorrect results weren’t that bad. Right? So.
26.06
Oh, no, it can be bad if you deploy it inside the enterprise, and you’re using it to prepare your CFO for some earnings call, right?
26.17
True, true. But then whose responsibility is it? The agent’s, that provided 100%…?
26.24
You can argue that’s still an agent, but then the finance team will take those results and scrutinize [them] and make sure they’re correct. But an agent prepared the initial version.
26.39
Exactly, exactly. So it still needs review.
26.42
Yeah. So the reason I bring up agents is, do agents change anything from your perspective in terms of eval, observability, and anything else?
26.55
They do a bit. Compared to agentic workflows that aren’t full agents, the only change that really happens. . . And we’re talking now about multi-agent systems, where multiple agents can be chained or looped in together. So really the only difference there is that the length of the trace is not deterministic. And the number of spans is not deterministic. So in the sense of observability itself, the difference is minimal as long as these agents and multi-agent systems are running in a single runtime.
27.44
Now, when it comes to evals and evaluation, it’s different because you evaluate different aspects of the system. You try to uncover different patterns of failures. For example, if you’re just running your agentic workflow, then you know what kind of steps can be taken, and then you can be almost 100% sure that the entire path from your initial intent to the final answer is completed.
Now with agent systems and multi-agent systems, you can still achieve, let’s say, input-output. But then what happens in the middle is not a black box, but it is very nondeterministic. Your agents can start looping the same questions between each other. So you need to also look for failure signals that aren’t present in agentic workflows, like too many back-and-forth [responses] between the agents, which wouldn’t happen in a regular agentic workflow.
Also, for tool use and planning, you need to figure out if the tools are being executed in the correct order. And similar things.
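The failure signals described here—excessive back-and-forth between two agents, or tools firing out of order—can be checked mechanically over a recorded trace. A minimal sketch, assuming an invented trace format of (sender, receiver) message pairs and a flat list of executed tool names:

```python
from collections import Counter

def too_much_ping_pong(messages, max_exchanges=3):
    """Flag when the same pair of agents bounces messages too many times.

    messages: list of (sender, receiver) tuples from a recorded trace.
    """
    pairs = Counter(frozenset(m) for m in messages)
    return any(count > max_exchanges for count in pairs.values())

def tools_in_order(executed, required_order):
    """Check that the required tools all ran, in the right relative order."""
    positions = [executed.index(t) for t in required_order if t in executed]
    return positions == sorted(positions) and len(positions) == len(required_order)

# A hypothetical trace where planner and researcher loop on each other:
trace = [("planner", "researcher")] * 5
print(too_much_ping_pong(trace))                      # True
print(tools_in_order(["fetch", "parse", "summarize"],
                     ["fetch", "summarize"]))         # True
```

Checks like these can run as automated evals over every trace, turning the nondeterministic middle of a multi-agent run into concrete pass/fail signals; the thresholds and the required tool order are things each team would tune for its own system.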
29.09
And that’s why I think in that scenario, you definitely need to collect fine-grained traces, because there’s also the communication between the agents. One agent might be lying to another agent about the status of completion and so on and so forth. So you need to really kind of have granular-level traces at that point. Right?
29.37
I would even say that you always need to have the lower-level pieces traced. Even if you’re running a simple RAG system, which you can learn about through the generation system, you still need these granular traces for each of the actions.
29.52
But definitely, interagent communication introduces additional points of failure that you really need to make sure you also capture.
So in closing, I guess, this is a fast-moving field, right? So there’s the challenge for you, the individual, for your professional development. But then there’s also the challenge for you as an AI team in how you keep up. So any tips at both the individual level and at the team level, besides going to SwirlAI and taking courses? [laughs] What other practical tips would you give an individual on the team?
30.47
So for individuals, for sure, learn fundamentals. Don’t rely on frameworks alone. Understand how everything is really working under the hood; understand how these systems are actually connected.
Just think about how these prompts and context [are] actually glued together and passed from agent to agent. Don’t assume that you will be able to just mount a framework right on top of your system, write [a] few prompts, and everything will magically work. You need to understand how the system works from first principles.
So yeah. Go deep. That’s for individual practitioners.
31.32
When it comes to teams, well, that’s a great question and a very hard question. Because, you know, in the upcoming one or two years, everything can change so much.
31.44
And then one of the challenges, Aurimas, for example, in the data engineering space. . . It used to be, a few years ago, I have a new data engineer on the team. I have them build some basic pipelines. Then they get confident, [and] then they build more complex pipelines and so on and so forth. And then that’s how you get them up to speed and get them more experience.
But the challenge now is a lot of these basic pipelines can be built with agents, and so there’s some amount of entry-level work that was the place where you could train your entry-level people. Those are disappearing, which also impacts your talent pipeline. If you don’t have people at the beginning, then you won’t have experienced people later on.
So any tips for teams and the challenge of the pipeline for talent?
32.56
That’s such a hard question. I want to say, don’t dismiss junior engineers. Train them. . .
33.09
Oh, yeah, I agree completely. I agree completely.
33.14
But that’s a hard decision to make, right? Because you need to be thinking about the future.
33.26
I think, Aurimas, the mindset people have to [have is to] say, okay, so the traditional training grounds we had, in this example of the data engineer, were these basic pipelines. Those are gone. Well, then we find a different way for them to enter. It may be they start managing some agents instead of building pipelines from scratch.
33.56
We’ll see. We’ll see. But we don’t know.
33.58
Yeah. Yeah. We don’t know. The agents even in the data engineering space are still human-in-the-loop. So in other words a human still needs to monitor [them] and make sure they’re working. So that could be the entry level for junior data engineers. Right?
34.13
Right. But that’s the hard part about this question. The answer is, that could be, but we do not know, and for now maybe it doesn’t make sense. . .
34.28
My point is that if you stop hiring these juniors, I think that’s going to hurt you down the road. So you just hired a junior, and then stick them in a different track, and then, as you say, things might change, but then they’ll adapt. If you hire the right people, they will be able to adapt.
34.50
I agree, I agree, but then, there are also people who are not really right for that role, let’s say, and you know, what I. . .
35.00
But that’s true even when you hired them and you assigned them to build pipelines. So same thing, right?
35.08
The same thing. But the thing I see with the juniors and less senior people who are currently building is that we’re depending too much on vibe coding. I would also suggest looking for some ways to onboard someone new and make sure that the person actually learns the craft and not just comes in and vibe codes his or her way around, creating more issues for senior engineers than actually helping.
35.50
Yeah, this is a big topic, but one of the challenges, all I can say is that, you know, the AI tools are getting better at coding at some level because the people building these models are using reinforcement learning, and the signal in reinforcement learning is “Does the code run?” So then what people are ending up with now with this newer generation of these models is [that] they vibe code and they’ll get code that runs because that’s what the reinforcement learning is optimizing for.
But that doesn’t mean that that code is actually correct. On the face of it, it’s running, right? An experienced person obviously can probably handle that.
But anyway, so last word, you get the last word, but take us out on a positive note.
36.53
[laughs] I do believe that the future is bright. It’s not grim, not dark. I’m very excited about what is happening in the AI space. I do believe that it will not be as fast. . . All this AGI and AI taking over human jobs, it will not happen as fast as everyone is saying. So you shouldn’t be worried about that, especially when it comes to enterprises.
I believe that we already had [very powerful] technology one or one and a half years ago. [But] for enterprises to even make use of that kind of technology, which we already had one and a half years ago, it will still take another five years or so to fully actually get the most out of it. So there will be enough work and jobs for at least the upcoming 10 years. And I think people shouldn’t be worried too much about it.
38.06
But in general, eventually, even those who will lose their jobs will probably respecialize in that long term into some more valuable role.
38.18
I guess I’ll close with the following advice: The main thing that you can do is just keep using these tools and keep learning. I think the distinction will be increasingly between those who know how to use these tools well and those who don’t.
And with that, thank you, Aurimas.
