Devs gaining little (if anything) from AI coding assistants

Coding assistants have been an obvious early use case in the generative AI gold rush, but promised productivity improvements are falling short of the mark — if they exist at all.

Many developers say AI coding assistants make them more productive, but a recent study set out to measure their output and found no significant gains. Use of GitHub Copilot also introduced 41% more bugs, according to the study from Uplevel, a company providing insights from coding and collaboration data.

The study measured pull request (PR) cycle time, or the time to merge code into a repository, and PR throughput, the number of pull requests merged. It found no significant improvements for developers using Copilot.

Uplevel, using data generated by its customers, compared the output of about 800 developers using GitHub Copilot over a three-month period to their output in a three-month period before adoption.
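
The study design is a within-subjects before/after comparison. A minimal sketch of that kind of analysis, using made-up numbers rather than Uplevel's data:

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical weekly PR throughput for the same developers,
# measured before and after adopting a coding assistant.
before = [4, 6, 5, 7, 3, 5, 6, 4]
after  = [5, 6, 4, 7, 4, 5, 5, 4]

# Paired differences: positive means more PRs merged after adoption.
diffs = [a - b for a, b in zip(after, before)]
d_mean = mean(diffs)

# t-statistic for a paired t-test; compare against a t table
# to decide whether the change is statistically significant.
t_stat = d_mean / (stdev(diffs) / sqrt(len(diffs)))
print(f"mean change: {d_mean:+.2f} PRs/week, t = {t_stat:.2f}")
```

With these illustrative numbers the mean change is zero, which is exactly the "no significant improvement" outcome the study reports.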

Measuring burnout

In addition to measuring productivity, the Uplevel study looked at factors in developer burnout, and it found that GitHub Copilot hasn’t helped there, either. The amount of working time spent outside of standard hours decreased for both the control group and the test group using the coding tool, but it decreased more when the developers weren’t using Copilot.

Uplevel’s study was driven by curiosity over claims of major productivity gains as AI coding assistants become ubiquitous, says Matt Hoffman, product manager and data analyst at the company. A GitHub survey published in August found that 97% of software engineers, developers, and programmers reported using AI coding assistants.

“We’ve seen different studies of people saying, ‘This is really helpful for our productivity,’” he says. “We’ve also seen some people saying, ‘You know what? I’m kind of having to be more of a [code] reviewer.’”

A GitHub representative declined to comment on the study, but pointed to a recent study finding that developers were able to write code 55% faster using the coding assistant.

The Uplevel team also went into its study expecting to see some productivity gains, Hoffman says.

“Our team’s hypothesis was that we thought that PR cycle time would decrease,” Hoffman says. “We thought that they would be able to write more code, and we actually thought that defect rate might go down because you’re using these gen AI tools to help you review your code before you even get it out there.”

Hoffman acknowledges there may be more ways to measure developer productivity than PR cycle time and PR throughput, but Uplevel sees those metrics as a solid measure of developer output.
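
Both metrics are straightforward to compute from PR timestamps. A minimal sketch (the record layout here is an assumption, not Uplevel's actual schema):

```python
from datetime import datetime, timedelta

# Hypothetical PR records: (opened_at, merged_at); None = never merged.
prs = [
    (datetime(2024, 9, 1, 9), datetime(2024, 9, 2, 15)),
    (datetime(2024, 9, 3, 10), datetime(2024, 9, 3, 18)),
    (datetime(2024, 9, 4, 8), None),  # abandoned, excluded from both metrics
]

merged = [(o, m) for o, m in prs if m is not None]

# PR throughput: number of pull requests merged in the window.
throughput = len(merged)

# PR cycle time: average time from opening a PR to merging it.
cycle_time = sum(((m - o) for o, m in merged), timedelta()) / len(merged)

print(throughput, cycle_time)
```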

Check back later

Still, Uplevel isn't suggesting that organizations stop using coding assistants, because the tools are advancing rapidly.

“We heard that people are ending up being more reviewers for this code than in the past, and you might have some false faith that the code is doing what you expect it to,” Hoffman adds. “You just have to keep a close eye on what is being generated; does it do the thing that you’re expecting it to do?”

In the trenches, development teams are reporting mixed results.

Developers at Gehtsoft USA, a custom software development firm, haven’t seen major productivity gains with coding assistants based on large language model (LLM) AIs, says Ivan Gekht, CEO of the company. Gehtsoft has been testing coding assistants in sandbox environments but has not used them with customer projects yet.

“Using LLMs to improve your productivity requires both the LLM to be competitive with an actual human in its abilities and the actual user to know how to use the LLM most efficiently,” he says. “The LLM does not possess critical thinking, self-awareness, or the ability to think.”

There’s a difference between writing a few lines of code and full-fledged software development, Gekht adds. Coding is like writing a sentence, while development is like writing a novel, he suggests.

“Software development is 90% brain function — understanding the requirements, designing the system, and considering limitations and restrictions,” he adds. “Converting all this knowledge and understanding into actual code is a simpler part of the job.”

Echoing the Uplevel study's findings, Gekht sees AI assistants introducing errors into code. He also finds that AI-generated code grows less consistent with each iteration when different parts of it are produced from different prompts.

“It becomes increasingly more challenging to understand and debug the AI-generated code, and troubleshooting becomes so resource-intensive that it is easier to rewrite the code from scratch than fix it,” he says.

Seeing gains

The coding assistant experience at Innovative Solutions, a cloud services provider, is much different. The company is seeing significant productivity gains using coding assistants like Claude Dev and GitHub Copilot, says Travis Rehl, the CTO there. The company also uses a homegrown Anthropic integration to monitor pull requests and validate code quality.

Rehl has seen developer productivity increase by two to three times, based on the speed of developer tickets completed, the turnaround time on customer deliverables, and the quality of tickets, measured by the number of bugs in code.

Rehl’s team recently completed a customer project in 24 hours by using coding assistants, when the same project would have taken them about 30 days in the past, he says.

Still, some of the hype about coding assistants — such as suggestions they will replace entire dev teams rather than simply supplement or reshape them — is unrealistic, Rehl says. Coding assistants can be used to quickly sub out code or optimize code paths by reworking segments of code, he adds.

“Expectations around coding assistants should be tempered because they won’t write all the code or even all the correct code on the first attempt,” he says. “It is an iterative process that, when used correctly, enables a developer to increase the speed of their coding by two or three times.”

number goes up???

Many developers say AI coding assistants make them more productive, but a recent study set out to measure their output and found no significant gains. Use of GitHub Copilot also introduced 41% more bugs, according to the study from Uplevel, a company providing insights from coding and collaboration data.

number goes up!!! (it’s a bad number)

4 Likes

The main problem, I think, is often that by the time these assistants are released, the code they train on is already out of date. Plus they can’t come up with anything new, so any novel idea is gonna be very difficult to impossible for them to address. And if your problem is simple enough for an AI to solve, you’ll probably have better luck on Stack Overflow anyway.

I don’t even like IDEs because they try to debug my code while I’m writing it and I have to beat them back like “stay out of this.” That’s why I prefer plain text editors, and if I really need help I just ask my friends or people online.

3 Likes

It has always made such little sense to me because like they say in the article it’s all based in this mistaken idea that the real work of being a programmer is in generating the code, when no, the real work is gaining an understanding of the code and problem space so that you can write the code. And it’s really frustrating to me because I feel like that mistaken belief that the number of lines written is at all a meaningful metric still haunts how management thinks about output.

And I don’t see how people expect to do that when they replace actually writing the code with doing a bunch of code reviews of AI-generated code (if they even do that and don’t just drop it in).

And I’m actually very pro-tool in many ways, but with the explicit understanding that you better be able to do it without the tools too like how pilots still need to know how to fly for when the auto-pilot goes out. Plus there’s honestly great tools like static analyzers and the noble diff tool and debuggers that are extremely helpful and can already generate an actual functional model of the code and not just the vague statistical model of it.

But notably, none of these try to generate code. They help illustrate things in code that may be hard to be certain of just reading it and thinking about it, but they don’t generate code. Non-AI code generators do exist and I very much remember being immediately and correctly suspicious of them, though there are definite cases where they do work well.

Every time I’ve talked about this with friends in the industry who make more money than me and have more prestige, they are all on the bandwagon, and it makes me feel like I’m crazy. But then I’ve always been the person screaming that the code sucks and we should do it right, fuck the deadline, and sadly that is not a popular person to be.

11 Likes

Never mind that if you don’t understand the code, how do you expect the person who handles your code next to understand it? Source code is actually for people more than for the computer… I can rant on this for fucking days…

8 Likes

I think anyone who would rather use an AI code tool than an IDE is going to be (or already is) a pretty lousy programmer. If you’re not responsible for 100% of the code in a project, the AI tool strikes me as far less able to help you understand data structures or control flow, and I don’t see how you can make working code that fits into a larger project if you can’t understand those things.

I use the JetBrains Python and Java IDEs at work and they really do make things massively more comprehensible. If I need to know where a value comes from, I can almost always get a variable’s usage information (i.e., where it’s declared/assigned, where it gets written to).

4 Likes

It actually drives me mad in my current position when I’m talking to a colleague about how to solve a problem or add a new bit of functionality to our environment and my first thought is to hit up Stack Overflow and trial things…and my colleague just goes to ChatGPT. :pensive: It’s quite low-stakes in our case overall, but, as chimerror said, the real work is gaining an understanding of the code and problem space.

4 Likes

The new technician at my job has never learned R before, and I get that learning how to talk to computers can be tricky if you don’t know how. But also. Every time he pulls up chatgpt I die a little inside.

It’s one of those things where he can ask it “hey, write me some code that does [xyz]” and it will spit out something and then explain why it’s written that way. But also. The only way you learn how to talk to a computer is to internalize the syntax and understand what the different objects are and how to interact with them.

I should just dig up that package that is an inbuilt tutorial with modules and maybe that’ll be something for him to do.

4 Likes

A colleague did this in real time to see how well it would do translating things to fully escaped JSON. The answer in this case was like 95% correct…but that’s effectively the same as 0% if you’re trying to save time by putting it in directly because it wouldn’t parse, and if you’re double checking all of it then you’re going to be spending basically the same amount of time, so…
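
That failure mode is easy to reproduce: one missed escape and the whole document is rejected, so "95% correct" yields zero usable output. A hypothetical sketch (not the colleague's actual payload):

```python
import json

# Correctly escaped: backslashes doubled, inner quotes escaped.
good = '{"path": "C:\\\\logs\\\\app.log", "msg": "said \\"hi\\""}'
print(json.loads(good))  # parses fine

# "95% correct": a single unescaped inner quote breaks the parse entirely.
bad = '{"path": "C:\\\\logs\\\\app.log", "msg": "said "hi""}'
try:
    json.loads(bad)
except json.JSONDecodeError:
    print("rejected: one bad escape, zero usable output")
```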

5 Likes

i’ve been feeling extra joker mode about “AI” after a phd told me he uses it for “general reasoning” (HELP) and then i had to debunk a terrible garbage paper that’s making the rounds of bad science news sites. like i think coding “AI” was already doing an okay job when it was just like, smartly populating your method or completing your syntax? that’s enough for Me Personally, and the part of this article where they say they don’t always even understand the code the “AI” wrote sends chills down my spine. like unless you’re working on idk, an emergency (?), it’s not worth creating a bunch of code you can never wade into

5 Likes