Just how good is AI-assisted code generation? – Computerworld

“In our early experimentation, we were doing a lot of work in Python, JavaScript and languages like that,” GitHub COO Kyle Daigle said in an earlier interview with Computerworld. “GitHub is mainly a Ruby company, but we also write in Go, and C, and FirGit. And so we were expanding our use cases of Copilot and using it in different languages. But overall, Copilot is able to work on the vast majority of languages that are in the public sphere.”

Working from nothing more than natural-language prompts, genAI-assisted code generators can suggest software code ranging from snippets to full functions. And updates can make the tools even better.

Amazon, for instance, said updates to its CodeWhisperer tool increased code acceptance rates from around 20% on average to 35% across all languages and use cases.

“Now, with Amazon Q included with CodeWhisperer, developers can ask about their code, and leverage Amazon Q’s capabilities to find bugs, optimize, and translate code they are working on,” Doug Seven, general manager of Amazon CodeWhisperer and director of software development for Amazon Q, said in a blog.

Why is AI-assisted coding so powerful?

One of the more heralded aspects of AI-assisted coding is that users don’t have to be versed in software development. Natural language processing allows even business users to simply write a prompt and get back the software needed for any number of projects.

For example, users can write a comment in natural language that outlines a specific task in English, such as, “Upload a file with server-side encryption.” Based on that information, CodeWhisperer recommends one or more code snippets directly in the development platform to accomplish the task, according to an Amazon spokesperson.

Many of the coding tools also come with enhanced code security capabilities, such as security scans and code remediation suggestions. Some even come with “bias” filtering and reference trackers, which detect whether a code suggestion might be similar to open-source training data. The latter are important features in an AI-based coding assistant.

Amazon and other providers are also experimenting with tools that help non-developers build apps for business purposes. For example, Amazon is testing and prototyping a tool called PartyRock that allows non-developers to work with genAI and LLMs in a sandbox environment.

“You can experiment with building different applications,” Seven said in an interview with Computerworld. “We’ll see an increase in different tools for different personas that will use generative AI. I think we’re just scratching the surface on where we’ll see genAI in different places. We’ll start to see more and more of these tools.”

Accuracy rates vary

Seven said code acceptance rates for CodeWhisperer are around 30% to 40%, but that doesn’t mean the code it wrote was incorrect or error-ridden. The acceptance rate refers to whether the genAI tool correctly interpreted what the developer asked it to do.

Seven described something akin to a conversation between a developer and an AI code generator, where the developer asks it to produce something and then refines the request with follow-ups. The ability of CodeWhisperer to produce error-free, usable code is “quite high,” though Seven said Amazon doesn’t reveal internal metrics.

Anecdotally, developers and IT leaders have put the rate at which popular AI-based code augmentation tools generate correct, usable code at anywhere from 50% to 80%.

“We had this as a hypothesis. Now we’re starting to see this in actual studies,” said Derek Holt, CEO of digital transformation service provider Digital.ai.

According to a study by Cornell University last year, accuracy varies widely among genAI coding tools. The study showed ChatGPT, GitHub Copilot and Amazon CodeWhisperer generate correct code 65.2%, 64.3% and 38.1% of the time, respectively.

While the study is a year old, the accuracy rates for AI-assisted coding tools are “more or less the same” today, according to Burak Yetiştiren, the paper’s lead author and a graduate student researcher at UCLA’s Henry Samueli School of Engineering and Applied Science.

A study by GitClear, a developer tool for GitHub and GitLab that provides code analysis and git stats, examined more than 153 million lines of code from 2020 to 2023. Highlighting key shifts in code churn, duplication, and age, it explored the impact of AI tools like GitHub Copilot on programming practices.

Among GitClear’s findings was that developers write code 55% faster when using Copilot. When GitClear compared the quality and maintainability of AI-assisted code with what would have been written by a human, it found that less experienced developers gain a greater advantage from AI-assisted programming than veteran developers do.

The research also cited GitHub’s own data suggesting that junior developers use Copilot about 20% more than more experienced developers.

GitClear conducted a corresponding survey of 500 developers and asked, “What metrics should you be evaluated on, when actively using AI?” The top three metrics they named were code quality, time to complete task, and number of production incidents.

“When developers are inundated with quick and easy suggestions that will work in the short term, it becomes a constant temptation to add more lines of code without really checking whether an existing system could be refined for reuse,” GitClear’s paper said.

More code, but more errors?

Developers are producing 45% more code with the automation tools, according to Digital.ai’s Holt, but that’s not necessarily a good thing.

“The main challenge with AI-assisted programming, however, is that it becomes so easy to generate a lot of code which shouldn’t have been written in the first place,” Adam Tornhill, founder & CTO at CodeScene, said on X/Twitter.

Another wrinkle is that when code is not generated by humans, it is more opaque. As a result, quality challenges are emerging, including questions about whether code can effectively be tested for errors and security holes.

In a survey of software engineers conducted last year by developer security platform Snyk, 96% of respondents said they used AI-based coding tools, and more than half said insecure AI code suggestions were common.

“That shouldn’t surprise us,” Holt said. “It’s early days and we’re training these models on all of the code in certain repositories. All you’re going to do is repeat the mistakes that were made by the developers who wrote that original code.”

Given that much of a developer’s time is spent fixing existing code, not writing new features, the ability to read code that wasn’t written by humans and find problems in it becomes yet another challenge, Holt said.

Even with those issues, developers wouldn’t be adopting tools like Copilot if they didn’t believe the tools accelerated their ability to produce code. GitHub’s own research found “developers are 75% more fulfilled when using Copilot.”

In a study of 450 Accenture developers using Copilot for six months, 88% of suggested code was retained, build success rate increased by 45%, and every developer surveyed reported Copilot was useful, according to Microsoft’s Silver.

Churn, moved and copy/paste code issues

GitClear, however, also found that with the increased use of AI-assisted programming, the amount of “Churn,” “Moved,” and “Copy/Pasted” code increased significantly.

“Churn” is the percentage of code that is pushed to the repository, then subsequently reverted, removed or updated within two weeks. It was relatively rare when developers authored all their own code; only 3% to 4% of code was churned prior to 2023.
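GitClear’s churn metric boils down to a simple ratio: code changed within two weeks of being pushed, divided by all code pushed. The Python sketch below illustrates that definition under stated assumptions. The PushedLine record and churn_percentage function are hypothetical names, the sketch counts individual lines (GitClear’s actual unit and tooling may differ), and it treats “within two weeks” as at most 14 days after the push. It is not GitClear’s implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "within two weeks" means at most 14 days after the push.
CHURN_WINDOW = timedelta(days=14)


@dataclass
class PushedLine:
    """Hypothetical record for one line of code pushed to a repository."""
    pushed_at: datetime
    # When the line was next reverted, removed or updated; None if never.
    changed_at: Optional[datetime] = None


def churn_percentage(lines: list[PushedLine]) -> float:
    """Return the percentage of pushed lines changed within CHURN_WINDOW."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.changed_at is not None
        and line.changed_at - line.pushed_at <= CHURN_WINDOW
    )
    return 100.0 * churned / len(lines)


if __name__ == "__main__":
    t0 = datetime(2023, 1, 2)
    sample = [
        PushedLine(t0, t0 + timedelta(days=3)),   # rewritten after 3 days: churn
        PushedLine(t0, t0 + timedelta(days=30)),  # changed after a month: not churn
        PushedLine(t0),                           # never changed: not churn
        PushedLine(t0),                           # never changed: not churn
    ]
    print(f"{churn_percentage(sample):.1f}%")  # prints 25.0%
```

In this toy sample, one of four lines is rewritten inside the window, giving 25% churn; by the same measure, the 3% to 4% GitClear reports for pre-2023 code would mean roughly three or four of every 100 pushed lines.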

But overall code churn jumped 9% the first year Copilot was available in beta — the same year that ChatGPT became available.

From 2022 through 2023, the rise of AI assistants was strongly correlated with “mistake code” being pushed to the repository. Copilot prevalence — its use in generating code — was 0% in 2021, 5% to 10% in 2022, and 30% in 2023, GitClear found.

“If the current pattern continues into 2024, more than 7% of all code changes will be reverted within two weeks, double the rate of 2021,” GitClear’s report said.

There is perhaps no greater scourge to long-term code maintainability than copy/pasted code. That’s because code that’s simply reused can also contain previous mistakes, security holes or other issues.

“I have no doubt we’ll be able to figure out the problems, and we’ll be able to train models on small amounts of code created only by our best developers,” Holt said. “But right now you’re getting a junior developer, and if you’re not paying attention to what that means to the broader software development lifecycle, you’re going to be running some risks.”

Amazon’s Seven argued that one of the strengths of CodeWhisperer and other products is their ability to examine existing code for errors and then suggest changes. “So, it’ll actually give you the code to make that change,” Seven said. “The advantage of using Amazon Q [CodeWhisperer] in this context is as a developer, you have a debugging companion.”

That “could be particularly useful in checking for discrepancies in existing code that may not be familiar to developers. And Q is really good at that,” he said.

Another advantage of automated tools is that they can be used in a set-and-forget mode, where a developer or engineer simply explains a task and then the tools complete it independently – whether developing a new application or debugging an existing one. “In either case, the accuracy of the code, and the quality of the code, is really quite high,” Seven said.

What’s not in question is that over time, software generation tools will continue to improve — though there will always be the need for a human in the loop.

“My gut tells me there will always be roles for developers, whether that’s reviewing or catalogizing or a mixture of both,” Holt said. “We’re not even talking about the fact that delivering code is not the goal. …Delivering great features that customers love is the actual goal.

“So, from my view, I still have a long career ahead of me in software development.”
