AI Code Smells: Expert Q&A

The demand for AI agents is surging, yet many produce code riddled with errors requiring extensive manual fixes. Factory, a new player in the field, aims to change that by building quality checks directly into its coding agent’s process.

——————————

Q: Tell us a little bit about who you are and what Factory does.

Eno Reyes: I’m the co-founder and CTO of Factory. We’re building a platform to help large engineering organizations build software fully autonomously. Specifically, we provide a cutting-edge coding agent capable of handling any task throughout the software development lifecycle, alongside tools to analyze code quality and measure the impact of these agents, maximizing successful implementation.

Q: There are a lot of big players in the coding agent space. Why build another coding agent?

ER: We’ve been focused on this for almost three years. Many of our researchers have backgrounds building both large language models and agents.

A truly effective agent needs to be model agnostic—deployable in any environment, operating system, or integrated development environment. Many existing tools force a difficult choice that we believed wasn’t necessary.

You’re often locked into a single large language model or required to standardize on a specific IDE across your entire company. Building a genuinely model- and vendor-agnostic coding agent requires significant effort in “harness engineering,” a skillset distinct from model creation. That’s why we believe companies like ours can build agents that consistently outperform in lab evaluations.

Q: What does harness engineering entail for something that’s connecting to any given IDE, terminal, whatever? How do you make sure that it just works?

ER: It’s a complex problem with multiple dimensions. You must manage context, because all large language models have context limits, and those limits matter most during tasks lasting eight to ten hours. Considerations include how you instruct the agent, how you inject environment information, and how you handle tool calls. It requires meticulous attention to detail.

This is where our expertise lies—the sum of hundreds of small optimizations. We’ve developed methods to identify what constitutes a good versus a bad harness, allowing us to systematize improvement. Because we also build coding agents, we can automatically upgrade and refine those harnesses.
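One of the harness concerns mentioned above, keeping a long-running task inside a model's context limit, can be illustrated with a minimal sketch. This is not Factory's method: the `Message` type, the `trim_history` helper, and the four-characters-per-token estimate are all assumptions made for illustration. The sketch keeps system instructions and then as many of the most recent messages as fit a token budget.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def trim_history(messages: list[Message], budget: int) -> list[Message]:
    """Keep system messages plus the most recent messages that fit the budget."""
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    used = sum(estimate_tokens(m.content) for m in system)

    kept: list[Message] = []
    for message in reversed(rest):          # walk from newest to oldest
        cost = estimate_tokens(message.content)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

A production harness would likely summarize dropped messages rather than discard them, and would use the model's real tokenizer instead of a character count.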

Q: The definitions of good and bad are super important to the final result of any software. What is a good harness?

ER: It’s a surprisingly broad question, as there are countless signals you can use in software development. These range from automated checks—whether code compiles, passes linting, or has passing tests—to whether associated documentation explains the code’s functionality.

Ideally, organizations would leverage all these signals, but in reality, many lack comprehensive implementation.

We dedicate significant effort to identifying these hundreds of potential validation signals. Our evaluations incorporate these signals, and we assist organizations in deploying coding agents by helping them understand and implement those signals within their own codebases.

If they’re missing signals, they can use our tools to add them, improving both their code quality and the performance of our agents.
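As a rough illustration of what "missing signals" could mean in practice, the sketch below checks a repository for files that usually indicate a signal is configured. The `SIGNAL_MARKERS` mapping and the `missing_signals` function are hypothetical and are not part of Factory's tooling; real projects configure these tools in many other ways.

```python
from pathlib import Path

# Hypothetical mapping from a signal name to files or directories whose
# presence suggests the signal is configured. Real projects vary widely.
SIGNAL_MARKERS: dict[str, tuple[str, ...]] = {
    "linter": (".eslintrc.json", "ruff.toml", ".flake8"),
    "type_checker": ("tsconfig.json", "mypy.ini"),
    "formatter": (".prettierrc", ".editorconfig"),
    "tests": ("tests", "test"),
    "ci": (".github/workflows",),
}


def missing_signals(repo_root: Path) -> list[str]:
    """Return signal names with no marker file or directory under repo_root."""
    return [
        name
        for name, markers in SIGNAL_MARKERS.items()
        if not any((repo_root / marker).exists() for marker in markers)
    ]
```

Calling `missing_signals(Path("."))` from a repository root returns the names of signals that appear to be absent, which could then be handed to a person or an agent as a to-do list.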

Q: So it’s not just the agent, it’s the tooling around the agent, right? A software developer has a ton of tooling that gets them from writing code to production. What’s the sort of tooling that an AI coding agent needs?

ER: Developers benefit from linters, static type checkers, unit and end-to-end tests, and auto formatters. Static application security testing (SAST) tools and scanners are also valuable.

Essentially, anything that audits code and provides a “green or red” assessment—or even a score, like code complexity—can be used. GitHub recently released a code quality analyzer. Agents can leverage all of these to enhance their work. If you want an agent to operate without human intervention, it needs that feedback from somewhere.

We believe autonomy will be achieved by automating more of these signals, rather than relying on human oversight.
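The "green or red" feedback described above can be sketched as a small runner that executes each check as a command and turns the failures into text an agent could read. The names `CheckResult`, `run_checks`, and `feedback_for_agent` are assumptions for illustration, and treating exit status 0 as green is a common convention rather than a rule every tool follows.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each command and record whether it exited with status 0."""
    results = []
    for name, command in checks.items():
        proc = subprocess.run(command, capture_output=True, text=True)
        output = (proc.stdout + proc.stderr).strip()
        results.append(CheckResult(name, proc.returncode == 0, output))
    return results


def feedback_for_agent(results: list[CheckResult]) -> str:
    """Summarize failing checks as text an agent could act on."""
    failures = [r for r in results if not r.passed]
    if not failures:
        return "green: all checks passed"
    lines = [f"red: {len(failures)} check(s) failed"]
    lines += [f"- {r.name}: {r.output or 'no output'}" for r in failures]
    return "\n".join(lines)
```

In a Python project the `checks` argument might look like `{"lint": ["ruff", "check", "."], "tests": ["pytest", "-q"]}`, assuming those tools are installed. The agent would rerun the checks after each change until the summary comes back green.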

Q: So instrumenting the codebase, a kind of observability that produces these signals, is important for better agents. You said the agents can help with that. How does that work?

ER: Humans can often get by without extensive tooling. If you’re unsure about code formatting, a senior engineer can review your work.

However, scaling requires a different approach. Deploying agents isn’t simply hiring another person; it’s like onboarding a hundred intern-level engineers. You can’t code review a hundred engineers—you need automation.

If our autonomous coding agent, Droid, performs an “autonomy maturity analysis” and identifies missing signals, you can instruct Droid to fix them. As a developer, your focus should be on how to best utilize linters and formatters. Once that’s decided, the feedback loop accelerates.

Q: A lot of senior developers talk about code smells. Is it possible to systematize code smells automatically? Or does somebody need to go in and say “That’s a bad practice”?

ER: Achieving a fully autonomous codebase requires both static automations and AI-powered automations integrated into the software development lifecycle.

For example, Droid can conform to workflows like code review, incident response, and documentation, becoming an automation within those processes. You can integrate Droid into your GitHub Actions pipeline, transforming it into a code review tool. To check for specific code smells, you have a fully customizable code review Droid.

Similarly, you can plug Droid into cron jobs or virtual machines—even run it locally on a laptop. We anticipate AI agents will handle the more nuanced, non-statically determinable practices related to code smells.
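For contrast with the nuanced smells left to AI agents, the "static automation" half can be as simple as the sketch below, which uses Python's standard `ast` module to flag long parameter lists and long function bodies. The `find_smells` function and its thresholds are arbitrary assumptions chosen for illustration, not rules from Factory or from any particular linter.

```python
import ast


def find_smells(source: str, max_params: int = 5, max_statements: int = 30) -> list[str]:
    """Flag functions with long parameter lists or long bodies."""
    smells = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        if params > max_params:
            smells.append(f"{node.name}: {params} parameters (limit {max_params})")
        # Count every statement nested inside the function, excluding the
        # function definition itself.
        statements = sum(isinstance(child, ast.stmt) for child in ast.walk(node)) - 1
        if statements > max_statements:
            smells.append(f"{node.name}: {statements} statements (limit {max_statements})")
    return smells
```

Checks like these are cheap and deterministic. Judgments such as "this abstraction is in the wrong place" do not reduce to a counter, which is where a review agent would come in.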

Q: Some folks have been worrying about AI agents causing a rise of work slop, low-quality code that needs to be fixed manually. How do you make agents that don’t contribute to that problem—that are a net benefit?

ER: Stanford researchers conducted a study examining the impact of AI on codebases and productivity. They analyzed code volume, adoption rates, the density of power users, and, crucially, baseline code quality. The goal was to determine which factors predicted whether AI would accelerate or decelerate a company.

Surprisingly, code volume, agent numbers, and user penetration didn’t correlate with productivity. The sole predictor was code quality. Higher-quality codebases saw greater acceleration with AI. It’s intuitive—AI excels at pattern recognition, so good code in means good code out.

We’re seeing organizations hampered by poor standards and low-quality code when introducing AI agents. That is why we provide not only a fantastic agent but also the tools to assess the quality of your existing codebase.

Q: How are AI agents changing the nature of work?

ER: As general software development agents become more capable, we’re realizing that much of the world can be framed as a software task. Creating a PowerPoint presentation for sales, conducting customer research, or responding to complex documentation requests—these are all software tasks. We believe the best general agents are fundamentally the best software development agents.

Increasingly, it’s not just software engineers, but also product managers, data scientists, and even sales professionals who are recognizing the potential of software development agents, a trend we find particularly interesting.
