Why would a metric for code quality differ depending on how the code got into a file? In other words, if a good measure existed, wouldn't it already exist for us? How do we measure the quality of our own code?
I don't think there's a good objective metric here, at least not one like cyclomatic complexity or SonarQube-style checks, because it's hard to tell whether the AI overcomplicated the code or the domain itself is just complicated.
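To illustrate why a structural metric can't separate the two cases, here's a rough sketch of cyclomatic complexity (1 + number of decision points) using Python's `ast` module. It's a simplified counting rule, not what SonarQube or any specific tool does, and both `shipping_fee` and `is_positive` are hypothetical examples: one branches because the pricing rules require it, the other is pure redundant nesting, yet they get the same score.

```python
import ast

# Simplified rule: each of these node types adds one decision point.
# Production tools count more constructs and differ in the details.
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits twice, so it adds len(values) - 1.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical: branching that mirrors real (domain) pricing rules...
ESSENTIAL = """
def shipping_fee(weight_kg, express):
    if weight_kg <= 1:
        fee = 5
    elif weight_kg <= 5:
        fee = 9
    else:
        fee = 15
    if express:
        fee *= 2
    return fee
"""

# ...versus hypothetical branching that is purely accidental (redundant checks).
ACCIDENTAL = """
def is_positive(n):
    if n > 0:
        if n > 0:
            if n != 0:
                return True
    return False
"""

print(cyclomatic_complexity(ESSENTIAL), cyclomatic_complexity(ACCIDENTAL))
```

Both print as 4: the number only reflects branch structure, not whether the branches are needed.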
Code is derivative: it models real-world behavior, so its quality depends closely on how well it captures what should actually happen.
That's why measuring actual outcomes matters more than raw "code quality" metrics: do the important user flows work, and how does the system behave in edge cases? I'd rather use something like Journey SDK to fuzz edge cases and measure how well the system holds up than measure arbitrary properties of the code.
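A minimal sketch of the outcome-based idea, using plain Python random inputs plus hand-picked edge cases as a stand-in (this is not Journey SDK, whose API I'm not showing here). `apply_discount` is a hypothetical function under test, and the expectation that out-of-range percentages should raise `ValueError` is an assumed spec, chosen just to show an edge case getting caught.

```python
import random


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: price after a percentage discount."""
    return price_cents - price_cents * percent // 100  # note: no input validation


def check_case(price_cents: int, percent: int):
    """Return a description of the violated expectation, or None if the outcome is OK."""
    valid = 0 <= percent <= 100  # assumed spec: only 0..100 is a valid discount
    try:
        result = apply_discount(price_cents, percent)
    except ValueError:
        return None if not valid else "rejected a valid discount"
    if not valid:
        return f"accepted invalid percent {percent} -> {result}"
    if not 0 <= result <= price_cents:
        return f"result {result} outside [0, {price_cents}]"
    return None


def fuzz(trials: int = 1000, seed: int = 0):
    """Run edge cases plus random valid inputs; return (cases run, failures)."""
    rng = random.Random(seed)
    edge_prices = [0, 1, 99, 10**9]
    edge_percents = [-10, 0, 1, 50, 99, 100, 150]
    cases = [(p, d) for p in edge_prices for d in edge_percents]
    cases += [(rng.randint(0, 10**9), rng.randint(0, 100)) for _ in range(trials)]
    failures = []
    for price, pct in cases:
        problem = check_case(price, pct)
        if problem:
            failures.append((price, pct, problem))
    return len(cases), failures


total, failures = fuzz(trials=200)
print(f"{total - len(failures)}/{total} cases behaved as expected")
for price, pct, problem in failures[:3]:
    print(price, pct, problem)
```

The score here is about behavior (what fraction of inputs produce an acceptable outcome), which says nothing about how the code is written or who wrote it.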
Not sure I understood the question correctly, but there are benchmarks like SWE pro. Whether you can trust them, and whether the labs are training on those benchmarks, is a whole other debate, but they're one way to measure it.
Other than benchmarks, I'd say the measure is your own test suite.
I'd never trust benchmarks, tbh. Most new model releases do benchmaxxing.
Sad, but fair!