Testing and reviewing a code base are standard practices in software development. Nowadays, a wide range of testing practices, including AI-based methods, are available to developer teams. Typically, such practices efficiently ensure software quality. However, time and cost pressure may lead to undesired quality waste and, therefore, to rising fear and weak communication within the development team. These well-known phenomena raise the need for improved software development workflows for responsible software delivery that avoid such adverse effects. Test-driven development (TDD) is a popular software development technique which has been hotly debated by high-calibre IT experts for decades. There are many mutual opinions on how and when to apply the TDD technique.
But the promises of the TDD method speak for themselves: reduced bug density, less overengineering, well-understood code requirements, improved code quality, faster development, early treatment of difficult things, higher effort estimation accuracy, and, last but not least, better collaboration and communication within teams. While its effectiveness remains debatable, the TDD method has addressed many challenges and possible obstacles when developing high-quality software. TDD has contributed to and inspired the development of many other helpful methods and tools. In this article, we would like to review the basics of TDD and take a closer look at the recent trends to automate TDD by embracing GenAI.
Test-Driven Development
“Clean code that works is the goal of the test-driven development”, emphasizes Kent Beck, the creator of the TDD programming style. Kent has developed a method to manage the complexity and, as he says, the fear during programming through fine granular development steps. These small steps make it possible to understand the gap between programming decisions, i.e., the current code behavior, the desired quality of the code, and the goal of the software system. Understanding these gaps can be used to close them and reduce the density of bugs. Thus, developing production code is driven by tests, the so-called test-first approach. As such, incremental testing and refactoring trigger the code development. While test-driving the production code development, a trustworthy test basis for the whole software is implemented. Like a contract, the tests are developed incrementally in small steps. They ensure that already existing functionality is not lost and that adding a new feature to the already working code does not break everything.
In TDD, there are mainly two rules:
- Write a failing test before you write any code
- Eliminate duplication
Interestingly enough, these two simple rules have a big impact on the way how the code base and the team evolves. The style also determines how the final software architecture is composed and how the whole team dynamics are. The rules have caused the well-known TDD mantra (red: write a test, green: make it run, refactor: make it right) describing the development steps when applying this programming style.
Consider the following instructions when using that style:
- A test has to be written before developing production code.
- Only as much code may be written in the test method as is required for the test to fail or even for the compilation of this test to fail.
- Only as much code may be implemented as is sufficient for the current test case to be compiled and to pass.
See Figure 2 for the general approach when developing a feature or solving a bug.
As you can see, the most important phase Analyze includes the generation of the so-called test list or to-do list. A to-do list contains test cases, each covering a small increment in feature implementation. Each test case determines how small and ugly the code changes can be to make the new test run. The order of the tests picked up from the to-do list is also crucial since it determines in which order the functional increments are added to the overall feature realization. The tests on the to-do list even influence the resulting software architecture. For effectiveness, the to-do list should be compiled with a lot of smarts: some test cases can be deleted, some are given a higher priority, and new ones are added to the list during the green phase.
Thus, TDD developers must have a strong command of the practice and experience in development. Like a spider weaves its web, the requirements have to be decomposed into small components, distributed into many small tests, and then incrementally implemented in tiny steps, still providing a robust overall feature. Therefore, the experience and knowledge about everything that can go wrong during programming is therefore very valuable, particularly those not explicitly written down in the feature requirements. This is one reason why test-driven development has only limited adoption in the industry: it requires open-minded, experienced developers and effectively communicating teams. Despite these challenges, the advantages of this methodology continue to keep it in high demand.
Can GenAI be efficiently incorporated for TDD and vice versa?
Is writing unit tests still one of the main benefits of AI-based code generators? Of course, AI can generate tests and code simultaneously in one shot. Can it do so iteratively and incrementally, as postulated by TDD, as such enhancing the code generation correctness? Adding AI-generated tests is one of the time savings and recent techniques boost testing performance, in particular, when producing edge cases for tests. Automated code generation can benefit TDD stages because the development is made in small steps, adding tiny functional increments in each iteration.
There are many approaches for improving AI code generation using the test description, both in natural language and as a textual or numerical data example. Such a prompting technique has been widely used for solving programming challenges as it enables developers to verify the correctness of generated code against predefined tests. However, the approach does not directly embody the core TDD idea. The code generation is still produced in one shot, not incrementally using tiny increments, as specified by TDD.
It is also worth mentioning that in TDD, the tests represent a kind of specification. The developers test-drive tiny little developments by breaking the problem into tiny pieces. Developing a to-do list which should contain short and precise test cases requires developers to shortly and precisely formulate their coding intents through tests - which is one of the keys to success when solving programming problems. Putting new test cases on the to-do list is also an iterative process, subtly driving further development. By applying such a programming mindset, a deeper understanding of the programming problem is more likely to be. Can GenAI also produce better coding solutions when operated in TDD mode?
Recent AI method developments show that incorporating tests in the GenAI code generation process enhances the ability of the model to produce a correct functional code. On the other hand, it requires a holistic software engineering approach for developers to continuously interact with the GenAI model, check the correctness of the model’s output, write tests and code, and manage the test cases on the to-do list. The adoption of TDD in an automated code generation process seems to be a promising strategy for shaping and expanding visions of AI evolution in the software development field. Many challenges have to be tackled before then, especially in the research and development of the techniques for supervising the GenAI for code generation and interaction patterns between developers and AI models. The GenAI’s output requires strong supervision of the produced code quality. Also, various risks should not be underestimated when embracing AI. If the entire context is available to AI, it could easily manipulate tests, code, configurations, and even specifications. It can potentially propose buggy solutions to let the test pass.
TDD is effective for solving problems and it has the potential to enhance the quality of the automatically generated code. Having said that, however, we should instead consider how GenAI can be further optimized to gain a deeper and better understanding of programming problems and support communication within teams.
The interested reader is referred to these sources:
- Kent Beck: Test-Driven Development by Example, Addison Wesley, 2003
- Kent Beck: Tidy First? A Personal Exercise in Empirical Software Design, O’Reilly Media, 2023
- Martin Flower: Test-Driven Development, 2023, https://martinfowler.com/bliki/TestDrivenDevelopment.html
- Ian Cooper: “TDD, Where Did It All Go Wrong“, https://www.youtube.com/watch?v=EZ05e7EMOLM
- Mathews, Noble Saji and Nagappan, Meiyappan: Test-Driven Development and LLM-based Code Generation, Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024