Core Insight about AI
The core understanding is that LLMs do not know facts or understand concepts; instead, they have a statistically inferred world model that associates things like facts and concepts in a manner approximating how they occur in the real world. They hallucinate an answer to a prompt, and that answer is sometimes correct. Since retries are cheap and easy, we can build a process that takes enough shots at a goal to succeed. These same models also have a generative mode and an evaluative mode, and the evaluative mode can be used to check the work product from the generative mode. Those two insights, cheap retries and self-evaluation, drive a process that can create trustable output.
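The retry-until-it-passes idea can be sketched as a small loop. This is a minimal sketch, not the process itself: the function names (`generate`, `evaluate`, `take_enough_shots`), the scoring scale, and the stub bodies are all assumptions for illustration; real versions would call an LLM API in each mode.

```python
from typing import Optional

def generate(prompt: str, attempt: int) -> str:
    """Generative mode: draft an answer (stub; a real version calls an LLM)."""
    return f"candidate {attempt} for: {prompt}"

def evaluate(candidate: str) -> int:
    """Evaluative mode: score a draft 1-5 (stub; a real version calls an LLM)."""
    return 5 if "2" in candidate else 3

def take_enough_shots(prompt: str, n: int, threshold: int = 4) -> Optional[str]:
    """Retry cheaply until some candidate scores at or above the threshold."""
    for attempt in range(n):
        candidate = generate(prompt, attempt)
        if evaluate(candidate) >= threshold:
            return candidate
    return None  # no attempt cleared the bar; escalate or reprompt
```

The key design choice is that the generator never grades itself on the same pass; a separate evaluative call (or a different model entirely) makes the accept/reject decision.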
Development Process
In code development we use a test-driven development model to create proof points for valid code, and we use the AI's evaluative mode to make qualitative assessments of that code. Expert human coders then guide the process and make the final call. Refining this process until it reliably creates shippable code is crucial, and the ultimate point.
There are six phases to this process.
1. PRD
2. Tests
3. Coding
4. Retries, Build & Test
5. Code Management
6. Process Eval
The first step is to develop a precise and complete product requirements document. Think Waterfall with AI support. Start with an initial document, then use an AI to analyze it for completeness and level of detail and to clarify ambiguities. For each feature, component and capability, describe the success criteria for that element; for the end-to-end system, define a set of functionality, performance, security and maintainability criteria. At the end of this step you should have a rich document with high-level descriptions of the system, user stories, the desired architecture, anticipated failure cases and the desired system response to each. The deep research ability of AIs is useful here: feature matrices and use cases can quickly be built from existing similar systems. Have the AI evaluate those too.
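One way to keep the "every element has success criteria" rule checkable is to hold the PRD in a simple structure. This is illustrative only; the class and field names (`PrdElement`, `Prd`, `system_criteria`) are assumptions, not part of the process above.

```python
from dataclasses import dataclass, field

@dataclass
class PrdElement:
    name: str                    # a feature, component, or capability
    description: str
    success_criteria: list[str] = field(default_factory=list)

@dataclass
class Prd:
    elements: list[PrdElement]
    system_criteria: dict[str, list[str]]  # end-to-end criteria by dimension

    def is_complete(self) -> bool:
        """Every element and every system dimension needs at least one criterion."""
        dims = ("functionality", "performance", "security", "maintainability")
        return all(e.success_criteria for e in self.elements) and all(
            self.system_criteria.get(d) for d in dims
        )
```

An AI completeness pass then becomes mechanical: anything that fails `is_complete()` is sent back for more criteria before coding starts.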
In the next phase, create a complete inventory of all the test cases. This includes unit tests, integration tests, end-to-end tests, API tests, performance tests, security tests, UI tests and any build-related tests. Also generate a set of AI prompts for code evaluation, to be run against any code to assess code quality, maintainability, the team's code culture, best practices and the like. Every test should produce either a clear pass/fail or a readiness score (1-5, with 5 high).
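Mixing pass/fail tests and 1-5 readiness scores works best when both land in one uniform record. A minimal sketch, assuming a single record type per result; the class and field names here are illustrative, not prescribed by the process.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestResult:
    name: str
    category: str                    # unit, integration, e2e, api, performance,
                                     # security, ui, build, or ai_eval
    passed: Optional[bool] = None    # set for pass/fail tests
    readiness: Optional[int] = None  # set for AI evaluation prompts: 1-5, 5 high

    def ok(self, threshold: int = 4) -> bool:
        """Treat a readiness score at or above the threshold as a pass."""
        if self.passed is not None:
            return self.passed
        return self.readiness is not None and self.readiness >= threshold
```

With a shared `ok()` predicate, hard tests and AI assessments can gate a build through the same machinery.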
A dashboard of all the test output is also needed to make build verification easy to assess. Include the evaluation summaries from the LLMs, and tune those summaries for utility.
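The dashboard rollup can start as a per-category pass count. A minimal sketch, assuming each result is a plain dict with `"category"` and `"ok"` keys; a real dashboard would also surface the LLM evaluation summaries alongside these numbers.

```python
from collections import defaultdict

def dashboard_summary(results: list[dict]) -> dict[str, str]:
    """Roll raw test results up into a passed/total cell per category."""
    board = defaultdict(lambda: {"passed": 0, "total": 0})
    for r in results:
        cell = board[r["category"]]
        cell["total"] += 1
        cell["passed"] += 1 if r["ok"] else 0
    return {cat: f'{c["passed"]}/{c["total"]}' for cat, c in board.items()}
```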
Create a set of coding prompts for the different modules and elements of the system. These prompts refer to the approved libraries, tools, patterns and coding approach for that module, function or element. Establish best practices for prompting for the features and capabilities desired, using the proof criteria as the target for each prompt. Develop a retry process, including the best set of models to use, and refine these prompting strategies as the process evolves. The team's development culture revolves around this.
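A module prompt can be a template filled from the PRD and the approved toolchain. This is a sketch of one possible template; the wording and fields are assumptions, and each team will evolve its own.

```python
PROMPT_TEMPLATE = """\
Module: {module}
Approved libraries and tools: {libraries}
Coding approach: {approach}
Proof criteria (the target for this prompt):
{criteria}

Write the implementation for {module} following the approach above."""

def build_prompt(module: str, libraries: list[str],
                 approach: str, criteria: list[str]) -> str:
    """Assemble a coding prompt from the approved stack and proof criteria."""
    return PROMPT_TEMPLATE.format(
        module=module,
        libraries=", ".join(libraries),
        approach=approach,
        criteria="\n".join(f"- {c}" for c in criteria),
    )
```

Keeping prompts as versioned templates rather than ad hoc chat messages is what makes the prompting strategy something the team can refine over time.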
At this stage a multi-shot approach is taken: some number N of coding attempts are made, possibly against more than one LLM. That code is then run through the tests for verification and assessment. The assessments produce readiness scores (1-5, with 5 high) per their criteria, and human evaluation then takes place on the top-scoring output. Within the regime of a set of tests, viable code blocks and files filter out of the top of this process. This is still a human-in-the-loop process. For now.
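The multi-shot filter described above can be sketched as: fan the prompt out across models, gate each attempt on the hard tests, score survivors, and surface the top few for human review. The function names and signatures are assumptions for illustration.

```python
def best_candidates(prompt, models, attempts_per_model,
                    run_tests, ai_assess, top_k=3):
    """N shots per model, a pass/fail gate, then readiness-ranked survivors."""
    scored = []
    for model in models:
        for i in range(attempts_per_model):
            code = model(prompt, i)                 # one generative shot
            if not run_tests(code):                 # hard verification gate
                continue
            scored.append((ai_assess(code), code))  # readiness score, 1-5
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored[:top_k]]     # hand these to human eval
```

Note the ordering: deterministic tests run before the AI assessment, so the (cheaper, reliable) gate discards failures before any evaluative scoring is spent on them.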
All code has a provenance state that is tracked. Bug fixes, refactors and enhancements pivot back and forth between human-led and LLM-led approaches depending on the code complexity and on LLM readiness and effectiveness. Watch for time sucks.
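Provenance tracking can be as simple as an enum attached to each artifact. The label set below is a hypothetical starting point; teams should track whatever distinctions matter for their review and liability needs.

```python
from enum import Enum, auto

class Provenance(Enum):
    """Who led the work on this artifact (illustrative labels)."""
    HUMAN_WRITTEN = auto()
    LLM_GENERATED = auto()
    LLM_LED_HUMAN_REVIEWED = auto()
    HUMAN_LED_LLM_ASSISTED = auto()

# Provenance travels with the artifact, e.g. keyed per file:
provenance = {"billing/core.py": Provenance.LLM_LED_HUMAN_REVIEWED}
```

In practice this state would live in repo metadata or commit trailers so the bug-fix pivot between human-led and LLM-led work can be made per file with its history in view.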
The overall process is assessed periodically: the code architecture, file hierarchy and schema, database architecture, code maintainability, prompt best practices and LLM model output are all evaluated. The intention here is to elevate developers into development managers who guide the productive work product of their team, which in this case is the process and the methodology for utilizing a set of LLMs.
