An AI Retrospective on Building an Open-Source Project, KOffset

blog-post

Observations from my experience building koffset with AI models. The project was built from scratch, with the goal of replicating metrics from other tools. I used models to explore algorithms and design, to flesh out ideas, and to aid the software development process.

I encourage developers to use tools that aid in their development process, and AI models are one of those tools.

TL;DR - What I do (now)

  • I use a separate model and application for 1:1 chat questions, to avoid mixing one-off discussions and tangents into my development context.

  • I use models to write code that I expect they have seen before.

  • For code that is new, or at least less common, I instruct models to argue with me and tell me where I’m wrong.

  • I still review all code. I don’t want to put Kinetic Edge’s name, or my own, on open-source software that hasn’t been written or reviewed by a human.

  • I worry about licensing. If an AI model was trained on LGPL code, what does that mean for my Apache 2.0-licensed projects? What about commercial endeavors?

What I learned

  • Using AI reinforces the need for SMEs; it does not replace them. I know this can sound self-serving, but not once did I feel my Kafka knowledge was unnecessary while building this project.

  • The days of pair programming are back, but without the knowledge sharing. I gained a great asset for validating my ideas and development, but who holds that knowledge when I move on? And even if I save the AI context used to develop the software, will it still be relevant when a new model is released?

  • You will need to be less productive in order to get better. Your development process is changing, give yourself time to adapt.

  • Models use old versions of libraries and also mix versions together. Explore multiple models to see which are more current.

    • When it comes to Apache Kafka, whose APIs are evolving rapidly, be ready for a model to give you deprecated code.

    • Always give specific software versions in the context when working with any model. Include more than just the version number, e.g. “use Apache Kafka 4.1, which has no ZooKeeper (obviously)”.
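For example, a version pin in a shared context file might look like the following (the file layout and wording here are illustrative, not taken from the koffset project):

```
## Environment (pin these versions; do not suggest older APIs)
- Apache Kafka 4.1 (KRaft only; never suggest ZooKeeper-based configuration)
- Java 21, Gradle 8.x
```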

  • When a model gets something wrong, take the time to capture that in a rule and add it to your context. For example, the models I used assumed that fetching the latest() offsets for a topic would also provide the max timestamp, when it actually returns -1. If a model gets it wrong the first time, it will get it wrong again.
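A captured rule can be as short as a few lines in the same context file (the heading is illustrative; OffsetSpec.latest() and OffsetSpec.maxTimestamp() are the actual Kafka AdminClient offset specs involved):

```
## Known model mistakes (corrections)
- A listOffsets() call with OffsetSpec.latest() returns timestamp -1,
  NOT the record's max timestamp. Use OffsetSpec.maxTimestamp() when the
  max timestamp is needed.
```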

  • Personally, I was more successful when I used the models to challenge my algorithms and my implementations. This aspect of pair programming was extremely beneficial for me. Instructing the model to challenge your algorithms gives it context from the start and leverages its training to uncover areas worth validating.

  • The “praise” the models give you gets old quickly; set up context to minimize it.

  • Get unit tests in quickly. It is easy to accept new code from a model, only to hit a bug later that went unnoticed and is now difficult to revert.

  • Use revision control effectively. For the same reasons you need unit tests, you want the ability to revert changes. I failed to do this, and the -1 max-timestamp bug came back and cost me a day of lost time.

  • Using AI models requires more discipline (not less).