Pair Programming with ChatGPT

I can't say that I've always wanted to pair program with a computer. Still, when the opportunity to work on a project with ChatGPT arose, I was eager to try it. Why settle for writing code on a computer when you can make the computer write code for you?

At first, I didn't have high expectations of the service, but after asking it a few questions, I was impressed by its responses and eager to put it through more rigorous testing. I was planning to build a server composed of microservices for handling some routine tasks that I do daily, and working on this project would give me an excellent opportunity to see how capable ChatGPT would be in a real-world setting.

Working with ChatGPT has been surprising for both good and bad reasons. I wanted to share my experience to help anyone looking to use this service for assistance in programming. Would I recommend ChatGPT to someone looking for help with coding? Yes, Maybe, and No, depending on what you are doing with it. Keep reading to understand better where ChatGPT excels and where it can be problematic.

What is it?

If you are unfamiliar with the service, ChatGPT is a chatbot designed by OpenAI. It is built on top of the GPT-3 family of language models, released in 2020 and designed to use deep learning to produce human-like text. In the past, GPT models have been used to create stories and writings convincing enough to appear to be written by a human, even though they are artificially generated. ChatGPT takes this underlying deep learning model and adds an interface where the user can give it prompts, such as questions or statements, which it processes and responds to. Through both unsupervised and supervised learning, ChatGPT has been trained to react to human prompts in a way that closely resembles how a human would respond. Since its release, people have been asking ChatGPT all sorts of questions, including how to write code, and it has shown a solid ability to answer them accurately. I wanted to test ChatGPT by seeing if I could develop my latest project by only asking it questions and using its code.

What was the project?

I want to discuss the project and why I was drawn to use ChatGPT. The overall goal is to build an application from a collection of microservices to handle some tasks I do every day. I have a one-year-old daughter at home, so much of my time is spent measuring formula, keeping track of the food and drink she ingests, and planning her mealtimes and overall schedule. I have a variety of scripts in different formats for some of these tasks, but I would like to build and deploy a server at home that can handle all of them in one centralized location. I chose the microservices approach because I want to be able to add and remove functionality as my daughter grows up, and microservices make it a lot easier to enable or disable features without impacting the overall service architecture.

I wanted to build the initial services in Flask for two reasons. The first is that I am very familiar with Python, so it would be easier to validate the responses ChatGPT gave me and spot any inconsistencies or errors right away. The second is that, although I have had a cursory introduction to Flask, my area of expertise lies more in Django, and I have always wanted to spend more time building lightweight web servers in Flask. Since I have a lot to learn, the project would give me a great perspective on how helpful ChatGPT could be to users unfamiliar with a particular language or framework.

Along similar lines, I have decent experience with Docker and Docker Compose, but I needed help figuring out how to set up multiple Dockerfiles within one multi-container application. ChatGPT could help guide me through this more complex topic, which I needed to familiarize myself with.

How did ChatGPT help?

While building this application, ChatGPT had a lot of valuable features and functionality that helped speed up my development process. First and foremost, ChatGPT is excellent at providing boilerplate for setting up basic configurations on both the Flask and Docker sides of the application. Being able to ask two questions and have the baseline code for my Flask app, the Dockerfile for that app, and the Docker Compose file that includes it was miraculous. I might have come across something similar if I had found a single tutorial that covered all of these topics, but that one perfect tutorial usually doesn't exist. I might have spent up to an hour piecing together different tutorials to arrive at what ChatGPT provided me within seconds.

Moreover, it was super helpful that ChatGPT provided practical walkthroughs of the code it produced, so I could read the code alongside the explanation and get a quick, straightforward answer about precisely what the code is supposed to do and how the different files interact. If I had a question about something ChatGPT wrote, I could ask about a function or concept, and it would explain what was happening in context, suggest alternatives, or provide more examples of approaches to the same problem. For someone less familiar with Flask and the more complex aspects of Docker, it was great to be refreshed on what I did know and have the newer or more complicated things explained to me at the same time.

What did it do well (the "good")?

ChatGPT has some fantastic features that make it a very appealing programming partner. Because of the thread-based nature of the conversation, ChatGPT can keep a solid understanding of what we are discussing without requiring me to repeat myself. For example, since I was working in Flask and asking questions about Flask throughout this thread, it knew that I would expect responses that apply to building a Flask application. When I asked a question that didn't include context, such as "How do I set multiple HTTP methods for one route?", it already knew to give me a response relevant to the topic we were discussing. Being able to follow the flow of conversation made interacting with ChatGPT a lot smoother and more like interacting with a human being.
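For context, the kind of answer that question produces looks something like the sketch below. The route name and payload are hypothetical (not from my project); it simply shows the standard Flask pattern of listing several HTTP methods on one route and branching on request.method:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/feedings", methods=["GET", "POST"])
def feedings():
    if request.method == "POST":
        # Record a new feeding from the posted JSON body.
        entry = request.get_json()
        return jsonify({"status": "created", "entry": entry}), 201
    # Default GET branch: return the current list (stubbed here).
    return jsonify({"feedings": []})
```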

ChatGPT can also be even better than interacting with human beings because the threads are persistent and stay open over days, weeks, or even years. This means that you could ask a series of questions about Flask, leave for a month, and come back and ask a question like the one I presented above, and ChatGPT would not miss a beat; it would answer in context with all of the previous chat history fresh in its memory.

ChatGPT also excels in its ability to provide documentation around its code. This benefits someone with less experience in the field for two reasons. First, it makes the code easier to understand when ChatGPT writes it, so the user knows what is happening and what they may need to change. Second, it makes it easier for the user to understand what the code is doing down the road; if the user returns to this code in a month without having worked in Flask, they may well have forgotten the code or why it was written. Writing good documentation is an excellent habit for all developers, human or otherwise, so it's great that ChatGPT tends to do it by default. Not everything was clearly documented, but it was much more consistent than I tend to be when documenting my code.

Finally, I was very impressed by ChatGPT's ability to write code that follows the structure and syntax Flask applications generally follow. For example, ChatGPT generally recommended that I use Flask's jsonify function to prepare JSON objects for my responses; since Flask doesn't automatically serialize and sanitize your data, it is helpful to always jsonify the data before providing it as content in the response. It could have suggested returning a Python dictionary as the JSON content for the response, since Flask can typically handle standard data types within a Python dictionary and provide it as a JSON response. Instead, it recommended jsonify, which is a safer way to handle the data being returned and accounts for serializing database models straight from Python objects into valid JSON, preventing any implicit serialization issues. ChatGPT had a good handle on some best practices for building standard, simple Flask apps, which was very helpful for someone like me coming from a different web server paradigm.
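As a sketch of the pattern (the route and data here are hypothetical, not from my project), the difference between relying on implicit conversion and using jsonify comes down to a single call:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/schedule")
def schedule():
    meals = [{"time": "08:00", "item": "formula"},
             {"time": "12:00", "item": "puree"}]
    # jsonify serializes the payload and sets the application/json
    # mimetype explicitly, instead of relying on Flask's implicit
    # conversion of returned dictionaries.
    return jsonify(meals=meals)
```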

What were some of the hiccups (the "bad")?

Although ChatGPT had a lot of features and functionality that impressed me, there were many points where it fell short of impressive and even came close to frustrating. One challenge was that it took a lot of work to communicate context to ChatGPT. It was hard to describe my specific file structure to the service, which meant it couldn't advise me on import errors and where to store files. There were even some instances, such as when describing where to store my .env files, where ChatGPT would tell me to create a .env file in one directory, but then the next code snippet would be written under the assumption that I had placed the .env file in a completely different directory. When it suggested creating multiple files that referenced each other, sometimes it would be clear from the imports that they were located in the same directory, but other times it would just give me the file contents and assume I knew where each file should go. At one point, it even caused my application not to work: it wrote in my Dockerfile that I should run the Gunicorn command pointed at my application's entry point to start my production server, but I had not yet created that file, and when I did, I inadvertently placed it in the wrong directory, causing my entire run command to fail and my Docker container to crash. When I asked ChatGPT what the issue was, it gave me general advice about how to fix import errors, but what I really needed was a way to show it my file tree and have it point out the misplaced files.

ChatGPT also tended to play fast and loose with variable names and imports, which is not a big deal when talking about code snippets and abstract questions but can cause significant, hard-to-debug issues when applied to a project. In some responses, it would import the entire datetime library, while in others it would use "from datetime import datetime", and these conflicting imports can cause scripts to crash. In other responses, it would use the jsonify function but forget to import it, so the code would give an import error instead of working. Sometimes it was clear that the response was just an abstract snippet, whereas other times it would be an entire fully written file; but in some instances, it was hard to tell which type of response it was providing, as it wouldn't say whether it had written the entire file contents or just one piece. With variable names, many of the responses used abstract names like "data" or "result", which work well in small snippets but are not descriptive enough to capture what is coming back and can often overlap with other code snippets. In one example, it named a variable "data", representing the response content from the request, but then ran that "data" through a function and named the return value "data" as well, overwriting the original variable. In practice, this can work fine, but for a less experienced developer trying to debug the code from ChatGPT, this can cause confusion and frustration that could be avoided by simply choosing more descriptive variable names.

Finally, although ChatGPT initially impressed me with its knowledge of Flask syntax and conventions, it found ways to disappoint me in areas I was more familiar with. It routinely broke PEP 8 style conventions when writing Python code and tended to favor JavaScript-based forms when simple HTML forms were sufficient. It also has limited knowledge of anything after 2021, so it was working with outdated information, such as not knowing that the match statement had been added to Python, and using var instead of const or let in JavaScript. In one particular instance, it led me toward an incorrect implementation: it told me that, in my Docker configuration, I needed to expose the ports for all of my microservices to the host machine and then have each service communicate through the host machine. The proper solution for my use case is to use the service name in the URL and have each service communicate directly with the others. Trying to send requests through the host machine is convoluted at best and failed to work when I tried to implement it. I would have spent hours trying to implement the solution ChatGPT was leading me toward if I had not spoken with a colleague about the best approach and had him walk me through how the Docker services should actually be set up. (Thanks, Evan Matizza!)
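For anyone hitting the same wall: on Docker Compose's default network, containers can reach each other by service name, because Compose provides DNS resolution for each service. A minimal sketch of the URL-building side, with "formula-service" as a hypothetical service name from a compose file:

```python
def service_url(service_name: str, path: str, port: int = 5000) -> str:
    # On Compose's default network, the service name resolves to the
    # container's address, so container-to-container calls never need
    # to round-trip through ports published on the host machine.
    return f"http://{service_name}:{port}/{path.lstrip('/')}"

# e.g. requests.get(service_url("formula-service", "/feedings"))
```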

What were the major issues (the "ugly")?

As much as I liked working with ChatGPT, some major issues came up during the project. The most severe is that ChatGPT can be confidently incorrect and still logically explain its answer even though it's completely wrong. For example, when I was working on the service for preparing formula, I asked it to write a function that would calculate how much powder and water I should mix. It explained that since one scoop is 20 calories, I could use its recipe to calculate the correct calorie density and the amount of water to mix with the powder. It did not ask any clarifying questions or require further input; it simply stated that its recipe would work. The logic made sense, but I made sure to double-check the math against other accurate sources on the calorie density of formula and found that ChatGPT had errors in its calculation. One scoop of formula contains 45 calories, so the function ChatGPT created would have produced more than double the calorie density I needed. In the past, feeding my baby formula that was slightly too dense caused significant constipation and other digestive problems; if I had blindly followed the recipe ChatGPT provided, I could have seriously harmed my baby. When working with ChatGPT, it's important to remember that it can sound 100% confident in its answers even when there are severe calculation or logical flaws, so verifying the answers with other sources you trust is essential.
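To make the lesson concrete, here is a sketch of the kind of function involved. It is purely illustrative (not my actual service code, and certainly not feeding advice); the important design choice is that the calories-per-scoop value is a required parameter with no default, because that is exactly the constant ChatGPT got wrong and it must come from the product label:

```python
def scoops_for_volume(water_ml: float,
                      target_cal_per_ml: float,
                      cal_per_scoop: float) -> float:
    # Total calories required for the target density, divided by the
    # calories one scoop provides. cal_per_scoop is deliberately a
    # required argument: it must be verified against the label, not
    # assumed by the code (or by a chatbot).
    return (water_ml * target_cal_per_ml) / cal_per_scoop

# With ChatGPT's claimed 20 cal/scoop, 120 ml at 0.67 cal/ml comes out
# to about 4 scoops; with the label's 45 cal/scoop, under 2 scoops.
# Trusting the wrong constant more than doubles the calorie density.
```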

Besides that serious issue, there were several times when ChatGPT gave me an answer that was not entirely wrong but was not the right way to approach the problem, because of a lack of context. For example, I received an error when I tried to save a string into a database field expecting a bytes object. When I passed the error message along to ChatGPT, it explained that I needed to convert my string into a bytes object and provided code to do so. However, after I looked at the setup and thought about the problem, I realized that the field in my database was set up wrong and that I had intended to make it a string field. So I changed the field type, and everything worked. ChatGPT gave me an answer that would have solved the immediate error, but it wasn't the correct answer overall because it lacked the context of what I had done and my true goals.

I also had other instances where context was lacking, such as when it told me that I should return the database entry directly from my function, when in the context of my application that function was supposed to return a JSON response; I needed to return a response object with the database entry jsonified and attached. In another case, I asked it to make a form that would delete an entry. It made a form that sends a POST request to my endpoint, but then, in the suggested code for that endpoint, it set it up to accept only DELETE requests. If I just copy-pasted this code into my app, it would not work, and I would have to spend time tearing apart a suggestion that ChatGPT provided in one cohesive response, because that one response was not internally consistent. Even when directly implementing code that ChatGPT suggested, it would sometimes just not work, requiring the user to figure out what ChatGPT's mistake was, which can take more knowledge than writing the code from scratch would have.

Finally, I find it difficult to recommend ChatGPT to users who lack the context to understand the responses it provides. I was fortunate to have a solid foundation in Python and a decent understanding of Flask before starting this project, because it takes at least a fundamental understanding of how the language works to piece together the different responses that ChatGPT provides. If you ask it to solve one problem, it might provide all the code necessary for that one solution; but as your application grows and you need to combine code from different responses, you need to understand what to keep, what to delete, and what to replace. Many of the responses repeated boilerplate or general setup code, but slightly differently or with different variable names; if you combined both responses, you might end up duplicating steps at best or breaking your existing code at worst.

In one of my inquiries, I logically pieced together the code from multiple responses, but the code stopped working and reported configuration issues with the database. I asked ChatGPT for help in a few different ways; however, all of its responses just told me to delete the database and call db.create_all() to reset it, which did not work. Then I did a simple Google search for my issue. The first result was a forum post explaining that you need to call db.create_all() only after you import all of your models; otherwise, the database will not be created properly. Once I refactored my code based on that suggestion, it worked right away. I found other forum posts and articles from 2019 that discussed my exact problem, so I was surprised that ChatGPT could not figure out how to solve it. Whether it was a lack of context, a lack of understanding, or a lack of source material, it was clear that there are some problems that ChatGPT won't know how to solve.
It is up to the user to diagnose and understand the problem independently, even when using the code ChatGPT provided.

Would you recommend ChatGPT for writing code?

Considering all the good, bad, and ugly examples above, I have three answers to whether I recommend using ChatGPT for writing code.

Would I recommend ChatGPT for conceptual questions or questions about overarching themes? Yes!

ChatGPT is excellent at explaining overall concepts with clear words and examples to back up what it is saying. Someone with good general knowledge who wants to learn more can quickly get their questions answered by material that is on-topic and to the point. If you have questions about translating a concept from one language to another or doing everyday tasks within a language, I have a decent amount of confidence in ChatGPT's ability to answer them. Furthermore, questions you would expect to have many answers online, such as questions about common algorithms or programming functionality, should be safe to ask ChatGPT. I always recommend double-checking your answers with verified sources online, but it can be a great place to get quick and straightforward answers to general, conceptual questions or a starting point for research and learning.

Would I recommend ChatGPT for code examples and specific implementations? Maybe.

The code examples I asked for during this project were helpful, and I could either use the entire snippet and modify it myself or take the parts I needed and leave the rest behind. It was beneficial for boilerplate or initial examples, or when I needed to write a file that I knew would contain some standard pieces with only a few settings to tweak. It felt lacking in areas where the technology is newer or tends to change quickly, such as new language features in Python or JavaScript in general. As a rule of thumb, if I would worry about something having changed in the past 3-5 years, I would take anything ChatGPT said about it with a grain of salt. OpenAI may include more recent data as the service evolves, but even so, you're less likely to have an abundance of source material to draw on if you're asking questions about a language feature that only recently came out. Furthermore, it can make specific choices in design patterns and library usage without telling you that there are alternatives or why you might choose one. It is up to you to spot places where there might be other choices to make and ask ChatGPT about the alternatives.

Would I recommend ChatGPT for solving problems from start to finish? No!

Although ChatGPT strongly believes that it can solve entire problems, there were a few instances in this project where that belief proved false. Furthermore, it may be making assumptions about your goals or the technology that are not true, and it will not tell you about those assumptions when it boldly proclaims that it has solved the problem. ChatGPT is missing the critical ability to self-reflect on its answers and explain nuances such as side effects, further questions for the user, and any indication of self-doubt. These skills are critical for any software developer who wishes to provide correct answers; if you want to be sure, you have to seek out the uncertainty and ensure that all underlying assumptions and foundational knowledge are at least identified, if not challenged and verified.

Overall, ChatGPT is a very powerful tool and can be extremely helpful in the software development process, but it requires a keen eye and strong attention to detail to be used effectively. Although the temptation is there to just ask it questions and take its answers at face value, ChatGPT users must remain diligent and be sure to interpret and verify all responses before putting the code into use. I am very impressed by the current state of the service, and I look forward to seeing it improve and to all the great things you can create when working with ChatGPT!

Note: Since working on this project, the team at OpenAI has been improving ChatGPT's ability to ask questions and get clarification around code issues. Although I have not seen this feature personally, I am excited to see how ChatGPT evolves to better address the concerns mentioned above. There may be a part 2 to this article as new features emerge!
