ChatGPT × Stack Overflow as coding assistant

There is a study (arXiv:2308.02312v3) on ChatGPT as a software coding assistant compared to Stack Overflow. I write code every day, and although I have used ChatGPT for this purpose in the past, I am biased towards believing that the humans answering on Stack Overflow do it better. Apparently the study findings below confirm my belief. But maybe, just maybe, LLM tools can be specialized or improved until they beat the amazing Stack Overflow. In any case, I would never rely on an LLM for a complete app refactor or for overall design: those require architectural knowledge, real-life experience, lessons learned, contextual awareness, and strategic vision, the multidisciplinary traits of grown-up, real, experienced human software engineers.

The findings:

1️⃣ ChatGPT’s answers are incorrect about half of the time; they are also significantly longer and often inconsistent with human answers. However, ChatGPT’s answers are very comprehensive and successfully cover all aspects of the question.

2️⃣ Many answers are incorrect because ChatGPT fails to understand the underlying context of the question being asked. It makes fewer factual errors than conceptual ones.

3️⃣ ChatGPT rarely makes syntax errors in code answers. The majority of code errors come from applying wrong logic or from using non-existent or wrong APIs, libraries, or functions (see the sketch after this list).

4️⃣ The popularity, type, and recency of the SO questions affect the correctness and quality of ChatGPT answers. Answers to more popular and older posts are less often incorrect and more verbose. Debugging answers are more inconsistent but less verbose, while conceptual and how-to answers are the most verbose.

5️⃣ Compared to SO answers, ChatGPT answers are more formal, express more analytic thinking, show more effort towards achieving goals, and exhibit less negative emotion.

6️⃣ ChatGPT answers convey significantly more positive sentiment than SO answers.

7️⃣ Participants can correctly discern ChatGPT answers from SO answers 80.75% of the time. They look for factors such as formal language, structured writing, length of answers, or unusual errors to decide whether an answer is generated by ChatGPT.

8️⃣ Users overlook incorrect information in ChatGPT answers 39.34% of the time because of the comprehensive, well-articulated, and humanoid insights those answers offer.

9️⃣ Participants preferred SO answers over ChatGPT answers 65.18% of the time, finding them more concise and useful. Among the reasons given for preferring SO were correctness, conciseness, and casual, spontaneous language.
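
To make finding 3️⃣ concrete, here is a small sketch of the kind of error it describes. This is my own hypothetical illustration, not an example taken from the study: the code is syntactically valid, yet it calls a pandas method that does not exist.

import pandas as pd

df = pd.DataFrame({"user": ["ann", "bob", "ann"], "score": [10, 7, 10]})

# Plausible-looking but non-existent pandas API; raises AttributeError at runtime:
# deduped = df.remove_duplicates()

# The method that actually exists for this task:
deduped = df.drop_duplicates()
print(deduped)

The snippet would sail through any syntax check, which is exactly why this class of error is easy to miss until the code actually runs.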

I learned about this study from a post by Denis Baltor.
