Exploring the performance of ChatGPT for numerical solution of ordinary differential equations

Saso Koceski , Natasa Koceska , Limonka Koceva Lazarova ,
Marija Miteva* , Biljana Zlatanovska

Goce Delcev University, Stip (Macedonia)

Received February 2024

Accepted September 2024

Abstract

This study evaluates ChatGPT’s capabilities on a specific numerical analysis problem: solving ordinary differential equations. The methodology developed for this research takes into account the following mathematical abilities (defined according to the National Centre for Education Statistics): Conceptual Understanding, Procedural Knowledge, Problem Solving, and Application in Real‑world Contexts. The outcomes demonstrate that ChatGPT performed very well on the set tasks, and that it also gives promising results for programming code generation, with certain limitations. The effectiveness and accuracy of the answers and solutions obtained from ChatGPT depend on the type of equation, i.e., how complex it is, and on the instructions given to ChatGPT. Further improvement of the machine learning model is also required, along with the ability to explain how the output was obtained.

 

Keywords – Artificial intelligence (AI), Differential equations, Numerical solution, ChatGPT.

To cite this article:

Koceski, S., Koceska, N., Lazarova, L.K., Miteva, M., & Zlatanovska, B. (2025). Exploring the performance of ChatGPT for numerical solution of ordinary differential equations. Journal of Technology and Science Education, 15(1), 18-34. https://doi.org/10.3926/jotse.2709

 

----------

1. Introduction

By definition, the derivative of a function is based on change (a change in the value of the independent variable and the corresponding change in the value of the function). Because all processes in nature, as well as physical phenomena, are based on some kind of change, a wide range of them can be described with differential equations. This makes differential equations widely applicable in different fields (Abed & Khare, 2013; Lazarova, Stojkovikj, Miteva & Stojanova, 2021; Goyal, Kulczycki & Ram, 2022; Koceska & Koceski, 2022; Loginova, 2020; Mishi, Sabari, Amos, Egbogu, Kuje & Ojosipe, 2020; Momoniat, Myers, Banda & Charpin, 2012).

Although methods for the analytical solution of various types of differential equations exist, many numerical methods are also used. These methods are convenient when the differential equations are complex and cannot be solved analytically. Numerical methods are also used to simulate the behavior of physical systems and predict their performance under different conditions. The methods most often used in numerical analysis for solving ordinary differential equations (ODEs) are (Atkinson, Han & Stewart, 2009): Picard’s method, the Taylor series method, Euler’s method, the Runge-Kutta method, Milne’s method, and the Adams-Bashforth method. Each of these methods has its own advantages and limitations. Which method is most suitable depends on the characteristics of the differential equation and the level of accuracy required for the problem at hand.
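To illustrate how the choice of method affects accuracy, the following is a minimal Python sketch comparing Euler’s method with the classical fourth-order Runge-Kutta method. The test problem y' = y, y(0) = 1, the step count, and the function names `euler` and `rk4` are assumptions chosen for illustration; they are not taken from the dataset used in this study.

```python
import math

def euler(f, x0, y0, x_end, n):
    """Explicit Euler: y_{i+1} = y_i + h * f(x_i, y_i)."""
    h = (x_end - x0) / n
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)
        x += h
    return y

def rk4(f, x0, y0, x_end, n):
    """Classical fourth-order Runge-Kutta method."""
    h = (x_end - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

# Illustrative test problem (an assumption, not from the study's dataset):
# y' = y, y(0) = 1, whose exact solution is y(x) = exp(x).
f = lambda x, y: y
exact = math.e
err_euler = abs(euler(f, 0.0, 1.0, 1.0, 10) - exact)
err_rk4 = abs(rk4(f, 0.0, 1.0, 1.0, 10) - exact)
print(f"Euler error: {err_euler:.2e}, RK4 error: {err_rk4:.2e}")
```

With the same ten steps, the Runge-Kutta error is several orders of magnitude smaller than the Euler error, at the cost of four evaluations of f per step instead of one.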

Finding a numerical solution and writing numerical algorithms to solve differential equations is not always easy, and it requires exceptional mathematical as well as programming skills (Rangelov, Dineva & Manolis, 2020). One attempt to overcome this problem is to use the current potential of Natural Language Processing (NLP) and Artificial Intelligence (AI) to generate numerical solutions automatically. For several years now, AI has been used to find solutions to various problems in different fields (Loshkovska & Koceski, 2015) such as medicine (Kocev, Achkoski, Bogatinov, Koceski, Trajkovik, Stevanoski et al., 2018; Stojanov & Koceski, 2014; Trajkovik, Gjorgievska, Koceski & Kulev, 2014), biology (Stojanov, Mileva & Koceski, 2012; Terzievska, Todorov, Miteva, Doneva, Dyankova, Metodieva et al., 2020), engineering (Stamov, 2021), robotics (Koceska, Koceski, Beomonte, Trajkovik & Garcia, 2019; Koceski, Panov, Koceska, Beomonte & Durante, 2014; Koceski & Koceska, 2016), tourism (Koceski & Petrevska, 2012), as well as economy (Koceska & Koceski, 2014).

Among the various AI-based models, ChatGPT (ChatGPT website, n.d.) is nowadays one of the most successful and widely used. It is a language model developed by OpenAI (OpenAI website, n.d.), trained on a large corpus of text data (including mathematical problems and concepts) to follow instructions and to provide detailed and accurate answers to various questions. A main advantage of ChatGPT over other existing language models is its ability to learn and understand instructions given by the user and respond appropriately. ChatGPT actually behaves like a human who is constantly upgrading and learning new things. ChatGPT’s performance has been explored on different tasks, such as text generation (Khalil & Er, 2023; Fu, Teng, Georgaklis, White & Schmidt, 2023; Jiao, Wang, Huang, Wang & Tu, 2023), text classification (Kuzman, Mozetic & Ljubesic, 2023; Amin, Cambria & Schuller, 2023; Huang, Kwak & An, 2023), code generation (Megahed, Chen, Ferris & Jones-Farmer, 2023; Sobania, Briesch, Hanna & Petke, 2023), and quality assessment (Kocmi & Federmann, 2023; Wang, Liang, Meng, Shi, Li, Xu et al., 2023), and in various fields, such as medicine (Tu, Ma & Zhang, 2023; Lederman, Lederman & Verspoor, 2022), healthcare (Nov, Singh & Mann, 2023), physics (West, 2023; Kortemeyer, 2023; Lehnert, 2023), and education (Cotton, Cotton & Shipway, 2023; Tapalova & Zhiyenbayeva, 2022; Kumar & Boulanger, 2020; Choi, Hickman, Monahan & Schwarcz, 2023).

In the field of mathematics, Frieder, Pinchetti, Griffiths, Salvatori, Lukasiewicz, Petersen et al. (2023) tested ChatGPT’s math capabilities on publicly available datasets and on hand-crafted ones (GHOSTS and miniGHOSTS, created using a novel methodology). The results suggest that the mathematical knowledge and skills of an average math graduate student are higher than ChatGPT’s math capabilities. Shakarian, Koyalamudi, Ngu and Mareedu (2023) explored ChatGPT’s capabilities on Math Word Problems (MWPs) and concluded that they vary significantly depending on the given task and on the requirement to demonstrate how the solution to that task was reached. Pardos and Bhandary (2023) used the Open Adaptive Tutoring system (OATutor) to examine the efficacy of answers generated by ChatGPT in learning algebra. According to that research, 70% of the answers passed manual verification and can contribute to positive learning outcomes. However, these scores were lower than the scores of human prompts. Dao and Le (2023) explored ChatGPT’s math capabilities in answering multiple-choice questions used for the Vietnamese National High School Graduation Exam (VNHSGE). Questions were prepared for different subjects and different difficulty levels. The results show that ChatGPT performed best on the SAT Math competition (with a 70% success rate), followed by VNHSGE mathematics (58.8%). For the other exams the success rate, according to the authors, was lower. Borji (2023) investigates ChatGPT’s failures, grouped into several categories including mathematics, logic, and reasoning, among others.

Although exploring ChatGPT’s capabilities and testing its application in various fields are the focus of many researchers nowadays, to the best of the authors’ knowledge, no prior study exploring ChatGPT’s capabilities in finding numerical solutions of differential equations has been published. Koceska, Koceski, Lazarova, Miteva and Zlatanovska (2023) conducted initial research to test ChatGPT’s performance in numerical analysis, with an emphasis on determining numerical solutions of ODEs. This study presents a methodology that evaluates, from several aspects, ChatGPT’s performance in determining numerical solutions of first and second order ODEs. It considers the free version of the ChatGPT chatbot and the existing theories of learning and developing mathematical skills.

Namely, our methodology starts from the basic presumption that knowledge development is not a linear process; rather, it usually includes several phases, among which are Conceptual Understanding, Procedural Knowledge, Problem Solving, and Application in Real-world Contexts. Each of these phases contributes to the development of specific abilities and skills. The proposed methodology evaluates ChatGPT’s developed knowledge about the numerical solution of ordinary differential equations of first and second order, using carefully tailored questions based on clearly defined indicators.

In the rest of this paper the methodology and evaluation process are described, and the obtained results are presented and discussed. 

2. Methodology

The development of mathematical knowledge refers to the process by which individuals acquire and advance their understanding of mathematical concepts, principles, and problem-solving techniques. Comprehending any specific mathematical topic therefore usually involves several phases, among which are:

  • Conceptual Understanding: It involves grasping fundamental mathematical concepts and understanding their definitions. 

  • Procedural Knowledge: This aspect involves the ability to perform mathematical operations, algorithms, and procedures accurately and efficiently.  

  • Problem Solving: Mathematical knowledge is not merely about memorizing formulas or procedures but also about using that knowledge to evaluate a solution of different problems.  

  • Application in Real-world Contexts: The ultimate goal of mathematical learning is to apply acquired knowledge and skills to real-world problems. It usually involves critical thinking i.e., making conjectures, logical reasoning and justifying claims. 

Building on existing theories for evaluating the mathematical knowledge developed by students, we developed a methodology for evaluating ChatGPT’s ability to numerically solve ordinary differential equations of first and second order. The proposed methodology challenges ChatGPT on the given topic using carefully tailored prompts, trying to reveal its maturity in each of the previously mentioned phases.

2.1. Conceptual Understanding and Procedural Knowledge

To evaluate conceptual understanding and procedural knowledge, a test was built based on the indicators of these two abilities given in Table 1.

The test for evaluation of conceptual understanding was composed of 25 multiple-choice and essay-type questions, while the test for evaluation of procedural knowledge contained 20 multiple-choice questions (MCQs). Each MCQ consisted of a stem and options. If auxiliary information was needed, it was usually included in the stem. The biggest constraint in defining the questions was that they had to be expressed using text only (without images or any other graphical representations). Moreover, to avoid mismatches and problems with more complicated formulas, we used the ChatGPT Equation Renderer extension for Chrome, which enables correct display of equations in ChatGPT using LaTeX notation. One of the most challenging tasks, however, was formulating multiple-choice options that include one correct answer and multiple distractors. To define effective distractors, we applied domain knowledge and a common methodology that selects misconceptions or errors in thinking, reasoning, and problem solving identified by evaluating past student answers to these or similar problems.

 

| No. | Conceptual understanding indicators | Procedural understanding indicators |
|---|---|---|
| 1. | Ability to restate the concept of numerical methods for solving ordinary differential equations | Ability to use and utilize certain numerical methods |
| 2. | Ability to recognize, distinguish and classify the methods and approaches | Ability to apply numerical methods in particular situations |

Table 1. Conceptual and procedural test indicators

The answers generated by ChatGPT were evaluated independently by three experts on a 5-point Likert scale, where 5 is the highest grade. The final mark for every problem was obtained as the average of their grades. Thus, not only the correctly selected option but also the rational explanation behind it was evaluated.
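The averaging step can be sketched in a few lines of Python. The function name `final_mark` and the example grades are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
from statistics import mean

def final_mark(grades):
    """Average the independent expert grades for one answer (1-5 scale)."""
    if not all(1 <= g <= 5 for g in grades):
        raise ValueError("each grade must lie between 1 and 5")
    return mean(grades)

# Hypothetical grades from three experts for a single answer.
example_grades = [5, 4, 4]
print(round(final_mark(example_grades), 2))
```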

2.2. Problem Solving Capacities

The ability to solve problems enables individuals to analyse complex situations, devise strategies, and arrive at logical solutions. In mathematics, problem-solving goes beyond mere calculation; it involves critical thinking, creativity, and perseverance. To evaluate ChatGPT’s skills in the numerical solution of ODEs, a specific methodology for constructing the problem dataset was developed and applied. The steps of the research process are shown in Figure 1.

 

Figure 1. Flowchart of the evaluation methodology

For the purpose of this research, we extracted ordinary differential equations of first and second order from various academic books and textbooks (Lebl, 2022; Strang, 2022; Nagy, 2021; Nagle, Edward & Snider, 2012; Tenenbaum & Pollard, 1963; Bayen, Kong & Siauw, 2020; Johansson, 2015), and also wrote our own equations, in order to create the initial dataset. The three experts reviewed this dataset and reduced it to 100 equations (50 first order and 50 second order).

After creating the initial dataset, we started the conversation with ChatGPT. The answers generated by ChatGPT were evaluated independently by our experts on a scale from 1 to 5, where 5 is the highest grade. The final mark for every problem was obtained as the average of their grades.

2.3. Capability to Apply the Knowledge in Real-World Contexts

Contemporary theories of learning state that the process of learning goes far beyond the simple acquisition of information. It aims at developing and nurturing higher-level thinking abilities, that is, the capacity to go beyond remembering facts and to obtain solutions to problems by analysing, evaluating, and creating. According to Polya (2004), problem-solving is the highest level of higher-order thinking ability, combining creative thinking with critical thinking. The capacity to apply the gained knowledge to solve a variety of mathematical problems is usually considered one of the main indicators of the quality of learning. Solving real-world problems, however, goes one step further, as it requires establishing strong mathematical connections and using an appropriate mathematical concept or apparatus that fits the conditions given in the problem. Therefore, to evaluate ChatGPT’s capabilities in solving real-life problems using numerical methods for ODEs, we chose the problem described by Newton’s law of cooling.

Newton’s law of cooling is a physical law stating that when an object with temperature T(t) at time t is placed in surroundings with temperature Ts, the rate of change of T at time t is proportional to T(t)−Ts. This process can therefore be described with the following differential equation:

 

$$\frac{dT}{dt} = -k\,\bigl(T(t) - T_s\bigr) \qquad (1)$$

The constant k is known as the decay constant. In this context k>0, since the temperature of the object must decrease if T>Ts and increase if T<Ts.
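As a minimal illustration of how this law can be treated numerically, the Python sketch below applies explicit Euler steps to dT/dt = −k(T − Ts), the sign convention implied by k>0, and compares the result with the closed-form solution T(t) = Ts + (T0 − Ts)e^(−kt). The initial temperature T0, the values of Ts and k, the time horizon, and all function names are hypothetical choices made for this example; they are not the values used in the conversation with ChatGPT.

```python
import math

def cooling_rhs(T, Ts, k):
    """Right-hand side of Newton's law of cooling: dT/dt = -k (T - Ts)."""
    return -k * (T - Ts)

def euler_cooling(T0, Ts, k, t_end, n):
    """Approximate T(t_end) with n explicit Euler steps starting from T(0) = T0."""
    h = t_end / n
    T = T0
    for _ in range(n):
        T += h * cooling_rhs(T, Ts, k)
    return T

def exact_cooling(T0, Ts, k, t):
    """Closed-form solution T(t) = Ts + (T0 - Ts) * exp(-k t)."""
    return Ts + (T0 - Ts) * math.exp(-k * t)

# Hypothetical values: object at 90 degrees, surroundings at 20 degrees,
# decay constant k = 0.1 per minute, temperature sought after 10 minutes.
T0, Ts, k, t_end = 90.0, 20.0, 0.1, 10.0
approx = euler_cooling(T0, Ts, k, t_end, 1000)
exact = exact_cooling(T0, Ts, k, t_end)
print(f"Euler: {approx:.3f}, exact: {exact:.3f}")
```

Because the exact solution is known for this model, it gives a direct reference against which any numerical answer can be checked.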

To solve real-world problems described by this law, it is necessary to establish strong mathematical connections and to pass through a series of processes such as understanding the problem, modelling the problem, proposing methodologies or solutions, application, and drawing conclusions.

The following methodology was applied: 1) clear presentation of the problem, 2) formulating the hypothesis, 3) proposing solution(s), 4) testing the solution, 5) drawing conclusions.

A simple conversation with ChatGPT was conducted. Every answer was evaluated independently by three human experts using the following Likert scoring guidelines (1-5, with 1 the lowest and 5 the highest value):

| Score | Procedural knowledge indicators |
|---|---|
| 1 | No answer |
| 2 | Limited understanding of the problem, inability to recognize the components, connections are not clear, incomplete solutions |
| 3 | Fair understanding of the problem, able to recognize the components, but with less understanding, connections are not clear, incomplete solutions |
| 4 | Complete understanding of the problem, able to recognize the components with clear understanding, connections are not complete, incomplete solutions |
| 5 | Complete understanding of the problem, able to recognize the components with complete understanding, connections are fully established, solutions are complete and systematic |

Table 2. List of scores and corresponding criteria for evaluation of real-world problem-solving capabilities

3. Evaluation

The entire evaluation process was conducted using the free research preview version of ChatGPT (release of July 20, 2023), manually through ChatGPT’s user interface in the Chrome browser.

All the questions aimed at evaluating ChatGPT’s conceptual understanding and procedural knowledge were posed to ChatGPT in a single session. The stem and the options for each question were submitted as a prompt to ChatGPT. The answers to multiple-choice questions obtained from ChatGPT contained, besides the chosen option, a short rational explanation, while the essay questions were answered in a narrative format. Part of the conversation can be seen in the following figures:

 

Figure 2. Part of the conversation for evaluation of procedural knowledge

 

Figure 3. Sample conversation for evaluation of conceptual knowledge – multi-choice question

For evaluation of problem-solving capabilities, each of the selected problems was the subject of a separate conversation. Each conversation with ChatGPT consisted of the following parts:

  • A prompt containing the definition of the problem, i.e., the differential equation, the initial conditions, and the particular numerical method to be used (Picard’s method, the Taylor series method, Euler’s method, Milne’s method, the Runge-Kutta method or the Adams-Bashforth method), as well as a requirement to find the value of the function at a given point; 

  • A prompt with a requirement for generation of programming code corresponding to the solution. 
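For orientation, the following Python sketch shows the general shape of code that such a pair of prompts asks for: a second order equation is rewritten as a first order system and advanced with the classical Runge-Kutta method to a given point. It is a hand-written illustration, not output generated by ChatGPT, and the test problem y'' = −y, y(0) = 0, y'(0) = 1, the target point t = 1, and the names `rk4_system` and `oscillator` are assumptions made for this example.

```python
import math

def rk4_system(f, t0, y0, t_end, n):
    """Classical RK4 for a system y' = f(t, y), with y given as a list of floats."""
    h = (t_end - t0) / n
    t, y = t0, list(y0)
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def oscillator(t, state):
    """Hypothetical problem y'' = -y written as the system y' = v, v' = -y."""
    y, v = state
    return [v, -y]

# Initial conditions y(0) = 0, y'(0) = 1; the exact solution is y(t) = sin(t).
y_at_1, v_at_1 = rk4_system(oscillator, 0.0, [0.0, 1.0], 1.0, 20)
print(f"y(1) ~ {y_at_1:.6f}, exact sin(1) = {math.sin(1.0):.6f}")
```

Choosing a problem with a known closed-form solution makes it straightforward to check the numerical value returned at the requested point.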

All the conversations with ChatGPT were conducted within the same session (one after another). Regardless of the method used, ChatGPT was asked to generate Python code. Once the code was generated, it was transferred by the user to Google Colaboratory in order to be evaluated. Part of the conversation can be seen in the following figures: