
Authors: Vasil Pihorovich, Andrii Samarskyi


Urgency of the research. The rapid progress of information technologies over the past few decades has significantly affected many aspects of society. Particular attention is drawn to the growing use of artificial intelligence (AI), which is increasingly applied in various fields of science, education and technology, from medicine to manufacturing automation. This challenges philosophers to study artificial intelligence in depth and to develop recommendations regarding its use.

Target setting. One of the key questions that arises when considering the capabilities of artificial intelligence is whether it can adhere to the rules of formal logic. Formal logic is the foundation of rational thinking: it is used to formalize thought processes and to prove rules and laws. It is therefore essential to test how well artificial intelligence performs tasks that require logical thinking.

Actual scientific researches and issues analysis. Research on the compliance of large language models (LLMs) with the rules of formal logic has been conducted by groups of scientists led by R. Nishant, Y. Wan, M. Amirizaniani, T. Liu, and H. Chi. The work of S. Wang demonstrates that AI's mastery of the fundamental rules of inference still falls short of human capabilities and suggests approaches for testing and enhancing AI's logical skills. E. Larson shows why artificial intelligence cannot understand texts, although he does not specifically investigate its adherence to the rules of formal logic. B. Lin explores the use of mathematical methods to improve the accuracy of artificial intelligence. This research indicates that artificial intelligence can be trained to follow the rules of formal logic, but the process may require significant effort and resources.

The research objective. The task of this article is to test the artificial intelligence service at https://deepai.org/chat for its ability to adhere to the rules of formal logic, specifically to solve problems involving modus ponens and modus tollens. Clarifying this aspect will give a better understanding of the capabilities of language-based artificial intelligence and help users obtain more accurate results.

The statement of basic materials. The article is dedicated to analyzing logical inferences, particularly the rules of modus ponens and modus tollens, in the context of interaction with artificial intelligence (AI). Two examples are examined. In the first case, from the two premises (“If it is raining outside, then the pavement is wet” and “The pavement outside is wet”), the artificial intelligence draws the incorrect conclusion that “it is raining outside”: this does not follow the modus ponens schema but instead commits the fallacy of affirming the consequent. The authors emphasize that common sense alone indicates that the pavement could be wet for other reasons. In the second example, concerning modus tollens, the artificial intelligence again reaches an incorrect conclusion. It is noted once more that a correct conclusion can be drawn only by following valid schemas of logical thinking. Although the artificial intelligence succeeded in some logical tasks, the authors consider this a coincidence rather than confirmation of an ability to reason logically. Testing for errors in conclusions shows that even when the artificial intelligence formulates a conclusion incorrectly, it can indicate the need for additional information to make a correct inference. Thus, the article highlights the limitations of contemporary artificial intelligence models in understanding logic and underscores the necessity of critical thinking on the part of users when interacting with them.
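
In standard propositional notation (not given in this form in the article, but a direct formalization of the schemas it discusses), the two valid rules and the fallacy committed in the first example can be written as:

\[(p \to q),\; p \;\vdash\; q \quad \text{(modus ponens)}\]
\[(p \to q),\; \lnot q \;\vdash\; \lnot p \quad \text{(modus tollens)}\]
\[(p \to q),\; q \;\nvdash\; p \quad \text{(affirming the consequent, invalid)}\]

With p = “it is raining outside” and q = “the pavement is wet”, the wet pavement does not license the conclusion that it is raining, which is exactly the error described above.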

Conclusions. To address the issue of logical errors in texts generated by artificial intelligence (AI), it is proposed to unite the efforts of specialists in logic and AI. It is important to develop programs that will automatically check the conformity of generated texts to the schemas of modus ponens and modus tollens. The schemas of these inferences can be formulated as mathematical models, facilitating the detection of errors. It is also essential to teach AI to identify antecedents and consequents in conditional statements. Moreover, it is vital to educate students on the fundamentals of formal logic and engage linguists in the development of control tools. Given the complexity and differences in languages, an interdisciplinary approach will be crucial for the success of such projects.
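
As an illustration of the kind of automatic check proposed in the conclusions, the following is a minimal sketch in Python. It assumes the conditional statement has already been decomposed into an antecedent and a consequent; all names (Conditional, classify_inference and so on) are illustrative and are not part of the article or of any existing tool.

from dataclasses import dataclass

@dataclass
class Conditional:
    antecedent: str   # p, e.g. "it is raining outside"
    consequent: str   # q, e.g. "the pavement is wet"

def classify_inference(cond, premise, premise_negated, conclusion, conclusion_negated):
    """Check a two-premise inference against modus ponens and modus tollens."""
    # Modus ponens: from (p -> q) and p, conclude q.
    if not premise_negated and premise == cond.antecedent:
        if not conclusion_negated and conclusion == cond.consequent:
            return "valid: modus ponens"
        return "invalid: conclusion does not match the consequent"
    # Modus tollens: from (p -> q) and not-q, conclude not-p.
    if premise_negated and premise == cond.consequent:
        if conclusion_negated and conclusion == cond.antecedent:
            return "valid: modus tollens"
        return "invalid: conclusion does not match the negated antecedent"
    # Affirming the consequent or denying the antecedent: nothing follows.
    return "invalid: no conclusion follows from these premises"

# The first example from the article (affirming the consequent):
rain = Conditional("it is raining outside", "the pavement is wet")
print(classify_inference(rain, "the pavement is wet", False,
                         "it is raining outside", False))
# -> invalid: no conclusion follows from these premises

Such a checker only handles sentences that have already been mapped onto propositional variables; as the conclusions note, identifying antecedents and consequents in natural-language conditionals is itself a task that requires the joint effort of logicians, AI specialists and linguists.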

Keywords: artificial intelligence, formal logic, reasoning, language models, LLM, deepai.org/chat, modus ponens, modus tollens.


References:

1. Amirizaniani, M, Martin, E, Sivachenko, M, Mashhadi, A & Shah, C 2024, Can LLMs reason like humans? Assessing theory of mind reasoning in LLMs for open-ended questions, in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 34-44.

2. Chi, H, Li, H, Yang, W, Liu, F, Lan, L, Ren, X & Han, B 2024, Unveiling causal reasoning in large language models: reality or mirage?, in The Thirty-eighth Annual Conference on Neural Information Processing Systems. Available from: <https://openreview.net/pdf?id=1IU3P8VDbn>. [5 January 2025].

3. Larson, EJ 2021, The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do, Harvard University Press, Cambridge, MA.

4. Lin, B 2025, Why Amazon is betting on ‘automated reasoning’ to reduce AI’s hallucinations, The Wall Street Journal, 5 February. Available from: <https://www.wsj.com/articles/why-amazon-is-betting-on-automated-reasoning-to-reduce-ais-hallucinations-b838849e>. [7 January 2025].

5. Liu, T, Xu, W, Huang, W, Wang, X, Wang, J, Yang, H & Li, J 2024, Logic-of-thought: injecting logic into contexts for full reasoning in large language models, arXiv preprint arXiv:2409.17539. Available from: <https://doi.org/10.48550/arXiv.2409.17539>. [2 January 2025].

6. Nishant, R, Schneckenberg, D & Ravishankar, MN 2024, The formal rationality of artificial intelligence-based algorithms and the problem of bias, Journal of Information Technology, 39(1), pp. 19–40. Available from: <https://doi.org/10.1177/026839622311768>. [6 January 2025].

7. Wan, Y, Wang, W, Yang, Y, Yuan, Y, Huang, JT, He, P & Lyu, M 2024, LogicAsker: evaluating and improving the logical reasoning ability of large language models, in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 2124-2155. Available from: <https://doi.org/10.48550/arXiv.2401.00757>. [9 January 2025].

8. Wan, Y, Wang, W, Yang, Y, Yuan, Y, Huang, JT, He, P & Lyu, MR 2024, A & B == B & A: triggering logical reasoning failures in large language models, arXiv preprint arXiv:2401.00757. Available from: <https://arxiv.org/abs/2401.00757v1>. [9 January 2025].

9. Wang, S, Wei, Z, Choi, Y & Ren, X 2024, Can LLMs reason with rules? Logic scaffolding for stress-testing and improving LLMs, arXiv preprint arXiv:2402.11442. Available from: <https://doi.org/10.48550/arXiv.2402.11442>. [8 January 2025].


