  1. Do Large Language Models Truly Grasp Addition? A Rule-Focused ...

    Feb 11, 2026 · Abstract Large language models (LLMs) achieve impressive results on advanced mathematics benchmarks but sometimes fail on basic arithmetic tasks, raising the question of …

  2. Large Language Models for Mathematical Reasoning: Progresses ...

    2 days ago · Abstract Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the …

  3. Other benchmarks, like C-EVAL (Huang et al., 2023), M3KE (Liu et al., 2023), etc., broadly include various levels of math questions, such as elementary and advanced mathematics.

  4. Evaluation at the level of grade-school mathematics tells us that these models are still error-prone, but does not necessarily tell us how close they are to usefulness in tutoring other subjects (either more …

  5. This paper introduces a novel proactive prompting paradigm, instantiates it with the simple TBYS reasoning framework, and verifies the effectiveness of TBYS on challenging advanced mathematics …

  6. Abstract Our study explores how well the state-of-the-art Large Language Models (LLMs), like GPT-4 and Mistral, can assess the quality of scientific summaries or, more fittingly, scientific syntheses, …

  7. Previous studies have revealed that incorporating GMSL positively and lastingly influences students' academic performance; for example, Yeager et al. (2019) found that exposing students to a growth …