As the education industry accelerates its digital transformation and intelligent upgrade, education large models are being integrated ever more comprehensively and deeply into every aspect of education, empowering core scenarios such as intelligent learning, intelligent teaching, and intelligent grading. To promote the healthy and sustainable development of the industry, the Artificial Intelligence Research Institute of the China Academy of Information and Communications Technology (hereinafter "CAICT") jointly compiled standards for education large models.
In the first round of evaluations recently organized by CAICT, MathGPT passed the education large model evaluation and received a 4+ rating certificate. This made its developer one of the first domestic enterprises to pass the evaluation and obtain the highest rating currently awarded. The evaluation is based on the standard "Large-scale Pre-training Model Technology and Application Evaluation Methods Oriented to the Industry, Part 3: Education Large Model", which comprises three capability domains, six sub-domains, and more than 30 capability items. Built around the core needs of the education industry, the standard establishes a method for evaluating how mature an education large model's applications are. It helps stakeholders measure the real-world effectiveness of education large models and drives the upgrading and optimization of education large model products.
[Introduction to MathGPT]
Xueersi's MathGPT is a large language model independently developed by TAL Education Group, with problem-solving and explanation algorithms at its core. MathGPT has four core capabilities: automatic solving of mathematics problems, grading of complex word problems, grading of Chinese and English essays, and personalized AI step-by-step explanations. Drawing on its strong generation and understanding capabilities, the model can solve students' individual problems, identify the knowledge points behind questions they do not understand, and supplement them with explanations so that students gain a comprehensive understanding. Backed by a large body of high-quality teaching resources, it can also extend this support to a much wider range of students.
Figure 1 Xueersi "MathGPT" Interface
[Evaluation Overview]
The evaluation metrics for education large models cover three domains: Scenario Coverage, Functional Capabilities, and Application Maturity. The evaluation plays an important role in promoting industry development, enhancing technical influence, and standardizing services.
(1) Scenario Coverage: Focuses on the breadth of coverage of education large model products, including subject support (mathematics, Chinese, physics, chemistry, etc.) and scenario support (knowledge retrieval, knowledge Q&A, self-learning, assessment and examination, etc.);
(2) Functional Capabilities: Focuses on the tasks an education large model supports and how well it performs them, including computation, Q&A, analysis, creation, and summarization;
(3) Application Maturity: Focuses on the safety, learnability, capacity to inspire, and memorability of an education large model, covering two dimensions: service diversity and service maturity.
Figure 2 Scope of Evaluation
This content is reposted from "Trusted AI Evaluation":
https://mp.weixin.qq.com/s/WpT_IqDih-5pztuaGcq4MQ?scene=25#wechat_redirect