
Assessment report on the consciousness level of large language models shows DeepSeek-R1 performs well in semantic consistency

2025-03-03   

On February 25th, the reporter learned from the International Artificial Intelligence DIKWP Evaluation Standards Committee of the World Artificial Consciousness Association that the "World's First Large Language Model Consciousness Level 'Recognition' White-Box DIKWP Evaluation 2025 Report (100-question version)" (hereinafter the "Report") was recently released. The Report was led by the association, with more than 90 institutions and enterprises from more than 10 countries and regions participating. Its core highlight is the world's first consciousness-level assessment system: based on the DIKWP model, the Report constructs a full-chain evaluation framework spanning data, information, knowledge, wisdom, and intention. The test questions cover four modules (perception and information processing, knowledge construction and reasoning, intelligent application and problem solving, and intent recognition and adjustment) and systematically and quantitatively analyze the consciousness level of mainstream large language models.

The Report comprehensively evaluated current mainstream large language models, including DeepSeek-V3, ChatGPT-o1, Tongyi Qianwen-2.5, ChatGPT-4o, Kimi, Wenxin Large Model 3.5, and Llama-3.1. The results show that different models perform differently across the modules. For example, the perception and information processing module mainly examines how a model processes raw data, extracts information, and maintains semantic consistency. ChatGPT-4o and ChatGPT-o1 performed well in data conversion and format processing, demonstrating stability. ChatGPT-o3-mini, ChatGPT-o3-mini-high, Tongyi Qianwen-2.5, Kimi, and Grok excelled in information extraction, especially along the data-to-information conversion path. DeepSeek-R1, ChatGPT-4o, Kimi, and ChatGLM-4 Plus performed well in maintaining semantic consistency.

The knowledge construction and reasoning module assesses the ability to integrate information into knowledge and to reason logically; here, Tongyi Qianwen-2.5, ChatGLM-4 Plus, and ChatGPT-4o performed outstandingly. The intent recognition and adjustment module focuses on a model's ability to understand user intent and adjust its output accordingly; Doubao and Gemini-2.0 Flash Thinking Experimental performed well, accurately understanding users' questions and providing relevant answers. (New Society)

Editor: He Chuanning    Responsible editor: Su Suiyue

Source: Sci-Tech Daily


