Reasoning AI model emerges… 83% accuracy on math competition exam

by times news cr September 14, 2024

OpenAI launches reasoning-specialized model ‘o1’
Step-by-step thinking built into the algorithm
Top 11% performance in an international coding competition
Assessed as “close to human-level general-purpose AI”

OpenAI has released a new artificial intelligence (AI) model called ‘OpenAI o1’ that is able to reason. Reasoning ability here means synthesizing various pieces of information to make judgments and solve problems independently. Some assess o1 as approaching artificial general intelligence (AGI), meaning AI at a human level.

According to the Financial Times and other outlets, OpenAI released o1, a model specialized in reasoning, on the 12th (local time). Reasoning capabilities are essential in fields such as mathematics and science, where answers have to be worked out step by step. Professor Kim Myeong-ju of the Department of Information Security at Seoul Women’s University explained, “In the case of OpenAI’s existing ChatGPT model, to get answers to questions that require reasoning, you had to ask appropriate questions step by step, like a person asking leading questions.” He added, “With o1, that work is included in the algorithm.”

According to information published on the OpenAI blog, the o1 model achieved 83% accuracy on a qualifying exam for the International Mathematical Olympiad. The previous model, GPT-4o, achieved only 13%. o1 also scored in the top 11% in an international coding competition. On science questions in fields such as physics and chemistry, it achieved 78% accuracy, a level comparable to that of a doctoral student.

OpenAI also released a video demonstrating o1’s reasoning ability. When asked, “How many ‘r’s are there in Strawberry?” it answered, “Three.” Working step by step, it solved complex puzzles that existing AI models could not. It even worked out the meaning of Korean sentences that Koreans themselves have difficulty understanding and translated them into English.

“Previous models like ChatGPT would start answering questions immediately when asked, but this model can take a while,” said Jakub Pachocki, OpenAI’s chief scientist. “It thinks about the problem in English, analyzes it, finds angles, and comes up with the best answer.” OpenAI CEO Sam Altman called the model “a new paradigm: AI that can reason about general-purpose, complex problems,” though he added that the technology “still has its flaws and limitations.”

Some say it remains to be seen whether o1’s actual reasoning ability lives up to the academic community’s expectations. Professor Gary Marcus of New York University said, “I have seen many reasoning functions fall apart after careful review by the scientific community. I will be skeptical of new claims.”

OpenAI also released a smaller model, ‘o1-mini’, alongside the base o1 model. It is smaller and faster than o1. ChatGPT Plus and ChatGPT Team subscribers have been able to use o1 since the 12th.

Reporter Choi Ji-won
