ChatGPT does not know 20% of the Spanish lexicon and makes mistakes in the remaining 80%

by time news

2023-11-14 10:45:05

According to data from OpenAI—the company that created it—every week, one hundred million people use this chatbot to do language-related tasks.

But although this tool is trained to maintain conversations and generate texts, it is known that it can produce answers that seem plausible, but are completely wrong.

To evaluate ChatGPT's performance and real capabilities, a team of Spanish researchers has developed an application called ChatWords, which makes it possible to evaluate ChatGPT's lexical knowledge in several languages.

To do this, the team, made up of scientists from the Polytechnic University of Madrid (UPM) together with colleagues from the Carlos III University of Madrid (UC3M) and the University of Valladolid (UVa), used as a reference the words collected in the dictionary of the Royal Spanish Academy and those that appear in the Quixote by Miguel de Cervantes.

The study revealed that, of the more than 90,000 words included in the dictionary of the Royal Spanish Academy, ChatGPT's GPT-3.5 Turbo model does not know approximately 20%, that is, about 18,000 words.

In addition, for the remaining 80% of the words in the dictionary and for 90% of the words in the Quixote, ChatGPT made errors in about 5% of the terms.

Very poor knowledge

The study recalls that a Spanish speaker recognizes 30,000 words on average, that is, almost a third of the Spanish lexicon, a figure that may seem poor compared to that of a machine, the authors warn.

“As often happens with artificial intelligence systems, all that glitters is not gold, and when analyzing the meanings that ChatGPT gives of the words, we see that there is a non-negligible percentage in which the meaning it indicates is incorrect,” explains Javier Conde, assistant professor at the Higher Technical School of Telecommunications Engineers (ETSIT) of the UPM and co-author of the work.

“Perhaps ChatGPT is not as wise today as it seems,” he adds.

Furthermore, the study recalls that large language models (LLMs) such as ChatGPT, which are based on artificial intelligence and designed to process and understand natural language on a huge scale, do not use words that they do not know.

But for Pedro Reviriego, professor at the ETSIT and co-author of the research, the data is worrying because, if these systems only use the words they know, a scenario in which newly generated content has “an increasingly smaller number of different words” and little lexical richness is “very feasible,” he warns.

The ChatWords app is a publicly accessible system, designed to be easy to use and expand.

The researchers want to evaluate other languages and LLMs to better understand the lexical knowledge that AI tools have and how it evolves as new versions and tools appear.
