How many words does ChatGPT know?

by time news

2023-11-13 14:21:28

Able to create original content from existing data, generative artificial intelligence (AI) applications have seen exponential development in recent months. Millions of people use them daily for a wide range of tasks, yet we know very little about these tools. Until now, for example, we did not know something as basic as how many words of the Spanish language ChatGPT can identify.

Researchers from the Polytechnic University of Madrid (UPM), together with colleagues from the Carlos III University of Madrid and the University of Valladolid, set out to answer the question. To do this, they have developed the ChatWords app, which makes it possible to evaluate the lexical knowledge that an artificial intelligence system has of different languages.

Their initial study of the more than 90,000 words contained in the dictionary of the Royal Spanish Academy shows that the ChatGPT 3.5 Turbo model does not know approximately 20% of them. And that is not all: for around 5% of the remaining 80%, it offers erroneous meanings.

To better understand the results, it is worth keeping in mind that a Spanish speaker recognizes 30,000 words on average, that is, almost a third of the Spanish lexicon. That may seem a poor showing next to the machine.
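To put these percentages in absolute terms, the short Python sketch below works through the arithmetic. It is only an illustration under stated assumptions: the dictionary size is rounded down to 90,000 words (the article says "more than 90,000"), and the 5% error rate is assumed to apply to the words the model does recognize, not to the whole dictionary. The function and variable names are invented for this example and do not come from the study or from ChatWords.

```python
# Rounded figures quoted in the article; names are illustrative only.
DICTIONARY_WORDS = 90_000    # "more than 90,000" words, rounded down
UNKNOWN_SHARE = 0.20         # share of words the model does not know
WRONG_MEANING_SHARE = 0.05   # assumption: share of the *recognized* words
SPEAKER_WORDS = 30_000       # words an average Spanish speaker recognizes


def lexical_breakdown(total, unknown_share, wrong_share):
    """Split a dictionary into unknown, wrongly defined and correctly defined words."""
    unknown = round(total * unknown_share)
    recognized = total - unknown
    wrong = round(recognized * wrong_share)
    correct = recognized - wrong
    return {
        "unknown": unknown,
        "recognized": recognized,
        "wrong": wrong,
        "correct": correct,
    }


result = lexical_breakdown(DICTIONARY_WORDS, UNKNOWN_SHARE, WRONG_MEANING_SHARE)
for label, count in result.items():
    print(f"{label}: {count} words ({count / DICTIONARY_WORDS:.0%} of the dictionary)")
print(f"average speaker: {SPEAKER_WORDS} words "
      f"({SPEAKER_WORDS / DICTIONARY_WORDS:.0%} of the dictionary)")
```

Under these assumptions, the model would not know about 18,000 words, would recognize about 72,000, and would give a wrong meaning for roughly 3,600 of those, leaving some 68,400 words (76% of the dictionary) with a correct meaning, still more than twice the 30,000 words of an average speaker.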

“But as often happens with AI systems, all that glitters is not gold, and analyzing the meanings that ChatGPT gives of the words, we see that there is a non-negligible percentage in which the meaning it indicates is incorrect,” says Javier Conde, assistant professor at the Higher Technical School of Telecommunications Engineers (ETSIT) of the UPM and one of the participants in the work. “Perhaps ChatGPT is not as wise today as it seems,” he adds.

Ensuring lexical richness in AI

It stands to reason that large language models (LLMs), based on artificial intelligence and designed to process and understand natural language on an enormous scale, will not use words they do not know. For this reason, another concern arises.

Pedro Reviriego, co-author of the work and professor at ETSIT, points to “a scenario in which the newly generated content has an ever smaller number of different words. Therefore, it is essential to guarantee lexical richness in the text created by artificial intelligence,” he maintains.

The ChatWords app is open source and is designed to be easy to use and extend. The researchers’ next step is to evaluate other languages and LLMs to better understand the lexical knowledge that AI tools have and how it evolves as new versions and tools appear.

Their work is part of the Future Networks for Data Processing Centers and Operators project, funded by the State Research Agency, and is supported by OpenAI, the American laboratory responsible for ChatGPT, through its researcher access program.

Rights: Creative Commons.
