“Generative AI training raises concerns over copyright infringement and data depletion… A data procurement market as the remedy”

by times news cr

2024-03-25 21:30:21

ⓒ News1

A committee of the British Parliament has published a report warning that copyright may be infringed during the training of generative artificial intelligence (AI).

As a solution, the report proposed a data procurement market in which both sides can trade data fairly.

With high-quality language data predicted to run out by 2026, the importance of such a data market is expected to grow.

According to the Korea Copyright Commission on the 25th, the British House of Lords’ Communications and Digital Committee released a report on ‘Large Language Models and Generative AI’ last month.

Before that, the committee heard from various stakeholders, including AI companies and data rights holders. Concerns were raised that large language models (LLMs) may infringe copyright in data during development, for example in training.

AI companies such as Microsoft argued that British copyright law should give favorable treatment to the use of data for training purposes. Getty Images, an American photo agency, countered that AI training should also require prior permission from the data owner.

Getty Images criticized the opt-out policy adopted by some companies. Under this approach, AI companies are allowed to use data by default, and the data is withdrawn later if the copyright holder objects. Getty Images pointed out that it is difficult for generative AI to undo what it has already learned.

The committee proposed a ‘data procurement market’ as a solution. In its view, the problem would be resolved if AI companies paid a fair price for data licenses and received quality data in return.

The committee said, “AI companies must disclose where on the web they searched for and collected information (web crawling) so that copyright holders can verify it,” and added, “The government has a responsibility to end related disputes by enacting and revising laws.”

The committee also said that copyright protection should be set out in the ‘AI Copyright Code of Conduct’ scheduled to be released this year.

Such tighter regulation could benefit AI companies in the long run.

This is because the high-quality data needed for training is predicted to run out soon. Once a data procurement market is established, the supply of data could pick up.

Researchers at AI company Epoch also published related findings in 2022 on the preprint server arXiv. According to their research, the high-quality language data essential for model development will be exhausted by 2026. The growth rate of the stock of language data is also projected to slow from the current 7% per year to 1% by 2100.

The industry is also moving to prepare for this.

In Korea, Upstage launched the ‘1T Club’ last year and has been collecting data with a goal of 1 trillion tokens (the smallest meaningful units of data). The aim is to improve the Korean-language ability of LLMs, which have been developed mainly around foreign languages, while also resolving copyright issues. In return for providing data, partner companies received benefits related to the use of Upstage’s application programming interface (API).

An Upstage official explained, “We were able to collect enough of the high-quality data needed for LLM development.”

(Seoul = News 1)
