Meta, Apple, Bloomberg and other technology and media giants draw on the scripts of 53,000 films and 85,000 episodes of TV series to expand the data used to train their artificial intelligence, an investigation by The Atlantic published on November 18 reveals, confirming suspicions in Hollywood that had remained unproven until now.
“Imagine that I am there, at the end of this great ocean liner, with the wind ruffling my hair, and that you are there too, right in front of me. Do you feel it? I am the king of the world!” This is what ChatGPT replies when asked to speak as in the film Titanic. It is a mash-up of two legendary scenes from the film: the one in which the loving couple stands on the bow of the ocean liner, and the one in which Leonardo DiCaprio screams his most famous line at the top of his lungs (“I am the king of the world!”). It is with this observation about how deftly artificial intelligence handles film references and dialogue, even though the material used to train it is not officially known, that journalist Alex Reisner opens his investigation, published on November 18 in the American magazine The Atlantic.
The investigation reveals that these tools are fed scripts from Hollywood productions to enlarge their training data, confirming a widespread intuition in Hollywood that no concrete evidence had so far been able to support. Meta, Apple, Nvidia, Salesforce, Bloomberg and other companies have trained their artificial intelligence on film dialogue without much difficulty. According to The Atlantic, the material includes every film nominated for the Best Picture Oscar between 1950 and 2016, several hundred episodes of The Simpsons, a good part of Seinfeld and Twin Peaks, and all seasons of The Wire, The Sopranos, Breaking Bad… In fact, there is no need to obtain the works' original scripts: this content is freely accessible on the site OpenSubtitles.org. Popular among fans of illegal downloading and fed by contributions from Internet users, the site hosts 9 million film subtitle files in a hundred languages. It is a database “commonly used in the industry,” according to a spokesperson for the artificial intelligence company Anthropic.
Admittedly, this material is incomplete from an AI standpoint: it is compiled in bulk in a “14 gigabyte file containing short lines of dialogue,” without any characters being named, and in such a way that it is impossible to know where a film begins and ends. But it remains valuable because it transcribes countless ways in which characters interact, along with relational dynamics and speech styles that the AI is then able to imitate.
This is enough to outrage film professionals whose intellectual property is being infringed, at a time when artificial intelligence companies are notorious for the opacity of the data sources they use without the authors' consent. In this legislative Wild West, almost everything remains to be done to protect artists' rights, more than a year after the historic labor dispute in Hollywood that put artificial intelligence on the agenda, and while the lawsuits brought against technology companies reveal how complex it is to define plagiarism where these new players are concerned.
What are the ethical implications of using content from films and TV shows to train AI models?
Interview between the Editor of Time.news and AI Ethics Expert Dr. Ava Martinez
Editor: Welcome, Dr. Martinez! Thank you for joining us today. There's been quite a stir recently, with the investigation by The Atlantic revealing that major tech companies like Meta and Apple have been using scripts from 53,000 films and 85,000 TV episodes to train their AI models. What's your initial reaction to this news?
Dr. Martinez: Thank you for having me! My initial reaction is a mix of concern and curiosity. This revelation confirms what many in the entertainment industry suspected—that large tech companies are leveraging cultural content without necessarily compensating the creators. It raises important questions about intellectual property rights and the ethics of AI development.
Editor: Absolutely. Hollywood has had its suspicions, but now we have concrete evidence. How does this practice impact the creators and writers in the film and television industry?
Dr. Martinez: The impact can be profound. Writers and creators are the backbone of the stories we cherish in film and television. If their work is being used to train AI systems without adequate recognition or compensation, it undermines the very foundation of creative industries. This could lead to a situation where the AI generates scripts that mimic human creativity without acknowledging the source material, further devaluing the original creations.
Editor: So, you’re suggesting this might lead to a kind of dilution of originality in storytelling?
Dr. Martinez: Precisely. If AI is trained on existing scripts, there’s a risk of it becoming a derivative tool rather than a creative partner. While AI can generate content that reflects existing narratives, it may struggle to produce genuinely innovative stories without human oversight and inspiration. The shortcut of using established works could result in a homogeneous output—something we definitely want to avoid in a field where diversity and originality should thrive.
Editor: You mentioned human oversight. What role do you see for writers in the age of AI? Can they coexist with these technologies?
Dr. Martinez: Absolutely! Writers and AI can coexist, but it requires a shift in how technology is utilized within the industry. Writers can harness AI as a tool to enhance their creativity, perhaps using it for brainstorming or to explore new narrative directions. However, for it to be a positive collaboration, there needs to be robust frameworks for protecting intellectual property and fair compensation for the original works.
Editor: That brings us to a crucial point—intellectual property rights. What changes do you foresee in this area due to the rise of AI in content generation?
Dr. Martinez: We are likely to see a push for more stringent intellectual property regulations tailored specifically for AI. This could involve clearer guidelines on how data can be sourced and used for AI training. Additionally, we might see the emergence of new licensing models that ensure artists are compensated when their work contributes to training datasets.
Editor: It sounds like a complex landscape ahead. Are there any companies or initiatives pioneering ethical AI practices in media that others could follow?
Dr. Martinez: There are a few noteworthy examples. Some organizations are already exploring ethical AI by creating datasets that respect creator rights and involve them in the process. Initiatives like Creative Commons licensing are also gaining traction to ensure that creators can choose how their work is used, even in AI applications. Moreover, companies that engage in transparency about their training data and algorithms will likely set themselves apart in this rapidly evolving landscape.
Editor: As we wrap up, Dr. Martinez, what advice would you give to creators in light of these developments?
Dr. Martinez: Stay informed and engage in the conversation about AI and its implications for your work. Advocate for fair practices and be proactive in understanding how AI can complement your creativity rather than replace it. Collaboration can lead to exciting new forms of storytelling if approached ethically. The future doesn’t have to be a battleground; it can be an avenue for innovation and partnership.
Editor: Thank you for those insightful remarks, Dr. Martinez! It’s clear that while AI holds immense potential, the balance between technology and creativity must be carefully managed. We appreciate your time and expertise today.
Dr. Martinez: Thank you! It’s been a pleasure discussing these critical issues.