NVIDIA is consuming a lifetime of YouTube per day and they probably aren’t even paying for Premium!

Posted by Matt Birchler
— 1 min read

Samantha Cole on the always lovely 404 Media: Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI

When asked about legal and ethical aspects of using copyrighted content to train an AI model, Nvidia defended its practice as being “in full compliance with the letter and the spirit of copyright law.” Internal conversations at Nvidia viewed by 404 Media show when employees working on the project raised questions about potential legal issues surrounding the use of datasets compiled by academics for research purposes and YouTube videos, managers told them they had clearance to use that content from the highest levels of the company.

I’m not a lawyer (something I find myself having to say more often these days), but it does feel slimy to take data that was collected with permission for academic research and then use it for something totally different (and far more commercial). Again, I’m not sure about the legality, but it feels to me like when you send someone a private photo and then they post that photo to social media without asking you if you’re okay with it.

Then this bit caught me as relevant to the whole LLM convo:

Slack messages from inside a channel the company set up for the project show employees using an open-source YouTube video downloader called yt-dlp

yt-dlp is a great tool that lets you download personal copies of videos from many sites on the internet. It has plenty of good use cases, but it also made it possible for NVIDIA to acquire YouTube data in a way they simply could not have without it. I bring this up because one of the arguments I hear from Team “LLMs Should Not Exist” is that because LLMs can be used to do bad things, they should not be used at all.

I personally feel the same about yt-dlp as I do about LLMs in this regard: they can be used to do things that aren’t okay, but they are also used to do plenty of things that are genuinely useful. See also torrents, emulators, file sharing sites, Photoshop, social media, and just like…the internet itself. I’m not saying LLMs are perfect by any means, but this angle of attack doesn’t do much for me, personally.