'Is This Knowledge Of Illegal Activity?': Josh Hawley Torches Meta For 'Pirating' Books To Train AI
Forbes Breaking News
yesterday
During a Senate Judiciary Committee hearing on Wednesday, Sen. Josh Hawley (R-MO) presents chat logs in which Meta employees allegedly expressed doubts about pirating materials to train artificial intelligence models.
Category
🗞
News
Transcript
00:00
Thank you very much, Professor. Thanks for being here. Thanks again to all of our witnesses.
00:03
We're going to now have seven-minute rounds of questioning, and we'll see if we can fit in
00:10
maybe a couple of rounds, just depending on the time that we have. I'll start, and then we'll go
00:13
to the ranking member and any other members who arrive in that time. Professor Viswanathan,
00:19
let me just start with you, if I could, and let's see if we can just drill down on some of the
00:22
specifics here. Mr. Baldacci mentioned in his opening statement that AI could just feed dictionaries
00:29
into their platforms in order to train them. They don't do that. They prefer published works,
00:34
fully formed works. Why is that? Can you give us an insight into that? That's absolutely right.
00:39
They learn syntax, structure. They learn how we learn language, right? When you learn language,
00:45
you just don't learn words. You don't memorize words. You don't memorize notes when you learn music.
00:49
You learn structure and syntax, and the point that Professor Lee is making is correct. They need
00:55
large data sets. More is better to learn predictive language models. However, more is not everything.
01:06
It's not pirated works. So let me just ask this. You said that they are not buying the books. They're
01:14
not buying Mr. Baldacci's book or anybody's book who's sitting up here, anybody in the audience.
01:18
They're getting them. They're stealing them. They're pirating them from somewhere. If they're not
01:23
buying the books, they're not stealing them out of libraries, where are they getting them?
01:27
These large repositories of materials that are available online, there are many. Some are licit,
01:33
some are not licit. The pirate websites in particular are not licit. So if you need a lot
01:38
of material, you go out and you scoop up all that material that you can find. But you don't go to pirate
01:44
websites to get that material if what you want to do is legal. None of these works are licensed.
01:49
None of these works are licensed. No author has been compensated to date.
01:56
So how do they go to these, let's call them shadow libraries, to get the works illegally? They've
02:03
already, by the time they go to the shadow library, the works there are already stolen, right? They've
02:07
already stolen Mr. Baldacci's book, Professor Lee's book, everybody's, your books. They've stolen them.
02:12
How do they actually, when they go to the shadow library, how do they get them? I mean,
02:16
how does the AI company then take possession of the particular work?
02:21
There's a process called torrenting, and I will not trouble you all with the details of torrenting,
02:26
but essentially huge amounts of data stream to you and you get them. At the same time,
02:32
you can send them out. That's called seeding. You can send them out at the same time.
02:36
Uploading and downloading exist at the same time. This is a peer-to-peer process. So not only are you
02:42
taking in these pirated materials, you are also distributing them. The violation of copyright law
02:48
exists at the reproduction of these works, at the making available of them by the pirate libraries,
02:54
the dissemination of them, and your dissemination, Gen AI company, of them as well.
03:00
So they're both taking the works and distributing them as well in this thing you call it, kind of like
03:05
Napster, this thing that you call torrenting. Let me ask you this. I mean, that's not,
03:09
is torrenting legal? That's not legal, is it? Torrenting can be legal, but in this case it is
03:13
not. And in this particular case, this is benefiting the torrent. Now, I agree with Judge Alsup,
03:21
who said, if you're taking it from pirate libraries, no way. That is not acceptable, right? Part of what
03:27
we're seeing here, Judge Chhabria said, well, it's not helping the pirate websites. Well, yes, it is.
03:33
The pirate websites, there's one in particular called Anna's Archive. They actually put on their
03:38
website, hey, Gen AI companies, come train on us. We'll do some data swaps, or you know what? You
03:44
can make us a donation, too. This is directly helping the pirate websites thrive, flourish,
03:50
proliferate. Let me ask you this. Have there been any, to your knowledge, any criminal enforcements
03:54
against these torrenting platforms? Yes, there have been attempts to. Again, it's like a game of
03:59
whack-a-mole. You get one, you knock it down, it pops up again in some jurisdiction that you don't
04:04
have control over. What's the key to a criminal enforcement? You know, civil versus criminal in
04:09
this context, when do we have a criminal case against torrenting? What's the key to that?
04:15
Okay, this is a really important point. What's criminal here? Criminal copyright liability has
04:21
two prongs to it. Prong one is you have to do it willfully, and prong two is you have to do it for
04:27
commercial advantage or gain. We clearly know that prong two is met. This is for commercial advantage or
04:31
gain. I don't think Meta is doing this out of the goodness of its heart. Prong one, willful means
04:36
you need to know that what you were doing is illegal. There's lots and lots of evidence now,
04:42
particularly from the Kadrey v. Meta case, that shows that they knew this was illegal. They even had to
04:48
ask all the way up the chain of command to Mark Zuckerberg and say, hey, is this okay? And he said,
04:52
yes, it did. Yes, it's okay. So not only did he do it knowing it was illegal, he did it knowingly,
04:57
he did it willfully, intentionally, and whether or not he knew what statute made it illegal doesn't
05:02
matter. For this to be willful, you have to know that what you're doing is wrong, and this meets
05:07
that prong. So this is, in fact, amounting to what you might call criminal copyright liability.
05:13
Mr. Pritt, let me just ask you about this, about the willful aspect, and let's talk about Meta in
05:17
particular, since Professor Viswanathan just mentioned Meta. They're one of the biggest monopolists
05:23
in the world, and one of the biggest AI companies now in the world, if not the biggest. So let's just
05:27
talk about them for a second. Meta uses torrents to acquire pirated data for its Llama model. Is that
05:35
right? Correct. How much data would you estimate that Meta has torrented? It's illegally downloaded
05:44
and also then shared in this peer-to-peer scheme. It has pirated well over 200 terabytes
05:52
of copyrighted material from multiple, I don't call them shadow libraries because they're not
05:59
libraries, but illicit criminal enterprises. And how much has it paid the copyright holders for
06:06
these works that it's used, to your knowledge? Nothing. If nothing, zero. So billions of works,
06:14
billions of books like Mr. Baldacci's, zero payment. If Meta were to pay, do you have any idea what the
06:21
cost might be? I mean, did they ever, to your knowledge and your discovery, did they ever explore
06:25
paying? I mean, is there any sense of how much this might have cost them? Early on, they explored
06:31
licensing. They assigned two individuals part-time to attempt to license, and they decided it would take
06:40
too long, for example. And that's when they turned to piracy. At the time, they had, public documents show
06:48
that certainly tens of millions, if not hundreds of millions, had been contemplated for licensing at
06:55
that time. Okay, so let's just think about this. Hundreds of millions of dollars, that's the value,
07:00
maybe sort of the base, the bare value of the works that they've used, like the works that you all have
07:06
written on this panel. Hundreds of millions, and they paid zero of that. So let's just drill down a little
07:12
further. Did Meta know what they were doing was wrong? Do you, do you, Mr. Pritt, believe in the
07:18
evidence you've seen, that there's any evidence to suggest that Meta's employees knew what they were
07:23
doing is illegal? The documents that have become public clearly show that. Let's just look at a few
07:27
of these documents. I'm going to show you a few things, and I'll ask you to help me interpret them
07:31
to make sure that we get them right. Let's start here with a Meta employee, a Meta engineer working
07:36
on their AI project, Eleonora Presani. She says, I don't think we should use pirated material. This
07:42
is in a chat with other Meta employees. I don't think we should use pirated material. I really need
07:47
to draw a line there. She goes on, I feel that using pirated material should be beyond our ethical
07:53
threshold. Sci-Hub, ResearchGate, LibGen are basically like Pirate Bay or something like that.
07:57
They are distributing content that is protected by copyright, and they're infringing it. How do you
08:03
read this, Mr. Pritt? Does this look like knowledge to you?
08:06
That's certainly what we've argued in the case. Let's look at another Meta employee.
08:14
Here is Nisha Deo in the same chat. She replied, it's the piracy and us knowing and being
08:23
accomplices that's the issue. This is a Meta engineer working on their AI project. It's the piracy and us
08:32
being knowing accomplices. That's the issue. Let's look at another one.
08:41
Here is the response that another Meta engineer in the same chat gave. Well, we want to buy books and be
08:50
nice, open people here, but however, to make it happen and not letting the bad guys win, that's the
08:57
beat China argument. We need to make a case fast and cut some corners here and there. We need to
09:04
cut some corners here and there. Mr. Pritt, what are we looking at here? I mean, is this knowledge of
09:09
illegal activity? When they refer to bad guys, I think they're actually referring to OpenAI and other
09:14
AI competitors. But yes, this is certainly one of the many documents that show that they knew these
09:22
were pirated websites that contained copyrighted materials and they were taking them for free.
09:26
So here we have it in black and white. Don't believe me. Read the evidence. These are Meta's
09:33
own engineers, Meta's own employees saying they know what they're doing is ethically wrong, illegal,
09:41
likely to subject them to legal liability, and they're doing it anyway because they need the money.
09:47
There's a lot more here. We'll come back to this. I want to give Senator Durbin a chance to ask questions.
09:53
Senator.