During a Senate Judiciary Committee hearing on Wednesday, Sen. Josh Hawley (R-MO) shows chat logs in which Meta employees allegedly expressed doubts about pirating materials to train artificial intelligence models.
Transcript
00:00Thank you very much, Professor. Thanks for being here. Thanks again to all of our witnesses.
00:03We're going to now have seven-minute rounds of questioning, and we'll see if we can fit in
00:10maybe a couple of rounds, just depending on the time that we have. I'll start, and then we'll go
00:13to the ranking member and any other members who arrive in that time. Professor Viswanathan,
00:19let me just start with you, if I could, and let's see if we can just drill down on some of the
00:22specifics here. Mr. Baldacci mentioned in his opening statement that AI could just feed dictionaries
00:29into their platforms in order to train them. They don't do that. They prefer published works,
00:34fully formed works. Why is that? Can you give us an insight into that? That's absolutely right.
00:39They learn syntax, structure. They learn how we learn language, right? When you learn language,
00:45you just don't learn words. You don't memorize words. You don't memorize notes when you learn music.
00:49You learn structure and syntax, and the point that Professor Lee is making is correct. They need
00:55large data sets. More is better to learn predictive language models. However, more is not everything.
01:06It's not pirated works. So let me just ask this. You said that they are not buying the books. They're
01:14not buying Mr. Baldacci's book or anybody's book who's sitting up here, anybody in the audience.
01:18They're getting them. They're stealing them. They're pirating them from somewhere. If they're not
01:23buying the books, they're not stealing them out of libraries, where are they getting them?
01:27These large repositories of materials that are available online, there are many. Some are licit,
01:33some are not licit. The pirate websites in particular are not licit. So if you need a lot
01:38of material, you go out and you scoop up all that material that you can find. But you don't go to pirate
01:44websites to get that material if what you want to do is legal. None of these works are licensed.
01:49None of these works are licensed. No author has been compensated to date.
01:56So how do they go to these, let's call them shadow libraries, to get the works illegally? They've
02:03already, by the time they go to the shadow library, the works there are already stolen, right? They've
02:07already stolen Mr. Baldacci's book, Professor Lee's book, everybody's, your books. They've stolen them.
02:12How do they actually, when they go to the shadow library, how do they get them? I mean,
02:16how does the AI company then take possession of the particular work?
02:21There's a process called torrenting, and I will not trouble you all with the details of torrenting,
02:26but essentially huge amounts of data stream to you and you get them. At the same time,
02:32you can send them out. That's called seeding. You can send them out at the same time.
02:36Uploading and downloading exist at the same time. This is a peer-to-peer process. So not only are you
02:42taking in these pirated materials, you are also distributing them. The violation of copyright law
02:48exists at the reproduction of these works, at the making available of them by the pirate libraries,
02:54the dissemination of them, and your dissemination, Gen AI company, of them as well.
03:00So they're both taking the works and distributing them as well in this thing you call it, kind of like
03:05Napster, this thing that you call torrenting. Let me ask you this. I mean, that's not,
03:09is torrenting legal? That's not legal, is it? Torrenting can be legal, but in this case it is
03:13not. And in this particular case, this is benefiting the torrent. Now, I agree with Judge Alsup,
03:21who said, if you're taking it from pirate libraries, no way. That is not acceptable, right? Part of what
03:27we're seeing here, Judge Chhabria said, well, it's not helping the pirate websites. Well, yes, it is.
03:33The pirate websites, there's one in particular called Anna's Archive. They actually put on their
03:38website, hey, Gen AI companies, come train on us. We'll do some data swaps, or you know what? You
03:44can make us a donation, too. This is directly helping the pirate websites thrive, flourish,
03:50proliferate. Let me ask you this. Have there been any, to your knowledge, any criminal enforcements
03:54against these torrenting platforms? Yes, there have been attempts to. Again, it's like a game of
03:59whack-a-mole. You get one, you knock it down, it pops up again in some jurisdiction that you don't
04:04have control over. What's the key to a criminal enforcement? You know, civil versus criminal in
04:09this context, when do we have a criminal case against torrenting? What's the key to that?
04:15Okay, this is a really important point. What's criminal here? Criminal copyright liability has
04:21two prongs to it. Prong one is you have to do it willfully, and prong two is you have to do it for
04:27commercial advantage or gain. We clearly know that prong two is met. This is for commercial advantage or
04:31gain. I don't think Meta is doing this out of the goodness of its heart. Prong one, willful means
04:36you need to know that what you were doing is illegal. There's lots and lots of evidence now,
04:42particularly from the Kadrey v. Meta case, that shows that they knew this was illegal. They even had to
04:48ask all the way up the chain of command to Mark Zuckerberg and say, hey, is this okay? And he said,
04:52yes, it did. Yes, it's okay. So not only did he do it knowing it was illegal, he did it knowingly,
04:57he did it willfully, intentionally, and whether or not he knew which statute made it illegal doesn't
05:02matter. For this to be willful, you have to know that what you're doing is wrong, and this meets
05:07that prong. So this is, in fact, amounting to what you might call criminal copyright liability.
05:13Mr. Pritt, let me just ask you about this, about the willful aspect, and let's talk about Meta in
05:17particular, since Professor Viswanathan just mentioned Meta. They're one of the biggest monopolists
05:23in the world, and one of the biggest AI companies now in the world, if not the biggest. So let's just
05:27talk about them for a second. Meta uses torrents to acquire pirated data for its Llama model. Is that
05:35right? Correct. How much data would you estimate that Meta has torrented? It's illegally downloaded
05:44and also then shared in this peer-to-peer scheme. It has pirated well over 200 terabytes
05:52of copyrighted material from multiple, I don't call them shadow libraries because they're not
05:59libraries, but illicit criminal enterprises. And how much has it paid the copyright holders for
06:06these works that it's used, to your knowledge? Nothing. Nothing, zero. So billions of works,
06:14billions of books like Mr. Baldacci's, zero payment. If Meta were to pay, do you have any idea what the
06:21cost might be? I mean, did they ever, to your knowledge and your discovery, did they ever explore
06:25paying? I mean, is there any sense of how much this might have cost them? Early on, they explored
06:31licensing. They assigned two individuals part-time to attempt to license, and they decided it would take
06:40too long, for example. And that's when they turned to piracy. At the time, they had, public documents show
06:48that certainly tens of millions, if not hundreds of millions, had been contemplated for licensing at
06:55that time. Okay, so let's just think about this. Hundreds of millions of dollars, that's the value,
07:00maybe sort of the base, the bare value of the works that they've used, like the works that you all have
07:06written on this panel. Hundreds of millions, and they paid zero of that. So let's just drill down a little
07:12further. Did Meta know what they were doing was wrong? Do you, do you, Mr. Pritt, believe in the
07:18evidence you've seen, that there's any evidence to suggest that Meta's employees knew what they were
07:23doing is illegal? The documents that have become public clearly show that. Let's just look at a few
07:27of these documents. I'm going to show you a few things, and I'll ask you to help me interpret them
07:31to make sure that we get them right. Let's start here with a Meta employee, a Meta engineer working
07:36on their AI project, Eleonora Presani. She says, I don't think we should use pirated material. This
07:42is in a chat with other Meta employees. I don't think we should use pirated material. I really need
07:47to draw a line there. She goes on, I feel that using pirated material should be beyond our ethical
07:53threshold. Sci-Hub, ResearchGate, LibGen are basically like Pirate Bay or something like that.
07:57They are distributing content that is protected by copyright, and they're infringing it. How do you
08:03read this, Mr. Pritt? Does this look like knowledge to you?
08:06That's certainly what we've argued in the case. Let's look at another Meta employee.
08:14Here is Nisha Deo in the same chat. She replies and says, it's the piracy and us knowing and being
08:23accomplices that's the issue. This is a Meta engineer working on their AI project. It's the piracy and us
08:32being knowing accomplices. That's the issue. Let's look at another one.
08:41Here is the response that another Meta engineer in the same chat gave. Well, we want to buy books and be
08:50nice, open people here, but however, to make it happen and not letting the bad guys win, that's the
08:57beat China argument. We need to make a case fast and cut some corners here and there. We need to
09:04cut some corners here and there. Mr. Pritt, what are we looking at here? I mean, is this knowledge of
09:09illegal activity? When they refer to bad guys, I think they're actually referring to OpenAI and other
09:14AI competitors. But yes, this is certainly one of the many documents that show that they knew these
09:22were pirated websites that contained copyrighted materials and they were taking them for free.
09:26So here we have it in black and white. Don't believe me. Read the evidence. These are Meta's
09:33own engineers, Meta's own employees saying they know what they're doing is ethically wrong, illegal,
09:41likely to subject them to legal liability, and they're doing it anyway because they need the money.
09:47There's a lot more here. We'll come back to this. I want to give Senator Durbin a chance to ask questions.
09:53Senator.
