BuzzMachine: Posts about

Posts about congress

In the echo chamber

January 11, 2024

ai, artificial intelligence, congress, copyright, journalism

Well, that was surreal. I testified in a hearing about AI and the future of journalism held by the Senate Judiciary Subcommittee on Privacy, Technology, and the Law. Here is my written testimony and here’s the Reader’s Digest version in my opening remarks:

It was a privilege and honor to be invited to air my views on technology and the news. I went in knowing I had a role to play, as the odd man out. The other witnesses were lobbyists for the newspaper/magazine and broadcast industries and the CEO of a major magazine company. The staff knew I would present an alternative perspective. My fellow panelists noted before we sat down — nicely — that they disagreed with my written testimony. Job done. There was little opportunity to disagree in the hearing, for one speaks only when spoken to.

What struck me about the experience is not surprising: They call the internet an echo chamber. But, of course, there’s no greater echo chamber than Congress: lobbyists and legislators agreeing with each other about the laws they write and promote together. That’s what I witnessed in the hearing in a few key areas:

Licensing: The industry people and the politicians all took as gospel the idea that AI companies should have to license and pay for every bit of media content they use.

I disagree. I draw the analogy to what happened when radio started. Newspapers tried everything to keep radio out of news. In the end, to this day, radio rips and reads newspapers, taking in and repurposing information. That’s to the benefit of an informed society.

Why shouldn’t AI have the same right? I ask. Some have objected to my metaphor: Yes, I know, AI is a program and the machine doesn’t read or learn or have rights any more than a broadcast tower can listen and speak and vote. I spoke metaphorically, for if I had instead argued that, say, Google or Meta has a right to read and learn, that would have opened up a whole can of PR worms. The point is obvious, though: If AI creators would be required by law to license *everything* they use, that grants them lesser rights than media — including journalists, who, let’s be clear, read, learn from, and repurpose information from each other and from sources every day.

I think there’s a difference in using content to train a model versus producing output. It’s one matter for large language models to be taught the relationship of, say, the words “White” and “House.” I say that is fair and transformative use. But it’s a fair discussion to separate out questions of proper acquisition and terms of use when an application quotes from copyrighted material from behind a paywall in its output. The magazine executive cleverly conflated training and output, saying *any* use required licensing and payment. I believe that sets a dangerous precedent for news media itself.

If licensing and payment are required for all use of all content, then I say the doctrine of fair use could be eviscerated. The senators argued just the opposite, saying that if fair use is expanded, copyright becomes meaningless. We disagree.

JCPA: The so-called Journalism Competition and Preservation Act is a darling of many members of the committee. Like Canada’s disastrous Bill C-18 and Australia’s corrupt News Media Bargaining Code — which the senators and the lobbyists think are wonderful — the JCPA would allow large news organizations (those that earn more than $100,000 a year, leaving out countless small, local enterprises) to sidestep antitrust and gang together and force platforms to “negotiate” for the right to link to their content. It’s legislated blackmail. I didn’t have the chance to say that. Instead, the lobbyists and legislators all agreed how much they love the bill and can’t wait to try again to pass it.

Section 230: Members of the committee also want to pass legislation to exclude generative AI from the protections of Section 230, which enables public discourse online by protecting platforms from liability for what users say there while also allowing companies to moderate what is said. The chair said no witness in this series of hearings on AI has disagreed. I had the opportunity to say that he has found his first disagreement.

I always worry about attempts to slice away Section 230’s protections like a deli bologna. But more to the point, I tried to explain that there is nuance in deciding where liability should lie. In the beginning of print, printers were held liable — burned, beheaded, and behanded — for what came off their presses; then booksellers were responsible for what they sold; until ultimately authors were held responsible — which, some say, was the birth of the idea of authorship.

When I attended a World Economic Forum AI governance summit, there was much discussion about these questions in relation to AI. Holding the models liable for everything that could be done with them would, in my view, be like blaming the printing press for what is put on and what comes off it. At the event, some said responsibility should lie at the application level. That could be true if, for example, Michael Cohen was misled by Google when it placed Bard next to search, letting him believe it would act like search and giving him bogus case citations instead. I would say that responsibility generally lies with the user, the person who instructs the program to say something bad or who uses the program’s output without checking it, as Cohen did. There is nuance.

Deep fakery: There was also some discussion of the machine being used to fool people and whether, in the example used, Meta should be held responsible and expected to verify and take down a fake video of someone made with AI — or else be sued. As ever, I caution against legislating official truth.

The most amusing moment in the hearing was when the senator from Tennessee complained that media are liberal and AI is liberal and for proof she said that if one asks ChatGPT to write a poem praising Donald Trump, it will refuse. But it would write a poem praising Joe Biden and she proceeded to read it to me. I said it was bad poetry. (BTW, she’s right: both ChatGPT and Bard won’t sing the praises of Trump but will say nice things about Biden. I’ll leave the discussion about so-called guardrails to another day.)

It was a fascinating experience. I was honored to be included.

For the sake of contrast, in the morning before the hearing, I called Sven Størmer Thaulow, chief data and technology officer for Schibsted, the much-admired (and properly so) news and media company of Scandinavia. Last summer, Thaulow called for Norwegian media companies to contribute their content freely to make a Norwegian-language large language model. “The response,” the company said, “was overwhelmingly positive.” I wanted to hear more.

Thaulow explained that they are examining the opportunities for a native-language LLM in two phases: first research, then commercialization. In the research phase now, working with universities, they want to see whether a native model beats an English-language adaptation, and in their benchmark tests, it does. As a media company, Schibsted has also experimented with using generative AI to allow readers to query its database of gadget reviews in conversation, rather than just searching — something I wish US news organizations would do: Instead of complaining about the technology, use it to explore new opportunities.

Media companies contribute their content to the research. A national organization is making a blanket deal and individual companies are free to opt out. Norway being Norway — sane and smart — 90 percent of its books are already digitized and the project may test whether adding them will improve the model’s performance. If it does, they and government will deal with compensation then.

All of this is before the commercial phase. When that comes, they will have to grapple with fair shares of value.

How much more sensible this approach is than what we see in the US, where technology companies and media companies face off, with Capitol Hill as their field of play, each side trying to play the refs there. The AI companies, to my mind, rushed their services to market without sufficient research about impact and harm, misleading users (like hapless Michael Cohen) about their capabilities. Media companies rushed their lobbyists to Congress to cash in the political capital earned through journalism to seek protectionism and favors from the politicians their journalists are supposed to cover, independently. Politicians use legislation to curry favor in turn with powerful and rich industries.

Why can’t we be more like Norway?

Journalism and AI

January 9, 2024

ai, congress, journalism, LLMs, regulation, senate

Here are my written remarks for a hearing on AI and the future of journalism for the Senate Judiciary Subcommittee on Privacy, Technology, and the Law, on January 10, 2024.

I have been a journalist for fifty years and a journalism professor for the last eighteen.

History

I would like to begin with three lessons on the history of news and copyright, which I learned researching my book, The Gutenberg Parenthesis: The Age of Print and its Lessons for the Age of the Internet (Bloomsbury, 2023):

First, America’s 1790 Copyright Act covered only charts, maps, and books. The New York Times’ suit against OpenAI claims that, “Since our nation’s founding, strong copyright protection has empowered those who gather and report news to secure the fruits of their labor and investment.” In truth, newspapers were not covered in the statute until 1909 and even then, according to Will Slauter, author of Who Owns the News: A History of Copyright (Stanford, 2019), there was debate over whether to include news articles, for they were the products of the institution more than an author.

Second, the Post Office Act of 1792 allowed newspapers to exchange copies for free, enabling journalists with the literal title of “scissors editor” to copy and reprint each others’ articles, with the explicit intent to create a network for news, and with it a nation.

Third, exactly a century ago, when print media faced their first competitor — radio — newspapers were hostile in their reception. Publishers strong-armed broadcasters into signing the 1933 Biltmore Agreement by threatening not to print program listings. The agreement limited radio to two news updates a day, without advertising; required radio to buy their news from newspapers’ wire services; and even forbade on-air commentators from discussing any event until twelve hours afterwards — a so-called “hot news doctrine,” which the Associated Press has since tried to resurrect. Newspapers lobbied to keep radio reporters out of the Congressional press galleries. They also lobbied for radio to be regulated, carving an exception to the First Amendment’s protections of freedom of expression and the press.

Publishers accused radio — just as they have since accused television and the internet and AI — of stealing “their” content, audience, and revenue, as if each had been granted them by royal privilege. In scholar Gwenyth Jackaway’s words, publishers “warned that the values of democracy and the survival of our political system” would be endangered by radio. That sounds much like the sacred rhetoric in The Times’ OpenAI suit: “Independent journalism is vital to our democracy. It is also increasingly rare and valuable.”

To this day, journalists — whether on radio or at The New York Times — read, learn from, and repurpose facts and knowledge gained from the work of fellow journalists. Without that assured freedom, newspapers and news on television and radio and online could not function. The real question at hand is whether artificial intelligence should have the same right that journalists and we all have: the right to read, the right to learn, the right to use information once known. If it is deprived of such rights, what might we lose?

Opportunities

Rather than dwelling on a battle of old technology and titans versus new, I prefer to focus here on the good that might come from news collaborating with this new technology.

First, though, a caveat: I argue it is irresponsible to use large language models where facts matter, for we know that LLMs have no sense of fact; they only predict words. News companies, including CNET, G/O Media, and Gannett, have misstepped, using the technology to manufacture articles at scale, strewn with errors. I covered the show-cause hearing for a New York attorney who (like President Trump’s former counsel, Michael Cohen) used an LLM to list case citations. Federal District Judge P. Kevin Castel made clear that the problem was not the technology but its misuse by humans. Lawyers and journalists alike must exercise caution in using generative AI to do their work.

Having said that, AI presents many intriguing possibilities for news and media. For example:

AI has proven to be excellent at translation. News organizations could use it to present their news internationally.

Large language models are good at summarizing a limited corpus of text. This is what Google’s NotebookLM does, helping writers organize their research.

AI can analyze more text than any one reporter. I brainstormed with an editor about having citizens record 100 school-board meetings so the technology could transcribe them and then answer questions about how many boards are discussing, say, banning books.

I am fascinated with the idea that AI could extend literacy, helping people who are intimidated by writing tell and illustrate their own stories.

A task force of academics from the Modern Language Association concluded AI in the classroom could help students with word play, analyzing writing styles, overcoming writers’ block, and stimulating discussion.

AI also enables anyone to write computer code. As an AI executive told me in a podcast about AI that I cohost, “English majors are taking the world back… The hottest programming language on planet Earth right now is English.”

Because LLMs are in essence a concordance of all available language online, I hope to see scholars examine them to study society’s biases and clichés.

And I see opportunities for publishers to put large language models in front of their content to allow readers to enter into dialog with that content, asking their own questions and creating new subscription benefits. I know an entrepreneur who is building such a business.

Note that in Norway, the country’s largest and most prestigious publisher, Schibsted, is leading the way to build a Norwegian-language large language model and is urging all publishers to contribute content. In the US, Aimee Rinehart, an executive student of mine at CUNY who works on AI at the Associated Press, is also studying the possibility of an LLM for the news industry.

Risks

All these opportunities and more are put at risk if we fence off the open internet into private fortresses.

Common Crawl is a foundation that for sixteen years has archived the entire web: 250 billion pages, 10 petabytes of text made available to scholars for free, yielding 10,000 research papers. I am disturbed to learn that The New York Times has demanded that the entire history of its content — that which was freely available — be erased. Personally, when I learned that my books were included in the Books3 data set used to train large language models, I was delighted, for I write not only to make money but also to spread ideas.

What happens to our information ecosystem when all authoritative news retreats behind paywalls, available only to privileged citizens and giant corporations able to pay for it? What happens to our democracy when all that is left out in public for free — to inform both citizens and machines — is propaganda, disinformation, conspiracies, spam, and lies? I well understand the economic plight of my industry, for I direct a Center for Entrepreneurial Journalism. But I also say we must have a discussion about journalism’s moral obligation to an informed society and about the right not only to speak but to learn.

Copyright

And we need to talk about reimagining copyright in this age of change, starting with a discussion about generative AI as fair and transformative use. When the Copyright Office sought opinions on artificial intelligence and copyright (Docket 2023-6), I responded with concern about an idea the Office raised of establishing compulsory licensing schemes for training data. Technology companies already offer simple opt-out mechanisms (see: robots.txt).

Copyright at its origin in the Statute of Anne of 1710 was enacted not to protect creators, as is commonly asserted. Instead, it was passed at the demand of booksellers and publishers to establish a marketplace for creativity as a tradeable asset. Our concepts of creativity-as-content and content-as-property have their roots in copyright.

Now along come machines — large language models and generative AI — that manufacture endless content. University of Maryland Professor Matthew Kirschenbaum warns of what he calls “the Textpocalypse.” Artificial intelligence commodifies the idea of content, even devalues it. I welcome this. For I hope it might drive journalists to understand that their value is not in manufacturing the commodity, content. Instead, they must see journalism as a service to help citizens inform public discourse and improve their communities.

In 2012, I led a series of discussions with multiple stakeholders — media executives, creative artists, policymakers — for a project with the World Economic Forum on rethinking intellectual property and the support of creativity in the digital age. In the safe space of Davos, even media executives would concede that copyright is outmoded. Out of this work, I conceived of a framework I call “creditright,” which I’ve written is “the right to receive credit for contributions to a chain of collaborative inspiration, creation, and recommendation of creative work. Creditright would permit the behaviors we want to encourage to be recognized and rewarded. Those behaviors might include inspiring a work, creating that work, remixing it, collaborating in it, performing it, promoting it. The rewards might be payment or merely credit as its own reward.” It is just one idea, intended to spark discussion.

Publishers constantly try to extend copyright’s restrictions in their favor, arguing that platforms owe them the advertising revenue they lost when their customers fled for better, competitive deals online. This began in 2013 with German publishers lobbying for a Leistungsschutzrecht, or ancillary copyright, which inspired further protectionist legislation, including Spain’s link tax, articles 15 and 17 of the EU’s Copyright Directive, Australia’s News Media Bargaining Code, and most recently Canada’s Bill C-18, which requires large platforms — namely Google and Facebook — to negotiate with publishers for the right to link to their news. To gain an exemption from the law, Google agreed to pay about $75 million to publishers — generous, but hardly enough to save the industry. Meta decided instead to take down links to news rather than being forced to pay to link. That is Meta’s right under Canada’s Charter of Rights and Freedoms, for compelled speech is not free speech.

In this process, lobbyists for Canada’s publishers insisted that their headlines were valuable while Meta’s links were not. The nonmarket intervention of C-18 sided with the publishers. But as it turned out, when those links disappeared, Facebook lost no traffic while publishers lost up to a third of theirs. The market spoke: Links are valuable. Legislation to restrict linking would break the internet for all.

I fear that the proposed Journalism Competition and Preservation Act (JCPA) and the California Journalism Protection Act (CJPA) could have similar effect here. As a journalist, I must say that I am offended to see publishers lobby for protectionist legislation, trading on the political capital earned through journalism. The news should remain independent of — not beholden to — the public officials it covers. I worry that publishers will attempt to extend copyright to their benefit not only with search and social platforms but now with AI companies, disadvantaging new and small competitors in an act of regulatory capture.

Support for innovation

The answer for both technology and journalism is to support innovation. That means enabling open-source development, encouraging both AI models and data — such as that offered by Common Crawl — to be shared freely.

Rather than protecting the big, old newspaper chains — many of them now controlled by hedge funds, which will not invest or innovate in news — it is better to nurture new competition. Take, for example, the 450 members of the New Jersey News Commons, which I helped start a decade ago at Montclair State University; and the 475 members of the Local Independent Online News Publishers; the 425 members of the Institute for Nonprofit News; and the 4,000 members of the News Product Alliance, which I also helped start at CUNY. This is where innovation in news is occurring: bottom-up, grass-roots efforts emergent from communities.

There are many movements to rebuild journalism. I helped develop one: a degree program called Engagement Journalism. Others include Solutions Journalism, Constructive Journalism, Reparative Journalism, Dialog Journalism, and Collaborative Journalism. What they share is an ethic of first listening to communities and their needs.

In my upcoming book, The Web We Weave, I ask technologists, scholars, media, users, and governments to enter into covenants of mutual obligation for the future of the internet and, by extension, AI.

There I propose that you, as government, promise first to protect the rights of speech and assembly made possible by the internet. Base decisions that affect internet rights on rational proof of harms, not protectionism for threatened industries and not media’s moral panic. Do not splinter the internet along national borders. And encourage and enable new competition and openness rather than entrenching incumbent interests through regulatory capture.

In short, I seek a Hippocratic Oath for the internet: First, do no harm.

Statement to the Judiciary Subcommittee on Antitrust

March 12, 2021

antitrust, congress, facebook, google, Internet, newspapers

I was called about possibly testifying to a hearing of the House Judiciary Subcommittee on Antitrust regarding technology companies. That’s not happening but I decided to submit a statement to the committee. Here, minus my bio, is what I have to say:

Statement to the Subcommittee:

I write to the committee to express my concern about often well-intentioned but ill-conceived internet regulation, which could have deleterious effects on freedom of expression; which tends to protect incumbent media and technology companies at the expense of innovation and competition; and whose unintended consequence is frequently to grant internet platforms yet greater power. It is worthwhile to examine the effects of internet regulation elsewhere as it is debated here.

Consider, for example, Australia’s media code. The net result, according to the news site Crikey, is that the country’s existing media duopoly of News Corp. and the Nine Network will receive 90 percent of the money being paid by Google and Facebook, both of which are now in the position to decide which news organizations should receive support. Small news startups that might compete with the powerful incumbents receive no protection or support in the law. The Australian code amounts to a link tax — for those companies that link to news are required to pay for news — and Sir Tim Berners-Lee, inventor of the web, testified to Australian legislators that such a precedent would “make the web unworkable around the world.” It would break the internet. I regret that in the end, Google and Facebook succumbed to what I see as corporate and political blackmail.

In Europe, various changes to copyright law — Germany’s Leistungsschutzrecht, Spain’s link tax, Articles 15 and 17 of the EU’s Directive on Copyright — amount to regulatory capture, for the large internet companies can afford compliance but I have spoken with smaller competitors for whom the expense and effort are crippling. Germany’s NetzDG hate-speech law requires Facebook to decide — in a private company rather than an open courtroom — what speech is manifestly illegal. Europe’s Right to be Forgotten court decision puts Google in the position of deciding what speech should be remembered or forgotten. The UK is considering regulation that would require platforms to take down “legal but harmful speech.”

Online speech is imperiled in many quarters. In Italy, Facebook was forced to reinstate a site for a neo-fascist group. Poland has announced a new law that would require platforms to carry all legal speech, a nightmare that would protect the worst of the net. I would remind us that compelled speech is not free speech. In addition, Singapore instituted a fake-news law, which puts internet companies in the unwanted position of being arbiters of truth. Similarly, India is enacting regulation that would require platforms to take down speech that is false or threatens national unity.

In the United States, Google’s recent announcement that it will forgo ad targeting on the web based on third-party data was applauded by privacy advocates who have demonized web cookies as so-called “surveillance capitalism.” But this again amounts to regulatory capture as Google itself has plentiful first-party data about consumer behavior as well as the resources and technical means to innovate in advertising. Incumbent publishers, on the other hand, are stuck without their own first-party data or innovation. I know this because in my university center, I spent years trying to convince publishers to change their product and business strategies to prepare for this day. They generally insisted on relying on their dying print businesses and on third-party ad networks online, and now they are retreating behind paywalls. As a result, just when we need it most, reliable news is becoming a product for the privileged few who can afford it. According to Oxford’s Reuters Institute, only 20 percent of Americans pay for online news and it is a winner-take-all market with most people paying for only one subscription for news — almost two thirds of subscriptions go to just three publishers: The New York Times, The Washington Post, and Rupert Murdoch’s News Corp.

Note well that most local newspaper companies in the United States are now controlled by hedge funds, which are not inclined to invest in innovation and which, by their nature, tend to sell assets and draw cash out of these enterprises. If there ever were an attempt to enact an Australia-like law here — if it could overcome clear First Amendment objections — any money resulting from it would end up in the balance sheets of hedge-fund owners and would benefit neither journalism nor innovation at legacy, local news companies.

Thus to grant newspaper owners an exemption from antitrust, as has been discussed, would be profoundly anti-competitive, for it would — as in Australia — entrench the interests of the largest companies on both sides of the table, media and technology.

Similarly, I argue that breaking up major technology companies is an emotional response to the discussion of technology and power. It would not meet the test of rectifying consumer harm, for users benefit tremendously from free, open, and inexpensive services. Also, there is considerable competition; note Microsoft’s role in this debate.

Instead, in both industries — technology and media — the best cure for concerns about size is to encourage and support entrepreneurship and new competition. In my university, I started a first-of-its-kind program in entrepreneurial journalism to teach journalists to do just that. I hope next to turn my attention to internet studies, to foster the design and creation of a next generation of the net: one built not just to speak but to listen, one designed to build bridges rather than battlements, one that protects the benefits of today’s historically unprecedented opportunity to hear voices too long not heard in mass media. There is much work to be done and much opportunity to create competitors to the present proprietors of the net and media. This is where we should focus our attention in policy.

The net is yet young. We don’t fully know what it is and may not for generations, even centuries. Note that the first newspaper was not published until a century and a half after Gutenberg introduced movable type. In my research for a book on the end of the Gutenberg age, I have learned much about the reaction to the introduction of printing. After initial and brief utopian glee at its prospects, authorities worried greatly about print’s power to spread the fake news of the day, to cause unrest (the Reformation and the Thirty Years’ War), and to disrupt institutions. I have also learned that governments’ attempts to control printing and thus speech largely failed. In a prescient 1998 paper for the RAND Corporation, “The Information Age and the Printing Press: Looking Backward to See Ahead,” James Dewar argued persuasively for “a) keeping the Internet unregulated, and b) taking a much more experimental approach to information policy. Societies who regulated the printing press suffered and continue to suffer today in comparison with those who didn’t.”

In what I have said here, it might sound as if I oppose all internet regulation. I do not. I worked for more than a year with a Transatlantic High-Level Working Group on Content Moderation Online and Freedom of Expression, convened by former FCC Commissioner Susan Ness under the auspices of the Universities of Pennsylvania and Amsterdam. The group included many experts and luminaries, such as former Secretary of Homeland Security Michael Chertoff, former Ambassador Eileen Donahoe, former Estonian President Toomas Ilves, and former members of the European Parliament Marietje Schaake and Erika Mann. Our report recommended a flexible framework for internet regulation based on transparency as the basis of accountability as well as the establishment of e-courts to rule on matters of legality where that should occur, in public and in court.

To put this in my terms, I have long argued that both technology and media companies should make covenants of mutual obligation with their users and the public — not just rules for users but promises from the companies for what we may expect of them in building useful, respectful, and productive services and environments. In the model of the Federal Trade Commission, I would favor requiring them to provide data about their implementation and impact so as to hold them accountable to their promises. I also hope for a multistakeholder forum — of technologists, lawmakers, regulators, civil society, academics, and users — to grapple with new and unforeseeable problems, such as pandemics, and to exploit new opportunities.

Internet regulation should not be about punishing power or success but instead about creating the means to work together for a better internet, a better society, a better future.