text and data mining – DailySynapse

UK drops preferred copyright proposal for AI training

synapse — Sun, 29 Mar 2026 11:21:05 +0000

The UK government has published its report on copyright and artificial intelligence, along with an economic impact assessment required under the Data Use and Access Act 2025. The clearest shift is that a broad text and data mining exception, which would have allowed copyright works to be used to train generative artificial intelligence models unless rightsholders opted out, is no longer the government’s preferred approach. No replacement preference has been adopted yet, and the government is instead seeking more evidence before deciding on any wider reform.

The change follows strong opposition from the creative industries and consultation responses that heavily favoured stronger copyright protection. According to the report, 88% of respondents favoured strengthening copyright by requiring licences in all cases, while only 3% supported the government’s preferred option. For now, the status quo remains in place, with the UK retaining only a narrow text and data mining exception for non-commercial research purposes. The government also identified other possibilities for future consideration, including a focused exception for specific use cases such as science, research, or public interest activity, and a broader exception combined with a statutory licence or levy.

Transparency remains one of the few areas of broad agreement. There is currently no UK requirement for developers to disclose which works were used to train generative artificial intelligence models or how those works were obtained. The government said consultation responses showed broad support for greater transparency, while also revealing disagreement over how detailed disclosures should be and how they should be delivered. It plans to watch developments in other jurisdictions, including the EU and California, and work with industry and experts on best practice.

The government also declined to change copyright law for models trained overseas and then brought into the UK market. It warned that extending UK copyright law in that way could deter developers from offering models in the UK and could harm downstream developers of artificial intelligence systems. The report says this question should be worked through by the English courts under current law, while policymakers monitor international developments. It also acknowledged that developments in other countries, particularly the EU and US, could have a significant impact on outcomes in the UK, with greater clarity expected in those jurisdictions in the next year or so.

On licensing, the government does not plan to intervene for now and will continue monitoring the market as it develops. It also maintained its preference to remove copyright protection for computer-generated works unless stronger evidence emerges in support of those provisions. Separately, it plans to explore options for tackling unauthorised digital replicas, including deepfakes that mimic a person’s image or voice, and will consider whether a new personality right may be appropriate. The overall result is continued uncertainty, with major policy decisions deferred and no immediate legislative resolution for either rightsholders or artificial intelligence developers.

How NotebookLM navigates copyright, contracts, and privacy in academic use

synapse — Sat, 07 Mar 2026 20:50:18 +0000

NotebookLM is presented as a legally safer alternative to general artificial intelligence chatbots for academic research and teaching because it uses retrieval-augmented generation, pulling answers only from user-uploaded sources rather than mixing them into a global training set. Google states that sources uploaded to NotebookLM stay private unless a notebook is explicitly shared and that NotebookLM does not train on uploaded data, which reduces the risk often associated with sending copyrighted or privileged materials to general-purpose tools. For faculty, this means students can upload items such as course syllabi and receive grounded answers that cite specific passages, without the system drawing on unrelated documents or exposing the materials beyond the user’s own workspace.
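The grounding behaviour described above can be illustrated with a toy retrieval step. The Python sketch below is not NotebookLM’s implementation, which Google does not publish. It assumes a simple word-overlap score in place of the embedding search that production retrieval-augmented systems commonly use, leaves out the language-model generation step entirely, and uses hypothetical names throughout (`Passage`, `build_corpus`, `retrieve`, `grounded_answer`, `STOPWORDS`). What it shows is the core constraint: answers are assembled only from the user’s own uploaded passages, each with a citation, and the function declines when nothing in those sources matches.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical stop-word list so that words like "the" do not count as matches.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "to",
             "what", "when", "who", "how", "do", "does"}


@dataclass(frozen=True)
class Passage:
    source: str  # name of the user-uploaded document
    index: int   # zero-based position of the passage in that document
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens minus stop words (a crude stand-in for embeddings)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_corpus(uploads: dict[str, str]) -> list[Passage]:
    """Split each uploaded document into blank-line-separated passages."""
    corpus = []
    for source, body in uploads.items():
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
        for i, para in enumerate(paragraphs):
            corpus.append(Passage(source, i, para))
    return corpus


def retrieve(corpus: list[Passage], question: str, k: int = 2) -> list[Passage]:
    """Return up to k passages that share the most words with the question."""
    q = Counter(tokenize(question))
    scored = []
    for p in corpus:
        overlap = sum((q & Counter(tokenize(p.text))).values())
        if overlap > 0:  # passages with no shared words are never used
            scored.append((overlap, p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def grounded_answer(corpus: list[Passage], question: str, k: int = 2) -> str:
    """Answer only from retrieved passages, citing each; decline if none match."""
    hits = retrieve(corpus, question, k)
    if not hits:
        return "No answer found in the uploaded sources."
    return "\n".join(f"{p.text} [{p.source}, passage {p.index + 1}]" for p in hits)


if __name__ == "__main__":
    uploads = {"syllabus.txt": "Week 1 covers fair use.\n\n"
                               "The final essay is due on 12 May.\n\n"
                               "Office hours are on Tuesdays."}
    corpus = build_corpus(uploads)
    print(grounded_answer(corpus, "When is the final essay due?"))
    # The final essay is due on 12 May. [syllabus.txt, passage 2]
    print(grounded_answer(corpus, "Who won the 1966 World Cup?"))
    # No answer found in the uploaded sources.
```

Because the corpus is built only from the `uploads` dictionary, a question about anything outside those documents returns the refusal message rather than text from elsewhere, which is the property the article attributes to NotebookLM’s closed design.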

Using copyrighted materials in NotebookLM still hinges on fair use and lawful access. Uploads must come from legitimate sources, not “pirate” libraries or platforms that prohibit such use. This echoes a key distinction drawn in Bartz v. Anthropic, where using copyrighted books to train artificial intelligence models was treated differently from maintaining a permanent library of pirated content. Users must also avoid publicly sharing entire notebooks built from copyrighted works, since the fair use finding in cases like Authors Guild v. HathiTrust turned on the database being transformative, a tool for finding information about the works rather than a substitute for them. Educators are also warned not to use NotebookLM outputs as a replacement for commercial resources such as textbooks, as underscored by the Thomson Reuters v. ROSS Intelligence ruling, in which a legal research tool that effectively replaced a proprietary service lost its fair use defense.

Beyond statutory copyright, publisher terms of service and text and data mining clauses can limit what may be uploaded, creating disparities between well-funded labs that can license high-tier tools and others tempted to ignore contractual restrictions. NotebookLM is highlighted as a powerful synthesizer of complex material, but the U.S. Copyright Office maintains that outputs generated by artificial intelligence are not automatically owned by the person who wrote the prompt, since “mere provision of prompts” does not secure copyright; human users must add “sufficient expressive elements” through arrangement, annotation, and integration to claim protection.

On the privacy side, NotebookLM is classified as a “Core Service” in Google Workspace for Education, promising enterprise-grade safeguards, a “closed system” that grounds answers only in uploaded documents, and a policy that data is not used to train models without explicit permission. However, these protections depend on using institutional accounts. Personal @gmail.com accounts enable public sharing features and fall outside that closed-loop safety net, raising serious FERPA and confidentiality concerns for any student data or sensitive research uploaded outside a Workspace for Education environment.

Text and data mining emerges as critical driver of United Kingdom competitiveness and industrial strategy

synapse — Thu, 19 Feb 2026 17:36:10 +0000

Text and data mining is becoming a foundational capability for United Kingdom companies seeking to harness artificial intelligence and data-driven tools, with high stakes for national competitiveness and the success of the government’s Industrial Strategy (Invest 2035). New research commissioned by Microsoft and conducted by Public First links the design of text and data mining rules directly to whether artificial intelligence can deliver the government’s targeted productivity gains of up to 1.5% annually across sectors such as life sciences, financial services and advanced manufacturing. The study combines a nationally representative survey of 1,000 United Kingdom businesses with an economic model that assesses how four different policy scenarios could affect artificial intelligence adoption and its contribution to GDP by 2035.

The findings indicate that around a million United Kingdom businesses already use text and data mining, and that this number is likely to increase over the next two to three years as firms deepen their use of artificial intelligence and cloud technologies. Three-quarters of United Kingdom businesses have used artificial intelligence tools at least once, and almost one in five (19%) use specialised text and data mining tools to analyse large datasets. Usage is particularly advanced in strategic sectors, where 34% of life sciences firms and 33% of financial services firms are already mining and analysing data to maintain competitiveness. Companies using text and data mining rely on both internal sources, such as customer records, transaction data and clinical notes, and external sources, such as market feeds, research databases, news and public websites. Some 74% of text and data mining users say external data is essential to their business, and over a third of users (39%) report that legal uncertainty is holding them back from further artificial intelligence innovation.

The economic modelling underscores how much growth is at risk depending on the regulatory approach. Under the most innovation-friendly scenario, a full commercial text and data mining exemption, artificial intelligence adoption could contribute £510 billion to United Kingdom GDP by 2035. Under the most restrictive scenario, where licences are required for all copyrighted content, that figure falls to £290 billion, a loss of £220 billion or 43% of the potential gain that artificial intelligence adoption could bring to United Kingdom businesses. A scenario that mirrors the European Union approach would still leave the United Kingdom £60 billion worse off than the most optimistic outcome, which the report equates to losing the entire United Kingdom defence budget in 2025/26. Up to £26 billion of potential GDP could be at risk in life sciences alone, up to £23 billion in financial services and up to £20 billion in manufacturing. With Japan, Singapore and the European Union already allowing commercial text and data mining, the research concludes that the United Kingdom has fallen behind and risks higher costs, legal uncertainty and even the relocation of business functions overseas unless it adopts a pro-data availability strategy, including a commercial text and data mining exemption, rapid enabling regulation and pro-growth regulatory tools such as sandboxes.
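The headline percentage can be checked directly from the two scenario totals quoted above. This short Python sketch uses only those reported figures; the helper name `shortfall` is hypothetical, and the EU-style total is not stated in the summary, so the sketch derives it by subtraction and it should be read as an inference rather than a reported result.

```python
# Scenario figures quoted from the Public First study (GDP contribution by 2035, GBP billions).
FULL_EXEMPTION = 510  # full commercial text and data mining exemption
LICENCE_ALL = 290     # licences required for all copyrighted content
EU_STYLE_GAP = 60     # the EU-style scenario is reported only as a gap from the best case


def shortfall(best: float, scenario: float) -> tuple[float, float]:
    """Return the absolute shortfall and its share of the best-case figure, in percent."""
    gap = best - scenario
    return gap, 100 * gap / best


if __name__ == "__main__":
    gap, pct = shortfall(FULL_EXEMPTION, LICENCE_ALL)
    print(f"Licence-for-all shortfall: GBP {gap}bn ({pct:.0f}% of the best case)")
    # Licence-for-all shortfall: GBP 220bn (43% of the best case)
    eu_style = FULL_EXEMPTION - EU_STYLE_GAP  # derived, not a reported figure
    print(f"Implied EU-style total: GBP {eu_style}bn")
    # Implied EU-style total: GBP 450bn
```

The 43% figure is the £220 billion loss measured against the £510 billion best case (220 / 510 ≈ 0.431), which matches how the study frames it as a share of the potential gain.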