Text and Data Mining Exceptions in the Development of Generative AI Models: What Could the EU Member States Learn from the Japanese ‘Non-Enjoyment’ Purposes?

The European Union (EU) text and data mining (TDM) provisions are a progressive move, but the horizon is still uncertain for researchers and developers of generative artificial intelligence (GenAI) models. This article suggests that, to drive innovation and further the commitment to the digital single market, EU Member States could, during national implementation, take the broad, all-encompassing, ‘non-enjoyment’-based Japanese TDM exception as an example. The Japanese ‘non-enjoyment’ purposes, however, are not foreign to the European continental view of copyright: a similar idea underlies the German concept of “Freier Werkgenuss” (free enjoyment of the work). A flexible TDM exception built upon the German notion of non-enjoyment purposes could become an opening clause to foster innovation and creativity in the age of GenAI. Moreover, the article argues that an opening clause allowing TDM for ‘non-enjoyment’ purposes could be permissible under the so-called three-step test. If there is no political will to safeguard “the right to read should be the right to mine” and to provide a welcoming environment for GenAI researchers and developers, the article further suggests that EU Member States, when shaping the legal interpretation through national case law, could consider: (1) a 72-hour response period where technological protection measures (TPMs) prevent TDM, and (2) treating the Robots Exclusion Standard (robots.txt) as a warning that TDM is not allowed on a website. It is now in the hands of the EU Member States whether to protect the interests of rightholders or to strike a balance between safeguarding ‘the right to read should be the right to mine’, protecting rightholders’ exclusivity, and creating a supportive environment for GenAI model researchers and developers.


B. Pascal, in "Les deux infinis" 2
In these lines, Pascal illuminates the disproportion people perceive when comparing themselves to a world that is transforming rapidly all around them: humans have no choice but to seek refuge in their beliefs. Although those words date from a time when today's technology was not even a phantasmagoria, they might nevertheless capture the mindset of a copyright owner who is both captivated and horrified by the current progress of artificial intelligence ("AI"). Adaptation is difficult to avoid, yet lawmakers find it challenging to articulate a normative vision that can capture and keep pace with this technological advancement.
Nowadays, AI systems are capable of producing human-level creative output, such as poetry, stories, jokes, music and paintings, and are increasingly automating tasks typically performed by human artists. In this article, these AI systems are referred to as 'generative AI (GenAI)' models. 3 These GenAI systems have been fuelled in particular by new data-driven technologies. 4 The development of GenAI models, and of AI in general, cannot be separated from data (in this article, data refers to non-personal data, which includes literary and artistic works such as text, music, pictures, etc.). 5 The value produced by data is a key factor in determining the present and future of GenAI. 6 The value of data as such generally lies in the extraction of value rather than in the data or text considered independently. 7 To discover new patterns and relations for creative outputs, GenAI must analyse substantial amounts of data. This analysis, which is practically impossible to accomplish manually, is done efficiently using an automated computational analysis known as 'Text and Data Mining' ("TDM"). 8 TDM (stricto sensu) can be described as "the selection and application of complex algorithms to the transformed alphanumerical dataset to gather hidden information." 9 From the perspective of copyright and related rights, TDM plays an important role in analysing large amounts of information in digital form, including images, text and sound contained in "a large amount of diversified time series data generated at a high speed by industrial equipment", 10 commonly known as 'Big Data'. The purpose is to gain new knowledge and uncover new patterns for the development of GenAI. 11
In essence, creating outputs with GenAI models involves TDM in three stages: (i) access to content, (ii) extraction and/or copying of content, and (iii) mining of text and/or data and knowledge discovery. TDM creates rich and diverse data sets that are then used to train and feed AI for creative purposes. 12 The data used in the 'extraction and/or copying of content' stage may require authorisation from the relevant rightholders. To strike a balance between rightholders' exclusivity and TDM, the EU passed Directive 2019/790 ("EU CDSM Directive"), 13 which includes two mandatory TDM exceptions. This was done to eliminate legal uncertainties and to compete with legal systems that offer a more conducive environment for TDM, such as Japan, which provides the broadest TDM exception in the world. However, the question remains: will these TDM exceptions be able to encourage innovation? Unfortunately, many academics and legal experts doubt it.
6 "AI will exploit the digital data from people and things to automate and assist in what we do today, as well as find new ways of doing things that we've not imagined before." A Popescu, 'EconPapers: The Value of Data From an Artificial Intelligence Perspective', (Econpapers: The Value of Data From An Artificial Intelligence Perspective, 2019). Available at: <https://econpapers.repec.org/article/edtaucjcm/v_3a5_3ay_3a2019_3ai_3a1_3ap172-194.htm> accessed December 10, 2022 at 176.
This article aims to answer the following question: How could the legal framework in the EU best accommodate research and innovation in the development of GenAI models made possible by TDM? Should the EU Member States, during the national implementation of the CDSM Directive or when shaping the legal interpretation through national case law, take the Japanese TDM exception as an example? This article addresses these questions by assessing the EU and Japanese TDM exceptions and related case law, and by analysing whether the Japanese TDM exceptions suit the European continental copyright system and are compatible with the so-called 'three-step test'. The article is structured as follows: Section 2 examines the importance of TDM in the development of GenAI models and the copyright issues that might arise. Section 3 analyses the newly introduced TDM exceptions in the EU. Section 4 presents the Japanese TDM exceptions and the rationale behind the 'non-enjoyment' purposes. Section 5 discusses the possible implications of the Japanese 'non-enjoyment' purposes doctrine for the EU Member States, its similarity to the German doctrine of 'Freier Werkgenuss', its compatibility with the three-step test, and several recommendations for EU Member States that do not wish to implement a broader TDM exception.

Definition of TDM
The definition of TDM must be made crystal clear if the rights, exceptions, and current legal discourse concerning TDM and AI are to be addressed. TDM generally refers to the process of obtaining valuable information from massive amounts of data, and it is generally acknowledged that TDM plays an important role in the knowledge discovery process. 14 The EU CDSM Directive describes TDM as "any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations." 15 According to the Japanese Copyright Act, TDM is "data analysis (meaning the extraction, comparison, classification, or other statistical analysis of language, sound, or image data, or another element of which a large number of works or a large number of data is composed)." 16 TDM is thus a technique for processing amounts of text or data beyond the capacity of the human mind, and it is recognised as such by both the EU and Japanese legislators.

The Procedure of TDM
Large amounts of text and data can be processed, extracted, and recombined using TDM techniques to reveal new insights into existing information or even produce new knowledge. 18 As shown in Illustration 1 below, AI systems must have access to the content to accomplish this, and they might even need to copy or extract it. This section describes the TDM process in simple terms in order to identify the legal issues involved. TDM activities can be carried out in various ways and for a myriad of purposes, but they generally proceed through the following steps:
Step 1: Access to Content
The first and most important phase in TDM activities is content accessibility. 19 The content accessed might be text or data, depending on the type of mining to be done. As shown in Illustration 1 below, raw data, target data, and pre-processed data are all related to one another and are all indispensable for this first step of TDM. 20

Step 2: Extraction and/or Copying of Content
In this stage, as shown in Illustration 1 below, content must be extracted and/or copied during the TDM process in order to transform raw data, target data and/or pre-processed data into patterns.
Step 3: Mining of Text and/or Data and Knowledge Discovery
The final step in most GenAI models is Step 3, as shown in Illustration 1 below. 21 In most cases, mining of text and/or data and knowledge discovery includes data cleaning and pre-processing, data transformation, and pattern evaluation. First, to increase the dependability and effectiveness of the data, data cleaning and pre-processing look for missing data and delete noisy, redundant, and low-quality data from the data collection.

Illustration 1. Three Common Steps in TDM.
Based on application-specific criteria, specialised algorithms are used to search for and remove undesirable data. 22 Second, data transformation prepares the data for use by data mining algorithms: the data must be consolidated and aggregated, for example on the basis of functions, attributes or features. 23 Third, pattern evaluation requires the trends and patterns obtained from various data mining methods and iterations to be represented in discrete forms, such as bar graphs, pie charts and histograms, in order to study the impact of the data collected and transformed during the previous steps. 24
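The three sub-steps of mining described above (cleaning and pre-processing, transformation, pattern evaluation) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not a description of any particular GenAI pipeline: the records, the `min_length` quality threshold and the grouping by a `genre` attribute are all hypothetical, and a plain-text histogram stands in for the bar graphs and charts mentioned in the text.

```python
from collections import Counter, defaultdict

# Hypothetical corpus of (genre, text) records; None marks missing data.
RAW = [
    ("poem", "the sea at night"),
    ("poem", "the sea at night"),        # redundant duplicate
    ("story", None),                     # missing data
    ("story", "a boy meets a girl"),
    ("story", "ok"),                     # low-quality (too short)
    ("poem", "night falls on the sea"),
]

def clean(records, min_length=5):
    """Cleaning/pre-processing: drop missing, redundant and low-quality records."""
    seen, kept = set(), []
    for genre, text in records:
        if text is None or len(text) < min_length:
            continue
        if (genre, text) in seen:
            continue
        seen.add((genre, text))
        kept.append((genre, text))
    return kept

def transform(records):
    """Transformation: consolidate and aggregate word counts per attribute (genre)."""
    counts = defaultdict(Counter)
    for genre, text in records:
        counts[genre].update(text.split())
    return counts

def evaluate(counts, top=2):
    """Pattern evaluation: show the most frequent words per genre as a text histogram."""
    lines = []
    for genre in sorted(counts):
        for word, n in counts[genre].most_common(top):
            lines.append(f"{genre:<6}{word:<6}{'#' * n}")
    return lines

cleaned = clean(RAW)
patterns = transform(cleaned)
for line in evaluate(patterns):
    print(line)
```

Keeping each sub-step as a separate function mirrors the separation in the text; real systems would use far larger corpora and application-specific cleaning rules.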

TDM, GenAI Models and Copyright
3.1 TDM and GenAI: As Close As Two Coats of Paint
'There is no reason why the simple shapes of stories can't be fed into computers'
In 1995, Vonnegut presented his theory about the shapes of stories. The theory holds that stories often follow emotional arcs, which can take a variety of forms. In his lecture, Vonnegut sketched a number of storylines, such as "Man falls into a hole, Man climbs out of a hole" and the more complicated "Boy meets Girl, Boy loses Girl, Boy gets Girl." However, there is no consensus about the number of distinct emotional arcs that appear in stories or how long it takes a story to reach its climax. 26 A couple of decades later, we are finally witnessing a major shift in the process of mapping 'emotional arcs'. Researchers at the University of Vermont in Burlington used sentiment analysis to map the emotional arcs of over 1,700 stories and then used TDM techniques to reveal the most common arcs. 27 This research went on to inspire GenAI researchers and developers and showed that TDM can be used to train machine learning, one of the most fundamental parts of AI, for the purpose of AI-driven creativity. 28 There are myriad examples of GenAI models producing artistic and literary content. 29 ChatGPT-4, DALL-E 2 and Stability AI are some of the GenAI models that have caught the attention of many people worldwide. This section focuses on analysing the DALL-E 2 and Stability AI systems. In 2022, OpenAI and Stability AI introduced revolutionary deep neural networks, inspired by Vonnegut's theory, that can create original, realistic images and art from a text description, for example, "an astronaut chilling on Mars," or "a teddy bear playing basketball." 30 In their operation, both DALL-E 2 and Stability AI use the TDM technique to obtain realistic images and art from a text description. 31 They employ a technique known as "diffusion" (Stability AI's model is named "Stable Diffusion"), 32 which begins with a pattern of random dots and progressively changes that pattern to resemble a picture when it identifies certain characteristics of that image. 33 Both DALL-E 2 and Stability AI rely on a contrastive model called CLIP, or 'Contrastive Language-Image Pre-training', which has been shown to learn robust representations of images that capture both semantics and style. 34 Stability AI obtained its training data from "LAION-5B", one of the largest openly available multi-modal datasets. 35 This dataset is "a CLIP-filtered dataset of 5.85 billion high-quality image-text pairs, their CLIP ViT-L/14 embeddings, kNN-indices, a web interface for exploration & subset-creation and NSFW-and watermark-detection scores and tools." The LAION-5B dataset itself, which consists of links to images together with their captions, is licensed under the Creative Commons CC-BY 4.0 license. 36 In the example of DALL-E 2, as shown in Illustration 2 below, the system involves four iterative stages to produce an image, namely (1) CLIP, (2) Prior Model, (3) Decoder Diffusion Model or unCLIP and (4) DALL-E 2 as the final output. 37
21 E Rosati, supra note 19, at 71. For more details regarding the objective of predictive TDM, see also, for further reference, UM Fayyad, G Piatetsky-Shapiro and P Smyth, 'Knowledge Discovery and Data Mining: Towards a Unifying Framework.' (KDD, 1996
29 e.g., 'DALL-E 2', Available at: <https://openai.com/dall-e-2/>, accessed 10 July 2022. 'Ai-Da' the world's first ultra-realistic artist robot, Available at: <https://www.ai-darobot.com/>, accessed 10 July 2022. 'MuseNet' (OpenAI, 25 April 2019), Available at: <https://openai.com/blog/musenet/> accessed 10 July 2022 (music generation); 'InferKit Demo'. Available at: <https://app.inferkit.com/demo> accessed 10 February 2022 (text generation); 'Image GPT' (OpenAI, 17 June 2020).

Illustration 2. A Simplified unCLIP Training Process. 38
Without TDM, the DALL-E 2 system could not perform stages (1), (2) and (3), for the following reasons: 39 First, from stage (1) to stage (3), the DALL-E 2 system analyses vast numbers of texts and images using TDM. In these three stages, DALL-E 2 does not copy the copyrighted works fed into the system; instead, the system uses the data to find new patterns. DALL-E 2 is an example of a two-part model consisting of a prior model and a decoder, or unCLIP. 40 Second, the decoder is termed unCLIP because it reverses the process of the original CLIP model (stage (1)), and TDM makes this possible by helping the system construct a 'mental' representation (embedding) from an image and generate an original picture from a generic mental representation. Last, with the help of TDM, the mental representation encodes the semantically meaningful main features, such as people, animals, objects, style, colours and background, so that the DALL-E 2 system can generate a novel image that retains these characteristics while varying the non-essential features.
33 DALL E 2 official website, ibid.
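The contrastive idea behind CLIP, scoring how well an image and a caption match in a shared embedding space, can be illustrated with a toy Python sketch of CLIP-style zero-shot matching. It is an assumption-laden simplification rather than OpenAI's implementation: the file names and three-dimensional vectors are invented, the temperature of 0.1 is arbitrary, and real CLIP embeddings are learned by neural networks from large image-text datasets. The code only scores embeddings it is given; it neither trains a model nor generates images.

```python
import math

# Hypothetical, hand-made embeddings; real CLIP embeddings are learned, not written by hand.
image_embeddings = {
    "astronaut_on_mars.png": [0.9, 0.1, 0.0],
    "teddy_bear_basketball.png": [0.1, 0.8, 0.3],
}
text_embeddings = {
    "an astronaut chilling on Mars": [0.8, 0.2, 0.1],
    "a teddy bear playing basketball": [0.0, 0.9, 0.4],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def softmax(scores, temperature=0.1):
    """Turn similarity scores into probabilities; lower temperature sharpens them."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def match_captions(images, texts):
    """For each image, pick the caption with the highest contrastive score."""
    captions = list(texts)
    result = {}
    for name, img_vec in images.items():
        probs = softmax([cosine(img_vec, texts[c]) for c in captions])
        best = max(range(len(captions)), key=lambda i: probs[i])
        result[name] = (captions[best], round(probs[best], 3))
    return result

for image, (caption, p) in match_captions(image_embeddings, text_embeddings).items():
    print(f"{image} -> {caption!r} (p={p})")
```

In a trained model, the contrastive objective pushes matching image-caption pairs towards high similarity and mismatched pairs towards low similarity; the sketch only shows the scoring side of that idea.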
To conclude, the TDM processes employed in DALL-E 2 generate robust and diverse data sets that are then used to feed and train the DALL-E 2 system, or any other GenAI model, for creative purposes. However, OpenAI has not publicly disclosed where it obtained DALL-E 2's training data. If it used datasets available online, there is a possible conflict between TDM techniques and copyright protection, because works or subject matter used in the TDM process, such as pictures and text, may be protected under copyright law. 41 In the EU, under Directive 2001/29/EC (InfoSoc Directive), 42 Directive 2009/24/EC (Software Directive) 43 or Directive 96/9/EC (Database Directive), 44 one must obtain the relevant rightholder's permission before copying a work.

Can Big Data be Protected by Copyright and Related Rights?
The emergence of AI-driven creativity is predominantly driven by the rising availability of data. 45 It is nearly impossible for any GenAI model to analyse large amounts of digital text and/or data to discover new patterns without the help of TDM. 46 Since the value of data lies not in the data or text itself but in the extraction of value, 47 and since the main function of data during the TDM process is to find new patterns, should GenAI researchers and developers worry about the copyright protection of the data used in the extraction phase? As has been said, "one of the basic and fundamental principles of copyright law is that data is as such not protected; copyright only protects the creative form, not the information incorporated in the protected work". 48 Given that certain uses in TDM may fall outside the scope of copyright law, GenAI researchers and developers may not need to worry about copyright and related rights in those cases. 49 As Geiger, Frosio and Bulayenko have argued, 50 "this activity [TDM] is outside the scope of exclusive rights and that any restriction would amount to undermine the underlying rationales of copyright protection and result in an inadmissible restriction of freedom of expression and information as protected by e.g. the European Court of Human Rights (ECHR) and the Charter of Fundamental Rights of the European Union."
40 This AI system converts a sentence to a picture by concatenating both models. DALL E 2 inserts a text into the 'black box,' through TDM which generates a well-defined image. Ibid.
Copyright infringement is not a concern in this circumstance insofar as data as such is not protected by copyright. 51 However, given the three Vs (volume, velocity, and variety) that apply to big data, ordinary "data" must be distinguished from big data: big data may comprise text, images, sounds, and other artistic works in which copyright subsists and which are eventually subject to TDM activities. 52 Moreover, big data may in some instances engage the right of reproduction as well as the sui generis database right. As shown in Illustration 1, not all TDM activities involve copying and/or extracting data during the mining process, which occurs in step 2; whether copying occurs mostly depends on the material used, the technological instruments used, and the scope of the mining technique. 53 Not all copying activities require prior consent, for example those that fall within the exceptions and limitations of the EU acquis. 54 However, there may be legal restrictions when TDM techniques involve copying and/or extracting the relevant data for AI projects. 55 For example, in the landmark Infopaq I case (C-5/08), the CJEU ruled that copying a text extract of eleven words may constitute reproduction in part, and thus trigger copyright protection (and the risk of infringement), if the extract expresses the author's own intellectual creation. 56
In this context, the risk of copyright infringement arises because AI depends on processing vast amounts of data derived through TDM, particularly in GenAI models where TDM is applied to Big Data comprising protectable works such as text and images. 57 Moreover, in terms of related rights, the CJEU in Pelham, C-476/17 58 established that "recognisability" rather than "originality" serves as the primary criterion for related rights, meaning that even minor elements of a bigger work may be eligible for related rights protection. 59 TDM is therefore likely to infringe the related rights of the relevant rightholders where it involves reproduction that creates a copy of the protected work, since during the TDM process it is not possible to select only those parts of the work that fall below the recognisability threshold, or to alter the work itself. 60 From the perspective of the sui generis database right, TDM may infringe the rights of extraction and re-utilisation of a substantial part of the contents of a database when Big Data is processed for AI. In this regard, the CJEU affirmed in BHB v. WH, C-203/02 that the temporary or permanent transfer of data from one medium to another and the storage of that data are sufficient to be regarded as an extraction. As a result, this right will cover TDM, since this operation is essential for the process. 61
Hence, to conduct TDM lawfully, GenAI researchers and developers would generally need authorisation from the relevant rightholders, unless the TDM falls within the statutory and non-mandatory pre-existing exceptions and limitations provided in the EU acquis, in which case such authorisation is not necessary. 62 However, the question remains: will the current legal framework (copyright exceptions and limitations) suffice to accommodate the advancement of technologies, especially TDM for the development of GenAI models?
[...] when the preinstalled add-ons permit access to private servers where copyright protected works have been made available to the public without the rightholder's authorization, the CJEU reaffirmed that this exemption cannot be relied upon by users. This was the case when the CJEU considered the term "lawful use" in Article 5(1) InfoSoc Directive.
55 E Rosati, supra note 8, at 206-209. See also, Christensen K, supra note 11, at 21.
(39) In the light of the foregoing considerations, the answer to the first and sixth questions is that Article 2(c) of Directive 2001/29 must, in the light of the Charter, be interpreted as meaning that the phonogram producer's exclusive right under that provision to reproduce and distribute his or her phonogram allows him or her to prevent another person from taking a sound sample, even if very short, of his or her phonogram for the purposes of including that sample in another phonogram, unless that sample is included in the phonogram in a modified form unrecognisable to the ear.

III.
Reclassifying Text and Data Mining Exceptions in the EU

Introduction
The new TDM exceptions are discussed and introduced in the sections that follow, along with comments and objections from the literature. 65

Article 3: Scientific Research Exception
Article 3 of the EU CDSM Directive provides an exception for the acts of "reproductions and extractions made by research organisations and cultural heritage institutions to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access." In other words, only research organisations and cultural heritage institutions may conduct TDM under Article 3, only for scientific research purposes, and only where they have lawful access to the works or subject matter in question. 66 Cultural heritage institutions encompass publicly accessible libraries and museums, archives, film or audio heritage institutions, and other heritage institutions. 67 It is unclear whether research organisations and cultural heritage institutions must be established in the EU to benefit from Article 3. 68 Furthermore, the concept of scientific research is only hinted at in Recital 12 and is intended to include both the natural and human sciences. 69 Article 3 of the EU CDSM Directive is thus narrowly defined: only the entities mentioned above can benefit from the exception, and only to conduct TDM for scientific research purposes. Any contractual provision that conflicts with the exception provided by Article 3 will be unenforceable. 70 The regime of the new exception in Article 3 of the EU CDSM Directive is summarised in the following table.

Article 4: The Limited Exception
Article 4 of the EU CDSM Directive provides an exception for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of TDM, in order to provide significant legal certainty for both private and public entities undertaking TDM. 71 This means that, unlike the strictly limited beneficiaries of Article 3, any entity can benefit from the TDM exception under Article 4. This exception, however, is subject to rightholders' reservations, including through "machine readable means in the case of content made publicly available online." 72 In other words, Article 4 operates as an opt-out mechanism through which rightholders can prevent others from conducting TDM on their works. Moreover, Article 4 is intended to provide legal clarity for TDM users who do not fulfil all the conditions of the existing exception for temporary acts of reproduction in Article 5(1) of Directive 2001/29/EC, by allowing "the copies made to be retained for as long as is necessary for those text and data mining purposes." 73 The new exception regime in Article 4 of the EU CDSM Directive is summarised in Table 4 below.

The New TDM Exceptions: Analysis, Responses and Critiques from the Literature
The newly introduced TDM exceptions in the EU offer numerous advantages. Articles 3 and 4 of the Directive pursue the following primary policy objectives. First, they are intended to create a standardised, uniform level playing field for researchers across the EU to conduct TDM projects lawfully. Second, the Directive focuses on harmonising the legislation of the Member States through a mandatory solution: a unified framework for TDM activities under the EU CDSM Directive will accelerate innovation by encouraging coordinated, EU-wide, larger research programmes. 74 However, the reform also has negative consequences. 75 This section sets out notable scholars' remarks, responses, and criticisms, 76 which range from the opt-out mechanism attached to the exception open to all beneficiaries to the numerous restrictions that apply to the scientific research exception. A summary of the key points of assessment is provided below.

An Overall Assessment of the Reform: Articles 3 and 4
Several copyright scholars contend that TDM should fall outside the realm of copyright. 77 Margoni and Kretschmer argue that the formulation of the two new TDM exceptions is "conceptually wrong, theoretically flawed and normatively unambitious." 78 As the saying goes, 'the right to read should be the right to mine', 79 but this principle is undermined by the requirement of lawful access and the restriction to specific beneficiaries. Article 3 of the EU CDSM Directive should not be restricted to research organisations, but should be available to all entities that have lawful access to the underlying mined materials, in particular to avoid hindering start-ups, GenAI researchers and developers, and independent researchers in AI in general. 80 The distinction between commercial and non-commercial purposes has also been heavily criticised. 81 Furthermore, Hilty and Richter hold that the requirement of lawful access has the potential to disadvantage smaller or less wealthy research organisations or institutions. 82 By denying lawful access, rightholders can effectively prevent certain parts of existing works from ever being subject to TDM. 83 Others have challenged the requirement of lawful access, fearing that rightholders may incorporate TDM into their pricing and further escalate overall costs. 84 Higher prices, TDM fees, and the limited availability of resources for licensing may result in TDM of lower quality and/or quantity. 85 To summarise, the new TDM exceptions regime has severe flaws: as Quintais points out, the scope of the articles is too narrow and "this regime will probably not lead to simplification and harmonisation of the system of exceptions in EU copyright law, as it continues to allow [...] and extraction' has been seen as potentially problematic for communicating TDM outcomes, especially for GenAI models that work with Natural Language Processing (NLP), such as ChatGPT-4. Because such models are trained on a variety of copyright-protected datasets (i.e. texts) and potentially involve reproduction in part, they may not be lawfully distributed or communicated to the public, since an extract of as few as 11 consecutive words may be protected by copyright. 87 However, in the case of DALL-E 2 and other AI image generators, the standard for copying might be difficult to meet, since the Infopaq case, which concerned extracts of text, is not directly relevant to images.
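The 11-word figure discussed above can be turned into a rough, mechanical screen for verbatim overlap between a source text and a model's output. This is only a hypothetical sketch, not a legal test: Infopaq does not lay down a word-count rule, and whether an extract is protected depends on whether it expresses the author's own intellectual creation, which no string comparison can decide. The helper names and example sentences are invented for illustration.

```python
def word_ngrams(text, n=11):
    """Return the set of all runs of n consecutive (lower-cased) words in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_sequences(source, output, n=11):
    """Return every run of n consecutive words that appears verbatim in both texts."""
    return word_ngrams(source, n) & word_ngrams(output, n)

# Hypothetical example texts, invented for illustration only.
source = ("the quick brown fox jumps over the lazy dog "
          "while the farmer sleeps soundly in the barn")
output_a = "a model wrote: the quick brown fox jumps over the lazy dog while the farmer woke"
output_b = "a fox jumped over a sleeping dog near the barn"

for label, text in (("output_a", output_a), ("output_b", output_b)):
    hits = shared_sequences(source, text)
    print(label, len(hits), "shared 11-word sequence(s)")
```

Splitting on whitespace and lower-casing are simplifying assumptions; different tokenisation or punctuation handling would change which sequences count as shared.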

Article 3: TDM for Scientific Research and Limited Beneficiaries
The inclusion of Article 3 of the EU CDSM Directive achieves major policy objectives. 88 "It is set to provide a normalised level playing field for researchers across Europe to lawfully carry out TDM projects. The major positive impacts of the proposal lie in its focus on harmonisation of member states' laws, through a mandatory solution." 89 Some praised the reduction in fragmentation of national approaches to TDM, 90 but others noted that the promised harmonisation and legal certainty did not materialise owing to inadequate wording and an inadequate regulatory process. 91 Furthermore, the prohibition on contractual override is a welcome breakthrough. This is a critical provision because, as previously stated, to conduct TDM, GenAI researchers and developers must access numerous databases containing copyrighted materials and accept Terms of Use that frequently limit TDM. 92 The notion of research organisation has also been criticised: some argue that the scope of Article 3 is prohibitively narrow in defining the nature of a research organisation. 93 To be eligible for TDM, research organisations "must operate on a not-for-profit basis, or re-invest all its profits into its scientific research or pursue a public interest mission funded by public funds or public contracts." 94 In this context, commercially oriented research organisations such as OpenAI, 95 the creator of ChatGPT-4 and DALL-E 2, will not be able to rely on Article 3 to conduct TDM with non-personal data available in the EU, as they are not eligible to do so. The EU legislators explicitly stated that they wanted to ensure that scientific research undertaken for TDM purposes remained neutral and independent from industry. However, public funding and investment are scarce, and many research organisations rely on the private sector to obtain the funding required for cutting-edge research. 96 This narrow definition of research organisation could put innovation in the EU on the back burner.
Last, the EU CDSM Directive does not define the term 'scientific research'. The specific purpose of scientific research in Article 3 has been criticised as possibly creating issues for existing licences, such as those for educational purposes, and as potentially leading to restrictive interpretations. 97

Article 4: The EU Obsession with Licensing?
The reservation or opt-out mechanism in Article 4 has drawn intense criticism because it might hamper the advancement of AI in the EU. 98 The most contentious point is Article 4(3), which allows rightholders to reserve the right to perform TDM activities on their works. Such reservations must be made 'in an appropriate manner', as elaborated in Recital 18. 99 For this purpose, the recital distinguishes between two scenarios. First, in the case of content that has been made publicly available online, it should only be considered appropriate to reserve the rights in Article 4(1) by the use of machine-readable means. 100 Second, in other cases, it might be appropriate to reserve the rights by other means, such as contractual agreements or a unilateral declaration. 101 Overall, rightholders may reserve TDM for content that is publicly available online only by implementing appropriate technological measures, in line with the analogy drawn by the court in VG Bild-Kunst, C-392/19. 102 It should be noted, though, that in the absence of such robust technological measures, it might be daunting for GenAI researchers and developers to determine "whether the concerned rightholders intended to reserve the doing of TDM activities in relation to their copyright works and other protected subject matter, including when these are subject to sub-licenses." 103 However, even with effective technological measures in place, Article 4 may prove a nightmare for most GenAI researchers and developers: they need huge corpora of data in the form of text, images, etc., and obtaining permission to mine from numerous rightholders can be an exhausting task.
95 OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence benefits all of humanity. See OpenAI official website. Available at: <https://openai.com/about/>. accessed January 10, 2023.
Some consider that Article 4 symbolizes the EU CDSM Directive's 'obsession with licensing' and, as a result, favours private ordering above public policy. 104

IV. The Japanese TDM Exceptions: The New Paradise for AI & Machine Learning

Introduction
In 2016, Japan identified AI as one of the most important technological foundations for establishing a super-smart society, known as 'Society 5.0.' To support the development of AI and related technology, an AI Technology Strategy Council was established on the instructions of Prime Minister Abe. 105 The Japanese government is preparing for the 'singularity,' a term used to describe the point at which AI surpasses human intelligence. 106 Hayashi, the CEO of HEROZ, Inc., 107 one of Japan's biggest GenAI model developers, underscores the Japanese government's intention to support the development of AI-driven creativity, saying that "although AI engineers are in short supply throughout the world, Japan has a solid number of highly capable AI engineers." 108 To provide legal certainty and flexibility to AI […]
It is permissible to exploit a work, in any way and to the extent considered necessary, in any of the following cases, or in any other case in which it is not a person's purpose to personally enjoy or cause another person to enjoy the thoughts or sentiments expressed in that work; provided, however, that this does not apply if the action would unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation:
(i) if it is done for use in testing to develop or put into practical use technology that is connected with the recording of sounds or visuals of a work or other such exploitation;
(ii) if it is done for use in data analysis (meaning the extraction, comparison, classification, or other statistical analysis of the constituent language, sounds, images, or other elemental data from a large number of works or a large volume of other such data; the same applies in Article 47-5, paragraph (1), item (ii));
(iii) if it is exploited in the course of computer data processing or otherwise exploited in a way that does not involve what is expressed in the work being perceived by the human senses (for works of computer programming, such exploitation excludes […]
[…] it nor causing another person to enjoy it (for example, TDM). 123 As Ueno argues, "exploitation of this kind does not prejudice the copyright holder's interests protected by copyright law." 124
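As a purely illustrative sketch of what the 'statistical analysis of the constituent language ... from a large number of works' in item (ii) of the provision quoted above can look like, the Python snippet below extracts word tokens from several short texts and keeps only aggregate counts. The three-sentence corpus and the helper `corpus_word_frequencies` are hypothetical and invented for this example. The output is a set of statistics rather than the expression of any single work, which is the intuition behind 'non-enjoyment'; the sketch makes no claim about how any particular GenAI model is trained or how a court would classify a given use.

```python
import re
from collections import Counter


def corpus_word_frequencies(texts: list[str], top_n: int = 3) -> list[tuple[str, int]]:
    """Hypothetical helper: aggregate lower-cased word counts across many texts.

    Only the counts are kept; the texts themselves are not stored or returned.
    """
    counts: Counter = Counter()
    for text in texts:
        # Extract simple word tokens (letters and apostrophes) from each text.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    # Most frequent words first; ties keep the order in which words were first seen.
    return counts.most_common(top_n)


# Invented toy corpus standing in for "a large number of works".
corpus = [
    "The sea was calm and the sky was grey.",
    "The sky over the sea turned red.",
    "A calm night followed.",
]
print(corpus_word_frequencies(corpus))  # [('the', 4), ('sea', 2), ('was', 2)]
```

A real data-analysis pipeline would work at a far larger scale, but the design point is the same: what leaves the process is derived statistical data, not a copy presented for human perception.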

Revisiting the Concept of 'Freier Werkgenuss' under the German Copyright Act
Hugenholtz and Senftleben argue that "the need for having more openness in copyright law is almost self-evident in this 'information society' of highly dynamic and unpredictable change." 125 To promote innovation in the advancement of GenAI models and AI in general, EU Member States could consider the Japanese 'non-enjoyment' purposes as the basis for a provision that is flexible, but not completely open (i.e., not fair-use-like), as this concept would suit the codification-focused EU civil law tradition. 126 One may wonder whether the Japanese 'non-enjoyment' purposes are suitable for application in the EU, given the different copyright legal systems.
However, the principle of 'enjoyment' in copyright is not novel to the EU. The first reference to the 'non-enjoyment' concept can be found in the judgment of the German Federal Court of Justice or Bundesgerichtshof (hereinafter BGH) of 4 October 1990, I ZR 139/89, 127 where the defendant had used and exploited the plaintiff's system software in an inadmissible manner in the context of resale; the BGH noted that mere use, in contrast to the technical rights of use, is not covered by copyright. The use of a work as such is not a copyright-relevant process. This applies to using a computer program as well as to reading a book, listening to a piece of music, viewing a work of visual art or watching a movie. 128 This case illustrates that the BGH employs the notion of 'enjoyment' of the work, known in Germany as 'Freier Werkgenuss.' Moreover, in the case of G. Radio-Werke GmbH., F. i.Bay., v. GEMA, the BGH ruled as follows: 129 The object of protection of copyright is an intangible good, which, according to its intended purpose, generally serves primarily the intellectual or aesthetic enjoyment of the individual, which by its very nature takes place in the purely private sphere in the case of many intellectual works. […] enjoyment and the exploitation of rights is related to acts that lead to or allow an enjoyment, akin to the Japanese 'enjoyment' concept, in which enjoyment is what gives the ultimate inner justification of copyright protection. 130 Moreover, in the case of Grundig-Reporter, 131 the German court established that 'non-enjoyment' should not be subject to copyright exclusivity, observing that "enabling the satisfaction of intellectual needs is what exploitation rights entailed in copyright are concerned with; if an action does not precede or enable this intellectual satisfaction, it is irrelevant for copyright." 132 This case was decided years before TDM processes were first introduced.
However, the notion of 'enabling intellectual satisfaction' could be applied to TDM as well: since TDM does not allow individuals to enjoy someone else's work, it does not enable that satisfaction. Schack echoed this view by adding: 133 If one realizes that TDM only uses the simple data, but not the intellectual content of the analysed works, then this analysis method does not even interfere with the scope of copyright protection. Technically, there is a reproduction, but it does not convey any enjoyment of the work here, and TDM does not trigger a statutory claim to remuneration.
It is worth mentioning that in the BGH judgment of 29 April 2010, I ZR 69/08, 134 the court held that small thumbnails reproducing copyright-protected works enable the enjoyment of a work. This is "because the thumbnails are the works concerned of the plaintiff in full, they do not merely represent a public notification or description of their content within the meaning of § 12(2) UrhG; rather, they already enable the enjoyment of the work." 135 On this basis, one might argue that copyrighted works presented in the results of TDM, especially in the development of GenAI models, will be categorized as infringing, no matter how small they are. This article argues, however, that establishing how far the copyrighted works used during TDM processes influenced a creative AI-assisted output is a difficult threshold to meet. Again, as mentioned several times in the previous sections, the aim of TDM in the development of GenAI is to find new patterns in works. This article suggests putting it this way: "only in such cases where [there] truly is no enjoyment, no matter how little, would the non-enjoyment exception be applicable". 136 A flexible TDM exception, built upon 'Freier Werkgenuss' and inspired by the Japanese-style exception, might expressly state the legality of non-enjoyment uses, providing legal assurance for GenAI model researchers and developers in the EU. 137
In France, Article L.122-3 CPI defines the reproduction right as "the material fixation of the work by all means that enable to communicate it to the public in an indirect way." 138 The French definition of the right of reproduction thus presupposes communication to the public, and the formulation of the article appears to reflect a sense of enjoying or causing another to enjoy as a prerequisite to copyright infringement. 139 In comparison to the fair-use type, the Japanese 'non-enjoyment' doctrine appears to be closer to the European continental view of copyright. 140

The 'Non-Enjoyment' Purposes as an 'Opening Clause' to Drive Innovation
The introduction of a TDM exception that is flexible, but not as open as the US fair use doctrine, could be effective in boosting innovation and the competitiveness of GenAI models in the EU. 141 This is because "an enumerated list of exceptions and limitations has shown little flexibility in adapting to evolving market and technological conditions." 142 The 'non-enjoyment' approach could be the most logical basis for a flexible TDM exception in the EU because the Japanese exception allows exploitation in any case as long as the purpose is not to enjoy, or cause another person to enjoy, the work. 143 This provision thus resembles an 'opening clause,' which "should address uses that are not yet covered by existing exceptions and limitations but are justified by important public interest rationales and fundamental rights such as freedom of expression and the right to information." 144 […] that the EU would provide a favorable environment for the development of AI-driven creativity with the implementation of the "non-enjoyment" clause.

VI. The 'Non-Enjoyment' Purposes and Three-Step Test: Oh Yes, Test Passed!
[…] "narrow in quantitative as well as a qualitative sense." 154 The 'non-enjoyment' TDM exception is narrow in scope, as it applies only to a limited range of activities. 155 Additionally, under the qualitative criterion, 'non-enjoyment' is constrained in favour of public interest objectives, to achieve a balance between copyright and public interests, including education, AI research, information transparency, and the right of free expression. 156 The first step is passed.

The Second Step: Does TDM allowing 'non-enjoyment' exploitation of copyrighted works conflict with the normal exploitation of the work?
One should realize that TDM differs from other acts covered by copyright, such as adaptation. In the case of GenAI models, as previously discussed, the TDM process does not allow anyone to enjoy the fruits of intellectual labor, and even if it did, it would not conflict with the normal exploitation of the work, that is, with the way rightholders usually exploit their work. 157 A TDM exception built upon the 'Freier Werkgenuss' concept passes the second step unproblematically, as the WTO Panel explains: 158 An exception or limitation to an exclusive right […] rises to the level of a conflict with a normal exploitation of the work […] if uses, that in principle are covered by that right, but exempted under the exception or limitation enter into economic competition with the ways that right holders normally extract economic value from that right to the work (i.e., the copyright) and thereby deprive them of significant or tangible commercial gains.

The Third Step: Does TDM allowing 'non-enjoyment' exploitation of copyrighted works unreasonably infringe the legitimate interest of the right holder?
Once one accepts that TDM and other 'non-enjoyment' purposes should not be relevant for copyright purposes, such exceptions should be permissible under this step. The WTO Panel explains that the notion of a legitimate interest "relates to lawfulness from a legal positivist perspective, but it has also the connotation of legitimacy from a more normative perspective, in the context of calling for the protection of interests that are justifiable in the light of the objectives that underlie the protection of exclusive rights." As a result, TDM should not be prohibited, since the objectives that underpin copyright protection do not warrant extending exclusivity to it. Therefore, TDM exclusivity is not a legitimate interest. 159 Even if it were a legitimate interest, it is unlikely that the copyrighted works used during the TDM process would have any independent economic value derived from that use, 160 and unreasonable prejudice would occur only if "an exception or limitation causes or has the potential to cause an unreasonable loss of income to the copyright owner." 161

VII. Good Luck, Europe: What the EU Member States Can Do Without Flexible TDM Exceptions?
It should be noted that the EU CDSM Directive recognizes the potential of both scientific and non-scientific TDM, but, due to the restrictive scope of the exceptions, it fails to realize that potential. 162 Following on from the critiques highlighted in the previous section, numerous scholars and stakeholders have put forward recommendations that, in their opinion, would improve the legal framework of TDM. Some of these suggestions are briefly summarized as follows. First, numerous critics have proposed changes to the list of TDM beneficiaries, ranging from removing purpose-specificity to eliminating any distinction between commercial and non-commercial research. 163 Second, as previously discussed, the lawful access criterion has been widely criticized from a variety of viewpoints: if anyone who wants to conduct TDM must have lawful access to begin with, why would any further restriction be needed? 164 To comply with lawful access requirements and to avoid unnecessarily high licensing costs, this article suggests the establishment of a centralised repository of open access data and information comprising literary and artistic works, similar to the LAION-5B dataset used by Stability AI, where the data corpus can be collected, maintained, or exchanged between different market players. This might be an alternative option that Member States could pursue to foster the development of GenAI models. 165 Third, Geiger et al. considered fair remuneration as an alternative to the opt-out mechanism, which "might have been considered provided that harm can be demonstrated on the basis of relevant empirical data." 166 This article agrees that a remuneration mechanism would indeed be an improvement over the current system; however, paying remuneration individually to a collective management organisation for the copyrighted works used for TDM is nearly impossible for GenAI model researchers or developers who do not have sufficient financial means.
This would lead to the same problems as in the digitised music industry, where 'the winner takes all' and only 'rich' GenAI research organisations and developers would be able to survive in the EU.

Robot Exclusion Standard (robots.txt) as a Warning When TDM is Not Allowed on a Website
In the case of GenAI models, when materials such as songs, poetry, paintings, etc., are published on a website, an automatic way of indicating that the website's content is not available for TDM is required. The Robots Exclusion Standard, implemented through a robots.txt file and extensively used since the mid-1990s, might be an option. 173 The standard is very widely used by websites to restrict what automated crawlers may access, and search engine platforms honour it, so that it serves as machine-readable terms and conditions. 174 What Article 4(3) of the EU CDSM Directive envisages by "machine-readable means in the case of content made publicly available online" is the use of a machine-readable robots.txt file to specify access restrictions. 175 The use of robots.txt would strike a fair balance between the interests of rightholders and those of GenAI model researchers and developers wishing to perform TDM on publicly accessible websites. 176
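To make the idea of a machine-readable reservation more concrete, the following is a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes a hypothetical website whose robots.txt blocks an invented crawler token, `ExampleGenAIBot`, from the whole site, while other crawlers are kept out only of `/private/`; the domain, paths, and helper `tdm_allowed` are likewise illustrative. A compliant TDM crawler would run such a check before fetching a page. Note that robots.txt is a voluntary convention that a crawler must choose to honour, and whether a particular file satisfies Article 4(3) remains a legal question the code cannot settle.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a rightholder's website.
# "ExampleGenAIBot" is an invented user-agent token used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleGenAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def tdm_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/gallery/painting.html"
    print(tdm_allowed(ROBOTS_TXT, "ExampleGenAIBot", page))  # False: whole site reserved
    print(tdm_allowed(ROBOTS_TXT, "SomeOtherBot", page))     # True: only /private/ blocked
```

The parser applies simple path-prefix matching from the original standard; it does not necessarily implement every later extension used by individual search engines, so a production crawler may need a more complete parser.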

VIII. Conclusions: The Right to Read Should be the Right to Mine
TDM is one of the building blocks of AI and has attracted much attention from copyright scholars. The EU CDSM Directive does envision concrete action to promote research and innovation. The foregoing analysis, however, has shown that the full implementation of the TDM exceptions will be critical to European innovation and research, particularly in the development of AI-driven creativity. This article argues that the Japanese 'non-enjoyment' purpose is one of the best alternatives for the EU Member States to provide a flexible, but not completely open, TDM exception to foster AI innovation at the national level. A comparable concept is the German 'Freier Werkgenuss', which acknowledges that copyright is concerned with the intellectual enjoyment of works, so that TDM activities should not be subject to copyright exclusivity at all, because there is no enjoyment of the work. This is similar to the Japanese concept of 'non-enjoyment' purposes. A TDM exception built upon the German concept of 'Freier Werkgenuss' could serve as the opening clause of a flexible, but not too open, TDM exception, and offer a specific list of TDM activities as permissible uses.
Once one realizes that TDM processes in the development of AI, and machine learning in general, "copy works not to consume the expression copyright law protects, but to get […]
[…] To not have the freedom to access information without infringing on IPRs data science and machine learning would be detrimental to our business and quite frankly stop, or make innovation extremely hard, thus affecting the European tech and startup economy as a whole.
I am sure that the EU Member States can do something to prevent this from happening.
Everything is now in the hands of the EU Member States, whether to protect the interests of rightholders or to create a balance between safeguarding 'the right to read should be the right to mine', protecting rightholders' exclusivity, and creating a supportive environment for GenAI researchers and developers.