Attys Beware: Generative AI Can Also Hallucinate Metadata

By Daniel Garrie, Jennifer Deutsch and Morgan B. Ward Doran · November 4, 2025

Consider this increasingly plausible hypothetical: A litigation team drafts an important brief with the assistance of generative artificial intelligence.

Cognizant of the missteps that befell attorneys who adopted AI in its early days,[1] and aware of the latest research into the problem of hallucination,[2] the case team carefully reviews every sentence and each citation to ensure factual accuracy and legal fidelity. Confident that they have identified and scrubbed all potential AI-generated errors, the team files the brief with the court.

Unbeknownst to the team, however, the brief includes AI-generated metadata stating that its author is John Milton and that its company is the law firm Milton Chadwick & Waters.

A reviewing court may or may not find it amusing that the brief was purportedly written by Al Pacino's character, Satan, in the 1997 film "The Devil's Advocate" — but the presence of this information will certainly undermine the credibility of the true author and law firm. Or consider the court's reaction if the metadata instead indicated that the brief's author was a prominent attorney at the law firm representing the opposing party.

While these examples are outliers, they point to the real-world consequences of the legal industry's burgeoning adoption of AI tools to facilitate workflows. This is particularly true in areas of law that deal more directly with the timing, authenticity and provenance of documentary evidence.

Attorneys across practice areas now routinely use AI to conduct research, draft contracts, prepare briefs, generate discovery requests, produce internal memoranda and write emails. Similarly, attorneys' clients also use AI to create their own documents, some of which will be subsequently scrutinized as evidence in legal proceedings.

When AI generates a document, it may quietly populate or modify hidden fields that are embedded in the document — called metadata — with fictitious or misleading information. These AI-generated hallucinations are just as dangerous as errors in the body of documents, if not more so, because they are overlooked by most users, appear authentic, and can have significant implications in discovery, authentication and privilege disputes.

While much of the existing discourse regarding the impact of AI on the practice of law focuses on AI-generated hallucinations in the content of documents — e.g., factually inaccurate case references, fabricated statutes or imaginary precedent — relatively little attention has been paid to metadata hallucinations in those same documents.

This article introduces and briefly examines the problems that AI-generated hallucinations in the metadata of documents present to the legal profession.

What Metadata Is

Metadata is often described as data about data. It is the hidden layer of information embedded into every digital file that provides basic information about the file. This information is categorized in fields, typically including descriptive data such as the author, last modified date and creation date. In practice, metadata functions like an ersatz digital audit trail.

For the types of documents that attorneys and their clients most often deal with — such as Microsoft Office files (.docx, .xlsx and .pptx), Adobe Acrobat files (.pdf) or image files — metadata can reveal the following (see the inspection sketch after this list):

  • Author and organization name;

  • Dates of creation, last modification and printing;

  • Version history and tracked changes;

  • Embedded comments or hidden notes; and

  • Other information.
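
To make this concrete, the following minimal sketch (written in Python using only the standard library; the filename "brief.docx" is hypothetical) reads the author, company and date fields embedded in a Word file's package; these are the same properties a court or opposing counsel can pull up in seconds.

    # A minimal sketch (Python, standard library only) for listing the property
    # fields embedded in a .docx package; "brief.docx" is a hypothetical filename.
    import zipfile
    import xml.etree.ElementTree as ET

    NS = {
        "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
        "dc": "http://purl.org/dc/elements/1.1/",
        "dcterms": "http://purl.org/dc/terms/",
        "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
    }

    def docx_properties(path):
        """Return core and extended properties stored inside a .docx file."""
        props = {}
        with zipfile.ZipFile(path) as pkg:
            core = ET.fromstring(pkg.read("docProps/core.xml"))
            props["author"] = core.findtext("dc:creator", default="", namespaces=NS)
            props["last_modified_by"] = core.findtext("cp:lastModifiedBy", default="", namespaces=NS)
            props["created"] = core.findtext("dcterms:created", default="", namespaces=NS)
            props["modified"] = core.findtext("dcterms:modified", default="", namespaces=NS)
            # docProps/app.xml holds extended properties such as Company.
            app = ET.fromstring(pkg.read("docProps/app.xml"))
            props["company"] = app.findtext("ep:Company", default="", namespaces=NS)
            props["application"] = app.findtext("ep:Application", default="", namespaces=NS)
        return props

    for field, value in docx_properties("brief.docx").items():
        print(f"{field}: {value!r}")

A check like this, run on an AI-assisted draft before filing or production, surfaces exactly the values a reviewer will see.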

Email metadata is even more comprehensive, with headers providing intricate details about the sender, recipient, encryption and authentication.
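
Those headers can likewise be inspected programmatically. The following minimal sketch (Python standard library; the filename "message.eml" is hypothetical) prints the routing and authentication headers of a saved email message.

    # A minimal sketch for extracting routing and authentication headers from a
    # saved email message; "message.eml" is a hypothetical filename.
    from email import policy
    from email.parser import BytesParser

    def email_headers(path):
        """Return selected header fields from an .eml file, including repeats."""
        with open(path, "rb") as fh:
            msg = BytesParser(policy=policy.default).parse(fh)
        fields = ["From", "To", "Date", "Message-ID",
                  "Received", "Authentication-Results", "DKIM-Signature"]
        return {name: msg.get_all(name, []) for name in fields}

    for name, values in email_headers("message.eml").items():
        for value in values:
            print(f"{name}: {value}")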

Notably, metadata is created automatically by the software and the operating system as a user generates, saves, shares, sends or otherwise alters a document. The system logs each action by updating, modifying or adding new metadata fields. This process happens largely in the background, without the user's knowledge or awareness, which is why metadata often provides valuable insights beyond those visible in the main content of a file.

In litigation, metadata is not merely a technical curiosity; it is evidence. Courts and opposing counsel routinely rely on metadata to authenticate documents, establish chain of custody, or prove when and by whom a document was created.

Under the Federal Rules of Civil Procedure, parties must often produce requested electronically stored information in its native format. These native-format files include metadata that can be lost when files are converted to a secondary format for production. Critically, that metadata can be used to prove authenticity and establish admissibility.[3]

Why AI Hallucinates Metadata

Metadata hallucinations arise from the way large language models generate output. Despite the conversational nature of their user-friendly front-ends — e.g., ChatGPT, Gemini and Copilot — LLMs do not know or understand facts in the way people do. Instead, they create output by predicting the text or information that is most likely to appear next in a sequence — e.g., a sentence or image — based on patterns learned during their training.

Generative AI hallucinations occur because "the training and evaluation procedures [used to create LLMs] reward guessing over acknowledging uncertainty," according to a September research paper from OpenAI.[4]

This fundamental characteristic of how LLMs are developed — i.e., that they "are primarily evaluated using [assessments] that penalize uncertainty," in the words of the OpenAI paper — affects both the viewable content of a document and its metadata.[5]

When creating any document, AI necessarily produces not only the visible output — e.g., the actual text or image — but also the underlying details that its training dictates a complete document of that type should contain, such as the author, creation date or version history. Faced with uncertain or nonexistent data to enter into these fields for a particular document, AI will fabricate the data, drawing on examples, placeholders or approximations from its training. These hallucinated metadata values may appear authentic, but they are entirely fictional.

The problem is exacerbated when AI is embedded into document generation software like Microsoft Word or Google Docs, which already create metadata by default — even without the addition of AI tools. The result is a hybrid output in which the software may attribute authorship to one user, while the AI inserts conflicting or invented details into the metadata field "Author."
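
To illustrate the mechanism, the minimal sketch below uses the third-party python-docx package (an illustrative choice, not a claim about how any particular AI tool works) to show that whatever values the generating code supplies, including the article's hypothetical "John Milton," are recorded as the document's properties.

    # A minimal sketch showing that a document-generation pipeline, not the human
    # drafter, controls the property fields of the file it writes. Assumes the
    # third-party python-docx package; the values are deliberately arbitrary
    # placeholders of the kind an AI tool might guess when no real value exists.
    from docx import Document

    doc = Document()
    doc.add_paragraph("Draft brief text goes here.")

    props = doc.core_properties
    props.author = "John Milton"             # whatever the pipeline supplies is recorded
    props.last_modified_by = "John Milton"
    props.comments = "Generated draft"

    doc.save("generated_brief.docx")
    # Word's "Show All Properties" panel, and any forensic tool, will now report
    # "John Milton" as the author unless someone corrects the field.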

As articulated by the OpenAI paper, "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty."[6] The difference is that these guesses are embedded in the file's hidden metadata, where they are often overlooked and, therefore, more likely to mislead viewers.

Where AI Metadata Hallucinations Appear

Hallucinations in AI-generated documents can surface across multiple layers of metadata. Importantly, these errors are not confined to obscure technical fields; they can appear both in the metadata that any user can readily display with a few mouse clicks, and in the deeper metadata that is only viewable using specialized forensic or e-discovery tools.

At the surface level, hallucinations show up in a document's properties, available and easily accessed through the user interface of a Microsoft Word, Excel, PowerPoint or Adobe Acrobat file. In a Word document, the "Show All Properties" tool reveals basic metadata information, such as the "Author," "Company" or "Created Date" fields — all of which AI may hallucinate.

Because metadata has historically been automatically created by the software and the system, rather than users, these fields appear authoritative and are often presumed to be valid. As such, these metadata fields are often relied upon by lawyers, experts and courts to establish the credibility or authenticity of evidence.

As research for this article, we directed several AI tools to generate office-type documents and examined whether they contained metadata hallucinations. They all did. Across document types, the AI tools fabricated information for several key metadata fields commonly found in office-type documents, including the date, author and comment fields.

Again, because the user's operating system and applications insert these fields automatically into a document's metadata, they are often presumed to be authentic and valid. In our samples, however, the AI tools inserted false and inaccurate information into these fields, often including details regarding the developer of the underlying code that powered the AI's ability to programmatically create, read and update the documents.

Alarmingly, we also observed AI-generated hallucinations in the deeper technical metadata of the documents, which we accessed through specialized forensic tools like X-Ways, EnCase and Magnet Axiom. Using these tools, we found that the AI-generated documents included fabricated hash values, false version histories and inaccurate time stamps. Because forensic tools are designed to report the exact information that is automatically embedded in files, they displayed these AI-hallucinated values as if they were authentic.
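
One practical response is to recompute such values independently rather than accept them as reported. The minimal sketch below (Python standard library; the filename and claimed digest are hypothetical placeholders) hashes a produced file directly and compares the result against whatever value accompanies it.

    # A minimal sketch of one cross-check: recompute a produced file's hash rather
    # than trusting a value reported with or inside the document. The filename and
    # the claimed digest below are hypothetical placeholders.
    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        """Stream the file and return its SHA-256 hex digest."""
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    claimed = "0" * 64                       # the digest asserted alongside the production
    actual = sha256_of("produced_file.docx")
    print("match" if actual == claimed else f"mismatch: recomputed {actual}")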

This is particularly problematic because the output from forensic tools has historically carried the considerable weight of forensic authority — typically considered the ground truth. Now, however, it is necessary to question whether forensic metadata was hallucinated by AI and is, therefore, either inaccurate or entirely fictitious.

The result is a dual risk: Lawyers may be misled by hallucinated metadata displayed in the file's basic properties, and courts or experts may be misled by hallucinated metadata revealed through forensic analysis. In both contexts, AI's tendency to guess at the proper response can create a seemingly authentic record of authors and companies that do not exist, and of times that are not accurate.

Implications of AI Metadata Hallucinations

AI-generated metadata hallucination creates profound challenges for the legal community on foundational issues like discovery, document authentication, privilege and professional responsibility. Its existence means that there may no longer be an absolute ground truth for documents.

Instead, legal and forensic practitioners must now assess whether it is necessary to cross-check each document's metadata against other supporting evidence to determine its provenance and accuracy. This introduces a new elemental requirement into forensic analysis: one that is not generally known; has not been widely adopted; and necessitates education, time and resources to implement.
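
As a hedged illustration of what such a cross-check might look like, the sketch below compares a .docx file's embedded last-modified date against the filesystem timestamp of the copy under review and flags large disagreements for human follow-up; the filename, the 24-hour tolerance and the choice of comparison points are assumptions made for the example, and in a real matter the comparison points would come from the production record.

    # A minimal sketch of an automated consistency check: compare the embedded
    # "modified" date in a .docx file with the filesystem timestamp of the copy
    # under review, flagging large drift for human follow-up. The filename and
    # 24-hour tolerance are illustrative assumptions.
    import os
    import zipfile
    import xml.etree.ElementTree as ET
    from datetime import datetime, timezone

    NS = {"dcterms": "http://purl.org/dc/terms/"}

    def embedded_modified(path):
        """Return dcterms:modified from docProps/core.xml, or None if absent."""
        with zipfile.ZipFile(path) as pkg:
            core = ET.fromstring(pkg.read("docProps/core.xml"))
        text = core.findtext("dcterms:modified", namespaces=NS)
        return datetime.fromisoformat(text.replace("Z", "+00:00")) if text else None

    def review_flag(path, tolerance_hours=24):
        fs_mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        doc_mtime = embedded_modified(path)
        if doc_mtime is None:
            return "no embedded modified date; review manually"
        drift_hours = abs((fs_mtime - doc_mtime).total_seconds()) / 3600
        return "consistent" if drift_hours <= tolerance_hours else f"{drift_hours:.1f} hours of drift; review"

    print(review_flag("produced_brief.docx"))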

Conclusion

Generative AI metadata hallucination is an underreported and underappreciated, but increasingly important, risk in this blossoming era of AI-driven legal practice.

Unlike content-based AI hallucination, metadata hallucinations alter the core identifying elements of documents — what heretofore was considered the ground truth. As such, they threaten the integrity of discovery, the reliability of evidence and the ability to definitively identify the provenance of electronic documents.

Legal practitioners must be wary of the nascent problem of metadata hallucination and consider how to best address it going forward.



Daniel B. Garrie is the founder and managing partner of Law & Forensics LLC.

Jennifer Deutsch is the director of privacy services at Law & Forensics.

Morgan B. Ward Doran, Ph.D., is an expert witness at Law & Forensics. He is also senior special counsel at the U.S. Securities and Exchange Commission. The SEC disclaims responsibility for any private publication or statement of any SEC employee or commissioner. This article expresses the author's views and does not necessarily reflect those of the commission, the commissioners or other members of the staff.

The opinions expressed are those of the author(s) and do not necessarily reflect the views of their employer, its clients, or Portfolio Media Inc., or any of its or their respective affiliates. This article is for general information purposes and is not intended to be and should not be taken as legal advice.


[1] See, e.g., Gauthier v. Goodyear Tire & Rubber Co., No. 1:23-cv-281, 2024 BL 431433 (E.D. Tex.); Coomer v. Lindell, No. 1:22-cv-01129-NYW-SBP (July 7, 2025).

[2] See Why Language Models Hallucinate, Kalai et al., https://arxiv.org/pdf/2509.04664.

[3] See Federal Rules of Evidence 901 and 902.

[4] See Why Language Models Hallucinate, Kalai et al.

[5] Id.

[6] Id.

For a reprint of this article, please contact reprints@law360.com.