What You Need to Know Before Uploading Internal Documents to AI Tools

October 29, 2025 | Tara Swaminatha | Cybersecurity, AI

Uploading Company Documents to AI Tools

As generative AI tools like ChatGPT, Gemini and others become more common in the workplace, many companies are eager to integrate this new technology into their workflows.

In the rush, an important question often gets missed: what happens to your internal files once they’re uploaded to an AI platform? 

Especially with free or personal versions, uploaded data is often used by the AI platform or vendor to improve and train the models you’re using. That means your uploaded data, prompts and generated responses are at risk of showing up anywhere and everywhere, even as part of a commercial product.

In this post, we’ll walk through the key usage rights you need to understand, the red flags to look for in AI usage terms and the steps your legal or compliance team should take before sharing company documents with any AI tool.

Why AI Usage Rights Matter Now More Than Ever

AI tools are showing up in more business settings every day. But behind the convenience, there are real questions about where the information or documents you input go once you hit “submit.”

Unless the AI version you use allows you to turn off data sharing, or is under a business license with strict permissions, the content you enter may be used by the AI vendor. Even if you strip out names, certain patterns or phrases can still reveal business details you do not intend to share outside your organization.

Several companies have already run into problems after employees shared internal data with AI tools. 

This kind of exposure can’t be undone. Deletion of a document or conversation from an AI platform does not necessarily mean the underlying data is wiped from the vendor’s systems immediately, or at all, depending on their retention policies. Sharing protected information with a third-party platform may violate NDAs, contracts, data privacy laws and professional ethical obligations. It can also erode trust with clients or partners who expect that their information will be handled with care.

We dive into this more in our blog here: Legal Implications of AI Technologies: 6 Tips to Minimize Risk

Can AI Platforms Use Your Content to Train Their Models?

Passing content into an AI platform often does more than generate an answer. Your information may also be used to train the large language model (LLM) used by the AI provider. Many of these tools amass user inputs to refine their models—unless you’ve taken steps to opt out. Whether or not that’s allowed depends on the platform’s terms and the type of account you’re using. 

By the way, what do you think was used to train these models beginning a few years ago? If you used any free information service, including search engines (remember 1-800-GOOG411?), your search terms and browsing habits probably helped train a model.

Data Protection in AI Platforms Depends on the Tool and the Terms

How user data is handled can vary widely across AI platforms, and even across versions of the same platform. Some use every prompt or uploaded document to improve their models, while others restrict that practice, especially for paying customers. The key difference often comes down to which version of the tool you’re using and what the terms of service allow. In many instances, the top providers of AI-based tools offer separate terms for standard and enterprise users.

Make sure you’re reading the right one! Which set of terms applies is not always obvious. If you aren’t sure, you’re not alone.

Regardless of the platform you choose to use, do not assume your data is protected by default. If LLM training is permitted, your information and inputs may become part of the model’s broader knowledge.

“The safest approach is to treat every platform as wide open, with free rein by default. Then confirm with written assurances where your data goes, who can access it and how long it stays in the system. It’s not as simple as asking, ‘Hey, ChatGPT, can I trust you?’” - Tara Swaminatha, ZeroDay Law

Where to Look: Terms of Service and Data Processing Agreements

The legal framework that governs AI use and your data usually lives in two places: the Terms of Service (TOS) and the Privacy Policy or Data Processing Agreement (DPA). These documents define how the provider collects, uses, stores and shares all information, including personal information.

The TOS typically applies to all users and outlines the provider’s general rights. A Privacy Policy or DPA outlines users’ rights (including your rights) in personal information and memorializes specific privacy and security obligations for controllers and processors.

Look for the following phrases when reviewing user agreements:

  • “Improving the product.” This phrase often signals that the provider may use your input to train or refine their models. If the tool states it collects data to improve services or performance, that likely includes reviewing or storing what you submit and using it in development, testing and production environments.
  • “Aggregated and anonymized.” Many platforms claim they only use data that has been stripped of identifying details. But aggregated data can still reveal patterns or proprietary processes. If the language is vague, ask for clarification.
  • “Data retention periods.” Look for specific timelines. Some tools keep data for 30 days. Others store it indefinitely. If no timeframe is listed, assume the provider has broad control over how long they retain your content and look for how they purge the data or if it is just subject to overwrite. Check whether your admin users can change the data retention period and/or delete data manually.
  • “Data ownership and licensing.” The agreement should clearly state that you retain ownership of the data you submit and, where possible, of derivative works. Be cautious of any language that grants the provider a “perpetual” or “irrevocable” license to use your content without restriction.
  • “Restrictions on upload content.” Some Terms of Service explicitly prohibit uploading information that is subject to statutory, regulatory or contractual restrictions (such as personal information or data covered by other authorities, e.g., HIPAA, FERPA, NDAs or government confidentiality obligations).

These terms indicate whether your data stays secure or is used to train the platform’s models. If anything is unclear or vaguely explained, the tool may not be a safe choice. If you choose to use a platform with ambiguous terms, be aware of the risks you are taking on.

Additional note: If you upload restricted data in breach of the agreement and the platform experiences a data breach, you may have no indemnity from the platform and would likely be forced to cover the cost of any civil claims against you. Typically, AI vendors disclaim liability when customers violate their terms, leaving you fully exposed. To reduce some AI-related risks, look into bringing the AI tool's engine and data stores in-house.

Common AI Use Loopholes

The TOS for some AI platforms include clauses that expose your data in ways that aren’t obvious at first glance. 

  • Be aware of linked documents. Although this applies to any lengthy online TOS, not just those for AI tools, the main agreement often references multiple separate online policy pages that change, expand and sometimes contradict the main terms. If you don’t track those details, you may agree to more than you intended. Hidden clauses buried in linked documents might supersede the main agreement.
  • Look out for a provider's right to change terms. Some TOS give the provider the right to change terms at any time. Depending on the product and content, this may or may not violate privacy-related obligations. Nevertheless, without a requirement to notify users, your rights can shift without warning. Some platforms also auto-enroll users in data sharing for model improvement. Although providers may eventually have to change their tune, unless you make sure you’ve opted out when given the option, your inputs may be used for training even under a paid account.
  • Understand third-party integrations. If an AI tool connects with other products or services, your data may flow to those vendors, which is common if not ubiquitous. Make sure you understand what information may be shared and what rights the AI tool’s vendors have in the data; sometimes third parties can use personal data more broadly than the AI tool vendor itself.

These loopholes make it essential to review every layer of the platform’s policies, not just the primary contract.

“Loopholes often hide in the fine print you never see. This can include buried links, silent defaults and vague or ambiguous permissions that give vendors far more access than users realize. The real risks are often in how terms like ‘content’ or ‘confidential’ are defined, not to mention conflicting provisions about data subjects’ rights and anything the platform omits from the agreements.” - Tara Swaminatha, Principal and Founder of ZeroDay Law

What AI Usage Rights Should You Require?

Before uploading any internal content to an AI tool, make sure the platform’s usage terms align with your company’s privacy and confidentiality standards. Certain rights and restrictions should be non-negotiable to maintain control over your data.

Non-Negotiable Clauses for Enterprise Use

When companies use AI tools in business settings, some contract terms are too important to concede. These clauses spell out how your data is handled, who can see it and what the vendor is permitted to do. 

Look for a direct statement that your data will not be used to train or improve the model. That restriction should cover both what you upload and what the platform generates in return. Check how the agreement defines terms like “content” and “outputs” to confirm the restriction actually reaches both.

Example: “Your content will not be used to train or improve any AI models, now or in the future.”

The agreement should also confirm that your company keeps your current IP rights in everything you submit. Without that clause, the vendor may claim broad rights to reuse or repurpose your material. 

Example: “Customers retain full ownership of all uploaded content, prompts and outputs.”

Finally, make sure the vendor is only processing your data to deliver the service you signed up for. This prevents your information from being shared, stored or analyzed for other purposes. 

Example: “Customer content will be used solely for the operation of the tool and in order to perform obligations under this agreement.”

Agreements that include this kind of precise, restrictive language are better equipped to support enterprise-level privacy and legal standards. Anything less creates uncertainty and increases exposure.

Does your financial services company use AI? Read more about the specific AI concerns in the financial industry.

Practical Steps Before You Upload to Any AI Platform

Before sharing company content with an AI platform, stop and review the essentials. A short internal check can help you avoid unnecessary risk and protect against privacy or compliance issues.

Start by asking the right questions:

  1. Are you using personally identifiable information, confidential business materials or sensitive legal content?

  2. Do you know where the data will be stored and who will have access to it?

  3. Have you reviewed the exact terms that apply to the version of the platform you’re using? This generally goes beyond the provider’s general policy and should be tied to your specific tier or plan. 

If any of these questions raises doubts, pause and reassess. Establishing a clear AI use policy can help set guardrails for your team and ensure that everyone knows what can and cannot be shared. Regular training and communication within your team can reinforce these guidelines and keep team members aware of evolving risks and platform changes.

For experimental use or testing, always work with redacted content or dummy documents. When practical, consider using enterprise-grade tools with sandboxed environments that keep your data isolated and under your control.
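
If your team wants a simple starting point, here is a minimal sketch of a pre-upload redaction pass in Python. The patterns and placeholder labels are illustrative assumptions, not a vetted PII detector; any real version should be defined with your legal and compliance teams and tested against the data types your organization actually handles.

```python
import re

# Illustrative patterns only. Real redaction policies vary by data type
# and jurisdiction; treat this as a sketch, not a compliance control.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matches of each pattern with a labeled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text

if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or 555-123-4567."
    print(redact(sample))
    # Prints: Contact Jane at [REDACTED-EMAIL] or [REDACTED-PHONE].
```

Keep in mind that a regex pass like this catches only well-structured identifiers; names, project code words and contextual business details still require human review or a dedicated data-loss-prevention tool before anything leaves your environment.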

Taking these steps up front protects not only your data, but also your ability to use AI tools responsibly as part of your broader business strategy.

Interested in learning more? Read our blog: AI and the Law: Balancing Technological Innovations with Traditional Legal Values

Final Takeaway: Read the Fine Print or Risk the Fallout

Once your documents are uploaded to an AI platform, you may not be able to get them back or control how they are used. If the terms allow for training, retention or sharing, that content could live on in ways your company never intended.

Do not rely on assumptions or default settings. 

Make sure legal and compliance teams review the full terms of service and any related agreements before any internal content is shared.

Need help reviewing your AI platform terms? Review our AI legal services to see how we can help you navigate AI with confidence or contact ZeroDay Law for a risk evaluation and guidance tailored to your business.

Take a look at our additional AI legal resources: