
Beyond open vs. closed: Understanding the spectrum of AI transparency
Artificial intelligence (AI) is transforming industries, from software development to cybersecurity. But as AI adoption grows, so does the discussion around its accessibility and transparency. Unlike traditional software, where the concept of open source is well-defined, AI introduces additional complexities — particularly around training data, model parameters, and architecture openness.
What is open source AI?
The Open Source Initiative (OSI) has put forth an official definition of open source AI, which notably does not require the training data itself to be openly available. However, many in the AI and open source communities argue that true transparency requires full access to every component, including the data.
Rather than a binary debate over whether AI is open or closed, the more relevant discussion is about how transparent AI models are across different dimensions — source code, model parameters, and training data.
Let's explore the nuances of AI transparency, how different models compare in openness, and why these distinctions matter.
Defining the full spectrum of AI transparency
Rather than forcing AI into a strict open source vs. closed source dichotomy, it's more useful to assess how transparent an AI system is.
A truly open AI model would provide full access to:
- Source code – The AI model's architecture and implementation.
- Model parameters – The learned weights and settings that define how the AI behaves.
- Training data – The datasets used to train and refine the model.
Many AI projects that claim to be open source make only some of these components available. This leads to a spectrum of openness, rather than a strict yes/no classification.
Key aspects of AI transparency
- Publicly available code and models: The AI model's architecture, training processes, and datasets are accessible to developers and researchers (see the sketch after this list).
- Modifiability: Users can tweak and improve the AI system based on their needs.
- Training data availability: Many AI models do not disclose training data due to privacy, licensing, or competitive concerns.
- Transparency and trust: Open access to AI components fosters greater scrutiny and ethical AI development, but there is no single definition of what makes an AI model "open."
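For teams that want to check some of these signals programmatically, here is a minimal sketch using the `huggingface_hub` Python library to inspect a hosted model's license tag and whether its weight files are published. The model ID is illustrative, and the license metadata is only as reliable as the model card it comes from.

```python
# A minimal sketch of checking a model's openness signals on the
# Hugging Face Hub. Requires: pip install huggingface_hub
from huggingface_hub import model_info

# Illustrative model ID; substitute any publicly hosted model.
info = model_info("mistralai/Mistral-7B-v0.1")

# The license tag on the model card hints at how permissive the release is.
license_tag = info.card_data.license if info.card_data else "unknown"
print("License:", license_tag)

# Published weight files indicate the parameters themselves are downloadable.
weight_files = [f.rfilename for f in (info.siblings or [])
                if f.rfilename.endswith((".safetensors", ".bin"))]
print("Weight files published:", bool(weight_files))
```

Note that this only surfaces what a publisher chooses to declare; it says nothing about whether the training data behind those weights is disclosed.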
AI transparency and traditional open source
Since AI transparency exists along a spectrum, a simple open vs. closed comparison doesn't capture the reality of AI model accessibility.
Below is a comparison of different AI transparency dimensions:
| Feature | Fully transparent | Partially transparent | Closed source |
| --- | --- | --- | --- |
| Source code | Open | Partially open | Proprietary |
| Model parameters | Open | Restricted access | Proprietary |
| Training data | Open | Not disclosed | Proprietary |
Many widely used AI models, such as OpenAI's GPT-4 and Google's Gemini, fall into the partially transparent category, where some elements are open while others remain proprietary. By contrast, models like Meta's Llama and DeepSeek offer more openness but still withhold key aspects like training data.
Major players in AI transparency
Several organizations and projects are at the forefront of AI transparency, each offering different levels of openness.
Meta (Llama series)
Meta has made significant contributions to AI with its Llama (Large Language Model Meta AI) series. However, while Meta released Llama 2's model weights under a relatively permissive license, it has not made the training data open, which some argue means the models do not fully meet the definition of open source AI.
DeepSeek AI
DeepSeek AI is a growing open source initiative that focuses on developing high-quality AI models. While its models and code are publicly available, it is unclear whether DeepSeek's training datasets are fully open, placing it in the partially transparent category.
Hugging Face
A central hub for open source AI, Hugging Face provides a vast ecosystem for sharing, training, and fine-tuning AI models. Many of the models hosted on Hugging Face vary in openness, reinforcing the broader discussion around how different AI projects define transparency.
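To make that concrete, here is a minimal sketch of downloading an openly published model from the Hub with the `transformers` library. The model ID is an illustrative assumption; any checkpoint whose weights are openly downloadable would work, and a model of this size needs substantial memory to load.

```python
# A minimal sketch of pulling openly published model weights from the
# Hugging Face Hub. Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model ID (assumption: weights are openly downloadable).
model_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Holding the weights locally lets you inspect, audit, or fine-tune them,
# a degree of access an API-only, closed model never offers.
print(f"Loaded {sum(p.numel() for p in model.parameters()):,} parameters")
```

This local access to parameters is precisely the "model parameters – open" column of the table above, even when the training data column remains "not disclosed."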
Mistral AI
Mistral AI develops competitive AI models that rival closed-source alternatives. However, similar to Llama, Mistral's models are open in terms of model weights and code but lack fully open training datasets, adding to the ongoing debate about what qualifies as a transparent AI model.
Why AI transparency matters
Instead of focusing on whether AI is strictly open or closed, organizations should assess how transparent an AI model is based on different criteria.
Here's why it matters:
- Security and compliance: Organizations need to understand how AI models are built and trained to ensure security and regulatory compliance.
- Innovation acceleration: More transparency fosters innovation by allowing developers to build upon existing models.
- Regulatory and ethical concerns: Transparency helps mitigate concerns over bias, ethical misuse, and lack of explainability.
- Enterprise adoption: Businesses evaluating AI solutions need visibility into which components are open, partially open, or proprietary to make informed decisions (see the sketch after this list).
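One lightweight way to put this into practice is to record each model's openness along the three dimensions from the table above, for example in an internal inventory or AI bill of materials. The sketch below is a hypothetical illustration; the class and field names are assumptions, not an established standard.

```python
# A hedged sketch of recording a model's openness across the three
# dimensions discussed above. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class ModelTransparency:
    name: str
    source_code_open: bool
    weights_open: bool
    training_data_open: bool

    def classification(self) -> str:
        # Map the three yes/no dimensions onto the spectrum's categories.
        open_count = sum([self.source_code_open, self.weights_open,
                          self.training_data_open])
        if open_count == 3:
            return "fully transparent"
        if open_count == 0:
            return "closed source"
        return "partially transparent"

# Example: weights and code released, training data withheld.
llama = ModelTransparency("Llama 2", source_code_open=True,
                          weights_open=True, training_data_open=False)
print(llama.name, "->", llama.classification())  # partially transparent
```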
The future of AI transparency
As AI continues to evolve, the conversation is shifting from a binary "open vs. closed" debate to one focused on transparency across different dimensions. Regulatory bodies and industry leaders are already discussing AI governance and responsible deployment, which will impact how organizations disclose AI components.
Whether companies choose fully open, partially open, or proprietary AI models, one thing is clear: the demand for transparency in AI development will only continue to grow.
At Sonatype, we are closely monitoring these AI trends, particularly in relation to software supply chain security. To learn more about AI in software development, check out our insights.
