Some software developers disagree with the open-source community on licensing and compliance issues, arguing that the community needs to redefine what constitutes free open-source code.
The term “open washing” has emerged, referring to what some industry experts claim is the practice of AI companies misusing the “open source” label. As the artificial intelligence rush intensifies, efforts to redefine terms for AI processes have only added to the confusion.
Recent accusations that Meta “open washed” the description of its Llama AI model as true open source fueled the latest volley in the technical confrontation. Some in the industry, like Ann Schlemmer, CEO of open-source database firm Percona, have suggested that open-source licensing be replaced with a “fair source” designation.
Schlemmer, a strong advocate for adherence to open-source principles, expressed concern over the potential misuse of open-source terminology. She wants clear definitions and guardrails for AI’s inclusion in open source that align with the core principles of open-source software.
“What does open-source software mean when it comes to AI models? [It refers to] the code is available, here’s the licensing, and here’s what you can do with it. Then we are piling on AI,” she told LinuxInsider.
The use of AI data is being mixed in as if it were software, which is where the confusion within the industry originates.
“Well, the data is not the software. Data is data. There are already privacy laws to regulate that use,” she added.
New Definition for Open-Source AI Systems
The Open Source Initiative (OSI) released an updated definition for open-source AI systems on Oct. 28, encouraging organizations to do more instead of slapping the “open source” term on AI work. OSI is a California-based public benefit corporation that promotes open source worldwide.
In an interview published elsewhere, OSI’s Executive Director Stefano Maffulli said that Meta’s labeling of the Llama foundation model as open source confuses users and pollutes the open-source concept. This comes as governments and agencies, including the European Union, increasingly support open-source software.
In response, OSI issued version 1.0 of its Open Source AI Definition (OSAID) to define more explicitly what qualifies as open-source AI. The document follows a year-long global community design process. It offers a standard for community-led, open, and public evaluations to validate whether an AI system can be deemed open-source AI.
“The co-design process that led to version 1.0 of the Open Source AI Definition was well-developed, thorough, inclusive, and fair,” said Carlo Piana, OSI board chair, in the press release.
The new definition requires open-source models to provide enough information about their training data for a skilled person to recreate a substantially equivalent system using the same or similar data, noted Ayah Bdeir, lead for AI strategy at Mozilla, in the OSI announcement.
“[It] goes further than what many proprietary or ostensibly open source models do today,” she said. “This is the starting point to addressing the complexities of how AI training data should be treated, acknowledging the challenges of sharing full datasets while working to make open datasets a more commonplace part of the AI ecosystem.”
The text of the OSAID v.1.0 and a partial list of the global stakeholders endorsing the definition are available on the OSI website.
Dissension Over OSI’s Open-Source AI Definition
Schlemmer, who did not participate in writing OSI’s open-source definition, said she and others have concerns about the OSI content. OSAID does not resolve all the issues, she contended, and some of its content needs to be walked back.
“Clearly, this is not said and done right, even by their own admitting. That reception has been overwhelming, but not in the positive sense,” Schlemmer added.
She compared the growing practice of loosely referring to something as an open-source product to what occurs in other industries. For example, the food industry uses the words “organic” or “natural” to imply something about a product’s contents or its benefits to consumers.
“How much [of labeling a software product open source] is a marketing ploy?” she questioned.
Is Changing Definition an Enforcement Solution?
Open-source supporters often boast about how the technology is deployed globally. Only rarely do they cite problems with license enforcement.
Schlemmer admitted that economic pressures drive changes in open-source licenses. It often becomes a balancing act between sharing free open-source code and monetizing software development.
For example, companies like MongoDB, her own Percona, and Elastic have adapted their licensing strategies to balance commercial interests with open-source principles. In these cases, license violations or enforcement were not involved.
“Several tools exist in the ecosystem, and compliance groups in corporate departments help people be compliant. Particularly in the larger organizations, there are frameworks,” said Schlemmer.
Individual developers may not recognize all those nuances. However, many license changes are driven by the project’s economic value to its original owner.
Reaffirming True Open-Source Standards
Schlemmer is optimistic about the future of open source. Developers can build upon open-source code without violating licenses. However, changes in licensing can limit their ability to monetize.
These concerns highlight the potential erosion of open-source adoption due to license changes and the need for ongoing vigilance. She cautioned that it will take continuous evolution of open-source licensing and adaptation to new technologies and market pressures to resolve lingering issues.
“We must keep going back to the core tenet of open-source software and be very clear as to what that means and doesn’t mean,” Schlemmer recommended. “What problem are we trying to solve as technology evolves?”
Some of those challenges have already been addressed, she added. A framework for the open-source definition, with clear labels and licenses, already exists.
“So, what’s this new concept? Why does what we already have no longer apply when we reference back?”
That is what needs to be aligned.