Using open AI in public sector software projects is now more relevant than ever. As technology continues to advance, regulations such as the EU AI Act set new standards for public sector AI projects. At the same time, the question of openness in AI has become central: how can we ensure that AI solutions are not only efficient but also safe, ethical, and transparent?
The Open Source AI Definition (OSAID 1.0 RC1), published for review on October 3, 2024, aims to clarify what "open AI" truly means. It defines openness in AI across source code, training data, and algorithms. But how does this translate to public sector projects, and what can open AI offer public services?
One of the greatest benefits of open AI for the public sector is its ability to modernize outdated systems and improve service quality. Here are a few examples from the public sector that can be shared publicly. We consulted on the Smart Rehabilitation project by Jyväskylä University of Applied Sciences, which aimed to use AI to improve healthcare solutions by analyzing patient data and helping make decisions faster and more accurately. The goal was better healthcare outcomes and more efficient use of resources, showing how AI can modernize traditional public sector systems and bring clear benefits to both citizens and service providers. The project was part of a global initiative and was followed up by a project on digital rehabilitation in Rwanda.
Another great example is the Memory Lab project with the Digitalia research institute and South-Eastern Finland University of Applied Sciences (XAMK), which used AI to automate the analysis of archives and other big data. The project also aimed at opening new possibilities for innovation and collaboration between the public and private sectors, allowing better use of open data and open source systems.
However, using open AI is not without risks. One of the major challenges is open-washing, where AI solutions are marketed as open but carry hidden restrictions. Llama 2 is an example of a model that claims to be open but places limitations on commercial use, which blurs the line of what "open" really means.
Data privacy is another critical concern, especially for the public sector, which often handles sensitive information. The metadata automation project with the Finnish National Audiovisual Institute showed that while open data and AI offer huge opportunities, without proper governance, privacy and intellectual property can be at risk.
How can public sector organizations make sure that the AI solutions they use are truly open and safe? This simplified Openness Assessment Matrix is one of the tools we use with our clients. It is a simple, practical way to evaluate the openness of AI solutions in four key areas: source code, training data, input data, and API.
In the Memory Lab project, openness was high in terms of source code and API, but there were challenges with managing the training data. In the digital rehabilitation and audiovisual archives projects, input and training data openness was more limited, but privacy and governance were strong. This shows that while a solution may be technically open, its real-world openness depends on many factors.
Openness of Source Code
Is the AI model’s source code publicly available and modifiable by others?
- Is the source code fully accessible without restrictions?
- Are contribution rules and licenses clear?
- Is there a risk of open-washing?
Example: Llama 2 is labeled as open-source, but its commercial use is restricted
Openness of Training Data
Is the dataset used to train the AI model openly accessible and well-documented?
- Is the data accessible and ethical?
- Is the data free from bias and well-documented?
- Are there governance issues in using this data?
Example: Some models are trained on proprietary datasets, limiting openness and transparency
Openness of Input Data
Can external data be easily input into the AI system, and how open is that process?
- How easy is it to integrate external data?
- Are there restrictions on input data formats?
- How is privacy protected?
Example: Open models may accept various input formats, but privacy concerns arise when handling sensitive public data
Openness of API
Is the API used to interact with the AI system open and well-documented, with no restrictions?
- Is the API fully open and well-documented?
- Are there usage restrictions like rate limits?
- Does the API support interoperability?
Example: APIs such as those for GPT models are partially open, but usage caps and access restrictions can limit usability
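To make the matrix above concrete, here is a minimal sketch of how the four dimensions could be turned into a scoring tool. The dimension names follow the article; the 0–3 scale, the weights, and the flagging threshold are illustrative assumptions, not part of any standard.

```python
# Hypothetical sketch of the Openness Assessment Matrix as a scoring tool.
# Each dimension is scored 0 (closed) to 3 (fully open); the scale and the
# low-score threshold below are illustrative choices, not a standard.
from dataclasses import dataclass

DIMENSIONS = ("source_code", "training_data", "input_data", "api")

@dataclass
class OpennessAssessment:
    source_code: int    # Is the code accessible, licensed, modifiable?
    training_data: int  # Is the dataset open, documented, ethically sourced?
    input_data: int     # Can external data be integrated, with privacy protected?
    api: int            # Is the API open, documented, interoperable?

    def overall(self) -> float:
        """Unweighted average across the four dimensions."""
        scores = [getattr(self, d) for d in DIMENSIONS]
        return sum(scores) / len(scores)

    def flags(self) -> list[str]:
        """Dimensions scoring low enough to warrant a closer look
        (e.g. possible open-washing or governance gaps)."""
        return [d for d in DIMENSIONS if getattr(self, d) <= 1]

# Example: open code and API, but restricted training data
assessment = OpennessAssessment(source_code=3, training_data=1,
                                input_data=2, api=3)
print(assessment.overall())  # 2.25
print(assessment.flags())    # ['training_data']
```

In practice each score would be backed by the checklist questions listed under the matching dimension above, and a flagged dimension would trigger a manual review rather than an automatic rejection.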
It’s clear that open AI offers huge potential for the public sector. But to fully take advantage of these opportunities, organizations must be able to evaluate how open their solutions really are and manage the associated risks. The new EU AI Act sets stricter requirements for AI use, particularly in public services.
We have recently developed the Designing Compliant Data Products course in collaboration with Laurea University. This course provides tools for understanding and meeting the regulatory demands of AI and data, helping public sector actors keep up with the latest challenges in AI regulation.
Open AI presents huge opportunities for public sector organizations, but without clear governance and strong privacy protections, it can also pose significant risks. The Openness Assessment Matrix offers a concrete way to evaluate the openness of AI solutions and helps public actors make better decisions when using AI.
A question for you: How well does your organization ensure the openness and security of its AI solutions? What steps have you taken to take advantage of AI opportunities in your public sector projects?