Project Leader: Professor Ellen P. Goodman

With Dr. Hector Postigo, the Rutgers Institute for Information Policy & Law (RIIPL) piloted a natural language search database of policy proposals and legislative enactments aimed at ensuring fair, transparent, and democratic implementation of AI. We fine-tuned OpenAI’s pre-trained GPT-2 model, using the recently released 355M-parameter weights, on a corpus of AI/Big Data/Algorithmic decision-making transparency policies, position papers, and laws collected in 2019. We then conducted a demonstration of the resulting text generation capability. The goal is for advocates to input prompts based on their policy objectives and receive outputs suggesting language for policy or advocacy narratives that meet those objectives. Policy activists may send 10-15 word prompts for their AI policy initiatives to aipolicyriipl@gmail.com to produce outputs for review.
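For readers curious about the mechanics, the sketch below shows roughly how such a fine-tuning run looks with Max Woolf’s gpt-2-simple package. The corpus file name, run name, and step count are illustrative assumptions for the sketch, not our exact configuration.

# Minimal fine-tuning sketch using gpt-2-simple; names and step
# count are assumptions, not the project's actual settings.
import gpt_2_simple as gpt2

# Download OpenAI's pre-trained 355M-parameter GPT-2 weights.
gpt2.download_gpt2(model_name="355M")

sess = gpt2.start_tf_sess()

# Fine-tune on a single plain-text file containing the policy corpus.
gpt2.finetune(sess,
              dataset="policy_corpus.txt",  # hypothetical corpus file
              model_name="355M",
              steps=1000,                   # illustrative step count
              run_name="riipl_policy")      # hypothetical run name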

Here’s an example of the process:

FRONT END

Input:  “Algorithmic decision-making can pose dangers to human rights”

BACK END

Output:

“Decision systems that assume predictability about human behavior can be prone to error. These are the errors of a data-driven society. AI systems that assume predictability about human behavior lead to the denial of fundamental human rights, and are in direct contravention of all basic human rights treaties and international human rights instruments. As the Human Rights Council has stated: “No human being should be arbitrarily denied the right to request the reasons for adverse decisions or to appeal them.

At the same time AI systems that assume there is some inherent bias in underlying data, or that certain biases are less important than others, or that there is some inherent validity to the data, results, and algorithms used to make decisions, also have the potential to worsen abuses. These errors are not necessarily accidental. They are the product of cultures, design and implementation of systems, including their training set, algorithms, data weights, and interpretation tools. If these errors are not kept in check, as they should be, the built environment can serve to reinforce them.

The design of AI systems to be used in human rights and human security fields must take into account both the historical and contextual factors, and ensure that people will be able to understand and explain their context. The design of AI systems that take into account the needs of people in a different culture, time, place, and context, as well as the way AI systems are used, can have profound effects on how people are perceived and treated.

Recognizing these issues will ensure that we are able to use the tools that humanity has entrusted to us to address the most pressing rights and security challenges of our time.”

 ***

The models on hand can return narratives of varying length and fidelity. They will not always generate grammatically perfect or focused outputs, nor outputs that are always factually correct. The tool is intended only to assist in text generation; outputs must be reviewed and checked for accuracy by the user. When used in this way, the models can generate texts that 1) draw on the policy lexicon they were fine-tuned on, 2) are factually relevant, and 3) advance users’ objectives. We designed this tool with a specific set of users in mind, not for a general audience. We assume users will come to the interface with a policy objective and some general knowledge of the key words of the relevant policy issues, and that they will use those key words in their prompts, guiding the model toward the sorts of outputs needed. In short, the model needs user guidance (via topical specificity in inputs).
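As a rough illustration of how a topically specific prompt steers a fine-tuned model, generation with gpt-2-simple might look like the following. The run name and sampling parameters here are assumptions for the sketch, not our production values.

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="riipl_policy")  # hypothetical fine-tuned run

# A topically specific 10-15 word prompt guides the model toward
# the policy lexicon it was fine-tuned on.
gpt2.generate(sess,
              run_name="riipl_policy",
              prefix="Algorithmic decision-making can pose dangers to human rights",
              length=300,       # tokens to generate; illustrative
              temperature=0.7,  # lower values yield more focused text
              nsamples=1)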

Acknowledgements: Thanks to OpenAI for releasing GPT-2, which was used for this project. Texts were generated using Max Woolf’s gpt-2-simple Python package, our tweaks to the package’s parameters, the primary documents we gathered for the training corpora, and our efforts to make them readable to the GPT-2 algorithm. The corpora currently encompass a number of topics relevant to those seeking AI/Big Data/Algorithmic transparency. Fine-tuning, data gathering, and cleaning were done in-house on our own hardware. Our ongoing work organizes the larger document database into topic-specific corpora, fine-tuning models on them for users whose policy objectives are defined with more granularity.
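Because gpt-2-simple fine-tunes on a single plain-text file, one plausible way to prepare such a file from gathered documents is sketched below. The directory name and the <|endoftext|> delimiter are illustrative choices, not a description of our exact cleaning pipeline.

from pathlib import Path

# Concatenate gathered policy documents into one plain-text training file.
# "policy_docs/" and the <|endoftext|> delimiter are illustrative choices.
with open("policy_corpus.txt", "w", encoding="utf-8") as out:
    for doc in sorted(Path("policy_docs").glob("*.txt")):
        text = doc.read_text(encoding="utf-8", errors="ignore")
        out.write(text.strip() + "\n<|endoftext|>\n")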