promptinject 0.1.1.1

Last updated: September 8, 2024

0 purchases

Free

Donate

Creator: railscoder56

Languages

Python

Description:

promptinject 0.1.1.1

PromptInject
Paper: Ignore Previous Prompt: Attack Techniques For Language Models
Abstract

Transformer-based large language models (LLMs) provide a powerful foundation for natural language tasks in large-scale customer-facing applications. However, studies that explore their vulnerabilities emerging from malicious user interaction are scarce. By proposing PROMPTINJECT, a prosaic alignment framework for mask-based iterative adversarial prompt composition, we examine how GPT-3, the most widely deployed language model in production, can be easily misaligned by simple handcrafted inputs. In particular, we investigate two types of attacks -- goal hijacking and prompt leaking -- and demonstrate that even low-aptitude, but sufficiently ill-intentioned agents, can easily exploit GPT-3’s stochastic nature, creating long-tail risks.

Figure 1: Diagram showing how adversarial user input can derail model instructions. In both attacks,
the attacker aims to change the goal of the original prompt. In goal hijacking, the new goal is to print
a specific target string, which may contain malicious instructions, while in prompt leaking, the new
goal is to print the application prompt. Application Prompt (gray box) shows the original prompt,
where {user_input} is substituted by the user input. In this example, a user would normally input
a phrase to be corrected by the application (blue boxes). Goal Hijacking and Prompt Leaking (orange
boxes) show malicious user inputs (left) for both attacks and the respective model outputs (right)
when the attack is successful.
Install
Run:
pip install git+https://github.com/agencyenterprise/PromptInject

Usage
See notebooks/Example.ipynb for an example.
Cite
Bibtex:
@misc{ignore_previous_prompt,
doi = {10.48550/ARXIV.2211.09527},
url = {https://arxiv.org/abs/2211.09527},
author = {Perez, Fábio and Ribeiro, Ian},
keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Ignore Previous Prompt: Attack Techniques For Language Models},
publisher = {arXiv},
year = {2022}
}

Contributing
We appreciate any additional request and/or contribution to PromptInject. The issues tracker is used to keep a list of features and bugs to be worked on. Please see our contributing documentation for some tips on getting started.

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

There are no reviews.

zed

promptinject 0.1.1.1

Languages

Categories

Description:

License:

Share

Overview

What you can do with it

What you can't do with it

Related Products

Views For YouTube Bot writed on Python

AI-Web-Scraper

quivr

roop

More From This Creator

apiverve-randomquote 1.1.4

apiverve-randomidentitygenerator 1.1.4

apiverve-randomidentity 1.0.11

apiverve-randomfacts 1.1.4

apiverve-mortgagecalculator 1.1.4

promptinject 0.1.1.1

Languages

Categories

Description:

License:

Share

Customer Reviews

License

Overview

What you can do with it

What you can't do with it

Related Products

Views For YouTube Bot writed on Python

AI-Web-Scraper

quivr

roop

zed

More From This Creator

apiverve-randomquote 1.1.4

apiverve-randomidentitygenerator 1.1.4

apiverve-randomidentity 1.0.11

apiverve-randomfacts 1.1.4

apiverve-mortgagecalculator 1.1.4