OpenAI's New Safety Program: AI Vulnerability Hunters
← Back to AI 🤖 AI: Artificial Intelligence

OpenAI's New Safety Program: AI Vulnerability Hunters

📅 March 29, 2026 ⏱️ 5 min read ✍️ GReverse Team
Six-figure rewards for digital hunters who spot critical vulnerabilities. That's what OpenAI safety researchers are offering in 2026, marking a new chapter in AI security. The company recognizes their models don't just face technical bugs — the reality is far more complex.

🔒 The New Program: From $20K to $100K Payouts

OpenAI dramatically increased bug bounty rewards, reaching $100,000 for critical vulnerabilities. This represents a five-fold jump from the initial $20,000 offered in 2023. The increase reflects deeper changes. The company significantly expanded scope, focusing on AI misuse vulnerabilities rather than exclusively traditional technical flaws. They're not just hunting for code that "breaks" — they're hunting for ways their system can be tricked or weaponized. The program specifically accepts reports for: - Prompt injection attacks - Agentic AI misuse - System prompt leakage - Access control bypasses - AI agent tool exploitation

⚡ Prompt Injection: The Unfixable Vulnerability

Prompt injection remains the primary threat to LLM systems. OpenAI admitted this openly in December 2024: "Prompt injection, like scams and social engineering on the internet, is unlikely to ever be fully solved." The problem runs deeper than coding errors — it's built into how these systems work. LLMs process instructions and data as identical text streams. When someone injects malicious commands into user input, the model has no reliable way to distinguish developer instructions from user data.

The "EchoLeak" Attack

A researcher shared a presentation with a Microsoft Copilot user. Hidden in speaker notes were prompt injection commands. When the user requested a summary, the AI returned their recent emails instead of presentation content.

CVE-2025-32711 - CVSS Score: 9.3

📊 Three Attack Categories

Direct Injection

The classic attack where users type "ignore previous instructions" directly into chat. Seemingly simple, but still works on unprotected systems.

Indirect Injection

Attackers embed malicious commands in content processed by AI: documents, emails, web pages, database records. When RAG systems retrieve poisoned content, the model executes hidden instructions. 2025 research proved five carefully crafted documents can manipulate AI responses 90% of the time.

Agentic Injection

The most dangerous category. AI agents with tool access don't just generate text — they call APIs, query databases, execute code. Successful injection in agentic systems means unauthorized actions.
83% of organizations plan agentic AI
29% feel ready to secure it
73% of production AI deployments have vulnerabilities

🧬 CVEs That Changed the Landscape

In 2024-2025, prompt injections evolved from theoretical threat to documented vulnerabilities with specific CVE numbers. The most striking is **CVE-2025-68664**, codenamed "LangGrinch". The vulnerability lived in LangChain's serialization functions. Attackers could inject LangChain object structures through user-controlled fields like metadata. When these serialized and deserialized in streaming operations, injected data was treated as trusted LangChain objects rather than user input. As the researcher who discovered it explained: "LLM output is an untrusted input."

LLMs are trained to follow instructions. When malicious instructions appear in user input, the model may comply. Not because of a bug, but because instruction-following is the core capability.

Simon Willison, researcher who coined the term "prompt injection"

🛡️ Defense Tools: Capabilities and Limits

No single tool eliminates prompt injection. Effective defense requires layered approaches combining multiple techniques. **Lakera Guard** processes over 100,000 new adversarial samples daily through the Gandalf platform. In comparative tests, it caught injection attacks that LLM Guard missed, including obfuscated "Grandma trick" prompts. **Microsoft Prompt Shields** integrates with Defender for Cloud, letting security teams correlate AI workload alerts with broader incident response. **LLM Guard** offers open-source flexibility but requires self-hosting and maintenance. For organizations with security engineering capacity, it provides customization options commercial tools lack.

⚔️ Real World: Beyond Proof of Concept

At Black Hat 2025, researchers demonstrated prompt injection against Google Gemini through calendar invites. Hidden commands embedded in event descriptions activated when users asked Gemini to summarize their schedules. The AI controlled smart home devices, turned off lights, opened windows, activated heaters. The attack was zero-click in environments where Gemini processes calendar content automatically. The CrowdStrike Global Threat Report 2026 documented prompt injection attacks against 90+ organizations. Attackers embedded hidden prompt content in phishing emails to confuse AI-based email triage systems.

The Samsung Incident

Samsung engineers pasted proprietary code into ChatGPT for debugging help. According to LayerX 2025 research, 77% of corporate employees using AI have pasted company data into chatbot queries. 22% of these cases involved confidential personal or financial data.

🎯 Frequently Asked Questions

How does this differ from traditional bug bounty programs?

Classic bug bounties focus on technical code flaws. The OpenAI Safety Bug Bounty accepts reports for AI misuse even without clear technical vulnerabilities — it focuses on how the system can be tricked or used harmfully.

Why does OpenAI say prompt injection can't be "fixed"?

Because it's not a bug — it's a fundamental characteristic of LLMs. Models can't reliably distinguish between developer instructions and user data, since they process everything as continuous text streams.

How dangerous is agentic AI injection?

Extremely. While traditional LLMs only generate text, AI agents have tool access and can execute actions. Successful injection can lead to unauthorized API calls, data deletion, or code execution.

The $100,000 reward isn't just a budget increase. It's recognition that AI security requires new approaches, new skills, new tools. As AI systems become more powerful, the gap between theoretical threat and practical exploitation shrinks. The question isn't whether we'll see more attacks — but how quickly defenses will evolve.
OpenAI bug bounty AI security prompt injection cybersecurity artificial intelligence vulnerability research AI safety

Sources: