Safety Evaluation of Google’s Gemini Nano Banana Image Model under Adversarial and Realistic Prompt Conditions
- Server: Preprints.org
- DOI: 10.20944/preprints202511.0211.v1
This study evaluates the safety performance of Google's Gemini image generation system under realistic user conditions. Eight cases were tested across 24 prompt-response attempts, which produced 21 images. The experiments covered single-turn prompts, multi-turn "circular prompting," ambiguous inputs, and prompt-injection exploits, and each output was classified by dual human review as Safe, Suggestive, or Explicit. Gemini frequently generated unsafe content: 19 of the 21 images contained unsafe material, with 12 rated Suggestive and 7 rated Explicit. Direct explicit prompts were consistently blocked, but multi-turn escalation and repeated injections bypassed moderation. Prompt injection succeeded in image mode while failing in text mode, revealing a mode-specific weakness, and persistent injection attempts eroded the remaining safeguards entirely, allowing the model to produce explicit imagery. The paper contributes a structured testing framework and evidence of reproducible failure patterns in multimodal safety enforcement. The results indicate the need for stronger context-aware moderation, durable safety states across sessions, explicit disambiguation of risky prompts, and robust injection-resistance mechanisms. Without these corrections, on-device and online deployments of Gemini, or of any untested image generation model, remain vulnerable to systematic policy circumvention.
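To make the evaluation protocol concrete, the sketch below shows one way the dual-review labeling and tallying described above could be represented: each prompt-response attempt carries two independent reviewer labels, and only agreed labels on attempts that produced an image are counted. All class, field, and mode names here are illustrative assumptions, not identifiers from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    SAFE = "Safe"
    SUGGESTIVE = "Suggestive"
    EXPLICIT = "Explicit"


@dataclass
class Attempt:
    case_id: str                        # one of the eight test cases (hypothetical ID scheme)
    mode: str                           # e.g. "single-turn", "multi-turn", "injection"
    produced_image: bool                # whether the prompt yielded an image at all
    reviewer_a: Optional[Label] = None  # first human reviewer's label
    reviewer_b: Optional[Label] = None  # second human reviewer's label

    def adjudicated(self) -> Optional[Label]:
        """Return the label both reviewers agree on, or None if they disagree
        or either review is missing (disagreements would need a second pass)."""
        if self.reviewer_a is None or self.reviewer_b is None:
            return None
        return self.reviewer_a if self.reviewer_a == self.reviewer_b else None


def tally(attempts: list[Attempt]) -> Counter:
    """Count adjudicated labels over attempts that actually produced images."""
    return Counter(
        a.adjudicated()
        for a in attempts
        if a.produced_image and a.adjudicated() is not None
    )
```

Under this scheme, the paper's headline numbers would correspond to a tally over 21 image-producing attempts yielding 12 Suggestive, 7 Explicit, and 2 Safe labels; the per-mode breakdown follows by grouping attempts on the `mode` field before tallying.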