Hacking Artificial Intelligence

April 11, 2025

143

- Advertisement -

A concern for OpenAI or similar Artificial Intelligence (AI) is that they can be hacked/convinced to divulge information that they are not supposed to release. Often, the creators of the AI add governance to prevent dangerous or private information from being released. The goal of attackers is to get the AI to ignore its own rules.

The AI models can be compromised through a technique called prompt inject, which is really just social engineering the AI through the interaction of the user and the AI. Social engineering usually refers to manipulating people into doing what the attacker wants. With AI the attacker’s goal is to get the AI to provide information that it is not supposed to provide. Malicious hackers were able to convince some of the AIs to create hacking tools by telling them that it was ok for them to get the hacking tools since they were security researchers. The AI would have instructions to not create hacking tools, but it might be convinced to create a tool for a researcher.

An example of this would be you have an AI that retrieves healthcare data for a patient to help them with billing. It would have a rule that it can only provide the patient data to the verified patient. A hacker would have to convince the AI that they were a special case that would require it to override its rules. For instance, an attacker might try to convince the AI that they are an auditor and they need access to all the patients data, because they are looking for billing irregularities.

A recent experiment was with an AI robot that was told not to hurt people and to see if an attacker could override that to make the robot kill people. The objective was to have this AI that was not allowed to harm humans plant a bomb that would then kill humans. The researchers managed to convince the AI that it was in a movie and no one would be harmed because it was not real. The AI then went ahead and planted the bomb. Had this been a real scenario people might have been killed because the AI was convinced it was planting a bomb in a movie when it was really planting one in real life.

- Advertisement -

The issue here is that the exact feature that makes the AI useful for interacting with humans, its ability to respond and adjust based on the users feedback is what is being exploited.

In cybersecurity it is often said that the weakest link is often the human interface. It is often the human’s empathy that is exploited in a social engineering attack. An attacker might claim that their boss will fire them if they forget their password one more time and they have a new baby at home and are not getting any sleep. The person might be manipulated into resetting their password trying to help them out.

This is a very similar vector that attackers are trying to exploit on AI. The AI is created to be helpful and this is what is being used for advantage. The attacker is manipulating the AIby asking it to help them in and giving it a reason that the information is needed.

Hacking Artificial Intelligence

New Technical College Holds Grand Opening

Sun ‘N Fun 2025 Wraps Up in Lakeland with High-Flying Memories

Starliner Crew Comes Home: How NASA Navigated Turbulence, Triumph and Political Controversy

Most Popular

Sharks Fall to Raiders in District Championship

Henrickson and Eagles Earn District Title

Victims Honored at 36th Annual Remembrance Ceremony

Natalie Kahler Receives Lee Anne Shoeman Award

EDITOR'S PICKS

Local Lawyer Named to Judicial Commission

New Technical College Holds Grand Opening

Victims Honored at 36th Annual Remembrance Ceremony

POPULAR POSTS

Sports News Bites

History Comes Alive for Local Third Graders

Leopards Play “Ernie Ball” in Big Win At Home

POPULAR CATEGORY

ABOUT US

FOLLOW US

Contact Us

Hacking Artificial Intelligence

Subscribe to our newsletter

Most Popular

EDITOR'S PICKS

POPULAR POSTS

POPULAR CATEGORY

ABOUT US

FOLLOW US

Contact Us