NewsBite

DarkBERT: experts predict evil Dark Web chat bot ‘definitely a possibility’

Korean researchers have created an AI model trained on data from the dark web, called DarkBERT. What if criminal groups used the same know-how? Experts explain.

ChatGPT helps people be more efficient. So what if nefarious groups created a chat bot using data from the dark web? Picture: Lionel BONAVENTURE / AFP

Artificial intelligence could be set to take an evil turn, with researchers proving it is possible to train a language model using data from the dark web – the internet’s depraved underbelly.

While DarkBERT, created by Korean PhD students, was developed ethically and with altruistic intentions, some experts are concerned nefarious groups could work on similar projects and create highly productive criminals or supercharge cyber attacks.

DarkBERT’s goal was to help law enforcement identify cybersecurity threats on the dark web by learning the specific slang used there.

Any child abuse material and personal information such as email addresses were excluded from the training data.

However, Torrens University’s Centre for Artificial Intelligence Research and Optimisation director Professor Seyedali Mirjalili said other groups may develop similar models without these guardrails in place.

“Definitely this is a possibility,” he said.

“It’s definitely a concern.”

Professor Seyedali Mirjalili was named the world's top AI researcher in The Australian’s Research 2023 magazine. Picture: NCA NewsWire / Dan Peled

Prof Mirjalili believed the biggest threat of this hypothetical AI tool would be its ability to create malicious content.

“The dark web is full of source codes for cyber attacks so the language model could learn the code and (help people do their) own attacks,” he said.

“Because people sell (hacked) personal data on the dark web, it means phishing emails could be more personalised. People could use language models to create more convincing (scams).”

He said open source language models – built on the same underlying technology as ChatGPT – had given nefarious AI projects a leg up.

“If you’re a small team with limited resources, you can take one of the open source models already built and trained on the surface web, and retrain on the dark web,” he said.

“When we want to prepare a meal, we buy the ingredients and it takes maybe one hour, as opposed to planting and growing the crops, and it may take one year to prepare the same dish from scratch.”

Prof Mirjalili said if someone created an AI chat bot trained on dark web data, it might help them “find the cheapest drugs or find markets to sell malicious services”.

The hypothetical bot could make criminals more productive.

“Just as ChatGPT now makes us more efficient, definitely people could leverage it for those purposes as well,” he said.

Accenture’s Jacqui Kernot said creating a dark web chat bot would be a very complex task. Picture: Supplied

But the creation of an unethical ChatGPT-style bot using dark web data was not simple or cheap.

Accenture security director for Australia and New Zealand Jacqui Kernot said it would require substantial expertise in natural language processing, machine learning, and large-scale language model training.

“While acquiring dark web data can be challenging due to its hidden nature, determined individuals or groups can still utilise it for building a chat bot,” she said.

“It is plausible that individuals or groups are currently working on unethical chat bots based on dark web data, especially as large language models become more accessible.”

DarkBERT creator Youngjin Jin said people were already using ChatGPT to create malware code. Picture: Supplied

One of the researchers involved in the DarkBERT project, Youngjin Jin, believed the chances of a malicious group creating an evil ChatGPT were “very slim to none”.

“(It’s just) too computationally expensive to pretrain a model like this from scratch,” the Korea Advanced Institute of Science & Technology student said.

“DarkBERT itself has been trained on four state-of-the-art GPUs (graphics processing units) for an entire month, just on 5GB of data.

“Trying to build a generative pretrained model for something like ChatGPT, that would require a lot more computing, resources and data.

“Building a model like this would not be very profitable.”

Mr Jin said a “much more threatening” and likely AI model was one trained specifically on malware code to generate more malware code – but this didn’t require the dark web and was already being done using ChatGPT.

UNSW’s Sebastian Sequoiah-Grayson said an evil ChatGPT was unlikely. Picture: Supplied

UNSW senior lecturer and computer science expert Dr Sebastian Sequoiah-Grayson was also not convinced anyone would bother making an evil ChatGPT – or at least not release it widely.

“As soon as you did that on the regular internet you would be found out,” he said.

“There is nothing stopping anyone technically from making it – in the same way I could create a chat bot that will help you make a nuclear warhead – but not many people would see it as in their best interest to do so.

“People engaged in criminal activity work out quickly that discretion is key to a long and prosperous life.”

Originally published as DarkBERT: experts predict evil Dark Web chat bot ‘definitely a possibility’

Original URL: https://www.ntnews.com.au/technology/online/its-definitely-a-concern-how-the-dark-web-could-train-an-evil-ai/news-story/040d6e7d1af539a481104ff603f204a1