(CNN Business) — A picture of a woman breastfeeding a baby. A fully clothed woman taking selfies in the mirror. A photo of a vase. These images were all wrongly flagged by Tumblr as improper.

Tumblr began its crackdown on adult content several weeks ago. But behind the scenes, its technology still struggles to tell the difference between banned and permitted nudity.

In December, Tumblr banned adult content, specifically “images, videos, or GIFs that show real-life human genitals or female-presenting nipples,” in an effort to clean up its blogging platform. A handful of exceptions include nudity in art or as part of a political event, as well as written content such as erotica.

Tumblr said it would enforce its rules with automated detection, human moderators, and community users flagging objectionable posts. But some Tumblr users have complained that their images were wrongly flagged.

It’s tough for AI to distinguish between, say, nudity for the sake of art or politics and nudity that is pornographic. An algorithm has to be trained to focus narrowly on a specific task, such as spotting faces in pictures, and it can get tripped up by differences humans would consider trivial, such as lighting.

To get a sense of how well this works in practice, CNN Business created a test Tumblr page and posted photos that don’t violate the service’s policy but might be challenging for AI to sort out. Images included nude sculptures, bare-breasted political demonstrators, and unclothed mannequins.

Most of the posts had no issues. However, immediately after posting several of the images, we received emails saying they had been hidden from public view for possibly violating Tumblr’s community guidelines.

As far as we could tell, none of the posts actually did. And the images that were flagged weren’t always the ones we expected. Some images with political nudity — such as bare-breasted female French protesters, painted silver and clothed in red cloaks — appeared fine. Others, including a picture of topless women who had painted their bodies with the Spanish phrase “Mi cuerpo no es obsceno” (“my body is not obscene”), were flagged. A crowd of mannequins was flagged, too.

Tumblr declined to comment but said in a December blog post that making distinctions between adult content and political and artistic nudity “is not simple at scale,” and it knows “there will be mistakes.” The company has an estimated 21.3 million monthly users, according to eMarketer.

Ethan Zuckerman, director of the Center for Civic Media at MIT, said part of the difficulty in using AI to discern different types of nudity is that while there is plenty of pornography out there, there aren’t all that many images of people getting naked for political reasons.

“You just don’t have as much training data to draw from,” he said.

Tech companies such as Tumblr, Twitter and Facebook are increasingly turning to artificial intelligence as a solution to all kinds of problems, particularly for scrubbing unsavory posts from social networks. But AI’s ability to moderate online content — whether it’s text, photos, or videos — is still quite limited, Zuckerman and other experts say. It can help humans pick out bad posts, but it’s likely to remain a complement to human moderation rather than a panacea in the years ahead.

Facebook in particular has emerged as an AI advocate. In April, CEO Mark Zuckerberg told Congress more than 30 times during 10 hours of questioning that AI would help the company root out problems such as hate speech and fake news.

It makes sense to deploy such technology, considering it’d be nearly impossible for human moderators to monitor the content created by a social site’s millions (or, in Facebook’s case, billions) of users.

Dan Goldwasser, an assistant professor at Purdue University who studies natural language processing and machine learning, believes AI will get better at this over time but says we should have realistic expectations for its use in the meantime.

“In some sense, it’s a matter of, well, if you set the bar quite low, AI can be very successful,” Goldwasser said.

A machine-learning algorithm — a type of AI that learns from mounds of data and gets better over time — can identify offensive language or pictures used in specific contexts. That’s because these kinds of posts follow patterns on which AI can be trained. For example, if you give a machine-learning algorithm plenty of racial slurs or nude photos, it can learn to spot those things in text and images.
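
The pattern-learning approach Goldwasser describes can be sketched in a few lines of code. The example below, written in Python with the scikit-learn library, trains a toy classifier on a handful of invented, hand-labeled posts; the data and labels are purely illustrative and bear no relation to the systems Tumblr or Facebook actually run.

```python
# Illustrative sketch only: a toy text classifier trained on a few
# hand-labeled examples. Real moderation systems use far larger
# datasets and more sophisticated models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled training data: 1 = should be flagged, 0 = acceptable.
posts = [
    "you people are subhuman garbage",    # abusive
    "get out of our country, vermin",     # abusive
    "lovely sunset at the beach today",   # benign
    "my grandmother's soup recipe",       # benign
]
labels = [1, 1, 0, 0]

# Convert each post into word-frequency features, then fit a simple classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

# The model scores a new post by how closely it matches patterns in the training data.
print(model.predict_proba(["those vermin don't belong here"])[:, 1])
```

A classifier like this only recognizes patterns it has seen before, which is why the lack of training data Zuckerman describes, and the contextual subtleties discussed below, remain hard problems.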

But the nasty things humans post online are often trickier for machines to flag, Goldwasser said. Inflammatory social media posts, for instance, may not include clearly offensive language; they could instead contain false statements about a person or group of people that lead to violence.

AI also has a hard time understanding uniquely human forms of expression such as humor, sarcasm, and irony. A post might sound mean when it’s meant as a joke.

Understanding context, including who is writing a post or uploading an image, who the intended audience is, and the surrounding social environment, can be key to figuring out the meaning behind a social-network post. And that’s a lot easier for us than it is for AI.

As AI tools improve, humans will remain an important part of the moderation process. Sarah T. Roberts, an assistant professor at UCLA who researches content moderation and social media, points out that humans are especially good at discerning context when necessary.

For example, we may be able to identify that an image depicting a violent scene actually documents a war crime against a group of people. That would be very hard for a computer to determine.

“I think people will always be better at [understanding] nuance and context,” she said.

Zuckerman also believes humans will always play a role in finding and stopping the spread of negative online content.

“Human creativity prevents full automation from coming into play,” he said. “I think we’re always going to find ways to surprise and shock and create new imagery.”