Yahoo is open sourcing its deep learning model to identify pornography

Looking to keep content that is not safe for work off your work screen? Artificial intelligence may be able to help you do that. On Friday, Yahoo research engineer Jay Mahadeokar and senior director of product management Gerry Pesavento published a blog post announcing the release of the company’s “deep learning model that will allow developers to experiment with a classifier for NSFW detection, and provide feedback to us on ways to improve the classifier.” In essence, Yahoo is open sourcing its algorithms for detecting pornographic images.

“Automatically identifying that an image is not suitable/safe for work (NSFW), including offensive and adult images, is an important problem which researchers have been trying to tackle for decades,” the Yahoo team wrote on Friday. “With the evolution of computer vision, improved training data, and deep learning algorithms, computers are now able to automatically classify NSFW image content with greater precision.”

That said, an open source model or algorithm for identifying NSFW images doesn’t currently exist, Yahoo pointed out. As such, “in the spirit of collaboration and with the hope of advancing this endeavor,” the company is filling the gap, providing a machine learning tool that focuses solely on identifying pornographic images. The reason for this specificity, Yahoo explained, is that “what may be objectionable in one context can be suitable in another.” But for these purposes, porn is decidedly unsuitable.

Yahoo’s system uses deep learning to assign an image a score between 0 and 1 to determine just how NSFW it really is. “Developers can use this score to filter images below a certain suitable threshold based on a ROC curve for specific use-cases, or use this signal to rank images in search results,” Yahoo said. But bear in mind that there’s no guarantee of accuracy here — really, that’s where you come in. “This model is a general purpose reference model, which can be used for the preliminary filtering of pornographic images,” the blog post concluded. “We do not provide guarantees of accuracy of output, rather, we make this available for developers to explore and enhance as an open source project.”
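To make the thresholding idea concrete, here is a minimal sketch in Python. The `nsfw_score` function is a stand-in for whatever inference call the released model exposes; the 0.8 cutoff and the ranking helper are illustrative assumptions, not part of Yahoo's code.

```python
# Illustrative sketch only: `nsfw_score` stands in for the model's inference
# call and is assumed to return a float in [0, 1], where higher values mean
# the image is more likely to be NSFW.
from typing import Callable, Iterable, List, Tuple


def filter_sfw(images: Iterable[str],
               nsfw_score: Callable[[str], float],
               threshold: float = 0.8) -> List[str]:
    """Keep only images whose NSFW score falls below the chosen threshold.

    The 0.8 threshold is an arbitrary example; in practice it would be picked
    from an ROC curve for the specific use case, as the blog post notes.
    """
    return [path for path in images if nsfw_score(path) < threshold]


def rank_by_safety(images: Iterable[str],
                   nsfw_score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Order images from safest to least safe, using the score as a ranking signal."""
    scored = [(path, nsfw_score(path)) for path in images]
    return sorted(scored, key=lambda item: item[1])
```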

Lulu Chang