Facebook aims to be more engaging to humans in latest AI research

For anyone who has marveled at the breathtaking array of challenges that Facebook has faced this year regarding its place in society, the latest bit of research from the company’s artificial intelligence team offers a fascinating goal: To be “more engaging to humans.”

Researchers at the Facebook AI unit found a way to train machine learning models to spit out not merely factual descriptions of images but captions in a number of styles of comment that might be more interesting to a person. Crucially, those captions are meant to represent the attitude, or personality traits, of the disembodied entity that is doing the commenting.

Traditional machine learning tasks that successfully place a description automatically on an image “are useful to verify that a machine understands the content of an image,” they write, but “they are not engaging to humans as captions.”


Personality, in this case, could range from sweet to arrogant to anxious, and various arrangements in between. A picture of a sandwich, for example, could be affectionately labeled, “That is a lovely sandwich,” or, more derisively, “I make better food than this.”

The work is a mash-up of several state-of-the-art techniques: methods for determining the content of an image, and methods for generating novel sentences.

The paper, “Engaging image captioning via personality,” posted on the arXiv pre-print service, was authored by Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, and Jason Weston of Facebook AI Research.

The neural network model the authors created, which they dub “TransResNet,” relies upon several state-of-the-art programs built to “encode” image data, including the “ResNet152” encoder, a deep convolutional network originally developed by Kaiming He and colleagues at Microsoft Research, as distributed in “Torchvision,” a software package first described by Sébastien Marcel and Yann Rodriguez in 2010.

The output of that encoder is then given to a “multi-layer perceptron with ReLU [rectified linear unit] activation units.” To that, the authors add an “embedding” of a personality trait. Next, the authors train two encoders on what they call a “next-utterance retrieval task,” which draws on a dialog database of “1.7 billion pairs of utterances, where one encodes the context and another the candidates for the next utterance.”
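To make that pipeline a little more concrete, here is a toy sketch in plain Python. It is not Facebook's code: the dimensions, the random weights, the trait names used as lookup keys, the choice to combine the image and personality vectors by simple addition, and the softmax-style retrieval loss are all illustrative assumptions. Real caption encodings would come from a trained text encoder; here they are random stand-ins.

```python
import math
import random

# Hypothetical toy sizes; the article does not give the real model's dimensions.
IMG_DIM, HIDDEN, EMB = 8, 16, 6
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0, 0.3) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def relu(v):
    # Rectified linear unit: negative values become zero.
    return [max(0.0, x) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Two-layer perceptron with a ReLU between layers, mapping image features
# (standing in for ResNet152 output) into the shared embedding space.
W1, W2 = rand_matrix(HIDDEN, IMG_DIM), rand_matrix(EMB, HIDDEN)

def encode_image(features):
    return matvec(W2, relu(matvec(W1, features)))

# One vector per personality trait, i.e. an embedding lookup table.
PERSONALITY = {name: [rng.gauss(0, 0.3) for _ in range(EMB)]
               for name in ("sweet", "arrogant", "anxious")}

def context_vector(features, trait):
    # Assumption: image and personality encodings are combined by summation.
    img = encode_image(features)
    return [a + b for a, b in zip(img, PERSONALITY[trait])]

def rank_captions(features, trait, candidates):
    """Return caption texts sorted by dot-product score, best first.
    `candidates` maps caption text -> its (precomputed) text encoding."""
    ctx = context_vector(features, trait)
    return sorted(candidates, key=lambda c: dot(ctx, candidates[c]), reverse=True)

def retrieval_loss(ctx, candidate_vecs, target):
    """Softmax cross-entropy over dot-product scores: small when the true
    next utterance (index `target`) outscores the other candidates."""
    scores = [dot(ctx, c) for c in candidate_vecs]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]

if __name__ == "__main__":
    image = [rng.gauss(0, 1) for _ in range(IMG_DIM)]
    captions = {text: [rng.gauss(0, 0.3) for _ in range(EMB)]
                for text in ("That is a lovely sandwich.",
                             "I make better food than this.")}
    print(rank_captions(image, "sweet", captions)[0])
```

With random, untrained weights, which caption ranks first for a given trait is arbitrary; only training on examples would push a "sweet" context toward "That is a lovely sandwich." Scoring by dot product keeps ranking cheap, since candidate encodings can be computed once and reused for every image.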


Facebook’s AI team offers up the fruits of its “TransResNet” machine learning model, which identifies the substance of an image and then draws on human-written utterances to form a caption with one of various personalities. (Image: Facebook AI Research.)

The authors then show that the TransResNet is competitive or even superior on a bunch of standard benchmark tests for applying a caption to an image. But in order to show that the personality of a caption can have an impact, they had groups of people look at human-authored captions and the automatically generated captions and say which they found “more engaging.”

Report the authors: “Captions conditioned on a personality were found to be significantly more engaging than those that were neutral captions of the image, with a win rate of 64.5 percent, which is statistically significant using a binomial two-tailed test.”

And when comparing their work to “engaging” captions authored by people, the researchers found “our best TransResNet model […] almost matched human authors, with a win rate of 49.5 percent (difference not significant, p > 0.6).”
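The article does not say how many human judgments sat behind each win rate, so the sketch below assumes a hypothetical 1,000 pairwise comparisons. It implements an exact two-tailed binomial test against a 50 percent (coin-flip) null in plain Python, summing the probability of every outcome that is no more likely than the one observed. This is one standard way to compute such a test, not necessarily the authors' exact procedure.

```python
import math

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k wins in n comparisons, each won with prob p (0 < p < 1).
    Computed in log space so large n does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def binom_test_two_sided(wins, n, p=0.5):
    """Exact two-tailed binomial test: add up the probability of every outcome
    that is no more likely than the observed one under the null hypothesis."""
    observed = binom_pmf(wins, n, p)
    tol = 1 + 1e-7  # absorbs floating-point noise when comparing pmf values
    total = sum(binom_pmf(i, n, p) for i in range(n + 1)
                if binom_pmf(i, n, p) <= observed * tol)
    return min(1.0, total)

if __name__ == "__main__":
    N = 1000  # hypothetical number of pairwise judgments
    for label, rate in (("personality vs. neutral", 0.645),
                        ("model vs. human", 0.495)):
        wins = round(rate * N)
        print(f"{label}: win rate {wins / N:.1%}, "
              f"p = {binom_test_two_sided(wins, N):.3g}")
```

Under that made-up sample size, a 64.5 percent win rate lands far from chance while 49.5 percent is indistinguishable from it, which mirrors the pattern the authors report; with a small enough sample, the same 64.5 percent could fail to reach significance.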


The authors note this is a benchmark from which to pursue further development of their model, “leaving the possibility of superhuman performance coming soon in this domain.”

Interestingly, the authors left by the wayside some personality traits they could not model, such as “allocentric, insouciant, flexible, earthy and invisible,” all of which, they write, are difficult to interpret.

There may be a broader lesson in all this about the mood in the world. In the study groups where humans were asked to evaluate how engaging a caption is, the authors write, people shown both a caption that expressed no particular personality (one that’s just factual) and a caption that expressed a positive point of view (“nice kitty!” or some such) tended to find the positive caption more engaging. But when presented with negative captions, people found them less engaging than those that were just factual. “Enough with the negativity” might be the takeaway.

