Since the advent of social media and camera phones, people have had a peculiar fascination with taking pictures of their meals, applying an unappetizing photo filter, and posting them to Instagram, perhaps in an effort to incite jealousy over whatever delicious confection they are about to devour. Now, thanks to Google, taking photos of your food may soon have a more practical purpose.
At the Rework Deep Learning Summit this week in Boston, Google scientist Kevin Murphy unveiled “Im2Calories,” an AI system that uses a sophisticated learning algorithm to analyze photos of food and estimate the calories on your plate.
Im2Calories is able to identify food items in an Instagram-quality photograph, distinguishing eggs from bacon and even picking out visible condiments. The system then gauges the size of each food item by comparing it to the size of the plate and produces an estimate of its calorie content.
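The plate-relative sizing step can be sketched as a back-of-the-envelope calculation. The function below is a hypothetical illustration, not Google's actual model: it assumes a standard plate diameter to convert pixel areas into real-world areas, and multiplies by a made-up calories-per-square-centimeter density for each food.

```python
import math

# Hypothetical sketch of plate-relative portion sizing (not Google's algorithm).
# Calorie densities in kcal per cm^2 of visible food surface; these values
# are illustrative placeholders, not measured data.
CALORIE_DENSITY = {
    "eggs": 1.5,
    "bacon": 3.0,
}

def estimate_calories(food, food_pixels, plate_pixels, plate_diameter_cm=25.0):
    """Estimate calories by scaling the food's pixel area against the plate's.

    A dinner plate of assumed known diameter (~25 cm here) gives the
    pixels-to-cm^2 conversion; the food's real-world area times its assumed
    calorie density yields a rough calorie estimate.
    """
    plate_area_cm2 = math.pi * (plate_diameter_cm / 2) ** 2
    cm2_per_pixel = plate_area_cm2 / plate_pixels
    food_area_cm2 = food_pixels * cm2_per_pixel
    return food_area_cm2 * CALORIE_DENSITY[food]
```

In a real system the pixel areas would come from a segmentation model, and calorie estimates would need depth or volume cues that a single flat-area heuristic like this cannot capture.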
Im2Calories is built on a “deep learning” system that combines pattern recognition with visual analysis. It’s designed to be somewhat self-reliant, improving its performance with use: if it misidentifies a food item, you’ll be able to correct it. “If it only works 30 percent of the time, it’s enough that people will start using it,” Murphy said. “We’ll collect data, and it’ll get better over time.”
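The correction loop Murphy describes can be sketched in a few lines. This is an illustrative assumption about how such feedback might be collected, not a description of Google's pipeline: user corrections are logged, and the corrected labels become new training examples.

```python
# Hypothetical sketch of a user-correction feedback loop (illustrative only).

corrections = []  # (image_id, predicted_label, user_label) tuples

def record_correction(image_id, predicted, corrected):
    """Log a user's fix for a misidentified food item."""
    corrections.append((image_id, predicted, corrected))

def training_examples():
    """Return relabeled examples: the user's label overrides the model's guess."""
    return [(image_id, corrected) for image_id, _, corrected in corrections]
```

The point of the design is the one Murphy makes: even a model that is wrong most of the time generates useful labeled data once users start correcting it.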
Current diet diary apps have you manually enter your food and calorie intake, which can be frustrating as you attempt to estimate serving sizes and the like. The convenience of simply taking a picture and having a computer do it for you may be too good to pass up. That convenience may be waiting in the wings for a while, however: the patent was only recently revealed, with no details on when the system will actually be available to the public.
The technology behind Im2Calories could also be applied to other avenues of data analysis. “If we can do this for food, that’s just the killer app,” Murphy said. “Suppose we did street scene analysis. We don’t want to just say there are cars in this intersection. That’s boring. We want to do things like localize cars, count the cars, get attributes of the cars, which way are they facing. Then we can do things like traffic scene analysis, predict where the most likely parking spot is. And since this is all learned from data, the technology is the same, you just change the data.”