The secret behind the most accurate food recognition

Business, Engineering

Katrin Keppler

July 25, 2022

Recognition is the core of every autonomous self-checkout system. To reach a recognition rate of up to 100%, some rely on RFID tags and expensive modifications. That is no longer necessary! In this blog post, we reveal how to reach such high recognition rates with computer vision.

Image recognition – What it's all about

With the help of image recognition, computers can identify objects in pictures. For humans, this seems like a simple task, but for computers it is a complex project that requires a lot of specification. Thanks to highly efficient computers, it is now possible to recognize objects with artificial intelligence. The key to this solution is large data sets that “teach” the computer what different objects look like. This teaching proceeds with the help of a so-called “algorithm”, essentially an exceptionally large mathematical equation.

The more data flows into this equation, the more reliable it gets. Once the teaching process is completed, good computer vision algorithms can identify objects based on only one picture. (You can find further information in our blog post “Teaching in only one picture!?”.)

AI does not equal AI

How well the recognition works depends crucially on how the information contained in a picture, e.g., colors, sizes, or shapes, is processed. The more specific the information is, the more valuable it is. We have optimized our process to use only information relevant to recognizing food and beverages. Uniquely, we use a pixel-precise segmentation that blurs out the background and distracting items in the teaching process. These are therefore disregarded in the recognition and cannot bias the equation.
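To make the idea of disregarding everything outside the item more tangible, here is a minimal sketch in plain Python. It is not the visioncheckout implementation: the tiny grayscale image, the 0/1 mask, and the function name apply_mask are made up for illustration, and background pixels are simply set to a constant instead of being blurred.

```python
# Toy illustration: keep only the pixels that belong to the item.
# The image is a grid of grayscale values, the mask a grid of 0/1 flags.
# Both are invented example data, not output of a real segmentation model.

def apply_mask(image, mask, fill=0):
    """Return a copy of image with every pixel outside the mask set to fill."""
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

image = [
    [90, 90, 90, 90],
    [90, 200, 210, 90],
    [90, 205, 215, 40],
    [90, 90, 40, 40],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

masked = apply_mask(image, mask)
for row in masked:
    print(row)
```

Whatever surrounds the item in the original picture (here the values 90 and 40) no longer appears in the masked result, so it cannot influence anything computed from it.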

Pixel-precise segmentation – the silver bullet

But how does our pixel-precise segmentation work, and what distinguishes it from other computer vision algorithms on the market? As mentioned before, the pixel-precise segmentation only uses those areas in the picture that are relevant for food recognition. The following three pictures visualize this.

Original

Computer vision using bounding boxes

Pixel-precise computer vision of the visioncheckout

The areas analyzed during recognition are the regions with a pink background. With bounding boxes, as in the second image, the recognition considers everything inside the box. This often works fine, since the object to be recognized makes up the largest part of the box. However, there are also many cases in which boxes do not deliver a satisfactory recognition result.
That is why we use pixel-precise segmentation. In this type of computer vision, no rigid box surrounds the articles. Instead, the AI calculates precisely fitting masks that frame the items to be recognized. This way, no overlaps occur, and each article is recognized accurately and reliably. A few examples will illustrate the differences between the two types of image processing and their consequences.
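Before turning to those examples, the difference between a box and a mask can be sketched in a few lines of Python. The L-shaped toy mask and the helper names bounding_box, box_area, and mask_area are assumptions chosen for illustration; real masks come from a trained segmentation model.

```python
# Toy illustration: how much of a bounding box is actually the item?
# The mask is invented example data (1 = item pixel, 0 = anything else).

def bounding_box(mask):
    """Smallest axis-aligned box (top, left, bottom, right) around all item pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return min(rows), min(cols), max(rows), max(cols)

def box_area(box):
    """Number of pixels inside the box, borders included."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)

def mask_area(mask):
    """Number of pixels that belong to the item."""
    return sum(sum(row) for row in mask)

# An L-shaped item: a box around it is mostly filled with other content.
mask = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

box = bounding_box(mask)
item_pixels = mask_area(mask)
box_pixels = box_area(box)
print(f"box covers {box_pixels} pixels, the item only {item_pixels}")
```

In this toy case, 9 of the 16 pixels inside the box do not belong to the item at all, yet a box-based recognition would still look at them.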

Overlapping boxes

The delicious iced tea, a different salad dressing, and an additional dessert or fruit? The fuller your tray gets, the more likely the boxes around the individual items are to overlap. In this case, sides and smaller articles can easily be overlooked and never booked into the POS system. Errors like that can get expensive. With our pixel-precise segmentation, articles cannot get into the recognition frame of other items. Even ketchup packets that lie halfway under someone’s plate will be detected.
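The overlap problem can be reproduced with a toy example. In the sketch below, a small label grid stands in for a pixel-precise segmentation: each pixel carries at most one item number, so masks cannot overlap by construction. The grid, the item numbers (1 for a plate, 2 for a ketchup packet), and the helper names are hypothetical.

```python
# Toy illustration: boxes of neighbouring items overlap, masks do not.
# 0 = background, 1 = plate, 2 = ketchup packet (invented example data).

labels = [
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 2, 2],
    [0, 0, 0, 2, 2],
]

def box_of(labels, item):
    """Axis-aligned box (top, left, bottom, right) around all pixels of one item."""
    cells = [(r, c) for r, row in enumerate(labels)
             for c, value in enumerate(row) if value == item]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)

def box_overlap(a, b):
    """Number of pixels shared by two boxes (0 if they do not touch)."""
    height = min(a[2], b[2]) - max(a[0], b[0]) + 1
    width = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return max(height, 0) * max(width, 0)

def foreign_pixels(labels, item):
    """Pixels inside an item's box that belong to a different item."""
    top, left, bottom, right = box_of(labels, item)
    return sum(
        1
        for r in range(top, bottom + 1)
        for c in range(left, right + 1)
        if labels[r][c] not in (0, item)
    )

plate_box = box_of(labels, 1)
packet_box = box_of(labels, 2)
print("shared box pixels:", box_overlap(plate_box, packet_box))
print("packet pixels inside the plate's box:", foreign_pixels(labels, 1))
```

The plate's box swallows part of the packet, while the label grid still assigns every pixel to exactly one item.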

Different backgrounds

You are in a rush, you have only one plate, or you simply forgot something. A customer wanting to check out without a tray is an everyday situation and shouldn’t be a problem for an autonomous self-checkout. But the thing is: the more background area is considered, the more important it is that the background looks the same in every picture. As the pixel-precise segmentation does not consider any background for the recognition, these cases are no problem for the visioncheckout.
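A small numerical sketch shows why the background matters for boxes but not for masks. It uses average brightness as a stand-in for whatever a real model extracts from the picture, which is a strong simplification. The brightness values for the item, the tray, and the counter are invented, as are the function names.

```python
# Toy illustration: the same item on two backgrounds.
# Average brightness stands in for the information a real model would extract.

def render(mask, item_value, background_value):
    """Build a grayscale crop: item pixels get one value, the rest another."""
    return [[item_value if m else background_value for m in row] for row in mask]

def mean_in_box(image):
    """Average over every pixel of the crop, background included."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def mean_in_mask(image, mask):
    """Average over item pixels only."""
    pixels = [p for img_row, mask_row in zip(image, mask)
              for p, m in zip(img_row, mask_row) if m]
    return sum(pixels) / len(pixels)

# A roughly round plate inside its 4x4 box.
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]

on_tray = render(mask, item_value=200, background_value=80)
on_counter = render(mask, item_value=200, background_value=40)

print("box: ", mean_in_box(on_tray), "vs", mean_in_box(on_counter))
print("mask:", mean_in_mask(on_tray, mask), "vs", mean_in_mask(on_counter, mask))
```

The box-based value shifts as soon as the tray is missing, while the mask-based value stays the same.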

Varying sizes

Different-sized portions are standard in many canteens and are happily accepted by guests. To distinguish between the different plate or bottle sizes, the AI infers the size of an item from the size of the mask created for it. The problem with boxes: depending on the orientation of the item, the size of the box drawn around it changes.

The mask around the item, however, is not affected by its orientation but wraps precisely around its outline. Different portion sizes can therefore be reliably detected using artificial intelligence.
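The effect of orientation can be worked out with basic geometry. The sketch below assumes an idealized rectangular item seen from above, for example a bottle lying on the tray, with made-up dimensions of 6 by 20 units. For a rectangle rotated by an angle, the axis-aligned box around it measures w·|cos| + h·|sin| by w·|sin| + h·|cos|, whereas the area enclosed by the outline stays w·h.

```python
import math

# Toy illustration: box area depends on orientation, outline area does not.
# The item is an idealized rectangle; the dimensions are invented.

def bounding_box_area(width, height, angle_deg):
    """Area of the axis-aligned box around a rectangle rotated by angle_deg."""
    a = math.radians(angle_deg)
    box_w = width * abs(math.cos(a)) + height * abs(math.sin(a))
    box_h = width * abs(math.sin(a)) + height * abs(math.cos(a))
    return box_w * box_h

def outline_area(width, height):
    """Area enclosed by the item's outline; rotation does not change it."""
    return width * height

WIDTH, HEIGHT = 6, 20  # hypothetical footprint of a bottle lying flat

for angle in (0, 30, 45, 90):
    box = bounding_box_area(WIDTH, HEIGHT, angle)
    print(f"{angle:>2} degrees: box area {box:6.1f}, outline area {outline_area(WIDTH, HEIGHT)}")
```

In this idealized setting, the box around the same bottle is almost three times larger at 45 degrees than when the bottle lies straight, so box size alone says little about portion size. On a real pixel grid, the mask area also varies slightly with rotation because of rasterization, but far less than the box.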