Your average picture probably isn’t worth as much as a thousand words. You can only learn so much from selfies. But sometimes, you need to know where an image came from, no matter how many words it’s worth. There are reverse image search engines like Google’s, TinEye’s, Bing’s, Yandex’s, Pixsy’s, and many more that can do this.
But how do they know what to look for if you don’t give them any words? And, more importantly, how do they find it? Each search engine’s reverse image search works a little differently, and none of them share their exact algorithms, but the basic ideas are public and easy to understand.
Pictures may be more unique than fingerprints: two different pictures never share exactly the same arrangement of pixels, while the chance that two fingerprints will match is about 1 in 64 billion, which isn’t too bad. But how can a picture be fingerprinted? The steps differ from algorithm to algorithm, but most follow the same basic pattern.
First, you have to measure the image’s features, including color, texture, gradients, shapes, the relationships between different parts of the picture, and even things like its Fourier transform (a way of breaking an image down into sine and cosine waves).
Let’s say we need a fingerprint of the following picture so that we can find it later. We could build one from the image’s color histogram, Fourier transform, and texture map, which you can see below. If an image has been resampled, blurred, rotated, or otherwise altered, some algorithms will still try to find matches using these and other features.
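To make the idea of feature extraction concrete, here is a minimal sketch in plain Python. It is not any search engine’s actual method. The tiny hand-written pixel grid, the function names color_histogram and dft_magnitudes, and the choice of four bins per color channel are all assumptions made for illustration.

```python
import cmath

def color_histogram(pixels, bins=4):
    """Count pixels per coarse RGB bin and return normalized frequencies."""
    counts = [0] * (bins ** 3)
    total = 0
    for row in pixels:
        for r, g, b in row:
            # Map each 0-255 channel value to a bin index in 0..bins-1.
            ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
            counts[ri * bins * bins + gi * bins + bi] += 1
            total += 1
    return [c / total for c in counts]

def dft_magnitudes(signal):
    """Naive discrete Fourier transform of a 1-D signal (magnitudes only)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n)
    ]

# Hypothetical 2x2 image: two red pixels, one green, one blue.
image = [[(255, 0, 0), (255, 0, 0)],
         [(0, 255, 0), (0, 0, 255)]]
hist = color_histogram(image)

# Hypothetical row of brightness values with alternating dark/bright stripes.
row = [0, 255, 0, 255]
mags = dft_magnitudes(row)
```

The histogram says how much of each color the picture contains without caring where it is, while the Fourier magnitudes pick up repeating patterns such as the stripes in the sample row. A real system would work on full-size images and use a fast Fourier transform rather than this slow textbook version.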
Encoding, Storing, and Looking Up
Every part of an image’s fingerprint can be turned into a string of letters and numbers that is easy to store and look up in a database. Whatever features are extracted from the picture and saved become that picture’s entry in the reverse image search engine’s index. As of February 2020, TinEye’s database had about 39.6 billion indexed images.
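As a rough illustration of turning a picture into a short, storable string, the sketch below uses a simple "average hash". This is a well-known toy technique swapped in for illustration, not TinEye’s or anyone else’s real algorithm. The function names, the in-memory dictionary standing in for a database, and the 4x4 grayscale thumbnail are all assumptions; a real system would first shrink the full image down to a small grid.

```python
def average_hash(gray):
    """Encode a small grayscale grid as hex: one bit per pixel,
    set to 1 when the pixel is brighter than the grid's mean."""
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if v > mean else "0" for v in flat)
    width = (len(bits) + 3) // 4  # hex digits needed for all bits
    return format(int(bits, 2), "0{}x".format(width))

# A plain dict stands in for the search engine's index (hypothetical).
index = {}

def add_image(name, gray):
    """Store the image name under its fingerprint string."""
    index.setdefault(average_hash(gray), []).append(name)

# Hypothetical 4x4 thumbnail with a checkerboard pattern.
thumb = [[10, 200, 10, 200],
         [200, 10, 200, 10],
         [10, 200, 10, 200],
         [200, 10, 200, 10]]
add_image("checkerboard.png", thumb)

# A uniformly brightened copy keeps the same bright/dark pattern.
brighter = [[v + 20 for v in row] for row in thumb]
key = average_hash(thumb)
matches = index.get(average_hash(brighter), [])
```

Because the hash only records which pixels are brighter than average, the brightened copy maps to the same string and the lookup finds the original entry.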
This means that they’ve run their algorithm over that many pictures and stored all of the fingerprints so that searched images can be compared against them. The next step, figuring out which images are similar, is the second most important part of the algorithm. When you upload a picture, it goes through the reverse image search engine’s fingerprinting algorithm.
The search engine then tries to find the stored entries whose fingerprints are closest to it. This closeness is measured as image distance. Each search engine chooses which features to compare and how much weight to give each one, but they all look for matches whose total image distance is as close to zero as possible.
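One way to picture a weighted image distance is sketched below. The feature names ("color", "texture"), the weights, the sample fingerprints, and the file names are all made up for illustration, and Euclidean distance is just one plausible choice; real engines do not publish which features or weights they use.

```python
import math

def feature_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def image_distance(fp_a, fp_b, weights):
    """Weighted sum of per-feature distances; 0 means identical fingerprints."""
    return sum(w * feature_distance(fp_a[name], fp_b[name])
               for name, w in weights.items())

def closest(query, database, weights):
    """Return the name of the stored fingerprint nearest to the query."""
    return min(database,
               key=lambda name: image_distance(query, database[name], weights))

# Hypothetical weights: color counts for more than texture.
weights = {"color": 0.7, "texture": 0.3}

# Hypothetical stored fingerprints.
database = {
    "sunset.jpg": {"color": [0.9, 0.1, 0.0], "texture": [0.2, 0.8]},
    "forest.jpg": {"color": [0.1, 0.8, 0.1], "texture": [0.7, 0.3]},
}

# Fingerprint of the uploaded picture, close to the sunset entry.
query = {"color": [0.85, 0.15, 0.0], "texture": [0.25, 0.75]}
best = closest(query, database, weights)
```

This brute-force scan compares the query against every entry, which would be far too slow for billions of images; large indexes rely on approximate nearest-neighbor structures instead.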
How about Machine Learning/AI?
Thanks to fingerprinting and indexing, reverse image search was pretty good even before AI was practical. Since AI is good at processing images, though, many of the biggest search engines probably use things like convolutional neural networks (CNNs) to help extract and label features.
For example, Google could use a CNN in its reverse image search to find likely keywords for the picture and show relevant web and image results, just as it has been doing for a while in Google Photos. This is a step up from simple feature extraction and image distance in reverse image search.
Convolutional neural networks run images through numerous filters that map out different types of features and then try to classify the images based on how the network was trained. That’s an oversimplification, of course, but, so to speak, CNNs make Google Images much more accurate and useful. They are probably used alongside the older computer vision fingerprinting methods.
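To give a feel for what one of those filters does, here is a small sketch of a single convolution step in plain Python. The hand-picked vertical-edge kernel and the tiny grayscale grid are assumptions for illustration; in a trained CNN the filter values are learned from data and there are many layers of them.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.
    Like most CNN libraries, this does not flip the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical 4x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1]]

# Hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = relu(convolve2d(image, vertical_edge))
```

The resulting feature map is zero over the flat dark region and positive where the window overlaps the dark-to-bright boundary, which is the kind of signal later layers combine into shapes and, eventually, labels.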
What’s the best reverse image search engine?
Different algorithms see each image differently. Different search engines are good at different things, but they all want to find a match for the picture you uploaded. Google Images, for example, has a pretty good hit rate, but it does a lot of “best guessing,” which gives you a lot of photos that look similar but aren’t the same.
That’s great if you’re looking for a mood or general classification, but an engine like TinEye is much more interested in finding identical images, even if they’ve been heavily edited. It can even find images within photos, making it slightly better if you need an exact match.
Yandex, a Russian search engine, is said to have a good image search tool, though it seems to do best with Russian topics, which makes sense. Tools like Pixsy and ImageRaider are designed to find cases of unauthorized use.
Because of this, they tend to have more features, like notifications, and focus on monitoring users’ photo libraries. If one search engine doesn’t give you the results you’re looking for, it’s worth trying the others, because the algorithms are constantly updated and generally kept secret.