Hollywood movies like to exaggerate: they zoom into a photo a million times and read numbers off a single pixel.
Incredible as that sounds, scientific research in this area has been going on for a long time. Back in the 1990s, theoretical papers and proofs of concept were published on restoring text from blurred images. In 2012, Vladimir Yuzhakov wrote on Habr about SmartDeblur, his program for restoring blurred and defocused images.
Although the science is fairly well developed, until now there has been no tool built specifically for recovering passwords (text) after pixelation. Depix is the first such tool.
In 2019, Dmitry Vatolin, head of the video group at the Computer Graphics and Multimedia Laboratory of Moscow State University, described the state of the art in sharpening photos. He said that the Russian police constantly ask him for help without understanding how hard the problem is:
The questions are always the same. "We have a video of the suspect, please help restore the face"... "Help enlarge the number from the DVR"... "You can't see the person's hands here, please help enlarge it"... And so on in the same spirit.
To make clear what we are talking about, here is a real example of a heavily compressed video that was sent in with a request to restore a blurred face (about 8 pixels in size).
Blur removal tools, history and research
Images can be blurred in many ways. Pixelation with a linear filter over pixel blocks is just one option. Most blur algorithms blend or stretch pixels because they try to mimic the natural blur caused by camera movement or defocusing.
There are many tools for common sharpening tasks, such as sharpening photos. Passwords, unfortunately, need a different approach. The characters are only a couple of blocks high, so simply increasing the sharpness makes no sense, writes the author of Depix.
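To make "a linear filter over pixel blocks" concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows whose width and height are multiples of the block size. The function name `pixelate` is chosen for illustration and is not taken from any tool mentioned in this article.

```python
def pixelate(image, block):
    """Replace every block x block tile of a grayscale image with its mean.

    `image` is a list of rows of numbers; height and width are assumed
    to be multiples of `block`.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Collect the pixels of one tile and average them.
            tile = [image[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            mean = sum(tile) / len(tile)
            # Write the same mean into every pixel of the tile.
            for y in range(top, top + block):
                for x in range(left, left + block):
                    out[y][x] = mean
    return out


image = [
    [0, 0, 255, 255],
    [0, 255, 255, 255],
    [10, 20, 0, 0],
    [30, 40, 0, 0],
]
for row in pixelate(image, 2):
    print(row)
```

Each tile is processed independently of its neighbors, which is the property the rest of the article relies on.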
Above, we have provided links to some of the tools and research published on Habr since 2012.
Recent developments in artificial intelligence have produced bizarre news headlines, such as "Researchers have created a tool that perfectly sharpens faces." The illustration below shows examples from the scientific paper describing the PULSE algorithm by researchers at Duke University (USA).
But in fact it does not restore photos. It generates new images that blur into the same pixels. The foundation for this work was laid by the RAISR algorithm in 2016. The AI generates faces that blur into the image given as input. It is important to understand that the generated face is not the original face from which the blurred image was derived.
Algorithms like PULSE seem new, but tools for removing blur have a very long history. Back in 1994 (!), Marc Buie of the Southwest Research Institute (USA) wrote a program that generated candidate "Plutos", blurred them, and compared them with real photos taken by the Hubble telescope.
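A toy illustration of why a generated image is not a restored one: block averaging is many-to-one, so very different inputs can produce identical pixelated output. The sketch below uses a hypothetical helper, `block_means`, and two made-up 2×2 images.

```python
def block_means(image, block):
    """Return the mean of each block x block tile, row by row."""
    means = []
    for top in range(0, len(image), block):
        for left in range(0, len(image[0]), block):
            tile = [image[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            means.append(sum(tile) / len(tile))
    return means


original = [[0, 200], [200, 0]]         # a checkerboard pattern
generated = [[100, 100], [100, 100]]    # a flat gray patch

# Different images, identical pixelated result.
same = block_means(original, 2) == block_means(generated, 2)
print(same)  # True
```

Any image whose blocks have the right averages is an equally valid "answer", so extra knowledge about the original (for example, that it is text in a known font) is needed to pick the right one.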
Restoring a number
In a well-known article from 2006, Dheera Venkatraman explains an algorithm for recovering a pixelated credit card number. The idea is simple: generate all possible credit card numbers, pixelate them, and compare each result with the pixelated number.
For example, suppose we find a photo of a check or bank card with a blurred number on the Internet. As you can see, a linear filter over 8×8 pixel blocks was used for the blur:
How do I recover these numbers?
2. The script generates images for all numbers.
3. We blur each image in the same way as the original sample.
4. We determine the brightness vector of each image. This vector contains the brightness value of each block.
Here, check number 0000001 corresponds to the vector
5. Find the vector with the minimum distance from the original one (after normalization).
d(0000001) = 1.9363
d(0000002) = 1.9373
d(1124587) = 0.12566
d(1124588) = 0.00000
So we find the check number: 1124588.
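The procedure above can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: a hypothetical 3×5 bitmap font stands in for the real check typeface, the number has three digits instead of seven, the blocks are 2×2 instead of 8×8, and "normalization" is taken to mean scaling each vector to unit length.

```python
import math

# Hypothetical 3x5 bitmap font: five rows of three bits per digit (1 = ink).
FONT = {
    "0": "111 101 101 101 111", "1": "010 110 010 010 111",
    "2": "111 001 111 100 111", "3": "111 001 111 001 111",
    "4": "101 101 111 001 001", "5": "111 100 111 001 111",
    "6": "111 100 111 101 111", "7": "111 001 001 001 001",
    "8": "111 101 111 101 111", "9": "111 101 111 001 111",
}


def render(number):
    """Draw the digits on a 6-row canvas, 4 columns per digit."""
    rows = [[] for _ in range(6)]
    for digit in number:
        glyph = FONT[digit].split() + ["000"]            # pad to 6 rows
        for y, bits in enumerate(glyph):
            rows[y].extend(int(b) for b in bits + "0")   # 1 blank column
    return rows


def brightness_vector(image, block=2):
    """Mean brightness of every block, scaled to unit length."""
    vec = []
    for top in range(0, len(image), block):
        for left in range(0, len(image[0]), block):
            tile = [image[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            vec.append(sum(tile) / len(tile))
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def distance(a, b):
    """Euclidean distance between two brightness vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recover(target, digits=3):
    """Try every number and return the one whose vector is closest."""
    candidates = (f"{n:0{digits}d}" for n in range(10 ** digits))
    return min(candidates,
               key=lambda c: distance(brightness_vector(render(c)), target))


pixelated_secret = brightness_vector(render("588"))
print(recover(pixelated_secret))  # 588
```

In this toy font the digits 0 and 9 happen to produce identical 2×2 block averages, so a number containing them could not be recovered unambiguously. That illustrates the limit of the approach: it only works while different candidates still pixelate differently.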
In 2019, Somdev Sangwan described an interesting method for restoring blurred faces in OSINT investigations. The method is as follows: the resolution of the photo is increased in Photoshop. It is first eroded:
Then a Yandex image search is run (it is more advanced than Google Images). In effect, Yandex performs a "brute force" of the face in the image:
It is easy to see that all the methods described have something in common. If there is not enough information to restore the image directly, we pixelate similar candidate data and check whether the result matches.
This is the basis for our password recovery algorithm from screenshots.
Description of the algorithm for password recovery
Because a linear block filter is a deterministic algorithm, pixelating the same values always produces the same block. So the text can be restored in roughly the same way as the numbers in the example above. Each block, or combination of blocks, can be treated as a subtask.
The algorithm has certain limitations. It requires text of the same size and color on the same background. Modern text editors also add hue and saturation to rendered text, which allows for a huge number of possible font renderings in a screenshot.
The solution is fairly simple. We take a de Bruijn sequence of the expected characters, paste it into the same editor, and take a screenshot. This screenshot serves as the search image in which similar blocks are looked up:
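Because the filter is deterministic, matching can be treated as a dictionary lookup. The following is a simplified sketch, not Depix's actual code: it assumes each block has already been reduced to a single color tuple, and the names `index_blocks`, `reference`, and `pixelated` are hypothetical.

```python
from collections import defaultdict


def index_blocks(block_colors):
    """Map each block color to every position where it occurs."""
    index = defaultdict(list)
    for position, color in enumerate(block_colors):
        index[color].append(position)
    return index


# Block colors obtained by pixelating a reference screenshot (made-up values).
reference = [(10, 10, 10), (200, 180, 90), (10, 10, 10), (77, 77, 77)]
index = index_blocks(reference)

# Blocks taken from the pixelated image we want to recover.
pixelated = [(77, 77, 77), (10, 10, 10), (1, 2, 3)]
matches = [index.get(color, []) for color in pixelated]
print(matches)  # [[3], [0, 2], []]
```

The three outcomes correspond to the cases the algorithm has to handle: one match, several candidate matches, and no match at all.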
This sequence includes all two-character combinations. It is important to use two-character combinations, because some pixel blocks cover more than one character.
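A de Bruijn sequence can be generated with the standard recursive construction from Lyndon words. This sketch is a generic implementation and is not taken from Depix. The three-letter alphabet is only an example. Because the sequence is cyclic, the first `n - 1` characters are appended so that every combination also appears when the text is laid out on a single line.

```python
def de_bruijn(alphabet, n):
    """Return a cyclic de Bruijn sequence of order n over `alphabet`."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


cyclic = de_bruijn("abc", 2)
linear = cyclic + cyclic[:1]   # unroll the cycle for use on one line
print(cyclic)  # aabacbbcc
print(linear)  # aabacbbcca
```

For an alphabet of k characters, the order-2 sequence has only k² characters, which keeps the reference screenshot short compared with listing every pair separately.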
For the search to work, you need a block with exactly the same pixel configuration. For example, in the test image, the algorithm could not find part of the letter ‘o’, because in the generated image, this block also included part of the next letter, and in the original image it was clean.
Creating a de Bruijn sequence with spaces around it obviously creates the same problem, only the other way around: the algorithm will not be able to find the correct blocks where a neighboring letter has hit the edge of the block. You can generate an image with all variants of letter combinations, as well as with empty spaces at the edges. Using this image, the search will take longer, but will give better results.
For most of the blurred image, the tool finds blocks with exactly one match. It then checks whether the matches of the surrounding blocks lie at the same geometric distance from one another as in the blurred image.
After passing through all the blocks, the program writes the correct blocks to the output directly and, for blocks with multiple matches, writes their average value. The output isn't perfect, but it works pretty well. The figure shows a test image with random characters. Most characters can be read.
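The geometric check can be sketched as follows. This is a simplified reading of the description above, not Depix's actual implementation. Positions are given in block units, and the data structure `matches` and the function `is_consistent` are hypothetical.

```python
def is_consistent(matches, block_a, block_b):
    """Check whether two pixelated blocks have matches at the same offset.

    `matches` maps a block position (x, y) in the pixelated image to the
    set of positions (x, y) in the search image that pixelate identically.
    """
    dx = block_b[0] - block_a[0]
    dy = block_b[1] - block_a[1]
    return any((rx + dx, ry + dy) in matches[block_b]
               for rx, ry in matches[block_a])


matches = {
    (0, 0): {(5, 2)},            # single match
    (1, 0): {(6, 2), (9, 7)},    # two candidates; (6, 2) sits next to (5, 2)
    (2, 0): {(3, 3)},            # single match, but in an unrelated place
}
print(is_consistent(matches, (0, 0), (1, 0)))  # True
print(is_consistent(matches, (0, 0), (2, 0)))  # False
```

In the first case, the neighbor's candidate (6, 2) lies exactly one block to the right of the trusted match, so it can be preferred over (9, 7).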
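For blocks with several candidate matches, "the average value" can be read as a pixel-wise mean of the candidate patches from the search image. The sketch below makes that assumption. The function name and the sample patches are made up.

```python
def average_patches(patches):
    """Pixel-wise mean of equally sized grayscale patches (lists of rows)."""
    height, width = len(patches[0]), len(patches[0][0])
    return [[sum(p[y][x] for p in patches) / len(patches)
             for x in range(width)]
            for y in range(height)]


# Two candidate patches that pixelate to the same block but differ in detail.
candidates = [
    [[0, 255], [0, 255]],
    [[0, 255], [255, 255]],
]
print(average_patches(candidates))  # [[0.0, 255.0], [127.5, 255.0]]
```

Pixels on which all candidates agree stay sharp, while pixels where they disagree turn gray, which is why the output is readable but not perfect.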
The source code of Depix is published on GitHub.
By the way, the technique described here echoes some well-known cryptographic attacks. For example, it resembles cracking hashes, attacks on block ciphers in ECB mode, and known-plaintext attacks (KPA).
So if you want to remove information from a screenshot, remove it completely by covering it with a solid color. Even then, you still need to think carefully.