JPEG and repairing JPEG using JPEG-Repair
To a PC, a file is nothing but a bunch of binary data. The most accessible way for humans to ‘read’ that data is as hexadecimal (hex) code. Below you see a hex dump of the JPEG file we’ll use as an example on this page.
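If you have no hex editor at hand, a few lines of Python produce a similar view. This is only a minimal sketch to show what a hex dump is; the function name hex_dump and its layout (offset, hex bytes, printable characters) are my own choices for illustration and have nothing to do with how JPEG-Repair works internally.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a classic hex dump: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is, everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Every JPEG starts with the bytes FF D8 (the Start Of Image marker).
    print(hex_dump(b"\xFF\xD8\xFF\xE0\x00\x10JFIF\x00"))
```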
When I started my attempts to repair corrupt JPEG files, a hex editor was the tool I used. I modified, added or removed bytes. This is also called ‘patching’: directly altering bytes within binary data. I then used a photo viewer to see the effects.
Essentially, JPEG-Repair uses the exact same method I used back then. The difference is that it hides the HEX code and instead gives you visual feedback when you add bytes to, or remove bytes from, the JPEG data stream. So you do not have to edit or remove bytes using a hex editor; instead you tell JPEG-Repair to remove bytes at a certain offset in the JPEG file.
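To make ‘removing bytes at a certain offset’ concrete, here is a minimal sketch of such a patch in Python. The helper names remove_bytes and insert_bytes are hypothetical; this illustrates the idea of patching, not JPEG-Repair’s actual code.

```python
def remove_bytes(data: bytes, offset: int, count: int) -> bytes:
    """Return a copy of data with `count` bytes removed starting at `offset`."""
    if offset < 0 or count < 0 or offset + count > len(data):
        raise ValueError("offset/count outside the data")
    return data[:offset] + data[offset + count:]


def insert_bytes(data: bytes, offset: int, new: bytes) -> bytes:
    """Return a copy of data with `new` inserted before position `offset`."""
    if not 0 <= offset <= len(data):
        raise ValueError("offset outside the data")
    return data[:offset] + new + data[offset:]


if __name__ == "__main__":
    stream = bytes.fromhex("FFD8 1122 3344 FFD9")
    print(remove_bytes(stream, 2, 2).hex(" "))
    print(insert_bytes(stream, 2, b"\x00\x00").hex(" "))
```

Both functions return a new byte string and leave the input untouched, so you can always go back to the original data if a patch makes things worse.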
JPEG-Repair also maps the raw data to image data. As JPEG data is compressed, there’s no 1 : 1 relation between the position of raw data in the file and the position within the image. If the top part of a photo, for example, contains mostly blue sky, then this part will compress well. This means that even if blue sky occupies half the image, it may well be that only one third of the raw data is used to store it. By mapping the data, JPEG-Repair allows you to click the image, while JPEG-Repair makes sure you edit the corresponding raw data.
JPEG-Repair’s MCU editor takes this a step further. JPEG does not store pixels individually. Instead it takes blocks of pixels (commonly 8 x 8 pixels), and then encodes and compresses that as a whole. The MCU editor allows you to add or remove MCUs and even to modify color and brightness data.
JPEG-Repair is not a photo editor!
So, JPEG-Repair offers tools to alter the HEX code inside a JPEG file. This is fundamentally different from what a normal photo editor does. JPEG-Repair is more like a hex editor than a photo editor.
A photo editor ‘decodes’ an image and places it on a ‘canvas’. On this canvas it stores data for each pixel: its brightness and color. If you change the brightness of the picture, it updates the information for all pixels individually. When you’re done editing and save the file, it takes all that pixel data and re-encodes it to save it to a (new) file. So if you changed the brightness, you’ll end up with a completely new file.
As said, JPEG-Repair edits the raw data, not the image on the canvas. The changes you see in JPEG-Repair are the direct result of changes to the raw data. The rest of the raw data remains as it is, in contrast to what happens in a photo editor.
So, the user interface of JPEG-Repair is basically nothing more than an overlay for the raw hex data inside a JPEG file. This is where JPEG-Repair is totally different from ordinary photo editors.
About JPEG
JPEG-Repair allows you to patch intact headers onto files with corrupt headers. You can also edit the raw data that makes up the actual image. A JPEG file is made up of several sections. Each section is preceded by a JPEG Marker which defines what the following section is all about. Most markers are followed by a length field that gives the size of the section, so that the next marker can be found. If any of those markers is corrupt, the file is corrupt. A number of (tiny) sections contain data that the decoder (which is built in to your photo viewer or editor, but also into web browsers) needs to decode the image data. The largest section is the actual encoded and compressed image data. If this ‘raw’ data that makes up the actual image (JPEG encoded data) becomes corrupt, your file may only partially ‘render’: part of the picture shows, while the rest is distorted or absent (grey block, colored block, striped block).
JPEG-Repair allows you to repair the section preceding the raw image data (sometimes called the header) using an intact JPEG file. It can also repair some errors in the raw JPEG data and allows you to patch (alter) this raw data. The editor allows you to remove, change and add data to the JPEG bitstream.
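The idea of borrowing a header from an intact file can be sketched in a few lines. This is a simplified illustration under loud assumptions, not JPEG-Repair’s implementation: both files are baseline JPEGs with a single scan, both were produced by the same camera with the same resolution and quality settings, and the last FF DA byte pair in each file is the SOS marker of the main image. The names scan_data_offset and transplant_header are hypothetical.

```python
def scan_data_offset(data: bytes) -> int:
    """Offset of the first byte after the last SOS header in the file."""
    sos = data.rfind(b"\xFF\xDA")
    if sos < 0 or sos + 4 > len(data):
        raise ValueError("no usable SOS marker (FF DA) found")
    # The 2-byte big-endian length counts the length bytes themselves.
    length = int.from_bytes(data[sos + 2:sos + 4], "big")
    return sos + 2 + length


def transplant_header(donor: bytes, damaged: bytes) -> bytes:
    """Combine the donor's header with the damaged file's image data."""
    header = donor[:scan_data_offset(donor)]
    image_data = damaged[scan_data_offset(damaged):]
    return header + image_data


if __name__ == "__main__":
    donor = b"\xFF\xD8\xFF\xDB\x00\x03\x01\xFF\xDA\x00\x02GOOD\xFF\xD9"
    damaged = b"\x00\x00garbage\xFF\xDA\x00\x02DATA\xFF\xD9"
    print(transplant_header(donor, damaged).hex(" "))
```

The result only decodes correctly if the donor header really matches the damaged file’s image data. A header from a different camera or a different resolution will typically give a distorted picture or none at all.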
The actual encoded data is organized in blocks called minimum coded units (MCUs). These blocks are typically 8×8, 16×8 or 16×16 pixels in size. Unless the image was encoded with Restart Markers, there are no points of reference within the image bitstream. So, there is no table, index or recognizable structure that indicates the start and size of individual MCUs, and this is why JPEG-Repair needs to map raw data to the image you see in the preview JPEG-Repair displays. I call this ‘pseudo-decoding’.
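How many MCUs an image contains follows from its dimensions and the MCU size. As a small worked example, assuming the usual relation that an MCU measures 8 pixels times the largest horizontal and vertical sampling factor found in the header, the grid can be computed as below. The function name mcu_grid and its parameters are mine, for illustration only.

```python
def mcu_grid(width: int, height: int, h_max: int, v_max: int):
    """Return (mcu_width, mcu_height, columns, rows) for an image.

    h_max and v_max are the largest horizontal and vertical sampling
    factors of any component (1 or 2 in most camera files).
    """
    mcu_w, mcu_h = 8 * h_max, 8 * v_max
    # Partial MCUs at the right and bottom edges still count as whole MCUs.
    columns = -(-width // mcu_w)   # ceiling division
    rows = -(-height // mcu_h)
    return mcu_w, mcu_h, columns, rows


if __name__ == "__main__":
    mcu_w, mcu_h, cols, rows = mcu_grid(4000, 3000, 2, 2)
    print(f"MCU {mcu_w}x{mcu_h}, grid {cols} x {rows} = {cols * rows} MCUs")
```

A 4000×3000 photo with 16×16 MCUs thus holds tens of thousands of MCUs, none of which has a fixed size or a known position in the file.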
A decoder simply keeps decoding until it encounters a JPEG Marker (either a Restart Marker or the EOI Marker). The size of a single MCU depends on its content and how well that content compresses. One MCU may occupy x bytes, the next y bytes or n bytes. It is therefore quite complex to link any specific byte offset within a JPEG to a position in the actual image. To mitigate this, JPEG-Repair ‘pseudo-decodes’ the image bitstream. By doing so it can map each MCU to specific data within the stream. It also allows JPEG-Repair to detect certain errors that occur during decoding. The decoded data, however, is at no point used to render an image, which is why I refer to it as the pseudo-decoder.
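The only reference points a decoder can find without actually decoding are the markers inside the stream. The sketch below, again my own illustration with a hypothetical function name, scans compressed image data for them. It relies on the JPEG rule that a genuine FF data byte in the stream is always followed by a 00 ‘stuffing’ byte, so an FF followed by anything else is a marker: D0 to D7 are the Restart Markers, D9 is EOI.

```python
def find_scan_markers(scan: bytes):
    """Return (offset, marker) pairs found in entropy-coded data.

    Scanning stops at the first marker that is not a Restart Marker,
    because that marker (normally EOI) ends the scan.
    """
    markers = []
    pos = 0
    while True:
        pos = scan.find(b"\xFF", pos)
        if pos < 0 or pos + 1 >= len(scan):
            break
        following = scan[pos + 1]
        if following == 0x00:       # stuffed byte: FF is image data
            pos += 2
        elif following == 0xFF:     # fill byte, look at the next one
            pos += 1
        else:
            markers.append((pos, following))
            if not 0xD0 <= following <= 0xD7:
                break               # EOI or another marker: scan is over
            pos += 2
    return markers


if __name__ == "__main__":
    scan = b"\x12\xFF\x00\x34\xFF\xD0\x56\xFF\xD1\x78\xFF\xD9"
    for offset, marker in find_scan_markers(scan):
        print(f"offset {offset}: FF{marker:02X}")
```

In a file without Restart Markers this scan finds nothing but the EOI marker, which illustrates why the stream has to be pseudo-decoded to locate individual MCUs.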
JPEG Repair Software
Existing JPEG file repair software mainly addresses damage and corruption in the header section (see image above). The header is a vital part, because without it the actual image data can’t be decoded. If we look at the size of the different sections, however, we see that this header makes up only a tiny fraction of the total data in the file. The chance that corruption is limited to this tiny area is small. If damage extends beyond the header, most automatic repairs will fail.