Not all corrupt JPEGs can be repaired. In fact, a large number of the damaged JPEG files I receive on a daily basis cannot. So I spend a fair amount of time on files that neither I nor anyone else can repair. About 90% of these non-repairable files suffer from one of the following issues:
- File size is 0 KB or way too small
- File is filled with zeros entirely
- File is filled with a byte pattern like FF FF or 55 55 etc.
- File contains seemingly random data which is not JPEG data
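The first three conditions in this list can be checked mechanically. The following is only a minimal sketch, not how JPG-Repair works internally: the function name `quick_check` and the `min_size` threshold of 1024 bytes are assumptions made for illustration.

```python
def quick_check(data: bytes, min_size: int = 1024) -> str:
    """Return a rough verdict for the raw bytes of a suspect file.

    min_size is an arbitrary, assumed threshold for "way too small".
    """
    if len(data) == 0:
        return "empty file (0 bytes)"
    if len(data) < min_size:
        return "suspiciously small"
    distinct = set(data)  # the set of byte values that occur in the file
    if distinct == {0}:
        return "filled with zeros"
    if len(distinct) == 1:
        return f"filled with a single byte value ({data[0]:02X})"
    return "no obvious problem"


# In-memory stand-ins for damaged files; a real file could be read with
# pathlib.Path("photo.jpg").read_bytes() (the file name is hypothetical).
print(quick_check(b""))
print(quick_check(bytes(4096)))
print(quick_check(b"\x55" * 4096))
```

A file that passes these checks may still contain no JPEG data at all; that case needs a closer look at the bytes themselves.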
Ideally people check their corrupt files prior to sending them to me. I created a short video that shows you how you can use a hex editor to triage the condition of a damaged JPEG file.
Still, many people who submit files to my Online JPEG Repair Service, or who request my help because my utility JPG-Repair is unable to repair their corrupt JPG photos, in fact send me files that are beyond repair.
Let it be clear that I do not blame anyone! I understand that for many ‘average’ computer users this type of tinkering is very challenging. And yet that does not stop me from wondering if I could offer those people the tools to determine the condition of a seemingly corrupt JPEG photo.
How to see if data is JPEG data?
Whether a JPEG can be repaired mainly depends on one thing: is the data within the corrupt file (mainly) JPEG data? If the answer to this question is yes, then most likely the file can be repaired.
As a ‘trained’ JPEG tinkerer, I use a hex editor for that. A quick glance at the hex representation of the data inside the file allows me to judge if the file still contains JPEG data. Of course anyone can spot if a file is filled with zeros, but what if it is (partially) filled with seemingly random data?
The new version of JPG-Repair comes with a new set of tools that will allow anyone to estimate the condition of a JPEG file and to determine if a JPEG photo can be repaired or not.
JPG-Repair will examine a file and:
- Determine entropy
- Display a byte distribution histogram
Entropy is originally a concept in thermodynamics and I am hardly qualified to even begin to give you a description. I am going to borrow one from Wikipedia: Ludwig Boltzmann explained the entropy as a measure of the number of possible microscopic configurations Ω of the individual atoms and molecules of the system (microstates) which comply with the macroscopic state (macrostate) of the system.
Information theory borrowed the concept and I am going with the definition as I find it at stackoverflow.com: In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits.
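For the bytes in a file, Shannon entropy can be computed as H = -Σ p(b) · log2 p(b), where p(b) is the fraction of bytes that have the value b. The sketch below is one straightforward way to do that and is not taken from JPG-Repair; the function name `shannon_entropy` is an assumption. The result ranges from 0 bits per byte (only one byte value occurs) to 8 bits per byte (all 256 values occur equally often).

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    # log2(total / n) equals -log2(p), so the sum is H without a leading minus.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy(bytes(1000)))         # zero-filled data: lowest possible
print(shannon_entropy(bytes(range(256))))   # every value once: highest possible
print(shannon_entropy(os.urandom(65536)))   # random bytes: close to the maximum
```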
Now, how on earth can this help us? Again, I am no expert, but I learned that the encoded and compressed data as it is found in JPEG files has high entropy. Well, any compressed data has high entropy, and JPEG data is in fact compressed. Many encrypted files have even higher entropy. I will try to explain in layman’s terms how I understand this.
Assume the following uncompressed data:
AAAAAABBAACAAAABCCCAAAA = 23 characters. Let’s try to compress this using the simplest method I can come up with right now:
6A2B2AC4AB3C4A = 14 characters. Instead of writing out 6 A’s, we simply indicate by 6A that the uncompressed data contains AAAAAA.
When we compare the compressed and the uncompressed data, we see that the uncompressed data is less random and more predictable. Compressed data appears more random. I am cutting corners here, but you could call that higher entropy. And this is sort of what we see in JPEG files as well.
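The toy compression above is a form of run-length encoding. The sketch below reproduces it and measures the entropy per character before and after. The names `rle_encode` and `entropy_per_char` are made up for this illustration, and JPEG itself uses a far more elaborate scheme.

```python
import math
from collections import Counter
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character by <count><char>; single characters stay as-is."""
    parts = []
    for char, run in groupby(text):
        length = len(list(run))
        parts.append(f"{length}{char}" if length > 1 else char)
    return "".join(parts)


def entropy_per_char(text: str) -> float:
    """Shannon entropy in bits per character."""
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


original = "AAAAAABBAACAAAABCCCAAAA"
encoded = rle_encode(original)
print(len(original), original)  # 23 AAAAAABBAACAAAABCCCAAAA
print(len(encoded), encoded)    # 14 6A2B2AC4AB3C4A
# The encoded string has a higher entropy per character than the original.
print(entropy_per_char(original) < entropy_per_char(encoded))  # True
```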
We can use this to our advantage. As a rule of thumb: If we see low entropy in a JPEG file then it is unlikely it can be repaired. If entropy is too high we’re probably looking at encrypted data. The latter I often see in JPEGs shot with Android phones.
Furthermore, if we look at JPEG data, we see a more or less uniform distribution of bytes. A byte is an 8-bit value. With 8 bits we can define numbers in the range of 0 – 255 (in hex, 00h – FFh).
Using a histogram we can, for example, easily spot if the full spectrum of bytes is used. I explained before that I often see corrupt JPEG files that consist of nothing but zeros. This will give ultra low entropy and can be easily spotted in the histogram.
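A byte histogram is simply a count of how often each of the 256 possible byte values occurs. The following sketch (the names `byte_histogram` and `summarize` are assumptions, not part of JPG-Repair) builds the counts and reports two things that are easy to read off a plotted histogram: how much of the byte spectrum is used, and whether a single value dominates.

```python
def byte_histogram(data: bytes) -> list[int]:
    """Return 256 counts; index 0 is byte 00h, index 255 is byte FFh."""
    hist = [0] * 256
    for value in data:  # iterating over bytes yields integers 0..255
        hist[value] += 1
    return hist


def summarize(data: bytes) -> str:
    """Describe how much of the byte spectrum is used and which value dominates."""
    hist = byte_histogram(data)
    used = sum(1 for count in hist if count)
    peak = max(range(256), key=hist.__getitem__)  # most frequent byte value
    share = hist[peak] / len(data) if data else 0.0
    return f"{used}/256 byte values used, most common {peak:02X}h = {share:.0%} of file"


print(summarize(bytes(100)))             # zero-filled: leftmost bar maxed out
print(summarize(b"\xff" * 100))          # FF-filled: rightmost bar maxed out
print(summarize(bytes(range(256)) * 4))  # perfectly uniform: full spectrum used
```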
Basically there are 2 scenarios:
- Image partially renders however at some point it turns into a grey, colored or corrupt block
- Image does not display at all, not even as a preview
Image partially corrupt
If an image partially renders, the question is whether we can get it to render completely. This situation is quite common, however not as common as the JPEG not displaying at all.
- Lower half of the file filled with a byte pattern is quite common. This results in too low entropy and, although hard to see here, the rightmost bar of the histogram (FFh or 255) is maxed out.
- Lower half filled with zeros is something I also see quite often. It results in low entropy and, again hard to see, the leftmost column (00h) of the histogram is maxed out.
- An invalid marker results in the exact same symptom when opening a JPEG in the Windows Photo Viewer; however, entropy is as expected for a JPEG file and the histogram looks like it should for a JPEG file. As a result, this file was repairable and was in fact automatically corrected by JPG-Repair.
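When only the lower part of a file has been overwritten, measuring entropy per chunk instead of over the whole file shows roughly where the real data ends. This is a sketch under stated assumptions: the helper names and the choice of 8 chunks are made up, and the random bytes from `os.urandom` merely stand in for JPEG data (they are not a real photo).

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def entropy_by_chunk(data: bytes, chunks: int = 8) -> list[float]:
    """Split data into roughly `chunks` equal parts and return the entropy of each."""
    size = max(1, math.ceil(len(data) / chunks))
    return [shannon_entropy(data[i:i + size]) for i in range(0, len(data), size)]


# Simulated damage: top half random-looking bytes, lower half zero-filled.
simulated = os.urandom(32768) + bytes(32768)
for index, value in enumerate(entropy_by_chunk(simulated)):
    print(f"chunk {index}: {value:.2f} bits/byte")
```

In this simulated case the first four chunks come out high and the last four at zero, which matches a picture that renders halfway and then turns into a solid block.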
JPEG does not display (and no preview thumbnail)
A JPEG that does not render at all is the most common issue submitted to my online JPEG repair service. Unfortunately, these often cannot be repaired because no actual JPEG data is present. People using my JPG-Repair utility often contact me to ask if anything can be done. Part of the problem is that JPG-Repair simply reported ‘an error occurred’. Entropy and the histogram should now make it clear whether there is a chance that the JPEG can be fixed.
- A JPEG containing only zeros can of course not be repaired; it shows low entropy and the histogram bar on the left is maxed out
- Same as a zero-filled JPEG, except that the right side of the histogram is maxed out, as the byte pattern was FF FF FF etc.
- Unknown data inside this JPEG, the maximum entropy value may suggest encrypted data. I often see encrypted data in JPEGs that were shot with Android phones. These files are technically not corrupt.
- The example at the bottom left shows approximately what a valid JPEG looks like: the right amount of entropy and the full byte spectrum used. If JPG-Repair cannot repair this file (using a sample file) then you should send me an email. Please include the corrupt file plus a sample shot with the same camera and settings.
- The next JPEG does not contain JPEG data; entropy is too low and only half the byte spectrum is used.
- Last example: entropy is too low and the histogram looks nothing like that of a normal JPEG.