JpegDigger future plans and entropy in JPEG data

By | October 1, 2019

When I first started working on JpegDigger, development was driven by whatever case I had at hand. Typically a customer asked me to repair corrupt photos and my diagnosis was that the corruption was due to either incorrect recovery or a corrupt file system (rather than individual corrupt files).

At some point I wrote all that into a more or less generic RAW recovery algorithm with some on-the-fly error repair and rule-of-thumb reassembly of non-contiguous files. That is basically the current status.

Things that are on the to-do list as far as I am concerned are (in no particular order):

  1. Ability to process disk image files. Currently I mount disk images using OSFMount.
  2. Option to create a disk image, probably limited to external disks such as memory cards.
  3. List physical drives and make it possible to scan those.
  4. Entropy map.

Assuming the first three are self-explanatory, I will describe the fourth in a bit more detail:

JpegDigger Entropy map

The problem

There are a few common problems I encounter when dealing with issues brought to me by my customers:

  1. Empty files or files not containing actual JPEG data. Empty files are often the result of fake memory cards. On these cards the file system points to non-existent memory. Files that are copied or recovered are therefore empty. Files not containing JPEG data are often the ‘product’ of file or photo recovery and undelete software. The software finds a pointer to a JPEG file and uses it to recover the clusters associated with the file. Either the pointers are incorrect or the file was overwritten with new data.
  2. Encrypted files recovered from memory cards used in Android or Apple smartphones. The files were individually encrypted, while the file system still points to them as JPEG files. The recovery software saves the still-encrypted data as photos.
  3. Card is simply empty. The photo recovery software scans the memory card and finally simply reports that no files were found. This is often very frustrating, as the question of why the photos cannot be recovered remains unanswered. I have worked on cards that were kept for years in the hope that some day the data could be retrieved.

I think calculating entropy and displaying that in some form can address these issues.

Entropy in JPEG data

Entropy is a measure of ‘chaos’. High entropy means a high degree of chaos; low entropy suggests order and predictability. Limiting myself to data: compressed data has high entropy, and encrypted data can often be recognized because its entropy is the highest of all.

Low entropy examples: 0x00, 00, 00, 00, 00 etc., or 0xFF, FF, FF, FF, FF and so on. Almost by definition this is uncompressed data, as the latter is easily compressed to (0xFF)*5.
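To make this concrete, here is a minimal Python sketch of a byte-level Shannon entropy calculation, expressed in bits per byte (0 for a single repeated value, 8 when all 256 byte values occur equally often). It is an illustration only and not JpegDigger's actual code; the function name shannon_entropy is made up for this example.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    # Sum p * log2(1/p) over all byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"\xff" * 512))          # one repeated value: lowest entropy
    print(shannon_entropy(bytes(range(256)) * 2))  # every value equally often: highest
    print(shannon_entropy(os.urandom(65536)))      # random data: close to the maximum
```

Note that the measured value depends on the amount of data: a short block of perfectly random bytes scores noticeably below 8 simply because it is too small to contain every byte value equally often.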

One property of JPEG data is that it is compressed. So if we calculate the entropy of each block we scan, we can detect:

  • If data is present at all
  • Encrypted data
  • High entropy compressed data
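The three cases in the list above could be told apart with simple thresholds on the entropy of a block. The sketch below is a rough illustration in Python: the threshold values, the labels and the function names are my own assumptions, not JpegDigger's, and the thresholds only make sense for fairly large blocks (64 KiB or more), because small blocks of random data measure well below 8 bits per byte.

```python
import math
import os
from collections import Counter

# Hypothetical thresholds in bits per byte; tune them against real cards.
EMPTY_MAX = 0.5        # at or below this: block is (nearly) a single repeated value
COMPRESSED_MIN = 7.0   # at or above this: looks like compressed data
ENCRYPTED_MIN = 7.99   # at or above this: indistinguishable from random data


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def classify_block(block: bytes) -> str:
    """Give a hint about a block's content. This is a guess, not a verdict."""
    h = shannon_entropy(block)
    if h <= EMPTY_MAX:
        return "empty"
    if h >= ENCRYPTED_MIN:
        return "encrypted"
    if h >= COMPRESSED_MIN:
        return "compressed"
    return "other"


if __name__ == "__main__":
    print(classify_block(b"\x00" * 65536))     # wiped or never written
    print(classify_block(os.urandom(1 << 20))) # random bytes stand in for encrypted data
```

The gap between compressed and encrypted data is small, so a classification like this is a hint that needs confirmation, for example by looking for JPEG markers in the blocks labelled as compressed.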

Entropy map and what it can tell us

The JpegDigger entropy map is not a definitive answer but a useful tool to determine if data can potentially be recovered. It must be noted that high entropy as a result of compression is not exclusive to JPEG; it can also indicate ZIP files, MP4 data, etc.


High entropy in the first part of the memory card suggests JPEG (compressed) data; the largest part contains no data. Visually represented by the JpegDigger entropy map.

In the illustration above, the entropy map shows high entropy data for a good part of the first half of a memory card. The largest portion of the drive does not contain any data. A completely black entropy map tells us the drive contains no data at all. A largely cyan map means the data is largely the result of encryption. In the latter two cases recovery software is no solution for the data loss.

The video shows a scan in which no JPEG files are detected although the entropy map shows high, JPEG-like entropy. Since the card is used to store photos we can assume the data is present and something else is wrong. In this case the software was not using a correct set of parameters: the software tries to divide the drive into logical blocks to reduce the time required to scan. In order for this to work it needs the correct block size and the correct offset to the first block. To determine these values it interprets the boot sector BPB (BIOS parameter block), which in this case was corrupt. JpegDigger contains a tool to help determine the correct values. After I used it, the JPEG files could be recovered.
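A text-only version of such a map can be sketched in a few lines of Python. This is not how JpegDigger draws its map: the block size, the thresholds and the characters standing in for colors (_ for black or no data, E for cyan or encrypted-like, # for compressed-like, ? for anything else) are all assumptions made for this example.

```python
import io
import math
from collections import Counter
from typing import BinaryIO, Iterable, Iterator

BLOCK_SIZE = 64 * 1024  # hypothetical map resolution: one cell per 64 KiB


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_map(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[float]:
    """Yield one entropy value per block read from a disk image stream."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield shannon_entropy(block)


def render(values: Iterable[float]) -> str:
    """Turn entropy values into one character per block (thresholds are guesses)."""
    cells = []
    for h in values:
        if h <= 0.5:
            cells.append("_")   # no data
        elif h >= 7.99:
            cells.append("E")   # encrypted-like
        elif h >= 7.0:
            cells.append("#")   # compressed-like, possibly JPEG
        else:
            cells.append("?")   # something else
    return "".join(cells)


if __name__ == "__main__":
    # Fake card image: three blocks of stand-in "compressed" data, then five empty blocks.
    stand_in = (bytes(range(200)) * 328)[:BLOCK_SIZE]
    image = io.BytesIO(stand_in * 3 + b"\x00" * (5 * BLOCK_SIZE))
    print(render(entropy_map(image)))
```

For a real card, the same functions can be pointed at a disk image opened with open(path, "rb"). A row of only _ then corresponds to the completely black map, and a row of mostly E to the largely cyan one.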
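For readers wondering where block size and offset normally come from: on a FAT16 or FAT32 volume they follow from a handful of BPB fields. The Python sketch below shows that calculation under the assumption of a FAT file system (exFAT, also common on memory cards, uses a different boot sector layout). It is my own illustration and says nothing about how JpegDigger or its helper tool work internally; the names FatLayout and parse_fat_bpb are hypothetical.

```python
import struct
from typing import NamedTuple


class FatLayout(NamedTuple):
    cluster_size: int  # bytes per cluster, the "block size"
    data_offset: int   # byte offset of the first data cluster, from volume start


def parse_fat_bpb(sector: bytes) -> FatLayout:
    """Derive cluster size and data offset from a FAT16/FAT32 boot sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("boot sector signature missing")
    # Offsets 11..18: bytes/sector, sectors/cluster, reserved sectors,
    # number of FATs, root directory entry count.
    bps, spc, reserved, num_fats, root_entries = struct.unpack_from("<HBHBH", sector, 11)
    (fat_size_16,) = struct.unpack_from("<H", sector, 22)
    (fat_size_32,) = struct.unpack_from("<I", sector, 36)
    fat_size = fat_size_16 or fat_size_32  # the 16-bit field is 0 on FAT32
    # Plausibility checks: a corrupt BPB usually fails one of these.
    if bps not in (512, 1024, 2048, 4096):
        raise ValueError(f"implausible bytes per sector: {bps}")
    if spc not in (1, 2, 4, 8, 16, 32, 64, 128):
        raise ValueError(f"implausible sectors per cluster: {spc}")
    if reserved == 0 or num_fats == 0 or fat_size == 0:
        raise ValueError("implausible FAT layout")
    # The fixed-size root directory exists on FAT12/16 only; it is 0 on FAT32.
    root_dir_sectors = (root_entries * 32 + bps - 1) // bps
    first_data_sector = reserved + num_fats * fat_size + root_dir_sectors
    return FatLayout(cluster_size=bps * spc, data_offset=first_data_sector * bps)
```

When this raises an error, as it would for a corrupt BPB, the values have to be found some other way, which is the situation described above.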

In this case the entropy map gave a valuable hint: There is data and if you’re not detecting it you’re doing something wrong.

