In this post I will try to explain why most tools fail to recover fragmented digital photos and produce corrupt JPEG files instead. If half the data is missing due to file fragmentation, I cannot repair the JPEG (see: JPEG Repair Service). It is better to have a tool that can recover fragmented files directly.
Why photo recovery and undelete software recover corrupt JPEG files
Digital image recovery and photo recovery software
The main problem with recovering digital images from the memory cards used in most modern cameras is fragmentation. 99.9% of the photo recovery tools available on the market use a simple method for recovering digital images: header/footer carving. They detect the start and the end of an image file and treat everything in between as one file. This is sufficient to recover contiguous JPEG files. If files are fragmented, however, current software will recover corrupt JPEG files at best.
Contiguous file, correctly recovered JPEG:
IMG_001 | IMG_001 | IMG_001 | IMG_001 | IMG_002 | IMG_002 | IMG_002 | IMG_002 |
Fragmented file, incorrectly recovered JPEG:
IMG_001 | IMG_001 | IMG_001 | IMG_002 | IMG_002 | IMG_001 | IMG_002 | IMG_002 |
Depending on the software and the method used, the fragmented file IMG_001 will be:
- cut off prematurely because the header of the next file was found, which results in a corrupt JPEG file,
- or include part of another file and be cut off because of a maximum file size setting. Most software has such a setting to prevent it from appending data to the recovered JPEG indefinitely when no footer is detected, so it cuts files off at, say, 4 MB.
IMG_002 will probably be cut off prematurely as soon as the footer of IMG_001 is detected, which the software will interpret as the end of the JPEG.
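To make the failure mode concrete, here is a minimal Python sketch of header/footer carving applied to the two layouts above. The 8-byte toy clusters, the `carve_jpegs` function and its `max_size` parameter are illustrative assumptions, not the code of any particular recovery tool; only the JPEG start (FF D8 FF) and end (FF D9) byte patterns are real.

```python
SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(data: bytes, max_size: int = 4 * 1024 * 1024) -> list[bytes]:
    """Naive carving: pair every header with the first footer that follows it."""
    carved = []
    start = data.find(SOI)
    while start != -1:
        end = data.find(EOI, start + len(SOI))
        if end == -1 or end + len(EOI) - start > max_size:
            # No footer within range: cut the file off at the size limit.
            carved.append(data[start:start + max_size])
        else:
            carved.append(data[start:end + len(EOI)])
        # Continue scanning right after this header so later headers are found too.
        start = data.find(SOI, start + 1)
    return carved


def split_clusters(data: bytes, size: int = 8) -> list[bytes]:
    """Split a toy file into fixed-size clusters (8 bytes is an arbitrary toy value)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Two toy "JPEGs" of four clusters each: header, filler bytes, footer.
img1 = SOI + b"A" * 27 + EOI
img2 = SOI + b"B" * 27 + EOI
c1, c2 = split_clusters(img1), split_clusters(img2)

# Same cluster order as the two diagrams above.
contiguous = b"".join(c1 + c2)
fragmented = b"".join(c1[:3] + c2[:2] + c1[3:] + c2[2:])

intact = carve_jpegs(contiguous)   # both files come back byte for byte
broken = carve_jpegs(fragmented)   # neither file comes back intact
```

On the contiguous layout both toy files are recovered unchanged. On the fragmented layout the first result contains two clusters that belong to IMG_002, and the second result is IMG_002 truncated at the footer of IMG_001.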
Classic file recovery and undelete software
As most memory cards are formatted with a FAT-type file system, file recovery and undelete software will also fail to recover fragmented files. If a file is deleted, its entries in the file allocation table (FAT) are zeroed; if the entire card is accidentally formatted, the whole FAT is zeroed. The FAT is essential for locating all clusters that make up a file. Once the start cluster of a file is retrieved from the file’s directory entry, the FAT is used to determine the other clusters by following a linked list. Assume a file starts in cluster 400 (according to its directory entry). In the FAT, the entry for cluster 400 points to the next cluster, that one points to the next, and so on.
For argument’s sake, assume a file occupies 8 clusters. If the file is contiguous, the chain would be 400 -> 401 -> 402 -> 403 -> 404 -> 405 -> 406 -> 407. The FAT chain of a fragmented file could look like: 400 -> 401 -> 402 -> 403 -> 544 -> 545 -> 546 -> 547. As soon as the file is deleted, ALL these entries are zeroed and we’ll only have the directory entry pointing to the first cluster of the file.
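The chain above can be sketched in a few lines of Python. This is a simplified model, not a FAT parser: the FAT is represented as a plain dictionary, and `follow_chain`, `guess_contiguous`, the file size and the 32 KB cluster size are all assumptions made for illustration. The end-of-chain threshold is the FAT32 one, and the example relies on the fact that a FAT directory entry records the file size as well as the first cluster.

```python
FREE = 0x00000000          # FAT entry value for an unallocated cluster
END_OF_CHAIN = 0x0FFFFFF8  # FAT32: values from here upwards mark the last cluster


def follow_chain(fat: dict[int, int], start: int) -> list[int]:
    """Follow the linked list in the FAT, starting at the file's first cluster."""
    chain = [start]
    while True:
        nxt = fat.get(chain[-1], FREE)
        # Stop on a free entry, an end-of-chain value, or a loop in a damaged FAT.
        if nxt == FREE or nxt >= END_OF_CHAIN or nxt in chain:
            return chain
        chain.append(nxt)


def guess_contiguous(start: int, file_size: int, cluster_size: int) -> list[int]:
    """Fallback once the chain is gone: assume the file is not fragmented."""
    count = -(-file_size // cluster_size)  # ceiling division
    return list(range(start, start + count))


# The fragmented chain from the example above.
clusters = [400, 401, 402, 403, 544, 545, 546, 547]
fat = dict(zip(clusters, clusters[1:]))
fat[547] = 0x0FFFFFFF  # last cluster of the file

before_delete = follow_chain(fat, 400)

# Deleting the file zeroes every FAT entry that belonged to it.
deleted_fat = {cluster: FREE for cluster in fat}
after_delete = follow_chain(deleted_fat, 400)

# Hypothetical sizes: a 250,000-byte file on 32 KB clusters needs 8 clusters.
guess = guess_contiguous(400, file_size=250_000, cluster_size=32_768)
```

Before deletion the full chain is returned. After deletion only cluster 400 is known, and the contiguous guess yields clusters 400 to 407, so the second half of the recovered file comes from clusters 404 to 407 instead of 544 to 547.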
Typically, a recovered image that was fragmented will look like this (or worse):
Using more advanced techniques it will be possible to recover that same corrupt JPEG file like this:
Recover corrupt JPEG files the right way
The first task is for the software to validate the recovered file. If it concludes that the JPEG is valid, no further action is needed for this file. It then needs to keep track of the clusters or sectors that were allocated to valid files. As these are ‘accounted for’, there is no need to examine them when completing files that the validation routine found invalid. Finally, it needs to process all invalid files, try to match them with the data in clusters that are not yet accounted for, and ‘glue’ the files together.
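The steps above can be sketched as a two-pass loop. This is a deliberately simplified illustration and not the method used by any specific product: it assumes each file's first cluster and cluster count are already known, it only tries files split into two fragments, and it uses a toy file format (payload plus CRC32) as a stand-in for a real JPEG validator. The names `recover`, `glue`, `is_valid` and `make_file` are hypothetical.

```python
import zlib

CLUSTER = 4  # toy cluster size in bytes


def make_file(payload: bytes) -> bytes:
    """Toy file format: payload followed by its CRC32 (stand-in for a JPEG)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_valid(data: bytes) -> bool:
    """Stand-in for the validation routine; real software would decode the JPEG."""
    return len(data) > 4 and zlib.crc32(data[:-4]) == int.from_bytes(data[-4:], "big")


def read(disk: list[bytes], clusters: list[int]) -> bytes:
    return b"".join(disk[c] for c in clusters)


def glue(disk, start, count, free):
    """Try a contiguous head from `start` plus a contiguous tail elsewhere."""
    for head_len in range(count - 1, 0, -1):
        head = list(range(start, start + head_len))
        if not free.issuperset(head):
            continue
        for tail_start in sorted(free):
            tail = list(range(tail_start, tail_start + count - head_len))
            if tail_start > head[-1] and free.issuperset(tail) \
                    and is_valid(read(disk, head + tail)):
                return head + tail
    return None


def recover(disk, candidates):
    accounted, recovered, pending = set(), {}, []
    # Pass 1: accept files that validate as contiguous runs and account for their clusters.
    for name, (start, count) in candidates.items():
        run = list(range(start, start + count))
        if is_valid(read(disk, run)):
            recovered[name] = run
            accounted.update(run)
        else:
            pending.append((name, start, count))
    # Pass 2: complete invalid files using only clusters not yet accounted for.
    for name, start, count in pending:
        match = glue(disk, start, count, set(range(len(disk))) - accounted)
        if match:
            recovered[name] = match
            accounted.update(match)
    return recovered


def split(data: bytes) -> list[bytes]:
    return [data[i:i + CLUSTER] for i in range(0, len(data), CLUSTER)]


x, y, z = split(make_file(b"X" * 12)), split(make_file(b"Y" * 12)), split(make_file(b"Z" * 4))
# Layout: X in clusters 0-3, Y in 4-5 and 8-9, Z in 6-7, two unused clusters at the end.
disk = x + y[:2] + z + y[2:] + [b"\x00" * CLUSTER] * 2
result = recover(disk, {"X": (0, 4), "Y": (4, 4), "Z": (6, 2)})
```

Because X and Z validate in the first pass, their clusters are excluded from the search, which leaves far fewer combinations to test when completing Y.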
Validation of files
Key to the recovery of fragmented files is a validation function. A JPEG file is typically a chain of markers, where each marker segment stores its own length and so points to the next. Some markers describe properties of the image file such as a file name, size, settings used, camera model, etc. Others are more directly related to the image, such as the Huffman tables (needed to decompress the image data) and the actual image data.
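A first, cheap validation step is to walk this chain of markers. The Python sketch below is an illustration under stated assumptions: `walk_segments` and `header_is_valid` are hypothetical names, the toy file is structurally shaped like a JPEG header but is not a decodable image, and the walk stops at the start-of-scan marker, so it says nothing about the compressed image data that follows.

```python
import struct

SOS = 0xDA  # start-of-scan marker: compressed image data follows this segment


def walk_segments(data: bytes) -> list[tuple[int, int]]:
    """Return (marker, offset) pairs from SOI up to and including SOS.

    Raises ValueError as soon as the chain of segments is broken.
    Optional 0xFF fill bytes before a marker are not handled in this sketch.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing start-of-image marker")
    segments = [(0xD8, 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2 or pos + 2 + length > len(data):
            raise ValueError(f"bad segment length at offset {pos}")
        segments.append((marker, pos))
        if marker == SOS:
            return segments
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")


def header_is_valid(data: bytes) -> bool:
    try:
        walk_segments(data)
        return True
    except ValueError:
        return False


def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment for the toy file below."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Toy file: SOI, an APP0 segment, a Huffman table segment, SOS, scan data, EOI.
toy = (b"\xff\xd8" + segment(0xE0, b"JFIF\x00") + segment(0xC4, b"\x00" * 4)
       + segment(SOS, b"\x00" * 3) + b"scan-data" + b"\xff\xd9")

# Simulate a foreign cluster landing in the middle of the header.
damaged = toy[:11] + b"XXXXXXXX" + toy[19:]

markers = [marker for marker, _ in walk_segments(toy)]
```

A broken chain is detected immediately, which makes this check useful for rejecting candidates early. A gap inside the image data itself passes this check unnoticed.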
As the largest portion of a JPEG (or any other image file) is usually the actual image data, gaps due to file fragmentation are most likely to be in this area. To validate this data, a JPEG decoder needs to be incorporated in the software. This way the software can test whether a certain combination of clusters renders a valid JPEG file.
The larger the media, the more combinations are possible, so this is a very time-consuming process. The example below shows the analysis of a 32 GB SD card using a mobile i5-4200M CPU. The entire process took more than 7 hours to complete! It’s up to the user to decide if the time spent is worth the gain of the additional 19 images recovered using advanced file carving techniques vs. the 895 recovered using more or less conventional methods.
Note: Part of the slowness is caused by the fact that the memory card was accessed directly. Speed can be improved by first imaging the SD card and analysing the image file instead (as the image file is stored on an HDD, which offers better access times).
This software is now available. It can recover fragmented files of the following types:
- JPEG images,
- CR2 Canon raw image files,
- NEF Nikon raw image files,
- Fragmented AVI/MJPEG files,
- Certain variants of fragmented MP4 video files.