Possible regression regarding the reporting of overlapping / linked files


Post by abolibibelot » Fri Sep 18, 2020 6:12 pm

I had been using R-Studio 8.7 for quite some time (I tried 8.10 at some point, but it crashed frequently, so I went back to 8.7, which proved consistently reliable), and yesterday I tried to upgrade. First I tried v. 8.14, but it caused a CPU usage spike as soon as it started (13% on a 4C/8T CPU, which means that one logical core is saturated). Then I tried 8.13, which had no such issue. But when doing a new scan of a drive I had already thoroughly analysed with three different versions of R-Studio (7.7, 8.0, 8.7), I noticed what I consider to be a regression in the way this new version deals with overlapping / linked files, i.e. files which are reported by the filesystem as starting on the same sector.

With R-Studio 8.7, in the “Matroska Video” folder under “Extra found files”, I would find:
1) files retrieved through the filesystem analysis which are completely recoverable, appearing with their original name;
2) files which were not retrieved through the filesystem analysis and only appear in “Extra found files”, with a random number as their name (I already suggested that it would be better for those files to be named according to a fixed pattern, preferably based on their starting sector or cluster, absolute or relative to the start of the partition);
3) and files which were identified through the filesystem analysis as something else (i.e. something other than MKV files) but have since been overwritten by an unknown MKV file. These appear with the original name of that other file, which is no longer recoverable as such; the MKV file might very well still be recoverable in full, although not in partition mode (it would only appear as a complete MKV file in “Raw files” mode, which is another problem I'll address below).

With R-Studio 8.13, some actual MKV files which used to appear in “Extra found files” (1) have been replaced there by other files retrieved through the filesystem analysis, which were overwritten by these MKV files (3) and begin on the same sector. As a result, those perfectly valid MKV files appear in their original directory under “Root” without the blue square which normally indicates that a file is linked to a folder under “Extra found files”. The upside is that the files in category (3) can be identified as not being what they appear to be; the downside is that the perfectly valid MKV files in category (1) now look as if they are not what they appear to be, although they are.

Here, most of the files with extensions other than MKV appeared in R-Studio 8.7 as the MKV files which overwrote them. For instance, “ErroticaArchive-Boana-Coccinelle-by-Arturo-high-0025.jpg” is reported as beginning on sector 1598699752, yet the file actually located on this sector is “Columbo S01E08 - Short Fuse.mkv”, which is not listed here. With R-Studio 8.7, it was the other way around: the MKV file appeared with the link symbol, as it was also present under “Extra found files”, whereas the JPG file appeared with no symbol at all (since it actually has an MKV header, it can't be linked to the “JPEG Image” virtual directory like the other, still recoverable JPG files in the same original directory).
As a consequence, in the original directory where “Columbo S01E08 - Short Fuse.mkv” is located under “Root”, it now appears with no link symbol.

Then, I also noticed that two files detected by pure file signature search, which were 100% recoverable with R-Studio 8.7 (all MKV files which were not truncated / fragmented were perfect duplicates of those recovered with Photorec -- a welcome improvement over older versions, which had trouble identifying MKV files), appear in R-Studio 8.13 with a significantly smaller size, for no apparent reason -- there's no obvious pattern or common file signature at the points where they were cut. (If that's at all relevant, the “Data Interpreter” field becomes totally empty when selecting any one of the last 7 bytes -- but the same happens with correctly detected files, and it also happens with R-Studio 8.7, so it probably has nothing to do with this issue.) I've also just noticed that the very useful “show files in HexEditor” feature seems to have vanished from the “File” menu, so I can't even use it to check what file, if any, the area beyond the point where those two MKV files were cut is attributed to. Has this feature disappeared entirely, or is it now only available in the “technician” edition?

(In those two screenshots, R-Studio 8.7 is on the left and R-Studio 8.13 on the right, showing the same file starting on the same sector; the file identified by RS 8.7 is complete, the one identified by RS 8.13 is cut short for no obvious reason.)

To get the advantage of identifying overwritten files while still being able to verify that valid files are indeed valid, and to extract unreferenced files which may have overwritten files that are still referenced, ALL files reported as beginning on the same sector should be listed and linked in “Extra found files” -- even those detected only through file signature search. As it currently stands, such a file cannot be recovered when opening a specific partition if another file was identified through the filesystem analysis as starting on the same sector. For instance, a 5972729086-byte MKV file, complete and contiguous / non-fragmented, starts at sector 3415264800, but it is not listed because a JPG file starting on the same sector is listed instead. It turns out that the MKV file is fully recoverable although no longer referenced, while the JPG file is not recoverable at all, as it was overwritten by the MKV file; extracting that JPG file creates a completely useless file which contains the first few MB of an MKV video but has a .jpg extension.
There should also be a column showing the first sector of each file, so that files could be sorted by that criterion to quickly identify which files were overwritten by which, as well as a “Starting Sector” tab beside the current “Real”, “Extensions”... tabs, which would show files grouped by intervals of starting sector (ideally customizable intervals, with the option to disable any grouping and simply list all files in a single list, with a warning if there are more than XXX files and this is known to cause performance issues).
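
To illustrate the kind of grouping meant here, below is a minimal Python sketch. It is only an illustration and says nothing about how R-Studio works internally: the `FoundFile` record and the `overlapping_files` helper are hypothetical names, and the third entry in the sample list is made up. Only the two file names and the sector number 1598699752 come from the example above.

```python
from collections import defaultdict
from typing import NamedTuple


class FoundFile(NamedTuple):
    """Hypothetical record for one entry in a scan result."""
    name: str
    start_sector: int


def overlapping_files(files):
    """Return {start_sector: [files]} for every sector claimed by 2+ files."""
    by_sector = defaultdict(list)
    for f in files:
        by_sector[f.start_sector].append(f)
    # Keep only the sectors where several files collide, in ascending order.
    return {s: fs for s, fs in sorted(by_sector.items()) if len(fs) > 1}


scan = [
    FoundFile("ErroticaArchive-Boana-Coccinelle-by-Arturo-high-0025.jpg", 1598699752),
    FoundFile("Columbo S01E08 - Short Fuse.mkv", 1598699752),
    FoundFile("some-other-file.txt", 2048),  # made-up, non-overlapping entry
]

overlaps = overlapping_files(scan)
for sector, group in overlaps.items():
    print(sector, [f.name for f in group])
```

With a listing like this, both the JPG and the MKV would show up side by side under their shared starting sector instead of one hiding the other.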

Then I noticed that the “Recovery chances” column (which didn't exist in v. 8.7) isn't showing any useful information for files which were retrieved through the analysis of a deleted / formatted partition: for all of them the status is marked as “undetermined”. In fact it's the same for the vast majority of files. There's a folder which was stealthily written to that drive some months ago, probably during a Windows update while the drive was connected (no idea why; it probably chose this drive because it was the one with the most available space, as it appeared to be totally empty), so it was written on the current partition / filesystem, but even the files in that folder have the “undetermined” status. Only one MHT file which I accidentally saved on this drive when doing the initial recovery, and the few files in “System volume information” or “$Recycle.bin”, do get a specific status, which is “Good (existing file)”. (Empty files also get a “Zero-sized (Empty file)” status, but that definitely falls into the “Captain Obvious” category.) I have yet to see any actually useful information like “this file was overwritten by ...” (which Recuva has been providing for years -- see below).

Even when it's not possible to assess which file got overwritten by which based on the filesystem analysis, it should be trivial to at least indicate if the file's header does match its extension, or not. If it does not, it generally means that the file was overwritten, but it can also mean that the extension was wrong to begin with. In any case, a simple comment like “MKV header” / “ZIP header” / whatnot, or “no known header”, would be very helpful.
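
As a rough sketch of how simple such a check can be, here is a short Python example. The function names and the tables are hypothetical and cover only three formats; the magic numbers themselves are the well-known ones (EBML, which Matroska shares with WebM, JPEG, and the ZIP local file header).

```python
import os

# Leading bytes ("magic numbers") of a few common formats.
SIGNATURES = {
    "MKV": b"\x1a\x45\xdf\xa3",  # EBML header (also used by WebM)
    "JPEG": b"\xff\xd8\xff",
    "ZIP": b"PK\x03\x04",
}

# Which signature each extension is expected to carry.
EXTENSION_TYPES = {".mkv": "MKV", ".jpg": "JPEG", ".jpeg": "JPEG", ".zip": "ZIP"}


def describe_header(first_bytes: bytes) -> str:
    """Return a short comment such as 'MKV header' or 'no known header'."""
    for label, magic in SIGNATURES.items():
        if first_bytes.startswith(magic):
            return f"{label} header"
    return "no known header"


def header_matches_extension(name: str, first_bytes: bytes):
    """True/False if the extension is known, None if it cannot be checked."""
    ext = os.path.splitext(name)[1].lower()
    expected = EXTENSION_TYPES.get(ext)
    if expected is None:
        return None
    return first_bytes.startswith(SIGNATURES[expected])


# A .jpg entry whose first bytes are actually an EBML header:
sample = b"\x1a\x45\xdf\xa3" + b"\x00" * 12
print(describe_header(sample))                        # MKV header
print(header_matches_extension("photo.jpg", sample))  # False
```

A mismatch does not prove that the file was overwritten (the extension may have been wrong from the start), but it is enough to flag the entry for a closer look.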

But, guess what, I did a quick test with Recuva -- I mean really quick: I stopped the deep scan after a mere three minutes (and with an old version to boot, probably from 2016... well, I just checked, actually it hasn't been updated since 2016) -- and it does display specific “State” information for all files identified through this very partial analysis, information which seems perfectly accurate based on my earlier painstaking manual investigations with R-Studio and WinHex! How come a 4-year-old version of a free program does better than a recent state-of-the-art version of one of the most reputable commercial solutions in the data recovery business?

Here Recuva clearly indicates that the aforementioned JPG file has been overwritten by the “Columbo S01E08” MKV file. In less than five minutes, shower included (this was reportedly the nickname of French president Jacques Chirac in his hectic youth).

Another area where Recuva fares better than R-Studio is exporting lists. It is still not possible from R-Studio's HexEditor window to export a complete list of a file's sectors (which once allowed me to salvage a bunch of files which would otherwise have been lost), and when exporting a list of identified files, R-Studio merely indicates the name and path, while Recuva also indicates the size, with a clearer layout. As I already suggested, it would be very useful to have extra information in that list: size, timestamps, starting sector, and checksum if such a feature were ever implemented (which I also already suggested).
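
For what it's worth, this is roughly the kind of list meant here, sketched in Python against a folder of already recovered files. Everything in it is an assumption made for illustration: the function names, the column names and the choice of SHA-256 are arbitrary, and the starting sector is left out because the standard library has no portable way to obtain it (a recovery tool would already know it from its own scan).

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that multi-GB videos don't fill the RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_file_list(root: Path, out_csv: Path) -> int:
    """Write path, size, modification time and checksum of every file under root."""
    rows = 0
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "size_bytes", "modified_utc", "sha256"])
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            if path.resolve() == out_csv.resolve():
                continue  # don't list the report itself
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            writer.writerow([
                path.relative_to(root).as_posix(),
                stat.st_size,
                modified.isoformat(),
                sha256_of(path),
            ])
            rows += 1
    return rows
```

Calling `export_file_list(Path("recovered"), Path("recovered_list.csv"))` (both paths being placeholders) produces a list that can be sorted or compared in any spreadsheet, for instance to spot duplicates by checksum.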

That's all, folks!