MKV header & raw file recovery


Re: MKV header & raw file recovery

by Alt » Sun Nov 11, 2018 8:17 am

abolibibelot wrote: Sun Nov 11, 2018 5:00 am
A small but great improvement would be to get a consistent file naming pattern for “extra found files”, as in Photorec. [...]
R-Studio names such files based on their metadata, but I'll pass your suggestions on to our developers.

Re: MKV header & raw file recovery

by abolibibelot » Sun Nov 11, 2018 5:00 am

Re-checking this thread a year later, again... :)
I can confirm that the built-in MKV detection has indeed improved vastly in later versions: it is now on par with Photorec for this particular file type.

A small but significant improvement would be a consistent naming pattern for “extra found files”, as in Photorec. Photorec names carved files fXXXXXX.ext, where XXXXXX is the number of the file's first allocated sector. That way, when running several successive scans with different parameters (e.g. different file types checked or unchecked), the same file would appear under the same name in the recovery tree, which would avoid extracting duplicates, or having to painstakingly verify each one by checking its size and looking up its first sector number in the hexadecimal viewer. (One caveat with the current version of Photorec is that this naming scheme no longer holds beyond the 2 TB mark, most likely because of a 32-bit limitation: for instance, an MKV file carved at sector 4303643744 is named “f8676448.mkv”, and 4303643744 − 4294967296 (2^32) = 8676448.) To stay consistent with the current naming scheme, files could be named after their first sector number, padded with leading zeros depending on the total number of found files; for instance, a JPG file located at sector 123456 could be named “00123456.jpg”.
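To make the naming issue concrete, here is a minimal Python sketch of both schemes. `photorec_style_name` only emulates the observed wrap-around by keeping the low 32 bits of the sector number (an assumption about the cause, but one that matches the example exactly), and `proposed_name` is a hypothetical helper for the zero-padded scheme suggested above, with the padding width passed in explicitly. A 512-byte sector size is assumed for the 2 TB arithmetic.

```python
SECTOR_SIZE = 512  # assumed bytes per sector


def photorec_style_name(first_sector: int, ext: str) -> str:
    # Hypothetical emulation of the observed behaviour: only the low
    # 32 bits of the sector number make it into the file name.
    return f"f{first_sector & 0xFFFFFFFF}.{ext}"


def proposed_name(first_sector: int, ext: str, width: int = 8) -> str:
    # Hypothetical helper for the suggested scheme: the full sector number,
    # zero-padded to `width` digits (longer numbers are never truncated).
    return f"{first_sector:0{width}d}.{ext}"


if __name__ == "__main__":
    # A 32-bit sector counter runs out after 2^32 * 512 bytes = 2 TiB
    print(2**32 * SECTOR_SIZE // 2**40, "TiB")      # 2 TiB
    print(photorec_style_name(4303643744, "mkv"))   # f8676448.mkv
    print(proposed_name(123456, "jpg"))             # 00123456.jpg
    print(proposed_name(4303643744, "mkv"))         # 4303643744.mkv
```

Because the proposed name keeps every digit, the same file gets the same name on every scan, whichever side of the 2 TB mark it sits on.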

It would also be very useful to have an optional field indicating whether a file appears to have been overwritten by another one, as Recuva does, even in “quick scan” mode.
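One plausible way such a flag could be computed, not necessarily how Recuva implements it, is to compare the cluster runs of the deleted file with those of every currently allocated file and report any overlap. The Python sketch below assumes the tool already knows those runs as (first cluster, cluster count) pairs; `overlap` and `overwritten_by` are hypothetical names.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (first_cluster, cluster_count), assumed representation


def overlap(a: Extent, b: Extent) -> int:
    """Number of clusters shared by two extents."""
    start = max(a[0], b[0])
    end = min(a[0] + a[1], b[0] + b[1])
    return max(0, end - start)


def overwritten_by(deleted: List[Extent],
                   live_files: Dict[str, List[Extent]]) -> Dict[str, int]:
    """For a deleted file, list the live files that now own some of its
    clusters, with the number of clusters each one has taken over."""
    hits: Dict[str, int] = {}
    for name, extents in live_files.items():
        shared = sum(overlap(d, e) for d in deleted for e in extents)
        if shared:
            hits[name] = shared
    return hits


if __name__ == "__main__":
    deleted = [(1000, 50)]
    live = {"new.jpg": [(1040, 20)], "other.txt": [(5000, 4)]}
    print(overwritten_by(deleted, live))  # {'new.jpg': 10}
```

An empty result would mean “no overwrite detected”, which is the kind of information the optional field would display.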

The ability to compute MD5 checksums would be very useful too. There should be an option to compute them either during the initial scan (which may slow it down a bit, but should still be much quicker than running a second pass just for that purpose) or afterwards, for a single file, a folder or a group of folders. WinHex has such an option in its “refine volume snapshot” module, as well as on a file-by-file basis, or even on a manual selection within a file or volume.
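For a carved file whose start offset and size are known, the checksum is simply the MD5 of that byte range of the disk or image. A minimal Python sketch using the standard `hashlib` module, reading in 1 MiB chunks so that multi-gigabyte video files never have to fit in memory; `md5_of_range` and the image path are hypothetical, and this is not how R-Studio or WinHex implement the feature.

```python
import hashlib


def md5_of_range(path: str, offset: int, length: int,
                 chunk_size: int = 1 << 20) -> str:
    """MD5 of `length` bytes starting at byte `offset` of an image or device."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                # The requested range runs past the end of the image
                raise EOFError("range extends past end of image")
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()
```

For example, `md5_of_range("disk.img", 36004757504, 7029456896)` would hash the first manually identified file listed further down this thread, assuming the drive had been imaged to a hypothetical `disk.img`.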

The “save file names to file” function could be improved with selectable output fields, including at least the file size (which Recuva includes in the lists generated by its equivalent feature), timestamps, first allocated sector, and MD5.
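A possible shape for such an export, as a minimal Python sketch with the standard `csv` module: the user picks which columns to write and in which order, and unselected fields are skipped. `FoundFile` and `write_listing` are hypothetical names chosen for illustration, not part of R-Studio.

```python
import csv
from dataclasses import asdict, dataclass
from typing import IO, Iterable, Optional, Sequence


@dataclass
class FoundFile:
    # Hypothetical record for one recovered file; fields are illustrative.
    name: str
    size: int
    first_sector: int
    modified: Optional[str] = None
    md5: Optional[str] = None


def write_listing(files: Iterable[FoundFile], fields: Sequence[str],
                  out: IO[str]) -> None:
    # Write only the selected columns, in the selected order;
    # extrasaction="ignore" drops the fields that were not chosen,
    # and missing values (None) come out as empty cells.
    writer = csv.DictWriter(out, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for f in files:
        writer.writerow(asdict(f))


if __name__ == "__main__":
    import sys
    # Example: a 7029456896-byte file starting at byte 36004757504 (sector 70321792)
    write_listing([FoundFile("0000.mkv", 7029456896, 70321792)],
                  ["name", "size", "first_sector", "md5"], sys.stdout)
```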

It might be better to start a new thread for this kind of suggestion... I'll try to do that later.

Re: MKV header & raw file recovery

by Alt » Thu Oct 12, 2017 11:41 am

I forgot to mention that R-Studio now detects MKV files much better when the built-in file type is used.

Re: MKV header & raw file recovery

by Alt » Tue Oct 10, 2017 4:47 pm

Recovery of MKV files has been significantly improved since R-Studio 8.1.165145 (Oct 25, 2016). If you have an MKV file that R-Studio cannot recognize, please let us know.

Re: MKV header & raw file recovery

by abolibibelot » Fri Oct 06, 2017 6:40 am

I'm re-reading this one-year-old thread I initiated.
– Has anyone come up with an explanation for those weird file sizes obtained with the custom template provided by “Alt”?
– Has there been any progress in later versions of R-Studio regarding MKV file detection and recovery? (Version 7.7 was used at the time. In that particular case, I eventually extracted all the files with Photorec, selecting only the Matroska file type, which worked very well, although it would have been more convenient with R-Studio's GUI.)

Re: MKV header & raw file recovery

by abolibibelot » Tue Sep 20, 2016 6:51 am

Sorry for the late update; I let this issue (and those hard drives) lie for quite a while.
Last night I tested this definition file on the whole used portion of the second hard drive (the one with no remaining file system). The file is simple and clever for this particular case, but may not be specific enough to be implemented as-is in R-Studio: other file types could contain the string “matroska” (this very web page, for instance), and if it happens to sit at offset +8 or +24, the file will be recognized as an MKV file. It does seem to work for header detection, but the identified file sizes are crazy.

http://www.cjoint.com/c/FIukUJBJaiy (direct link : http://www.cjoint.com/doc/16_09/FIukUJB ... 3%A9es.png)
http://www.cjoint.com/c/FIulKYpw8Jy (direct link : http://www.cjoint.com/doc/16_09/FIulKYp ... ype-1-.png)
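About the false-positive concern: a Matroska file starts with the EBML magic bytes 1A 45 DF A3, and “matroska” is the value of the DocType element inside that EBML header. When DocType is the first element of the header, the string lands at offset 8; when four EBML version/length elements precede it, it lands at offset 24, which would account for the two offsets in the definition file. A stricter check can therefore parse the header instead of matching a bare string. The Python sketch below is a minimal illustration under that assumption (the whole header must be in the buffer); `read_vint` and `matroska_doctype` are hypothetical helpers, not an R-Studio feature.

```python
from typing import Optional, Tuple

EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # ID of the EBML header element
DOCTYPE_ID = 0x4282                # ID of the DocType element


def read_vint(buf: bytes, pos: int, keep_marker: bool) -> Tuple[int, int]:
    """Decode an EBML variable-length integer at buf[pos].
    Element IDs keep the length-marker bit; element sizes drop it."""
    if pos >= len(buf):
        raise ValueError("truncated VINT")
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):  # count leading zero bits
        length += 1
        mask >>= 1
    if length > 8 or pos + length > len(buf):
        raise ValueError("invalid or truncated VINT")
    value = first if keep_marker else first & (mask - 1)
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def matroska_doctype(buf: bytes) -> Optional[str]:
    """Return the DocType ("matroska", "webm", ...) if buf starts with a
    well-formed EBML header, else None."""
    if not buf.startswith(EBML_MAGIC):
        return None
    try:
        header_size, pos = read_vint(buf, len(EBML_MAGIC), keep_marker=False)
        end = pos + header_size
        if end > len(buf):
            return None
        while pos < end:  # walk the header's child elements
            elem_id, pos = read_vint(buf, pos, keep_marker=True)
            elem_size, pos = read_vint(buf, pos, keep_marker=False)
            if elem_id == DOCTYPE_ID:
                return buf[pos:pos + elem_size].decode("ascii", errors="replace")
            pos += elem_size
    except ValueError:
        return None
    return None
```

A text file that happens to contain “matroska” at offset 8 or 24 fails the magic-byte test immediately, while both header layouts return "matroska".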

It doesn't cut each file where the next one begins. For example, files 0000.mkv and 0001.mkv are 65 793 949 696 and 58 764 492 800 bytes respectively; the difference, 7 029 456 896 bytes, is exactly the size of the first file I extracted manually with WinHex. The same goes for the next ones; then 0006.mkv has the correct size (7 314 735 104 bytes, about 7 GB), and after that the sizes jump back to about 70 GB... The largest file size shown in the preview panel is 482 484 944 896 bytes (449 GB!).

http://www.cjoint.com/c/FIulnBkuXGy (direct link : http://www.cjoint.com/doc/16_09/FIulnBk ... illes-.png)

What could explain such behaviour, and is there a possible fix? (I could keep extracting those damn files manually with WinHex, but there are about 300 of them, so it's going to be quite a chore... Photorec identifies the headers correctly and doesn't produce such humongous file sizes, but sometimes a file is cut short for no apparent reason, and it doesn't let me select an interval to skip portions known to be empty, or resume the recovery on such a large volume, so it's not ideal either.)

Here are the byte offsets (first-last, inclusive) and sizes in bytes of the first 15 files I identified manually, in case it helps:

01 : 36004757504-43034214399 > 7029456896
02 : 43034214400-51874496511 > 8840282112
03 : 51874496512-61262053375 > 9387556864
61262053376-61262069759 > 16384 = remnant of index / folder structure
04 : 61262069760-69479497727 > 8217427968
05 : 69479497728-80038916095 > 10559418368
80038916096-80038920191 > 4096 = remnant of index / folder structure
06 : 80038920192-94483972095 > 14445051904
07 : 94483972096-101798707199 > 7314735104
08 : 101798707200-107662475263 > 5863768064
09 : 107662475264-114702745599 > 7040270336
10 : 114702745600-122918207487 > 8215461888
11 : 122918207488-129958674431 > 7040466944
12 : 129958674432-136995340287 > 7036665856
13 : 136995340288-144033447935 > 7038107648
14 : 144033447936-147548536831 > 3515088896
15 : 147548536832-158103551999 > 10555015168
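Working through these numbers can narrow down the behaviour. The Python sketch below checks that every size equals last − first + 1 and that the pieces are contiguous, then adds R-Studio's reported sizes to the matching start offsets. It assumes, as the description of the sizes suggests, that 0000.mkv, 0001.mkv and 0006.mkv correspond to manual files 01, 02 and 07. Under that assumption all three end at the same byte, 101798707200, which is exactly where file 08 begins; so rather than stopping at the next header, each carved file appears to run on to that single offset.

```python
# (label, first byte, last byte, size) copied from the list above;
# "--" marks the two index/folder remnants.
rows = [
    ("01", 36004757504, 43034214399, 7029456896),
    ("02", 43034214400, 51874496511, 8840282112),
    ("03", 51874496512, 61262053375, 9387556864),
    ("--", 61262053376, 61262069759, 16384),
    ("04", 61262069760, 69479497727, 8217427968),
    ("05", 69479497728, 80038916095, 10559418368),
    ("--", 80038916096, 80038920191, 4096),
    ("06", 80038920192, 94483972095, 14445051904),
    ("07", 94483972096, 101798707199, 7314735104),
    ("08", 101798707200, 107662475263, 5863768064),
    ("09", 107662475264, 114702745599, 7040270336),
    ("10", 114702745600, 122918207487, 8215461888),
    ("11", 122918207488, 129958674431, 7040466944),
    ("12", 129958674432, 136995340287, 7036665856),
    ("13", 136995340288, 144033447935, 7038107648),
    ("14", 144033447936, 147548536831, 3515088896),
    ("15", 147548536832, 158103551999, 10555015168),
]

for label, first, last, size in rows:
    assert size == last - first + 1, label      # ranges are inclusive
for prev, cur in zip(rows, rows[1:]):
    assert cur[1] == prev[2] + 1, cur[0]        # no gaps between pieces

start = {label: first for label, first, _, _ in rows}
# Sizes reported by R-Studio, assumed to belong to manual files 01, 02 and 07
reported = {"0000.mkv": ("01", 65793949696),
            "0001.mkv": ("02", 58764492800),
            "0006.mkv": ("07", 7314735104)}
ends = {name: start[label] + size for name, (label, size) in reported.items()}
for name, end in ends.items():
    print(name, "ends at byte", end)            # all three: 101798707200
```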

Re: MKV header & raw file recovery

by Alt » Fri Apr 15, 2016 1:04 pm

Here is what I created and tested for both types of MKV files:
<?xml version="1.0" encoding="UTF-8"?>
<FileTypeList version="2.0">
  <FileType id="50001" group="Matroska" description="Matroska_Type1" features="" extension="mkv">
    <Begin combine="AND">
      <Signature offset="8">matroska</Signature>
    </Begin>
  </FileType>
  <FileType id="50002" group="Matroska" description="Matroska_Type2" features="" extension="mkv">
    <Begin combine="AND">
      <Signature offset="24">matroska</Signature>
    </Begin>
  </FileType>
</FileTypeList>

The only difference is that the string "matroska" appears at offset 8 for Type 1 and at offset 24 for Type 2.
Unfortunately, there is no way to find the end of the file.
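The definition format above offers no end marker, but for files that were closed properly the length can often be read from the header itself: in Matroska, the EBML header is followed by a Segment element (ID 18 53 80 67) whose size field covers the rest of the file. The Python sketch below computes header + Segment ID + size field + Segment size under two stated assumptions: the muxer wrote a definite Segment size (streamed or interrupted files may carry the all-ones “unknown size” value, in which case nothing can be inferred) and nothing is appended after the Segment. `expected_file_size` is a hypothetical helper, not something shown to be supported by R-Studio's definition files.

```python
from typing import Optional, Tuple

EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # ID of the EBML header element
SEGMENT_ID = 0x18538067            # ID of the Matroska Segment element


def read_vint(buf: bytes, pos: int, keep_marker: bool) -> Tuple[int, int]:
    """Decode an EBML variable-length integer (IDs keep the marker bit)."""
    if pos >= len(buf):
        raise ValueError("truncated VINT")
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        length += 1
        mask >>= 1
    if length > 8 or pos + length > len(buf):
        raise ValueError("invalid or truncated VINT")
    value = first if keep_marker else first & (mask - 1)
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def expected_file_size(buf: bytes) -> Optional[int]:
    """Total file length implied by the Segment size, or None if unknown."""
    if not buf.startswith(EBML_MAGIC):
        return None
    try:
        header_size, pos = read_vint(buf, len(EBML_MAGIC), keep_marker=False)
        pos += header_size                        # skip the EBML header body
        seg_id, pos = read_vint(buf, pos, keep_marker=True)
        if seg_id != SEGMENT_ID:
            return None
        size_start = pos
        seg_size, pos = read_vint(buf, pos, keep_marker=False)
    except ValueError:
        return None
    width = pos - size_start
    if seg_size == (1 << (7 * width)) - 1:        # all value bits set: unknown size
        return None
    return pos + seg_size                         # bytes before data + data
```

Applied to the first few kilobytes at each detected header, a value like this could serve as the carving length instead of running on to a distant offset.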

Re: MKV header & raw file recovery

by Alt » Thu Apr 14, 2016 2:06 pm

I've downloaded the files and will look at them ASAP.

Re: MKV header & raw file recovery

by abolibibelot » Tue Apr 12, 2016 1:28 pm

Alt wrote: Mon Apr 04, 2016 4:40 pm
If you send me 3-4 small pieces (a header and short body piece) of those mkv files, I can write that Known File Type description file.
Thanks for your diligence. I'd be glad to have contributed to the improvement of this excellent software.
Here are six 1 MB pieces cut with WinHex (the total sizes indicated in the file names are only approximate, since, as I mentioned before, those files were extracted manually with WinHex because no residual file system could be identified on that HDD). That's enough for MediaInfo to recognize them fully, so you can check which muxer and which parameters were used for each of them. Two have a type 1 header (the one currently recognized by R-Studio) and four have a type 2 header. (I found no clear pattern: the header type doesn't seem to correlate with an older or newer version of MKVMerge. Actually, it appears that the files with a type 2 header show an encoding date in MediaInfo, whereas those with a type 1 header don't.)
http://www.cjoint.com/c/FDmr4YCwLAy
Here is also the "File Type Signature Search" included with WinHex, which I mentioned in my previous message:
http://www.cjoint.com/c/FDmsjvt8bmy

Re: MKV header & raw file recovery

by Alt » Mon Apr 04, 2016 4:40 pm

If you send me 3-4 small pieces (a header and short body piece) of those mkv files, I can write that Known File Type description file.
