
I have an HDD with an unallocated section, likely reserved for adding a backup partition later. I want to know whether there is data in that section, or whether I can simply wipe it.

A few years ago, a Windows update on a dual-boot system broke my partition table. Luckily, I could reconstruct the partitions fairly well, except that I don't know whether one currently unallocated section of the disk held data. How can I check whether there is data there? I would like to use that section now, but I would prefer to recover any files if they ever existed.

NOTES:

  • the disk was new before the partition crash (i.e. it had been partitioned exactly once). If there is data, the section was in use and I'd like to restore that data some day.
  • if there was a partition, it was either NTFS or ext4
  • if there was a partition, it was not scrambled/locked/... but just plain.

[Screenshot: current partitions]

I don't need a definitive answer; "very likely" or "very unlikely" would be okay too. Just say which in your answer.

  • Unallocated, by definition, means no partitions, let alone "data"... And given your report there's nothing useful that can be hypothetically retrieved from there. Commented Oct 31 at 12:40
  • testdisk may help reassure you (or otherwise) Commented Oct 31 at 12:42
  • @ChanganAuto Unallocated areas are still sectors of the disk. Disk sectors, by definition, contain data, 512 bytes of it each. The question is, is this data that OP cares about? Say, remnants of a previous filesystem? Or is it garbage data that has no relevance for OP? Commented Nov 1 at 19:24

2 Answers


You write that "the disk was new before". If the section was not in use before the partition table wreck, there should be nothing there.

Drives usually come zero-initialized these days (there are no guarantees, though: some are even delivered pre-formatted with a file system!).

So, quite honestly, to avoid overthinking this: you want to estimate the amount of information in a stream of bytes.

Luckily, information theory tells us that high-information regions cannot be compressed much without losing information. So, if you have a fast compressor, running it over the region and checking how large the resulting file is gives you a quick answer.

If there is useful data, it cannot compress to "nearly no data" in a fast compressor. If there are only zeros, the output will be damn near nothing; if there are mostly zeros plus a few file system markers but nothing of value, you get a few kB after compression.

So, I'd frankly do the following:

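```shell
# This can be done with "sudo losetup" as well.
udisksctl loop-setup \
    --file /dev/sda \
    --read-only \
    --offset={START OF UNALLOCATED SPACE IN BYTES} \
    --size={SIZE OF UNALLOCATED SPACE IN BYTES}
# Outputs a loop device name (e.g. /dev/loop0), take note!

# -f (force) lets zstd read from a block device; -T0 uses all CPU cores.
zstd --compress \
    -f \
    -T0 \
    -o /some/other/storage/image.zst \
    /block/device/printed/by/previous/step

# When done, remove the loop device again:
udisksctl loop-delete -b /block/device/printed/by/previous/step
```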

and then check the size of the resulting image.zst. Bonus: if it's of significant size, it's already a backup of that region.
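If you'd rather not create a loop device or an image file at all, roughly the same estimate can be had by reading the byte range straight from the disk and only counting the compressed bytes. This is a minimal sketch, assuming GNU coreutils dd (for iflag=skip_bytes,count_bytes and status=none); it swaps in gzip -1 for zstd only because gzip is more commonly preinstalled, and compressed_size is a hypothetical helper name.

```shell
# Hypothetical helper: print how many bytes LENGTH bytes of FILE, starting at
# byte OFFSET, occupy after fast compression. FILE can be a disk such as
# /dev/sda (needs root) or a disk image. Assumes GNU dd; gzip stands in for zstd.
compressed_size() {
    file=$1 offset=$2 length=$3
    # skip_bytes/count_bytes make skip= and count= byte counts instead of
    # block counts, while bs=1M keeps the individual reads large and fast.
    dd if="$file" bs=1M skip="$offset" count="$length" \
       iflag=skip_bytes,count_bytes status=none |
        gzip -1 -c | wc -c
}

# Example (as root, numbers taken from your partitioning tool):
#   compressed_size /dev/sda START_IN_BYTES SIZE_IN_BYTES
```

A result that is a tiny fraction of the region size means the region is essentially empty; a result close to the region size means real (or random-looking) data. If zstd is installed, replacing gzip -1 -c with zstd -c -T0 is faster and squeezes long runs of zeros much further.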

  • Thanks. That sounds like a sound idea. I'll try it and mark it as the answer if it works. Commented Oct 31 at 12:54
  • Using udisksctl for this is interesting, I wouldn't have thought of that. I'd probably use losetup, or maybe create a partition for the unallocated space (but don't let gparted or whatever format the new partition). Then use file -s /dev/xxx on the resulting device name to see if it was previously formatted, or just try to mount it (read-only is safest) and copy the files off it if it does mount. Commented Oct 31 at 13:10
  • @Teck-freak sorry, currently on the run. I think, however, that you're more than capable of reading man losetup and mapping the parameters here to the parameters there! Commented Oct 31 at 13:22
  • @ChrisDavies thanks! It's primarily feasible thanks to the impressive speeds that modern compressors achieve; without them, we'd need to start thinking about entropy estimators, which are less readily available. Commented Oct 31 at 15:29
  • @MarcusMüller indeed, I read an article some time ago where they used deflate for DNA analysis to get a feeling for how closely different species are related. Brilliant, really. Anyway, your solution was precisely what I was looking for. I had something like that in mind, but just couldn't get the last bit of it. Thanks and cheers, mate. Commented Nov 1 at 11:20

I'll keep Marcus Müller's well-crafted answer accepted, as it works well and sent me down the right track.

In the end, I looped over the disk with two (read-only) loopback devices offset by one byte and compared the streams.

```shell
# Bind a section of /dev/sda to /dev/loop0 (read-only)
losetup --read-only \
    --offset {START OF UNALLOCATED SPACE IN BYTES} \
    --sizelimit {SIZE OF UNALLOCATED SPACE} \
    /dev/loop0 \
    /dev/sda
# ... and the same, offset by one byte, to /dev/loop1
losetup -r -o {START OF UNALLOCATED SPACE IN BYTES PLUS 1} \
    --sizelimit {SIZE OF UNALLOCATED SPACE} \
    /dev/loop1 /dev/sda
# Now compare the streams. cmp is quiet if both streams are equal
# and otherwise prints the first byte where they differ.
cmp /dev/loop0 /dev/loop1
```
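The same self-comparison can also be done without loop devices, because GNU cmp can skip a different number of leading bytes in each input (-i SKIP1:SKIP2) and stop after a given number of bytes (-n). The sketch below is a variant under stated assumptions, not what was run here: is_uniform and scan_region are hypothetical names, it assumes GNU diffutils cmp, and it checks the region in fixed chunks (1 MiB by default), printing the start offset of every chunk that is not one repeated byte value.

```shell
# Hypothetical helper: succeed if LENGTH bytes of FILE starting at OFFSET all
# hold the same value. Comparing the range with itself shifted by one byte
# finds a mismatch exactly when two neighbouring bytes differ.
is_uniform() {
    file=$1 offset=$2 length=$3
    [ "$length" -le 1 ] && return 0
    # -s: no output, exit status only; -i A:B: skip A bytes of the first input
    # and B bytes of the second; -n: compare at most that many bytes (GNU cmp).
    cmp -s -i "$offset:$((offset + 1))" -n "$((length - 1))" "$file" "$file"
}

# Hypothetical helper: walk START..START+SIZE in CHUNK-sized pieces (default
# 1 MiB) and print the byte offset of every piece that is not uniform.
scan_region() {
    file=$1 start=$2 size=$3 chunk=${4:-1048576}
    pos=$start
    end=$((start + size))
    while [ "$pos" -lt "$end" ]; do
        len=$chunk
        [ $((pos + len)) -gt "$end" ] && len=$((end - pos))
        is_uniform "$file" "$pos" "$len" || echo "$pos"
        pos=$((pos + len))
    done
}

# Example (as root): scan_region /dev/sda START_IN_BYTES SIZE_IN_BYTES
```

Like the two-loop-device version, this accepts a region filled with 0xFF or any other single byte value, which a plain comparison against /dev/zero would flag. Each chunk is judged on its own, so a boundary between, say, an all-zero chunk and an all-0xFF chunk is not reported.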

Whenever cmp reported a difference, I moved the offsets one MiB past it and continued. In my case there was only a single hit, which I streamed into a file, both to take a closer look and as a backup.

```shell
losetup --detach /dev/loop0
losetup -r -o {START OF UNALLOCATED SPACE + OFFSET TO WHERE BYTES DIFFERED} \
    --sizelimit 1M \
    /dev/loop0 /dev/sda
cat /dev/loop0 > /path/to/my/backupFile
```
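The 1 MiB copy can also be made without a loop device. Note that cmp numbers bytes starting at 1, so for a reported byte N the disk offset of the first differing byte is START + N - 1. A minimal sketch, assuming GNU dd; extract_region is a hypothetical helper name.

```shell
# Hypothetical helper: copy LENGTH bytes of FILE, starting at byte OFFSET,
# into OUT. It only reads from FILE, so pointing it at /dev/sda (as root)
# does not modify the disk.
extract_region() {
    file=$1 offset=$2 length=$3 out=$4
    # skip_bytes/count_bytes: interpret skip= and count= as byte counts.
    dd if="$file" of="$out" bs=1M skip="$offset" count="$length" \
       iflag=skip_bytes,count_bytes status=none
}

# Example (as root):
#   extract_region /dev/sda $((START + N - 1)) 1048576 /path/to/my/backupFile
```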

Taking a closer look with Okteta, I could see that it was the start of an NTFS partition, which I had never used and therefore hadn't restored when fixing my partition table. Mystery solved. I'm keeping the 1 MiB file for future analysis and tinkering.
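Since any lost partition was either NTFS or ext4, the extracted file (or the disk at a candidate offset) can also be checked for the two on-disk signatures directly rather than by eye in a hex editor. An NTFS boot sector carries the OEM ID "NTFS" followed by four spaces at byte 3, and an ext2/3/4 superblock begins 1024 bytes into the partition with the magic number 0xEF53 stored little-endian at offset 56, i.e. bytes 0x53 0xEF at byte 1080. A minimal sketch, assuming the given offset is the start of a partition; fs_signature and read_bytes are hypothetical names, and a match only means the signature is present, not that the filesystem is intact.

```shell
# Hypothetical helper: print COUNT bytes of FILE at OFFSET as lowercase hex.
read_bytes() {
    dd if="$1" bs=1 skip="$2" count="$3" status=none | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: print "ntfs", "ext" or "none" depending on which
# signature is found in FILE, assuming OFFSET (default 0) is a partition start.
fs_signature() {
    file=$1 offset=${2:-0}
    # NTFS boot sector: OEM ID "NTFS    " (4e 54 46 53 + four spaces) at byte 3.
    if [ "$(read_bytes "$file" $((offset + 3)) 8)" = "4e54465320202020" ]; then
        echo ntfs
    # ext2/3/4: superblock at +1024, s_magic 0xEF53 (little-endian) at +56.
    elif [ "$(read_bytes "$file" $((offset + 1080)) 2)" = "53ef" ]; then
        echo ext
    else
        echo none
    fi
}

# Example: fs_signature /path/to/my/backupFile
```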

Sidenotes:

  1. I did all of the above as root. Use sudo instead, if you wish.
  2. If you use cmp -l /dev/loop0 /dev/loop1, you get one line per difference cmp finds: {byte number} {byte from /dev/loop0} {byte from /dev/loop1}, where the byte number is decimal (counting from 1) but the two byte values are octal. Example: 237529 10 331. It wasn't too useful for me, but it might help you.
  3. Simply cmp-ing the two streams like this only worked because the unwritten section was probably zeroed. If it had contained anything other than a single repeated byte value (such as all zeroes), this would not have been a viable option, unlike Marcus Müller's compression trick.
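Because cmp -l prints the byte values in octal, a small filter can turn them into hex, which is easier to match against what a hex editor such as Okteta shows. A sketch under the assumption that the shell's printf reads an argument with a leading 0 as octal (as POSIX printf does); cmp_hex is a hypothetical helper name.

```shell
# Hypothetical helper: run `cmp -l` on two files and print each difference as
# "<byte number> 0x<value in first file> 0x<value in second file>".
# cmp -l gives byte numbers in decimal and values in octal, so a leading 0 is
# prepended to the values to make printf parse them as octal.
cmp_hex() {
    cmp -l "$1" "$2" | while read -r pos a b; do
        printf '%s 0x%02x 0x%02x\n' "$pos" "0$a" "0$b"
    done
}

# Example: cmp_hex /dev/loop0 /dev/loop1 | head
```

With this, the example line 237529 10 331 would be shown as 237529 0x08 0xd9.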
  • Is there any advantage in creating a second loop device with a byte offset over simply comparing against /dev/zero? Commented Nov 2 at 20:31
  • @Bob It's a form of autocorrelation. Commented Nov 3 at 13:46
  • cmp -l output has decimal byte offsets (first column), but octal byte values. That's why the byte values can be greater than 255 (but never greater than 377). Commented Nov 3 at 13:58

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.