Steganography

I have always feel a deep interest in steganography, the art of concealing information inside information. Not surprisingly, steganography is probably as old as cryptography with the first recorded use of it done by Demaratus, king of Sparta.

With the advent of the digital era and the massive data sizes we handle, steganography is being a very important field for researchers. Most of the more common digital steganography is done in image files because they are a very good container, being the LSB (Less significant bit) replacement the simplest technique, which allows to hide information inside a image with a minor alteration that can’t be noted by the human eye.

But steganography using digital images is not the only way of hide digital information. There are steganography in, practically, every corner of the digital word, even in TCP/IP packets! I am going to describe some ways to conceal information in USB mass storage devices that use FAT32 (most of them). These techniques allow to hide information that is not going to be shown by regular file explorers (except the last example which is probably not a truly steganography scheme but I thought it was a cool addition) and they are not going to delete any other user data. Although described here in a briefly and basic manner, they can be used in sophisticated schemes to hide data.

1) Information in the Boot sector
The first sector is called Master Boot Record (MBR), it contains information about different partitions and, more importantly for our purposes, executable code to be used when the unit is bootable.

If the USB drive is not bootable, that part of the first sector is not used, those are 446 bytes to store information that is not going to be readable from a common file explorer.

2) EOF (End of file) occultation
Files are stored in one or more clusters (each of them usually of 4k, but they can be up to 32k or as small as a single sector) conveniently linked in the FAT. If we have, for example, a single file that weighs 1024 bytes, it would be stored in a single cluster with the remaining 3072 bytes not available for any other allocation. The remaining bytes are a good place to hide information.

raw bmp
Example of a bmp file that only requires three sectors of a 8-sector cluster to be stored, leaving most of the last sector used and five more available to store more information.

3) Information encoded in the FAT
As previously mentioned, files are composed by a one or more clusters that are referred by a 32-bit (28 bits used) number. Those clusters can be located through the FAT (File Allocation Table), that contains a linked list of cluster entries for each file.

fat_demostration
Simple example of how two files and their clusters could be located in a portion of the FAT.

As shown in the previous figure, cluster entries doesn’t need to be contiguously allocated. Systems tend to allocate those entries contiguously for efficiency but in this case we are going to handle the allocation in a different way. We are going to put every cluster entry in well-thought distance from one of them that will be the reference. This way we can assign values to different distances, coding information on those distances!

fat_codification
A single byte coded in a file of nine clusters

In this simple example a single byte is coded using the Modulo-2 operator in every distance from the first cluster. Giving these results:

Bit7 = Mod2(d1) = 0
Bit6 = Mod2(d2) = 1
Bit5 = Mod2(d3) = 0
Bit4 = Mod2(d4) = 1
Bit3 = Mod2(d5) = 1
Bit2 = Mod2(d6) = 0
Bit1 = Mod2(d7) = 1
Bit0 = Mod2(d8) = 0

Different Modulo operations can be done to code more info, for example Modulo-256, that would code more info in fewer cluster entries but the file will be more fragmented.

fat_codification_255
Eight bytes coded in a eight file using a Mod-256 operator on the distance between cluster entries. Note that, in this examples, distances are measured from the previous cluster. If one of the distances needed for coding one byte is occupied by other file, the next valid distance will be dn + 255, next dn + 255 * 2 … with a generalized form of dn + 255 * k for k= 1, 2, 3…

Applying those techniques to a real file of 256Mb, which using a cluster size of 4k will use 65536 clusters, will achieve 8191 bytes of hidden information using Mod-2 operator or 65535 bytes using the Mod-256 operator. This technique sacrifices efficiency in order to get better concealment of the information. Besides that, more schemes can be done like reducing the cluster size, using a reduced alphabet if possible, compressing the concealed information with the Huffman algorithm or even using more advances techniques that I don’t really can grasp 🙂

A potential problem that could arise would be an auto-defragmention of the media by the operating system but usb drives almost always use flash technology which, in the first place, doesn’t gain as much as the old hard drives from defragmentation and also discourages write operations because write cycles are not unlimited. For those reasons is not probable that a modern OS would perform an auto-defragmentation of a USB flash drive device.

4) Using the Short File Name legacy feature
If you were on the computers during the 80’s and early 90’s you will remember that file names in the different FAT systems were limited to the 8.3 filenames format. With the development of the FAT32 systems, support for longer names was added without ignoring retro-compatibility.

The content of a folder with both the long names and the short names listed. When a long name is needed the short name is a really a hash that identifies uniquely the file.

If we have a file named “preguntas_frecuentes.pdf” windows will generate the short name “PREGUN~1.PDF”. If we create a second file named “preguntas_frecuentes_2.pdf” the short name will be PREGUN~2.PDF “.

During development of Pincho, a USB Mass Storage Library for Android, for the sake of simplicity, I refused the mess of having to generate well-built short filenames (I shouldn’t let pass the chance to present you this amazing article about 8.3 filename and its checksum ) adding a long file entry in every case, even if the file don’t really need it. I found some surprises:

1) Even if a long filename entry is added before the normal file entry Windows would ignore it if the name length is less or equal than eight. A long filename entry with the name “FiLeS” will be ignored and show as “FILE” in the file explorer.
2) Short filenames doesn’t need to be created from the long filename.
3) In order to force Windows to read the long filenames, the short filenames must be interpreted by the OS as truncated.

A folder written by Pincho library. The short names are created randomly except the tilde character in the 7th position and the last number in order to force Windows to treat them as truncated

The first six characters of the short name could be used to store information, creating for example 30 files would allow to store 180 bytes of information. Short filenames are not listed by default but it is possible using the “dir \x” command. In order to use this approach in a useful way cryptography should be used too but many people doesn’t seem aware of this little legacy feature that still survives 🙂

If I come up with other schemes that would be interesting I will post them here.

Happy crafting! 🙂

felhr85

My personal site about programming and tech stuff

Steganography in USB mass storage devices