Hiding information in plain sight goes way back, and it’s got a name - Steganography. In honor of that tradition, here's a tour through the intentionally unseeable world of hiding information inside imagery.
Staying true to form, the world of steganography is jam-packed with terms that discourage participation. So I produced this A - Z of Steganography, to act as a key word chaperone as we descend into the shadowy depths of hidden data.
At 8:42am on an overcast August morning in 2018, Principal Engineer at GE Xiaoqing Zheng was marched out of his home in Eastern New York by the FBI. The event was a hot topic for the sleepy town of Niskayuna, just north of Albany. Zheng wasn’t exactly a neighborhood tough. His LinkedIn includes a degree from the Sloan School of Management, a PhD in Aeronautical Engineering from Northwestern, 13 years as a senior technical leader at one of America’s most venerable technology companies, and 29 patents under his belt. The only person who didn’t look surprised that morning was the man in the handcuffs.
Zheng had just been charged on 14 counts of conspiring to steal General Electric’s trade secrets. According to the complaint, he’d allegedly been exfiltrating GE intellectual property to Tianyi Aviation Technology Co. in Nanjing, a company the feds accused him of starting with funding from the Chinese government.
Over the course of his employment at GE Power & Water in Schenectady, Zheng allegedly stole 20,000 electronic files detailing design models, engineering drawings, configuration files, and material specifications for gas and steam turbines. According to the complaint, Zheng transported the privileged data off of GE’s highly-fortified corporate servers in a manner considered unusual even for the most seasoned Information Security specialists. He concealed it all inside a low-res picture of a sunset. He emailed the image to his personal account, titling it: “Nice view to keep.”
By hiding the illicit information inside his sunset, Zheng was partaking in an advanced form of a tradition that’s been alive and well for at least two millennia: Steganography.
The word Steganography comes from the Greek steganós, meaning "covered or concealed," and graphia meaning "writing."
In the family of covert communications, steganography has a more well-known sibling in cryptography. Distinguishing the two from the drop is key. Cryptography is the science of writing in secret code (encrypting information), while steganography is concerned with making information invisible entirely, or hiding it in plain sight.
In practice, cryptographers scramble data between point of departure and destination to keep it safe. The intended recipients then receive a key that turns unrecognizable ciphers into intelligible data. No key, no dice.
Cryptography has a glaring built-in vulnerability though: it sticks out. Encrypted characters are recognizable by their very randomness. This noticeability represents cryptography’s chief security threat. Once attackers have identified encrypted data, they can begin trying to decipher it.
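To make "recognizable by their very randomness" a little more concrete, here is a minimal sketch that compares the byte-level Shannon entropy of ordinary text with that of random-looking bytes. The seeded random bytes are only a stand-in for real ciphertext (an assumption for illustration), and the function name is made up for this example.

```python
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


plaintext = b"Nice view to keep. " * 200

# Stand-in for ciphertext (assumption): well-encrypted output looks close to
# uniform random bytes, so seeded random bytes are used here instead.
rng = random.Random(0)
ciphertext_like = bytes(rng.randrange(256) for _ in range(len(plaintext)))

plain_entropy = shannon_entropy(plaintext)
cipher_entropy = shannon_entropy(ciphertext_like)
print(f"plaintext:       {plain_entropy:.2f} bits/byte")
print(f"ciphertext-like: {cipher_entropy:.2f} bits/byte")
```

The random-looking bytes score close to the 8 bits/byte ceiling while the text sits far below it, which is roughly the kind of signal that makes encrypted data stand out.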
This is where steganography holds a serious advantage. Steganography does not attract attention to itself as an object of scrutiny. So, while cryptography renders data unreadable, steganography hides the presence of the information altogether. It’s the Milford Man of communications.
Steganography predates our modern digital world by a solid 2,500 years. From its beginnings onward, the goal has remained the same: Enable one party to communicate with another party without a third party being aware that communication is even occurring.
Here’s a quick rip through the history of physical steganography.
440 BC - The Greek ruler Histiaeus shaved the head of a servant, tattooed his scalp with a secret message, let his hair grow back, then sent him through enemy territory with an instruction - "When thou art come to Miletus, bid Aristagoras shave thy head, and look thereon."
200 BC - The geoglyphs of the Nazca contain 143 topographical line drawings that went undetected until the 1920s when they were viewed from the air by Peruvian military and civilian pilots.
1499 - German Benedictine abbot, cryptographer and occultist Johannes Trithemius published Steganographia, a book purporting to be about magic. A decryption key eventually revealed the first two volumes to actually be about cryptography and steganography 🤯
1503 - According to a recent finding by Italian researcher Silvano Vinceti, in his Mona Lisa, Leonardo da Vinci microscopically painted an “L” over the right eye of his subject, an “S” over the left eye and a “72” under an arched bridge in the backdrop.
1509 - A panel on Michelangelo’s contribution to the Sistine Chapel portrays David defeating Goliath. David's stance is in the shape of the Hebrew letter "gimel." This letter refers to reward and punishment, paralleling nicely with the underdog story.
1510 - Hieronymus Bosch's triptych “Garden of Earthly Delights” contains a musical score on a figure’s hindquarters (excuse my language). The notation was discovered in 2014 by a college student and piano teacher, who after playing it, described the music as “LITERALLY the 600-year-old butt song from hell.”
1887 - Van Gogh painted “Patch of Grass” over an earlier portrait of a woman. The hidden portrait was revealed in 2008, when two scientists used a high-intensity X-ray from a particle accelerator to scan the canvas.
1870 - During the Franco-Prussian war, Parisian photographer René Dagron used a photographic shrinking technique to increase the number of messages a pigeon could carry. This shrunken image became known as a Microdot. Post-WW1 German spies adopted the technique of shrinking messages up to 200x. Microdots as small as a typewriter’s period were then covered by reflective adhesives (for findability) or inserted into the side of postcards.
1941 - British army officer Major Alexis Casdagli was captured and imprisoned by the German forces in the early days of WW2. To pass the time he began stitching what would become known as The Casdagli Sampler. Along the border of the embroidered canvas, which features German swastikas, American eagles, British lions and Soviet hammer and sickles, are a series of irregular lines and dots. It is Morse code for “Fuck Hitler.”
1968 - The crew of the USS Pueblo were captured by North Koreans and brutally tortured. In propaganda photos aimed at showing how well the captives were being treated, the POWs snuck in their middle fingers in what became known as The Digit Affair.
Then computers showed up and the decidedly subversive art of hiding information in plain sight got a whole lot more sophisticated.
The personal computer was a lightning rod for new approaches to steganography. The sheer amount of data inside one image offered practitioners an expansive playground in which to smuggle information. Pioneers tried all sorts of approaches: blog-steganography, sneaking data into file info, hiding messages inside file folder systems, and even strategically delaying packets of information that made up an image, the sequence of which formed a message unto itself.
One of the first high-profile applications of digital steganography occurred in 1995. The laptop of suspected World Trade Center and Philippine Airlines Flight 434 bomber Ramzi Yousef was seized. Buried deep in the hard drive was a collection of encrypted files. After their discovery, it took the FBI over a year to decipher the hidden information. The files contained plans for future attacks, flight schedules, projected detonation times, and chemical formulae.
Osama Bin Laden was also an avid practitioner of steganography. USA Today ran a story in 2001 reporting that Bin Laden and his associates had been hiding information crucial to terrorist plots in images on pornographic websites as early as 1996.
Steganography has been used by criminals and terrorists, but like cryptography, it's used by a wide array of people: intelligence agencies, artists, activists, hackers, coders, puzzle makers, and more. As steganographic techniques evolved, so did methods for evaluating their impenetrability. To scrutinize a work, the steganography community developed a risk matrix of sorts, calling it the ‘trade-off tetrahedron.’
The four points represent the key requirements for incognito payload transmission. The ‘trade-off’ implies that favoring one of the requirements always compromises the other three in some way.
Of all the digital steganography techniques that proliferated in the early days, one fared particularly well against the trade-off tetrahedron.
The mid-1980s gave rise to a technique called Least Significant Bit Substitution. LSB hides data inside the information least important to the quality of an image, thereby minimizing image distortion. This information-redundant region of an image consists of the least significant bits of each pixel. LSB can be performed in 5 manual steps:
The circled bit above has the least impact on the color of the pixel the sequence instructs. Aka, if you change this number to 0, the pixel will still look almost exactly as red as it did before the alteration.
To give you an idea how trivial that bit of information on the right is - as you move one-by-one along the bits to the left, they double in value. So in an image’s 8-bit binary sequence the left-most bit is 128x more impactful on what gets delivered to the human eye than the right-most. And that’s just for a single pixel. Aka if you change that last bit you’re changing the value by 1/255th of its full range. AKA no one’s noticing anything.
4) Make minor alterations to each least significant bit, so that when each bit is isolated from the sequence and re-compiled they form a new meaning. The example below starts with 3 pixels-worth of binary data from an image, isolates the least significant bits, and flips 4 of them, so that the least significants together form a new sequence that then spells the letter “e” in binary. The message is now hidden inside the “stego image”.
5) Transmit the stego image to a recipient, who can use an algorithm to pull the least significant bit out of each pixel value, in order, and reassemble the hidden message.
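For readers who would rather see steps 4 and 5 as code, here is a minimal sketch of LSB substitution on a flat list of 8-bit channel values. The pixel numbers are hypothetical, the function names are made up for this example, and a real tool would read and write an actual image file rather than a Python list.

```python
def embed_lsb(cover: list[int], message: bytes) -> list[int]:
    """Return a copy of `cover` with `message` written into the lowest bit of each value."""
    # Unpack the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = list(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit  # clear the LSB, then set it
    return stego


def extract_lsb(stego: list[int], length: int) -> bytes:
    """Read `length` bytes back out of the lowest bits of `stego`."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        byte = 0
        for value in stego[start:start + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


# Three RGB pixels flattened to nine channel values (hypothetical numbers).
cover = [200, 13, 54, 199, 12, 55, 201, 14, 53]
stego = embed_lsb(cover, b"e")  # "e" is 01100101 in ASCII
recovered = extract_lsb(stego, 1)
flipped = sum(1 for a, b in zip(cover, stego) if a != b)
largest_change = max(abs(a - b) for a, b in zip(cover, stego))
print(stego, recovered, flipped, largest_change)
```

With these particular values, four of the least significant bits get flipped and no channel moves by more than 1, which is why the change is so hard to see. Note that the recipient has to know how many bytes to read; real schemes usually store that length somewhere in the image too.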
Outside of being a total nuisance, Least Significant Bit Substitution has some serious limitations. Chief among them: the LSB method puts a ceiling on how many pixels can be altered and therefore how much data can be hidden. Packing too much hidden data inside pixels eventually causes image degradation. This can be caught by the human eye or the trip wires of forensics software. To remain under the radar, steganographers using the LSB method can only manipulate 15% of an image’s content.
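As a rough back-of-the-envelope check on that ceiling, the sketch below works out how much payload fits if exactly one bit of every 8-bit channel is used. That one-bit-per-channel setup and the 640x480 image size are assumptions for illustration, not figures from the research.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 3, bits_per_channel: int = 1) -> int:
    """Bytes of payload that fit when the low `bits_per_channel` bits of every 8-bit channel are used."""
    return width * height * channels * bits_per_channel // 8


raw_bytes = 640 * 480 * 3                 # uncompressed RGB data in the image
payload_bytes = lsb_capacity_bytes(640, 480)
fraction = payload_bytes / raw_bytes
print(payload_bytes, f"{fraction:.1%}")
```

One bit in eight works out to 12.5% of the raw pixel data, which lands in the same neighborhood as the ceiling quoted above.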
Then, along came researchers and high-performance steganographers Richard Eason and Eiji Kawaguchi. In a paper titled Principles and Applications of BPCS-Steganography, the two illustrated how to jack hiding capacity to a staggering 60%.
To raise the bar on the Least Significant Bit technique, Eason and Kawaguchi exploited the shortcomings of human vision. To maximize imperceptibility, they devised a method of splitting an image into two distinct regions: an ‘informative region’ and a ‘noise-like region.’ An informative region consists of a simple pattern that the eye can recognize instantaneously; a noise-like region consists of a pattern so complex our brains simply don’t register it.
Imagine you are arriving for a day at the beach. There’s sand as far as the eye can see. This stretch made up of billions of grains is a noise-like region. It’s so many objects with so much sameness your eyes can’t really distinguish one from the other. You can however cross-reference the khaki colored shape that traces the ocean with your memory and be virtually guaranteed that you are indeed at a beach. So, assured, you sit down on a patch of noise-like sand. As you get settled a pink flamingo strolls by your spot. Its pink plumage jumps out from the color of the beach, its impossibly thin stick legs bend recognizably. This is an informative region!
Every image contains informative and noisy regions, which become mathematically mappable at a microscopic scale. Here’s what it looks like up-close.
To map these regions, steganographers slice images into 8 planes.
That stack of slices looks like this when laid down next to each other.
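Slicing into planes is easier to picture with a few lines of code. This is a minimal sketch for a tiny 8-bit grayscale image held as nested lists; the pixel values and function names are made up for illustration.

```python
def bit_planes(image: list[list[int]]) -> list[list[list[int]]]:
    """Split an 8-bit grayscale image into 8 binary planes.

    planes[0] holds the least significant bit of every pixel and
    planes[7] holds the most significant bit.
    """
    return [[[(pixel >> k) & 1 for pixel in row] for row in image] for k in range(8)]


def merge_planes(planes: list[list[list[int]]]) -> list[list[int]]:
    """Rebuild the 8-bit image from its 8 binary planes."""
    height, width = len(planes[0]), len(planes[0][0])
    return [[sum(planes[k][y][x] << k for k in range(8)) for x in range(width)]
            for y in range(height)]


image = [[0, 255, 128],
         [37, 200, 1]]
planes = bit_planes(image)
print(planes[7])  # most significant plane: the broad shapes of the picture
print(planes[0])  # least significant plane: usually looks like static
```

Each plane is just a black-and-white picture of one bit position, and stacking all eight back together reproduces the original exactly.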
Now broken down into bits, it’s easy to identify informative and noisy regions. To make it even easier, a grid is placed over the image and a noise value is attributed to each cell. This is achieved by measuring the amount of pixel contrast inside each segment. This is also known as an image’s border value. Here are some examples.
In Figure 1, a white pixel is surrounded by 4 black pixels. That gives it a border value of 4 because there are four contrasting borders. In Figure 2, there are no black pixels bumping up against white pixels. This gives it a border value of 0. Figure 3 has a border value of 12. AKA, Figure 2 is informative, whereas Figure 3 is bringing the NOISE.
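The border value is simple enough to compute by hand, but here is a minimal sketch that does the counting. The three 3x3 grids are assumptions chosen to reproduce the values quoted for the figures (4, 0 and 12), not copies of the figures themselves.

```python
def border_value(block: list[list[int]]) -> int:
    """Count neighboring pixel pairs (left-right and up-down) whose values differ."""
    height, width = len(block), len(block[0])
    changes = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width and block[y][x] != block[y][x + 1]:
                changes += 1
            if y + 1 < height and block[y][x] != block[y + 1][x]:
                changes += 1
    return changes


# 1 = white, 0 = black (hypothetical stand-ins for the three figures).
lone_pixel = [[0, 0, 0],
              [0, 1, 0],
              [0, 0, 0]]
flat = [[0] * 3 for _ in range(3)]
checkerboard = [[(x + y) % 2 for x in range(3)] for y in range(3)]

print(border_value(lone_pixel), border_value(flat), border_value(checkerboard))
```

A 3x3 grid only has 12 neighboring pairs in total, so the checkerboard is as noisy as a block that size can get.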
Eason and Kawaguchi proposed embedding secret data into an image’s noisy regions, which usually make up 45% of the image. That’s the easy part.
To camouflage the informative region of a secret image into a noisy region, it needs to be converted into noise. At the pixel level, that conjugation process looks like this.
In the above, (1) is a piece of an initial image. It sticks out like a sore thumb. By matching it with the host image’s foreground pattern (4) and background pattern (5), it can nestle in undetected.
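In code, conjugation is a one-liner. The sketch below assumes the usual BPCS definition, where a block is XOR-ed with a checkerboard pattern; the 8x8 block size and the function names are assumptions for this example.

```python
def border_value(block: list[list[int]]) -> int:
    """Count neighboring pixel pairs (left-right and up-down) whose values differ."""
    height, width = len(block), len(block[0])
    changes = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width and block[y][x] != block[y][x + 1]:
                changes += 1
            if y + 1 < height and block[y][x] != block[y + 1][x]:
                changes += 1
    return changes


def conjugate(block: list[list[int]]) -> list[list[int]]:
    """XOR every pixel with a checkerboard pattern of the same size."""
    return [[bit ^ ((x + y) % 2) for x, bit in enumerate(row)] for y, row in enumerate(block)]


# A very "informative" 8x8 block: left half black, right half white.
simple = [[0] * 4 + [1] * 4 for _ in range(8)]
noisy = conjugate(simple)
max_border = 2 * 8 * 7  # 112 neighboring pairs in an 8x8 block

print(border_value(simple), border_value(noisy), max_border)
```

Conjugating flips every "same" neighbor pair into a "different" one and vice versa, so a block with border value 8 comes out at 112 - 8 = 104. Applying the same XOR a second time restores the original block, which is how the recipient undoes it.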
Here’s an example in action!
The carrier is converted to grayscale. Both images are then plane sliced. A complexity measure is calculated on the blocks of both images. A conjugation is performed on the ‘simple’ or ‘informative’ blocks of the secret image, turning them into noise. And finally, the secret image is then embedded into the carrier image to form a final stego image.
To perform this delicate act you need a finicky software package like MATLAB and a tremendous amount of patience. It’s all a lot.
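Here is a toy sketch of the last three steps (complexity measure, conjugation, embedding) on a single bit plane. Several things are assumptions for illustration: the 0.3 complexity threshold, the 8x8 block size, all of the function names, and the choice to return the conjugation flags separately rather than hide them in the image as a real implementation would. Grayscale conversion and plane slicing are left out.

```python
BLOCK = 8
MAX_BORDER = 2 * BLOCK * (BLOCK - 1)  # 112 neighboring pairs in an 8x8 block
THRESHOLD = 0.3  # assumed cutoff between "informative" and "noise-like"


def complexity(block):
    """Border value of an 8x8 binary block, scaled to the range 0..1."""
    changes = 0
    for y in range(BLOCK):
        for x in range(BLOCK):
            if x + 1 < BLOCK and block[y][x] != block[y][x + 1]:
                changes += 1
            if y + 1 < BLOCK and block[y][x] != block[y + 1][x]:
                changes += 1
    return changes / MAX_BORDER


def conjugate(block):
    """XOR with a checkerboard, which turns a simple block into a noisy one."""
    return [[bit ^ ((x + y) % 2) for x, bit in enumerate(row)] for y, row in enumerate(block)]


def block_origins(plane):
    """Yield the top-left corner of every 8x8 block in a bit plane."""
    for top in range(0, len(plane), BLOCK):
        for left in range(0, len(plane[0]), BLOCK):
            yield top, left


def read_block(plane, top, left):
    """Copy out the 8x8 block whose top-left corner is (top, left)."""
    return [row[left:left + BLOCK] for row in plane[top:top + BLOCK]]


def embed(plane, secret_blocks):
    """Overwrite noisy carrier blocks with secret blocks; return (stego_plane, flags)."""
    stego = [row[:] for row in plane]
    pending = list(secret_blocks)
    flags = []  # True where a secret block had to be conjugated
    for top, left in block_origins(plane):
        if not pending:
            break
        if complexity(read_block(plane, top, left)) < THRESHOLD:
            continue  # informative region of the carrier: leave it alone
        block = pending.pop(0)
        needs_conjugation = complexity(block) < THRESHOLD
        flags.append(needs_conjugation)
        if needs_conjugation:
            block = conjugate(block)
        for dy in range(BLOCK):
            stego[top + dy][left:left + BLOCK] = block[dy]
    if pending:
        raise ValueError("carrier has too few noisy blocks")
    return stego, flags


def extract(stego, flags):
    """Collect the noisy blocks in order and undo any conjugation."""
    found = []
    for top, left in block_origins(stego):
        if len(found) == len(flags):
            break
        block = read_block(stego, top, left)
        if complexity(block) < THRESHOLD:
            continue
        found.append(conjugate(block) if flags[len(found)] else block)
    return found


flat = [[0] * BLOCK for _ in range(BLOCK)]
noisy = [[(x * (y + 1)) % 2 for x in range(BLOCK)] for y in range(BLOCK)]
carrier_plane = [flat[y] + noisy[y] + noisy[y] for y in range(BLOCK)]  # 8 rows, 24 columns
simple_secret = [[0] * 4 + [1] * 4 for _ in range(BLOCK)]
noisy_secret = [[(y * (x + 1)) % 2 for x in range(BLOCK)] for y in range(BLOCK)]

stego_plane, flags = embed(carrier_plane, [simple_secret, noisy_secret])
recovered = extract(stego_plane, flags)
print(flags, recovered == [simple_secret, noisy_secret])
```

The flat block of the carrier is never touched, and the simple secret block only goes in after it has been conjugated into noise. That is the property that lets the recipient find the payload again by looking for noisy blocks.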
Luckily there are off the shelf tools to do the heavy lifting for you.
A simple Google search returns a mountain of turn-key software out there for the data-cloaking enthusiast. I tried them all. Here are the top four, ranked by ease-of-use, versatility, and whether they flooded my computer with Russian bots. Bonus points for Mac-friendliness, ’cause that’s where I’m at.
• Steghide: For usability, Steghide shines. It’s also a super-lightweight piece of software. So if you’re trying to hide the very fact that you’re stegging in the first place, there’s an added benefit.
• Xiao Steganography: If you’ve got a PC, this is your best bet. The community generally agrees. An extra security feature enables users to encrypt data before embedding it into the image for double the protection.
• OpenPuff: While this piece has remained solely focused on visual steganography, you can conceal data in all types of carrier media files. OpenPuff can trojan horse hidden data in a wide array of imagery, audio, video and Flash files.
• MetaSynth: On the audio front, don’t sleep on MetaSynth. This is the software Aphex Twin used to drop images into his music. A story for another day though.
‘Obfuscation layer’ is a prominent term in the digital forensics lexicon. These layers offer additional levels of protection from attackers. The more layers, the less perceivable and therefore safer a hidden message. To peer into the future of steganography is to gaze through an ever-growing and increasingly opaque stack of obfuscation layers.
Mind-blowing work is being done to advance the field by re-distributing hidden messages inside the 2-dimensional plane of flat imagery. In 2013, American computer scientist Chuck Easttom filed a patent for processing a data message into a plurality of message blocks each representing a portion of the original. These fragments are in turn inserted into a plurality of images. Zhongliang Yang, a Tsinghua University Doctor of Engineering (and total beast, I might add) has built a model for enhanced cognitive-imperceptibility using neural networks. A 2021 paper outlined a method for mimicking the patterns of human chromosomal sequences to build a ‘DNA algorithm’ that dictates the dispersal of altered bits around an image, thereby increasing a hidden message’s undetectability.
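The block-splitting idea can be pictured with a very small sketch. To be clear, this is not the patented method, just an illustration of cutting one message into several pieces that could each be hidden in a different image; the function name and the even-sized chunking are assumptions.

```python
def split_message(message: bytes, parts: int) -> list[bytes]:
    """Cut `message` into `parts` consecutive blocks of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size = -(-len(message) // parts)  # ceiling division
    return [message[i * size:(i + 1) * size] for i in range(parts)]


message = b"Nice view to keep."
blocks = split_message(message, 4)  # one block per carrier image
reassembled = b"".join(blocks)
print(blocks, reassembled == message)
```

Anyone who intercepts a single image gets only a fragment, and the recipient needs every image, in the right order, to put the message back together.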
It is however one dimension beyond flat imagery that poses the most possibility for steganography’s future. Up until a few years ago, techniques mostly focused on embedding data by modifying pixel values. With the increase in accessibility of 3-D hardware for CAD, Virtual Reality, video games and 3D printing, the steganography conversation began tilting towards the usage of 3-D geometry as the host object.
A report funded by The Natural Science Foundation of China has become somewhat of a primer on 3-D steganography. It opens with a promise that “the flexible data structure of 3-D geometry may provide ample room for hosting secret information.”
All digital 3-D objects are composed of a series of mesh faces, 2-D polygonal shapes that together approximate the shape of an IRL object. The most common polygonal shape used in 3-D objects is the triangle. It’s not uncommon for video game designers to flex the number of triangles in their compositions. Here are some rudimentary 3-D objects composed of triangle mesh.
Every 3-D object contains a group of rearrangeable vertices and triangles that can be manipulated without changing the geometry of the visual output.
The above spotlights an extension domain. This is the area in which a hidden message can be stowed away without being noticed. In (A), the height of the domain is established by extending the vertex from v(i) up to v’(i). In (B), the rotation of the domain is then established as a right angle at the base of the vertex. A hidden message can be embedded in this range. Meanwhile, there is no seeable difference in the triangle.
In (A), v(i) - v(j) - v(k) represents a visible triangle, whereas v(j) - v’(i) - v(k) represents the exit edges that don’t register to the human eye. These are then plotted out along a mesh in (B). Each face represents one bit’s worth of space for a hidden message.
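To get a feel for the "one bit per face" idea, here is a minimal sketch of a different and much simpler trick than the extension-domain method described above: it hides a bit in the order a triangle's vertex indices are listed. Rotating (a, b, c) to (b, c, a) describes the same triangle facing the same way, so nothing changes on screen. The mesh data and function names are made up for this example.

```python
def embed_in_faces(faces, bits):
    """Hide one bit per triangle by choosing which vertex index is listed first."""
    if len(bits) > len(faces):
        raise ValueError("not enough faces for this many bits")
    stego_faces = []
    for n, face in enumerate(faces):
        a, b, c = face
        # Rotate until the smallest index leads; rotation keeps the winding order.
        while a != min(face):
            a, b, c = b, c, a
        # Smallest index first means 0; one extra rotation means 1.
        if n < len(bits) and bits[n] == 1:
            a, b, c = b, c, a
        stego_faces.append((a, b, c))
    return stego_faces


def extract_from_faces(faces, count):
    """Read the hidden bits back from the first `count` faces."""
    return [0 if face[0] == min(face) else 1 for face in faces[:count]]


# Four triangles given as indices into a vertex list (hypothetical mesh).
faces = [(0, 1, 2), (2, 1, 3), (4, 2, 3), (3, 5, 4)]
bits = [1, 0, 1, 1]
stego_faces = embed_in_faces(faces, bits)
print(stego_faces, extract_from_faces(stego_faces, len(bits)))
```

The vertex positions never move, so the rendered shape is identical. The trade-off is fragility: any tool that re-sorts or re-exports the mesh will likely wipe the message.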
Zooming out, the standard 3-D mesh human figure in say, a video game, contains somewhere in the neighborhood of 150,000 triangles. That puts the triangle count of an entire scene north of 1 million and a count of a billion in an entire game - extraordinary depths in which to hide mountains of data.
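Putting rough numbers on that, the sketch below assumes one hidden bit per triangle and uses the triangle counts quoted above. The function name is made up for this example.

```python
def mesh_capacity_bytes(triangles: int, bits_per_face: int = 1) -> int:
    """Bytes of payload that fit if every triangle carries `bits_per_face` hidden bits."""
    return triangles * bits_per_face // 8


figure_bytes = mesh_capacity_bytes(150_000)       # one human figure
scene_bytes = mesh_capacity_bytes(1_000_000)      # one scene
game_bytes = mesh_capacity_bytes(1_000_000_000)   # an entire game
print(figure_bytes, scene_bytes, game_bytes)
```

Under that assumption a single figure holds a little under 19 KB, a scene about 125 KB, and a whole game around 125 MB.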
While these developments mark a major milestone for embedding capacities, they are merely the beginning of a whole next timeline and range of applications for steganography. Hidden messages can now be contemplated inside 3-D printed material, recording devices inside medical implants and even our own biological systems.
I had a chance to talk with cybersecurity expert and steganography patent-holder Chuck Easttom. We talked about the future of Steganography. “The bottom line is if you really want to hide something, the biggest determining factor on whether or not you’re going to be able to hide it is the ratio between what you’re hiding and what you’re hiding it in,” Chuck said. Forecasting the future, Easttom surmised “any time there is a new venue for any type of communication I would expect someone to eventually utilize that to hide data.”
Steganography has always formed a symbiosis with the ways in which we choose to communicate. As we spend more of our time sharing via increasingly complex visual modes, the room for hiding data will only increase. For many in the Information Security industry this poses a concern. For other industries though, this innovation promises a quantum leap in range of expression.
What a time to be disguised.