How to Keep It A Secret
by Jeff Prosise, PC Magazine, July 1994
Data security is a hot topic these days and with good reason: As PCs proliferate and as an increasing number of people become computer literate, more information seems to be flowing more freely. Chances are your computer's hard disk contains a few files that you wouldn't want just anyone to see. Maybe you have a spreadsheet file with profit projections for next year or a list of your employees with salary histories and yearly performance ratings. Some kinds of information aren't meant for general consumption and might even hurt your business if exposed. A PC is a great place to store this information, and yet the very act of putting it on a PC compromises its security. If you can get to it, others can too.
There are many ways to secure sensitive data files from unauthorized access, but few are as effective  or as convenient  as encrypting them. Encryption is a process by which the data in a file is altered to a form that is unusable by others. Decryption is the reverse: translating an encrypted file back to a usable form. A simple encryption program might let you enter a password and a filename. Then it would use the password to scramble the file's contents. Until the file is decrypted with the same password, it is a meaningless jumble of 1s and 0s. You might have created the original file with a word processor, but when you try to call up the encrypted file in the same word processor, all you'll see are funnylooking characters. For an example encryption program, check out this issue's Utilities column "WinCrypt Protects Your Data."
There are many ways to encrypt data with a computer. Some, like the oftenused XOR cipher, are surprisingly simple. Others, like the United States government's Data Encryption Standard (DES), are rather complex. The security of these encryption methods varies a great deal. If you encrypt a file with DES, you are virtually assured that its contents are safe. A runofthemill encryption program downloaded from a bulletin board, on the other hand, might protect you from the person in the office next door but is unlikely to stop a trained cryptologist. That's not necessarily bad; after all, you may not need that kind of security. But a little understanding will go along way toward ensuring that you don't make the mistake of entrusting an extremely sensitive file to a comparatively insecure encryption program.
Here's a brief look at how data encryption works and some of the methods used to accomplish it. We'll also discuss some of the legal issues involved, public key cryptography, and digital signaturesissues of vital importance that will one day affect users and nonusers alike.
SYSTEMS WITHOUT KEYS
Let's start by defining a few terms. A cryptosystem is a set of rules that define how data is to be encrypted and decrypted. Cryptosystems generally fall into one of three categories: those that do not use keys, those that use one key, and those that use two keys. A key is a bit pattern used to encrypt and decrypt a message; the key is frequently derived from a password. Systems that use one key (or singlekey systems) use the same key for encryption and decryption. Systems that use two keys, which are often referred to as publickey cryptosystems (for reasons that we'll discuss later), use one key to encrypt data and another to decrypt it.
Systems that do not use keys depend on the secrecy of the encryption algorithm. If the algorithm gets out, a nonkeyed system is useless. The simplest example of this type of system is the Caesar cipher, which substitutes every letter in a message with the letter that is three letters higherA becomes D, B becomes E, and so on. Encrypted with a Caesar cipher, the message
EVERYGOODBOYDOESFINE
becomes
HYHUBJRRGERBGRHVILQH
If you know that a Caesar cipher was used to encrypt the message, decryption is a simple matter of substituting letters that are three places lower in the alphabet. Even if the exact encryption method remains a secret, such ciphers (monoalphabetic substitution ciphers) are easily decoded by taking advantage of the fact that certain letters of the alphabetfor example, E and T appear in the English language more frequently than others. This kind of encryption is commonly used in the newspaper puzzle games known as cryptograms.
You could improve on this simple encryption method by dividing the message text into groups of ten letters and shifting the first letter in each group right one place, the second letter two places, and so on. This is known as apolyalphabetic substitution. Polyalphabetic substitution helps mask the letters' frequencies and makes the code harder to break. Still, a trained cryptologist with a computer would waste little time getting to the bottom of this kind of scheme.
If substitution ciphers aren't your bag, an alternative is a transposition cipher. A transposition cipher changes the order in which letters appear but not the letters themselves. A simple transposition cipher applied to
EVERYGOODBOYDOESFINE
would yield
EGOSVOYFEODIRDONYBEE
How does it work? First, write the original message in columnar format:
EGOS
VOYF
EODI
RDON
YBEE
Then generate the encoded message by reading from left to right and top to bottom. A cryptologist could break this code by using a computer to assemble all the possible arrangements of the letters. Armed with an electronic dictionary, the computer could then pick out the arrangements containing words. You could further obfuscate the input data by mixing substitution and transposition ciphers both to alter and to shuffle the characters in the original message. But the fact remains that a nonkeyed system is secure only as long as the encryption method remains a secret, and everyone knows that secrets are made to be told.
SINGLEKEY CRYPTOSYSTEMS
To provide true security, a cryptosystem must allow the user to enter a password that factors into the encryption process. The advantage of such a system is that knowing the algorithm isn't all that's required to break the code: You also need the password. One of the simplest ways to apply a password to a stream of unencrypted text is to use a Vigenere cipher, which uses the letters in the password to derive a substitution cipher. Suppose that the message is EVERYGOODBOYDOESFINE and the password is SHAKESPEARE. Write them down like this:
EVERYGOODBOYDOESFINE
SHAKESPEARESHAKESPEA
Now use the letters in the second line to determine how many places to the right the corresponding letters in the line above should be shifted, where A means shift 0 places, B means shift 1, and so on. For example, the first letter in the original messageEbecomes W, because S means go right 18 places, and counting 18 letters from E brings us to W, If the sum of two letters is more than 26, just wrap back around to A and continue counting. Carrying out this procedure for the remainder of the message yields the encrypted text
WCEBCYDSDSSQKOOWXXRE
It's easy to write a short BASIC program that performs a Vigenere cipher. Figure 1 shows how to do it in QBasic. The concept of the Vigenere cipher can be extended to binary data (data composed of bytes, whose values can be anywhere from 0 to 255, as opposed to data that is restricted to letters of the alphabet) by simply adding ASCII character codes and wrapping back around to 0 any time a sum exceeds 255. One of the most popular singlekey cryptosystems used on computers is the XOR cipher, which is also called a Vernam cipher. An XOR (short for exclusive OR) operation manipulates bits of data according to the following truth table
0 XOR 0 =0
0 XOR 1 =1
1 XOR 0 =1
1 XOR 1 =0
In other words, a resulting bit is set to 1 if one and only one of the input bits is a 1; if both bits are set to 1, the XOR operation will result in O. An XOR cipher XORs each character in a password with each character in the data to be encrypted. To XOR 2byte values, you break them down into bits and XOR the bits individually. Here's how the letter P (ASCII 80) would be XORed with the letter A (ASCII 65):
01000001
01010000

00010001
The remarkable thing about this cipher is that XORing P back into the result produces A:
00010001
01010000

01000001
The upshot is that performing a XOR operation on XORencrypted data with the same key a second time decrypts the stream. This is the principle on which WinCrypt (this issue's utility) works and explains why WinCrypt doesn't care whether it's encrypting or decrypting a stream of data. WinCrypt generates a 512 byte key from your password. Once the key is determined, the actual encryption process is trivial. Most of the work goes into generating the key and making it as unpredictable as possible. The XOR cipher bears the distinction of being the only cryptosystem that is unbreakable if implemented correctly. If the key contains as many bits as the data that is being encrypted and the bits in the key are truly random, then it's impossible to decode an XOR cipher. Shorter key lengths compromise the system, however, and most XORbased cryptosystems are in fact relatively easy to break.
One of the most secure singlekey cryptosystems in existenceone that just about everybody has heard of is the DES. The DES was developed by IBM and submitted to the National Bureau of Standards (now the National Institute of Standards and Technology) in response to that agency's 1974 request for proposals regarding a standard cryptographic algorithm. In 1976, the algorithm was adopted as a federal standard and thus became the de facto U.S. government approved cryptosystem. Much analysis since that time has generally proved DES to be a secure, generalpurpose, dataencryption algorithm.
The DES algorithm encrypts data in blocks of 64 bits using a 56bit input key. Basically, each block undergoes 18 operations. The first and final operations, which are called permutations, shuffle the bits in the block using a predefined table that dictates which bits go where. In the first permutation, for example, bit 1 is moved to bit 58, bit 58 is moved to bit 7, and so on. The 16 operations in between, which are referred to as Transformations, mix bits from the key with bits from the data using a combination of XOR, transposition, and substitution ciphers. The result is a thoroughly scrambled data stream that defies conventional attempts to unscramble without the input key.
IBM patented the DES but also granted the National Bureau of Standards a nonexclusive, royaltyfree license for its use. You can find DES encryption programs on bulletin boards around the country but be careful about writing and publishing one of your own or publishing any encryption program, for that matter. The U.S. Government restricts the export of encryption software that is "deemed vital to national security interests," which basically means any program that uses DES or a similarly sophisticated encryption algorithm. If you upload anything but the most trivial encryption program to a bulletin board without first getting approval from the Bureau of Export Administration or the Office of Defense Trade Controls, you may be letting yourself in for some trouble.
PUBLICKEY CRYPTOSYSTEMS
Most twokey cryptosystems are referred to as public key cryptosystems because of how the keys are used. Typically, the encryption key (the public key) is freely divulged so that anyone can encrypt a message with it. The decryption key (the private key) is kept secret so that only one individual can decrypt messages. The benefit is that if you want to send an encrypted message to someone, you don't have to send a secret password with it. Encrypt the message with the recipient's public key, and he'll be able to decipher the message with his private key.
The field of computer cryptography abounds with publickey cryptosystems. The bestknown is the RSA algorithm, which takes its name from the first letters in the last names of its three creators (Ron Rivest, Adi Shamir, and Leonard Adleman). The theory behind RSA goes like this: Pick a pair of very large prime numbers and call them p and q. Multiply p and q together to produce an even larger number called n. Now pick another large number, which we'll call e, that is less than n and relatively prime to the product of p1 and q1. (Relatively prime means neither number is a factor of the other.) The numbers n and e now constitute the public key, and the private key d can be computed from e, p, and q using a mathematical formula. To encrypt a block of data, raise the number the block represents (the number you get by lining up all the bits in the block of data and treating them as one very large number) to the power of e and perform a module n operation on the result. (For the mathematically uninitiated, the modulus operation here is the same as dividing the result by n and keeping the remainder.) To decrypt that same block, raise it to the power of d and perform a module n operation on the result. Perform the encryption operation on every block of data in the message, and you've successfully encrypted it using RSA. The security of RSA is predicated on the fact that it is very, very difficult even for the fastest computersto factor large numbers that are the products of prime numbers. It is critical that once you compute n from p and q, you discard p and q forever. If p and q become known, that RSA is worthless because the private key can be calculated. RSA can be broken, but it would take the world's fastest supercomputer hundreds or thousands of years to derive the private key from the public key by guessing the values of p and q. Are there enough prime numbers out there to rule out a trialanderror approach to calculating the factors of n? You bet. The number of prime numbers whose length is 512 bits or less is more than 1E150 (the number 1 followed by 150 zeros), which, according to a publication from RSA Laboratories, is "more than the number of atoms in the known universe."
RSA is important because it also supports the concept of digital signatures, which can be used to authenticate electronic documents the same way handwritten signatures are used to authenticate paper documents. Here's how a digital signature works, using the paradigm of a sender, a receiver, and an electronic document to be sent from the sender to the receiver: The sender runs a program that uses a hash algorithm to generate a digital fingerprinta pattern of bits that uniquely identifies a much larger pattern of bitsfor the document and encrypts the fingerprint with his private key (note the role reversal for the public and private keys). This is the sender's digital signature, which is transmitted along with the data. The receiver decrypts the signature with the public key and runs the same hash program on the document. If the digital fingerprint output by the hash program does not match the fingerprint sent by the sender (after that has been decrypted), then the signature is invalid. If the fingerprints match, however, then the receiver can be quite sure that the digital signature is authentic.
Think about the implications: If the document is altered en route, the fingerprints will not match (the output from the hash programs will be different), and the receiver will know that the data has been tampered with. If the sender's signature has been forged (encrypted with the wrong private key), the fingerprints won't match either. Therefore, the digital signature verifies both the identity of the sender and the authenticity of the data in the document.
As the world becomes increasingly computerized, the need for a standard method of authenticating electronically transmitted documents becomes more and more vital. To that end, in 1991, the National Institute of Standards and Technology proposed the Digital Signature Standard (DSS) and the Digital Signature Algorithm (DSA), which it intends to make available worldwide on a royaltyfree basis.
CLIPPER AND SKIPJACK
More recently, the government has become embroiled in controversy over the proposed Clipper chip (lately renamed the Key Escrow chip), which was designed by the National Security Agency (NSA) for the purpose of encrypting data transmitted via securevoice products (for example, specially built telephones). The controversy comes because the NSA won't release details of the encryption algorithm, which is named Skipjack, and more importantly because there is an acknowledged "back door "into the chip that would allow the government to decode transmissions. Theoretically, the back door would be used only by law enforcement and intelligence agencies for legitimate purposes in the fight against crime, but giving the government such broad power to eavesdrop on private, supposedly secure conversations makes a lot of Americans uneasy.
The Skipjack algorithm itself is secure, or so the NSA claims. (The NSA did allow a panel of outside experts to examine the algorithm, and a preliminary report from the panel seems to verify the NSA's claims.) It's the design of the chip that allows privileged access to encrypted data. Manufacturers of secure data products aren't forced to use the Clipper chip, but the government's restrictions on the export of encryption devices means secure voice products that do not use the government's chip may be limited to U.S. markets. At the moment, the Clinton administration appears resolved to leave both the back door and the export restrictions in place, presenting U.S. manufacturers with an unenviable choice.
Another recent cryptologyrelated controversy arose when a programmer named Phillip Zimmerman released into the public domain a version of RSA that he claims to have developed independently. His system is called PGP, for Pretty Good Privacy. The program is available for free on the Internet and bulletin boards. Since PGP has spread to foreign countries, the government has intervened. In the face of government inquiries, Zimmerman has claimed that the good that comes from using cryptography in foreign countries (such as its use by Amnesty International and other humanitarian organizations) outweighs the bad.
FURTHER READING
If you care to learn more about the burgeoning field of computer cryptography, check out Bruce Schneier's Applied Cryptography: Protocols, Algorithms, and Source Code in C (John Wiley & Sons, 1994). Schneier covers cryptographic techniques and methods in more detail than you'll probably ever care to know. From DES to RSA, this book covers them all and easily ranks as one of the most authoritative in its field.
PC Magazine, July 1994
