Basics of Cryptology

The fact that some demos have poor, easy-to-crack coding schemes that hardly serve as a protection against rippers has given me the inspiration for writing this article. I'm only going to talk about the very basics of cryptology - Caesar's code, binary operations, how to crack such coding schemes and strategies to make it a bit more difficult. For more advanced subjects such as public key cryptography or standard algorithms such as DES and RSA, I recommend the book "Applied Cryptography" which was written by the famous mathematician, cryptologist and privacy rights activist *Bruce Schneier*.

Definitions

Cryptology is the science of data encryption. It consists of two subdisciplines, cryptography (= coding) and cryptanalysis (= cracking).

Cryptography is basically about finding an algorithm that allows you to code (encrypt) data in such a way that only some particular recipients will be able to decode and read it, while all other people will have no chance to restore the unencrypted data. Cryptanalysis, by contrast, is the discipline of decoding the encrypted data even though you don't have the necessary information (i.e. decoding algorithm or, in more advanced cryptography, decoding key).

Simple Coding Schemes

Most introductions to cryptology start with Caesar's code. This one makes no exception. Caesar's code was used by the army of the acient Romans for the transfer of secret text messages. It was very simple: Every character was shifted by three, i.e. A became D, B became E and so on. Thus the unencoded string "CAESAR" would become "FDHVDU". In order to decode an encrypted string, the recipient only had to do the reverse operation. [1]

Caesar's code could be implemented by using the ADD/SUB (plus/minus) or INC/DEC operators. Apply the same operation (e.g. ADD x, 3) to every character of a file, and you'll thus encode the file with a Caesar-like code; in order to decode it, apply the reverse operation (e.g. SUB x, 3) to every character.

The standard set of operators of a modern CPU also gives you a couple of other possibilities to implement such a primitive coding scheme. E.g., one of the most popular coding schemes utilizes the XOR operator. [2] In contrast to other binary operators such as AND and OR, XOR is reversible, i.e.: (a XOR key) XOR key = a. Try the same with AND or OR and you'll see it won't work. This primitive scheme was used to encode the textures in the demos "Iconoclast" and "Animal attraction" by ASD. You can decode the textures by simply performing the operation "XOR 63" to each character in each .jpg file. This is exactly what my program xor_dec.cpp, which is included in the bonus pack to this issue of Hugi (h32bonus.zip), is doing.

Another type of operator you could use is the rotation operators - ROL and ROR. Moreover, you could also use NOT. In fact, NOT a = a XOR a. You cannot, however, use shift operators (SHL and SHR) since these aren't reversible.

After taking a glance at the cryptographic aspects of these simple coding algorithms, let's now think about the cryptanalytic part.

How to crack such simple schemes

Imagine you have a file that seems to be encrypted but you don't know how. However, you suspect that it's just a simple scheme like the ones presented above. You don't know what kind of file it is and thus have no idea what file format it might be is in, though.

First of all, how can you make sure that it's such a scheme? It's possible by means of statistical evaluation. I admit I don't know myself where to get this data, but I'm sure there is some statistical data on the typical distribution of various characters in files of different types (e.g. .txt, .jpg, .mp3 etc.) available somewhere on the Net. You could do such statistical analysis yourself and then check if there's a match. If there is a strong match, then you might have already discovered the coding scheme this way!

But this may be too complicated for beginners. I guess that most of the time you know what kind of file it is you want to decode, anyway. For example, in "Animal attraction" by ASD the "encrypted" textures had the extension .jpg. If you know the file format, this will give you a great advantage: You can now look up information on this format and check if there are some parts of the file that are always the same. For example, the first four characters in a .jpg file are always "JFIF". Thus you already have some plain text, which simplifies finding the correct coding scheme a lot!

Now you only have to compare the coded string with the plain-text string character by character (byte by byte) and wonder what operations might have been applied. You might want to start with XOR as it's perhaps the most commonly used simple codng scheme. Let's say 'a' is a plain-text character and 'b' is the corresponding coded character while 'k' stands for the (unknown, to be discovered) key that was used to code the file (i.e. a XOR k = b). Then the following equation must be true for any {a, b}: a XOR b = k. This was the case in "Animal attraction", for example.

If this fails, then you might check for AND/SUB next. Compute k = b - a for one set {a, b} and see if a + k = b applies to all other sets {a, b}. I'll leave it as an exercise for you to figure out how to do it for ROL/ROR and for NOT.

You see: You might need to perform some systematic trial and error testing, but basically, it's very easy to crack such a simple coding scheme. I'm sure you've realized that such schemes don't really serve as a protection and you're now wondering how to make cracking more difficult?

Strategies for more difficult-to-crack schemes

1. Combine operations: Don't just perform a XOR k. Perform (NOT ((a XOR k1) + k2)) ROL k3, for example. This will be much harder to crack using analytical methods (i.e. what I've just called "systematic trial and error"). However, cracking such a scheme is still a piece of cake using statistical methods.

2. Use Monoalphabetic substitution: Instead of applying deterministic operations, simply create a table of sets {a, b}. Choose yourself what 'a' to replace with what 'b' instead of using a computer. Even non-programmers could create such a scheme. It will be impossible to crack this using analytical methods. However, it will, again, be a "finger exercise" for any experienced cryptologist/hacker to crack this scheme using statistical methods.

3. Use Incremental keys: Neither the aforementioned analytical nor statistical methods will lead to a satisfactory result if you alter the key while working your way up from the first to the last character, e.g. by incrementing it on every new character. This will be a greater challenge for the cracker. However, it's not impossible to crack this, either.

4. Use Vignère's method: The idea behind Vignère's method is to use a string instead of a chracter as the key. If your key consists of n characters, then the first character of the plain text will be coded with the first character of the key, the second with the second and so on. Character no. n+1 of the plain text will again be coded with the first character of the key, etc. [3] Attention: If you use the XOR operator to code your file, then it's possible to find out the length of the key by means of a technique called "counting coincidence". Create a copy of the coded string moved by a particular number of bytes (let's call it 'n') and then XOR the two strings. If n is greater than the size of the key, a bit more than 6 % of the characters will have remained the same; otherwise this percentage (a.k.a. "coincidence index") will be lower than 0.4 %. [4] By means of this information, it's possible to further hack the scheme by means of analytical as well as statistical methods.

5. Consider not only replacing the characters, but also changing the order. This will make cracking yet more difficult. [5]

6. Use a One-time pad (OTP): Now this is the most difficult-to-crack scheme there is. Actually, from a purely mathematical point of view, it's even impossible to crack a well-made OTP. [6] A OTP is a key that is the same size as the plain text. I.e., it's a variant of Vignère's method with the numbers of characters of the plain text and the key being the same. If this OTP is totally random, it's impossible to crack this scheme. However - and that's the big snag - it's actually impossible to use this scheme in a demo, for you also have to store the OTP somewhere! If you store it unencrypted, then the cracker will be able to succeed provided he keeps searching for the OTP long enough. You may also encode the OTP with some scheme, though. However, as I guess you've already realized, for a skilled and patient attacker, it's not impossible to crack even this.

Conclusion

With the ideas I've presented here, it isn't possible to achieve perfect encryption. But it's possible to make cracking a greater challenge than in some of the currently available demos.

References

[1] Johann Blieberger et al.: Informatik. Grundlagen. Vierte, überarbeitete Auflage. Erschienen in der Reihe: Springers Lehrbücher der Informatik. Wien: Springer-Verlag, 2002. p. 56

[2] Bruce Schneier: Angewandte Kryptographie. Protokolle, Algorithmen und Sourcecode in C. 1. Auflage. Bonn (u.a.): Addison-Wesley, 1996. p. 15 f.

[3] same book as in [1], pp. 56 f.

[4] same book as in [2], p. 17

[5] same book as in [1], pp. 57 f.

[6] same book as in [2], pp. 17 ff.