|

Reprinted with Permission by Quest
Software June 2005
|
Protect from Prying Eyes: Encryption in Oracle 10g, Part 1
Arup Nanda
This article was
reprinted with permission from
DBAzine.com.
What is Encryption?
Simply put, encryption is the art of disguising data.
Suppose the combination to my wall safe is 37529. Because I am not very good at
remembering numbers, I keep on forgetting it. So I come up an idea — I write it
down on a piece of paper and tape it on the safe! There it is — it’s there when
I need it, and I don’t have to worry about forgetting it. What a great idea!
But, there is a slight problem. If a burglar happens to
see it, there is no need for him to try to crack the code — it’s all there in
plain sight! So, I change the strategy a little bit — I decided to disguise the
combination or alter it in such a way that only I know how it was altered. For
instance, I may decide to reverse the digits; the number becomes 92573, which is
on the piece of paper. The burglar will see the number 92573, which is not the
safe combination. He will not be able to guess the correct one since he does not
know how it was altered in the first place.
Yet over time, this method is rendered useless. After a
few burglars visit my house, they all learn that I simply reverse the digits.
Alas! I have to figure out a new encryption scheme. The modified scheme I come
up with is to add a number to the actual value — say, 9. So my safe combination
37529 becomes 37529 + 9 = 37538, which goes to the piece of paper. To decipher
the true meaning, the burglar has to know to two things:
- The logic used to perform the encryption (i.e., adding
a number to the actual value)
- The number that is added (i.e., 9, in this case)
If I suspect that the burglars have somehow learned the
first, I can simply change the number 9 to something else. It’s important to
understand the differences between the means to achieve the encryption and the
value used to scramble original data. The first concept, the logic used to
scramble is known as an algorithm, and the value used to scramble the actual
data is known as the encryption key. We can change one, keep the other constant,
and get a different encrypted value. For instance, I can keep the algorithm the
same, but change the key to arrive at a different encrypted value.
In the case of my safe combination, the algorithm is very
simple — I merely add the key to the original data to come up with the encrypted
value. However, this simplistic algorithm has no place in reality; encryption
algorithms are actually far more complex. It’s not the purpose of this article
to go into the details of encryption algorithms; that would fall into the realm
of academia. As users, we do not need to delve into the details, but to learn
the basics to use in practice. Because the roots of this algorithm are common
knowledge, it’s not practical to hide the algorithm itself. But by using a
secret key, the encrypted value can remain safe — decipherable only by using the
right key.
Since the “key” is literally the key to decipher the data,
my encrypted value is safe as the key itself. For instance, in this case, the
burglar knows the algorithm (i.e., that I used a single-digit number to arrive
at the encrypted value). But what is the number? Well, he doesn’t have to work
hard — there are only ten single digit numbers — from zero to nine. He has to
make up to ten guesses to get the right number, something he can definitely
accomplish in the short time span. But what if I used a two-digit number? In
that case, the burglar would have to make 100 guesses — a more difficult feat. I
could frustrate the burglar more with a three-digit key, which forces him to
make up to 1,000 guesses. The point is, while keeping the algorithm the same,
encryption can be made more secure by increasing the number of digits in the
key. For real-life encryption, the same concept holds true — increasing the size
of the key will force the intruder to make more guesses, and make the code more
difficult to crack. In digital encryption, since computers are used to encrypt
and decrypt, making 1000 guesses, even 1 million guesses is trivial and can be
done in matter of minutes; hence, keys must be very long to offer a real barrier
to guessing.
In summary, you need the following for encryption
Types of Encryption
In the encryption case we’ve been talking about, note a
very important behavior: the key used to decrypt is the same as the one used to
encrypt. This type of encryption is known as symmetric key encryption, since the
same key is used in both cases. Symmetric key encryption makes the algorithms
easier to implement; however, there is a fundamental security risk inherent in
this model. Since the same key must be used to decrypt the data that is
encrypted, the key must be protected along with the data. What if the intruder
gets the key? He will be able to decrypt every encrypted value, seriously
compromising the security of the system. In addition to this, another inherent
problem is key storage itself. And if the key is lost, the encrypted data can
never be decrypted.
To address these issues, another kind of encryption
exists, in which two different keys are used to encrypt and decrypt data. Since
the keys are different, this scheme is known as asymmetric encryption. In this
method, a set of two keys are generated simultaneously. One key, known as the
“public key” is given out to the others. This key is used to encrypt the data.
While decrypting, the user uses the other key, known as “private key” to decrypt
the value. The public key is made well known to the world and is not made a
secret. The private key can be stored securely since it is used only during the
decryption process. Since the private key is not used during encryption, this
not required to be given out to the sender of the data. This obviously increases
the security of the key.
However, note that the public and private keys, although
not the same are mathematically connected – they have to be. This makes the
guessing of the private key a little bit easier for someone who knows the public
key. To reduce the risk, the key must be longer, usually 1024 bits, unlike the
typical 128-bit key sizes in symmetric encryption schemes. A larger key also
means higher computational processing requirements on the server, which
translates to longer elapsed time. This makes asymmetric encryption schemes a
little undesirable.
When to Use What Type
For data-in-motion-type encryptions, in which the data is
encrypted to be sent across wire to reduce chances of eavesdropping, the
asymmetric encryption is usually used, due to two primary reasons. First, the
single key use in symmetric key encryption is not practical; and second, the
length of time for which the encryption key must be in use is very short, so
much smaller encryption keys can be used. Even if the intruder cracks the key,
the data is already transmitted, reducing chances of compromise. In case of
encryption of data-at-rest, the key is held for a much longer duration, so a
longer key is necessary. To balance the need to reduce key length and reduce
risk of exposure, asymmetric encryption schemes are used in data-at-rest.
Block and Streaming Encryption
Now that we have seen how encryption schemes are commonly
used, an interesting question is raised: How does an algorithm work in data
input? For the simple example of the safe combination, I have used an algorithm
that adds a number to the entire input data. Note that I stress the word,
“entire,” which means the entire data must be available to the algorithm before
it can be encrypted. In case of streaming data, such as one flowing between two
servers over the network or from the Web server to the browser, the data is not
available in its entirety. To encrypt the data that is completely available, the
algorithm breaks it into chunks (of typically eight bytes) and then applies
encryption; but streaming data cannot be broken into chunks. Hence, a different
strategy is employed for streaming data. The other type, in which the entire
data is available, is known as block encryption.
Encryption Algorithms
Now that you understand how encryption works, let’s focus
on one important part of encryption: the algorithm.
One of the first algorithms to be used was the Digital Encryption Standard
(DES), which breaks up data into blocks of 64 bits and encrypts them using a
64-bit key. Of the 64 bits, only 56 are used. A smaller key length makes this a
fast encryption approach; however, with modern computers, an intruder can easily
break the 56-bit key.
To address the security risks, a modified form of DES was
introduced — Triple DES or DES3. This makes three passes of the input data
through the DES algorithm, so it is reasonably secure for most applications.
Another form of DES3 uses three keys to further secure the encryption. Three-key
DES3 and two-key DES3 use keys of 156 and 112 bits respectively.
DES3 was considered adequate for most applications for a
significant amount of time, but advancement in computing resources diminished
the safety of the algorithm. Additionally, the compute-intensive algorithm
proved to be too strenuous on the server housing the database. This led to the
development of Advanced Encryption Standard (AES).
AES uses one of the three key lengths — 128, 192, and 256,
depending on application. The more the key length, the longer the computation
cycle, but also, the lesser the risk of an intruder breaking it.
Another type of encryption is RC4, which is a streaming
encryption algorithm.
Padding
Let’s revisit a point that was made in the previous
discussion about block encryption: an algorithm breaks the input data into
blocks of eight bytes (typically) and encrypts one block at a time. What if the
length of the data is not divisible by eight? Then the last block of bytes will
be less than eight bytes, but the algorithm will not be able to operate on this
block because it is smaller in size.
To address this, a value must be added to the data to make
the length exactly divisible by eight; this process is known as padding. The
value is removed when the data is decrypted. In most cases, a simple padding
with a known value such as zero is acceptable; this process is known as
zero-padding. However, padding by zero also makes the data somewhat prone to
discovery by an intruder, since he can now guess what zero looks like in
encrypted format. Therefore, padding by zero is not considered cryptographically
secure. Instead, a value that is less vulnerable to break-ins is added by a
standard known as Public Key Cryptography Standard #5, or PKCS#5.
Of course, if you know that the data is in lengths of
multiples of eight, then there is no need to pad.
Chaining
Continuing along the previous line of discussion, input
data is broken into chunks and then encrypted. Since data is separated, wouldn’t
it be nice to somehow de-link the encrypted blocks so that even if a single
block is cracked, the rest will be intact?
This is when chaining comes in: it specifies how these
chunks of data are linked (or de-linked) during encryption. In the most common
format, a block is XOR-ed with the encrypted value of the previous block. The
resultant value is then passed to the encryption algorithm. The same is repeated
for all the blocks. This scheme is known as Cyclic Block Chaining. Since the
first block has no previous block, a value known as an initialization vector
(IV) is used to XOR.
In another form of chaining, each block is independently
encrypted, known as Electronic Code Book chaining method. Other methods are
Cipher Feedback and Output Feedback.
Hashing
So far, we have talked about a concept in which the input
data is masked to protect it from prying eyes. However, this does not protect
the data from being modified by an intruder. For instance, assume that we are
transmitting the salary of the CIO and a disgruntled employee decided the change
the salary from 10 million to 10 thousand by chopping off a few zeroes at the
end. How does the receiving end know that this data has been altered? Or, how
can we know that the data has not been altered?
To address the issue, a concept called checksum is used.
It has been used extensively in the past to ensure filesystem integrity, and it
has been in use for some time. The idea is to calculate a checksum value by
applying some algorithm to an input value. Then, upon receiving the input value,
the receiver also applies the same algorithm and gets a checksum.
He compares the checksum calculated with what came with
the data; if they match, the data has not been tampered with. If they differ,
then the data has been compromised. This checksum is often the hash value of the
input value.
Consider the implications of the checksum concept — the
input value itself is not masked, because the intention is not to prevent people
from seeing it. The intent is to prevent people from modifying it. Of course,
data can be encrypted as well; but the purpose of hashing is not to prevent
exposure.
The second important concept to remember is that this hash
value is calculated from the input value, but the input value cannot be
calculated from the hash value. This is important to understand as it has use in
other applications. For instance, password management functions can store hash
values in tables. When a user wants to validate passwords, the function can hash
the user’s entry and compare the hash to the hash, instead of comparing plain
text. This way, the actual password is never displayed and is never compromised.
Oracle password management functions the same way — it uses a one-way hash value
in the password column of the USER$ table.
MAC
Hashing provides an important function — validating the
integrity of a piece of data. However, consider this scenario: An intruder might
come to know the hashing algorithm used and then use it to calculate the hash
value of the input data after changing it. The receiver gets the tampered input
data as well as the changed hash value. Since the values match, the receiver
will not be able to determine that the data has been compromised.
To prevent this situation, the concept known as Message
Authentication Code (MAC) was developed, which uses a “key” to calculate the
hash value (known as MAC value in this instance). The same key is used by the
receiver in calculating MAC value. Since key is not likely to be known by the
intruder, a changed MAC value will be different and data compromise will be
unearthed.
Implementing Encryption
With the conceptual foundation in place, let’s delve into
the actual task of implementing an encryption system.
In order to implement an encryption system, you do not
have to reinvent the wheel and create the algorithms; they are already available
in supplied packages. Oracle provides a package called dbms_obfuscation_toolkit
(DOTK) that contains several procedures and functions to allow you to achieve
encryption. In Oracle 10g, a new package named dbms_crypto offers similar but
improved functionality. Even though the older package is still available in
Oracle 10g, it’s recommended that the dbms_crypto be used for all encryption
related matters for the following reasons:
- Crypto provided AES encryption methods; DOTK does not.
AES is now the de facto standard in many circumstances and is preferred,
because of its increased security and speed.
- Crypto provides automatic padding. In DOTK, you have to
provide padding of data smaller than the block length.
- Crypto provides PKCS#5 padding, which is
cryptographically more secure than user-supplied padding (such as padding with
zeroes).
And of course, DOTK may be deprecated soon.
Please bear in mind that Crypto (and DOTK, as well) offer
only symmetric encryption only (i.e., the same key is used to encrypt and
decrypt). Asymmetric encryption, in which a public-private key pair is used, is
not available in either package. To implement such functionality, you will need
to use Oracle Advanced Security, an extra-cost option for the database. For most
common applications, symmetric key is adequate. We will focus on Crypto for the
remainder of this article; OAS will be covered in a future article.
Encryption
From our earlier discussion on defining encryption, you
know that to encrypt data, you need the following:
- An encryption algorithm
- An encryption key
- A padding method
- A chaining method
What you choose to use for the key depends on your choice
of algorithm. Once you have identified the algorithm to use (say, AES128) you
have to use a key that is 128 bits long. Crypto provides constants in the
package to indicate different types of algorithms. Table 1 shows these constants
with the algorithms and the key lengths they need.
| Constant Name |
Description |
Effective Key Length |
ENCRYPT_DES
|
Data Encryption Standard. Faster but not secure. |
56 |
ENCRYPT_3DES_2KEY
|
Modified Triple Data Encryption Standard. Operates on a block 3 times
with 2 keys. |
112 |
ENCRYPT_3DES
|
Triple Data Encryption Standard. Operates on a block 3 times but with
one key. |
156 |
ENCRYPT_AES128
|
Advanced Encryption Standard |
128 |
ENCRYPT_AES192
|
Advanced Encryption Standard |
192 |
ENCRYPT_AES256
|
Advanced Encryption Standard. |
256 |
ENCRYPT_RC4
|
This is the only stream cipher. |
|
Table 1: Encryption algorithms.
Once you choose your algorithm, you have to decide on what
key to use. For example, if you have chosen AES128 as the algorithm, you have to
choose a 128-bit key, and this key has to be of a RAW data type.
Suppose we decide to use the AES scheme with 128-bit keys
along with CBC chaining and PKCS#5 padding. The following piece of code performs
the encryption of a data value “ConfidentialData” with the key as
“1234567890123456.”
1 declare
2 l_key varchar2(2000) := '1234567890123456';
3 l_in_val varchar2(2000) := 'ConfidentialData';
4 l_mod number := dbms_crypto.ENCRYPT_AES128
5 + dbms_crypto.CHAIN_CBC
6 + dbms_crypto.PAD_PKCS5;
7 l_enc raw (2000);
8 begin
9 l_enc := dbms_crypto.encrypt
10 (
11 UTL_I18N.STRING_TO_RAW (l_in_val, 'AL32UTF8'),
12 l_mod,
13 UTL_I18N.STRING_TO_RAW (l_key, 'AL32UTF8')
14 );
15 dbms_output.put_line ('Encrypted='||l_enc);
16* end;
SQL> /
Encrypted=C0777257DFBF8BA9A4C1F724F921C43C70D0C0A94E2950BBB6BA2FE78695A6FC
PL/SQL procedure successfully completed.
SQL>
|
Let’s analyze the above piece of code line by line.
| Line |
Description |
| 2 |
The key is defined here. As you can see the key is exactly 16
characters, which it must be for AES to work. |
| 3 |
The input value, which needs to encrypted. |
| 4 — 6 |
These lines need some more explanation; hence it has been provided at
the end of this table. |
| 7 |
We define a variable to hold the encrypted value. |
| 9 |
The function encrypt is called. |
| 11 |
The function expects the input value to be in RAW, not varchar2 as is
the reality. In this line, we have converted it to raw using the function
strings_to_raw in the supplied package utl_i18n. |
| 13 |
As with the input value, the function also expects the key to be RAW as
well. Hence we convert it here. |
| 15 |
Finally, we display the encrypted value. However, in reality you will
not display the value as it is meaningless; you will probably use it for
something else, such as store it. |
The encrypted value, also in RAW, is displayed as a
hexadecimal string. This is the basic workings of the encrypt function. Now,
let’s take a look at the lines we omitted from the previous discussion.
Remember, to encrypt you need the following components:
- An input value
- A key
- An algorithm
- A padding scheme
- A chaining scheme
In the previously cited piece of code, the input value and
the key were provided in lines three and two respectively — but where are the
rest of the necessary components?
The rest are all crowded into lines four through six.
These options are specified in the package dbms_crypto as named constants. For
instance, the AES 128-bit algorithm is specified by the constant ENCRYPT_AES128.
Table 1 lists these encryption constants. Similarly, the CBC chaining method is
specified by CHAIN_CBC and the PKCS#5 padding is specified by PAD_PKCS5. All
these three constants have been specified as having been added together in the
input parameter MOD in the function encrypt(). This is exactly how dbms_crypto
expects to get the specifics of algorithm — padding and chaining. The
combination of these three is known as a modifier to the function.
The following is a list of padding modifiers available:
| Constant |
Description |
PAD_NONE
|
No padding is done. If length of the data is exactly same as the block
size or a multiple thereof, this option is used. |
PAD_ZERO
|
The data is padded with zeroes to make it a multiple of block size. |
PAD_PKCS5
|
PKCS#5 based padding is done. This is the safest and the recommended
approach. |
The following options are available for chaining:
| Constant |
Description |
| CHAIN_ECB |
Electronic Code Book standard |
| CHAIN_CBC |
Cyclic Buffer Chaining method |
| CHAIN_CFB |
Cypher Feedback method |
| CHAIN_OFB |
Output Feedback |
|
If you wanted to use no padding and Electronic Code Book
chaining, you could have called the ENCRYPT function as:
4 l_mod number := dbms_crypto.ENCRYPT_AES128
5 + dbms_crypto.CHAIN_ECB
6 + dbms_crypto.PAD_NONE;
Similarly you can specify any combination of algorithm,
padding and chaining to suit your needs.
Coming next in part 2: Decryption, key management,
key storage, and more.
Arup
Nanda is the Lead DBA at Starwood Hotels & Resorts. He has
been an Oracle DBA for more than 11 years, touching all aspects of database
management — from modeling to performance tuning to disaster recovery. He is the
co-author of the book
Oracle Privacy Security
Auditing (2003, Rampant Tech Press).