Emitting b64 symbols
That brings us to handling the b64 encoding. Let’s break it down.
A base 64 character differs from a hex character in a couple of ways. For one, the b64 character set doesn't start with 0: the values 0 to 63 map to A-Z, a-z, 0-9, + and /, in that order, so we'll have to offset each of those chunks separately. Also, there are actually 65 b64 characters. #\= isn't converted to a number itself. It is a padding character, used to keep the encoded data aligned.
Since b64 is used to encode data, and is often parsed as a string from the big-endian side, without padding some inputs to a pair of encode/decode tools, e.g. (parse-b64 (print-b64 input)), would not come back out as the original input. I think of padding like this: first we divide the binary up into octets. Since each b64 digit carries 6 bits, and the least common multiple of 6 and 8 is 24, we work in groups of 24 bits to make conversion easy. Three octets become four b64 characters, and four b64 characters become three octets.
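To make the 24-bit grouping concrete, here is a minimal sketch that packs one full group of three octets into a single integer and reads it back out six bits at a time, plus the reverse. The names octets-to-sextets and sextets-to-octets are hypothetical helpers for illustration, not functions from this series.

```lisp
;; Hypothetical helpers: one full 24-bit group, octets <-> 6-bit b64 codes.
(defun octets-to-sextets (o1 o2 o3)
  "Pack three octets big-endian into 24 bits, then split into four 6-bit codes."
  (let ((group (logior (ash o1 16) (ash o2 8) o3)))
    (list (ldb (byte 6 18) group)
          (ldb (byte 6 12) group)
          (ldb (byte 6 6) group)
          (ldb (byte 6 0) group))))

(defun sextets-to-octets (s1 s2 s3 s4)
  "Pack four 6-bit codes into 24 bits, then split into three octets."
  (let ((group (logior (ash s1 18) (ash s2 12) (ash s3 6) s4)))
    (list (ldb (byte 8 16) group)
          (ldb (byte 8 8) group)
          (ldb (byte 8 0) group))))
```

For example, the octets of "Man" (77 97 110) split into the codes 19 22 5 46, which are the b64 digits T, W, F and u.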
The rule for converting octets to b64 goes like this: if there is no partial group at the end, we don't need padding. If the last group has one octet, we need two pads; if it has two octets, we need one pad. So converting back is simple: for every pad, remove the last octet from the output.
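The padding rule and its inverse fit in a couple of lines. This is a sketch with hypothetical names (b64-pad-count, b64-decoded-length), assuming the encoded string always has a length that is a multiple of 4 once the pads are included.

```lisp
;; Hypothetical helpers for the padding arithmetic.
(defun b64-pad-count (octet-count)
  "Number of = pads needed after encoding OCTET-COUNT octets."
  (ecase (mod octet-count 3)
    (0 0)    ; no partial group: no padding
    (1 2)    ; one leftover octet: two pads
    (2 1)))  ; two leftover octets: one pad

(defun b64-decoded-length (char-count pad-count)
  "Octets produced by decoding CHAR-COUNT b64 characters, pads included.
Assumes CHAR-COUNT is a multiple of 4."
  ;; Every 4 characters give 3 octets; each pad removes one of them.
  (- (* 3 (floor char-count 4)) pad-count))
```

For example, "Ma" is two octets, so it needs one pad ("TWE="), and decoding those four characters gives 3 - 1 = 2 octets.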
More details are available in RFC 4648: https://www.rfc-editor.org/rfc/rfc4648.html
So, first things first, let’s build the character conversion functions. We can take the framework of hex-char and hex-code and modify them slightly for b64.
Can we use the built-in alphanumericp predicate as our type? A quick experiment:
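A minimal sketch of such an experiment: collect every character below code 256 that alphanumericp accepts, and compare the result against the b64 alphabet. The helper names, the *b64-alphabet* variable and the 256 cutoff are assumptions made for this example. Which non-ASCII characters show up is implementation-dependent; Unicode-aware implementations such as SBCL include letters like Ö.

```lisp
;; The 64 RFC 4648 b64 digits in value order (name is an assumption).
(defparameter *b64-alphabet*
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")

(defun alphanumeric-chars-below (limit)
  "Every character with a code below LIMIT that ALPHANUMERICP accepts."
  (loop for code below limit
        for ch = (code-char code)          ; CODE-CHAR may return NIL
        when (and ch (alphanumericp ch))
          collect ch))

(defun alphanumeric-but-not-b64 (limit)
  "Characters ALPHANUMERICP accepts that are not b64 digits."
  (remove-if (lambda (ch) (find ch *b64-alphabet*))
             (alphanumeric-chars-below limit)))
```

The mismatch goes both ways: (alphanumeric-but-not-b64 256) is non-empty on such implementations, and (remove-if #'alphanumericp *b64-alphabet*) returns "+/", the two b64 digits that alphanumericp rejects.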
Well, that isn't going to work. It turns out a bunch of characters outside the b64 alphabet, such as Ö, are considered alphanumeric, while + and /, which b64 needs, are not. We'll have to make our own type.
The main question here is how to handle padding in the char-b64 function. We have three options to consider. First, char-b64 could return a sentinel, say a :pad keyword, when it encounters padding. Second, we could define the b64-char type to exclude the padding character and add a separate pad type. Lastly, we could signal an error on a padding character and push the responsibility for handling it up to the calling function.
I try to avoid pushing things up, as that results in duplicated code: every caller would need its own handling, when we could do it once at this level. That leaves the first two options. Returning :pad is kind of nice because the parser will need to handle it anyway. The issue, though, is the discrepancy it creates between b64-code-p and b64-char-p: one would have 64 true inputs and the other 65. That means we should make a separate padding type, with a test for it up in char-b64.
Following the pattern used previously on hex types we get:
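A sketch of what those types could look like, assuming satisfies-based deftypes in the style of the hex types. b64-code-p and b64-char-p are named in the text; b64-pad-p, the b64-pad type and *b64-alphabet* are names assumed for this example. Checking membership in the alphabet string, rather than comparing character ranges, keeps the predicate independent of character code ordering.

```lisp
(defparameter *b64-alphabet*
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
  "The 64 RFC 4648 b64 digits, in value order.")

(defun b64-code-p (x)
  "True for the integers 0-63, the values a b64 digit can carry."
  (typep x '(integer 0 63)))

(defun b64-char-p (x)
  "True for exactly the 64 b64 digit characters; the = pad is excluded."
  (and (characterp x)
       (find x *b64-alphabet*)
       t))

(defun b64-pad-p (x)
  "True only for the padding character."
  (and (characterp x) (char= x #\=)))

;; 64 codes, 64 digit characters, and padding kept in its own type.
(deftype b64-code () '(satisfies b64-code-p))
(deftype b64-char () '(satisfies b64-char-p))
(deftype b64-pad () '(satisfies b64-pad-p))
```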
Running the same test, modified to use b64-char-p instead of alphanumericp, works as expected:
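A self-contained version of that check might look like the following. The b64-char-p here is a minimal stand-in, assumed to behave like the type predicate above, and matching-chars-below is a hypothetical helper. It assumes code-char maps codes below 256 to ASCII/Latin-1 characters, as mainstream implementations do.

```lisp
;; Minimal stand-in for the b64-char type predicate.
(defun b64-char-p (x)
  (and (characterp x)
       (find x "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")
       t))

(defun matching-chars-below (predicate limit)
  "Every character with a code below LIMIT that satisfies PREDICATE."
  (loop for code below limit
        for ch = (code-char code)          ; CODE-CHAR may return NIL
        when (and ch (funcall predicate ch))
          collect ch))
```

With this predicate, (length (matching-chars-below #'b64-char-p 256)) should be exactly 64, and #\= is not among the results.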
The functions b64-char and char-b64 will again be similar to their hex counterparts. Since padding is handled at a higher level, we can ignore it until then. We also won't need to normalize case, as b64 uses both upper- and lowercase letters as distinct digits:
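One way the code-to-character direction could be written, following the shape described here: an assert on the input, then a cond that computes the character code for each offset chunk, wrapped in code-char. This is a sketch rather than the original listing, and it assumes A-Z, a-z and 0-9 each have contiguous character codes, which holds for ASCII and Unicode.

```lisp
(defun b64-char (code)
  "Return the b64 digit character for CODE, an integer from 0 to 63."
  (assert (typep code '(integer 0 63)) (code)
          "~S is not a b64 code (an integer from 0 to 63)." code)
  (code-char
   (cond ((<= 0 code 25)  (+ code (char-code #\A)))          ; 0-25  -> A-Z
         ((<= 26 code 51) (+ (- code 26) (char-code #\a)))   ; 26-51 -> a-z
         ((<= 52 code 61) (+ (- code 52) (char-code #\0)))   ; 52-61 -> 0-9
         ((= code 62) (char-code #\+))                        ; 62    -> +
         ((= code 63) (char-code #\/))                        ; 63    -> /
         ;; Never reached after the ASSERT, but without it the COND could
         ;; return NIL, which CODE-CHAR does not accept.
         (t (error "~S is not a b64 code." code)))))
```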
That error call is superfluous: the assert rejects any invalid input before it reaches the cond, and every valid input matches one of the cond's clauses. But the compiler complains without it, as code-char can't accept a nil input.
The complementary function is almost identical to the hex version:
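A sketch of the character-to-code direction, using check-type for the argument and a cond over the same chunks. Like the sketch of b64-char, it assumes contiguous character codes for A-Z, a-z and 0-9. The = pad falls through to the error clause, since padding is left to a higher level at this stage.

```lisp
(defun char-b64 (ch)
  "Return the value (0-63) of the b64 digit character CH."
  (check-type ch character)
  (let ((code (char-code ch)))
    (cond ((char<= #\A ch #\Z) (- code (char-code #\A)))          ; A-Z -> 0-25
          ((char<= #\a ch #\z) (+ 26 (- code (char-code #\a))))   ; a-z -> 26-51
          ((char<= #\0 ch #\9) (+ 52 (- code (char-code #\0))))   ; 0-9 -> 52-61
          ((char= ch #\+) 62)
          ((char= ch #\/) 63)
          ;; Anything else, including the = pad, is rejected here.
          (t (error "~S is not a b64 digit character." ch)))))
```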
That means we should now be able to convert to b64 and back again for all 64 digit characters:
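The round-trip check itself can be written as a small predicate that takes the two conversion functions as arguments. So that the block runs on its own, it is paired here with table-driven stand-ins (table-b64-char and table-char-b64, hypothetical names) that index into and search the alphabet string; the cond-based b64-char and char-b64 should pass the same check.

```lisp
(defparameter *b64-alphabet*
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")

;; Table-driven stand-ins: the digit for a code is its index in the alphabet.
(defun table-b64-char (code)
  (char *b64-alphabet* code))

(defun table-char-b64 (ch)
  (or (position ch *b64-alphabet*)
      (error "~S is not a b64 digit character." ch)))

(defun round-trips-p (to-char to-code)
  "True if code -> char -> code gives back every code from 0 to 63."
  (loop for code below 64
        always (eql code (funcall to-code (funcall to-char code)))))
```

Usage: (round-trips-p #'table-b64-char #'table-char-b64), or (round-trips-p #'b64-char #'char-b64) for the cond-based pair. Because every code comes back unchanged, the 64 characters produced must also be distinct.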
We'll continue on the path towards converting hex and b64 to each other in the next entry.