Emitting b64 symbols

That brings us to handling the b64 encoding. Let’s break it down.

A base 64 character differs from a hex character in a couple of ways. For one, the b64 character set doesn’t start with the character 0 (the value 0 is written as A), so we’ll have to offset each range of characters. Also, there are actually 65 b64 characters. The #\= character isn’t converted to a number itself; it is a padding character, used to align the output.

Since b64 is used to encode arbitrary data, and is usually parsed as a string starting from the big-endian side, without padding some inputs to a pair of encode-decode tools, e.g. (parse-b64 (print-b64 input)), would not come back out as the original input. I think of padding like this: first we divide the binary into octets. Since a b64 character holds 6 bits, and the least common multiple of 6 and 8 is 24, we work in 24-bit groups to make conversion easy. Three octets become four b64 characters, and four b64 characters become three octets.
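To make the 24-bit grouping concrete, here is a minimal sketch that packs three octets into one integer and reads back four 6-bit fields with ldb. The name octets-to-sextets is made up for this illustration and is not part of the code built in this series.

```lisp
;; Hypothetical helper, for illustration only.
(defun octets-to-sextets (a b c)
  "Pack three octets into 24 bits, then read back four 6-bit codes."
  (let ((group (logior (ash a 16) (ash b 8) c)))
    (list (ldb (byte 6 18) group)    ; most significant 6 bits
          (ldb (byte 6 12) group)
          (ldb (byte 6 6) group)
          (ldb (byte 6 0) group))))  ; least significant 6 bits

;; The ASCII codes for "Man" are 77, 97 and 110.
(octets-to-sextets 77 97 110)
;; => (19 22 5 46)
```

Those four codes correspond to the b64 characters T, W, F and u in the RFC 4648 alphabet.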

The rule for converting from octets to b64 goes like this: if the input divides evenly into three-octet groups, we don’t need padding. If the last group has one octet, we need two pads. If it has two octets, we need one pad. Converting back is simple: for every pad, remove the last octet from the output.
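The padding rule can be written down directly. This is only a sketch; b64-pad-count and b64-encoded-length are hypothetical names used for illustration, not functions from this series.

```lisp
;; Hypothetical helpers, for illustration only.
(defun b64-pad-count (octet-count)
  "Number of pad characters needed when encoding OCTET-COUNT octets."
  (ecase (mod octet-count 3)
    (0 0)     ; no partial group, no padding
    (1 2)     ; one leftover octet, two pads
    (2 1)))   ; two leftover octets, one pad

(defun b64-encoded-length (octet-count)
  "Length of the padded b64 string: four characters per group of up to three octets."
  (* 4 (ceiling octet-count 3)))

(b64-pad-count 4)       ; => 2
(b64-encoded-length 4)  ; => 8
```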

More details are available in the RFC: https://www.rfc-editor.org/rfc/rfc4648.html

So, first things first, let’s build the character conversion functions. We can take the framework of hex-char and hex-code and modify them slightly for b64.

Can we use the built-in alphanumericp as our type test? Quick experiment:

(loop for code from 0 to 255
      counting (alphanumericp (code-char code)))

=> 127

Well, that isn’t going to work. Turns out a bunch of characters such as Ö are considered alphanumeric. We’ll have to make our own type.
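To see where the extra characters come from, the count can be split by range. This sketch assumes an implementation with ASCII-compatible character codes (SBCL, for example); count-alphanumeric is a name invented here. The result for the upper range depends on the implementation, so no value is shown for it.

```lisp
;; Hypothetical helper, for illustration only.
(defun count-alphanumeric (start end)
  "Count the codes from START to END inclusive whose character is alphanumericp."
  (loop for code from start to end
        counting (alphanumericp (code-char code))))

(count-alphanumeric 0 127)    ; => 62, the ASCII letters and digits
(count-alphanumeric 128 255)  ; the rest, implementation-dependent
```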

The main question here is how to handle padding on the char-b64 function. We have a couple options to consider. First we can have char-b64 return something, say a :pad keyword when padding is encountered. Or we could change the b64-char type to not include the padding character, and have a pad type. Lastly we could error on a padding character and push the requirement for handling it up to the calling function.

I try to avoid pushing things up, as that results in duplicated code: everything that calls down will need handling of its own, when we could do it once at this level. So let’s not do that. That leaves the first two options. Returning :pad is kind of nice because it will need to be handled in the parser anyway. But the issue here is the discrepancy between b64-code-p and b64-char-p: one would have 64 true inputs and the other 65. That means we should make a padding type and have a test up in char-b64 to check for it.

Following the pattern used previously on hex types we get:

(defun b64-char-p (char)
  "A b64 char is one of A-Z, a-z, 0-9, + or /. The pad character = is not included."
  (let ((code (char-code char))
        (lower-a-code (char-code #\a))
        (lower-z-code (char-code #\z))
        (upper-a-code (char-code #\A))
        (upper-z-code (char-code #\Z))
        (0-code (char-code #\0))
        (9-code (char-code #\9))
        (/-code (char-code #\/))
        (+-code (char-code #\+)))
    (or (and (>= code lower-a-code)
             (<= code lower-z-code))
        (and (>= code upper-a-code)
             (<= code upper-z-code))
        (and (>= code 0-code)
             (<= code 9-code))
        (= code /-code)
        (= code +-code))))

(defun b64-code-p (code)
  "A b64-code is an integer between 0 and 63 inclusive."
  (typep code '(integer 0 63)))

(defun b64-pad-p (char)
  "A b64-pad is the = character."
  (char= char #\=))

(deftype b64-char ()
  "A b64 char is one of A-Z, a-z, 0-9, + or /. The pad character = is not included."
  `(satisfies b64-char-p))

(deftype b64-code ()
  "A b64-code is an integer between 0 and 63 inclusive."
  `(satisfies b64-code-p))

(deftype b64-pad ()
  "A b64-pad is the = character."
  `(satisfies b64-pad-p))
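With padding as its own type, a caller can dispatch on the kind of input using typecase. The sketch below is self-contained, so it defines stand-in types with a demo- prefix instead of reusing the definitions above; all names in it are hypothetical. The stand-in predicates also check characterp first, since a satisfies predicate may be handed any object, not just characters.

```lisp
;; Hypothetical stand-ins so the sketch runs on its own.
(defun demo-b64-char-p (object)
  "True for the 64 non-pad b64 characters, false for anything else."
  (and (characterp object)
       (find object "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")
       t))

(defun demo-b64-pad-p (object)
  "True only for the = character."
  (and (characterp object) (char= object #\=)))

(deftype demo-b64-char () '(satisfies demo-b64-char-p))
(deftype demo-b64-pad () '(satisfies demo-b64-pad-p))

(defun classify-b64-input (object)
  "Return :pad, :symbol or :invalid depending on what OBJECT is."
  (typecase object
    (demo-b64-pad :pad)
    (demo-b64-char :symbol)
    (t :invalid)))

(classify-b64-input #\=)  ; => :PAD
(classify-b64-input #\A)  ; => :SYMBOL
(classify-b64-input 42)   ; => :INVALID
```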

Running the test modified to use b64-char-p instead of alphanumericp works as expected:

(loop for code from 0 to 255
      counting (b64-char-p (code-char code)))

=> 64

The functions b64-char and char-b64 will again be similar to their hex counterparts. Since padding is handled at a higher level, we can ignore it until then. We also won’t need to fold case, as b64 uses both upper and lower case:

(defun b64-char (b64-code)
  "Return the b64 char for the 6-bit code."
  (assert (typep b64-code 'b64-code))
  (let ((code (cond ((<= b64-code 25)
                     (+ b64-code (char-code #\A)))
                    ((<= b64-code 51)
                     (+ -26 b64-code (char-code #\a)))
                    ((<= b64-code 61)
                     (+ -52 b64-code (char-code #\0)))
                    ((= b64-code 62)
                     (char-code #\+))
                    ((= b64-code 63)
                     (char-code #\/))
                    (t (error "Not a b64-code")))))
    (code-char code)))

That error call is superfluous because the assert will stop any invalid input before it can fall through the cond, but the compiler complains without it, as code-char can’t accept a nil input.

The complementary function is almost identical to the hex version:

(defun char-b64 (char)
  "Return the 6-bit b64 code represented by the b64 character."
  (assert (typep char 'b64-char))
  (let ((upper-case-a-code (char-code #\A))
        (upper-case-z-code (char-code #\Z))
        (lower-case-a-code (char-code #\a))
        (lower-case-z-code (char-code #\z))
        (zero-code (char-code #\0))
        (nine-code (char-code #\9))
        (code (char-code char)))
    (cond ((and (>= code upper-case-a-code)
                (<= code upper-case-z-code))
           (- code upper-case-a-code))
          ((and (>= code lower-case-a-code)
                (<= code lower-case-z-code))
           (- code -26 lower-case-a-code))
          ((and (>= code zero-code)
                (<= code nine-code))
           (- code -52 zero-code))
          ((= code (char-code #\+))
           62)
          ((= code (char-code #\/))
           63))))

Which means we should now be able to convert to b64 and back again for all 64 characters:

(loop for code from 0 to 63
      do (assert (= code (char-b64 (b64-char code))))
      finally (return t))

=> T
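For comparison, here is a different technique from the range arithmetic used above: a lookup table. Keeping the 64-symbol alphabet in a string turns both directions into an index operation. This is only a sketch; *b64-alphabet*, table-b64-char and table-char-b64 are names invented for this illustration.

```lisp
;; Hypothetical table-driven variant, for illustration only.
(defparameter *b64-alphabet*
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
  "The 64 symbols of the RFC 4648 base64 alphabet, indexed by code.")

(defun table-b64-char (code)
  "Return the b64 character for the 6-bit CODE."
  (check-type code (integer 0 63))
  (char *b64-alphabet* code))

(defun table-char-b64 (char)
  "Return the 6-bit code for CHAR, or signal an error if it is not in the alphabet."
  (or (position char *b64-alphabet*)
      (error "~S is not a b64 character" char)))

;; Same round-trip check as above, against the table versions.
(loop for code from 0 to 63
      always (= code (table-char-b64 (table-b64-char code))))
;; => T
```

The trade-off is a linear search in position on the decode side, which is unlikely to matter for a 64-character string, in exchange for having no offsets to get wrong.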

We’ll continue on the path towards converting hex and b64 to each other in the next entry.