Emitting hex symbols

Diving into the deepest of the style warnings we get from running the previous code, the next functions to write are:

  • #'hex-char
  • #'hex-code

In the Common Lisp spec, the only restriction on how these characters are arranged is that they be in increasing order; that is, the code for #\b must be bigger than that of #\a. Actual implementations, as of this writing, all use ASCII for the alphanumeric characters. This means I could hard-code the values, but since I want a bit more flexibility here, and every encoding I am aware of keeps the digits and letters grouped and in order, I will do something a bit trickier.
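
Since that grouping is an assumption rather than a guarantee, a quick REPL check can confirm that #\a through #\f are contiguous in a given implementation:

(loop for ch across "abcdef"
      for expected from (char-code #\a)
      always (= (char-code ch) expected))

=> T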

Say we want to convert 12 to #\c and back again. Rather than use a hardcoded table, we can look up the code for #\a, which we know represents 10, and shift over +2 to get the code for #\c. Anything between 0 and 9 is even easier, as the only shift is its distance from #\0.
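
To make the shift concrete, here is what it looks like at the REPL on an ASCII-based implementation (the specific codes are the assumption here):

(char-code #\a)                      ; => 97
(code-char (+ (char-code #\a) 2))    ; => #\c
(- (char-code #\7) (char-code #\0))  ; => 7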

A simplified version of this would look like:

(defun hex-char (hex)
  (let ((0-code (char-code #\0))
        (a-code (char-code #\a)))
    (if (<= hex 9)
        (+ hex 0-code)
        (+ hex -10 a-code))))

The basic idea is to look up the codes for #\a and #\0, then offset the input number from there.
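
As a quick sanity check at the REPL (note that this simplified version returns raw character codes, not characters yet):

(hex-char 5)              ; => 53, the code for #\5
(code-char (hex-char 12)) ; => #\c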

This works, but there are a couple of gotchas to address. For one, the input isn't sanitized: it would accept inputs larger than 15 and then produce invalid results. We should also add an &key parameter for choosing whether to uppercase or lowercase the result.

This presents us with a decision: where to catch errors. The most built-in style, I think, is to create hex-code and hex-char types and let type checks handle the assertions. The types will be:

  • 'hex-char
  • 'hex-code

It is normal in Common Lisp for a type to share its name with a function, since types and functions live in separate namespaces. It is a Lisp-N, after all. Those types can be defined in terms of satisfies predicates:

(deftype hex-char ()
  "A hex-char is one of 0-9, A-F, or a-f."
  `(satisfies hex-char-p))

(deftype hex-code ()
  "A hex-code is an integer between 0 and 15 inclusive."
  `(satisfies hex-code-p))

Which leaves the predicates to be defined:

(defun hex-char-p (char)
  "A hex char is one of 0-9, A-F, or a-f."
  ;; Guard against non-characters: a satisfies predicate may be called
  ;; with any object, and char-code would error on a non-character.
  (and (characterp char)
       (let ((code (char-code char))
             (lower-a-code (char-code #\a))
             (lower-f-code (char-code #\f))
             (upper-a-code (char-code #\A))
             (upper-f-code (char-code #\F))
             (0-code (char-code #\0))
             (9-code (char-code #\9)))
         (or (and (>= code lower-a-code)
                  (<= code lower-f-code))
             (and (>= code upper-a-code)
                  (<= code upper-f-code))
             (and (>= code 0-code)
                  (<= code 9-code))))))

(defun hex-code-p (code)
  "A hex-code is an integer between 0 and 15 inclusive."
  (typep code '(integer 0 15)))

Notice how much code is repeated. I don't particularly like this, and may write a macro to generate all of the boilerplate. My usual way of making macros is to write the function they should output first, then backtrack to build a Lisp-generating function, then wrap that in a macro if needed to make the syntax more predictable. That comes later, though, if at all.
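
With the types in place, typep does the range checking for us. For illustration:

(typep #\a 'hex-char) ; => T
(typep #\g 'hex-char) ; => NIL
(typep 15 'hex-code)  ; => T
(typep 16 'hex-code)  ; => NIL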

Now, to add type-checking to the hex-char function, as well as accounting for upper-case:

(defun hex-char (code &key (case :lower))
  "Return the hex char for the code where 0->0 and 15->f. If the code is above
  9, and case is :upper, return an upper-case char, otherwise the default is
  :lower to return a lower-case char."
  (assert (typep code 'hex-code))
  (assert (or (eq case :lower) (eq case :upper)))
  (code-char (if (<= code 9)
                 (+ code (char-code #\0))
                 (+ code -10 (if (eq case :lower)
                                 (char-code #\a)
                                 (char-code #\A))))))

There are a few ways I could have arranged this. Since it is likely to be called in a tight loop, I tried to use as few branches as possible. Since each offset is only used once, I dropped the let; it was useful for clarity but not for performance. We could go further and swap the assertions for declarations, but I take safety over performance when it comes to cryptographic code.
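
For illustration, that declaration-based variant might look like the following sketch (hex-char-fast is a made-up name, and this is exactly the safety trade-off being avoided):

(defun hex-char-fast (code)
  ;; The declarations are promises to the compiler, not runtime checks.
  (declare (type (integer 0 15) code)
           (optimize (speed 3) (safety 0)))
  (code-char (if (<= code 9)
                 (+ code (char-code #\0))
                 (+ code -10 (char-code #\a)))))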

A couple of quick tests at the REPL confirm this only accepts proper hex codes and produces the required hex chars. Next up is the inverse function.
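
Those tests might look something like:

(hex-char 0)               ; => #\0
(hex-char 15)              ; => #\f
(hex-char 15 :case :upper) ; => #\F
(hex-char 16)              ; the assert signals an error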

We don’t need a case keyword here, since the function should accept both lower and upper case. We can use the type spec as well, right from the start this time, since it is already written.

I can see two ways to write this. First, we could find which range the character’s code falls in, then offset to get the hex code:

(defun char-hex (char)
  "Return the 4-bit hex integer represented by the character."
  (assert (typep char 'hex-char))
  (let* ((code (char-code char))
         (lower-a-code (char-code #\a))
         (lower-f-code (char-code #\f))
         (upper-a-code (char-code #\A))
         (upper-f-code (char-code #\F))
         (0-code (char-code #\0))
         (9-code (char-code #\9)))
    (cond ((and (>= code 0-code)
                (<= code 9-code))
           (- code 0-code))
          ((and (>= code lower-a-code)
                (<= code lower-f-code))
           (- code -10 lower-a-code))
          ((and (>= code upper-a-code)
                (<= code upper-f-code))
           (- code -10 upper-a-code)))))

Alternatively, since the type check already constrains the input to a valid hex character, we can just subtract the right offset:

(defun char-hex-2 (char)
  "Return the 4-bit hex integer represented by the character."
  (assert (typep char 'hex-char))
  (- (char-code char) (if (digit-char-p char)
                          (char-code #\0)
                          (- (if (lower-case-p char)
                                 (char-code #\a)
                                 (char-code #\A))
                             10))))
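
A quick check that the two versions agree:

(char-hex #\b)   ; => 11
(char-hex-2 #\B) ; => 11
(char-hex #\9)   ; => 9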

I wonder which is faster in a tight loop. They are both the same complexity, since the input is a single character, but the second one looks like it would have less branching, so my guess is the second one.

Wrong! Apparently the first one is twice as quick on my machine for alphabetic characters. For digits the difference is smaller, since there is one fewer branch. Interestingly, the first one seems to be consistent in timing regardless of the input. I wonder what the compiler is doing and how declarations would impact it. Anyway, we will go with the first one.
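
A minimal way to compare them at the REPL (results will vary by implementation and machine) is something like:

(time (loop repeat 10000000 do (char-hex #\c)))
(time (loop repeat 10000000 do (char-hex-2 #\c)))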

Let’s see if going from int to char and back again works for all valid inputs:

(loop for test from 0 to 15
      do (assert (= test (char-hex (hex-char test))))
      finally (return t))

=> T