\& = \1 + \2. Or, baby steps in Lisp regexp

Between writing news items, I sometimes twiddle with little pieces of Lisp.

Bob Wiley: Baby steps?

Dr. Leo Marvin: It means setting small, reasonable goals.

What About Bob?

Here is a tiny function that I’m playing with, eventually to become part of a larger program.

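(defun gijs-subhead ()
  "Wrap subheadings and headlines in HTML <h5> tags."
  (interactive)
  (goto-char (point-min))
  ;; Group 1: at least two characters, ending in a letter or digit.
  ;; Group 2: the last character of the line, unless it is . \ | or ".
  (while (re-search-forward "\\(.+?[A-Z0-9a-z]\\)\\([^\.\\|\"]$\\)" nil t)
    ;; \\& inserts the whole match, that is, group 1 plus group 2.
    (replace-match "<h5>\\&</h5>" t nil)))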

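This goes over a text, finds all lines that don’t end in a . (full stop) or a " (quote mark) and puts them in HTML tags. Strictly speaking, the bracket expression also excludes lines ending in \ or |, because \\| has no special meaning inside square brackets.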
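To see which lines the pattern picks up, here is a minimal sketch that tests single lines as strings. The name subhead-line-p and the sample sentences are made up for illustration; only the regexp comes from the function above. It assumes each string is one line without newlines.

```elisp
;; Hypothetical predicate, not part of the original function.
(defun subhead-line-p (line)
  "Return t if LINE (a single line, no newlines) matches the subheading regexp."
  (and (string-match-p "\\(.+?[A-Z0-9a-z]\\)\\([^\.\\|\"]$\\)" line) t))

(mapcar #'subhead-line-p
        '("Tax evasion"                       ; no closing punctuation
          "The minister denied everything."   ; ends in a full stop
          "He called it \"nonsense\""))       ; ends in a quote mark
;; ⇒ (t nil nil)
```

Only the first line would be wrapped in tags; the other two end in one of the excluded characters.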

It took me a while to understand why my earlier incantation was always eating the last character. For example, a subheading in a text would be ‘Tax evasion’ (without the single quote marks) and my function would change it into ‘Tax evasio’.

My error was in how I understood replace-match.

I first used this:

(replace-match "<h5>\\1</h5>" t nil)

but the correct version is:

(replace-match "<h5>\\&</h5>" t nil)
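The two replacement strings can be compared side by side with a small sketch. The helper subhead-apply-template is hypothetical and exists only for experimenting: it runs the same search in a temporary buffer and takes the replacement string as an argument.

```elisp
;; Hypothetical helper for experimenting; not part of the original function.
(defun subhead-apply-template (text template)
  "Return TEXT after replacing every subheading match with TEMPLATE."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (re-search-forward
            "\\(.+?[A-Z0-9a-z]\\)\\([^\.\\|\"]$\\)" nil t)
      (replace-match template t nil))
    (buffer-string)))

(subhead-apply-template "Tax evasion" "<h5>\\1</h5>")
;; ⇒ "<h5>Tax evasio</h5>"   (first group only, last character lost)
(subhead-apply-template "Tax evasion" "<h5>\\&</h5>")
;; ⇒ "<h5>Tax evasion</h5>"  (whole match)
(subhead-apply-template "Tax evasion" "<h5>\\1\\2</h5>")
;; ⇒ "<h5>Tax evasion</h5>"  (both groups, same as the whole match)
```

replace-match always replaces the entire matched text, so anything the replacement string leaves out is gone from the buffer.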

To understand the difference, look at the re-search-forward string.

(re-search-forward "\\(.+?[A-Z0-9a-z]\\)\\([^\.\\|\"]$\\)" nil t)

The re-search-forward string consists of two parts, 1 and 2, each enclosed in parentheses (), and these are escaped by double backslashes \\. And there is, of course, the whole match, &. So, actually, as far as replace-match is concerned, there are three parts.

When replace-match takes only part ‘\\1’, it omits ‘\\2’. And that second part matches the last character of the line, which can be anything except the full stop or the quote mark. Hence, \\1 = \\& - \\2.
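The three parts can be inspected directly with match-string, where 0 is the whole match and 1 and 2 are the groups. This is a small sketch on a string rather than a buffer; the function name subhead-match-parts is made up for the example.

```elisp
;; Hypothetical helper, not part of the original function.
(defun subhead-match-parts (line)
  "Return (WHOLE GROUP1 GROUP2) for LINE, or nil if the regexp does not match."
  (when (string-match "\\(.+?[A-Z0-9a-z]\\)\\([^\.\\|\"]$\\)" line)
    (list (match-string 0 line)     ; \\& in replace-match
          (match-string 1 line)     ; \\1
          (match-string 2 line))))  ; \\2

(subhead-match-parts "Tax evasion")
;; ⇒ ("Tax evasion" "Tax evasio" "n")
```

Concatenating the second and third elements gives the first, which is the \& = \1 + \2 of the title.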

Thanks for your patience, Cecil.

Gijs Hillenius
Context for Digital Government

Policy specialist on open source in public services, knowledge transfer expert
