💾 Archived View for idiomdrottning.org › strse captured on 2024-08-31 at 14:32:06. Gemini links have been rewritten to link to archived content


strse

Strse (rhymes with terse) is a string DSL for Scheme.

(strse "this is freaking awesome"
        "is" "at"
        (only second "at") "was so"
        (only third 'word) string-upcase
        "frea" "ve"
        "king" "ry"
        (=> adjective "very") (conc adjective ", " adjective)
        (only last 'word) "nice")

⇒ “that was SO very, very nice”

The first argument is the source string, followed by any number of alternating search patterns and replacement expressions.

Search patterns

Strse is nothing but a thin, glory-hogging, unnecessary veneer on top of Alex Shinn’s wonderful irregex, so search patterns can be either Unix-style regex strings or s-expression-style SREs.

(strse "banana" "na$" "lity")

⇒ “banality”

(strse "banana" 'eol "rama")

⇒ “bananarama”

Replacement expressions

Each replacement expression is a single expression (but one that has access to all of Scheme, including begin, let, etc.).

Replacement expressions have access to two anaphoric variables. The name it is bound to the whole input string (as it stands at the current step), and you can get the match and submatches by giving numeric arguments to m: (m 0) is the whole match, (m 1) the first submatch, and so on.

If a replacement expression evaluates to a string, that becomes the replacement text for the match.

If it evaluates to a procedure, it is applied to the matched substring (as a whole, not considering submatches). If that outputs a string, that becomes the replacement text for the match.
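The rule for what a replacement may evaluate to can be modelled as a small dispatch. This is a minimal R7RS sketch, not strse’s actual code: resolve-replacement is a hypothetical name, and like the procedure case described above it only ever sees the whole match, never the submatches.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical helper (not part of strse): decide what text replaces
;; MATCHED, given the value REPLACEMENT evaluated to.
(define (resolve-replacement replacement matched)
  (cond ((string? replacement) replacement)        ; a string is used as-is
        ((procedure? replacement)
         (let ((result (replacement matched)))     ; applied to the whole match
           (if (string? result)
               result
               (error "sketch only handles string results" result))))
        (else (error "sketch only handles strings and procedures"
                     replacement))))
```

So (resolve-replacement string-upcase "so") gives "SO", which mirrors how string-upcase turns the third word into “SO” in the opening example.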

If you supply a literal SRE, all named submatches are bound to their names!

(strse "oh my word" (: word " " (=> pronoun word)) pronoun)

⇒ “my word”

(strse "oh my word" (: word " " (=> pronoun word))
       (conc pronoun pronoun pronoun))

⇒ “mymymy word”

What the heck is a “literal SRE”? Normally, irregex SREs are quoted symbols or lists.

(strse "all vampires are named" 'word "dracula")

⇒ “dracula dracula dracula dracula”

But if strse sees a pair that does not start with quote or quasiquote, it binds the named submatches it finds in there (and then adds the quote for you). Atoms are not messed with, so you can supply previously bound regexes:

(let ((lucy '(: word space word)))
  (strse "all vampires are named" lucy "dracula"))

⇒ “dracula dracula”

In that case, it can’t see inside of those regexes in order to bind submatches to names.

Replacement operators

You can optionally add a single operator to a pattern/replacement pair.

If you don’t, you get your garden-variety replace-all, in one pass.

The then, entire, and return operators go on the replacement; the others (recursively, truly, and only) go on the search pattern.

then

If you just want to execute side-effects on a match without changing the string, wrap them in a then special form. This also provides an implicit begin.

(strse "hippopotamus"
       "elephant" (then (print "I saw an elephant!"))
       "hippo" (then (print "I saw a hippo!"))
       "tiger" (then (print "I saw a cat!")))

I saw a hippo!

⇒ “hippopotamus”

Another example:

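(define (acc)
  ;; Returns an accumulator: call it with one argument to add that
  ;; argument to the front of a list, or with no arguments to get the list.
  (let ((things '()))
    (lambda thing
      (if (null? thing)
          things
          (set! things (cons (car thing) things))))))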

(define (extract str)
  (define digs (acc))
  (define words (acc))
  (strse str
     (= 3 num) (then (digs (string->number (m 0))))
     (+ alpha) (then (words (m 0))))
  (list (digs) (words)))

(extract "it will get 234 and 123 and 747 but not 1983 or 42 but then again 420")

⇒ ((420 198 747 123 234) (“again” “then” “but” “or” “not” “but” “and” “and” “get” “will” “it”))

entire

Replace the entire string, not just the matched part, if there is a match. This also provides an implicit begin.

(strse "chirp chirp birds"
       "chir" "shee"
       "sheep" (entire "The sentence got woolly"))

⇒ “The sentence got woolly”

recursively

Keep running the same replacement over and over. This can hang unless the search eventually stops matching, but it can be really handy as long as you are careful.

(strse "aaaaaaaah!" "aa" "a")

⇒ “aaaah!”

(strse "aaaaaaaah!" (recursively "aa") "a")

⇒ “ah!”
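To see why the two results differ, here is a rough R7RS model that searches for a literal substring instead of an irregex pattern. replace-all and replace-recursively are hypothetical names, and stopping once the pattern no longer matches is an assumption drawn from the warning above; like the real thing, it loops forever if the replacement keeps reintroducing a match.

```scheme
(import (scheme base))

;; Index of the first occurrence of NEEDLE in HAY at or after START, or #f.
;; NEEDLE must be non-empty, or replace-all below would never advance.
(define (find-from hay needle start)
  (let ((n (string-length needle))
        (h (string-length hay)))
    (let loop ((i start))
      (cond ((> (+ i n) h) #f)
            ((string=? (substring hay i (+ i n)) needle) i)
            (else (loop (+ i 1)))))))

;; One pass: replace every non-overlapping occurrence, left to right.
(define (replace-all str needle replacement)
  (let ((out (open-output-string)))
    (let loop ((start 0))
      (let ((hit (find-from str needle start)))
        (if hit
            (begin
              (write-string (substring str start hit) out)
              (write-string replacement out)
              (loop (+ hit (string-length needle))))
            (begin
              (write-string (substring str start (string-length str)) out)
              (get-output-string out)))))))

;; Keep making passes for as long as the needle is still found.
(define (replace-recursively str needle replacement)
  (if (find-from str needle 0)
      (replace-recursively (replace-all str needle replacement)
                           needle replacement)
      str))
```

One pass halves the eight a’s to four; the recursive version keeps halving (four, two, one) until no "aa" is left.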

truly

Keep going as normal if there is a match, but if there isn’t, stop strse and return #f without evaluating any further.

(strse "parrot"
       (truly "a") (begin (print "Found a") "i")
       (truly "e") (begin (print "Found e") "i")
       (truly "o") (begin (print "Found o") "i"))

Found a

⇒ #f

return

If there’s a match, stop strse and return the value (it doesn’t have to be a string, and if it’s a procedure, it’s applied to (m 0)). This provides an implicit begin, but you don’t have to use side effects; a pure expression works fine here. You have access to named groups.

(strse "bamana" "x" (return 'nope) "m" (return (list 'a 'b 'c (m 0))) "a" "j")

⇒ (a b c “m”)

If there’s no match, the string passes through unchanged and strse keeps going with any remaining operations:

(strse "bamana" "x" (return 'nope) "o" (return (list 'a 'b 'c (m 0))) "a" "j")

⇒ “bjmjnj”
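Both truly and return can end the whole strse call early. One way to model that early exit is an escape continuation. In this hypothetical R7RS sketch the patterns are single characters rather than irregexes, and run-steps, truly-char, return-char, and replace-char are made-up names, not part of strse.

```scheme
(import (scheme base))

;; Run each step on the string in turn. A step receives the current
;; string and ESCAPE; calling ESCAPE ends the whole run with that value.
(define (run-steps str steps)
  (call-with-current-continuation
   (lambda (escape)
     (let loop ((s str) (steps steps))
       (if (null? steps)
           s
           (loop ((car steps) s escape) (cdr steps)))))))

(define (has-char? s ch)
  (and (memv ch (string->list s)) #t))

(define (swap-char s from to)
  (string-map (lambda (c) (if (char=? c from) to c)) s))

;; Like truly: replace on a match, otherwise stop everything with #f.
(define (truly-char from to)
  (lambda (s escape)
    (if (has-char? s from)
        (swap-char s from to)
        (escape #f))))

;; Like return: on a match, stop everything with VALUE;
;; otherwise pass the string along unchanged.
(define (return-char ch value)
  (lambda (s escape)
    (if (has-char? s ch)
        (escape value)
        s)))

;; A plain replace-all step that never exits early.
(define (replace-char from to)
  (lambda (s escape)
    (swap-char s from to)))
```

For example, (run-steps "parrot" (list (truly-char #\a #\i) (truly-char #\e #\i))) gives #f, because the second step finds no e, and (run-steps "bamana" (list (return-char #\x 'nope) (return-char #\o 'found) (replace-char #\a #\j))) falls through both return steps and gives "bjmjnj".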

only

Replace just one match even if there are more. You need to supply either a list index (zero-based; negative numbers count from the back, so -2 is the second-to-last and -1 is the last) or a list accessor function like cadr or last.
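The selector rule can be sketched as follows. select-match is a hypothetical R7RS helper that picks one item from a list of matches; it illustrates the indexing convention only, not how strse performs the replacement afterwards.

```scheme
(import (scheme base))

;; Hypothetical helper: pick one item from MATCHES. SELECTOR is either
;; a zero-based index (negative indices count from the back, so -1 is
;; the last item) or an accessor procedure such as cadr.
(define (select-match matches selector)
  (cond ((procedure? selector) (selector matches))
        ((negative? selector)
         (list-ref matches (+ (length matches) selector)))
        (else (list-ref matches selector))))
```

With three matches, 1, -2, and cadr all pick the second one.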

strse?

(strse? str reg)

Returns #t if reg matches somewhere in str, and #f otherwise.

(strse? reg)

Returns a predicate that takes a str argument and checks if reg is in it.

In other words, it’s curried on its second argument: a kind of backwards currying, but often convenient.
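Here is a minimal sketch of that calling convention using R7RS case-lambda. It searches for a literal substring instead of an irregex, and matches? is a hypothetical name, not strse’s implementation.

```scheme
(import (scheme base) (scheme case-lambda))

;; Literal substring search, standing in for an irregex search.
(define (contains? str needle)
  (let ((n (string-length needle))
        (h (string-length str)))
    (let loop ((i 0))
      (cond ((> (+ i n) h) #f)
            ((string=? (substring str i (+ i n)) needle) #t)
            (else (loop (+ i 1)))))))

;; Hypothetical matches?: with two arguments it answers directly;
;; with only the pattern it returns a predicate awaiting the string.
(define matches?
  (case-lambda
    ((str needle) (contains? str needle))
    ((needle) (lambda (str) (contains? str needle)))))
```

The one-argument form is what makes (map (matches? "a") '("cat" "dog")) read naturally; it gives (#t #f).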

strse*

Like strse, but works on anything.

(strse* '(SOME (OLD STYLE) LISP) 'word string-downcase)
⇒ (some (old style) lisp)

(strse* 3422 "2" "1")
⇒ 3411
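One plausible way to make a string tool work on anything is to print the value as text, transform the text, and read the result back. This R7RS sketch assumes that approach; via-string is a hypothetical name, and this is not a description of strse*’s internals.

```scheme
(import (scheme base) (scheme char) (scheme read) (scheme write))

;; Render any value as text, the same way write would print it.
(define (value->string x)
  (let ((port (open-output-string)))
    (write x port)
    (get-output-string port)))

;; Hypothetical: transform the printed form of X with string procedure F,
;; then read the resulting text back in as a Scheme value.
(define (via-string f x)
  (read (open-input-string (f (value->string x)))))
```

(via-string string-downcase '(SOME (OLD STYLE) LISP)) gives (some (old style) lisp). Using write rather than display keeps strings quoted, so a string nested inside a list survives the round trip as a string.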

strse*?

Like strse? but works on anything.

(map (strse?* "1") (iota 13))
⇒ (#f #t #f #f #f #f #f #f #f #f #t #t #t)

Porting from the old version of strse

The old version of strse let you jam extra magic booleans and numbers in there, and old code that does so still works. Yay cruft in the name of backwards compatibility!

The point of the new version is to be more consistent, alternating patterns and replacements.

Here is a Rosetta from old to new. In each pair, the first line is the old style and the second line is the new style.

(strse s foo bar)
(strse s foo bar)

(strse s foo (then bar))
(strse s foo (then bar))

(strse s foo bar #f)
(strse s (truly foo) bar)

(strse s foo bar 0)
(strse s (entire foo) bar)

(strse s foo bar 3)
(strse s (only third foo) bar)
;; or:
(strse s (only 2 foo) bar)

Any of these work: a list accessor function like first, third, or last; a zero-based non-negative index; or a negative index counting from the back. Note that the old index was one-based, so old 3 becomes new 2.

Currying

From strse 1.38, both strse and strse* are also available curried. Just elide the initial string to get a procedure that calls strse on that string, like this:

(map (strse "a" "b") '("banan" "abba" "citron"))
⇒ ("bbnbn" "bbbb" "citron")
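A curried entry point like this can be sketched by checking the argument count: an odd count means the source string is present, an even count means it was left out. That parity test is an assumption for this sketch (old-style extra arguments would complicate it), and char-rewrite, which swaps single characters instead of matching patterns, is a hypothetical name.

```scheme
(import (scheme base))

;; Hypothetical: (char-rewrite str from1 to1 from2 to2 ...) swaps
;; characters pair by pair; leaving out STR returns a procedure instead.
(define (char-rewrite . args)
  (if (even? (length args))
      ;; No source string: return a procedure waiting for one.
      (lambda (str) (apply char-rewrite str args))
      (let loop ((s (car args)) (pairs (cdr args)))
        (if (null? pairs)
            s
            (let ((from (car pairs)) (to (cadr pairs)))
              (loop (string-map (lambda (c) (if (char=? c from) to c)) s)
                    (cddr pairs)))))))
```

(map (char-rewrite #\a #\b) '("banan" "abba" "citron")) gives ("bbnbn" "bbbb" "citron"), the same shape as the curried strse call.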

Source code

For a repo:

git clone https://idiomdrottning.org/strse
