💾 Archived View for gemini.susa.net › notes_on_my_search_script.gmi captured on 2022-06-03 at 22:52:12. Gemini links have been rewritten to link to archived content


Notes on a simple site-search script

The scripts described here depend on Bash, Ack, and Awk to search the site contents and present them in a readable format.

Ack is a grep-like utility that tends to be my first choice for searching files, mainly because it produces nice terminal output and searches recursively by default.

I have used Ack here, but grep would do just as well. Ack is written in Perl, so every search launches a fairly heavy interpreter, though I don't care so much right now.
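As a rough illustration of the "grep would do just as well" point, here is a minimal sketch of a grep-based stand-in. The function name grep_search is hypothetical and this is not the code the site runs. It assumes a grep that supports -r and --exclude-dir (GNU grep does), and uses -F because the sanitised term is treated as a literal string.

```shell
#!/usr/bin/env bash
# Hypothetical grep-based stand-in for the ack call; not the author's code.
# Usage: grep_search TERM [DIR_TO_SKIP...]
grep_search() {
    local term="$1"; shift
    local -a exclude=()
    local d
    for d in "$@"; do
        exclude+=("--exclude-dir=${d}")
    done
    # -r recurse, -i ignore case, -n print line numbers, -F literal string.
    # Searching "." prefixes every path with "./", which sed strips so the
    # output keeps the path:line:text shape.
    grep -rinF "${exclude[@]}" -- "${term}" . | sed 's|^\./||'
}
```

Run from the content directory, something like `grep_search "keyword" cgi-bin gsps` should emit one path:line:text record per match, which is the shape the Awk formatter expects.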

If QUERY_STRING is empty, the script responds with status 10 to prompt for a search term. Otherwise it decodes and sanitises the query string and invokes ack. The results are piped through an awk script that converts each filename into a Gemtext link, and lists the matching lines and line numbers for that file.

The variable IGNORE_DIRS allows me to specify a space-separated list of directories to omit when searching (e.g. cgi-bin). The query string is very heavily sanitised to avoid gotchas with possible code injection; this is a Bash script after all. Only alphanumeric characters, spaces and underscores are currently allowed. Edit the 'tr' call to extend the set.
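To make the whitelist behaviour concrete, here is a small sketch. The helper names sanitise and sanitise_with_hyphen are hypothetical wrappers around the same kind of tr call. Note that the set is written without surrounding square brackets, since tr treats those as literal characters to keep.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the whitelist filter; the name is illustrative.
sanitise() {
    # -c complements the set, -d deletes: every character NOT listed is dropped.
    printf '%s' "$1" | tr -cd 'A-Za-z0-9 _'
}

# Variant that also allows a hyphen. A trailing '-' in a tr set is literal.
sanitise_with_hyphen() {
    printf '%s' "$1" | tr -cd 'A-Za-z0-9 _-'
}

sanitise 'notes; $(rm -rf /)'; echo
sanitise_with_hyphen 'site-search'; echo
```

Shell metacharacters such as ;, $, ( and / never reach the search command. If hyphens are allowed, a term beginning with one could be mistaken for a command-line option, so the pattern would need guarding in that case.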

The search script can be run from the command line for testing just by setting the QUERY_STRING variable:

QUERY_STRING="some_keyword" ./search
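Since a client sends the query percent-encoded, it can be worth testing the decoding step on its own too. The sketch below copies the urldecode function from the script so it can be tried in isolation. The sample inputs are made up for illustration.

```shell
#!/usr/bin/env bash
# Copy of the urldecode function from the search script, for experimenting.
urldecode() {
    # Replace every '+' with a space; ':' is a no-op that leaves the
    # expanded string in $_ for the next command.
    : "${*//+/ }"
    # Turn each '%' into '\x' so echo -e interprets the hex escapes.
    echo -e "${_//%/\\x}"
}

urldecode 'gemini%20search'
urldecode 'notes+on+awk'
```

The same encoded value can then be passed to the script itself, e.g. QUERY_STRING="gemini%20search" ./search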

cgi-bin/search

#!/bin/bash

function urldecode() {
    # Replace-ALL (//) '+' with <space>
    : "${*//+/ }";
    # Replace-ALL (//) '%' with escape-x and evaluate (-e) on echo 
    echo -e "${_//%/\\x}";
}

if [[ "${QUERY_STRING}" == "" ]]; then
    echo -ne "10 Please enter a search term\r\n"
    exit
fi

echo -ne "20 text/gemini\r\n"

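# Keep only alphanumerics, space and underscore. The set is written without
# square brackets, because tr would treat '[' and ']' as characters to keep.
DECODED=$(urldecode "${QUERY_STRING}"|tr -cd 'A-Za-z0-9 _')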

echo "# Search results for: ${DECODED}"

IGNORE_DIRS="cgi-bin gsps"
IGNORE_PARAM=''

for d in ${IGNORE_DIRS}; do
    IGNORE_PARAM+="--ignore-dir=${d} "
done

echo "Ignoring '${IGNORE_PARAM}'"

cd ../../content/
/usr/bin/ack -i ${IGNORE_PARAM}  "${DECODED}"|./cgi-bin/ack2gemtext.awk
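# The IGNORE_DIRS loop above builds one space-separated string and relies on
# word splitting when it is passed to ack. A possible alternative, sketched
# here with a Bash array, also copes with directory names containing spaces.
# build_ignore_args and IGNORE_ARGS are hypothetical names, not part of the
# script above.

```shell
#!/usr/bin/env bash
# Hypothetical array-based variant of the IGNORE_DIRS loop.
build_ignore_args() {
    local d
    IGNORE_ARGS=()
    for d in "$@"; do
        # Each directory becomes exactly one argument, spaces included.
        IGNORE_ARGS+=("--ignore-dir=${d}")
    done
}

build_ignore_args cgi-bin gsps "old notes"
printf '%s\n' "${IGNORE_ARGS[@]}"
```

# The array would then be expanded quoted in the search call, e.g.
# /usr/bin/ack -i "${IGNORE_ARGS[@]}" "${DECODED}"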

cgi-bin/ack2gemtext.awk

#!/usr/bin/awk -f

BEGIN {
    FS = ":";
    filename=""
}

{
    if($1 != filename) {
        print "=> /" $1 " " $1 "\n";
        filename = $1
    }
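    # Take everything after the second colon, so matched lines that
    # themselves contain colons are not cut short.
    text = substr($0, length($1) + length($2) + 3);
    print "(Line " $2 "): " text;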
}
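
The formatting step can be tried without ack by feeding it hand-written path:line:text records. This is a minimal sketch: to_gemtext is a hypothetical wrapper holding an inline copy of the Awk logic, the sample records are invented, and the matched text is taken as everything after the second colon.

```shell
#!/usr/bin/env bash
# Inline copy of the formatting logic, for experimenting without ack.
to_gemtext() {
    awk -F: '
        $1 != filename {
            # New file: emit a Gemtext link line followed by a blank line.
            print "=> /" $1 " " $1 "\n"
            filename = $1
        }
        {
            # Everything after the second colon is the matched text.
            text = substr($0, length($1) + length($2) + 3)
            print "(Line " $2 "): " text
        }'
}

printf '%s\n' \
    'notes/awk.gmi:3:Awk is handy' \
    'notes/awk.gmi:9:See also: sed' \
    'index.gmi:1:# Notes on awk' | to_gemtext
```

Each file should appear once as a link line, followed by its matching lines in the order they were received.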