HTTPS->HTTP Proxy for Plucker

Entered on Sony PEG UX50

20190823

So, plucker did not show up when I ran apt search plucker on my Raspberry Pi. I tried installing it manually, and that failed. On a whim I installed the Synaptic package manager and searched there. To my surprise, plucker was on the list! Step one was complete.

I was able to make plucker documents from my terminal but, as is commonly known, only for pages reachable over plain http. So, to build plucker docs from sites that use https, one needs a proxy that fetches the page over https and hands it back over http. Since no one should allow the unwashed masses free rein to move web traffic through their machine, it makes sense to run this sort of thing only on a web server that the outside world cannot touch.

I set up apache2 on the Raspberry Pi of Anguish since it is not an Internet-facing machine, and I am comfy with apache, having used it for over 20 years. I made sure cgi scripts were enabled on this installation. Then I started throwing together some curl, a bucket of sed, a pinch of awk… It got ugly pretty quickly.
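For anyone repeating the setup, here is a rough sketch of the apache side. It assumes a Debian-style apache2 package, where a2enmod switches modules on and /cgi-bin/ is served from /usr/lib/cgi-bin, and it assumes the script given further down has been saved as doit.cgi in the current directory. The function name install_proxy_cgi is made up for illustration. Other distributions lay things out differently.

```shell
# Sketch only: assumes Debian-style apache2 and a local file named doit.cgi.
# install_proxy_cgi is a hypothetical helper name, not part of any package.
install_proxy_cgi() {
  # Turn on CGI support.
  sudo a2enmod cgi
  # Copy the script into the default cgi-bin directory and make it executable.
  sudo install -m 755 doit.cgi /usr/lib/cgi-bin/doit.cgi
  # Restart apache so the module change takes effect.
  sudo systemctl restart apache2
}

# Nothing runs until you call it by hand:
#   install_proxy_cgi
```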

I have it working for the first page it hits. All links are changed to hit your proxy. Some sites work fairly well; some do not work at all. I am seeking help in improving this idea, as I cannot be alone in having a use for this, and plucker is not the only archaic piece of software that cannot surf https…
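To make "hit your proxy" concrete, here is a small sketch of what a proxied address looks like. It assumes the script is installed as cgi-bin/doit.cgi on a host named rpoa, which are the names used in the code; proxy_url is a made-up helper for illustration, and example.com is only a placeholder target.

```shell
# Hypothetical helper: print the plain-http address that asks the proxy for a page.
# $1 is the https target; $2 optionally overrides the proxy host (default: rpoa).
proxy_url() {
  printf 'http://%s/cgi-bin/doit.cgi?%s\n' "${2:-rpoa}" "$1"
}

proxy_url "https://example.com/"

# With the CGI script in place, the page can be fetched by hand like this:
#   curl -sS "$(proxy_url "https://example.com/")" | head
```

The resulting http address is what you would give plucker as its starting page instead of the https one.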

Anyway, the code follows:

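#!/bin/bash
# Depends on curl and gridsite-clients (for urlencode).
# Change this to your server.
server="rpoa"

# Address of this proxy; every rewritten link starts with it.
ths="http://$server/cgi-bin/doit.cgi?"
# The target URL arrives as $1; decode it in case it was percent-encoded.
req=$(urlencode -d "$1")
# Prefix for relative links: the proxy plus the re-encoded target URL.
req2="$ths$(urlencode "$req")"
# Prefix for root-relative links: the proxy plus https:// and the target's host.
dom="${ths}https://$(printf '%s' "$req" | awk -F'[/:]' '{print $4}')"

# Print your MIME header! Match case-insensitively (HTTP/2 sends lowercase
# header names), drop the trailing CR, and fall back to text/html if absent.
ctype=$(curl -sSI "$req" | grep -i '^content-type:' | head -n 1 | tr -d '\r' \
  | sed 's@text/plain@text/html@')
printf '%s\n\n' "${ctype:-Content-Type: text/html}"

# Harvest and massage your content, changing hyperlinks to this proxy.
# @ is the sed delimiter so the URLs need no escaping. The capture groups
# touch only the attribute that matched, not every href/src on the line.
# The last rule strips the prefix again from links that were already absolute.
curl -sS "$req" | sed \
  -e "s@href=\"\([[:alnum:]]\)@href=\"$req2\1@g" \
  -e "s@href=\"/@href=\"$dom/@g" \
  -e "s@src=\"\([[:alnum:]]\)@src=\"$req2\1@g" \
  -e "s@src=\"/@src=\"$dom/@g" \
  -e "s@\"${req2}http@\"http@g"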
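The link rewriting can be tried offline before pointing plucker at it. The sketch below is a variant of the sed step, not the script verbatim: it handles href only, uses capture groups so that only the matching attribute is changed, and hard-codes made-up values in place of what urlencode would produce. The host example.com, the assumed percent-encoded form of the target, and the function name rewrite_links are all illustrative.

```shell
#!/bin/bash
# Offline check of the link rewriting; needs no network and no urlencode.
# All values are made-up examples, not output from a real run.
ths="http://rpoa/cgi-bin/doit.cgi?"
req2="${ths}https%3A%2F%2Fexample.com%2F"   # assumed shape of the encoded target
dom="${ths}https://example.com"             # proxy prefix plus scheme and host

rewrite_links() {
  sed -e "s@href=\"\([[:alnum:]]\)@href=\"$req2\1@g" \
      -e "s@href=\"/@href=\"$dom/@g" \
      -e "s@\"${req2}http@\"http@g"
}

# One relative, one root-relative and one absolute link on the same line.
printf '%s\n' \
  '<a href="news.html">n</a> <a href="/about">a</a> <a href="https://other.org/">o</a>' \
  | rewrite_links
```

The relative and root-relative links come out pointing at the proxy, while the absolute link is left as it was. Note that an absolute https link is therefore still out of plucker's reach, which is one of the places this idea could use improvement.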