💾 Archived View for 1436.ninja › Phlog › 20190823.post captured on 2020-09-24 at 01:20:43.


# HTTPS->HTTP Proxy for Plucker
### Entered on Sony PEG UX50
### 20190823

So, plucker did not show up in `apt search plucker`
on my Raspberry Pi. I tried installing it manually, and
that failed. On a whim I installed the synaptic package
manager and searched there. To my surprise plucker was
on the list! Step one was complete.

I was able to make plucker documents from my
terminal, but as is commonly known, plucker can only
fetch pages that are reachable over http. So, to
make plucker docs from sites using https, one
needs a proxy that fetches the page over https and
serves it back over http. Since no one should
allow the unwashed masses free rein to move
web traffic through their machine, it makes sense
to only run this sort of thing on a web server that
the outside world cannot touch.

I set up apache2 on the Raspberry Pi of Anguish since
it is not an Internet-facing machine, and I am comfy
with apache, having used it for over 20 years. I made
sure cgi scripts were enabled on this installation.
Then I started throwing together some curl, a bucket of
sed, a pinch of awk… It got ugly pretty quick.
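
For anyone who has not written a CGI script before, here is a minimal sketch of the shape Apache expects from one: a header line, a blank line, then the body. The helper name `emit_page` and the sample markup are made up for illustration. On Debian-style installs the cgi module is usually switched on with `a2enmod cgi`, but that is an assumption about your setup, so check your own.

```shell
#!/bin/bash
# Minimal CGI sketch. emit_page is a hypothetical helper, not part of the proxy script.

# Print a CGI response: Content-Type header, blank line, then the body.
# $1 is the mime type, $2 is the body text.
emit_page() {
    printf 'Content-Type: %s\n\n' "$1"
    printf '%s\n' "$2"
}

emit_page "text/html" "<html><body><p>hello from cgi-bin</p></body></html>"
```

If the header or the blank line is missing, Apache generally answers with a server error instead of the page, which is why the proxy script prints its mime header before anything else.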

I have it working for the 1st page it hits. All links are
changed to hit your proxy. Some sites work fairly
well. Some not at all. I am seeking help in improving 
this idea as I cannot be alone in having a use for this,
and plucker is not the only archaic piece of software
that cannot surf https…
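
To make "all links are changed to hit your proxy" concrete, here is a small sketch of the rewrite for root-relative links only. The server name `rpoa` and the `doit.cgi` path are the ones used in this post; the function names and the sample HTML are invented for the example, and this is a simplified illustration rather than the exact sed used in the script.

```shell
#!/bin/bash
# Sketch only: url_host and rewrite_root_links are hypothetical helpers.

proxy="http://rpoa/cgi-bin/doit.cgi?"

# Pull the host out of a URL by splitting on "/" and ":" (field 4 is the host).
url_host() {
    printf '%s' "$1" | awk -F'[/:]' '{print $4}'
}

# Point root-relative href/src attributes at the proxy.
# $1 is the URL of the page being fetched; the HTML is read from stdin.
rewrite_root_links() {
    local prefix
    prefix="${proxy}https://$(url_host "$1")"
    sed -E "s@(href|src)=\"/@\1=\"${prefix}/@g"
}

printf '%s\n' '<a href="/about.html">About</a>' \
    | rewrite_root_links "https://example.com/index.html"
```

A link such as `/about.html` on `https://example.com/index.html` comes out as `http://rpoa/cgi-bin/doit.cgi?https://example.com/about.html`, so the next fetch goes through the proxy too. Links that start with `//` would also match and come out wrong; this sketch ignores that case.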

Anyway, the code follows:

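#!/bin/bash
# depends on curl and gridsite-clients (for urlencode)

# change this to your server
server="rpoa"

# proxy URL prefix, with slashes escaped for use inside sed expressions;
# the ? stays unescaped because GNU sed reads \? as a quantifier in patterns
ths="http:\/\/$server\/cgi-bin\/doit.cgi?"

# decoded target URL, and the proxy prefix plus the re-encoded target
req=$(urlencode -d "$1")
req2="$ths$(urlencode "$req")"

# proxy prefix plus scheme and host of the target, for root-relative links
dom=$ths"https:\/\/"$(printf '%s' "$req" | awk -F'[/:]' '{print $4}')

# print your mime header!
# (-i since HTTP/2 responses use lowercase header names; tr drops the CR)
printf "%s\n\n" "$(curl -sSI "$req" | grep -i "^Content-Type:" | tr -d '\r' \
  | sed 's/text\/plain/text\/html/')"

# harvest and massage your content, changing hyperlinks to this proxy
curl -sS "$req" | sed "/href=\"[[:alnum:]]/ s@href=\"@href=\"$req2@g; \
/href=\"\// s/href=\"/href=\"https:\/\/$dom/g; \
/src=\"[[:alnum:]]/ s@src=\"@src=\"$req2@g; \
/src=\"\// s/src=\"/src=\"$dom/g; \
s@${req2}http@http@g; s@https://http://@http://@g"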
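
The mime header step can be tried on its own without touching the network. The sketch below feeds canned response headers through the same kind of grep and sed pipeline; `content_type_header` is a hypothetical helper name and the sample headers are made up. It matches the header name case-insensitively, on the assumption that some servers (HTTP/2 ones in particular) send it in lowercase.

```shell
#!/bin/bash
# Sketch: pick the Content-Type line out of raw response headers on stdin.
# content_type_header is a hypothetical helper, not part of the original script.
content_type_header() {
    grep -i '^content-type:' | head -n 1 | tr -d '\r' \
        | sed 's/text\/plain/text\/html/'
}

# Sample headers as an HTTP/2 server might send them (lowercase name, CRLF endings).
printf 'HTTP/2 200\r\ncontent-type: text/plain; charset=utf-8\r\n\r\n' \
    | content_type_header
```

In the real script the input would be the output of `curl -sSI "$req"` rather than the canned sample.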