Web Adjuster

Web Adjuster is a Tornado-based, domain-rewriting proxy for applying custom processing to Web pages. It is particularly meant for users of smartphones etc., as these might not support browser extensions. Web Adjuster can:

Domain rewriting means you do not need to be able to change the device’s proxy settings—you simply go to a different address. However, only the domain part is different, so most in-site scripting should work as-is, without needing delicate alterations to its URI handling. For example, if you have a server called adjuster.example.org and you want to see www.example.com, simply go to www.example.com.adjuster.example.org. Your server ideally needs a wildcard domain, but you can manage without one in some cases, and Web Adjuster can also be a “real” HTTP proxy for local use on a desktop etc.
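To make the address mapping concrete, here is a minimal Python sketch of the two directions of domain rewriting, using the example names from the paragraph above. The functions real_host and adjusted_url are hypothetical illustrations, not Web Adjuster's own code. As a simplification, the sketch keeps the URL scheme unchanged and drops any explicit port or login details, which a real proxy would need to handle.

```python
from urllib.parse import urlsplit, urlunsplit

# The example server name used in the text above.
ADJUSTER_HOST = "adjuster.example.org"


def real_host(requested_host, adjuster_host=ADJUSTER_HOST):
    """Map e.g. 'www.example.com.adjuster.example.org' back to 'www.example.com'.

    Returns None for the adjuster's own name or for unrelated hosts.
    """
    host = requested_host.lower().split(":")[0]  # ignore any ':port' suffix
    suffix = "." + adjuster_host.lower()
    if host.endswith(suffix) and len(host) > len(suffix):
        return host[:-len(suffix)]
    return None


def adjusted_url(url, adjuster_host=ADJUSTER_HOST):
    """Rewrite an absolute link so that it goes through the adjuster.

    Only the host name changes; path, query and fragment are kept as they are.
    Simplification: the scheme is kept, and any port or login details are dropped.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        return url  # relative link: works unchanged, since only the domain differs
    return urlunsplit(parts._replace(netloc=parts.hostname + "." + adjuster_host))
```

For instance, adjusted_url("http://www.example.com/news?page=2") gives "http://www.example.com.adjuster.example.org/news?page=2", and real_host recovers www.example.com from the rewritten name. Relative links such as /images/a.png pass through untouched, which is why in-site scripting usually keeps working.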

Because it is based on a single-threaded, event-driven Tornado server, Web Adjuster can handle connections efficiently even on a low-power machine like the original Raspberry Pi. (Add-on programs run in other threads, but this is seldom a slow-down in practice.) Tornado also makes Web Adjuster easier to set up: it is a separate, self-contained server that doesn’t need to be worked into another server’s configuration. It can listen on an alternate port (and can be password protected), but if you prefer you *can* configure it to share port 80 with another server.

Installation

1. Make sure Python and Tornado are on the system. (Web Adjuster has been tested in both Python 2 and Python 3, and with Tornado versions between 2.x and 6.x.)

2. Download adjuster.py
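A quick check of what is installed can save time before going further. The sketch below is a hypothetical helper (check_environment and parse_version are not part of Web Adjuster): it reports the Python version and the installed Tornado version, whether that Tornado version falls within the 2.x to 6.x range mentioned above, and whether it is 5.1.1 or below, which WSGI mode on AppEngine needs because Tornado 6 dropped WSGI functionality.

```python
import re
import sys


def parse_version(version):
    """Turn a version string such as '5.1.1' into a tuple such as (5, 1, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def check_environment(tornado_version=None):
    """Report the Python and Tornado versions found, as a dict.

    If tornado_version is None, try to import Tornado and read tornado.version.
    """
    if tornado_version is None:
        try:
            import tornado
            tornado_version = tornado.version
        except ImportError:
            tornado_version = None  # Tornado is not installed
    report = {"python": sys.version_info[:2], "tornado": tornado_version}
    if tornado_version is not None:
        t = parse_version(tornado_version)
        # Range quoted in the installation notes: Tornado 2.x to 6.x.
        report["tested_range"] = bool(t) and 2 <= t[0] <= 6
        # WSGI mode needs 5.1.1 or below, as Tornado 6 dropped WSGI functionality.
        report["wsgi_ok"] = bool(t) and t <= (5, 1, 1)
    return report
```

Running print(check_environment()) on the target machine shows at a glance whether a suitable Tornado is present.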


Options for Web Adjuster v3.237

General options

Network listening and security settings

DNS and website settings

General adjustment options

External processing options

Javascript execution options

Server control options

Media conversion options

Character rendering options

Dynamic DNS options

Speedup options

Logging options

Tornado-provided logging options are not listed above because they might vary across Tornado versions; run python adjuster.py --help to see a full list of the ones available on your setup. They typically include log_file_max_size, log_file_num_backups, log_file_prefix and log_to_stderr.

WSGI mode

Web Adjuster is best run as a standalone server (see above) or behind a proxy like nginx, but if you must use WSGI then you can do it like this:

2.1. In your wrapper Python script, import adjuster.

2.2. Set options via adjuster.options.optionName = value (remember to set port to 80; the options are as above, but some of them, such as the server control options, do not apply to WSGI mode).

2.3. Do myApp = adjuster.make_WSGI_application()

2.4. Pass myApp to whatever WSGI framework or server you are using.

2.4.1. For example, to run on Google AppEngine Standard, make an app.yaml file like:

runtime: python312
automatic_scaling:
  max_instances: 1
  min_instances: 0

2.4.2. Place (or symlink) this app.yaml alongside adjuster.py and your wrapper script. Call the wrapper script main.py, and rename myApp to app in it. Also copy or symlink the tornado subdirectory from a download of Tornado version 5.1.1 or below: this lets AppEngine run gunicorn for you, and it has to be an older version because Tornado 6 dropped WSGI functionality. (Alternatively, you can set up Tornado directly in non-WSGI mode, but this is more difficult on AppEngine Standard.) You might also need to create an empty “placeholder” fcntl.py. Tornado 3 also needs an empty ssl.py, but do not create one if you downloaded Tornado 5.

2.4.3. If your settings need PIL or LXML, add a requirements.txt file like:

pillow
lxml

2.4.4. Deploy with gcloud app deploy app.yaml --project followed by the app ID you registered. Since 2020, a payment method must be entered even if you use only the “free tier”. Google said the above setting of max_instances: 1 “usually keeps your instance hour usage within the free tier”, but from 2023 gcloud app deploy started to replicate a “bucket” across multiple regions, using traffic that’s no longer included in Google’s “free tier”. This costs about a penny every time you update your app, invoiced as “Networking Traffic Egress GCP Replication within Northern America”, and might cause problems later if your billing details are outdated. (I’ve not figured out a way to make AppEngine plus buckets all single-region.) In some cases Google will hold back from actually debiting your bank until more than a few pennies are owed, which means you’ll repeatedly get monthly bills by email saying the amount is still outstanding.

Options that call external programs are unlikely to work in AppEngine Standard, but you can set htmlFilter to a Python function (see above). If that function needs large modules that are not always used, you might want to import them *on demand*.
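Putting steps 2.1 to 2.3 and the htmlFilter advice together, a wrapper main.py might look something like the sketch below. It assumes htmlFilter accepts a Python function that takes a page’s HTML as a string and returns the modified string; make_app and nfc_filter are hypothetical names, and unicodedata merely stands in for a large module worth importing on demand.

```python
# main.py: a sketch of the wrapper script from steps 2.1 to 2.4.2.
import importlib


def nfc_filter(html_text):
    """Hypothetical htmlFilter function: normalise the page to Unicode NFC.

    The import is inside the function, so the module is loaded only the first
    time a page is actually filtered, not when the instance starts.
    """
    import unicodedata  # stand-in for a large module imported on demand
    return unicodedata.normalize("NFC", html_text)


def make_app(module_name="adjuster", **option_values):
    """Import Web Adjuster, set its options and return the WSGI application."""
    adjuster = importlib.import_module(module_name)  # step 2.1
    option_values.setdefault("port", 80)  # step 2.2: remember to set port to 80
    for name, value in option_values.items():
        setattr(adjuster.options, name, value)  # adjuster.options.optionName = value
    return adjuster.make_WSGI_application()  # step 2.3
```

In the deployed main.py you would end with app = make_app(htmlFilter=nfc_filter), so that the WSGI object is called app as step 2.4.2 requires. Because Python caches a module after its first import, the import inside nfc_filter costs almost nothing on later calls.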

To run Web Adjuster as a CGI script under Apache, put

ErrorDocument 404 /wrapper.cgi
Options -Indexes
ErrorDocument 403 /wrapper.cgi

in .htaccess (and ensure AllowOverride All is set in the server’s config files) to send all requests to the CGI script. The script should then import adjuster from outside the webspace (e.g. by adding its directory to sys.path first). If you want only the ‘enter your own text’ functionality, you need not send other requests to the CGI: just set submitPath to the CGI’s path followed by a question mark (?).
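A minimal sketch of such a wrapper.cgi follows. It assumes the CGI route reuses WSGI mode through the standard library’s wsgiref.handlers.CGIHandler; the directory, the load_app function and its only_submit_page switch are hypothetical. It relies on two standard CGI environment variables: SCRIPT_NAME (the script’s own path, used to derive submitPath) and GATEWAY_INTERFACE (set by the web server, so the handler runs only when invoked as a CGI script).

```python
#!/usr/bin/env python3
# wrapper.cgi: hypothetical CGI wrapper that runs Web Adjuster in WSGI mode.
import os
import sys
from wsgiref.handlers import CGIHandler

# Hypothetical location of adjuster.py, outside the web-accessible directory.
ADJUSTER_DIR = os.path.expanduser("~/adjuster")


def load_app(adjuster_dir=ADJUSTER_DIR, only_submit_page=False):
    """Make adjuster importable from outside the webspace and build the WSGI app."""
    if adjuster_dir not in sys.path:
        sys.path.insert(0, adjuster_dir)
    import adjuster
    if only_submit_page:
        # 'Enter your own text' only: submitPath is this CGI's path plus '?'.
        adjuster.options.submitPath = os.environ.get("SCRIPT_NAME", "/wrapper.cgi") + "?"
    return adjuster.make_WSGI_application()


# Run only when started by the web server as a CGI script.
if "GATEWAY_INTERFACE" in os.environ:
    # CGIHandler reads the request from the CGI environment and writes to stdout.
    CGIHandler().run(load_app())
```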

To-do list

When dealing with a very slow site, it would be nice if Web Adjuster could start sending text to the browser *before* the upstream fetch has completed.

License

Web Adjuster is free software licensed under the Apache License, Version 2.0 (this is also the license used by Tornado itself). If you use it in a good project, I’d appreciate hearing about it.

Citation

If you need to cite a peer-reviewed paper:

Silas S. Brown. Web Annotation with Modified-Yarowsky and Other Algorithms. Overload 112 (December 2012), pp. 4-7.

Legal

All material © Silas S. Brown unless otherwise stated. Android is a trademark of Google LLC. Apache is a registered trademark of The Apache Software Foundation. AppEngine is possibly a trademark of Google LLC. Energenie is a trademark of Sandal Plc. Firefox is a registered trademark of The Mozilla Foundation. GitHub is a trademark of GitHub Inc. Google is a trademark of Google LLC. iPhone is a trademark of Apple in some countries. Javascript is a trademark of Oracle Corporation in the US. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Mac is a trademark of Apple Inc. MP3 is a trademark that was registered in Europe to Hypermedia GmbH Webcasting but I was unable to confirm its current holder. Post Office is a registered trademark of Post Office Limited. Python is a trademark of the Python Software Foundation. Raspberry Pi is a trademark of the Raspberry Pi Foundation. Unicode is a registered trademark of Unicode, Inc. in the United States and other countries. Unix is a trademark of The Open Group. Windows is a registered trademark of Microsoft Corp. Any other trademarks I mentioned without realising are trademarks of their respective holders.