Polipo2 is a small caching web proxy with an easily-readable data format. It runs fast when handling a *few* connections, but Squid is better for extreme loads (see performance below). If you don't want to cache, nginx with ngx_http_proxy_connect_module is likely a better choice, because Polipo2's cache is an integral part of its operation and cannot be disabled. (You could also set up Tinyproxy as a non-caching proxy, but it might be less reliable than nginx.)
Polipo2 is a 2017 fork of Juliusz Chroboczek's "Polipo" [octopus], made after he ceased maintaining the original in 2016.
Polipo2 is not intended for general client-side use. As Juliusz said when he stepped down, caching proxies are becoming obsolete for general client-side use, because the increasing prevalence of encrypted alternatives to HTTP reduces caching proxies to simple relays. If all you want is a simple relay (for example, so your Web traffic originates from a remote IP address), then you can do better by using a VPN or a SOCKS5 proxy.
Polipo2 is now intended as a drop-in caching layer for experimental HTTP proxies such as Web Adjuster (see Adjuster's --upstream-proxy option; I suggest increasing Polipo2's serverSlots and serverMaxSlots when it is used with parentProxy=localhost:8124).
Polipo2 is available on GitHub, and on a suitably well-provisioned GNU/Linux system you can do:
```
git clone https://github.com/ssb22/polipo2.git
cd polipo2
make
sudo make install
```
The last step is optional: you can run polipo2 directly from your user account, e.g. ./polipo2 logFile=/tmp/polipo2.log pidFile=/tmp/polipo2.pid diskCacheRoot="" (the empty diskCacheRoot keeps the cache in RAM only). If you are using it with Web Adjuster's --upstream-proxy=:8123, you should also add parentProxy=localhost:8124 to the Polipo2 options, and consider serverSlots=256 and serverMaxSlots=256.
Polipo2 (like the original Polipo) uses the POSIX poll() system call (like 4.2BSD's select() but without the 1024-socket limit) to monitor its in-progress connections from a single thread. It does not use the more advanced epoll/kqueue mechanisms that modern versions of large-scale proxies like Squid can use on GNU/Linux and BSD platforms (as can Tornado, and hence Web Adjuster).
The problem with poll() is that, although it can monitor a *huge* number of connections at once and tell your code when one or more of them needs processing, it doesn't hand you a list of *which* ones: it only marks them inside the same array of connections you passed in. So when poll() returns, Polipo2 has to spend CPU time looping through *all* of its open connections to see what needs doing. By contrast, Linux's epoll and BSD's kqueue point your code *directly* at the connections that need attention, eliminating that loop.
This is not an issue if you have only a few dozen connections going at once, but once you're in the tens of thousands, you *will* notice the CPU being held up by those tens of thousands of extra checks, which have to be done every time anything happens on any connection!
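To make the cost difference concrete, here is a toy Python sketch (Linux only, because it uses select.epoll) that counts how many connections each style of event loop examines on one wakeup. It is not Polipo2's code, which is written in C; the function names poll_wakeup and epoll_wakeup are hypothetical, and pipes stand in for client connections. CPython's select.poll() wrapper already walks the kernel's pollfd array for you and returns only the ready (fd, events) pairs, so the sketch writes out the per-connection walk explicitly to mirror a C loop over a whole connection table.

```python
import os
import select


def poll_wakeup(poller, conns, handle):
    """One wakeup of a poll()-style loop: after poll() returns, walk the
    whole connection table. Returns how many entries were examined."""
    ready = {fd for fd, _ev in poller.poll(1000)}  # timeout in milliseconds
    examined = 0
    for fd in conns:              # cost grows with *all* open connections
        examined += 1
        if fd in ready:
            handle(fd)
    return examined


def epoll_wakeup(ep, handle):
    """One wakeup of an epoll-style loop (Linux only): the kernel returns
    just the ready descriptors, so only those are examined."""
    events = ep.poll(1.0)         # timeout in seconds
    for fd, _ev in events:        # cost grows with *ready* connections only
        handle(fd)
    return len(events)


if __name__ == "__main__":
    pipes = [os.pipe() for _ in range(200)]
    read_fds = [r for r, _w in pipes]
    poller, ep = select.poll(), select.epoll()
    for fd in read_fds:
        poller.register(fd, select.POLLIN)
        ep.register(fd, select.EPOLLIN)
    os.write(pipes[7][1], b"x")   # make exactly one "connection" readable
    print(poll_wakeup(poller, read_fds, lambda fd: None))
    print(epoll_wakeup(ep, lambda fd: None))
```

With 200 idle pipes and one readable, poll_wakeup examines all 200 entries while epoll_wakeup examines just 1. Scale the 200 up to tens of thousands and the gap becomes the CPU holdup described above.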
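If that's your situation, I'd recommend switching to Squid, which is more scalable. I currently have no plans to switch Polipo2 from poll() to epoll/kqueue, as Polipo2 is intended for small-scale experimental use (so if you're getting big, "bite the bullet" and install Squid). A minimal squid.conf that listens on 127.0.0.1:3128, keeps a 256 MB memory cache and sends every request through a parent proxy on port 3129 might look like this: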
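```
# accept connections only on the loopback interface, port 3128
http_port 127.0.0.1:3128
# serve requests only from this machine
http_access allow localhost
# parent proxy at 127.0.0.1:3129 (ICP port 0, no ICP queries, no cache digests)
cache_peer 127.0.0.1 parent 3129 0 no-query no-digest
# never contact origin servers directly: always go through the parent
never_direct allow all
dead_peer_timeout 99 seconds
# don't write an access log for requests from localhost
access_log none localhost
# memory cache size
cache_mem 256 MB
```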
(see Squid's cache_dir directive if you also want a disk cache)
As with Polipo, Polipo2 has no option to limit the maximum size of its disk cache. You could periodically purge it from a separate process and then send SIGUSR2 (discard objects) to the running Polipo2, but if the machine is expected to stay up, it's likely easier to run in RAM+swap (by setting diskCacheRoot=""), where the size can be constrained more accurately (it defaults to 25% of RAM; see the note on differences from Polipo below).
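A periodic purge could look something like the following Python sketch. It assumes (this is not documented Polipo2 behaviour) that deleting whole files from the disk-cache directory is a safe way to purge it, that each file's modification time is a reasonable stand-in for its age, and that the pidFile holds just the process ID. purge_cache and notify_polipo2 are hypothetical helpers, not part of Polipo2, and the size limit is whatever you choose.

```python
import os
import signal


def purge_cache(cache_root, max_bytes):
    """Delete the oldest files (by modification time) under cache_root until
    the remaining files total at most max_bytes. Returns the deleted paths."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))
    total = sum(size for _mtime, size, _path in entries)
    deleted = []
    for _mtime, size, path in sorted(entries):  # oldest first
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        deleted.append(path)
    return deleted


def notify_polipo2(pid_file):
    """Send SIGUSR2 to the running Polipo2, telling it to discard objects."""
    # ASSUMPTION: pid_file holds the decimal PID, optionally followed by a newline.
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGUSR2)
```

For example, a cron job might call purge_cache("/var/cache/polipo", 500 * 1024 * 1024) and then notify_polipo2("/var/run/polipo/polipo.pid"); both paths are placeholders for whatever your diskCacheRoot and pidFile actually are.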
On the other hand, a non-expiring disk cache is a useful option if you wish to collect a *corpus* of material from a site as users browse it (without needing to run a "crawler", which might annoy the site); Polipo2's file format is quite easy for other programs to read. Obviously you'd have to respect the copyright on the resulting material.
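As an illustration of reading the format, here is a Python sketch that splits one cache file into its status line, headers and body, and walks a cache directory to yield (URL, body) pairs. It assumes the original Polipo's on-disk layout as I understand it: an HTTP-style status line and header block (including Polipo's own X-Polipo-* headers, such as X-Polipo-Location giving the URL), then a blank line, then the body, with an X-Polipo-Body-Offset header, when present, giving the body's byte offset. Check these assumptions against a few real files from your diskCacheRoot before relying on them; read_cache_file and iter_corpus are hypothetical names.

```python
import os


def read_cache_file(data):
    """Split one cache file's bytes into (status_line, headers, body).
    Header names are lower-cased."""
    # The header block ends at the first blank line (CRLF CRLF, or LF LF).
    ends = [(i, n) for i, n in ((data.find(b"\r\n\r\n"), 4),
                                (data.find(b"\n\n"), 2)) if i >= 0]
    if not ends:
        raise ValueError("no blank line after the headers")
    sep, sep_len = min(ends)
    lines = data[:sep].decode("latin-1").replace("\r\n", "\n").split("\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # ASSUMPTION: X-Polipo-Body-Offset (if present) is the body's byte offset.
    offset = headers.get("x-polipo-body-offset")
    body_start = int(offset) if offset else sep + sep_len
    return lines[0], headers, data[body_start:]


def iter_corpus(cache_root):
    """Yield (url, body) for every parseable file under cache_root."""
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()
            try:
                _status, headers, body = read_cache_file(data)
            except ValueError:
                continue  # not in the expected format; skip it
            yield headers.get("x-polipo-location"), body
```

Latin-1 is used for the header block because it decodes any byte sequence, so unusual header bytes cannot make the parser fail; the body is left as raw bytes for you to decode according to its Content-Type.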
Differences from the original Polipo so far:
1. If running in RAM only (diskCacheRoot=""), Polipo2's chunkHighMark now defaults to 25% of the physical RAM *even on machines above 96M* (its default is not capped at 24M as it is when diskCacheRoot is set). objectHighMark defaults to 2048 objects for every 24M in chunkHighMark. You can still override these, of course (remember that chunkHighMark is in *bytes*).
2. Polipo2 contains a little more "defensive code" to catch segmentation faults before they happen.
3. Polipo2 does not upset your system administrator by having a home page that says "no longer maintained" in large letters at the top. If I come across additional problems, I intend either to fix them or (in the case of a security problem I can't easily fix) to remove or disable the offending functionality, so that Polipo2 stays "sysadmin-friendly".
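As a worked example of the RAM-only defaults in item 1, the following Python sketch computes them from the amount of physical RAM. It assumes that "M" means MiB (2^20 bytes) and that the divisions round down; Polipo2's C source may round differently, so treat the object count as approximate. ram_only_defaults is a hypothetical helper, not part of Polipo2.

```python
MIB = 1024 * 1024  # ASSUMPTION: "M" in the text means MiB


def ram_only_defaults(physical_ram_bytes):
    """Return (chunkHighMark, objectHighMark) for diskCacheRoot="" (RAM only)."""
    chunk_high_mark = physical_ram_bytes // 4  # 25% of physical RAM, in bytes
    # 2048 objects for every 24M of chunkHighMark (rounded down here)
    object_high_mark = 2048 * chunk_high_mark // (24 * MIB)
    return chunk_high_mark, object_high_mark


if __name__ == "__main__":
    # a machine with 4 GiB of RAM
    print(ram_only_defaults(4 * 1024 * MIB))  # (1073741824, 87381)
```

So a 4 GiB machine gets a 1 GiB chunkHighMark rather than the 24M cap that applies with a disk cache. To choose a different size, pass the value in bytes, e.g. chunkHighMark=536870912 for 512 MiB.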
All material © Silas S. Brown unless otherwise stated. GitHub is a trademark of GitHub Inc. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Any other trademarks I mentioned without realising are trademarks of their respective holders.