💾 Archived View for thrig.me › blog › 2024 › 07 › 04 › port-knocking.gmi captured on 2024-07-09 at 01:45:37. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Port Knocking

Port knocking is a method of altering the configuration of a server, usually its firewall rules, in response to particular sequences of connections to network ports that need not be open. The firewall logs or network traffic are monitored by something that takes the necessary actions when a correct knock sequence is observed. I've never used port knocking, though there may be situations where one has a "hidden fortress" that runs no external services yet where one will want to open up SSH or WireGuard to only particular remote addresses.

Port knocking was apparently improved on via single packet authentication,

https://www.cipherdyne.org/fwknop/

which avoids having to send multiple packets. Why is sending multiple packets bad? Reasons include the attacker replaying your traffic, or that the packets of port knocking may look like a port scan and thus be flagged by an intrusion detection system. There's also not much information you can send with a short knock sequence. A single packet can instead carry up to an MTU's worth of bytes, which allows for nifty things like random data and checksums, and a lone packet is more likely to fly under the radar of an IDS or quick-to-throttle firewall. The subsequent SSH or VPN traffic to a host that previously did not allow those may however be noteworthy to an IDS. Maybe delay things so the knock packet and the opening and use of the protocol aren't so close together in time, if that's a concern? The knock packet might also be made to look like something else.
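The "random data and checksums" idea might be sketched as below. This is not the fwknop packet format; the field layout, the pre-shared key, and the 30 second freshness window are all assumptions made for illustration. The payload is a timestamp, a random nonce, and an HMAC over both, so a replayed or stale packet can be refused.

```python
import hashlib
import hmac
import os
import struct
import time

SHARED_KEY = b"replace-with-a-real-secret"  # hypothetical pre-shared key
MAX_SKEW = 30                               # assumed freshness window, seconds
HEADER = struct.Struct("!Q16s")             # 8-byte timestamp, 16-byte nonce
MAC_LEN = hashlib.sha256().digest_size      # 32 bytes


def build_knock(key, now=None):
    """Return one datagram payload: timestamp + random nonce + HMAC."""
    stamp = int(time.time() if now is None else now)
    header = HEADER.pack(stamp, os.urandom(16))
    return header + hmac.new(key, header, hashlib.sha256).digest()


def verify_knock(key, payload, seen, now=None):
    """Accept a payload at most once, and only while its timestamp is fresh."""
    if len(payload) != HEADER.size + MAC_LEN:
        return False
    header, mac = payload[:HEADER.size], payload[HEADER.size:]
    expected = hmac.new(key, header, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return False              # wrong key or tampered packet
    stamp, nonce = HEADER.unpack(header)
    current = time.time() if now is None else now
    if abs(current - stamp) > MAX_SKEW:
        return False              # too old, or the clocks disagree
    if nonce in seen:
        return False              # replay of a packet already accepted
    seen.add(nonce)               # a real daemon would expire old nonces
    return True
```

The payload is 56 bytes, so there is plenty of room left under the MTU. Note that the timestamp check brings back the "what if NTP is broken" problem, which is one reason to treat the window size as something to tune rather than copy.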

Alternatives

If the host-to-be-connected-to runs a network service, then a message to that service could cause a change to the firewall rules. This has similar problems to port knocking, in that there must be code which may have security vulnerabilities, or the message may suffer from replay attacks. With a Gemini server, one could either run a program under a particular URL, possibly mandating a TLS client certificate, or scan the server log for particular requests that indicate the firewall should be opened for the requesting IP address. A log scanner would be limited to what the server can put into the logs, while code could do more, at the cost of an embiggened attack surface. Any URL that lets an attacker gain a toehold on the system could be a problem.
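A log scanner for the second approach could be as small as the following sketch. The log format here (timestamp, client address, URL, then anything else, separated by whitespace) is an assumption, as is the knock path; a real Gemini server will log something different and the parsing would need to follow that.

```python
import ipaddress
from urllib.parse import urlsplit

KNOCK_PATH = "/knock/open-sesame"  # hypothetical secret path


def knocking_addresses(log_lines, knock_path=KNOCK_PATH):
    """Return the set of client addresses that requested the knock path.

    Assumed line format: TIMESTAMP CLIENT-IP URL [other fields...]
    """
    allowed = set()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue                    # not a request line we understand
        client, url = fields[1], fields[2]
        if urlsplit(url).path != knock_path:
            continue
        try:
            # Normalize through ipaddress so that only a well-formed
            # address, never raw log text, reaches the firewall.
            allowed.add(str(ipaddress.ip_address(client)))
        except ValueError:
            continue
    return allowed
```

What happens to the returned set, such as adding each address to a firewall table, is left out on purpose; that is the part with privileges and so the part to keep smallest.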

Another method is to have a console, often faked via the web for cloud virts, though this may only be local. Alternative means of access however open up creative ways for attackers to get at the system. Would you notice if someone was brute forcing root's password over the console interface, or attacking the SSH server via localhost? Security vulnerabilities aren't unknown in web browsers, if you access a console via the web. This may point towards separate devices: one for accessing the modern web, and another used for remote access to critical systems. One problem here is juggling the two devices and never going onto the dangerous web with the secure device, versus the problem of people making everything require a web interface that needs a problematic browser to access. One charming device would peg a CPU at 100% while drawing a JavaScript progress bar during updates; a more sane interface might look like

    $ ssh thatcrappydevice 'update && reboot'

but nobody asked me.

Logging

If the "knock", however it is done, is easy to carry out, then one may want to monitor what hosts are triggering it. Monitoring would also help avoid the situation where you are confident that nobody can figure out the complicated knock scheme you've devised, hubris that leads to nemesis when an attacker does figure it out. Apollo 13 was similar in that nobody expected the spacecraft to fail like it did, so time was spent on "maybe it's the instruments?" The simulations were subsequently allowed to introduce any such clearly impossible failure cases, among other fixes.

An "easy" knock might be a static URL; if someone figures out that URL (maybe a client system was hacked, or they found the documentation in backups or a stray email, or the request was seen by a proxy, etc.) then the URL (or knocking method) can be changed and the access list reset.

gemini://example.org/cgi-bin/knock/154f76a142c24861076bc9d7aad742b4bd64e3323dd34dea57045c89ce58139f

However a random hash might be hard for a user to remember; a longer passphrase that is then hashed might be easier. Per-user knocking is doubtless necessary with a team whose members will change over time.

    $ printf "say 'friend' and enter" | sha256
    154f76a142c24861076bc9d7aad742b4bd64e3323dd34dea57045c89ce58139f
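The same derivation can be done in a client script. A minimal sketch, where the function names are made up and the base URL is the example one from above; like printf, it hashes the passphrase bytes with no trailing newline.

```python
import hashlib


def knock_token(passphrase):
    """Hex SHA-256 of the passphrase bytes, with no newline appended."""
    return hashlib.sha256(passphrase.encode("utf-8")).hexdigest()


def knock_url(passphrase, base="gemini://example.org/cgi-bin/knock/"):
    """Build the knock URL from a memorable passphrase."""
    return base + knock_token(passphrase)


if __name__ == "__main__":
    print(knock_url("say 'friend' and enter"))
```

A plain hash of a guessable phrase is only as strong as the phrase, so this trades memorability against how easily the URL can be guessed.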

A more complicated knock might demand a particular client certificate with the request (or one signed by a particular certificate authority) and that both sides share a private key or have other state that must be kept in sync: S/KEY passwords, or however complicated you want to make it. A less complicated setup may more easily allow an attacker to gain a foothold, but less complicated things have fewer parts that break: a new client may not have a certificate, the time sync is wrong (how do you fix NTP if you can't log in to fix NTP?), there is no access to obtain the shared secret that's only in the internal documentation, etc.

RFC 1760 - The S/KEY One-Time Password System

"Towards Human Computable Passwords". Jeremiah Blocki, Manuel Blum, Anupam Datta, Santosh Vempala. 2014.

Regardless, with logging of who has knocked in (possibly summarized to avoid the 10,000 messages problem) any unexpected guests can probably be noticed. Also with defense in depth an attacker who does knock in should be presented with an SSH server (or VPN) that limits and logs connection attempts, requires public key authentication, etc. There can be false positives, but probably a Goldilocks zone can be found that allows for a decent number of connections with low enough limits to make brute force attacks impossible or at least problematic, and logged enough to trigger an alert. Even on "secure" and "private" networks that no attacker could possibly gain access to.

The knock system may itself need rate throttling, limited access, and monitoring given that attackers may learn of it and try to brute force their way in.
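Throttling of the knock itself could be done with a per-source sliding window, sketched here. The class name, the limit of five attempts, and the 60 second window are placeholders to be tuned, not recommendations.

```python
from collections import defaultdict, deque


class KnockThrottle:
    """Allow at most `limit` knock attempts per source within `window` seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self.attempts = defaultdict(deque)  # source -> times of allowed attempts

    def allow(self, source, now):
        recent = self.attempts[source]
        # Forget attempts that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False        # over the limit: refuse, and ideally log it
        recent.append(now)
        return True
```

The caller passes the current time (for example from time.monotonic()), which keeps the class easy to test. Refused attempts are not recorded here, so a persistent source still gets its quota each window; counting refusals as well would be the stricter choice, at the risk of locking out someone with a flaky script.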

Limited SSH

If you're using some other protocol (e.g. WireGuard) then some or none of this may apply, though there could be ways to limit naughty traffic to those protocols. Even if you have a VPN, you may want to limit access to SSH servers on that network to make life more difficult for any attackers who gain access to the VPN. Or you could go with "soft and chewy on the inside" which is less work to maintain, and has the advantage of being less likely to lock legitimate users out.

Firewall Level

At the firewall level connections can be limited, though SSH clients may then need ControlMaster so that additional sessions reuse an existing connection instead of opening a new one that counts against the limit. The limits below are very low to make testing easier; the actual limits may need to be higher, especially if addresses are being aggregated through NAT, or there are a lot of users accessing the service.

    pass in log quick proto tcp to any port 22 \
      keep state (max 4, source-track rule, max-src-states 2)

This is not very user friendly, so mostly suits admin systems, and still may trip up an experienced sysadmin who has been working too long with too little sleep (as may too often be the case in some workers' paradise or other). On the other hand, it does operate at the firewall level so will handle attacks that may not be visible to or logged by the SSH daemon. The "not very user friendly" part is because the connection hangs and eventually times out,

    $ ssh user@testhost
    ssh: connect to host testhost port 22: Operation timed out

while the firewall log only shows SYN packets from the client.

    $ doas tcpdump -n -e -ttttt -i pflog0
    ...
    0.0 rule 4/(match) pass in on wg0: 192.0.2.123.10584 > 192.0.2.31.22: S
    5.998374 rule 4/(match) pass in on wg0: 192.0.2.123.10584 > 192.0.2.31.22: S
    17.998463 rule 4/(match) pass in on wg0: 192.0.2.123.10584 > 192.0.2.31.22: S
    41.998840 rule 4/(match) pass in on wg0: 192.0.2.123.10584 > 192.0.2.31.22: S

You could check the states and see how many connections are active; this information might be good to tally somewhere, perhaps with RRDtool, as then you could monitor the active states getting close to any firewall or system limits. Or you could wait for things to fail in production and then start debugging.

    # pfctl -s states | grep :22
    all tcp 192.0.2.31:22 <- 192.0.2.123:7133       ESTABLISHED:ESTABLISHED
    all tcp 192.0.2.31:22 <- 192.0.2.123:41889       ESTABLISHED:ESTABLISHED
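Tallying could start from a parser like this one, which counts inbound states per remote address. It only understands the IPv4 line shape shown above; the function name is made up, and feeding the counts to RRDtool or anything else is not shown.

```python
from collections import Counter


def states_per_source(pfctl_output, port=22):
    """Count inbound TCP states to `port`, keyed by remote address.

    Expects lines shaped like the sample above:
        all tcp LOCAL:PORT <- REMOTE:PORT STATE
    """
    counts = Counter()
    suffix = ":%d" % port
    for line in pfctl_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[1] != "tcp" or fields[3] != "<-":
            continue                      # not an inbound TCP state line
        local, remote = fields[2], fields[4]
        if not local.endswith(suffix):
            continue                      # some other local port
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts
```

Unlike the grep, this matches the local port exactly, so a client whose ephemeral port happens to begin with 22 does not get counted by accident.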

Obviously the limits can be set low on my busy server used only by me (or at least I think it's only me). Too low and someone could more easily denial-of-service your access, which GitHub found out when they had a really low DNS TTL and someone knocked the upstream DNS servers offline for long enough. So again there's a Goldilocks zone, and it will vary depending on the site, how many users there are, whether there are alternative means for access, etc. One trick might be to use as small a VPN network as possible so there are not enough unique IP addresses on the network to launch attacks from, and then to set the maximum limit to somewhere around that number of hosts, and the connections-per-host to some reasonable value: 128 maximum, and a limit of four per system?

    $ ipcalc 192.0.2.64/27
    address   : 192.0.2.64
    netmask   : 255.255.255.224 (0xffffffe0)
    network   : 192.0.2.64      /27
    broadcast : 192.0.2.95
    host min  : 192.0.2.65
    host max  : 192.0.2.94
    hosts/net : 30
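The arithmetic behind those numbers can be checked with the standard ipaddress module. A small sketch; the function name and the refusal of networks larger than a /24 are arbitrary choices for an admin-sized VPN.

```python
import ipaddress


def state_limits(cidr, per_host=4):
    """Return (usable hosts, hosts * per_host) for a small IPv4 network."""
    net = ipaddress.IPv4Network(cidr)
    if net.prefixlen < 24:
        raise ValueError("network is larger than this sketch expects")
    hosts = len(list(net.hosts()))  # excludes network and broadcast addresses
    return hosts, hosts * per_host


if __name__ == "__main__":
    hosts, total = state_limits("192.0.2.64/27")
    print(hosts, total)  # 30 hosts, 120 states at four per host
```

For the /27 this gives 30 hosts and 120 states at four per host, which is in the neighbourhood of the 128 maximum floated above.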

If there's IPv6 then getting a small enough subnet might be tricky, especially if SLAAC is involved. But does an admin VPN need lots of hosts on it?

SSH Daemon

/blog/2024/06/24/ssh-autoban.gmi

sshd(8) has various other, older knobs, such as MaxStartups. These should be tuned and tested, though customization may run afoul of sshd changes (upgrades, or downgrades) or changes in use of the service (more, or fewer users). Better logging and statistics will help keep track of the current use, and where that is in relation to the limits. Basically look through sshd_config(5) and think about every option and whether changing it would help prevent attackers from doing naughty things, and how much the change might impact actual users.
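To get a feel for what a MaxStartups setting of the form start:rate:full means, the documented behaviour can be modelled as below: no refusals below start unauthenticated connections, a rate percent chance of refusal at start, rising linearly to certain refusal at full. This is my reading of the sshd_config(5) description rather than the daemon's code, and the defaults used are the 10:30:100 that OpenSSH documents.

```python
def drop_percent(unauthenticated, start=10, rate=30, full=100):
    """Chance (in percent) that a new connection is refused.

    Models MaxStartups start:rate:full as sshd_config(5) describes it.
    """
    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 100.0
    # Linear climb from `rate` percent at `start` to 100 percent at `full`.
    return rate + (100 - rate) * (unauthenticated - start) / (full - start)


if __name__ == "__main__":
    for pending in (5, 10, 55, 100):
        print(pending, drop_percent(pending))
```

Plotting this against the observed number of unauthenticated connections would show how close normal use gets to the point where legitimate logins start being refused.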

Sites where the administration is less "hands on" may not benefit from too many customizations, as there may be long periods where no consultants are around to monitor things. There is a risk here of changing something, forgetting about it, then running afoul of the change. This is fine if you hate future you.