💾 Archived View for envs.net › ~seirdy › 2020 › 11 › 23 › website-best-practices.gmi captured on 2022-04-29 at 12:40:23. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2022-04-28)
-=-=-=-=-=-=-
Originally posted 2020-11-23. Last updated 2022-04-27.
The following applies to minimal websites that focus primarily on text. It does not apply to websites that have a lot of non-textual content. It also does not apply to websites that focus more on generating revenue or pleasing investors than being inclusive.
This is a "living document" that I add to as I receive feedback.
This is also a somewhat long read; for a summary, skip everything between the table of contents and the conclusion.
I realize not everybody's going to ditch the Web and switch to Gemini or Gopher today (that'll take, like, a month at the longest). Until that happens, here's a non-exhaustive, highly-opinionated list of best practices for websites that focus primarily on text. I don't expect anybody to fully agree with the list; nonetheless, the article should have *some* useful information for any web content author or front-end web developer.
My primary focus is inclusive design:
Accomodation versus inclusive design.
Specifically, I focus on supporting *under-represented ways to read a page*. Not all users load a page in a common web-browser and navigate effortlessly with their eyes and hands. Authors often neglect people who read through accessibility tools, tiny viewports, machine translators, "reading mode" implementations, the Tor network, printouts, hostile networks, and uncommon browsers, to name a few. I list more niches in the conclusion. Compatibility with so many niches sounds far more daunting than it really is: if you only selectively override browser defaults and use plain-old, semantic HTML (POSH), you've done half of the work already.
One of the core ideas behind the flavor of inclusive design I present is being *inclusive by default.* Web pages shouldn't use accessible overlays, reduced-data modes, or other personalizations if these features can be available all the time. Of course, some features conflict; you can't display a light and dark color scheme simultaneously. Personalization is a fallback strategy to resolve conflicting needs. Disproportionately under-represented needs deserve disproportionately greater attention, so they come before personal preferences instead of being relegated to a separate lane.
Another focus is minimalism. Progressive enhancement is a simple, safe idea that tries to incorporate some responsibility into the design process without rocking the boat too much. In addition to progressive enhancement, I prefer limiting any enhancements to ones that are demonstrated to solve specific accessibility, security, performance, or significant usability problems faced by people besides me.
I'd like to re-iterate yet another time that this only applies to websites that primarily focus on text. If graphics, interactivity, etc. are an important part of your website, less of the article applies. My hope is for readers to consider a subset of this page the next time they build a website, and address the trade-offs they make when they deviate. I don't expect--or want--anybody to follow all of my advice, because doing so would make the Web quite a boring place!
I'll cite the Web Accessibility Initiative's (WAI) "Techniques for WCAG 2.2" a number of times:
Unlike the Web Content Accessibility Guidelines (WCAG), the Techniques document does not list requirements; rather, it serves to non-exhaustively educate authors about *how* to use specific technologies to comply with the WCAG. I don't find much utility in the technology-agnostic goals enumerated by the WCAG without the accompanying technology-specific techniques to meet those goals.
One of the defining differences between textual websites and advanced Web 2.0 sites/apps is safety. Most browser vulnerabilities are related to modern Web features like JavaScript and WebGL. If that isn't reason enough, most non-mainstream search indexes have little to no support for JavaScript:
A look at search engines with their own indexes
The simplicity of basic textual websites *should* guarantee some extra safety; however, webmasters need to take additional measures to ensure limited use of "modern" risky features.
All of the simplicity in the world won't protect a page from unsafe content injection by an intermediary. Proper use of TLS protects against page alteration in transit and ensures a limited degree of privacy. Test your TLS setup with these tools:
If your OpenSSL (or equivalent) version is outdated or you don't want to download and run a shell script, SSL Labs' SSL Server Test should be equivalent to testssl:
Mozilla's HTTP Observatory offers a subset of Webbkoll's features and is a bit out of date (and requires JavaScript), but it also gives a beginner-friendly score. Most sites should strive for at least a 50, but a score of 100 or even 120 shouldn't be too hard to reach.
A false sense of security is far worse than transparent insecurity. Don't offer broken TLS ciphers, including TLS 1.0 and 1.1. Vintage computers can run TLS 1.2 implementations such as BearSSL surprisingly efficiently, leverage a TLS terminator, or they can use a plain unencrypted connection. A broken cipher suite is security theater.
Consider taking hardening measures to maximize the security benefits made possible by the simplicity of textual websites, starting with script removal.
JavaScript and WebAssembly are responsible for the bulk of modern web exploits. Ideally, a text-oriented site can enforce a scripting ban at the Content Security Policy (CSP) level.
This is the CSP for my main website, with hashes removed for readability:
default-src 'none'; img-src 'self' data:; style-src 'sha256-HASH'; style-src-attr 'none'; frame-ancestors 'none'; base-uri 'none'; form-action 'none'; manifest-src 'self'; upgrade-insecure-requests; sandbox allow-same-origin
"default-src: 'none'" implies "script-src: 'none'", causing a compliant browser to forbid the loading of scripts. Furthermore, the "sandbox" CSP directive forbids a wide variety) of potentially insecure actions.
"sandbox" CSP directive on MDN
While "script-src" restricts script loading, "sandbox" can also restrict script execution with stronger defenses against script injection (e.g. by a browser add-on).¹ I added the "allow-same-origin" parameter so that these add-ons will still be able to function.²
If you're able to control your HTTP headers, then use headers instead of a <meta http=equiv> tag. In addition to not supporting certain directives, a CSP in a <meta> tag might let some items slip through:
At the time of inserting the meta element to the document, it is possible that some resources have already been fetched. For example, images might be stored in the list of available images prior to dynamically inserting a meta element with an http-equiv attribute in the Content security policy state. Resources that have already been fetched are not guaranteed to be blocked by a Content Security Policy that's enforced late.
HTML Living Standard, Content Security Policy state
Please use progressive enhancement³ throughout your site; every feature possible should be optional, and scripting is no exception.
I'm sure you're a great person, but your readers might not know that; don't expect them to trust your website. Your scripts should look as safe as possible to an untrusting eye. Avoid requesting permissions or using sensitive APIs:
JavaScript Browser Information (BrowserLeaks)
Finally, consider using your CSP to restrict script loading. If you must use inline scripts, selectively allow them with a hash or nonce. Some recent CSP directives restrict and enforce proper use of trusted types:
Third-party content will complicate the CSP, allow more actors to track users, possibly slow page loading, and create more points of failure. Some privacy-conscious users actually block third-party content: while doing so is fingerprintable, it can reduce the amount of data collected about an already-identified user.
Some web developers deliver resources using a third-party content delivery network (CDN), such as jsDelivr or Unpkg. Traditional wisdom held that doing so would allow different websites to re-use cached resources; however, all mainstream browsers engines now partition their caches to prevent this behavior:
https://privacycg.github.io/storage-partitioning/
Avoid third-party content, if at all possible.
If you must use third-party content, use subresource integrity (SRI):
This prevents alteration without your consent. If you wish to be extra careful, you could use SRI for first-party resources too.
Be sure to check the privacy policies for the third party services _and_ subscribe to updates, as their practices could impact the privacy of all your users.
For embedded third-party content (e.g. images), give extra consideration to the "Beyond alt-text" section. Your page should be as useful as possible if the embedded content becomes inaccessible.
Nearly every Internet user has to deal with unreliable connections every now and then, even the most privileged. Developing regions lack modern Internet infrastructure; high-ranking executives travel frequently. Everybody hits the worst end of the bell-curve.
Reducing load time is especially useful to poorly-connected users. For much of the world, connectivity comes in short bursts during which loading time is precious. Chances of a connection failure or packet loss increase with time.
Optimal loading is a complex topic. Broadly, it covers three overlapping categories: reducing payload size, delivering content early, and reducing the number of requests and round trips.
HTML is a blocking resource: images and stylesheets will not load until the user agent loads and parses the HTML that calls them. To start loading above-the-fold images before the HTML parsing finishes, send a "link" HTTP header.
My website includes a "link" header to load an SVG that serves as my IndieWeb photo and favicon (hash removed from filename for readability):
link: </favicon.HASH.svg>; rel=preload; as=image
The browser cache is a valuable tool to save bandwidth and improve page speed, but it's not as reliable as many people seem to believe. Don't focus too much on "repeat view" performance.
Out of privacy concerns, most browsers no longer re-use cached content across sites; refer back to the "third-party content" section for more details. Privacy-conscious users (including all users using "private" or "incognito" sessions) will likely have their caches wiped between sessions. Don't focus too much on "repeat view" performance until after your "initial view" performance is good enough.
Furthermore: a high number of cached resources can decrease performance of the cache, causing browsers to bypass the cache:
When Network is Faster than Cache
The effect is especially pronounced on low-end phones and mechanical hard drives.
One way to help browsers decide which disk-cached resources to prioritize is to use immutable assets. Include the `immutable` directive in your cache-control headers, and cache-bust modified assets by changing their URLs.
The only external assets on my pages are images and a web app manifest (for icons); I mark these assets as "immutable" and cache-bust them by including checksums in their filenames:
cache-control: max-age=31557600, immutable
You can also keep your asset counts low by combining textual assets (e.g. CSS) and inlining small resources.
In addition to HTML, CSS is also a blocking resource. You could pre-load your CSS using a "link" header. Alternatively: if your gzipped CSS is under one kilobyte, consider inlining it in the <head> using a <style> element. Simply inlining stylesheets can pose a security threat, but the "style-src" CSP directive can mitigate this if you include a hash of your inline stylesheet sans trailing whitespace.
Consider inlining images under 250 bytes with a "data:" URI; that's the size at which cache-validation requests might outweigh the size of the image. My 32-pixel PNG site icon is under 150 bytes and inlines quite nicely. On this site's hidden service, it's often the only image on a page (that the hidden service replaces SVGs with PNGs; see the section called "The Tor Browser"). Inlining this image and the stylesheet allows my hidden service's homepage to load in a single request, which is a welcome improvement given the round-trip latency that plagues onion routing implementations.
Download size matters, especially on metered connections. There's no shortage of advice concerning minimizing this easy-to-understand metric. Unfortunately, it alone doesn't give us the full picture: download size is not the exact same thing as time taken to deliver useful content to users.
Google's answer to this problem is "Core Web Vitals" containing metrics such as the Speed Index:
These metrics aren't useless, but they are incredibly naive.
SpeedIndex is based on the idea that what counts is how fast the visible part of the website renders. It doesn't matter what's happening elsewhere on the page. It doesn't matter if the network is saturated and your phone is hot to the touch. It doesn't matter if the battery is visibly draining. Everything is OK as long as the part of the site in the viewport appears to pop into view right away.
Of course, it doesn’t matter how fast the site appears to load if the first thing the completed page does is serve an interstitial ad. Or, if like many mobile users, you start scrolling immediately and catch the 'unoptimized' part of the page with its pants down.
There is only one honest measure of web performance: **the time from when you click a link to when you've finished skipping the last ad.**
Everything else is bullshit.
Maciej Cegłowski, The Website Obesity Crisis
A supplementary metric to use alongside download size is **round trips.** Estimate the number of bytes and round-trips it takes to do the following:
1. Begin downloading the final blocking resource
2. Finish downloading all blocking resources
3. Finish downloading _two screenfuls of content_
4. Finish downloading the full page.
Understanding round-trips requires understanding your server's approach to congestion control.
Historically, TCP congestion control approaches typically set an initial window size to ten TCP packets and grew this value with each round-trip. Under most setups, this meant that the first round-trip could include 1460 bytes. The following round-trip could deliver under three kilobytes.⁵
Nowadays, servers typically employ BBR-based c