- Improve site:-query QOL (/)
- Fix byte folder bug (/)
- refactor EC_URL (/)
ALTER TABLE EC_URL MODIFY COLUMN PROTO ENUM('http', 'https', 'gemini') NOT NULL;
-- put visit-metadata in separate table (/)
- fix bug in language detection (/)
-- re-fetching some pages (/)
- new approach for query rewriting (/)
- make site:-queries return a dummy entry when no site information is available (/)
- hybridized ordering of domains on reindex, F(previous rank, previous quality). (/)
- mark documents with audio, video, object tags (/)
- car service <2021-11-18> (/)
- Add auto redirects for guesswork rss/atom/feed-requests to /log/feed.xml (/)
- investigate extracting more keywords (/)
-- textrank (/)
-- tf-idf (x)
-- sideload additional keywords for most popular sites (/)
- refactor index converter (/)
- clean up code garbage (/)
- trial more vanilla PageRank approach as a tertiary algorithm (/)
- fix a search result prioritization bug for mixed rankings (/)
- fix search interface for firefox on android (x)
It is reportedly broken
-- figure out how to replicate this problem (x)
- fix potential DoS where certain search queries with a large number of common but mutually exclusive terms would take forever to process. (/)
test query: generic stores underground unusual
- prioritize n-gram matches over word matches (/)
- show informative error page when the index server reboots (/)
- Personalized Page Rank (/)
- Duelling Algorithms (/)
- Launch October Update (/)
- fix broken search use-cases (/)
-- c language (/)
-- 67 chevy (/)
-- 68000 (/)
-- c# (/)
-- @twitterhandle (/)
-- #hashtag (/)
- trial tar based archiving to save the poor ext4 fs (/)
- use words to tag document format etc (/)
- dynamic re-bucketing based on something like (/)
SELECT DEST.URL_PART,EXP(DEST.QUALITY)*SUM(EXP(SOURCE.QUALITY)) AS Q FROM EC_DOMAIN DEST INNER JOIN EC_DOMAIN_LINK ON DEST.ID=DEST_DOMAIN_ID INNER JOIN EC_DOMAIN SOURCE ON SOURCE.ID=SOURCE_DOMAIN_ID WHERE DEST.INDEXED>0 GROUP BY DEST_DOMAIN_ID
- Fix several indexing bugs that hid relevant search results (/)
- Added search profiles (/)
- Rephrased an error message that some people took to mean they weren't speaking a proper language (/)
- Using in-site domain link-names to add search terms (/)
- Fixed buggy default content-type (/)
- Even more aggressive unicode language detection (/)
- Status flag for domains (/)
Indexed, Active, Blocked
- Improve topic detection (/)
- Tuned search results to demote very short results (/)
- Encyclopedia tries harder to find the right article if the case match isn't exact (/)
- Breaking changes for next Index-rebuild (/)
-- Change writer bucket scaling to 1/4 (/)
-- Move protocol and port from EdgeDomain to EdgeURL (/)
-- Change database schemas to reflect (/)
-- ISO-8859-1/UTF-8 charset sniffer (/)
-- Fixed a bug that would occasionally cause the crawler to re-index the same working set multiple times (/)
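A minimal sketch of one way an ISO-8859-1/UTF-8 sniffer can work, assuming those are the only two candidates: try a strict UTF-8 decode, and fall back to ISO-8859-1 if the bytes are malformed. The class name CharsetSniffer is made up for illustration and this is not necessarily how the crawler does it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the crawler code.
final class CharsetSniffer {
    /** Decodes as UTF-8 when the bytes are well-formed UTF-8, otherwise as ISO-8859-1. */
    static String decode(byte[] data) {
        // REPORT makes the decoder throw instead of silently inserting U+FFFD
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            // Every byte value maps to a character in ISO-8859-1, so this always succeeds
            return new String(data, StandardCharsets.ISO_8859_1);
        }
    }
}
```

Pure ASCII input decodes the same either way. The guess can only go wrong for ISO-8859-1 text that happens to also be valid UTF-8, which is rare in natural language.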
- improve edge-director throughput (/)
- give edge-director state for semi-blocking tasks (/)
- optimize URL index size (/)
- clean up gemini navigation (/)
- Atom feed for HTTPS and Gemini (/)
- Feed gemini server with rendered gmi-content (/)
-- Output the content (/)
-- Generate feeds (/)
-- Make the gemini server read it (x)
-- Switch over (/)
- Absorb gemini server into WMSA (/)
- wildcard domain for marginalia.nu (/)
-- move memex to memex-subdomain (/)
- feeds on FEED pragma (/)
- Top nav bar overhaul (/)
- add marker for which files are todo files (/)
Added %%%/pragmas for toggling behavior
-- Added template helpers for consuming pragmas (/)
-- Used to improve topic pages (/)
- Fixes for git (/)
- File manager (/)
-- Delete (/)
-- Delete Empty Dir (x)
-- Move/Rename (/)
--- System for tombstones/redirects (/)
- Edit for / does not work (/)
Needed better support for non-normalized URLs, e.g. //index.gmi
- Backlinks for index (/)
- Git Integration (/)
-- Use commit hooks to trigger pull (/)
https://git-scm.com/book/uz/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-JGit
- Recursive directory watch (/)
- Two column layout (/)
- Overhaul MEMEX navigation (/)
-- Navigation bar (/)
-- Generate site map (x)
-- Editing (/)
--- Add update-root link (/)
- Tombstones aren't generated properly on-delete (/)
The tombstone db wasn't properly
reloaded after being updated.
- Just write static files to disk instead of using an intermediary backend server. (/)
-- Use alias directive to set different root for memex path. (/)
-- Content-type is finicky (/)
I want to serve html-wrapped .gmi and .html
location ~* \.(gmi|png)$ {
types {
text/html gmi;
text/html png;
}
}
- Move away from statically generated HTML forms in memex (/)
- Fix stability of podcast scraper (/)
- Get crawling up again (/)
-- Monitoring (/)
--- Extraction (/)
--- Status page (/)
-- Scraper config (/)
-- DNS cache (?)
-- IP Block CDNs (/)
--- Parse CIDR (/)
Apache Commons Net SubnetUtils seems to
do the job, although it can't deal
with IPv6 :-/
--- CloudFlare (/)
173.245.48.0/20
103.21.244.0/22
103.22.200.0/22
103.31.4.0/22
141.101.64.0/18
108.162.192.0/18
190.93.240.0/20
188.114.96.0/20
197.234.240.0/22
198.41.128.0/17
162.158.0.0/15
172.64.0.0/13
131.0.72.0/22
104.16.0.0/13
104.24.0.0/14
2400:cb00::/32
2606:4700::/32
2803:f800::/32
2405:b500::/32
2405:8100::/32
2a06:98c0::/29
2c0f:f248::/32
--- Fastly (/)
23.235.32.0/20
43.249.72.0/22
103.244.50.0/24
103.245.222.0/23
103.245.224.0/24
104.156.80.0/20
146.75.0.0/17
151.101.0.0/16
157.52.64.0/18
167.82.0.0/17
167.82.128.0/20
167.82.160.0/20
167.82.224.0/20
172.111.64.0/18
185.31.16.0/22
199.27.72.0/21
199.232.0.0/16
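Since the IPv6 ranges above need handling too, here is a rough sketch of a CIDR check that works for both address families by comparing raw address bytes. CidrBlock is a made-up name, and this is an alternative written for illustration, not what the crawler actually uses.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical class; handles both IPv4 and IPv6 blocks.
final class CidrBlock {
    private final byte[] network;
    private final int prefixLength;

    CidrBlock(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) throw new IllegalArgumentException("Not CIDR notation: " + cidr);
        network = parse(parts[0]);
        prefixLength = Integer.parseInt(parts[1]);
        if (prefixLength < 0 || prefixLength > network.length * 8)
            throw new IllegalArgumentException("Bad prefix length: " + cidr);
    }

    /** True if the IP literal falls inside this block. */
    boolean contains(String ip) {
        byte[] addr = parse(ip);
        if (addr.length != network.length) return false; // IPv4 vs IPv6 mismatch
        int fullBytes = prefixLength / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (addr[i] != network[i]) return false;
        }
        int remainingBits = prefixLength % 8;
        if (remainingBits == 0) return true;
        // Compare only the leading bits of the partially covered byte
        int mask = (0xFF << (8 - remainingBits)) & 0xFF;
        return (addr[fullBytes] & mask) == (network[fullBytes] & mask);
    }

    private static byte[] parse(String ip) {
        try {
            // For an IP literal, getByName only checks the format;
            // a hostname here would trigger a DNS lookup, so pass literals only
            return InetAddress.getByName(ip).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Bad address: " + ip, e);
        }
    }
}
```

For example, new CidrBlock("173.245.48.0/20").contains("173.245.50.1") is true, since the /20 covers 173.245.48.0 through 173.245.63.255.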
- Refactor task management (/)
-- Fix prepend (/)
-- Add tests (/)
- Refactor Floyd-Steinberg ditherer (/)
- Todo move-to-done function puts header last in #Done (/)
- Pictures-in-HTML (/)
-- Implement compression via Floyd-Steinberg dithering (/)
https://encyclopedia.marginalia.nu/wiki/Floyd%E2%80%93Steinberg_dithering
http://image4j.sourceforge.net/javadoc/index.html?net/sf/image4j/util/ConvertUtil.html
--- Ensure 4 bit (/)
--- On upload (/)
--- Convert existing stuff on-read (x)
-- Render image views (/)
--- Add to index (/)
-- Upload form (/)
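A minimal sketch of Floyd-Steinberg dithering down to 4 bits, assuming grayscale input with one int per pixel in the range 0..255. The real images and palette may well differ; the class name Dither is invented for the example.

```java
// Hypothetical helper; assumes 8-bit grayscale pixels stored as px[y][x].
final class Dither {
    /** Dithers the pixels in place to 16 gray levels and returns the same array. */
    static int[][] ditherTo4Bit(int[][] px) {
        int h = px.length;
        int w = h == 0 ? 0 : px[0].length;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Diffused error can push a value out of range, so clamp first
                int old = Math.max(0, Math.min(255, px[y][x]));
                int level = Math.round(old / 17f); // 4-bit value, 0..15
                int quantized = level * 17;        // back on the 0..255 scale
                int err = old - quantized;
                px[y][x] = quantized;
                // Push the quantization error onto unvisited neighbours (7, 3, 5, 1 sixteenths).
                // Integer division truncates, so a little of the error is lost.
                if (x + 1 < w) px[y][x + 1] += err * 7 / 16;
                if (y + 1 < h) {
                    if (x > 0) px[y + 1][x - 1] += err * 3 / 16;
                    px[y + 1][x] += err * 5 / 16;
                    if (x + 1 < w) px[y + 1][x + 1] += err / 16;
                }
            }
        }
        return px;
    }
}
```

Each output pixel is a multiple of 17, so it can be stored as a 4-bit value by dividing by 17.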
- CSS fixes for mobile (/)
-- text align for tasks (/)
-- indent overflowed tasks (/)
- Fix CME (/)
ERROR 2021-08-14 16:36:39,467 RxCachedThreadScheduler-2 MemexMain : Uncaught exception
java.util.ConcurrentModificationException: null
at java.util.HashMap.forEach(HashMap.java:1428) ~[?:?]
at nu.marginalia.wmsa.memex.MemexData.forEach(MemexData.java:51) ~[WMSA-1628951793.jar:?]
at nu.marginalia.wmsa.memex.Memex.reRender(Memex.java:49) ~[WMSA-1628951793.jar:?]
at io.reactivex.rxjava3.core.Scheduler$PeriodicDirectTask.run(Scheduler.java:566) ~[WMSA-1628951793.jar:?]
at io.reactivex.rxjava3.core.Scheduler$Worker$PeriodicTask.run(Scheduler.java:513) ~[WMSA-1628951793.jar:?]
at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:65) [WMSA-1628951793.jar:?]
at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:56) [WMSA-1628951793.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
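The trace shows HashMap.forEach noticing that the map was modified while the periodic re-render was iterating over it. One common remedy, sketched here with the made-up class DocumentStore standing in for MemexData, is to back the data with a ConcurrentHashMap. Whether this is the fix that was actually applied is not recorded here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for MemexData; names and types are assumptions.
final class DocumentStore {
    // ConcurrentHashMap iteration is weakly consistent: it tolerates
    // concurrent writes and never throws ConcurrentModificationException
    private final Map<String, String> documents = new ConcurrentHashMap<>();

    void put(String url, String body) {
        documents.put(url, body);
    }

    void forEach(BiConsumer<String, String> action) {
        documents.forEach(action);
    }

    int size() {
        return documents.size();
    }
}
```

The trade-off is that an iteration may or may not see entries added while it runs. If a re-render needs a stable view, iterating over a copy of the map is an alternative.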
- Automatic TODO task categorization (/)
- Login API on separate service (/)
-- Set up service (/)
-- Route requests (/)
- Fix header auto-location (/)
- Display top tasks in index (/)
-- + in URLs? (/)
proxy_pass with / forces nginx to parse the url (why?)
Bad:
proxy_pass http://127.0.0.1:5025/public/wiki/;
Good:
rewrite ^ $request_uri;
rewrite ^/(.*) /public/$1 break;
return 400;
proxy_pass http://127.0.0.1:5025$uri;
- Encyclopedia (/)
-- Search API (/)
-- code tags (/)
- Memex (/)
-- GemtextParser (/)
-- Service skeleton (/)
-- Link extraction (/)
-- Rendering (/)
--- Stylesheet (/)
-- Metadata (-)
-- Updates (/)
--- API (/)
--- Form (/)
- Service Lockdown (/)
-- X-Public header in code (/)
-- Move endpoints (/)
--- Resource Store (/)
--- Search (/)
--- Assistant (/)
-- Update clients (/)
--- Resource Store (/)
--- Search Service (/)
--- Assistant (-)
-- Update nginx (/)
-- Update links on website (/)
- Tune wiki archive fs (/)
sudo tune2fs -O ^dir_index /dev/nvme0n1p2
- marginalia.nu:9999 "BBS" (/)
- encyclopedia.marginalia.nu (/)
- Verify automatic backup of git (/)
- Reddit frontend (/)
-- Scraper: (/)
-- API: Marginalia 2: (/)
- Wiki (/)
-- on Optane (/)
-- fix Hildegard of Bingen (/)
- Block bots on nginx (/)
https://kb.linuxlove.xyz/nginx-badbotblocker.html
- Install Optane (/)
-- Migrate MariaDB (/)
- Wiki (/)
-- redirects (/)
-- top notices (/)
- Bucket4J rate limiting (/)
- Service Monitoring (/)
- Update Cert (/)
- Backups for git (/)
- Load Wikidata from ZIM (/)
- Migrate Server to Debian Buster (/)
- Update description generation algorithm (/)
-- Recalculate descriptions (...) (/)
- Wiki data (/)
-- Load data (/)
-- Wrap wikipedia (/)
-- ZIM? (-)
-- Wikipedia Cleaner (/)
- Spell checker service? (/)
https://github.com/wolfgarbe/SymSpell
- Calculations (/)
-- Detection (/)
-- Parser (/)
-- Unit conversion (/)
--- Temperature (/)
--- Distance (/)
--- Weight (/)
--- Area (/)
--- Volume (/)
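A small sketch of the kind of conversions listed above, using the exact definitions (1 mile = 1.609344 km, 1 lb = 0.45359237 kg). The class and method names are invented for illustration and say nothing about how the calculator is actually structured.

```java
// Hypothetical helper with a few of the conversions named in the list.
final class UnitConversion {
    static double celsiusToFahrenheit(double c) {
        return c * 9.0 / 5.0 + 32.0;
    }

    static double fahrenheitToCelsius(double f) {
        return (f - 32.0) * 5.0 / 9.0;
    }

    static double milesToKilometers(double miles) {
        return miles * 1.609344; // exact by definition of the international mile
    }

    static double poundsToKilograms(double pounds) {
        return pounds * 0.45359237; // exact by definition of the avoirdupois pound
    }
}
```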
- Save websites to disk? (/)
-- GZipped (/)
-- XFS (?)
- Local backlinks in GMI (/)
-- Parse GMI for links and titles (/)
-- Create tags system (/)
- Use prime sizing for HashMap! (/)
-- How to find primes (/)
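When the bucket index is computed as hash mod table size, a prime size keeps keys that share a common factor from piling up in the same few buckets. A simple way to find a suitable size is trial division upward from the requested capacity, sketched below. PrimeSizing is a made-up name and the actual hash map may pick its primes differently.

```java
// Hypothetical helper for choosing a prime table size.
final class PrimeSizing {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        // long divisor so that d * d cannot overflow near Integer.MAX_VALUE
        for (long d = 3; d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    /** Returns the smallest prime that is at least n (and at least 2). */
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        // Integer.MAX_VALUE is itself prime, so the search cannot overflow
        while (!isPrime(candidate)) candidate++;
        return candidate;
    }
}
```

Trial division costs about sqrt(n) steps per candidate, which is negligible next to allocating a table of that size.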
- Arbitrary size HashMap (/)
- Syntax for orgmode + GMI in kate (/)
Use /usr/share/kde4/apps/katepart/syntax/markdown.xml
- Dictionary analysis in scraping (/)
It seems viable to estimate
the language of a document
based on its overlap with an
N-most-common-words dictionary.
Threshold 0.05 ok?
-- English (/)
-- Swedish (/)
-- Latin (/)
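The dictionary-overlap idea can be sketched roughly as below. It assumes that "overlap" means the fraction of the document's words that appear in the common-words dictionary, which is one reading of the note. LanguageGuess and its method names are invented for the example.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the overlap definition is an assumption.
final class LanguageGuess {
    /** Fraction of the words in text that appear in the common-words dictionary. */
    static double overlap(String text, Set<String> commonWords) {
        // Split on anything that is not a letter, so punctuation and digits are dropped
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        int total = 0;
        int hits = 0;
        for (String word : words) {
            if (word.isEmpty()) continue;
            total++;
            if (commonWords.contains(word)) hits++;
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    static boolean looksLike(String text, Set<String> commonWords, double threshold) {
        return overlap(text, commonWords) >= threshold;
    }
}
```

With a dictionary of the, of, and, a, to, in, the sentence "the cat sat in a box" scores 3 hits out of 6 words, well above the 0.05 threshold.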
- Clean up tests (/)
GZip Compression stats:
63% old
21% new
- Hash map (/)
-- Contiguous memory bins (/)
- Key Folding (/)
-- For strings (/)
-- For integers (/)
-- For dates (x)
- Debian Desktop (/)
-- Docker (/)
-- Java 14 (/)
-- IntelliJ (/)
-- Code (/)
-- Gradle (/)
-- OrgMode (/)
- Bugfix: Domain Resolution (/)
- Index Changes (/)
-- Remove Junk Logging (/)
-- Split Query (/)
-- Implement in Frontend (/)
- Dictionary Service (/)
-- Add Index To Table (/)
-- Populate test db (/)
-- Build tests (/)
-- Integrate into frontend (/)
- Site Information (/)
-- Fetch (/)
-- 404 (/)
Reach me at kontakt@marginalia.nu