💾 Archived View for kennedy.gemi.dev › changelog.gmi captured on 2023-06-16 at 16:13:40. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Kennedy Changelog
2023-05-26
- Improved crawler/indexer so partially downloaded content (like large images) can still be parsed, indexed, and searched.
- Added Capsule Backlinks view to see all external backlinks to a capsule.
2023-05-23
- Improved URL history view by organizing captures with year headings.
- Improved search results page and image search results page with a less-cluttered view, based on feedback (Thanks Buckeye Lady!).
- Improved and reorganized the "Page Info" view.
- Removed Hashtag and @mentions indexes.
- Fixed search so results are still shown even if Wikipedia/Gemipedia is unavailable.
2023-05-01
- Rebuilt the entire system to work off Web Archive (WARC) files. The Kennedy crawler now produces WARC files, and the search indexer and Archiver ingest them. Additional information, like the IP addresses of remote capsules, is stored in the WARC files.
- Converted previous crawl databases to WARC files, allowing easier ingest into Delorean.
- Imported @mozz's late 2021 Gemini archives, which were in WARC format, into Delorean.
- Delorean now stores the meta line and response body, allowing storage of responses with 1x, 3x, and 6x status codes.
- Changed the archive database to allow easier calculation of content sizes and of savings from content deduplication.
- Added a /stats/ endpoint, showing stats on URLs, snapshots, and sizes of the archive.
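The changelog doesn't describe how the archive database computes sizes and deduplication savings. As a minimal sketch, assuming each unique response body is stored once under its SHA-256 hash, the figures a stats page could report might be computed as follows. `Snapshot` and `dedup_stats` are hypothetical names, not Kennedy's code.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One captured response (hypothetical model): the URL and the raw body bytes."""
    url: str
    body: bytes


def dedup_stats(snapshots: list[Snapshot]) -> dict[str, int]:
    """Store each unique body once, keyed by its SHA-256 digest, and report the savings."""
    store: dict[str, bytes] = {}  # content hash -> body, kept only once
    raw_size = 0
    for snap in snapshots:
        raw_size += len(snap.body)  # size as if every capture were stored separately
        digest = hashlib.sha256(snap.body).hexdigest()
        store.setdefault(digest, snap.body)  # identical bodies collapse to one entry
    stored_size = sum(len(body) for body in store.values())
    return {
        "snapshots": len(snapshots),
        "unique_bodies": len(store),
        "raw_bytes": raw_size,
        "stored_bytes": stored_size,
        "saved_bytes": raw_size - stored_size,
    }
```

For example, three captures where two share an identical 8-byte body (`b"# Hello\n"`) and the third is 9 bytes give 25 raw bytes, 17 stored bytes, and 8 bytes saved.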
2023-03-19
- Massive improvement to Delorean: it now stores a history of cached versions of content, not just the copy found in the most recent crawl.
2023-01-27
- Redesigned the crawler code, improving its speed. Robots.txt files are now downloaded on demand instead of requiring a pre-flight step, ensuring that all capsules with a robots.txt are respected.
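A rough sketch of on-demand robots.txt handling, using Python's standard `urllib.robotparser`. It assumes a hypothetical `fetch(url)` hook that returns the file's text (or `None` if the capsule has none), caches one parser per scheme and host, and defaults to the `indexer` virtual user agent from the Gemini robots.txt companion specification. Kennedy's crawler may differ on every point.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Download each host's robots.txt lazily, the first time a URL on that host is checked."""

    def __init__(self, fetch: Callable[[str], Optional[str]], user_agent: str = "indexer"):
        # fetch(url) -> robots.txt text, or None if absent (hypothetical network hook).
        self._fetch = fetch
        self._user_agent = user_agent
        self._parsers: dict[str, RobotFileParser] = {}

    def allowed(self, url: str) -> bool:
        parts = urlsplit(url)
        host_key = f"{parts.scheme}://{parts.netloc}"
        parser = self._parsers.get(host_key)
        if parser is None:
            # First URL seen on this host: fetch robots.txt now instead of in a pre-flight pass.
            text = self._fetch(f"{host_key}/robots.txt")
            parser = RobotFileParser()
            parser.parse((text or "").splitlines())  # an empty file allows everything
            self._parsers[host_key] = parser
        return parser.can_fetch(self._user_agent, url)
```

Because the cache is consulted before every request, only hosts the crawler actually visits cost a robots.txt download, and no host can be crawled before its rules are known.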
2022-08-06
- Updated "Page Info" view to support image metadata (dimensions, format, text used in index).
- Updated Delorean to show cached images and other cached, non-text content.
2022-07-26
- Added image search! Images are indexed based on the text in their file path, as well as the text in all their inbound links.
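To illustrate the idea, here is a minimal sketch that builds an image's searchable text from the words in its file path plus the text of its inbound links. The tokenization rules (splitting on non-word characters, dropping the file extension, lowercasing) are assumptions, and `image_index_text` is a hypothetical helper.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit


def image_index_text(image_url: str, inbound_link_texts: list[str]) -> str:
    """Combine words from the image's path with the labels of links pointing at it."""
    path = unquote(urlsplit(image_url).path)
    # Drop the extension (".jpg", ".png"), which says nothing about the picture's subject.
    stem, _ext = posixpath.splitext(path)
    # Split on anything that isn't a letter or digit: "/photos/red_fox-2022" -> photos red fox 2022
    path_words = [w for w in re.split(r"[\W_]+", stem) if w]
    # Keep non-blank inbound link texts as-is (after trimming whitespace).
    link_words = [t.strip() for t in inbound_link_texts if t.strip()]
    return " ".join(path_words + link_words).lower()
```

For instance, `image_index_text("gemini://example.org/photos/red_fox-2022.jpg", ["A fox in the snow"])` yields `"photos red fox 2022 a fox in the snow"`.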
2022-06-04
- Updated search to also include a snippet from Gemipedia about the search query and a link to the Gemipedia entry.
2022-03-01
- Added a "Page Info" view that shows title, language, # lines, size of response, and incoming/outbound links to a page.
- Improved Delorean by adding a "View Cached" link for each page in the "Page Info" view.
- Streamlined the metadata shown on the search results page into a single line and made it a link to the "Page Info" view.
- Improved "title" extraction code to use the first header encountered, regardless of level, or alt text from the first pre-formatted section.
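The title rule above can be sketched as a single pass over the gemtext. This assumes headings take priority and the first preformatted block's alt text is only a fallback, and it ignores `#` lines inside preformatted blocks since those are literal text. `extract_title` is a hypothetical name.

```python
from typing import Optional


def extract_title(gemtext: str) -> Optional[str]:
    """Return the first heading (any level), else the alt text of the first preformatted block."""
    in_preformatted = False
    seen_preformatted = False
    first_alt_text: Optional[str] = None
    for line in gemtext.splitlines():
        if line.startswith("```"):
            if not in_preformatted and not seen_preformatted:
                # Opening toggle of the first preformatted block: text after ``` is alt text.
                seen_preformatted = True
                first_alt_text = line[3:].strip() or None
            in_preformatted = not in_preformatted
            continue
        if in_preformatted:
            continue  # literal text, never a heading
        if line.startswith("#"):
            # "#", "##" and "###" headings all count; strip the markers and spaces.
            title = line.lstrip("#").strip()
            if title:
                return title
    return first_alt_text
```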
2022-02-21
- Added Delorean, which lets you view cached content from the most recent scan by providing a URL.
2022-02-14
- Added route/view for showing capsules with valid security.txt files.
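The changelog doesn't say what Kennedy counts as "valid". As one possible check, this sketch applies the two fields RFC 9116 marks as required: at least one `Contact` and exactly one `Expires`, which must still be in the future. Rejecting a timestamp without a UTC offset is an extra assumption, and `is_valid_security_txt` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional


def is_valid_security_txt(text: str, now: Optional[datetime] = None) -> bool:
    """Check RFC 9116's required fields: Contact (one or more) and Expires (exactly one, not past)."""
    if now is None:
        now = datetime.now(timezone.utc)
    contacts: list[str] = []
    expires: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blank lines, comments, and anything that isn't "Field: value"
        name, value = line.split(":", 1)
        name = name.strip().lower()  # field names are case-insensitive
        if name == "contact":
            contacts.append(value.strip())
        elif name == "expires":
            expires.append(value.strip())
    if not contacts or len(expires) != 1:
        return False
    stamp = expires[0]
    if stamp[-1:] in ("Z", "z"):
        stamp = stamp[:-1] + "+00:00"  # datetime.fromisoformat() before Python 3.11 rejects "Z"
    try:
        expiry = datetime.fromisoformat(stamp)
    except ValueError:
        return False
    if expiry.tzinfo is None:
        return False  # assumption: require an explicit offset (also avoids naive/aware comparison)
    return expiry > now
```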