Anemone DAISY maker

Anemone is a Python 3 script to put together a DAISY digital talking book, from HTML text, MP3 audio recordings and time index data.

Anemone produces DAISY 2.02 files by default, or DAISY 3 (i.e. ANSI/NISO Z39.86) if an option is set. It can produce four different types of digital talking book:

1. Full audio with basic Navigation Control Centre only: this requires a list of MP3 or WAV files for the audio, one per section, and the title of each section can be placed either in a separate text file or in the filename of the audio file.

2. Full audio with full text: this requires MP3 or WAV files for the audio, corresponding XHTML files for the text, and corresponding JSON files for the timing synchronisation. Each JSON file is expected to contain a list called "markers" whose items contain "id" (or "paragraphId" or anything else ending id) and "time" (or "startTime" or anything else ending time), which can be in seconds, minutes:seconds or hours:minutes:seconds (fractions of a second are allowed in each case). The IDs in these JSON files should have corresponding attributes in the XHTML, by default data-pid but this can be changed with an option.

3. Text with no audio: this requires just XHTML files, and extracts all text with a specified attribute (data-pid by default).

4. Text with some audio: this combines methods 2 and 3 above, and you’ll need to specify skip in the JSON file list for the chapters that do not yet have recorded audio.
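
As a rough illustration of the timing data described under type 2 above, here is a Python sketch that reads a "markers" list and converts each time to seconds. The names parse_time and read_markers are made up for this example and are not part of Anemone. Matching the key names case-insensitively, and taking the first key that ends in id or time, are assumptions of the sketch.

```python
import json

def parse_time(value):
    """Convert seconds, minutes:seconds or hours:minutes:seconds to seconds."""
    seconds = 0.0
    # Each colon-separated field is worth 60 times the field after it
    for part in str(value).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def read_markers(json_text):
    """Return a list of (id, seconds) pairs from a JSON document with "markers"."""
    markers = []
    for item in json.loads(json_text)["markers"]:
        # Assumption: use the first key ending in "id" and the first ending in "time"
        marker_id = next(v for k, v in item.items() if k.lower().endswith("id"))
        time = next(v for k, v in item.items() if k.lower().endswith("time"))
        markers.append((str(marker_id), parse_time(time)))
    return markers
```

Each id returned should then match the value of a data-pid attribute (or whichever attribute is set) in the corresponding XHTML file.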

All files are placed on the command line (or in parameters if you’re using Anemone as a module), and Anemone assumes the correspondences are ordered. So for example if MP3, HTML and JSON files are given, Anemone assumes the first-listed MP3 file corresponds with the first-listed HTML file and the first-listed JSON file, and so on for the second, third, etc. With most sensible file naming schemes, you should be able to use shell wildcards like * when passing the files to Anemone.
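
Since the correspondence depends only on order, it can be worth checking a file naming scheme before running a long job. The following Python sketch sorts three lists and confirms that each group shares a base name. The function pair_inputs is hypothetical and not part of Anemone, and the rule that corresponding files share a name apart from the extension is an assumption of this example, not a requirement of Anemone. It covers only the case where every chapter has an MP3, an HTML and a JSON file.

```python
from pathlib import Path

def pair_inputs(mp3s, htmls, jsons):
    """Sort each list and return (mp3, html, json) tuples whose names line up."""
    if not (len(mp3s) == len(htmls) == len(jsons)):
        raise ValueError("each file type needs the same number of files")
    groups = list(zip(sorted(mp3s), sorted(htmls), sorted(jsons)))
    for group in groups:
        # Assumption: corresponding files differ only in their extension
        stems = {Path(name).stem for name in group}
        if len(stems) != 1:
            raise ValueError(f"mismatched names: {group}")
    return groups
```

Note that plain sorting puts ch10 before ch2, so zero-padded numbers (ch02, ch10) are the safer choice if the chapters are numbered.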

You may also set the name of an output file ending zip; the suffix _daisy.zip is common.

The title, publisher, language etc of the book should be set via options: run the program with --help or see below.

Download anemone.py or use pip install anemone-daisy-maker or pipx run anemone-daisy-maker

History on GitHub

The daisy anemone is a sea creature on the rocky Western shores of Britain and Ireland; the Dorset Wildlife Trust says it’s “usually found in deep pools or hiding in holes or crevices, or buried in the sediment with only tentacles displayed”. Similarly this script has no interactive user interface; it hides away on the command line, or as a library module for your Python program.

Options for Anemone 1.6

--lang the ISO 639 language code of the publication (defaults to en for English)
--title the title of the publication
--url the URL or ISBN of the publication
--creator the creator name, if known
--publisher the publisher name, if known
--reader the name of the reader who voiced the recordings, if known
--date the publication date as YYYY-MM-DD, default is current date
--marker-attribute the attribute used in the HTML to indicate a segment number corresponding to a JSON time marker entry, default is data-pid
--page-attribute the attribute used in the HTML to indicate a page number, default is data-no
--image-attribute the attribute used in the HTML to indicate an absolute image URL to be included in the DAISY file, default is data-zoom
--refresh if images etc have already been fetched from URLs, ask the server if they should be fetched again (use If-Modified-Since)
--cache path name for the URL-fetching cache (default ‘cache’ in the current directory; set to empty string if you don’t want to save anything); when using anemone as a module, you can instead pass in a requests_cache session object to handle the caching, although the delay option is ignored when you do this
--reload if images etc have already been fetched from URLs, fetch them again without If-Modified-Since
--delay minimum number of seconds between URL fetches (default none)
--user-agent User-Agent string to send for URL fetches
--daisy3 Use the Daisy 3 format (ANSI/NISO Z39.86) instead of the Daisy 2.02 format. This may require more modern reader software, and Anemone does not yet support Daisy 3-only features like tables.
--mp3-recode re-code the MP3 files to ensure they are constant bitrate and more likely to work with the more limited DAISY-reading programs like FSReader 3 (this option requires LAME)
--allow-jumps Allow jumps in heading levels e.g. h1 to h3 if the input HTML does it. This seems OK on modern readers but might cause older reading devices to give an error. Without this option, headings are promoted where necessary to ensure only incremental depth increase.
--strict-ncc-divs When generating Daisy 2, avoid using a heading in the navigation control centre when there isn’t a heading in the text. This currently applies when spans with verse numbering are detected. Turning on this option will make the DAISY more conformant to the specification, but some readers (EasyReader 10, Thorium) won’t show these headings in the navigation in Daisy 2 (but will show them anyway in Daisy 3, so this option is applied automatically in Daisy 3). On the other hand, when using verse-numbered spans without this option, EasyReader 10 may not show any text at all in Daisy 2 (Anemone will warn if this is the case). This setting cannot stop EasyReader promoting all verses to headings (losing paragraph formatting) in Daisy 3, which is the least bad option if you want these navigation points to work.
--merge-books Combine multiple books into one, for saving media on CD-based DAISY players that cannot handle more than one book. The format of this option is book1/N1,book2/N2,etc where book1 is the book title and N1 is the number of MP3 files to group into it (or if passing the option into the anemone module, you may use a list of tuples). All headings are pushed down one level and book name headings are added at top level.
--chapter-titles Comma-separated list of titles to use for chapters that don’t have titles, e.g. ‘Chapter N’ in the language of the book (this can help for search-based navigation). If passing this option into the anemone module, you may use a list instead of a comma-separated string, which might be useful if there are commas in some chapter titles.
--chapter-heading-level Heading level to use for chapters that don’t have titles
--warnings-are-errors Treat warnings as errors
--dry-run Don’t actually output DAISY, just check the input and parameters
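
The book1/N1,book2/N2 format of --merge-books maps naturally onto the list of tuples that the module accepts. The following Python sketch shows one way to do that conversion. The function parse_merge_books is hypothetical and is not Anemone's own parser; splitting each entry on its last slash is an assumption made so that a title may itself contain a slash.

```python
def parse_merge_books(option):
    """Turn 'book1/N1,book2/N2' into [('book1', N1), ('book2', N2)]."""
    books = []
    for entry in option.split(","):
        # Split on the last "/" so the title is everything before the count
        title, _, count = entry.rpartition("/")
        if not title.strip() or not count.strip().isdecimal():
            raise ValueError(f"expected title/number, got {entry!r}")
        books.append((title.strip(), int(count)))
    return books
```

A title containing a comma cannot be written in the string form at all, which is one reason to pass the list of tuples directly when using the module.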
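
To show what promoting headings to avoid jumps can mean in practice (the behaviour when --allow-jumps is not set), here is a Python sketch of one possible approach. It is an assumption about the general idea and has not been checked against Anemone's actual code; the function name promote_headings is made up for this example.

```python
def promote_headings(levels):
    """Return heading levels that never go deeper by more than one at a time."""
    open_levels = []  # original levels of the headings that are still "open"
    result = []
    for level in levels:
        # Close every open heading that is not an ancestor of this one
        while open_levels and open_levels[-1] >= level:
            open_levels.pop()
        # The new depth is one more than the number of remaining ancestors
        result.append(len(open_levels) + 1)
        open_levels.append(level)
    return result
```

With this approach an h1 followed by two h3 headings becomes h1, h2, h2: the two h3 headings remain siblings after promotion.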

Behaviour of DAISY readers in 2024

Dolphin EasyReader 10 (iOS, Android and Chromebook): is able to open the ZIP and play the audio while highlighting the paragraphs in a ‘full audio plus full text’ book, both Daisy 2 and Daisy 3. In very large books (over 1 GB), loading and navigation become unreliable.
EDRLab Thorium Reader (Windows, Mac and GNU/Linux): is able to open the ZIP and play the audio while highlighting the paragraphs in a ‘full audio plus full text’ book, both Daisy 2 and Daisy 3. Still works in very large books but loading is slow.
Dolphin EasyReader 10 (Windows): is able to play audio while highlighting paragraphs in both Daisy 2 and Daisy 3, but ZIP needs to be unpacked separately and NCC or OPF file opened. Very large (1 GB+) books can cause the program to crash when Search is used.
JAWS FSReader 3 (Windows): is able to play audio while highlighting paragraphs in both Daisy 2 and Daisy 3, but ZIP needs to be unpacked separately and NCC or OPF file opened; may work better without JAWS running; synchronisation with audio seems to require --mp3-recode; images are not scaled to fit; tested working with a Braille display and audio speed changes; not tested with very large books (1 GB+)
HumanWare Brailliant: does not show text if there is audio (hopefully it can still be used for navigation) in both Daisy 2 and Daisy 3
Pronto Notetaker: ZIP needs to be unpacked to a “Daisy” folder on SD or USB, and the device just plays the audio; tested only with Daisy 2
US Library of Congress NLS Player: unpack the ZIP onto a blank USB stick of capacity 4 GB or less—audio plays; navigation works if you use --mp3-recode; tested only with Daisy 2 but the documentation says Daisy 3 should work
HumanWare Victor Reader Stream: ZIP needs to be unpacked, either to the top level of a USB device, or into a subfolder of a $VRDTB folder on the SD card (different books will be listed alphabetically). If it’s unpacked at the top level of the SD card, the device can still play the MP3s and allow track or time based navigation but not section navigation, so you should use either the folder structure of the SD card or else a USB device. If correctly set up then audio plays and device can navigate by section. Tested with both Daisy 2 and Daisy 3.
HumanWare Victor Reader Stratus4: When unpacking the ZIP to CD, please ensure that your CD writer does *not* create a *folder* with the same name as the ZIP: this default behaviour of Microsoft Windows does *not* result in a valid Daisy CD. The individual *files* of the ZIP need to be written to the *top level* of the CD, *not* to a folder on it. Otherwise, the Stratus4 will not recognise the CD as a Daisy CD and will just play the MP3s, resulting in only time and track based navigation being available. Tested with both Daisy 2 and Daisy 3.
HIMS QBraille XL: can display the text (after opening with Space and Enter); does not play audio; tested only with Daisy 2
Daisy Consortium Simply Reading 3 (app available for Android 7 and below): is able to open the ZIP and play the audio while highlighting the paragraphs in a ‘full audio plus full text’ book, although fonts for some languages might be missing on earlier Android devices
DAISY Pipeline (2023): Please do not use this to convert an Anemone-produced Daisy 2 book to Daisy 3. The resulting Daisy 3 is not likely to play on anything. If Daisy 3 is required, use Anemone’s --daisy3 option to produce it directly.
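
Several of the devices above need the ZIP unpacked so that its individual files sit directly in the target folder, not inside an extra folder named after the ZIP. The following Python sketch does that and then checks the result. The function unpack_daisy is hypothetical; the check assumes the Daisy 2.02 navigation file is called ncc.html (as in the DAISY 2.02 specification), that a Daisy 3 book has a file ending .opf, and that the ZIP stores these at its top level.

```python
import zipfile
from pathlib import Path

def unpack_daisy(zip_path, destination):
    """Extract the ZIP's files directly into destination and check the result."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall writes the members straight into destination,
        # without creating a folder named after the ZIP
        archive.extractall(destination)
    top_level = [p.name.lower() for p in destination.iterdir() if p.is_file()]
    if not any(name == "ncc.html" or name.endswith(".opf") for name in top_level):
        raise RuntimeError(f"no NCC or OPF file at the top level of {destination}")
    return destination
```

For a Victor Reader Stream the destination would be a subfolder of the $VRDTB folder on the SD card; for a CD it would be the staging folder whose contents are then written to the top level of the disc.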

Legal

All material © Silas S. Brown unless otherwise stated. Android is a trademark of Google LLC. GitHub is a trademark of GitHub Inc. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Mac is a trademark of Apple Inc. Microsoft is a registered trademark of Microsoft Corp. MP3 is a trademark that was registered in Europe to Hypermedia GmbH Webcasting but I was unable to confirm its current holder. Python is a trademark of the Python Software Foundation. Windows is a registered trademark of Microsoft Corp. Any other trademarks I mentioned without realising are trademarks of their respective holders.