
Anemone DAISY maker

Anemone is a Python 3 script to put together a DAISY digital talking book, from HTML text, MP3 audio recordings and time index data.

Anemone produces DAISY 2.02 files by default, or DAISY 3 (i.e. ANSI/NISO Z39.86) if an option is set. It can produce four different types of digital talking book:

  1. Full audio with basic Navigation Control Centre only: this requires a list of MP3 or WAV files for the audio, one per section, and the title of each section can be placed either in a separate text file or in the filename of the audio file.
  2. Full audio with full text: this requires MP3 or WAV files for the audio, corresponding XHTML files for the text, and corresponding JSON files for the timing synchronisation. Each JSON file is expected to contain a list called "markers" whose items contain "id" (or "paragraphId" or any other key ending in id) and "time" (or "startTime" or any other key ending in time). Times can be given in seconds, minutes:seconds or hours:minutes:seconds (fractions of a second are allowed in each case). The IDs in these JSON files should have corresponding attributes in the XHTML: by default the attribute is data-pid, but this can be changed with an option.
  3. Text with no audio: this requires just XHTML files, and extracts all text with a specified attribute (data-pid by default).
  4. Text with some audio: this is a combination of the above two methods; specify skip in the JSON file list for each chapter that does not yet have recorded audio.
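
The timing format described in item 2 can be illustrated with a short sketch. This is not Anemone's own code: the function names parse_time and read_markers are hypothetical, and matching the key names case-insensitively is an assumption made here for illustration.

```python
import json

def parse_time(value):
    """Convert seconds, M:S or H:M:S (fractions allowed) into seconds."""
    seconds = 0.0
    for part in str(value).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def read_markers(json_text):
    """Return (id, seconds) pairs from the "markers" list of a JSON document.

    Assumption: a key counts as the ID if it ends in "id" and as the
    time if it ends in "time", ignoring case.
    """
    result = []
    for item in json.loads(json_text)["markers"]:
        marker_id = marker_time = None
        for key, value in item.items():
            if key.lower().endswith("time"):
                marker_time = parse_time(value)
            elif key.lower().endswith("id"):
                marker_id = str(value)
        result.append((marker_id, marker_time))
    return result

# Hypothetical timing file using all three time notations
example = '''{"markers": [
    {"id": "1", "time": "0"},
    {"paragraphId": "2", "startTime": "1:02.5"},
    {"id": "3", "time": "1:00:05"}
]}'''
markers = read_markers(example)
```

In the matching XHTML, the paragraphs for this example would carry data-pid="1", data-pid="2" and data-pid="3" (or whichever attribute is selected with --marker-attribute).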

All files are placed on the command line (or in parameters if you're using Anemone as a module), and Anemone assumes the correspondences are ordered. So for example if MP3, HTML and JSON files are given, Anemone assumes the first-listed MP3 file corresponds with the first-listed HTML file and the first-listed JSON file, and so on for the second, third, etc. With most sensible file naming schemes, you should be able to use shell wildcards like * when passing the files to Anemone.
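
As a rough illustration of this positional matching (a sketch only, with a hypothetical function name and hypothetical filenames, not Anemone's own code), the example below pairs files by list position. It also shows why zero-padded chapter numbers are worth using with wildcards: shells typically expand * in sorted order, and a plain string sort places ch10 before ch2.

```python
def pair_by_position(audio, text, timing):
    """Match the first audio file with the first text and timing file, and so on."""
    if not (len(audio) == len(text) == len(timing)):
        raise ValueError("each list needs the same number of files")
    return list(zip(audio, text, timing))

# A plain string sort, as typically used for wildcard expansion,
# puts ch10 before ch2 when the numbers are not zero-padded
unpadded = sorted(["ch1.mp3", "ch2.mp3", "ch10.mp3"])

# Zero-padded names sort in reading order
padded_audio = sorted(["ch10.mp3", "ch01.mp3", "ch02.mp3"])
padded_text = sorted(["ch10.html", "ch01.html", "ch02.html"])
padded_timing = sorted(["ch10.json", "ch01.json", "ch02.json"])

chapters = pair_by_position(padded_audio, padded_text, padded_timing)
for mp3, html, timing in chapters:
    print(mp3, html, timing)
```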

You may also set the name of an output file ending in .zip; the suffix _daisy.zip is common.

The title, publisher, language etc of the book should be set via options: run the program with --help or see below.

Download anemone.py or use pip install anemone-daisy-maker or pipx run anemone-daisy-maker

History on GitHub

The daisy anemone is a sea creature on the rocky Western shores of Britain and Ireland; the Dorset Wildlife Trust says it's "usually found in deep pools or hiding in holes or crevices, or buried in the sediment with only tentacles displayed". Similarly this script has no interactive user interface; it hides away on the command line, or as a library module for your Python program.

Options for Anemone 1.85

--lang
the ISO 639 language code of the publication (defaults to en for English)
--title
the title of the publication
--url
the URL or ISBN of the publication
--creator
the creator name, if known
--publisher
the publisher name, if known
--reader
the name of the reader who voiced the recordings, if known
--date
the publication date as YYYY-MM-DD, default is current date
--marker-attribute
the attribute used in the HTML to indicate a segment number corresponding to a JSON time marker entry, default is data-pid
--marker-attribute-prefix
When extracting all text for chapters that don't have timings, ignore any marker attributes whose values don't start with the given prefix
--page-attribute
the attribute used in the HTML to indicate a page number, default is data-no
--image-attribute
the attribute used in the HTML to indicate an absolute image URL to be included in the DAISY file, default is data-zoom
--refresh
if images etc have already been fetched from URLs, ask the server if they should be fetched again (use If-Modified-Since)
--cache
path name for the URL-fetching cache (default 'cache' in the current directory; set to empty string if you don't want to save anything); when using anemone as a module, you can instead pass in a requests_cache session object if you want that to handle the caching
--reload
if images etc have already been fetched from URLs, fetch them again without If-Modified-Since
--delay
minimum number of seconds between URL fetches (default none)
--retries
number of times to retry URL fetches on timeouts and unhandled exceptions (default no retries)
--user-agent
User-Agent string to send for URL fetches
--daisy3
Use the Daisy 3 format (ANSI/NISO Z39.86) instead of the Daisy 2.02 format. This may require more modern reader software, and Anemone does not yet support Daisy 3-only features like tables.
--mp3-recode
re-code the MP3 files to ensure they are constant bitrate and more likely to work with the more limited DAISY-reading programs like FSReader 3 (this requires LAME or miniaudio/lameenc)
--max-threads
Maximum number of threads to use for MP3 re-coding. If set to 0 (default), the number of CPU cores is detected and used, and, if called as a module, multiple threads calling anemone() share the same pool of MP3 re-coding threads. This is usually most efficient. If set to anything other than 0, a local pool of threads is used for MP3 re-coding (instead of sharing the pool with any other anemone() instances) and it is limited to the number of threads you specify. If calling anemone as a module and you want to limit the pool size but still have a shared pool, then don't set this but instead call set_max_shared_workers().
--allow-jumps
Allow jumps in heading levels e.g. h1 to h3 if the input HTML does it. This seems OK on modern readers but might cause older reading devices to give an error. Without this option, headings are promoted where necessary to ensure only incremental depth increase.
--strict-ncc-divs
When generating Daisy 2, avoid using a heading in the navigation control centre when there isn't a heading in the text. This currently applies when spans with verse numbering are detected. Turning on this option will make the DAISY more conformant to the specification, but some readers (EasyReader 10, Thorium) won't show these headings in the navigation in Daisy 2 (but will show them anyway in Daisy 3, so this option is applied automatically in Daisy 3). On the other hand, when using verse-numbered spans without this option, EasyReader 10 may not show any text at all in Daisy 2 (Anemone will warn if this is the case). This setting cannot stop EasyReader promoting all verses to headings (losing paragraph formatting) in Daisy 3, which is the least bad option if you want these navigation points to work.
--merge-books
Combine multiple books into one, for saving media on CD-based DAISY players that cannot handle more than one book. The format of this option is book1/N1,book2/N2,etc where book1 is the book title and N1 is the number of MP3 files to group into it (or if passing the option into the anemone module, you may use a list of tuples). All headings are pushed down one level and book name headings are added at top level.
--chapter-titles
Comma-separated list of titles to use for chapters that don't have titles, e.g. 'Chapter N' in the language of the book (this can help for search-based navigation). If passing this option into the anemone module, you may use a list instead of a comma-separated string, which might be useful if there are commas in some chapter titles. Use blank titles for chapters that already have them in the markup.
--toc-titles
Comma-separated list of titles to use for the table of contents. This can be set if you need more abbreviated versions of the chapter titles in the table of contents, while leaving the full versions in the chapters themselves. Again you may use a list instead of a comma-separated string if using the module. Any titles missing or blank in this list will be taken from the full chapter titles instead.
--chapter-heading-level
Heading level to use for chapters that don't have titles
--warnings-are-errors
Treat warnings as errors
--ignore-chapter-skips
Don't emit warnings or errors about chapter numbers being skipped
--dry-run
Don't actually output DAISY, just check the input and parameters
--version
Just print version number and exit (takes effect only if called from the command line)
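
The heading promotion mentioned under --allow-jumps (ensuring heading depth only ever increases one level at a time) can be sketched as follows. This is one possible approach, assumed for illustration; the function name promote_levels is hypothetical and Anemone's actual logic may differ, for example in how it treats a document whose first heading is not h1.

```python
def promote_levels(levels):
    """Renumber heading levels so depth never increases by more than one.

    Keeps a stack of the original levels of the currently open
    headings; each heading's new level is its depth in that stack.
    """
    stack = []
    promoted = []
    for level in levels:
        # Close any open headings at the same or a deeper original level
        while stack and stack[-1] >= level:
            stack.pop()
        stack.append(level)
        promoted.append(len(stack))
    return promoted

# h1, h3, h3, h2, h4 becomes h1, h2, h2, h2, h3
print(promote_levels([1, 3, 3, 2, 4]))
```

Using a stack rather than simply capping each heading at one more than the previous one keeps sibling headings at the same level: the two h3 headings in the example both become h2.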

Behaviour of DAISY readers in 2024


Copyright and Trademarks: All material © Silas S. Brown unless otherwise stated.
Android is a trademark of Google LLC.
GitHub is a trademark of GitHub Inc.
Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
Mac is a trademark of Apple Inc.
Microsoft is a registered trademark of Microsoft Corp.
MP3 is a trademark that was registered in Europe to Hypermedia GmbH Webcasting but I was unable to confirm its current holder.
Python is a trademark of the Python Software Foundation.
Windows is a registered trademark of Microsoft Corp.
Any other trademarks I mentioned without realising are trademarks of their respective holders.