Playing with text to speech in emacs (part 1)

Amazon has just announced Polly and I wanted to give it a try. Of course, the first thing that came to my mind was elfeed! Polly is not free, but $4 per 24 hours of audio seems reasonable for listening to a post from time to time. Anyway, let’s start there and get the ball rolling; if it works well, maybe we can think about adding more backends.

So, let’s go to work!

I tried the command line first and it looked good enough for me. If you have the aws-cli tools installed, a single command is enough:

aws polly synthesize-speech \
    --output-format mp3 \
    --voice-id Joanna \
    --region eu-west-1 \
    --text "We are about to have a lot of fun!" \
    hello.mp3

to get an mp3 downloaded to your machine. That is probably all we need.

Now on to Emacs. I just added the following function to my config:

(defun my/elfeed-send-to-tts ()
  "Send the current elfeed article to a text-to-speech system."
  (interactive)
  (let*
    (;; Article body as HTML, then with the tags crudely stripped.
     (html-to-read (elfeed-deref (elfeed-entry-content elfeed-show-entry)))
     (text-to-read (replace-regexp-in-string "<.*?>" "" html-to-read))
     ;; Temporary files: the text to send and the mp3 we get back.
     (temp-input-file (make-temp-file "elfeed-input-tts"))
     (temp-output-file (make-temp-file "elfeed-output-tts" nil ".mp3"))
     (polly-command (concat "aws polly synthesize-speech --region eu-west-1 --output-format mp3 --voice-id Joanna --text \"$(< " temp-input-file ")\" " temp-output-file )))
    ;; Write the text, synthesize it, then play the result (macOS only).
    (write-region text-to-read nil temp-input-file)
    (shell-command polly-command)
    (shell-command (concat "open -g " temp-output-file))))

Elfeed retrieves the content as HTML, so the first thing I have to do is strip the tags to get the raw text (a crude regexp is enough for now):

((html-to-read (elfeed-deref (elfeed-entry-content elfeed-show-entry)))
 (text-to-read (replace-regexp-in-string "<.*?>" "" html-to-read))

Then, to make it easier to work with the command line, I create one temporary file to store the text and another that will hold the resulting mp3:

(temp-input-file (make-temp-file "elfeed-input-tts"))
(temp-output-file (make-temp-file "elfeed-output-tts" nil ".mp3"))

I still need to investigate whether the output file is cleaned up properly, since I am overwriting it from the shell.

The rest is pretty obvious. We just prepare the command:

(polly-command (concat "aws polly synthesize-speech --region eu-west-1 --output-format mp3 --voice-id Joanna --text \"$(< " temp-input-file ")\" " temp-output-file )))

then write the input file and fire the command:

(write-region text-to-read nil temp-input-file)
(shell-command polly-command)

I’m using a little bash trick:

$(< input_file)

That puts the file contents into the command line. I will probably have to escape the text somehow, but this is my first approach. Let’s add this to the TODO list and continue.
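To get a feel for what the shell itself does with the text, here is a small sketch that writes some awkward input to a file and reads it back the same way. It uses `$(cat file)`, which behaves like bash's `$(< file)` but also works in plain `sh`. The helper name `read_for_tts` is made up for this example. As far as the shell is concerned, text substituted inside double quotes is not parsed again, so quotes, dollar signs and backticks come through untouched; the one visible change is that trailing newlines are dropped.

```shell
# Hypothetical helper: print the contents of a file the way the shell
# would hand them to a command via "$(< file)".
read_for_tts() {
    # $(cat "$1") is the portable spelling of bash's $(< "$1").
    printf '%s' "$(cat "$1")"
}

tmp=$(mktemp)
# Text with double quotes, a dollar sign, backticks and trailing newlines.
printf '%s\n\n' 'He said "hi", it costs $4 and `nothing` runs' > "$tmp"

result=$(read_for_tts "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

This only covers the shell side. How the program receiving the text treats it is a separate question, so the escaping item stays on the TODO list.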

Finally, to play it I will use another dirty trick. This only works on macOS at the moment.

(shell-command (concat "open -g " temp-output-file))

This will open the default mp3 player in the background and read the text.
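If I ever want this to work outside macOS, the player command could be picked according to the operating system. The sketch below is only an idea, not something I am running: `player_for` is a made-up helper, and I am assuming `xdg-open` is available on the Linux side, which is common on desktop installs but not guaranteed.

```shell
# Hypothetical helper: print the command used to play a file for a given
# OS name (as reported by `uname -s`).
player_for() {
    case "$1" in
        Darwin) echo "open -g" ;;   # macOS: open in the background
        Linux)  echo "xdg-open" ;;  # assumption: xdg-utils is installed
        *)      return 1 ;;         # unknown system: let the caller decide
    esac
}

# Usage: pick the player for the current machine.
player=$(player_for "$(uname -s)") || player=""
echo "player: ${player:-none found}"
```

On the Emacs side the same decision could be made by looking at the `system-type` variable before building the command string.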

So at least we have something working. We still have some problems with escaping the input text. I also found that the command line is limited to 1000 characters. We will deal with these problems in future posts.
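As a first idea for the length limit, the text could be cut into pieces before sending it. The sketch below leans on the standard `fold` utility, which breaks at spaces when given `-s`. The helper name `split_for_tts` is made up, and the demonstration uses a limit of 8 instead of 1000 to keep it readable. Note that `fold` counts columns (or bytes, depending on the implementation), which may not match how Polly counts characters, so treat the limit as approximate.

```shell
# Hypothetical helper: split text on stdin into lines of at most $1
# columns, breaking at spaces where possible.
split_for_tts() {
    fold -s -w "$1"
}

# Small demonstration with a limit of 8 instead of 1000.
chunks=$(printf 'aaa bbb ccc ddd' | split_for_tts 8)
printf '%s\n' "$chunks"
```

Each resulting line would then need its own synthesize-speech call, and the mp3 files would have to be played in order or joined. None of that is shown here.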

+++ Dec 16, 2016 +++