Converting Speech to Text in 10 Minutes with Python and Watson

Ok, it’s not free-as-in-freedom, but someone nevertheless could find it useful…

Converting Speech to Text in 10 Minutes with Python and Watson

Social network for developers to discuss topics about bugs and issues, write and share knowledge and connect with millions of developers worldwide.

 

Don’t waste your time doing this by hand, you can speed up this process using Watson Speech to Text and Python. In just ten minutes you can have your own speech to text model converting audio files to text. That way you can get the boring stuff out of the way a whole lot faster.

In this video you’ll learn how to:

  1. Set up the Watson Speech to Text Service
  2. Convert Speech to Text using Python and the Watson API
  3. Changing language models to suit your accent or language

Github Repo for the Project: https://github.com/nicknochnack/WatsonSTT

Subscribe: https://www.youtube.com/channel/UCHXa4OpASJEwrHrLeIzw7Yg

Here’s the entire content of the “Watson Speech to Text.ipynb” file:

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. Install and Import Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: ibm_watson in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (4.5.0)\n",
      "Requirement already satisfied: websocket-client==0.48.0 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from ibm_watson) (0.48.0)\n",
      "Requirement already satisfied: requests<3.0,>=2.0 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from ibm_watson) (2.22.0)\n",
      "Requirement already satisfied: ibm-cloud-sdk-core==1.5.1 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from ibm_watson) (1.5.1)\n",
      "Requirement already satisfied: python-dateutil>=2.5.3 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from ibm_watson) (2.8.0)\n",
      "Requirement already satisfied: six in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from websocket-client==0.48.0->ibm_watson) (1.12.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from requests<3.0,>=2.0->ibm_watson) (2019.9.11)\n",
      "Requirement already satisfied: idna<2.9,>=2.5 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from requests<3.0,>=2.0->ibm_watson) (2.8)\n",
      "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from requests<3.0,>=2.0->ibm_watson) (1.24.2)\n",
      "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from requests<3.0,>=2.0->ibm_watson) (3.0.4)\n",
      "Requirement already satisfied: PyJWT>=1.7.1 in /Users/nicholasrenotte/opt/anaconda3/lib/python3.7/site-packages (from ibm-cloud-sdk-core==1.5.1->ibm_watson) (1.7.1)\n"
     ]
    }
   ],
   "source": [
    "!pip install ibm_watson"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ibm_watson import SpeechToTextV1\n",
    "from ibm_watson.websocket import RecognizeCallback, AudioSource \n",
    "from ibm_cloud_sdk_core.authenticators import IAMAuthenticator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Setup STT Service"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "apikey = 'API KEY'\n",
    "url = 'URL'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Setup Service\n",
    "authenticator = IAMAuthenticator(apikey)\n",
    "stt = SpeechToTextV1(authenticator=authenticator)\n",
    "stt.set_service_url(url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Open Audio Source and Convert"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Perform conversion\n",
    "with open('Untitled.mp3', 'rb') as f:\n",
    "    res = stt.recognize(audio=f, content_type='audio/mp3', model='en-US_NarrowbandModel', continuous=True).get_result()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'result_index': 0,\n",
       " 'results': [{'final': True,\n",
       "   'alternatives': [{'transcript': 'hello well ', 'confidence': 0.8}]}]}"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'hello well '"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text = res['results'][0]['alternatives'][0]['transcript']\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "confidence = res['results'][0]['alternatives'][0]['confidence']\n",
    "confidence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('output.txt', 'w') as out:\n",
    "    out.writelines(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Change Language Models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Perform conversion\n",
    "with open('Untitled.mp3', 'rb') as f:\n",
    "    res = stt.recognize(audio=f, content_type='audio/mp3', model='en-AU_NarrowbandModel', continuous=True).get_result()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'result_index': 0,\n",
       " 'results': [{'final': True,\n",
       "   'alternatives': [{'transcript': 'hello world ', 'confidence': 0.99}]}]}"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.