Thursday 28 February 2019

OpenNMT installation using PyTorch

Reference from OpenNMT Official Site

This guide walks you through installing OpenNMT with PyTorch.

We assume that Python 3 and its package installer pip3 are already installed, and that your OS is Linux (ideally Ubuntu).

Step1 Install PyTorch

pip3 install torch torchvision
If you are using Python 2, use the command below instead.
pip install torch torchvision
The package takes a while to download and install, so you may want to have a cup of tea in the meantime. You will need a good internet connection, as the package is about 582.5MB.
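
Once the install finishes, a quick check along the following lines can confirm that Python is able to locate the packages. This is only a minimal sketch that uses the standard library; the helper name `is_installed` is made up for illustration and is not part of PyTorch or OpenNMT.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    # find_spec returns None when no top-level module with this name is found
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("torch", "torchvision"):
        status = "found" if is_installed(name) else "missing"
        print(name + ": " + status)
```

If either package is reported as missing, rerunning the pip command above with the same Python version you use to run the check is a reasonable first thing to try.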

Step2 Clone the OpenNMT-py repository

git clone https://github.com/OpenNMT/OpenNMT-py
cd OpenNMT-py

Step3 Install required libraries

pip3 install -r requirements.txt

For Python 2, use

pip install -r requirements.txt
That's it, now you are ready to take off. To get familiar with how to use OpenNMT, follow the link

Monday 25 February 2019

Translation platform or Tools for Indian Languages

A translation platform, often confused with translation software, is a tool that gives a translator the computer-aided resources needed to translate source-language text into target-language text.

Translation software, by contrast, produces a machine translation of the source text with no human intervention, and the result depends on how the software was built. This machine-translated output is most likely to be of non-publishable quality, meaning a human translator is needed to verify the machine-generated translation and edit/review it so that the translation is accurate and publishable.



Today there are many translation platforms that help human translators work faster and deliver translations of publishable quality. European languages are very well supported by these kinds of translation platforms. An ideal translation platform should include the following features, which enable a translator to deliver high-quality translations:


  • Integration with multiple machine translation engines, so that a translator can choose the closest machine-generated text and then edit/review it for further enhancement.
  • Availability of bilingual dictionaries, synonym dictionaries, terminologies, glossaries, etc.
  • Translation Memory, if available.
  • Transliteration tool.
  • Concordance search, to search for text in a wide variety of available corpora.
  • Named entity identification and terminology identification.
  • Powerful target-language spell checker.
  • Ability to add a user's dictionary or an existing Translation Memory.
One such tool that can help translators to deliver high quality publishable content is Transzaar.

Translation Memory Exchange (TMX)

Translation Memory Exchange, or TMX, is an XML file format for storing translation units. It allows translation memory data to be exchanged between computer-aided translation and localization tools with little or no loss of critical data.

<tmx version="1.4">
  <header
    creationtool="PyTool" creationtoolversion="1.01-023"
    datatype="PlainText" segtype="sentence"
    adminlang="en-us" srclang="en"
    o-tmf="ABCTransMem"/>
  <body>
    <tu>
      <tuv xml:lang="en">
        <seg>Hello world!</seg>
      </tuv>
      <tuv xml:lang="te">
        <seg>ప్రపంచానికి నమస్కారం</seg>
      </tuv>
    </tu>
  </body>
</tmx>

This is what a sample TMX file looks like. The example here is an English->Telugu translation memory.
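
To show how such a file can be read programmatically, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module. The names `read_translation_units`, `SAMPLE_TMX` and `XML_LANG` are made up for this illustration, and the sketch assumes a small, well-formed TMX document like the sample above; real tools may handle many more header fields and inline tags.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The sample TMX from above, compacted so that the sketch runs on its own
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="PyTool" creationtoolversion="1.01-023"
    datatype="PlainText" segtype="sentence"
    adminlang="en-us" srclang="en" o-tmf="ABCTransMem"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world!</seg></tuv>
      <tuv xml:lang="te"><seg>ప్రపంచానికి నమస్కారం</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_translation_units(tmx_text):
    """Return one {language: segment text} dict per <tu> element."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext also collects text nested inside inline markup
                unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


if __name__ == "__main__":
    for unit in read_translation_units(SAMPLE_TMX):
        print(unit)
```

Each translation unit becomes a small dictionary keyed by language code, which is a convenient shape for loading into a database or a lookup table.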

Translation Memory is useful in the following ways:

  • It recalls a past translation that has already been done and added to the Translation Memory database.
  • Fuzzy search in the Translation Memory helps find similar translations that can aid a translator.
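
The fuzzy search idea can be sketched with Python's standard difflib module. This is only an illustration under simple assumptions: the memory is a plain dictionary from source segment to translation, similarity is the character-based ratio from `difflib.SequenceMatcher`, and the name `fuzzy_lookup` and the 0.6 cutoff are made up here. Real translation tools typically use their own matching schemes.

```python
import difflib

# Toy translation memory: source segment -> stored translation
MEMORY = {
    "Hello world!": "ప్రపంచానికి నమస్కారం",
}


def fuzzy_lookup(source, memory, cutoff=0.6):
    """Return (stored_source, translation, score) tuples, best match first."""
    matches = []
    for stored_source, translation in memory.items():
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones
        matcher = difflib.SequenceMatcher(None, source.lower(), stored_source.lower())
        score = matcher.ratio()
        if score >= cutoff:
            matches.append((stored_source, translation, score))
    matches.sort(key=lambda match: match[2], reverse=True)
    return matches


if __name__ == "__main__":
    for stored_source, translation, score in fuzzy_lookup("Hello world", MEMORY):
        print(stored_source, "->", translation, round(score, 2))
```

A higher cutoff returns fewer but closer matches, and a lower one gives the translator more candidates to review.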

Python MySQL connection and extracting sample data

In this tutorial we are going to learn how to connect to MySQL using Python.

First of all, we need a config.py that acts as a configuration file for our Python script to read. This configuration file holds the MySQL database username and password, the database name, and the host where MySQL is running.

This is what config.py looks like:

server = dict(
    #serverip = 'localhost',
    dbhost = 'localhost',
    dbname = 'userdb',
    dbuser = 'root',
    dbpassword = 'root123'
)
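
As a small sketch of how these settings might be consumed, the helper below maps the config.py keys to the keyword names `host`, `user`, `password` and `database`, which drivers such as mysql-connector-python and PyMySQL accept in their connect() calls. The function name `connection_kwargs` is made up for illustration, and the choice of driver is an assumption, so adjust the names to whichever library you use.

```python
# Same shape as the server dict in config.py, repeated so the sketch runs on its own
server = dict(
    dbhost='localhost',
    dbname='userdb',
    dbuser='root',
    dbpassword='root123',
)


def connection_kwargs(settings):
    """Map the config.py keys to commonly used connect() keyword names."""
    return {
        'host': settings['dbhost'],
        'user': settings['dbuser'],
        'password': settings['dbpassword'],
        'database': settings['dbname'],
    }


if __name__ == "__main__":
    # With a driver installed, these could be passed as connect(**kwargs)
    kwargs = connection_kwargs(server)
    print(sorted(kwargs))
```

Keeping the mapping in one place means the rest of the script does not need to know how the configuration keys are named.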

Next comes the actual Python script, which uses these parameters to connect to our database and fetch results as needed.