WikiTrans

Project leader: Nick Ruiz (with support from Andreas Eisele)

We wish to continue the WikiTrans project that was worked on in earlier MT Marathons. The purpose of WikiTrans is to design a machine translation website that allows users to request translations of Wikipedia articles into various target languages. The wiki community would then have the opportunity to post-edit the machine translation to improve fluency. These post-edits will eventually be used to improve machine translation output.

We have recently discovered another translation framework called Pootle, which provides a user interface for translators to create localizations of sentences and phrases. We wish to integrate the current WikiTrans code base with the Pootle framework. Additionally, we wish to integrate WikiTrans with MT Server Land to use Server Land as a platform for distributed Machine Translation.

Code in this project will be written in Python, on top of the Django framework.

Team: Eleftherios Avramidis, Hai-Son LE, Nick Ruiz, Fred Blain, Andreas Eisele


Existing Resources:

1. Existing WikiTrans Project being used by JHU

  • We will skip Pynax as it is optional
  • Code is in Django (see http://djangoproject.com/en/dev/intro/tutorial01).
    • Basic models are described in models.py; the default functions of objects can be overridden
    • Views control the appearance, get the data and
  • There are scripts that can be executed from the command line to asynchronously fire actions to external tools (the Wikipedia site, Google Translate, Mechanical Turk, etc.) according to what has previously been specified in the interface.

2. Adapted Pootle by Nick Ruiz

  • Organises translations based on .po files
  • Ability to mark a translation as Fuzzy
  • Google translation works via AJAX and JSON, which is not optimal for other machines or for plugging in MT Server Land

Goal: migrate the structure of the existing WikiTrans so that it is based on .po files, which would give it the rich interface of Pootle. It is an open question whether the database covering should be kept.
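To make the goal more concrete, here is a minimal sketch of how an article's sentences and their machine translations might be written out as .po entries, with MT output flagged as fuzzy until someone post-edits it. The function names, the `sentence-NNNN` context scheme and the choice of header fields are assumptions made for illustration; they are not taken from the WikiTrans or Pootle code. Only the Python standard library is used, although a library such as polib could do the same job.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def po_entry(index, source, translation="", fuzzy=False):
    """Format one sentence as a PO entry.

    The msgctxt line (hypothetical naming scheme) keeps entries distinct
    even when the same sentence occurs twice in an article.
    """
    lines = []
    if fuzzy:
        lines.append("#, fuzzy")  # standard PO flag for "needs review"
    lines.append('msgctxt "sentence-%04d"' % index)
    lines.append('msgid "%s"' % po_escape(source))
    lines.append('msgstr "%s"' % po_escape(translation))
    return "\n".join(lines)


def article_to_po(pairs, language):
    """Build a PO document from (source, mt_output) pairs.

    Sentences that already have MT output are marked fuzzy; sentences
    without a translation are left empty and unflagged.
    """
    header = "\n".join([
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        '"Content-Transfer-Encoding: 8bit\\n"',
        '"Language: %s\\n"' % po_escape(language),
    ])
    entries = [
        po_entry(i, source, mt, fuzzy=bool(mt))
        for i, (source, mt) in enumerate(pairs, 1)
    ]
    return "\n\n".join([header] + entries) + "\n"


if __name__ == "__main__":
    pairs = [("Hello world.", "Hallo Welt."), ("Not translated yet.", "")]
    print(article_to_po(pairs, "de"))
```

Marking MT output as fuzzy is one possible convention: it would let the ability to mark a translation as Fuzzy, listed above, double as the "needs post-editing" signal.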


Changes suggested

1. Import from a Wiki. (Andreas)

Wikipedia offers two options: (a) rendered HTML (what the current system is able to work with), or (b) raw MediaWiki markup (mwlib). For option (b):

  • At the start, strip all formatting and keep the headings
  • [Advanced feature: retain wikilinks in order to check whether the wikilinks exist].
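As a rough illustration of the "strip all formatting, keep headings" step, the sketch below uses plain regular expressions as a simplified stand-in for mwlib; it does not use or imitate the mwlib API. All names in it are hypothetical, and it only handles the most common inline markup (templates, wikilinks, external links, HTML tags, bold/italic quotes), not tables or multi-line templates. It also collects wikilink targets, which is what the advanced feature would need.

```python
import re

HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")            # == Title ==
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")  # [[target|label]]
EXTERNAL = re.compile(r"\[(?:https?|ftp)://\S+(?:\s+([^\]]*))?\]")
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                    # innermost {{...}}
TAG = re.compile(r"<[^>]+>")
QUOTES = re.compile(r"'{2,}")                               # ''italic'', '''bold'''


def strip_markup(wikitext):
    """Return (segments, links).

    segments: list of ("heading" | "text", plain_text) in document order.
    links:    wikilink targets found in the text, in order of appearance.
    """
    links = []

    def replace_link(match):
        target = match.group(1).strip()
        links.append(target)
        return match.group(2) or target  # prefer the display label

    segments = []
    for line in wikitext.splitlines():
        heading = HEADING.match(line.strip())
        if heading and heading.group(2):
            segments.append(("heading", heading.group(2)))
            continue
        # Remove templates inside-out until nothing changes.
        previous = None
        while previous != line:
            previous = line
            line = TEMPLATE.sub("", line)
        line = WIKILINK.sub(replace_link, line)
        line = EXTERNAL.sub(lambda m: m.group(1) or "", line)
        line = TAG.sub("", line)
        line = QUOTES.sub("", line).strip()
        if line:
            segments.append(("text", line))
    return segments, links


if __name__ == "__main__":
    sample = "== History ==\n'''WikiTrans''' uses [[Pootle]]{{citation needed}}."
    print(strip_markup(sample))
```

Keeping the link targets in a separate list means the plain text stays clean for translation, while the targets remain available for a later check.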

2. Fix article view (Le)

3. Export to .po (Nick)

4. Convert to Pootle Project (Eleftherios, Fred)

  • Add languages

5. MT interface / MT serverland, XML RPC

6. Multiple suggestions for translation/post-editing

7. Sync .po with Django objects

8. User-interface
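For item 5, the XML-RPC round trip could be prototyped against a local stub before a real MT backend is connected. The sketch below is only that: the `translate` method name, its arguments and the stub itself are hypothetical and do not describe the MT Server Land API. It uses the Python 3 standard library modules `xmlrpc.server` and `xmlrpc.client` (under Python 2 the client module is called `xmlrpclib`).

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def fake_translate(text, source_lang, target_lang):
    """Stand-in for an MT engine: tags the input instead of translating it."""
    return "[%s>%s] %s" % (source_lang, target_lang, text)


def start_stub_server():
    """Start a local XML-RPC server on a free port; return (server, url)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(fake_translate, "translate")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, "http://%s:%d/" % (host, port)


def request_translations(url, sentences, source_lang, target_lang):
    """Send each sentence to the XML-RPC endpoint and collect the replies."""
    proxy = ServerProxy(url)
    return [proxy.translate(s, source_lang, target_lang) for s in sentences]


if __name__ == "__main__":
    server, url = start_stub_server()
    try:
        print(request_translations(url, ["Hello world."], "en", "de"))
    finally:
        server.shutdown()
        server.server_close()
```

Hiding the call behind a function like `request_translations` would let the rest of the code stay unchanged if the stub is later replaced by a real server, or if several backends are queried to obtain the multiple suggestions of item 6.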


Installation

Installing wikitrans (original)

Installing wikitrans-pootle

 sudo apt-get install libpq-dev python-polib sqlite3 python-setuptools python-dev
 sudo pip install django
 sudo pip install mwlib
 sudo pip install django-uni-form --upgrade
  • Edit Pootle/localsettings.py and make sure that this line selects sqlite3 and no other database engine:
 DATABASE_ENGINE = 'sqlite3'                 # 'postgresql_psycopg2',
  • Go to the installation directory and run:
 ./setup.py build
 sudo python setup.py install
  • Finally, start the server by typing:
 cd Pootle
 ./PootleServer

Dependencies

  • setuptools-0.6c11

http://pypi.python.org/pypi/setuptools#downloads Install with: setuptools-0.6c11-py2.6.egg

  • django

sudo easy_install django

Working with github

If you need to get the latest version (and merge it with your own), you can use either of the following sets of commands:

 git fetch upstream
 git merge upstream/master

or

 git pull upstream master

When you have made your changes, push them with

 git push origin master 

If you get an RPC error, increase the buffer by typing

 git config http.postBuffer 524288000

Installation shell script (In Progress)

#! /bin/bash

GIT_PARENT_DIRECTORY="/home/nick/workspace/"
GIT_PROJECT_NAME="wikitrans-pootle"
GIT_LOCAL_DIRECTORY=$GIT_PARENT_DIRECTORY$GIT_PROJECT_NAME
GITHUB_BASE_URL="github.com/NickRuiz/"
GITHUB_PROJECT_URL="http://"$GITHUB_BASE_URL$GIT_PROJECT_NAME
GITHUB_READONLY_PROJECT="git://"$GITHUB_BASE_URL$GIT_PROJECT_NAME".git"

# Install git
sudo apt-get install git-core git-gui git-doc

# Install easy_install and pip
sudo apt-get install python-setuptools python-dev build-essential
sudo easy_install pip
sudo pip install --upgrade pip

# Install django
sudo pip install django

# User-interaction: fork the project from github
echo "1. Please create an account on github, if you have not already."
echo "http://www.github.com"
echo "2. You will need to follow the directions here to add a ssh key to github."
echo "http://help.github.com/linux-key-setup/"
echo "3. Create your own fork of the project listed below."
echo $GITHUB_PROJECT_URL
echo "4. Once you have your own fork, paste the Private URL here: "
read GIT_PRIVATE_URL
echo ""

# Automatically retrieve the project from github
# Quote variables so that paths containing spaces do not break the commands,
# and stop if a directory change fails.
echo "Navigating to $GIT_PARENT_DIRECTORY"
cd "$GIT_PARENT_DIRECTORY" || exit 1
echo "Cloning project at $GIT_PRIVATE_URL"
git clone "$GIT_PRIVATE_URL"

echo "Navigating to $GIT_PROJECT_NAME"
cd "$GIT_PROJECT_NAME" || exit 1
echo "Add a git upstream to $GITHUB_READONLY_PROJECT"
git remote add upstream "$GITHUB_READONLY_PROJECT"
echo "Fetch the data"
git fetch upstream

echo "Setting directory to $GIT_LOCAL_DIRECTORY/Pootle"
cd "$GIT_LOCAL_DIRECTORY/Pootle" || exit 1

# Install dependencies from requirements.txt
sudo apt-get install libyaml-0-1
sudo pip install -I pyyaml
sudo pip install http://dist.repoze.org/PIL-1.1.6.tar.gz

sudo pip install -I -r requirements.txt

sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
sudo pip install lxml

# Configure nltk
python nltk_config.py

echo "Setting directory to $GIT_LOCAL_DIRECTORY"
cd "$GIT_LOCAL_DIRECTORY" || exit 1

# Build Pootle
echo "Building and installing Pootle..."
sudo python setup.py build install
echo "Done."

Page last modified on September 17, 2010, at 06:02 PM