Derek Sivers

Entrepreneur, programmer, avid student of life. I make useful things, and share what I learn.

Projects » lang.pro

Language translation project management

Need some text translated?
Post any text (from a single word to an entire article) using a REST service or the lang.pro website, and our real (human) translators will translate it into the languages of your choice, returning the result by REST webservice or lang.pro website, keeping your formatting intact. Note: this is a paid service, since it's paying professional translators to do the work.
Are you a translator?
lang.pro makes it easy to do all translations directly on the website, without needing any outside software. Articles are broken up into sentences, so you can do one at a time, but read the whole article to get context. You get paid by the word, and a running count lets you know your total. lang.pro takes no percentage.
Project manager?
lang.pro alerts translators when a project is waiting, and the first one to claim it gets it, but only if they do the project soon. If they wait too long, it reverts back to the pool and alerts the other translators. All projects are easy to oversee, and contact the client and translators. The system also allows for reviews, suggestions, and corrections.

Everything accessible by REST API so it can be integrated into any other software, CLI, or cron.

Status: was live and working, but these days I think Gengo is doing a great job, so I'm sending all translations to them for now.

Schema

A client creates a project that says, in advance, which languages everything should be translated into. The client then posts an original under that project, for it to be translated into each of that project's languages. An original is usually English, anything from a single word to an entire article in plain text or HTML. Each original needs an ID (remote_id) given by the client for reference. It has an optional comment, where the client should give helpful info to the translator.

If the posted original is longer than a sentence, it's split up, and each individual sentence is inserted into the database, so that the translators can work with one sentence at a time. The original is updated with a sentencemap that says how to put them back together, so that the client only sees the finished translation. (Especially good for HTML so that translator doesn't need to see codes.)

When a new original is posted by the client, an empty translation for each language is created for it in the database, waiting for a translator to claim it and start translating. If the original was broken into sentences, the translation refers to the sentence. Otherwise it refers only to the original.

Translator attributes are: language, rate-per-word, and priority. The language is from-and-to English. Wordrate is their price per-word to translate original text. Optionally they have a reviewing wordrate, which is their rate to verify (or fix) someone else's translation. The priority is a 1-10 number set by the manager to set the order in which the translators are notified when translations are waiting. Lower numbers are notified first, then higher numbers only if lower numbered translators have not claimed the translations after a time.

If a translator claims a translation, but doesn't start or finish it after a reasonable time, it will be unclaimed and available to the pool of translators again.

When a translator marks a translation as finished, a charge for it is created, by multiplying the number of words in the original by that translator's price-per-word.

When the translator is paid (by the manager, using PayPal or any other method) a payment is created, saving that payment's details, and linking it to specific charges.

Afterwards, if another translator has a correction for a translation, they enter their suggestion as a new translation, with an optional explanation comment. A “usethis” boolean flag on every translation says which one to use. This also creates a charge.

Sentence Splitting and Mapping

When a single word, phrase, or sentence is posted to lang.pro, it makes it easy. The translator just translates the original directly.

But often, the client will post an HTML page, which needs to be broken down into individual sentences, removing any block-level HTML tags - (since they are between sentences, the translator never needs to see them) - but keeping any inline tags like <strong> or <a href="">, since the translator needs to control which words those wrap.

Then, to put the sentences back together, the original needs to keep a sentencemap.

EXAMPLE: Client posts this:

<!-- Split me up, baby! -->
<p>
Mr. Thompson is from the U.S.A. and is 75 years old. Dr. Bob is a <strong>great</strong> dentist. Can this deal with a <a href="http://wikipedia.org">question</a>? It sure can! "But why," you ask? “But why,” I correct you with curly-quotes.
</p><ul>
<li>some things</li>
<li>are better in lists</li>
</ul>

It's split into this:

  1. Split me up, baby!
  2. Mr. Thompson is from the U.S.A. and is 75 years old.
  3. Dr. Bob is a <strong>great</strong> dentist.
  4. Can this deal with a <a href="http://wikipedia.org">question</a>?
  5. It sure can!
  6. "But why," you ask?
  7. “But why,” I correct you with curly-quotes.
  8. some things
  9. are better in lists

The saved map would be a text string like this:

<!-- {1} -->
<p>
{2}{3}{4}{5}{6}{7}
</p><ul>
<li>{8}</li>
<li>{9}</li>
</ul>

The translator would translate each sentence separately:

  1. Diviser-moi, bébé!
  2. M. Thompson est de l'Amérique et a 75 ans.
  3. Dr Bob est un dentiste <strong>merveilleux</strong>.
  4. Cela peut-il traiter d'une <a href="http://wikipedia.org">question</a>?
  5. C'est sûr!
  6. «Mais pourquoi?» direz-vous.
  7. «Mais pourquoi?» je vous corriger avec des guillemets.
  8. certaines choses
  9. sont mieux dans les listes

The client gets it mapped like this

<h3>Diviser-moi, bébé!</h3>
<p>
M. Thompson est de l'Amérique et a 75 ans. Dr Bob est un dentiste <strong>merveilleux</strong>. Cela peut-il traiter d'une <a href="http://wikipedia.org">question</a>? C'est sûr! «Mais pourquoi?» direz-vous. «Mais pourquoi?» je vous corriger avec des guillemets.
</p><ul>
<li>certaines choses</li>
<li>sont mieux dans les listes</li>
</ul>

NOTES:

Site

The lang.pro website has all of the functionality described above, plus the REST URLs. Everything is world-readable, but only authorized people can update/create/delete.

Each of the four roles of browser-users have a subdirectory: /c/ for client, /m/ for manager, /t/ for translator. When a person successfully logs in, they are sent to the appropriate subdirectory. This allows for uniformity of URLs, but with the appropriate views and permissions. (Example: /m/original/123 would be a manager's view of original ID# 123, with the ability to update it, and see all of its translations. /t/original/123 would be a translator's view of original ID# 123, only if they are the translator for one of its translations. They can not edit the original, only the translation.)

The top-level URLs are for the REST API. Optionally: all the forms in the subdirectories can do all their POST/PUT/DELETE to the REST URLs in the top-level, so all of that functionality is not duplicated. Tight permissions on all, to ensure that translators can only change their translation, clients can only see/create their own originals/translations, and managers can change anything, but not (for example) delete a translator who has translations or charges.

The /m/ manager section lets the manager see, edit, or add anything. Overviews show managerial things like unassigned translations, unpaid translators, open reviews, and such.

One world-public URL is /fix with the format /fix/2/MTCATEGORY6/es where 1=project.id MTCATEGORY6=original.remote_id es=language. This can by used by the client's website to say, “Suggest an improvement to this translation?” - so anyone can create a review.

Some URLs

/
Home page to describe service, log in, sign up
/* = REST
Authenticated REST for all models described above. Plus...
/assignment
POST-only: assigns unassigned translations to posted translator.id
/rejection
POST-only: rejects a translation, resetting its assignment and deleting any charges for it
/m/ = MANAGER
Forms to let the manager see, edit, or add anything. Plus...
/m/search
Search originals and translations for phrase, returning linked list
/m/undone
List of all unfinished translations
/m/undone/2
List of unfinished translations for translator ID# 2
/m/unpaid
List of translators owed money, and how much
/m/unpaid/2
List of unpaid translations for translator ID# 2, with sum total, and a form to create a new payment, including corresponding charge.ids.
/m/payment
When POSTING a new payment, update corresponding charges with its payment.id, verifying the amounts match.
/t/ = TRANSLATOR
Pages for a translator to see their translations, unfinished and finished. Forms to edit them. Plus...
/t/account
Edit their own info, including info in people database. Note: can't change wordrate. Only manager can.

REST API

URI GET POST PUT DELETE
/person/2 get person info . update person .
/client list of all clients person_id to add new client . .
/client/2 client info (+ list of projects) project.name to add new project person_id to update person_id delete client
/project list of all projects client_id + name to add new project . .
/project/2 project info . update project delete project
/project/2/status simple reponse: pending or ready, depending on whether translations are still unfinished . . .
/project/2/originals originals for this project . . .
/project/2/translations finished translations for this project . . .
/project/2/translations/all all translations for this project, finished or not . . .
/manager list of all managers person_id to add new manager . .
/manager/2 person info about manager . update person_id delete
/translator list of all translators person_id, lang to add new translator . .
/translator/pt list of all Portuguese ('pt') translators, order by priority . . .
/translator/2 translator info . update translator delete translator
/translator/2/translations finished translations by this translator . . .
/translator/2/originals originals with unfinished translations waiting for this translator . . .
/translator/2/payments payments to this translator . . .
/translator/2/charges charges for this translator's translations . . .
/translator/2/all all info for this translator (see above) . . .
/original list of all originals available to authorized client (client.projects + shared) project_id, remote_id, lang, original, context_url, comment to create new original AND create empty unassigned translations for each of that project's language codes . .
/original/2 get original . update original delete original
/remote/x1 alias for /original/id, (same methods), so clients can use their remote_id
/original/2/translations translations of this original . . .
/remote/x1/translations alias for /original/id/translations,so clients can use their remote_id
/translation/2 translation info . update translation delete translation
/translation/2/charges charges for translation . . .
/translation/2/reviews reviews for translation . . .
/charge . translation_id to create new charge for that translation (server calculates amount based on translator.wordrate x words) . .
/charge/2 charge info . update charge delete charge
/payment . translator_id, currency, millicents, details to create new payment and mark all their unpaid charges with that payment_id, until millicents sum match. error if not exact match. . .
/payment/2 payment info . update payment delete payment

Database Schema

BEGIN;

CREATE SCHEMA lang;
SET search_path = lang;

CREATE TABLE managers (
	id serial primary key,
	person_id integer not null unique
);

CREATE TABLE clients (
	id serial primary key,
	person_id integer not null unique
);

CREATE TABLE translators (
	id serial primary key,
	person_id integer not null unique,
	lang char(2) not null,
	priority integer not null default 10,
	currency char(3) not null default 'USD',
	millicents_per_word_translate integer,
	millicents_per_word_review integer
);

CREATE TABLE projects (
	id serial primary key,
	client_id integer not null REFERENCES clients(id),
	name varchar(64) not null unique,
	langcodes text
);
CREATE INDEX prjcl ON projects(client_id);

CREATE TABLE originals (
	id serial primary key,
	project_id integer REFERENCES projects(id),
	remote_id varchar(127),
	created_at timestamp with time zone not null default CURRENT_TIMESTAMP,
	lang char(2) not null,
	original text,
	comment text,
	sentencemap text,
	UNIQUE (project_id, remote_id)
);
CREATE INDEX orgpi ON originals(project_id);
CREATE INDEX orgri ON originals(remote_id);

CREATE TABLE sentences (
	id serial primary key,
	original_id integer not null REFERENCES originals(id),
	sentence text
);
CREATE INDEX soi ON sentences(original_id);

CREATE TABLE translations (
	id serial primary key,
	original_id integer not null REFERENCES originals(id),
	sentence_id integer REFERENCES sentences(id),
	lang char(2) not null,
	translator_id integer REFERENCES translators(id),
	claimed_at timestamp with time zone,
	started_at timestamp with time zone,
	finished_at timestamp with time zone,
	translation text,
	comment text,
	usethis boolean not null default true
);
CREATE INDEX t9npi ON translations(original_id);
CREATE INDEX t9nsi ON translations(sentence_id);
CREATE INDEX t9nti ON translations(translator_id);

CREATE TABLE payments (
	id serial primary key,
	translator_id integer not null REFERENCES translators(id),
	currency char(3) not null,
	millicents integer,
	created_at date not null default CURRENT_DATE,
	details text
);
CREATE INDEX pyti ON payments(translator_id);

CREATE TABLE charges (
	id serial primary key,
	translation_id integer UNIQUE not null REFERENCES translations(id),
	currency char(3) not null,
	millicents integer,
	payment_id integer REFERENCES payments(id)  -- NULL until paid
);
CREATE INDEX chpi ON charges(payment_id);
CREATE INDEX chti ON charges(translation_id);

COMMIT;