Saturday, August 29, 2009

How's that Babelfish coming along?

I'm not going to do a full baker's dozen on this, at least not at the moment, but just to test the waters let's try round-tripping the last paragraph of my last post through Yahoo! Babelfish. In the past, this exercise (with various online translation engines, not just Yahoo!) has been good for much amusement, but these things have a way of incrementally improving.

The original paragraph was:
Personally, I still don't see I, Robot coming to life any time soon, but I do see things that got written off as impossible during the dead-and-debunked phase starting to stir to life again. I'm thinking, say, competent machine translation or robots that can pick things up and carry them around a house, happening gradually in the next decade or so.
For whatever reason, Babelfish has trouble with "don't" and "I'm", passing them through to the target language undigested, so for the test I've changed them to "do not" and "I am". I've also quoted "I Robot" to reduce confusion. This seems to help.

First, let's try Dutch. Dutch, along with its cousin Frisian, is just about as closely related as extant languages get to English.
Persoonlijk, zie ik nog geen "I Robot" de komst aan het leven wanneer spoedig, maar ik dingen zien die onmogelijk tijdens afgeschreven werden dood-en-debunked fase die aan het leven begint opnieuw te bewegen. Ik denk, namelijk, bekwame automatische vertaling of robots die kunnen dingen oprapen en hen dragen die rond een huis, geleidelijk aan in het volgende decennium gebeuren of zo.
From my limited knowledge of Dutch, this is actually not too far off. I'm pretty sure it's ungrammatical in places, and I'd be interested to hear from a native Dutch speaker, but I'm also pretty sure such a person could get the intent of what I was saying. Of course, most Dutch speakers would just read the Engels versie and be done with it.

Now for the back-translation of the translated text. For several reasons, one shouldn't expect the original to come back word-for-word, but ideally it should come back in good English with the sense substantially intact:
Personally, to see I "still no; I Robot" the arrival to living when shortly, but I things see that impossible during was written down dood-en-debunked phase move which to living starts again. I think, as it happens, competent automatic translation or robots which can pick up things and them carry that around a house, gradually in the next decade happen or this way.
Um ... I'd say this bears a strong statistical resemblance to what I said, but I wouldn't say it hews closely to the original sense.
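For the curious, here is one rough way to put a number on "statistical resemblance" versus "hews closely to the original". This is only a sketch under my own assumptions: the function names are made up for illustration, and neither score is a real measure of translation quality. The first score ignores word order entirely and just asks how many words the two texts share. The second compares the word sequences in order, using Python's standard difflib. A round trip that returns the right words in the wrong order should score well on the first and noticeably worse on the second.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and pull out runs of ASCII letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_overlap(a, b):
    """Share of word occurrences the two texts have in common, ignoring order."""
    ca, cb = Counter(words(a)), Counter(words(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 1.0  # two empty texts are trivially identical
    shared = sum((ca & cb).values())  # multiset intersection
    return 2 * shared / total


def order_similarity(a, b):
    """difflib's similarity ratio over the word sequences, so order matters."""
    return SequenceMatcher(None, words(a), words(b)).ratio()


if __name__ == "__main__":
    # Opening clause as fed to Babelfish, and the same clause after the Dutch round trip.
    original = 'Personally, I still do not see "I Robot" coming to life any time soon'
    round_trip = 'Personally, to see I "still no; I Robot" the arrival to living when shortly'
    print("bag overlap:     ", round(bag_overlap(original, round_trip), 2))
    print("order similarity:", round(order_similarity(original, round_trip), 2))
```

To compare the full paragraphs, paste them in place of the two sample strings. The gap between the two numbers is the interesting part: the wider it is, the more the result is a shuffled bag of the right words rather than a sentence.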

Several things have always puzzled me about Babelfish. Why does it not at least attempt to come up with grammatical statements in the target language? I suspect this is because it takes a local, phrase-at-a-time approach. But then why can it translate "dead-and-debunked" to "dood-en-debunked" (that is to say, Dutch for "dead", Dutch for "and", then "don't know what this is, just pass it through") but not take that back to "dead-and-debunked" by the same logic? Why can it rewrite English word order into more-or-less Dutch word order, but not vice versa?

I'm less surprised that it would take "say" into "namelijk" and then back to "as it happens". The failure of senses of words to map one-to-one even between closely related languages is the bane of all translation. In this case "bij voorbeeld" [or maybe "met name"?] would have worked better, coming back as "for example", but a human translator would likely have seen that "for example" was more appropriate in any case, even with the Dutch text as it stands.

This sort of determination would seem to require, at the least, a more global view of the text, which Babelfish clearly lacks. It's just the sort of thing I've been waiting to see signs of in online translation. It might be out there somewhere, but not in Babelfish, and I expect it would take significant structural changes in the engine for Babelfish to acquire it.

Ah well. What about Spanish? Spanish is more widely spoken than Dutch, and particularly more widely spoken as an only language, so perhaps the English-Spanish-English translators get more attention. It's also still in the same general family as English and one could make a plausible argument that English owes at least as much of its structure to the Romance languages as to the Germanic. In other words, the winds appear at least as favorable here as for the last attempt. Without further ado:
Personalmente, todavía no veo el "I, Robot" viniendo la vida en caulquier momento pronto, pero a mí vea las cosas que consiguieron amortizadas como imposible durante la fase muerto-y-debunked que comenzaba a revolver a la vida otra vez. Estoy pensando la traducción automática o las robustezas, por ejemplo, competente que pueden coger cosas y llevarlas alrededor de una casa, sucediendo gradualmente en la década próxima o tan.
Again, I'm not a native speaker, but this seems like reasonable Spanish, although still with what look to my untrained eye like a couple of grammatical errors. But at least "say" came across as "por ejemplo", which should come back as "for example". How does the whole thing come back?
Personally, still I do not see "I, Robot" coming the life in caulquier moment soon, but it sees the things that obtained amortized like impossible during the die-and-debunked phase that revolver to the life began again. I am thinking the automatic translation or the robustezas, for example, competent that can take things and to take them around a house, happening gradually about the next decade or so.
Urk. Again, why can it produce "cualquier" but not at least come back with "whatever", which would sound a bit weird but at least be English? Likewise "revolver" to "return" or "robustezas" to "robots"? Even with those filled in, it's still a bit of a word salad, leading me to think I was probably too generous about the quality of the Dutch and Spanish. Not being particularly expert in those languages, I'm probably better able to gloss over gross errors.

OK, one more try: Simplified Chinese. My ignorance of written Chinese is profound, so the translation here could read "the square on the hypotenuse is equal to the sum of the other two squares" for all I know, but here it is:
亲自,我仍然不看" 我, Robot" 很快来到生活,但是我看得到注销的事物,不可能在开始死和被揭穿的阶段期间再搅动到生活。 我认为可能拾起事和在房子附近运载他们,在下十年或如此逐渐发生的能干机器翻译或机器人。
And back again:
Personally, I still did not look at "I, Robot" Arrives at the life very quickly, but I looked that obtains the logging out thing, is impossible to start to die with stage period which reveals mixes the life again. I thought that possibly ascends to stage a rebellion with delivers them nearby the house, either has the competent machine translation or the robot so gradually in the next ten years.
I'll just let that speak for itself.

earl said...

As well you might. Personalmente, I thought the English from Spanish was much more legible than the English from Dutch. An interesting comparison would be to try the experiment with less than fully competent human translators. Let someone, say, you, translate a paragraph of English into Spanish, and then ask Rafael, who works for the roofing company and can get by in English but doesn't really speak it, to take your Spanish and put it into English. You will undoubtedly not get back the original, but the errors will be of an entirely different type than Bfish's.