Machine Translation: A Hard Nut to Crack...

Machine Translation (MT) was historically one of the first tasks imagined as an application of Natural Language Processing (NLP), as early as the 1940s, and it remains one that still resists the progress of Computer Science. Indeed, current MT systems often remain very poor compared with human translation. From the earliest days, MT has been bedeviled by grandiose claims and exaggerated expectations. MT researchers and developers should stop over-selling. The general public should stop over-expecting. One of the main aims of this article is to help the reader appreciate where we are today in terms of actual achievement, reasonable expectation, and unreasonable hype. This is not the kind of thing that one can sum up in a catchy headline ("No Prospect for MT" or "MT Removes the Language Barrier"), but it is something one can absorb, and which one can thereafter use to distill the essence of truth that lies behind reports of products and research. We must all realize that the potential economic rewards linked with the development of MT systems are huge, especially in non-English-speaking countries.

The social or political importance of MT arises from the socio-political importance of translation in communities where more than one language is generally spoken. The commercial importance of MT is a result of related factors. First, translation itself is commercially important: for example, whether an instruction manual, a repair manual for a piece of manufacturing machinery, or the manual for a safety-critical system is available in the user's own language is not just a matter of taste. Secondly, translation is expensive. Translation is a highly skilled job, requiring much more than mere knowledge of a number of languages, and in some countries translators’ salaries are comparable to those of other highly trained professionals. Moreover, in some cases, delays in translation can be very costly.

Scientifically, MT is interesting because it is an obvious application and testing ground for many ideas in Computer Science, Artificial Intelligence, and Linguistics, and some of the most important developments in these fields have begun in MT. To illustrate this: the origins of Prolog, the first widely available logic programming language, which formed a key part of the Japanese ‘Fifth Generation’ programme of research in the late 1980s, can be found in the ‘Q-Systems’ language, originally developed for MT.

Despite this, very few people, even those who are involved in producing or commissioning translations, have much idea of what is involved in MT today, either at the practical level of what it means to have and use an MT system, or at the level of what is technically feasible, and what is science fiction.

We are reminded of this scale of ignorance every time we admit to working in the field of MT. After initial explanations of what MT is, the typical reaction is one of two contradictory responses (sometimes one gets both together). One is "But that’s impossible: no machine could ever translate Shakespeare." The other is "Yes, I saw one of those in the Duty Free Shop when I went on holiday last summer." These reactions are based on a number of misconceptions that are worth exposing. We will look at these, as well as some correct conceptions, in the next section [1].

Popular Conceptions and Misconceptions

"MT is a waste of time because you will never make a machine that can translate Shakespeare".

The criticism that MT systems cannot, and never will, produce translations of great literature of any great merit is probably correct, but quite beside the point. It certainly does not show that MT is impossible. First, translating literature requires special literary skill - it is not the kind of thing that the average professional translator normally attempts. So accepting the criticism does not show that automatic translation of non-literary texts is impossible. Second, literary translation is a small proportion of the translation that has to be done, so accepting the criticism does not mean that MT is useless. Finally, one may wonder who would ever want to translate Shakespeare by machine - it is a job that human translators find challenging and rewarding. The criticism that MT systems cannot translate Shakespeare is a bit like criticizing industrial robots for not being able to dance Swan Lake.

"Generally, the quality of translation from an MT system is very low. This makes them useless in practice."

Far from being useless, several MT systems are in day-to-day use around the world. Examples include METEO (which has translated weather bulletins daily since 1977, around 45,000 words a day, at the Canadian Meteorological Center in Dorval, Montreal), SYSTRAN (in use at the AltaVista search engine, and elsewhere), LOGOS, ALPS, ENGSPAN (and SPANAM), METAL, and GLOBALINK.

C-DAC’s MANTRA-Rajbhasha (Beta version) was also evaluated at various Ministries and Departments and the results were very encouraging. It is true that the number of organizations that use MT on a daily basis is relatively small, but those that do use it benefit considerably.

"MT threatens the jobs of translators."

The quality of translation that is currently possible with MT is one reason why it is wrong to think of MT systems as dehumanizing monsters which will eliminate human translators, or enslave them. It will not eliminate them, simply because the volume of translation to be performed is so huge, and constantly growing. What is more likely is that the process of producing draft translations, along with the often tedious business of looking up unknown words in dictionaries, and ensuring terminological consistency, will become automated, leaving human translators free to spend time on increasing clarity and improving style, and to translate more important and interesting documents - editorials rather than weather reports, for example.

"MT systems are machines, and buying an MT system should be very much like buying a car."

The misconception is the idea that one would take an MT system and ‘drive it away’, as one would drive a car. In fact, this is unlikely to be possible, and a better analogy is with buying a house. What one buys may be immediately habitable, but there is a considerable amount of work involved in adapting it to one’s own special needs. In the case of a house this might involve changes to the decor and plumbing. In the case of an MT system this will involve additions to the dictionaries to deal with the vocabulary of the subject area, and possibly with the type of text structures to be translated.

Against these misconceptions, we should place the genuine facts about MT.

C-DAC’s Experience: C-DAC’s decade-long research and experience show that ‘good’ MT cannot be produced by replacing source language words with their target language translational equivalents. As all translators know, word-for-word translation does not produce a satisfying target language text, not even when some local reordering rules (e.g. for the position of the adjective with regard to the noun which it modifies) have been included in the system.
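
The limitation can be illustrated with a minimal sketch. The three-word English-to-French lexicon, the word classes and both function names below are assumptions made purely for illustration; they are not taken from any C-DAC system.

```python
# Toy English-to-French lexicon with one fixed equivalent per word.
# All entries and word classes here are illustrative assumptions.
LEXICON = {"the": "le", "red": "rouge", "car": "voiture"}
ADJECTIVES = {"red"}
NOUNS = {"car"}


def word_for_word(sentence: str) -> str:
    """Replace each source word with its single target equivalent."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())


def with_local_reordering(sentence: str) -> str:
    """Swap adjective-noun pairs first, then substitute word for word."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    return " ".join(LEXICON.get(w, w) for w in reordered)


print(word_for_word("the red car"))          # le rouge voiture
print(with_local_reordering("the red car"))  # le voiture rouge
```

The reordering rule puts the adjective in the right place, but the result is still not acceptable French: "voiture" is feminine, so the article must be "la" ("la voiture rouge"). Getting that right needs grammatical knowledge about agreement, which neither function has.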

Translating a text requires not only good knowledge of the vocabulary of both source and target language, but also of their grammar - the system of rules which specifies which sentences are well-formed in a given language and which are not. Additionally, it requires some element of real-world knowledge - knowledge of the nature of things out in the world and how they work together - and technical knowledge of the text’s subject area. We analyzed the corpus and created the grammar of the sub-language. Most effort has gone into increasing the subtlety, breadth and depth of the linguistic or grammatical knowledge available to the system.

Thus, in MANTRA-Rajbhasha, the Machine assisted Translation tool, the strategy adopted was neither word to word nor rule to rule, but lexical tree to lexical tree, so that transfer can be done at the chunk-to-chunk level [2]. This uses the Tree Adjoining Grammar (TAG) formalism both for parsing English and for generating Hindi sentences [3]. The domain selected is Personnel Administration, and texts related to appointments, transfers, office orders, and office memoranda are translated successfully with 90-95% accuracy. The current version is under Beta evaluation and will soon be installed at various places. As a consequence, we see that the main impact of MANTRA-Rajbhasha in the immediate future will be in large corporate environments, ministries and departments where a substantial amount of official translation work is performed. We have extended this to other domains and other language pairs.
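
The idea of transferring whole lexical trees rather than single words can be sketched roughly as follows. This is a heavily simplified illustration and not MANTRA's actual grammar or code: a real TAG builds trees by substitution and adjunction, whereas here each tree is flattened to an anchor plus a slot order, the input is assumed to be already parsed, and the tiny romanized Hindi lexicon and all names (`ElementaryTree`, `TREE_TRANSFER`, `CHUNK_TRANSFER`, `transfer`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementaryTree:
    """A flattened stand-in for a lexicalized tree: an anchor plus slot order."""
    anchor: str
    order: tuple  # surface order of "subj", "obj" and "anchor"


# Hypothetical transfer lexicon: English verb tree -> Hindi verb tree.
# English transitive clauses are subject-verb-object; Hindi ones are
# subject-object-verb, so the whole tree changes, not just the word.
TREE_TRANSFER = {
    "eats": ElementaryTree(anchor="khata hai", order=("subj", "obj", "anchor")),
}

# Hypothetical chunk-level equivalents for the argument phrases (romanized).
CHUNK_TRANSFER = {"Ram": "Ram", "a mango": "aam"}


def transfer(subj: str, verb: str, obj: str) -> str:
    """Map an already-parsed English clause to a Hindi word sequence."""
    target = TREE_TRANSFER[verb]
    filled = {
        "subj": CHUNK_TRANSFER[subj],
        "obj": CHUNK_TRANSFER[obj],
        "anchor": target.anchor,
    }
    return " ".join(filled[slot] for slot in target.order)


print(transfer("Ram", "eats", "a mango"))  # Ram aam khata hai
```

Because the target tree carries its own word order, the verb lands at the end of the Hindi clause without any separate reordering rule, and each argument phrase is transferred as a chunk rather than word by word.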

With the Web being the destination of all products these days, we are developing a MANTRA Translation Server (M-TS) which can be accessed by anyone on the World Wide Web. English text from the specified domain can be sent to M-TS, and the target language text is generated and sent back by email. We are also working on a domain-specific translated chat application. Here, each participant selects a language and all communication is delivered in that language, e.g. if I select Hindi and you select English, then I will get all the messages in Hindi although you will be typing in English, and vice versa.
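
The chat scenario amounts to translating each message on delivery into the language its recipient has chosen. A minimal sketch of that routing, under stated assumptions, might look like this: `translate` is a placeholder that only tags the text (it stands in for a call to a real MT engine), and the `ChatRoom` class, its methods and the user names are all hypothetical.

```python
from dataclasses import dataclass, field


def translate(text: str, source: str, target: str) -> str:
    """Placeholder for a real MT call; it only tags the text."""
    if source == target:
        return text
    return f"[{source}->{target}] {text}"


@dataclass
class ChatRoom:
    languages: dict = field(default_factory=dict)  # user -> chosen language
    inboxes: dict = field(default_factory=dict)    # user -> received messages

    def join(self, user: str, language: str) -> None:
        self.languages[user] = language
        self.inboxes[user] = []

    def send(self, sender: str, text: str) -> None:
        """Deliver the message to every other user in that user's language."""
        source = self.languages[sender]
        for user, target in self.languages.items():
            if user != sender:
                self.inboxes[user].append(translate(text, source, target))


room = ChatRoom()
room.join("asha", "Hindi")
room.join("bob", "English")
room.send("bob", "The office order has been issued.")
print(room.inboxes["asha"])  # ['[English->Hindi] The office order has been issued.']
```

Translating at delivery time rather than at send time means the sender never needs to know which languages the other participants have selected.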

Conclusion: The correct conclusion is that MT is not only a possibility but also an actuality. But it is important to see the product in a proper perspective, to be aware of its strong points and shortcomings. Thus we can say that: "Machine Translation: A hard nut to crack" BUT we cracked it.

Some Popular Misconceptions about MT

  • False: MT is a waste of time because you will never make a machine that can translate Shakespeare.
  • False: There was/is an MT system which translated ‘the spirit is willing, but the flesh is weak’ into the Russian equivalent of ‘The vodka is good, but the steak is busy,’ and ‘hydraulic ram’ into the French equivalent of ‘water goat’, so MT is useless.
  • False: Generally, the quality of translation you can get from an MT system is very low. This makes them useless in practice.
  • False: MT threatens the jobs of translators.
  • False: The Japanese have developed a system that you can talk to on the phone. It translates what you say into Japanese, and translates the other speaker’s replies into English.
  • False: There is an amazing South American Indian Language with a structure of such logical perfection that it solves the problem of designing MT systems.
  • False: MT systems are machines, and buying an MT system should be very much like buying a car.

Some Facts about MT

  • True: MT is useful. The METEO system has been in daily use since 1977. As of 1990, it was regularly translating around 45,000 words daily. In the 1980s, the diesel engine manufacturer Perkins Engines was saving around £4000 and up to 15 weeks on each manual translated.
  • True: While MT systems sometimes produce howlers, there are many situations where the ability of MT systems to produce reliable, if less than perfect, translations at high speed is valuable.
  • True: In some circumstances, MT systems can produce good quality output: less than 4% of METEO output requires any correction by human translators at all (and most of these corrections are due to transmission errors in the original texts). Even where the quality is lower, it is often easier and cheaper to revise draft quality MT output than to translate entirely by hand.
  • True: MT does not threaten translators’ jobs. The need for translation is vast and unlikely to diminish, and the limitations of current MT systems are too great. However, MT systems can take over some of the boring, repetitive translation jobs and allow human translators to concentrate on more interesting tasks, where their specialist skills are really needed.
  • True: Speech-to-Speech MT is still a research topic. In general, there are many open research problems to be solved before MT systems will come close to the abilities of human translators.
  • True: Not only are there many open research problems in MT, but building an MT system is an arduous and time-consuming job, involving the construction of grammars and very large monolingual and bilingual dictionaries. There is no ‘magic solution’ to this.
  • True: In practice, before an MT system becomes really useful, a user will typically have to invest a considerable amount of effort in customizing it.

References:
1. Machine Translation: An Introductory Guide.

2. MANTRA - An English-Hindi Translation methodology: An Insight, CDAC/AAI/TR-98-0108, June 1998.

3. Tree Adjoining Grammar, URL http://www.cis.upenn.edu/~xtag/


Hemant Darbari is Group Co-ordinator and leads the Applied Artificial Intelligence Group of C-DAC.