Machine Translation: A Hard Nut to Crack...
Machine Translation (MT) was one of the first
tasks imagined as an application of Natural
Language Processing (NLP), as early as the
1940s, and it remains one that still resists
the progress of Computer Science. Indeed, current
MT systems often remain very poor compared with
human translation. From the earliest days, MT
has been bedeviled by grandiose claims and exaggerated
expectations. MT researchers and developers
should stop over-selling. The general public
should stop over-expecting. One of the main
aims of this article is to help the reader
appreciate where we are today in terms of
actual achievement, reasonable expectation,
and unreasonable hype. This is not the kind
of thing that one can sum up in a catchy headline
("No Prospect for MT" or "MT
Removes the Language Barrier"), but it
is something one can absorb, and which one can
thereafter use to distill the essence of truth
that lies behind reports of products and
research. We must all realize that the potential
economic rewards linked with the development
of MT systems are huge, especially in non-English-speaking
countries.
The
social or political importance of MT arises
from the socio-political importance of translation
in communities where more than one language
is generally spoken. The commercial importance
of MT is a result of related factors. First,
translation itself is commercially important:
for example, the translation of an instruction
manual, a repair manual for a piece of manufacturing
machinery, or the manual for a safety-critical
system is not just a matter of taste. Secondly,
translation is expensive. Translation is a highly
skilled job, requiring much more than mere knowledge
of a number of languages, and in some countries,
translators’ salaries are comparable to those of
other highly trained professionals. Moreover, in some
cases, delays in translation can be very costly.
Scientifically,
MT is interesting, because it is an obvious
application and testing ground for many ideas
in Computer Science, Artificial Intelligence,
and Linguistics, and some of the most important
developments in these fields have begun in MT.
To illustrate this: the origins of Prolog,
the first widely available logic programming
language, which formed a key part of the Japanese
‘Fifth Generation’ programme of research in
the late 1980s, can be found in the ‘Q-Systems’
language, originally developed for MT.
Despite
this, very few people, even those who are involved
in producing or commissioning translations,
have much idea of what is involved in MT today,
either at the practical level of what it means
to have and use an MT system, or at the level
of what is technically feasible, and what is
science fiction.
We
are reminded of this scale of ignorance every
time we admit to working in the field of MT.
After initial explanations of what MT is, the
typical reaction is one of two contradictory
responses (sometimes one gets both together).
One is "But that’s impossible --- no machine
could ever translate Shakespeare." The
other is "Yes, I saw one of those in the
Duty Free Shop when I went on holiday last summer."
These reactions are based on a number of misconceptions
that are worth exposing. We will look at these,
as well as some correct conceptions, in the
next section.[1]
Popular Conceptions and Misconceptions
"MT
is a waste of time because you will never make
a machine that can translate Shakespeare".
The
criticism that MT systems cannot, and will never,
produce translations of great literature of
any great merit is probably correct, but quite
beside the point. It certainly does not show
that MT is impossible. First, translating literature
requires special literary skill - it is not
the kind of thing that the average professional
translator normally attempts. So accepting the
criticism does not show that automatic translation
of non-literary texts is impossible. Second,
literary translation is a small proportion of
the translation that has to be done, so accepting
the criticism does not mean that MT is useless.
Finally, one may wonder who would ever want
to translate Shakespeare by machine --- it is
a job that human translators find challenging
and rewarding. The criticism that MT systems
cannot translate Shakespeare is a bit like criticizing
industrial robots for not being able to dance
Swan Lake.
"Generally,
the quality of translation from an MT system
is very low. This makes them useless in practice."
Far
from being useless, several MT systems are
in day-to-day use around the world. Examples
include METEO (in daily use since 1977 at
the Canadian Meteorological Center in Dorval,
Montreal, translating around 45,000 words of
weather bulletins a day), SYSTRAN (in use at the
AltaVista search engine and elsewhere), LOGOS, ALPS,
ENGSPAN (and SPANAM), METAL, and GLOBALINK.
C-DAC’s
MANTRA-Rajbhasha (Beta version) was also evaluated
at various Ministries and Departments and the
results were very encouraging. It is true that
the number of organizations that use MT on a
daily basis is relatively small, but those that
do use it benefit considerably.
"MT
threatens the jobs of translators."
The
quality of translation that is currently possible
with MT is one reason why it is wrong to think
of MT systems as dehumanizing monsters which
will eliminate human translators, or enslave
them. It will not eliminate them, simply because
the volume of translation to be performed is
so huge, and constantly growing. What is more
likely is that the process of producing draft
translations, along with the often tedious business
of looking up unknown words in dictionaries,
and ensuring terminological consistency, will
become automated, leaving human translators
free to spend time on increasing clarity and
improving style, and to translate more important
and interesting documents - editorials rather
than weather reports, for example.
"MT
systems are machines, and buying an MT system
should be very much like buying a car."
The
misconception is the idea that one would take
an MT system and ‘drive it away’, as one would
drive a car. In fact, this is unlikely to be
possible, and a better analogy is with buying
a house. What one buys may be immediately habitable,
but there is a considerable amount of work involved
in adapting it to one’s own special needs. In
the case of a house this might involve changes
to the decor and plumbing. In the case of an
MT system this will involve additions to the
dictionaries to deal with the vocabulary of
the subject area and possibly the type of text
structures to be translated.
Against
these misconceptions, we should place the genuine
facts about MT.
C-DAC’s
Experience: C-DAC’s decade-long research and
experience show that ‘good’ MT cannot be produced
by simply replacing source language words with their
target language translational equivalents. As
all translators know, word-for-word translation
does not produce a satisfying target language
text, not even when some local reordering rules
(e.g. for the position of the adjective with
regard to the noun which it modifies) have been
included in the system.
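The point about word-for-word translation can be made
concrete with a small sketch. It is not taken from
MANTRA or any other real system: the English-to-French
lexicon, the word classes, and the function names
word_for_word and with_local_reordering are all
assumptions chosen for illustration. It applies
dictionary substitution, then adds one local rule that
moves an adjective after the noun it modifies, and
compares both outputs with a reference translation.

```python
# Toy illustration only: the lexicon and word classes below are
# assumptions made for this sketch, not data from a real MT system.
LEXICON = {
    "the": "le",
    "red": "rouge",
    "car": "voiture",
    "is": "est",
    "fast": "rapide",
}
ADJECTIVES = {"red", "fast"}
NOUNS = {"car"}

# A correct French translation of the test sentence.
REFERENCE = "la voiture rouge est rapide"


def word_for_word(sentence):
    """Replace each source word with its dictionary equivalent."""
    return [LEXICON.get(word, word) for word in sentence.lower().split()]


def with_local_reordering(sentence):
    """Swap each adjective-noun pair, then substitute word by word."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        is_adj_noun = (
            i + 1 < len(words)
            and words[i] in ADJECTIVES
            and words[i + 1] in NOUNS
        )
        if is_adj_noun:
            # French usually places this kind of adjective after the noun.
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    return [LEXICON.get(word, word) for word in reordered]


if __name__ == "__main__":
    source = "the red car is fast"
    print("word for word: ", " ".join(word_for_word(source)))
    print("with reordering:", " ".join(with_local_reordering(source)))
    print("reference:      ", REFERENCE)
```

Even with the reordering rule, the output keeps the
article "le" although "voiture" is feminine and needs
"la". Fixing that requires grammatical information
about the noun, which a flat word list does not carry.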
Translating
a text requires not only good knowledge of the
vocabulary of both source and target language,
but also of their grammar - the system of rules
which specifies which sentences are well-formed
in each language and which are not. Additionally,
it requires some element of real world knowledge
- knowledge of the nature of things out in the
world and how they work together - and technical
knowledge of the text’s subject area. We analyzed
the corpus and created the grammar of the sub-language.
Most effort has gone into increasing the subtlety,
breadth and depth of the linguistic or grammatical
knowledge available to the system.
Thus,
in MANTRA-Rajbhasha, our Machine assisted Translation
tool, the strategy adopted was neither word
to word nor rule to rule, but lexical tree to
lexical tree, where a chunk-to-chunk level of
transfer can be done [2]. This uses the Tree
Adjoining Grammar (TAG) formalism for both parsing
of English and generation of Hindi
sentences [3]. The domain selected is Personnel
Administration, and texts related to appointments,
transfers, office orders, and office memoranda
are translated successfully with 90-95% accuracy.
The current version is under Beta evaluation
and will soon be installed at various places.
As a consequence, we see that the main impact
of MANTRA-Rajbhasha in the immediate future
will be in large corporate environments, ministries
and departments where a substantial amount of
official translation work is performed. We have
extended this to other domains and other language
pairs.
With the Web
being the destination of all products these
days, we are developing a MANTRA Translation
Server (M-TS) which can be accessed by anyone
on the World Wide Web. English text from
the specified domain can be sent to M-TS, and
the target language text is generated and sent
back by email. We are also working on a
domain-specific translated chat application. Here,
each user selects a language and all communication
is displayed in that language, e.g.
if I select Hindi and you select English then
I will get all the messages in Hindi although
you will be typing in English, and vice versa.
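The routing idea behind such a chat application can be
sketched as follows. This is only an illustration under
stated assumptions, not the design of the application
described above: the ChatRoom class, its methods, and
placeholder_translate are hypothetical names, and the
placeholder merely tags each message with the language
pair where a real system would call an MT engine.

```python
# Hypothetical sketch: ChatRoom and placeholder_translate are names
# invented for this example, not part of MANTRA or M-TS.

def placeholder_translate(text, source, target):
    """Stand-in for a real MT engine: tags text with the language pair."""
    if source == target:
        return text
    return f"[{source}->{target}] {text}"


class ChatRoom:
    """Delivers every message in the language its recipient selected."""

    def __init__(self, translate=placeholder_translate):
        self.translate = translate
        self.languages = {}  # user name -> selected language code
        self.inboxes = {}    # user name -> list of received messages

    def join(self, user, language):
        """Register a user together with the language they want to read."""
        self.languages[user] = language
        self.inboxes[user] = []

    def send(self, sender, text):
        """Translate the sender's text once per recipient language."""
        source = self.languages[sender]
        for user, target in self.languages.items():
            if user != sender:
                translated = self.translate(text, source, target)
                self.inboxes[user].append(translated)


if __name__ == "__main__":
    room = ChatRoom()
    room.join("asha", "hi")  # reads everything in Hindi
    room.join("bob", "en")   # reads everything in English
    room.send("bob", "Hello")
    room.send("asha", "Namaste")
    print(room.inboxes)
```

Keeping the translator as a parameter means the routing
logic stays the same whichever engine or language pair
is plugged in.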
Conclusion:
The correct conclusion is that MT is not only
a possibility but also an actuality. But it
is important to see the product in a proper
perspective, to be aware of its strong points
and shortcomings. Thus we can say: "Machine
Translation: A Hard Nut to Crack", but we
cracked it.
Some Popular Misconceptions about MT
- False: MT is a waste of time because you will
  never make a machine that can translate Shakespeare.
- False: There was/is an MT system which translated
  ‘The spirit is willing, but the flesh is weak’
  into the Russian equivalent of ‘The vodka
  is good, but the steak is lousy,’ and ‘hydraulic
  ram’ into the French equivalent of ‘water goat’.
  MT is useless.
- False: Generally, the quality of translation you
  can get from an MT system is very low. This
  makes them useless in practice.
- False: MT threatens the jobs of translators.
- False: The Japanese have developed a system that
  you can talk to on the phone. It translates
  what you say into Japanese, and translates
  the other speaker’s replies into English.
- False: There is an amazing South American Indian
  language with a structure of such logical
  perfection that it solves the problem of designing
  MT systems.
- False: MT systems are machines, and buying an MT
  system should be very much like buying a car.
Some Facts about MT
- True: MT is useful. The METEO system has been
  in daily use since 1977. As of 1990, it was
  regularly translating around 45,000 words daily.
  In the 1980s, the diesel engine manufacturer
  Perkins Engines was saving around £4000 and
  up to 15 weeks on each manual translated.
- True: While MT systems sometimes produce howlers,
  there are many situations where the ability
  of MT systems to produce reliable, if less
  than perfect, translations at high speed is
  valuable.
- True: In some circumstances, MT systems can
  produce good quality output: less than 4%
  of METEO output requires any correction by
  human translators at all (and most of these
  corrections are due to transmission errors in
  the original texts). Even where the quality is
  lower, it is often easier and cheaper to revise
  draft quality MT output than to translate entirely
  by hand.
- True: MT does not threaten translators’ jobs.
  The need for translation is vast and unlikely
  to diminish, and the limitations of current
  MT systems are too great. However, MT systems
  can take over some of the boring, repetitive
  translation jobs and allow human translators
  to concentrate on more interesting tasks,
  where their specialist skills are really needed.
- True: Speech-to-Speech MT is still a research
  topic. In general, there are many open research
  problems to be solved before MT systems will
  come close to the abilities of human translators.
- True: Not only are there many open research
  problems in MT, but building an MT system
  is an arduous and time-consuming job, involving
  the construction of grammars and very large
  monolingual and bilingual dictionaries. There
  is no ‘magic solution’ to this.
- True: In practice, before an MT system becomes
  really useful, a user will typically have
  to invest a considerable amount of effort
  in customizing it.
References:
1. Machine Translation: An Introductory Guide.
2. MANTRA - An English-Hindi Translation methodology: An Insight, CDAC/AAI/TR-98-0108, June 1998.
3. Tree Adjoining Grammar, URL http://www.cis.upenn.edu/~xtag/

Hemant Darbari is Group Co-ordinator and leads
the Applied Artificial Intelligence Group of
C-DAC.