«Abstract In Cross-Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine-Readable ...»
4. MT-based approach
We explore the retrieval effectiveness of Machine Translation (MT) systems for Arabic-English and English-Arabic Cross-Language Information Retrieval (CLIR), as well as which factors affect performance, and to what extent. As mentioned in Section 2.2, one of the approaches being tested for CLIR uses existing machine translation systems to translate the queries or documents automatically from one language to another. The basic task of any machine translation system is to analyze the source text (morphological, syntactic, and semantic analysis using bilingual dictionaries or special-purpose lexicons) and to generate the target language text. A machine translation strategy for CLIR might therefore allow researchers to take advantage of the extensive research on machine translation and the availability of commercial products.
There are two basic approaches to MT-based CLIR: translating the documents or translating the queries. The drawbacks of document translation, compared with query translation, are the extensive processing required to translate very large amounts of data and, when there are multiple query languages, the need to duplicate the documents in every query language. Oard (1998) discussed query translation and concluded that it is less costly than translating the documents. This makes query translation the more practical choice, and it is the approach we follow.
Many researchers criticize the MT-based CLIR approach. Their criticisms mostly stem from the fact that the current translation quality of MT is poor. In particular, typical search terms lack the context that the MT system needs to perform proper syntactic and semantic analysis of the source text. Another reason is that MT systems are expensive to develop, and their application degrades retrieval efficiency (run time performance) because of the cost of the linguistic analysis. A study by Radwan and Fluhr (1995) compared the retrieval effectiveness of French-English CLIR using the SYSTRAN machine translation system with that of their EMIR dictionary-based query translation. They determined that EMIR was more effective than MT-based query translation using SYSTRAN.
Other researchers, in contrast, showed that machine translation approaches could achieve reasonable effectiveness. Jones et al. (1999) showed that full disambiguation by an MT system outperforms dictionary lookup methods that include several terms as candidates in the query. Also, many participants in the TREC-8 CLIR track (Braschler et al., 1999) concluded that MT-based CLIR is an effective strategy. Another advantage of using MT systems for CLIR is that if L1-L2 and L2-L3 MT systems are available, it is possible to construct an L1-L3 CLIR system without developing an L1-L3 MT system, where L1, L2, and L3 are three different languages (Kwok, 1999).
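The pivot-language idea attributed to Kwok (1999) above amounts to function composition. The following is a minimal sketch under stated assumptions: the names `compose_pivot` and `make_toy_translator`, and the tiny word-for-word lexicons, are hypothetical stand-ins for real MT systems and are not taken from the cited work.

```python
from typing import Callable, Dict

# A translator is modeled as any function from source text to target text.
Translator = Callable[[str], str]


def compose_pivot(l1_to_l2: Translator, l2_to_l3: Translator) -> Translator:
    """Build an L1->L3 translator by chaining two translators through L2."""
    def l1_to_l3(query: str) -> str:
        return l2_to_l3(l1_to_l2(query))
    return l1_to_l3


def make_toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Word-for-word lookup; a hypothetical placeholder for a real MT system."""
    def translate(text: str) -> str:
        # Words missing from the lexicon are passed through unchanged.
        return " ".join(lexicon.get(word, word) for word in text.split())
    return translate


# Illustrative lexicons only (French -> English -> German).
fr_to_en = make_toy_translator({"arbre": "tree", "croissance": "growth"})
en_to_de = make_toy_translator({"tree": "Baum", "growth": "Wachstum"})
fr_to_de = compose_pivot(fr_to_en, en_to_de)

print(fr_to_de("croissance arbre"))
```

In practice, errors made in the first translation step are carried into the second, so a pivot system is only as good as its weaker link.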
Our experiments provide insight into the performance of the MT-based query translation approach on a large document collection described in Section 3.4.2. The machine translation systems that we adapted for our experiments are commercial products designed to assist humans by automatically translating full sentences, or even a paragraph. For higher accuracy, if the query terms are formulated as phrases, we can apply MT systems as well. However, experience shows that users typically prefer to give isolated words or, at best, short phrases to an information retrieval system. Therefore, we also experiment with short queries drawn from the titles of the TREC-7, TREC-9, and Arabic TREC-10 topics.
4.1 Experimental Approach
At present, no benchmark data are available for Arabic-English CLIR. To provide a means to compare our efforts with future Arabic-English CLIR efforts, we used readily available English benchmark document collections and provide our Arabic queries, a translation of the National Institute of Standards and Technology (NIST) Text Retrieval Conference (TREC) queries, on our web site at www.ir.iit.edu. We used these 100 translated queries as our original Arabic queries issued against the TREC English collection. The Arabic queries were translated back to English using the ALKAFI MT system. Indexing is done using the Porter and K-stem algorithms after eliminating stop-words. Similarly, querying is done after stemming the translated target English queries and eliminating their stop-words. The ALKAFI Arabic-English MT system is a commercial system developed by CIMOS Corporation, and it is the first Arabic to English machine translation system.
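The indexing and querying steps described above (stop-word elimination followed by stemming) can be sketched as follows. This is an illustration only: the stop-word list is a small hypothetical sample, and `toy_stem` is a crude suffix stripper standing in for the Porter and K-stem algorithms, neither of which it implements.

```python
import re
from typing import Callable, FrozenSet, List

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS: FrozenSet[str] = frozenset(
    {"a", "an", "and", "for", "in", "is", "of", "the", "to"}
)


def toy_stem(word: str) -> str:
    """Strip a few common suffixes. NOT Porter or K-stem; a placeholder only."""
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(
    text: str,
    stem: Callable[[str], str] = toy_stem,
    stop_words: FrozenSet[str] = STOP_WORDS,
) -> List[str]:
    """Lowercase and tokenize, drop stop-words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in stop_words]


print(preprocess("The rates of growth for pine trees"))
```

Applying the same `preprocess` function to both documents and translated queries keeps index terms and query terms comparable; a real stemmer can be swapped in through the `stem` parameter.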
Arabic text is usually not vocalized, so ALKAFI can add vowels internally. Sometimes, however, the user must vocalize some consonants to help ALKAFI with lexical and syntactic analysis. Vocalization is a crucial step, since word sense depends on vocalization and on word position in context. The system attempts to analyze words in context and then builds semantic relations. The English text is then generated by a transfer method according to English grammar rules. ALKAFI uses five dictionaries:
The TREC queries (or topics in the TREC vernacular) consist of three fields: title, description, and narrative. The title is considered short; it consists of one, two or three concept terms. In Table 13, we illustrate an example of the original Arabic title and its translation. The description field is of medium length; it consists of one or two sentences.
In Table 14, we provide an example of the description field and its translation. The longest part is the narrative field; in Table 15, we show an example of the narrative field and its translation using the ALKAFI MT system. To measure the effectiveness of an MT system for CLIR, we experimented with all three query types to determine the effect of query length (short, medium, and long) on the performance of the MT-based method for CLIR.
Table 15. The narrative of the original Arabic query and the translated English query using the ALKAFI MT system
For English-Arabic CLIR, we conducted the experiments using the Al-Mutarjim Al-Arabey English to Arabic commercial MT system (ATA Software Technology).
The titles of the source Arabic queries are translated to English by the Al-Mutarjim Al-Arabey MT system. The average length of the titles of the Arabic TREC topics is 6.2 words. The minimum translation speed is 1000 words per minute on a system that meets only the basic hardware requirements. The translation of the title of query AR23 produced by the Al-Mutarjim Al-Arabey system is shown in Table 16.
Table 16. English query terms and their translation using Al-Mutarjim Al-Arabey MT system
4.2 Results
We use three performance measures. The first is the recall-precision score at 11 standard points. In CLIR systems, given the expense of translation, a user is most likely to be interested in only the top few retrieved Web pages. Thus, we also provide measures for the top n retrieved documents. Finally, we provide the overall average precision of each run. We first evaluate the effects of the MT system in Arabic-English CLIR.
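The three measures named above can be sketched for a single query as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 11-point values use the common interpolation rule (the maximum precision at any recall greater than or equal to each level), and the ranked list and relevance judgments are invented toy data rather than TREC results.

```python
from typing import List, Sequence, Set


def precision_at_n(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top n retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:n] if doc in relevant) / n


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant doc appears.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def eleven_point_precision(ranked: Sequence[str], relevant: Set[str]) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return [0.0] * 11
    points = []  # (recall, precision) after each rank
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    # Assumed interpolation rule: best precision at any recall >= the level.
    return [
        max((p for r, p in points if r >= level / 10), default=0.0)
        for level in range(11)
    ]


# Toy data: d6 is relevant but never retrieved.
ranked_docs = ["d1", "d2", "d3", "d4", "d5"]
relevant_docs = {"d1", "d3", "d6"}

print(precision_at_n(ranked_docs, relevant_docs, 5))
print(average_precision(ranked_docs, relevant_docs))
print(eleven_point_precision(ranked_docs, relevant_docs))
```

Per-run figures such as those reported in the tables would be obtained by averaging these per-query values over all topics in the run.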
As described earlier, we used both the TREC-7 and TREC-9 topics and TREC-9 collections. For TREC-7, as shown in Table 17, the machine translation achieved 61.8%, 64.7%, and 60.2% of the monolingual retrieval for the title, description, and narrative fields, respectively. The 11-point average recall-precision for TREC-7 topics is shown in Figures 4, 5, and 6 for the title, description, and narrative fields, respectively. As shown, the MT-based approach is more effective on the description field than on the title and narrative fields. In each figure, we also illustrate the “ideal” system score, which is represented by the monolingual query. At the higher precision-lower recall levels, the difference is even more noticeable. The degraded effectiveness of the machine translation on titles arises because the ALKAFI machine translation system is designed to perform best on well-formed sentences, or at least on a sequence of words that forms a context. However, the titles of topics 351-400 are all three words or fewer; thus, no substantive context is formed.
For the narrative run results shown in Figure 6, the MT system is unable to preserve its accuracy when extra, potentially noisy, terms are present in the source query. The greater the number of source query terms (except, of course, for keywords or words of high query disambiguation content), the greater the performance degradation of a CLIR system. These additional, potentially noisy, terms do not strengthen the source query. The ALKAFI MT system, however, is still capable of maintaining 60.2% of the monolingual retrieval. At the higher precision-lower recall levels, the narrative run is more effective than the title run. At the higher recall levels (up to 0.8), the title run is more effective than the narrative run. As measured by average precision, there is only a slight difference between the narrative and the title runs. It is not surprising that the narrative run is strictly worse in accuracy than the description run, since the MT system achieves its best performance on the shortest sequence of words that still provides a full context.
In Table 19, we illustrate the average precision of TREC-9 topics. Our CLIR approach using the ALKAFI MT system achieves 58.4%, 57.1%, and 53.4% of the monolingual retrieval for the title, description, and narrative fields, respectively. The 11-point average recall-precision for TREC-9 topics is shown in Figures 7, 8, and 9 for the title, description, and narrative fields, respectively. Again, the “ideal” monolingual run is illustrated in each figure.
In Table 20, we illustrate the results up to 1000 documents retrieved for TREC-9. As shown, again, the description run consistently outperforms both the title and narrative runs. However, as shown in Table 20, the percentage of degradation of the title run from the “ideal” monolingual title run is less than that of the description run. This result is seemingly inconsistent with the results obtained for the machine translation on titles run for queries 351-400 as presented in Table 17. The reason behind this seeming contradiction is that the titles of queries 451-500 are actually quite long. The average title length for queries 351-400 is 2.72 words per query, while the average length for queries 451-500 is 3.46 words. This 27% difference in query length was sufficient to allow our MT system to form a proper context for many more queries in the TREC-9 query set than in the TREC-7 query set. This is especially so considering that the TREC-9 query set had 16 queries with 4 or more words, compared with only 6 queries of similar length in the TREC-7 query set. For example, the title of query number 482 is:
The translated query using the ALKAFI MT system is:
“Where is he possible that I find the rates of the growth for the tree of the pine?” This query provides enough context for the ALKAFI machine translation system to produce its most accurate translation. Adding more context to that query does not help the MT system to provide better translation accuracy.
Finally, for completeness, we provide a brief overview of efficiency results. In Table 21, we summarize the efficiency (run time performance) of the ALKAFI MT system in translating the title, description, and narrative fields of the TREC-7 and TREC-9 topics.
The narrative fields as described in Tables 18 and 20, which represent the long queries, are not effective compared to the description fields, which represent the medium-length queries. These findings suggest that the fewer terms the original query needs to form an unambiguous context, the better both the running time and the retrieval effectiveness. As presented in Tables 21 and 22, the total running times for the narrative and description runs of TREC-7 are 6194.79 and 1990.25 seconds, respectively. The narrative run thus takes 211% more time than the description run, that is, about 3.1 times as long. In fact, the additional running time degrades the efficiency of our CLIR system without any improvement in effectiveness. These findings are consistent with the TREC-9 topics and collection as presented in Tables 21 and 22.
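As a quick check of the arithmetic, the relative differences quoted in this section can be recomputed from the reported figures. The helper name `percent_increase` is hypothetical; the inputs are the two TREC-7 running times given above and the average title lengths reported earlier for queries 351-400 and 451-500.

```python
def percent_increase(baseline: float, value: float) -> float:
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Total TREC-7 running times in seconds, as reported in the text.
shorter_run, longer_run = 1990.25, 6194.79
ratio = longer_run / shorter_run                      # about 3.11 times as long
extra_time = percent_increase(shorter_run, longer_run)  # about 211% more time

# Average title lengths in words for queries 351-400 and 451-500.
length_gain = percent_increase(2.72, 3.46)            # about 27% longer

print(round(ratio, 2), round(extra_time, 1), round(length_gain, 1))
```

Note the distinction between "211% more than" and "211% of": a run that takes 211% more time is roughly 3.1 times as long, not 2.1 times.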
The description runs take 340% more time than the title runs on the TREC-7 dataset. In return, the description run is more effective than the title run. Thus, choosing a few terms that form a full context achieves better accuracy at the expense of efficiency, a trade-off whose merits are application dependent.
Similar findings exist for the TREC-9 queries.
As shown in Table 23, the MT system achieved 70.2% of the monolingual retrieval. The MT system is able to preserve its accuracy because most of the titles of the Arabic topics are long enough to form a context.
The post-translation expansion technique improved the performance by 15.6%; the difference between the MT and MT+post runs is statistically significant at the 98% confidence level. Table 25 describes the runs at lower levels of recall, up to 1000 retrieved documents. As shown, MT with query expansion after translation consistently outperforms the MT approach without query expansion.
5. Conclusions
Our results demonstrate the potential of Arabic-English and English-Arabic CLIR.
Automatic dictionary translation is cost effective compared to other methods such as parallel corpora and Latent Semantic Indexing (LSI). The resources needed are readily available. The ambiguity introduced by the Every-Match (EM) method yields poor effectiveness; it achieved roughly half of the performance of the monolingual retrieval.
The main cause is the transfer of too many senses that are inappropriate to the source query.
It is common for a single word to have several translations, some with different senses.
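The way the Every-Match method transfers inappropriate senses can be illustrated with a small sketch. This is not the implementation used in the experiments: the function name `every_match_translate`, the choice to pass untranslatable terms through unchanged, and the toy Spanish-English dictionary are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def every_match_translate(
    query_terms: Sequence[str], dictionary: Dict[str, List[str]]
) -> List[str]:
    """Replace each source term with every translation listed for it."""
    translated: List[str] = []
    for term in query_terms:
        # Assumption: terms missing from the dictionary are kept as-is.
        translated.extend(dictionary.get(term, [term]))
    return translated


# Toy bilingual dictionary (hypothetical data); each entry mixes senses.
toy_dictionary = {
    "banco": ["bank", "bench"],
    "planta": ["plant", "floor", "sole"],
}

source_query = ["banco", "planta", "madrid"]
target_query = every_match_translate(source_query, toy_dictionary)

print(target_query)
print(len(source_query), "source terms ->", len(target_query), "target terms")
```

Only one translation per source word usually matches the intended sense, so the remaining candidates act as noise terms in the target query.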