«Abstract In Cross-Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine-Readable ...»
To reduce the number of extraneous translation terms, we evaluated the First-Match (FM) technique for both Arabic-English and English-Arabic CLIR. For Arabic-English CLIR, FM achieved 68.9% and 64.7% of English monolingual retrieval effectiveness using the titles of TREC topics 351-400 and 451-500, respectively. The drawback of this method is that many terms related to the original query may be ignored. We therefore proposed a new method for Arabic-English CLIR, called the Two-Phase (TP) method.
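To make the difference between the two dictionary strategies concrete, here is a minimal Python sketch. It assumes a toy Arabic-English dictionary (`SAMPLE_MRD`, hypothetical) whose entries are stored in the MRD's own listing order. Every-Match (EM) keeps every listed translation, while First-Match (FM) keeps only the first one. The function names and dictionary contents are illustrative assumptions, not the interface used in the experiments.

```python
from typing import Dict, List

# Hypothetical toy Arabic-English MRD; each entry lists translations in the
# dictionary's own order, which FM relies on.
SAMPLE_MRD: Dict[str, List[str]] = {
    "عين": ["eye", "spring", "spy"],
    "ماء": ["water"],
}


def every_match(query: List[str], mrd: Dict[str, List[str]]) -> List[str]:
    """Every-Match (EM): keep every listed translation of every query term."""
    translated: List[str] = []
    for term in query:
        translated.extend(mrd.get(term, []))
    return translated


def first_match(query: List[str], mrd: Dict[str, List[str]]) -> List[str]:
    """First-Match (FM): keep only the first listed translation of each term."""
    translated: List[str] = []
    for term in query:
        candidates = mrd.get(term, [])
        if candidates:  # terms missing from the MRD contribute nothing
            translated.append(candidates[0])
    return translated


query = ["عين", "ماء"]  # roughly "spring water"
print(every_match(query, SAMPLE_MRD))  # ['eye', 'spring', 'spy', 'water']
print(first_match(query, SAMPLE_MRD))  # ['eye', 'water']
```

In this toy case the query shows both failure modes: EM keeps the intended sense "spring" but also adds the unrelated senses "eye" and "spy", while FM removes the noise but drops "spring" along with it.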
In the Two-Phase method, the English translations of each Arabic query term are translated back into Arabic, and every candidate that does not retranslate to the original Arabic query term is ignored. Using the titles of TREC topics 351-400 and 451-500, this method achieved 71.5% and 69.0% of monolingual retrieval, respectively. On the same topic sets, TP yields improvements of 38% and 52% over the Every-Match (EM) method and of 4% and 7% over the First-Match (FM) method. The improvements of TP over EM are statistically significant at a confidence level above 99% for both TREC-7 (topics 351-400) and TREC-9 (topics 451-500); over FM, the corresponding confidence levels were 86% and 89%, respectively. These results show that eliminating unrelated terms with the Two-Phase method can significantly reduce the ambiguity associated with dictionary-based translation. We also conducted initial experiments with a commercial MT system for Arabic-English CLIR and found its performance inferior to that of both the FM and TP methods.
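The back-translation filter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `AR_EN` and `EN_AR` are hypothetical toy dictionaries standing in for the Arabic-English and English-Arabic MRDs, and `two_phase` is an illustrative name. How the original system handles a term whose candidates are all rejected is not stated here, so this sketch simply drops it.

```python
from typing import Dict, List

# Hypothetical toy dictionaries for illustration only.
AR_EN: Dict[str, List[str]] = {
    "عين": ["eye", "spring", "spy"],
    "ماء": ["water"],
}
EN_AR: Dict[str, List[str]] = {
    "eye": ["عين"],
    "spring": ["ربيع", "نبع", "عين"],
    "spy": ["جاسوس"],
    "water": ["ماء"],
}


def two_phase(query: List[str],
              ar_en: Dict[str, List[str]],
              en_ar: Dict[str, List[str]]) -> List[str]:
    """Translate Arabic query terms to English, keeping only candidates that
    translate back to the Arabic term they came from."""
    kept: List[str] = []
    for ar_term in query:
        # Phase 1: all English candidates listed for the Arabic term.
        for en_term in ar_en.get(ar_term, []):
            # Phase 2: back-translate the candidate; discard it unless the
            # original Arabic term appears among its Arabic translations.
            if ar_term in en_ar.get(en_term, []):
                kept.append(en_term)
    # Assumption: a term whose candidates are all rejected contributes
    # nothing; a fallback (e.g. to FM) would be a separate design choice.
    return kept


print(two_phase(["عين", "ماء"], AR_EN, EN_AR))  # ['eye', 'spring', 'water']
```

Here "spy" is removed because the toy English-Arabic dictionary does not map it back to عين, whereas "eye" survives: the filter reduces translation ambiguity rather than eliminating it.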
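Because the FM and TP figures are both expressed as percentages of the same monolingual baseline, the relative improvement of TP over FM can be recomputed directly from them, since the baseline cancels. The short Python check below uses only the percentages reported in this section; `relative_improvement` is an illustrative helper name. It gives roughly 3.8% and 6.6%, which round to the 4% and 7% reported for TP over FM.

```python
def relative_improvement(new_pct: float, base_pct: float) -> float:
    """Relative gain (%) of one run over another, where both runs are given
    as percentages of the same monolingual baseline (so the baseline cancels)."""
    return (new_pct / base_pct - 1.0) * 100.0


# Percent-of-monolingual figures reported in this section (title queries).
FM = {"351-400": 68.9, "451-500": 64.7}
TP = {"351-400": 71.5, "451-500": 69.0}

for topics in FM:
    print(topics, round(relative_improvement(TP[topics], FM[topics]), 1))
# 351-400 3.8
# 451-500 6.6
```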
We also evaluated MT-based Arabic-English CLIR using the ALKAFI system on two standard TREC collections and topic sets. To explore the effect of context on translation quality, we experimented with various query lengths and found that query length affects the performance of the MT system.
For English-Arabic CLIR, we studied the effects of using the Al-Mutarjim Al-Arabey MT system and an MRD, with query expansion applied after translation (post-translation expansion). We found that post-translation query expansion via pseudo-relevance feedback (PRF) consistently improves effectiveness for both the MT and MRD approaches.
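Post-translation PRF can be sketched as follows, assuming a generic `search` function that returns tokenized documents in rank order. In the English-Arabic setting the translated query and collection would be Arabic; English tokens are used here only for readability. `prf_expand`, `toy_search`, the sample documents, and the defaults `top_docs=10` and `top_terms=5` are all hypothetical, and raw term frequency stands in for whatever term-selection weighting the actual experiments used.

```python
from collections import Counter
from typing import Callable, List

Document = List[str]


def prf_expand(query: List[str],
               search: Callable[[List[str]], List[Document]],
               top_docs: int = 10,
               top_terms: int = 5) -> List[str]:
    """Post-translation PRF: treat the top-ranked documents retrieved by the
    translated query as relevant and append their most frequent new terms."""
    feedback = search(query)[:top_docs]
    counts = Counter(term for doc in feedback for term in doc)
    original = set(query)
    # most_common() orders ties by first occurrence, so the result is deterministic.
    new_terms = [t for t, _ in counts.most_common() if t not in original]
    return query + new_terms[:top_terms]


# Hypothetical collection and a toy overlap-based ranker.
SAMPLE_DOCS: List[Document] = [
    ["oil", "price", "opec", "barrel"],
    ["oil", "opec", "export"],
    ["football", "match"],
]


def toy_search(query: List[str]) -> List[Document]:
    # Rank documents by how many distinct query terms they contain.
    return sorted(SAMPLE_DOCS, key=lambda d: len(set(d) & set(query)), reverse=True)


print(prf_expand(["oil", "price"], toy_search, top_docs=2, top_terms=2))
# ['oil', 'price', 'opec', 'barrel']
```

The expansion step runs on the already-translated query, so it works the same way whether the translation came from the MT system or the MRD.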
The experimental results indicate that the fewer source terms needed to form a context, the better the retrieval accuracy and efficiency. However, semantic ambiguity remains a persistent problem because of the complexity of Arabic grammar. Without some level of semantic representation, MT systems cannot achieve high-quality translation, because they cannot resolve cases that are lexically or syntactically ambiguous. Accordingly, the MT system achieves its best accuracy when given a well-formed source query.
One possible extension of this work is to expand the original Arabic source query using PRF before translation, to emphasize its context, and to determine a term threshold for the TP method. Another extension is to apply the Two-Phase method using a parallel corpus, or a combination of an MRD and a parallel corpus.
6. References
Abu-Salem, H., Al-Omari, M., and Evens, M. (1999). Stemming Methodologies over Individual Query Words for an Arabic Information Retrieval System. JASIS 50(6): 524.
Al-Dawliah Universal Electronics (1999), http://www.adawliah.com.sa/.
Adriani, M., and Croft, W. (1997). The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval. CIIR Technical Report IR-170, University of Massachusetts, Amherst.
Aljlayl, M., and Frieder, O. (2001). Effective Arabic-English Cross-Language Information Retrieval via Machine Readable Dictionaries and Machine Translation. ACM CIKM, pp.
Al-Kharashi, I., and Evens, M. (1994). Comparing Words, Stems, and Roots as Index Terms in an Arabic Information Retrieval System. JASIS 45(8): 548-560.
ATA Software Technology Ltd., http://www.atasoft.com
Ballesteros, L., and Croft, B. (1996). Dictionary Methods for Cross-Lingual Information Retrieval. In Proceedings of the 7th International DEXA Conference on Database and Expert Systems Applications, pp. 791-801.
Ballesteros, L., and Croft, B. (1997). Phrasal Translation and Query Expansion Techniques for Cross-Language Information Retrieval. SIGIR 1997, pp. 84-91.
Ballesteros, L., and Croft, B. (1998). Resolving Ambiguity for Cross-Language Retrieval. SIGIR 1998, pp. 64-71.
Braschler, M., Peters, C., and Schäuble, P. (1999). Cross-Language Information Retrieval (CLIR) Track Overview. TREC-8 Proceedings.
Chowdhury, A., Beitzel, S., Jensen, E., Sai-lee, M., Grossman, D., Frieder, O., McCabe, C., and Holmes, D. (2000). IIT TREC-9 - Entity Based Feedback with Fusion. Proceedings of TREC-9, NIST, pp. 241-248.
Davis, M., and Dunning, T. (1995). Query Translation using Evolutionary Programming for Multilingual Information Retrieval. In Proceedings of the Fourth Annual Conference on Evolutionary Programming.
Dunning, T. and Davis, M. (1993). Multi-lingual information retrieval. Technical Report MCCS-93-252. Computing Research Laboratory, New Mexico State University.
Egyptian Demographic Center, (2000).
Hasnah, A. (1996). Full Text Processing and Retrieval: Weight Ranking, Text Structuring, and Passage Retrieval for Arabic Documents. Ph.D. Dissertation, Computer Science Department, Illinois Institute of Technology, Chicago, IL.
Hull, D., and Grefenstette, G. (1996). Querying Across Languages: A Dictionary-Based Approach to Multilingual Information Retrieval. In Proceedings of SIGIR, pp. 49-57.
Jones, G., Sakai, T., Collier, N., Kumano, K., and Sumita, K. (1999). A Comparison of Query Translation Methods for English-Japanese Cross-Language Information Retrieval. SIGIR, pp. 269-270.
Kwok, K.L. (1999). English-Chinese Cross-Language Retrieval based on a Translation Package, Post-Conference Workshop on Machine Translation for Cross Language Information Retrieval at AAMT Machine Translation Summit VIII.
Landauer, T. K., and Littman, M. L. (1990). Fully Automatic Cross-Language Document Retrieval Using Latent Semantic Indexing. In Proceedings of the 6th Conference of the UW Center for the New OED and Text Research, pp. 31-38.
Oard, D. (1998). A Comparative Study of Query and Document Translation for Cross-Language Information Retrieval. In Machine Translation and the Information Soup: Third Conference of the Association for Machine Translation in the Americas, pp. 472.
Pirkola, A. (1998). The Effects of Query Structure and Dictionary Setups in Dictionary-Based Cross-Language Information Retrieval. ACM SIGIR, pp. 55-63.
Radwan, K., Fluhr, C. (1995). Textual Database Lexicon used as a Filter to Resolve Semantic Ambiguity Application on Multilingual Information Retrieval. In Fourth Annual Symposium on Document Analysis and Information Retrieval, pp. 121-136.
Sheridan, P., and Ballerini, J. P. (1996). Experiments in Multilingual Information Retrieval using the SPIDER System. In Proceedings of ACM SIGIR, pp. 58-65.
Tayli, M., and Al-Salamah, A. (1990). Building Bilingual Microcomputer Systems. Communications of the ACM, Vol. 33, No. 5, pp. 495-505.
TREC (2001). http://trec.nist.gov/act_part/tracks.html
Wonnacott, R., and Wonnacott, T. (1990). Introductory Statistics, Fourth Edition. John Wiley & Sons.
Xu, J. and Croft, W. B. (1996). Query Expansion using Local and Global Document Analysis. In Proceedings of ACM SIGIR, pp. 4-11.