«Abstract In Cross-Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine-Readable ...»

To reduce the number of extraneous terms, the First-Match (FM) technique was evaluated for Arabic-English and English-Arabic CLIR. For Arabic-English CLIR, this approach achieved 68.9% and 64.7% of English-only (monolingual) retrieval performance using the titles of TREC topics 351-400 and TREC topics 451-500, respectively. The drawback of this method is that many terms related to the original query may be ignored. Therefore, we proposed a new method for Arabic-English CLIR, called the Two-Phase method.
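The contrast between the Every-Match and First-Match techniques can be illustrated with a minimal sketch. The dictionary below is a hypothetical toy with transliterated entries, not the MRD used in the experiments, and silently dropping terms that have no dictionary entry is an assumption of this sketch.

```python
# Hypothetical toy Arabic-English dictionary (transliterated, illustrative only).
AR_EN = {
    "masrif": ["bank", "drain", "outlet"],
    "nahr": ["river", "stream"],
}


def every_match(query_terms, dictionary):
    """Every-Match: replace each source term with all listed translations."""
    translated = []
    for term in query_terms:
        # Assumption: terms missing from the dictionary are dropped.
        translated.extend(dictionary.get(term, []))
    return translated


def first_match(query_terms, dictionary):
    """First-Match: keep only the first listed translation of each term."""
    translated = []
    for term in query_terms:
        candidates = dictionary.get(term, [])
        if candidates:
            translated.append(candidates[0])
    return translated


query = ["masrif", "nahr"]
em_query = every_match(query, AR_EN)
fm_query = first_match(query, AR_EN)
```

In this toy case FM discards "outlet" and "stream" along with the extraneous "drain", which is the drawback noted above: related terms are lost together with unrelated ones.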

In the Two-Phase (TP) method, we ignore all translated terms that do not retranslate to the original Arabic query term. This method achieved 71.5% and 69.0% of monolingual retrieval performance using the titles of TREC topics 351-400 and TREC topics 451-500, respectively. The Two-Phase method yields a 38% and 52% improvement over the Every-Match (EM) method on TREC topics 351-400 and TREC topics 451-500, respectively. It also yields a 4% and 7% improvement over the First-Match (FM) method on the same two topic sets. The TP improvement over EM was statistically significant at greater than the 99% confidence level for both TREC-7 and TREC-9. Over the FM method, the corresponding confidence levels were 86% and 89% for TREC-7 and TREC-9, respectively. In this study, we showed that eliminating unrelated terms with the Two-Phase method can significantly reduce the ambiguity associated with dictionary translation. We also conducted initial experiments with a commercial MT-based Arabic-English CLIR system; we found its performance inferior to that of the FM and TP methods.
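The retranslation filter described above can be sketched as follows. Both dictionaries are hypothetical toy data with transliterated entries, and the function name and the handling of terms whose candidates are all filtered out (they simply contribute nothing) are assumptions of this sketch rather than details taken from the study.

```python
# Hypothetical toy dictionaries (transliterated, illustrative only).
AR_EN = {"masrif": ["bank", "drain", "outlet"]}
EN_AR = {
    "bank": ["masrif", "daffa"],
    "drain": ["balua"],
    "outlet": ["manfadh", "masrif"],
}


def two_phase(query_terms, forward, backward):
    """Keep an English candidate only if it translates back to the source term.

    Phase 1: look up every translation of the Arabic term (forward dictionary).
    Phase 2: translate each candidate back (backward dictionary) and discard
    it unless the original Arabic term appears among the back-translations.
    """
    kept = []
    for term in query_terms:
        for candidate in forward.get(term, []):
            if term in backward.get(candidate, []):
                kept.append(candidate)
    return kept


tp_query = two_phase(["masrif"], AR_EN, EN_AR)
```

With this toy data the filter drops "drain", whose back-translations do not include "masrif", while keeping both "bank" and "outlet". Unlike First-Match, more than one related translation can survive.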

We also evaluated MT-based Arabic-English CLIR and found that query length affects the performance of the MT system. The evaluation was conducted using the ALKAFI system and two standard TREC collections and topic sets. To explore the effect of context on translation quality, we experimented with various query lengths.

We studied the effects of using the Al-Mutarjim Al-Arabey MT system and an MRD for English-Arabic CLIR. The post-translation approach was used. We found that query expansion after translation via PRF is consistently more effective for both the MT and MRD approaches.
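Post-translation expansion via pseudo-relevance feedback (PRF) can be sketched in a few lines. The term-selection rule used here (raw term frequency in the top-ranked documents) and the parameter names are assumptions for illustration only; the term-weighting scheme used in the experiments is not specified in this section.

```python
from collections import Counter


def expand_with_prf(query_terms, ranked_docs, top_docs=2, new_terms=2):
    """Expand an already translated query with terms from top-ranked documents.

    ranked_docs is a list of tokenized documents, best first, as returned by
    an initial retrieval run with the translated query.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        # Count only terms that are not already part of the query.
        counts.update(tok for tok in doc if tok not in query_terms)
    expansion = [term for term, _ in counts.most_common(new_terms)]
    return list(query_terms) + expansion


# Hypothetical initial ranking for the translated query ["bank"].
ranked = [
    ["bank", "loan", "interest", "loan"],
    ["bank", "loan", "interest", "rate"],
    ["river", "bank"],
]
expanded = expand_with_prf(["bank"], ranked)
```

The expanded query is then submitted for a second retrieval pass. Only the two top-ranked documents contribute terms here, so the third, off-topic document does not influence the expansion.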

The experimental results indicate that the fewer source terms needed to form a context, the better the retrieval accuracy and efficiency. However, the problem of semantics is perennial because of the complexities of Arabic grammar. Without some level of semantic representation, MT systems are unable to achieve high-quality translation, because they cannot differentiate between cases that are lexically and syntactically ambiguous. Accordingly, a well-formed source query enables the MT system to deliver its best accuracy.

A possible extension of our work is to expand the original source query using PRF for Arabic-English CLIR, to emphasize the context of the source query, and to find a term threshold for the TP method. Another extension is to apply the Two-Phase method using a parallel corpus or a combination of an MRD and a parallel corpus.

