«Abstract The paper presents the partially automatically annotated and fully manually validated Bulgarian-English Sentence- and Clause-Aligned Corpus. ...»

Non-straightforward alignment patterns account for a considerable number of 0:1 (7.05%) and 1:2 (9.12%) clause alignments in Bulgarian-English, with the reverse types amounting to just 1.95% (1:0) and 2.51% (2:1), respectively. These results suggest that 1:N (N > 1) correspondences are more common from Bulgarian to English than from English to Bulgarian. Factors behind this trend include the different segmentation into clauses, as in the case of participial constructions versus participial clauses, and the rendition of prepositional phrases as clauses or vice versa.
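The alignment types discussed above (0:1, 1:2, etc.) can be tallied with a few lines of code. The following is a minimal sketch, not the procedure used for the corpus: the function names and the toy clause identifiers are hypothetical, and each alignment is assumed to be a pair of lists holding the Bulgarian and the English clauses it links.

```python
from collections import Counter

def alignment_type(src_clauses, tgt_clauses):
    """Return the type 'm:n' for m source and n target clauses."""
    return f"{len(src_clauses)}:{len(tgt_clauses)}"

def type_distribution(alignments):
    """Map each alignment type to its percentage share of all alignments."""
    counts = Counter(alignment_type(s, t) for s, t in alignments)
    total = sum(counts.values())
    return {kind: round(100 * n / total, 2) for kind, n in counts.items()}

# Toy data (hypothetical clause ids, not taken from the corpus):
# Bulgarian clauses on the left, English clauses on the right.
toy_alignments = [
    (["bg1"], ["en1"]),          # 1:1
    (["bg2"], ["en2", "en3"]),   # 1:2
    ([], ["en4"]),               # 0:1
    (["bg3"], ["en5"]),          # 1:1
]

distribution = type_distribution(toy_alignments)
```

Comparing the share of a type such as 1:2 with that of its reverse (2:1) gives the kind of directional contrast reported above.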

4.2 Annotation of clause relations

The BulEnAC is supplied with partial syntactic annotation that includes:

(i) delimiting the sentence and clause boundaries;

(ii) identifying the type of relation (subordination or coordination) between the clauses in a sentence;

(iii) identifying the linguistic markers that introduce clauses – conjunctions, adverbs, pronouns, punctuation marks, etc.

A clause relation is defined between a pair of clauses. We were interested in the type of relation between the clauses, the ordering of clauses that stand in a given relation, the position of the conjunction, and language-specific clause-to-clause ordering constraints. With respect to the relation, each clause in the pair is identified as either main or subordinate, with at least one being main. In this paper the term main is used in a broader sense that encompasses both the meaning of an independent clause and that of a superordinate clause. Thus, main (N) denotes either a clause with equal status as the other member of the pair or one that is superordinate to it. Subordinate (S) status is assigned to a clause that is syntactically subordinate to the other member of the pair.

The status of the clauses is defined with respect to a particular clause relation and is therefore relative. Consequently, the relationship between a pair of coordinated clauses is N_N both when the clauses are independent and when they are subordinate, cf. Example (6) for independent and Example (7) for dependent clauses. In the case of coordinated subordinate clauses, the dependent status of the pair is denoted by the relation N_S established between their superordinate and the first of the subordinate clauses (7b).

Example 6
(a) [N1 I usually forget things,] [N2 but_{N1_N2} I remembered it!]
(b) [N1 He asked her] [S if_{N1_S} he could pick her up on the morning of the experiment] [N2 and_{N1_N2} she agreed gratefully.]

Example 7
(a) [1 Dutch police authorities said] [2 they were illegal immigrants] [3 and would be deported.]
(b) [1 N Dutch police authorities said] [2 S ====_{N_S} they were illegal immigrants]
(c) [2 N1 they were illegal immigrants] [3 N2 and_{N1_N2} would be deported.]

A syntactically subordinate clause that is superordinate to another clause has the status main with respect to it. For instance, in (8a) clause 2 is subordinate to the matrix clause, clause 1 (8b), and is a main clause with respect to clause 3 (8c):

Example 8
(a) [1 This Regulation does not go beyond] [2 what is necessary] [3 to achieve those objectives.]
(b) [1 N This Regulation does not go beyond] [2 S what_{N_S} is necessary]
(c) [2 N...what is necessary] [3 S to_{N_S} achieve those objectives.]

In the languages under consideration the following three clause ordering models cover almost all the cases: N_N, N_S and S_N.
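The relative nature of the N and S statuses can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class ClauseRelation, its field names and the helper statuses_of are hypothetical and do not reflect the corpus's actual storage format. The data encodes the two relations of Example 8.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClauseRelation:
    """A relation between two clauses, listed in linear order."""
    first: int            # number of the earlier clause in the sentence
    second: int           # number of the later clause
    first_status: str     # "N" (main) or "S" (subordinate)
    second_status: str
    marker: str = ""      # word introducing the relation, if any

    def __post_init__(self):
        statuses = (self.first_status, self.second_status)
        if any(s not in ("N", "S") for s in statuses):
            raise ValueError("status must be 'N' or 'S'")
        # At least one clause in every relation must be main.
        if "N" not in statuses:
            raise ValueError("at least one clause must be main")

    @property
    def pattern(self):
        """Ordering model of the pair: N_N, N_S or S_N."""
        return f"{self.first_status}_{self.second_status}"

def statuses_of(clause, relations):
    """Collect the statuses a clause receives across its relations."""
    result = []
    for rel in relations:
        if rel.first == clause:
            result.append(rel.first_status)
        if rel.second == clause:
            result.append(rel.second_status)
    return result

# Example 8: clause 2 is S relative to clause 1 and N relative to clause 3.
example_8 = [
    ClauseRelation(1, 2, "N", "S", marker="what"),
    ClauseRelation(2, 3, "N", "S", marker="to"),
]
```

Storing the status on the relation rather than on the clause is what allows clause 2 to be subordinate in one pair and main in the other.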

4.3 More on translational asymmetries

Translational asymmetries also stem from different information distribution, lexical and grammatical choices, reordering of the clauses with respect to each other and (cross-clause boundary) reordering of constituents. In this section, we point out two types of asymmetry concerning the internal structure of clauses and their relative order within the sentence.

A frequent pattern found in the corpus is the selection of verbs with different types of complements motivated by grammatical structure, lexical choice or other factors. In the aligned sentences in Example (9) the choice of the Bulgarian verb nastoyavam (insist) as the translation equivalent of the English object-control verb urge predetermines the difference in the structure of the matrix and the subordinate clause in the two languages – in (9a) Croatia is the object of the main clause, whereas its counterpart Harvatska is the subject of the subordinate clause in (9b).

[Example 9: the aligned English (9a) and Bulgarian (9b) sentences are not preserved in this copy.]

Another frequent example is the different order of the clauses in a sentence. For instance, in Example (10), the English clauses N_S (10a) are in reverse order as compared with the Bulgarian translation, S_N (10b).

Example 10 (a) [N She had to make a detour] [S to get to the stove.]

(b) [Bulgarian translation not preserved in this copy.]
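Clause-order asymmetries of the kind shown in Example (10) can be counted once each aligned clause pair carries its ordering model. The sketch below assumes a hypothetical input format, a list of (English pattern, Bulgarian pattern) tuples, and writes the subordinate-first model as S_N; it is an illustration, not the authors' tooling.

```python
from collections import Counter

def order_asymmetries(aligned_patterns):
    """Count (source, target) ordering pairs whose patterns differ."""
    return Counter((src, tgt) for src, tgt in aligned_patterns if src != tgt)

# Toy input: the first pair mirrors Example 10 (English N_S, Bulgarian S_N);
# the second pair is an invented symmetric case.
toy_pairs = [("N_S", "S_N"), ("N_N", "N_N")]
asymmetries = order_asymmetries(toy_pairs)
```

Pairs flagged this way are candidates for the clause reordering mentioned among the applications of the corpus.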

Translation asymmetries represent a systemic phenomenon and account for the inter-lingual variations in grammatical structure, lexicalisation patterns, etc. At the same time, they often give rise to wrong alignments, mistranslations, and other errors. Therefore, the successful identification of such phenomena and their proper description and treatment is a prerequisite for improving the accuracy of alignment and translation models.

5 Conclusion and applications

The development of the Bulgarian-English Sentence- and Clause-Aligned Corpus is a considerable advance towards establishing a general framework for syntactic annotation and multilingual alignment, as well as for building significantly larger parallel annotated corpora. The manual annotation and/or validation has ensured the high quality of the corpus annotation and has made it applicable as a training resource for various NLP tasks. As the goal was to explore the influence of clause alignment, further levels of alignment were only partially attempted as a technique enhancing the alignment method.

The quality of the manual clause splitting, relation type annotation and alignment was guaranteed by inter-annotator agreement. Each annotator made at least two passes over each Bulgarian and English file, one of them after the final revision of the annotation conventions. Clause segmentation was additionally validated at the stage of clause alignment.

The NLP applications of the BulEnAC encompass at least three interrelated areas: (i) developing methods for automatic clause splitting and alignment; (ii) developing methods for clause reordering to improve the training data for SMT [6]; (iii) word and phrase alignment. These lines of research will facilitate the creation of large-scale syntactically and semantically annotated corpora. In the field of the humanities the corpus is a valuable resource for studies in lexical semantics, comparative syntax, translation studies, language learning and cross-linguistic studies.

The BulEnAC will be made accessible to the scholarly community through the unified multilingual search interface of the Bulgarian National Corpus (see footnote 11).

6 Acknowledgements

The present paper was prepared within the project Integrating New Practices and Knowledge in Undergraduate and Graduate Courses in Computational Linguistics (BG051PO001-3.3.06-0022) implemented with the financial support of the Human Resources Development Operational Programme 2007-2013 co-financed by the European Social Fund of the European Union. The Institute for Bulgarian Language takes full responsibility for the content of the present paper and under no conditions can the conclusions made in it be considered an official position of the European Union or the Ministry of Education, Youth and Science of the Republic of Bulgaria.

11 http://search.dcl.bas.bg

References

[1] B. Cowan, I. Kucerova, and M. Collins. A discriminative model for tree-to-tree translation. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Sydney, pages 232–241, 2006.

[2] C.-L. Goh, T. Onishi, and E. Sumita. Rule-based reordering constraints for phrase-based SMT. In Proceedings of the 15th International Conference of the European Association for MT, May 2011, pages 113–120, 2011.

[3] J.-D. Kim, T. Ohta, and J. Tsujii. Corpus annotation for mining biomedical events from literature. BMC Bioinformatics, 9(10), 2008.

[4] C. Kit, J.J. Webster, K. Kui Sin, H. Pan, and H. Li. Clause alignment for bilingual Hong Kong legal texts: A lexical-based approach. International Journal of Corpus Linguistics, 9(1):29–51, 2004.

[5] S. Koeva and A. Genov. Bulgarian language processing chain. In Proceedings of Integration of Multilingual Resources and Tools in Web Applications. Workshop in conjunction with GSCL 2011, University of Hamburg, 2011.

[6] S. Koeva, B. Rizov, E. Tarpomanova, Ts. Dimitrova, R. Dekova, I. Stoyanova, S. Leseva, H. Kukova, and A. Genov. Application of clause alignment for statistical machine translation. In Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-6), Korea, 2012.

[7] S. Piperidis, H. Papageorgiou, and S. Boutsis. From sentences to words and clauses. In J. Veronis, editor, Parallel Text Processing, Alignment and Use of Translation Corpora, pages 117–138. Kluwer Academic Publishers, 2000.

[8] A. Ramanathan, P. Bhattacharyya, K. Visweswariah, K. Ladha, and A. Gandhe. Clause-based reordering constraints to improve statistical machine translation. In Proceedings of the 5th International Joint Conference on NLP, Thailand, November, pages 1351–1355, 2011.

[9] K. Sudoh, K. Duh, H. Tsukada, T. Hirao, and M. Nagata. Divide and translate: improving long distance reordering in statistical machine translation. In Proceedings of the Joint 5th Workshop on SMT and Metrics MATR, pages 418–427, 2010.
