THEORETICAL & METHODOLOGICAL FOUNDATIONS FOR CORPUS-BASED ANALYSIS OF FORMULAIC EXPRESSION UNITS

Authors

  • Dr. محمد عبدالزهرة عريبي الحسين, University of Basrah / College of Arts / Department of Translation

DOI:

https://doi.org/10.31185/lark.Vol2.Iss45.2312

Keywords:

formulaic expression units; maximum frequent sequence; frequency rate; verbal context; theoretical and methodological foundations; statistical analysis; linguistic database

Abstract

The formulaic status of formulaic expression units (FEUs) arises from a set of universal principles underlying the mental organization and representation of language and from conventionalized patterns of language use. The nature and gradability of formulaicity can be inferred from the statistical distribution of such units at all levels of linguistic analysis, including the syntactic, semantic, and discourse levels. In addition to the existing discursive approach, the formulaic status of FEUs must be approached quantitatively and verified by a corpus-based statistical analysis of distributional frequency. Appropriate theoretical frameworks, including distributional semantics and cognitive semantics, should yield formalized semantic and cognitive parameters that are better suited to the statistical analysis of distributional frequency in linguistic data. The corpus-based method retrieves sets of expression units and determines their formulaic status from their frequency of occurrence in a collection of documents or domains. The model of corpus-based statistical analysis of distribution proposed in this research adopts two query-based information retrieval methods: (i) n-gram corpus Maximum Frequent Sequences (n-gram-MFS), for weighing the formulaic status of FEUs per n-gram corpus, and (ii) Maximum Frequent Sequences (D-MFS), for weighing the formulaic status of FEUs per document/domain. The proposed model offers a systematic verification tool for weighing and evaluating the formulaic status of FEUs.
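As a rough illustration of the frequency-based idea described in the abstract, the following minimal Python sketch counts word n-grams across a toy document collection and keeps the maximal frequent ones. It is not the author's model: the function names, the `min_df` threshold, the restriction to contiguous n-grams, and the three sample sentences are all assumptions made for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequent_ngrams(docs, max_n=5, min_df=2):
    """Map each n-gram (2..max_n words) to (corpus frequency, document frequency),
    keeping only n-grams found in at least min_df documents (assumed threshold)."""
    corpus_freq, doc_freq = Counter(), Counter()
    for doc in docs:
        tokens = doc.lower().split()
        grams = [g for n in range(2, max_n + 1) for g in ngrams(tokens, n)]
        corpus_freq.update(grams)      # every occurrence counts
        doc_freq.update(set(grams))    # each document counts once
    return {g: (corpus_freq[g], doc_freq[g])
            for g in doc_freq if doc_freq[g] >= min_df}

def is_contiguous_sub(short, long):
    """True if the tuple `short` occurs as a contiguous run inside `long`."""
    k = len(short)
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))

def maximal_frequent(freq):
    """Keep only frequent n-grams not contained in a longer frequent n-gram."""
    return {g: v for g, v in freq.items()
            if not any(len(h) > len(g) and is_contiguous_sub(g, h) for h in freq)}

# Hypothetical toy collection, for illustration only.
docs = [
    "as a matter of fact the results differ",
    "the data are as a matter of fact quite sparse",
    "in fact the results differ",
]
freq = frequent_ngrams(docs)
mfs = maximal_frequent(freq)
for gram, (cf, df) in sorted(mfs.items()):
    print(" ".join(gram), "| corpus freq:", cf, "| doc freq:", df)
```

In this sketch the corpus frequency plays the role of a per-corpus weight and the document frequency the role of a per-document/domain weight. A fuller treatment of maximal frequent sequences would typically also allow gaps between words, which this contiguous version deliberately leaves out.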

Published

2022-03-31

Section

Western Languages

How to Cite

د. محمد عبدالزهرة عريبي الحسين. (2022). THEORETICAL & METHODOLOGICAL FOUNDATIONS FOR CORPUS-BASED ANALYSIS OF FORMULAIC EXPRESSION UNITS. Lark, 14(1), 1151-1177. https://doi.org/10.31185/lark.Vol2.Iss45.2312