Theoretical and Methodological Foundations for the Corpus-based Statistical Analysis of Frozen Formulaic Expression Units


  • Dr. Mohamed A. Al-Husain, Al-Basra University, College of Arts




Formulaic Expression Units; Maximal Frequent Sequences; Frequency of Occurrence; Co-text; Theoretical and Methodological Foundations; Corpus-based Statistical Analysis


This paper examines the linguistic frozenness of formulaic multiword expressions, a property that derives from a set of general foundations and principles relating to mental structure, linguistic realization, and patterns of use. The nature and gradience of frozenness can be understood through the frequency distribution of such expressions at the syntactic, semantic, and textual levels. Qualitative, discursive approaches to prefabricated multiword expressions therefore need to be complemented by quantitative, statistical approaches to frequency distribution that can both demonstrate that an expression is frozen and measure its degree of frozenness. To this end, the theoretical frameworks of distributional semantics and cognitive semantics can be used to interpret the analytical results. Quantitative statistical analysis requires a linguistic corpus as its domain of investigation, so that the resulting frequency distribution records how many times language users employ a given expression, in the same form, in particular texts or particular knowledge domains. The contribution of the paper lies in establishing the theoretical foundations for a statistical model that can demonstrate the frozenness of expressions and measure its degree, and in proposing the Maximal Frequent Sequences model as an applicable model for achieving these aims.

Keywords: frozen expression units; maximal frequent sequences; frequency of occurrence; co-text; theoretical and methodological foundations; statistical analysis; linguistic corpus
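The frequency-based idea described in the abstract can be illustrated with a small sketch. It is not the author's implementation: the function names, the toy English corpus, the `min_count` threshold, and the `max_len` limit are all hypothetical choices made for illustration. It also simplifies the notion of a maximal frequent sequence to contiguous word sequences, whereas formulations in the literature may allow gaps between words. The sketch counts how often each word sequence recurs in the same form and keeps only those frequent sequences that are not contained in a longer frequent sequence.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous n-token window as a tuple."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def frequent_sequences(sentences, min_count=2, max_len=5):
    """Count contiguous word sequences; keep those seen at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_len + 1):
            counts.update(ngrams(tokens, n))
    return {seq: c for seq, c in counts.items() if c >= min_count}


def is_contained(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    k = len(short)
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))


def maximal_sequences(frequent):
    """Drop every frequent sequence that sits inside a longer frequent sequence."""
    return {
        seq: c for seq, c in frequent.items()
        if not any(len(other) > len(seq) and is_contained(seq, other)
                   for other in frequent)
    }


# Toy corpus (assumption: whitespace tokenization is adequate for the example).
corpus = [
    "as a matter of fact it rained",
    "it was as a matter of fact quite cold",
    "you know it rained all day",
    "you know as a matter of fact",
]

frequent = frequent_sequences(corpus)
maximal = maximal_sequences(frequent)

for seq, count in sorted(maximal.items(), key=lambda item: -item[1]):
    print(" ".join(seq), count)
```

On this toy corpus the shorter fragments such as "a matter of" are frequent but are absorbed by the longer recurring sequence, which is the behaviour one would expect of a candidate frozen expression. A real study would need language-appropriate tokenization and normalization, and a threshold justified by corpus size.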









west languages

How to Cite

Al-Husain, M. A. (2022). Theoretical and Methodological Foundations for the Corpus-based Statistical Analysis of Frozen Formulaic Expression Units. Lark, 14(1), 1177-1151. https://doi.org/10.31185/lark.Vol2.Iss45.2312