Wednesday, December 10, 2025

Morphology in Computational Linguistics: Ruslan Mitkov's Oxford Handbook of Computational Linguistics

 Morphology in Computational Linguistics


1) Introduction

Ruslan Mitkov, a professor in the School of Computing and Communications at Lancaster University, edited the ‘Oxford Handbook of Computational Linguistics’, which discusses how morphology, syntax, semantics, and pragmatics are applied in natural language processing (NLP).

Morphology is the study of the internal structure of words and the meaningful units that compose them. These units, called morphemes, may be roots, prefixes, suffixes, or grammatical markers that modify meaning or function.

1. Root (Base word):

·        teach → the core meaning is “to instruct.”

2. Prefix:

·        un + happy → unhappy (prefix un- adds the meaning “not”).

3. Suffix:

·        quick + -ly → quickly (suffix -ly changes an adjective into an adverb).

4. Grammatical marker (inflection):

·        walk + -ed → walked (suffix -ed marks past tense).

In computational linguistics, knowledge of morphology is crucial because computers must not only process whole words but also understand how words are formed.

Computational morphology applies techniques from computer science (especially algorithms), linguistics, and artificial intelligence to automatically analyse words (break them into their components) and generate them (construct surface words from grammatical features) in natural languages. Let’s take an example:

  A computational system takes the word “unhappiness” and automatically analyses it as:

·        un- (prefix meaning “not”)

·        happy (root)

·        -ness (suffix forming a noun)

   

The same system can also generate the correct surface word. For example, given the components:

·        ROOT: happy

·        PREFIX: un

·        SUFFIX: ness

It will automatically construct the word “unhappiness” (note that the y of happy changes to i).

 

This is essential for tasks like machine translation, spell checkers, search engines, speech recognition, document indexing, corpus annotation, and text-to-speech software.
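The analysis and generation of “unhappiness” described above can be sketched in a few lines of Python. This is a minimal sketch, not the Handbook's method: the tiny lexicons (ROOTS, PREFIXES, SUFFIXES), the single y ↔ i spelling rule, and the function names analyse and generate are all illustrative assumptions.

```python
# Toy lexicons (illustrative assumptions, not a real resource).
ROOTS = {"happy", "quick", "kind"}
PREFIXES = {"un"}           # un- means "not"
SUFFIXES = {"ness", "ly"}   # -ness forms nouns, -ly forms adverbs
VOWELS = set("aeiou")


def analyse(word):
    """Split a word into prefix, root and suffix; return None if the root is unknown."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    stem = rest[: len(rest) - len(suffix)]
    # Undo the spelling change y -> i (happi- -> happy).
    if stem.endswith("i") and stem[:-1] + "y" in ROOTS:
        stem = stem[:-1] + "y"
    if stem not in ROOTS:
        return None
    return {"PREFIX": prefix, "ROOT": stem, "SUFFIX": suffix}


def generate(root, prefix="", suffix=""):
    """Build a surface word, turning consonant + y into consonant + i before a suffix."""
    stem = root
    if (suffix and not suffix.startswith("i") and len(root) > 1
            and root.endswith("y") and root[-2] not in VOWELS):
        stem = root[:-1] + "i"   # happy + ness -> happiness
    return prefix + stem + suffix


print(analyse("unhappiness"))                          # {'PREFIX': 'un', 'ROOT': 'happy', 'SUFFIX': 'ness'}
print(generate("happy", prefix="un", suffix="ness"))   # unhappiness
```

The y ↔ i rule is applied in one direction by generate and undone by analyse; a fuller system would also try alternative splits when the greedy prefix or suffix match fails.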

Languages differ greatly in morphology. Isolating languages like Chinese use little affixation, whereas agglutinative languages like Turkish or Finnish build long words from many morphemes. Thus, computational systems must handle diverse patterns of word formation, making morphology a core study area in language technology.

2) Overview of Morphology

Morphology studies how different morphemes combine to form complex words. These morphemes can be:

Free morphemes (can stand alone): book, run, chair

Bound morphemes (cannot stand alone): -ing, -ed, un-, -s

Morphological processes include:

A. Inflection

Changes grammatical properties (tense, number, case) without changing the word's category:

play → played, book → books

B. Derivation

Creates new words, often of a different category:

happy → happiness, teach → teacher

C. Compounding

Joining two free morphemes:

blackboard, sunflower

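Beyond simple prefixes and suffixes, some languages use infixes, circumfixes (Arabic), zero morphology (sheep → sheep), and subtractive morphology, in which part of the base is removed rather than added. (Spanish hermano → hermanita is not subtractive; it shows the feminine diminutive suffix -ita.) The complexity of these processes requires computers to learn or model many rules for correct analysis and generation.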
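Compounding (C above) can be reversed with a simple dictionary lookup: try every split point and keep the splits whose two halves are both known free morphemes. This is a minimal sketch assuming a hypothetical six-word lexicon (FREE_MORPHEMES) and a hypothetical function name; real compound splitters must also rank competing splits and handle compounds of more than two parts.

```python
# Hypothetical toy lexicon of free morphemes.
FREE_MORPHEMES = {"black", "board", "sun", "flower", "book", "chair"}


def split_compound(word):
    """Return every two-part split of word whose halves are both free morphemes."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in FREE_MORPHEMES and word[i:] in FREE_MORPHEMES
    ]


print(split_compound("blackboard"))  # [('black', 'board')]
print(split_compound("sunflower"))   # [('sun', 'flower')]
print(split_compound("chair"))       # [] (a single free morpheme, not a compound)
```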

3) Structure & Ambiguity in Morphology

Ambiguity is one of the greatest challenges in natural language processing, for both analysis and generation. A word can be morphologically ambiguous: one surface form can have several analyses. For example:

·      Second (English) can be a noun (a unit of time), an ordinal number, or a verb (to second a motion).

·      Okuma (Turkish) can mean “reading”, “don’t read”, or “to my arrow”, depending on where the morpheme boundaries fall (oku-ma vs. ok-um-a).

Computational morphology needs to choose the correct analysis based on context. This requires identifying:

·        Correct root or stem

·        Proper affix boundaries

·        Grammatical features (tense, mood, case, etc.)
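To see why a system must choose among analyses, the sketch below enumerates every segmentation of okuma allowed by a tiny, hypothetical Turkish lexicon. The morpheme inventory, the category names, and the tags (+NEG+IMP, +VN for verbal noun, +P1SG, +DAT) are illustrative assumptions, and the code only lists the candidates; picking the right one still requires context.

```python
# Hypothetical toy lexicon. Roots map to their category.
ROOTS = {
    "ok": "NOUN",    # ok = arrow
    "oku": "VERB",   # oku- = read
}
# Suffix surface form -> list of (category it attaches to, resulting category, tags).
SUFFIXES = {
    "ma": [("VERB", "END", "+NEG+IMP"),   # oku-ma: "don't read"
           ("VERB", "VNOUN", "+VN")],     # oku-ma: "reading"
    "um": [("NOUN", "POSS", "+P1SG")],    # ok-um: "my arrow"
    "a":  [("POSS", "END", "+DAT")],      # ok-um-a: "to my arrow"
}


def analyses(word):
    """Return every root + suffix segmentation of word that the lexicon allows."""
    results = []

    def extend(pos, category, analysis):
        if pos == len(word):
            results.append(analysis)
            return
        for surface, options in SUFFIXES.items():
            if word.startswith(surface, pos):
                for attaches_to, new_category, tags in options:
                    if attaches_to == category:
                        extend(pos + len(surface), new_category, analysis + tags)

    for root, category in ROOTS.items():
        if word.startswith(root):
            extend(len(root), category, f"{root}+{category}")
    return results


for a in analyses("okuma"):
    print(a)
# ok+NOUN+P1SG+DAT
# oku+VERB+NEG+IMP
# oku+VERB+VN
```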

4) Computational Morphology (Very Simple Explanation)

Computational morphology is about teaching computers how to understand and create different word forms.

Morphological Analysis (Breaking a word)

In analysis, the computer takes a full word (the surface form) and breaks it into a root word and its grammatical information (features).

Example:

walked → walk + PAST

(“walked” is the surface form, “walk” is the root, “PAST” is the tense)
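A minimal sketch of this analysis step strips a known inflectional suffix and keeps the result only if the remaining root is in the lexicon. The root list, the function name, and the tag names (+PAST, +3SG+PRES, +PROG, +BASE) are illustrative assumptions; spelling changes such as carried → carry are deliberately left out.

```python
# Illustrative toy lexicon of regular English verbs.
ROOTS = {"walk", "talk", "play"}
# Inflectional suffix -> grammatical features it marks.
SUFFIX_FEATURES = {"ed": "+PAST", "s": "+3SG+PRES", "ing": "+PROG"}


def analyse(word):
    """Return all 'root+FEATURES' analyses of word (regular verbs without spelling changes)."""
    results = []
    if word in ROOTS:
        results.append(word + "+BASE")
    for suffix, features in SUFFIX_FEATURES.items():
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in ROOTS:
            results.append(root + features)
    return results


print(analyse("walked"))  # ['walk+PAST']
print(analyse("plays"))   # ['play+3SG+PRES']
```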

Morphological Generation (Making a word)

In generation, the computer starts with a root word and a set of features (tense, number, person, etc.) and creates the correct surface word.

Example:

walk + 3rd person singular + present → walks
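The generation direction can be sketched as a lookup from a bundle of features to a suffix. The feature names, the FEATURES_TO_SUFFIX table, and the generate function are hypothetical; a frozenset is used so that the order in which features are listed does not matter. The sketch simply concatenates, so it would wrongly produce carryed for carry + PAST, which is why the spelling (morphophonemic) rules below are also needed.

```python
# Hypothetical table: feature bundle -> regular English verb suffix.
FEATURES_TO_SUFFIX = {
    frozenset({"PAST"}): "ed",
    frozenset({"3SG", "PRES"}): "s",
    frozenset({"PROG"}): "ing",
}


def generate(root, *features):
    """Attach the suffix for the given features by plain concatenation."""
    suffix = FEATURES_TO_SUFFIX.get(frozenset(features))
    if suffix is None:
        raise ValueError(f"no suffix for features {features}")
    return root + suffix  # no spelling changes: carry + PAST would give "carryed"


print(generate("walk", "3SG", "PRES"))  # walks
print(generate("walk", "PAST"))         # walked
```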

To do this correctly, two things are needed:

A. Morphotactics (Order of morphemes)

These are rules about which pieces of words can join together and in what order.

Example:

In English, you can add -ed after a verb, but you cannot say edwalk.
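Morphotactic rules like this are naturally written as a finite-state acceptor over morpheme classes: each morpheme moves the machine to a new state, and a sequence is legal only if every step has a transition and the last state is accepting. The classes, states, and the is_well_formed function below are illustrative assumptions covering just a few morphemes.

```python
# Illustrative morpheme classes (an assumption covering only a few morphemes).
MORPHEME_CLASS = {
    "un": "PREFIX",
    "walk": "VERB", "play": "VERB",
    "happy": "ADJ",
    "ed": "VERB_SUFFIX", "ing": "VERB_SUFFIX",
    "ness": "NOUN_SUFFIX",
}
# Finite-state morphotactics: (current state, morpheme class) -> next state.
TRANSITIONS = {
    ("start", "PREFIX"): "prefixed",
    ("start", "VERB"): "verb",
    ("start", "ADJ"): "adj",
    ("prefixed", "ADJ"): "adj",        # un- + adjective (unhappy)
    ("verb", "VERB_SUFFIX"): "done",   # walk + ed
    ("adj", "NOUN_SUFFIX"): "done",    # happy + ness
}
ACCEPTING = {"verb", "adj", "done"}


def is_well_formed(morphemes):
    """Accept only if every morpheme has a transition and the final state is accepting."""
    state = "start"
    for morpheme in morphemes:
        state = TRANSITIONS.get((state, MORPHEME_CLASS.get(morpheme)))
        if state is None:
            return False
    return state in ACCEPTING


print(is_well_formed(["walk", "ed"]))           # True
print(is_well_formed(["ed", "walk"]))           # False (*edwalk)
print(is_well_formed(["un", "happy", "ness"]))  # True
```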

B. Morphophonemics (or Morphographemics, for spelling)

These are the sound changes (morphophonemics) or spelling changes (morphographemics) that happen when affixes attach.

Examples:

carry + ed → carried (y changes to i)

make + ing → making (drop the e)
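These two spelling rules can be sketched as a function that joins a stem and a suffix. It is a deliberately partial, hypothetical rule set (the name attach is illustrative): it covers y → i and silent-e deletion for suffixes like -ed and -ing, but not, for example, consonant doubling (stop + ing → stopping).

```python
VOWELS = set("aeiou")


def attach(stem, suffix):
    """Join stem and suffix, applying two English spelling rules (a partial, toy rule set)."""
    # Rule 1: consonant + y becomes consonant + i, unless the suffix starts with i
    # (carry + ed -> carried, but carry + ing -> carrying).
    if (len(stem) >= 2 and stem[-1] == "y" and stem[-2] not in VOWELS
            and not suffix.startswith("i")):
        return stem[:-1] + "i" + suffix
    # Rule 2: a final e is dropped before a vowel-initial suffix
    # (make + ing -> making, bake + ed -> baked); an e preceded by a vowel
    # is kept before suffixes like -ing (see + ing -> seeing).
    if stem.endswith("e") and suffix[:1] in VOWELS:
        if suffix.startswith("e") or stem[-2:-1] not in VOWELS:
            return stem[:-1] + suffix
    return stem + suffix


for stem, suffix in [("carry", "ed"), ("carry", "ing"), ("make", "ing"), ("walk", "ed")]:
    print(stem, "+", suffix, "->", attach(stem, suffix))
```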

In short:

A computational morphology system must understand:

·      Which parts can combine (morphotactics), and

·      How spelling/sound changes happen (morphophonemics).

Only then can a computer correctly break words apart or form new ones.

5) Finite-State Morphology

A Finite-State Transducer (FST) is a simple computational device used to convert:

  • Lexical level (root + grammar features)
    walk + PAST

into

  • Surface level (the actual word)
    walked

Why FSTs are useful

  • Very fast
  • Can both analyse and generate words
  • Store rules in a small, compact way
  • Handle morpheme order (morphotactics) and
  • Handle spelling changes (morphographemics)
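The lexical-to-surface mapping can be sketched as a tiny transducer in which every arc reads an input string and writes an output string. The construction below is a hypothetical toy (two stems, three tags, no spelling rules, illustrative function names); it shows how one machine generates walk+PAST → walked and, once its arcs are inverted, analyses walked → walk+PAST. Real systems compile far larger lexicons and rule sets with dedicated finite-state toolkits.

```python
from collections import defaultdict


def build_fst(stems, tag_to_suffix):
    """Build a toy generation FST stored as state -> [(input, output, next_state)]."""
    arcs = defaultdict(list)
    start, stem_end, final = 0, 1, 2
    next_id = 3
    for stem in stems:
        state = start
        for i, ch in enumerate(stem):
            if i == len(stem) - 1:
                target = stem_end                  # every stem ends in the same state
            else:
                target, next_id = next_id, next_id + 1
            arcs[state].append((ch, ch, target))   # identity arc: letter -> same letter
            state = target
    for tag, suffix in tag_to_suffix.items():
        arcs[stem_end].append((tag, suffix, final))  # e.g. "+PAST" -> "ed"
    return arcs, start, {final}


def invert(arcs):
    """Swap input and output on every arc, turning the generator into an analyser."""
    inverted = defaultdict(list)
    for state, out_arcs in arcs.items():
        for inp, out, nxt in out_arcs:
            inverted[state].append((out, inp, nxt))
    return inverted


def transduce(arcs, start, finals, text):
    """Return every output the FST can produce while consuming all of text."""
    results = []

    def step(state, pos, output):
        if pos == len(text) and state in finals:
            results.append(output)
        for inp, out, nxt in arcs.get(state, []):
            # Only non-empty inputs are followed, so the search always terminates.
            if inp and text.startswith(inp, pos):
                step(nxt, pos + len(inp), output + out)

    step(start, 0, "")
    return results


gen, start, finals = build_fst(["walk", "talk"], {"+PAST": "ed", "+3SG": "s", "+ING": "ing"})
ana = invert(gen)
print(transduce(gen, start, finals, "walk+PAST"))  # ['walked']
print(transduce(ana, start, finals, "walked"))     # ['walk+PAST']
```

Inverting the arcs is what lets the same compact structure serve both analysis and generation.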

6) Handling Morphotactics (Allowed Word Building Rules)

Morphotactics = rules about which morphemes can join together.

Example:

·        Correct: dog → dogs

·        Incorrect: sheeps, boyses → these must be blocked

In FSTs:

  • Each word type (noun, verb, adjective) has its own small dictionary called a sub-lexicon.
  • These sub-lexicons say what is allowed:
    • nouns → can take plural
    • adjectives → cannot take plural

So morphotactics keeps word formation legal and grammatical.
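The sub-lexicon idea can be sketched with continuation classes, in the spirit of finite-state lexicon formalisms: each entry gives its lexical form, its surface form, and the sub-lexicon that may follow it ("#" ends the word). The lexicon contents, tag names, and functions are hypothetical, and the sketch enumerates every word instead of compiling a transducer, which only works because this toy lexicon is tiny and finite.

```python
# Each entry: (lexical form, surface form, continuation class); "#" ends the word.
LEXICONS = {
    "Root":       [("", "", "Nouns"), ("", "", "IrregNouns"), ("", "", "Adjectives")],
    "Nouns":      [("dog", "dog", "NounInfl"), ("boy", "boy", "NounInfl")],
    "IrregNouns": [("sheep", "sheep", "ZeroPlural")],
    "Adjectives": [("happy", "happy", "AdjEnd")],
    "NounInfl":   [("+N+SG", "", "#"), ("+N+PL", "s", "#")],   # regular plural -s
    "ZeroPlural": [("+N+SG", "", "#"), ("+N+PL", "", "#")],    # sheep -> sheep
    "AdjEnd":     [("+ADJ", "", "#")],                         # adjectives take no plural
}


def expand(lexicon="Root", lexical="", surface=""):
    """Yield every (lexical, surface) pair that the continuation classes allow."""
    if lexicon == "#":
        yield lexical, surface
        return
    for lex, surf, continuation in LEXICONS[lexicon]:
        yield from expand(continuation, lexical + lex, surface + surf)


def analyses(word):
    """Return the sorted lexical analyses of a surface word ([] means it is blocked)."""
    return sorted(lex for lex, surf in expand() if surf == word)


for w in ["dogs", "sheep", "sheeps", "boyses", "happys"]:
    print(w, analyses(w))
# dogs ['dog+N+PL']
# sheep ['sheep+N+PL', 'sheep+N+SG']
# sheeps []
# boyses []
# happys []
```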

Conclusion

Computational morphology helps computers:

  • understand how words are built
  • know the meaning of different word parts
  • choose the correct word form
  • handle tasks like translation, speech, and text search

By using finite-state methods, rule systems, and modern machine-learning models, computational morphology keeps improving.
This makes language technology more accurate, faster, and better connected to real linguistic knowledge.

 
