Wednesday, December 10, 2025

Syntax in Computational Linguistics: Oxford Handbook of Computational Linguistics by Ruslan Mitkov


1) Introduction to Syntax

Syntax is the study of how words join together to make sentences.
It tells us:

·        Who is doing the action (subject)

·        What the action is (verb)

·        Who or what receives the action (object)

Example: John visited Mary.

·        John = subject (doer)

·        visited = verb (action)

·        Mary = object (receiver)

In computational linguistics, this can be written like a small formula, with the verb as a predicate and the subject and object as its arguments:

Visit(John, Mary)

Why “Mary visited John” means something different

In coding terms:

Visit(John, Mary)   → John visited Mary

Visit(Mary, John)   → Mary visited John

Even though the words are the same, changing the order changes who is doing the action.

Humans understand this naturally.
Computers need rules (syntax rules) to figure this out.

So, syntax gives the structure of a sentence, helping computers understand language.
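
The formula above can be sketched in a few lines of Python. This is only an illustration under a strong assumption: the sentence has exactly three words in subject, verb, object order. The names `Predication` and `parse_svo` are made up for this example and do not come from the handbook.

```python
from typing import NamedTuple


class Predication(NamedTuple):
    """A verb together with the two participants it relates."""
    predicate: str
    subject: str
    obj: str


def parse_svo(sentence: str) -> Predication:
    """Read a three-word English sentence as subject, verb, object by position."""
    words = sentence.rstrip(".").split()
    if len(words) != 3:
        raise ValueError("this sketch only handles 'Subject Verb Object'")
    subject, verb, obj = words
    return Predication(predicate=verb, subject=subject, obj=obj)


a = parse_svo("John visited Mary.")
b = parse_svo("Mary visited John.")
print(a)
print(b)
# Same words, different structure:
print(sorted(a) == sorted(b), a == b)  # True False
```

The last line shows the point of this section: the two sentences contain the same words, but the structures built from them are different.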

 

2) Basic Syntactic Concepts

a) Subject – Predicate Relation

Every sentence has:

·        Predicate = usually the verb

·        Arguments = the people or things involved (subject, object)

Different languages show these roles differently:

·        English uses word order

o   John (subject) → visited (verb) → Mary (object)

·        Japanese uses case markers

o   John-ga (subject marker)

o   Mary-o (object marker)

Computers must handle both patterns: position in English, case markers in Japanese.
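
A rough sketch of the case-marker strategy, assuming tokens are written in the hyphenated style used above (`John-ga`, `Mary-o`). The table `CASE_ROLES` and the function `roles_from_case_markers` are hypothetical names, and the English word "visited" stands in for the Japanese verb as a placeholder.

```python
# Assumed toy mapping from case marker to grammatical role.
CASE_ROLES = {"ga": "subject", "o": "object"}


def roles_from_case_markers(tokens):
    """Assign roles from the marker attached to each word, ignoring word order."""
    roles = {}
    for token in tokens:
        stem, sep, marker = token.rpartition("-")
        if sep and marker in CASE_ROLES:
            roles[CASE_ROLES[marker]] = stem
        else:
            # Anything without a known marker is treated as the predicate here.
            roles["predicate"] = token
    return roles


print(roles_from_case_markers(["John-ga", "Mary-o", "visited"]))
print(roles_from_case_markers(["Mary-o", "John-ga", "visited"]))
```

Both calls give John as subject and Mary as object, because the roles are read from the markers and not from the position of the words.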

b) Phrase Structure

Words group together into phrases that act like one unit.

Example: the tall boy from the park
This whole group = one noun phrase (NP).

Syntax studies:

·        how phrases are built

·        how they can be expanded

·        how one phrase can contain another
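
One way to picture a phrase that contains another phrase is a small tree. The sketch below builds the example noun phrase by hand; the class `Phrase` and the labels NP and PP (prepositional phrase) are choices made for this illustration, not a fixed standard.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Phrase:
    """A labelled phrase whose children are words or smaller phrases."""
    label: str
    children: List[Union["Phrase", str]] = field(default_factory=list)

    def words(self):
        """Flatten the tree back into its words, left to right."""
        out = []
        for child in self.children:
            out.extend(child.words() if isinstance(child, Phrase) else [child])
        return out

    def count(self, label):
        """Count phrases with this label, including nested ones."""
        own = 1 if self.label == label else 0
        return own + sum(c.count(label) for c in self.children if isinstance(c, Phrase))


inner_np = Phrase("NP", ["the", "park"])
pp = Phrase("PP", ["from", inner_np])
whole_np = Phrase("NP", ["the", "tall", "boy", pp])

print(" ".join(whole_np.words()))  # the tall boy from the park
print(whole_np.count("NP"))        # 2
```

The count of 2 reflects that the noun phrase "the park" sits inside the larger noun phrase.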

c) Ambiguity

A phrase or sentence can have more than one structure, and each structure gives a different meaning.

Example:

The man from the school with the flag.

Who has the flag?

·        the school?

·        or the man?

Syntax helps computers detect and resolve such ambiguities.
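
Detecting this kind of ambiguity can be sketched by counting how many structures a small grammar allows. The grammar and lexicon below are toy assumptions written for this one example, and `count_parses` is a hypothetical helper. It only detects the ambiguity; choosing between the readings needs extra information such as context.

```python
from functools import lru_cache

# Assumed toy lexicon: word -> category.
LEXICON = {"the": "Det", "man": "N", "school": "N", "flag": "N",
           "from": "P", "with": "P"}

# Assumed toy grammar, binary rules only: (parent, left child, right child).
RULES = [("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP")]


def count_parses(words, goal="NP"):
    """Count the distinct trees that cover all the words with the goal label."""
    tags = tuple(LEXICON[w] for w in words)

    @lru_cache(maxsize=None)
    def count(label, start, end):
        if end - start == 1:
            return 1 if tags[start] == label else 0
        total = 0
        for parent, left, right in RULES:
            if parent != label:
                continue
            # Try every split point between the left and right child.
            for mid in range(start + 1, end):
                total += count(left, start, mid) * count(right, mid, end)
        return total

    return count(goal, 0, len(tags))


print(count_parses("the man from the school".split()))                # 1
print(count_parses("the man from the school with the flag".split()))  # 2
```

The two structures found for the longer phrase correspond to the two readings: "with the flag" attaches either to "the school" or to "the man".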

 

3) Agreement (Dependency)

Agreement means words must match each other in number, person, gender, etc.

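Examples:

This boy is tall. (correct: singular “this” matches singular “boy”)
These boy is tall. (incorrect: plural “these” does not match singular “boy”)

Rules like this let computers check grammatical correctness.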

Example of ambiguity:

Flying planes seem/seems dangerous.

·        If we use seem (plural), “planes” is the subject: planes that fly seem dangerous.

·        If we use seems (singular), “flying” is the subject: the act of flying planes seems dangerous.

Agreement helps decide the meaning.
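
Agreement checking can be sketched with a lexicon that stores one number feature per word. Everything here is a simplifying assumption: the table `NUMBER`, the table `SUBJECT_READINGS`, and the functions `agrees` and `subject_for` are invented for this illustration, and real systems track person, gender and more.

```python
# Toy lexicon: each word carries one number feature (sg = singular, pl = plural).
NUMBER = {
    "this": "sg", "these": "pl",
    "boy": "sg", "boys": "pl",
    "is": "sg", "are": "pl",
    "seem": "pl", "seems": "sg",
}


def agrees(*words):
    """True if all the given words share the same number feature."""
    return len({NUMBER[w] for w in words}) == 1


# The two candidate subjects of "Flying planes seem/seems dangerous".
SUBJECT_READINGS = {"planes": "pl", "flying": "sg"}


def subject_for(verb):
    """Pick the candidate subject whose number matches the verb form."""
    matches = [s for s, num in SUBJECT_READINGS.items() if num == NUMBER[verb]]
    return matches[0]


print(agrees("this", "boy", "is"))    # True
print(agrees("these", "boy", "is"))   # False
print(subject_for("seem"), subject_for("seems"))  # planes flying
```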

 

4) Valency (Subcategorization)

Different verbs need different numbers of arguments.

·        Intransitive = 1 argument

o   He slept.

·        Transitive = 2 arguments

o   She ate an apple.

Computers use valency to check whether a sentence is complete.
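
A minimal sketch of a valency check, assuming each verb has exactly one fixed number of arguments as in the list above. The table `VALENCY` and the function `is_complete` are hypothetical. Real verbs often allow more than one pattern, so a fuller lexicon would store several frames per verb.

```python
# Assumed toy valency lexicon: verb -> number of arguments it needs.
VALENCY = {"slept": 1, "ate": 2}


def is_complete(verb, arguments):
    """True if the verb has exactly as many arguments as its valency requires."""
    return len(arguments) == VALENCY[verb]


print(is_complete("slept", ["He"]))             # True
print(is_complete("ate", ["She", "an apple"]))  # True
print(is_complete("ate", ["She"]))              # False under this toy lexicon
```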

 

5) Embedding and Long-Distance Dependency

One clause can be embedded inside another.

Example:

The girl that John visited left.

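The embedded clause that John visited describes the girl. The object of visited is missing inside the clause and is understood to be the girl, even though other words separate the two. This is a long-distance dependency.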

Deep embedding is difficult for both humans and machines:

The man who said that the woman who knew the teacher who criticized the scholar left early…

Syntax tells computers how to track long-distance relations correctly.
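
One simple way to track such a relation is to remember the noun before “that” and hand it to the next transitive verb that has no object of its own. The sketch below is a toy heuristic under heavy assumptions (a tiny word list, only “that” clauses, the nearest earlier noun taken as subject). The names `resolve_gaps` and `next_noun_phrase_head` are invented here and this is not a real parsing algorithm.

```python
# Assumed toy word lists, lowercased for matching.
DETERMINERS = {"the"}
NOUNS = {"girl", "john", "mary"}
TRANSITIVE = {"visited"}


def next_noun_phrase_head(tokens, i):
    """Return the noun heading a noun phrase that starts at position i, or None."""
    if i < len(tokens) and tokens[i] in DETERMINERS:
        i += 1
    if i < len(tokens) and tokens[i] in NOUNS:
        return tokens[i]
    return None


def resolve_gaps(sentence):
    """Return (verb, subject, object) triples, filling missing objects from 'that' clauses."""
    tokens = sentence.lower().rstrip(".").split()
    fillers = []      # nouns waiting for a missing object, innermost last
    last_noun = None  # most recent noun, used as the subject of the next verb
    relations = []
    for i, tok in enumerate(tokens):
        if tok == "that" and last_noun is not None:
            fillers.append(last_noun)
        elif tok in NOUNS:
            last_noun = tok
        elif tok in TRANSITIVE:
            obj = next_noun_phrase_head(tokens, i + 1)
            if obj is None and fillers:
                obj = fillers.pop()  # the object is the remembered noun
            relations.append((tok, last_noun, obj))
    return relations


print(resolve_gaps("The girl that John visited left."))
print(resolve_gaps("John visited Mary."))
```

For the embedded example, the sketch links visited to john as subject and girl as object, which is the relation that stretches across the clause boundary.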

 

6) Conclusion

Syntax is the architecture behind meaningful sentences.

It explains:

·        how verbs control arguments

·        how phrases combine

·        how agreement keeps grammar correct

·        how ambiguity arises

·        how computers can resolve and understand sentences

Modern NLP prefers feature-based and dependency-based models. Without syntax, computers cannot work out who did what to whom or produce meaningful sentences; they can only list words.
