Lemma (morphology)

Article Id: WHEBN0002639048
Author: World Heritage Encyclopedia
Language: English
Subject: Root (linguistics), Uninflected word, Marker (linguistics), Lemma
Publisher: World Heritage Encyclopedia

In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (the headword). In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma. Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme. In lexicography, this unit is usually also the citation form or headword under which the lexeme is indexed. Lemmas have special significance in highly inflected languages such as Arabic, Turkish and Russian. The process of determining the lemma for a given word is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although the choice of lemma is at least partly arbitrary.


  • Morphology
  • Lexicography
  • Pronunciation
  • Difference between stem and lemma
  • References


Morphology

In English, the citation form of a noun is the singular: e.g., mouse rather than mice. For multi-word lexemes that contain possessive adjectives or reflexive pronouns, the citation form uses a form of the indefinite pronoun one: e.g., do one's best, perjure oneself. In languages with grammatical gender, the citation form of regular adjectives and nouns is usually the masculine singular. If the language additionally has cases, the citation form is often the masculine singular nominative.

In many languages, the citation form of a verb is the infinitive: French aller, German gehen, Spanish ir. In English it is usually the bare infinitive (that is, lacking the to which customarily precedes English infinitives); the present tense is used for some defective verbs (shall, can, and must have only the one form). In Latin, Ancient Greek, and Modern Greek (which has no infinitive), however, the first person singular present tense is normally used, though occasionally the infinitive may also be seen. (For contracted verbs in Greek, an uncontracted first person singular present tense is used to reveal the contract vowel, e.g. φιλέω philéō for φιλῶ philō "I love" [implying affection]; ἀγαπάω agapáō for ἀγαπῶ agapō "I love" [implying regard].) In Japanese, the non-past (present and future) tense is used.

The form that is chosen to be the lemma is usually the least marked form, though there are occasional exceptions; e.g., Finnish dictionaries list verbs not under the verb root, but under the first infinitive marked with -(t)a, -(t)ä.

In some languages, the citation form is a verbal noun. In Korean, the citation form of a verb is the stem with -da attached.

In the Irish language, words are highly inflected according to their case (nominative, genitive, dative, and vocative); they also change form according to their position in a sentence, because of initial mutations. The noun cainteoir, the lemma for the noun meaning "speaker", has a variety of forms: chainteoir, gcainteoir, cainteora, chainteora, cainteoirí, chainteoirí and gcainteoirí.

Some phrases are cited in a sort of lemma, e.g., Carthago delenda est (literally, "Carthage must be destroyed") is a common way of citing Cato, although what he said was nearer to Ceterum censeo Carthaginem esse delendam ("As to the rest, I hold that Carthage must be destroyed").


Lexicography

In a dictionary, the lemma "go" represents the inflected forms "go", "goes", "going", "went", and "gone". The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g., "went" < "go". The disadvantage of such simplification is that a declined or conjugated form of the word cannot be looked up directly, although some dictionaries, like Webster's, will list "went". Multilingual dictionaries vary in how they deal with this issue: the Langenscheidt dictionary of German does not list ging (< gehen); the Cassell does.
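The lookup problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LEXEMES table is a hypothetical three-entry lexicon, and the function names lemmatise and cross_reference are invented for this example rather than taken from any dictionary or library.

```python
# Hypothetical mini-lexicon: each lemma maps to the inflected forms it represents.
LEXEMES = {
    "go": ["go", "goes", "going", "went", "gone"],
    "run": ["run", "runs", "ran", "running"],
    "mouse": ["mouse", "mice"],
}

# Invert the lexicon so that any listed inflected form can be looked up directly.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in LEXEMES.items() for form in forms
}


def lemmatise(word):
    """Return the lemma of a listed form, or None if the form is not listed."""
    return FORM_TO_LEMMA.get(word.lower())


def cross_reference(word):
    """Render a form and its lemma in angle-bracket notation, or return None."""
    lemma = lemmatise(word)
    if lemma is None:
        return None
    return f'"{word}" < "{lemma}"'


print(cross_reference("went"))
```

A dictionary that lists only lemmas corresponds to searching the keys of LEXEMES alone; listing "went" as its own entry corresponds to adding the inverted table.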

Lemmas or word stems are often used in corpus linguistics for determining word frequency. In such usage, the precise definition of "lemma" is flexible and depends on the task at hand.
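As a rough sketch of lemma-based frequency counting, the Python below groups tokens by lemma before counting them. The FORM_TO_LEMMA table, the sample sentence and the tokenising pattern are all assumptions made for illustration; treating unlisted tokens as their own lemma is one possible choice among several, in line with the task-dependent definition noted above.

```python
import re
from collections import Counter

# Hypothetical form-to-lemma table; a real study would use a full lexicon.
FORM_TO_LEMMA = {
    "runs": "run", "ran": "run", "running": "run",
    "goes": "go", "going": "go", "went": "go", "gone": "go",
    "mice": "mouse",
}


def lemma_frequencies(text, form_to_lemma=FORM_TO_LEMMA):
    """Count tokens by lemma; unlisted tokens are counted under their own form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(form_to_lemma.get(token, token) for token in tokens)


sample = "She runs daily. He ran yesterday, and they went running."
frequencies = lemma_frequencies(sample)
print(frequencies["run"], frequencies["go"])
```

Counting surface forms instead would split the occurrences of run across runs, ran and running.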


Pronunciation

A word may have different pronunciations depending on its phonetic environment (neighbouring sounds) or its degree of stress within a sentence. An example of the latter is the weak and strong forms of certain English function words such as some and but (pronounced /sʌm/, /bʌt/ when stressed, but /s(ə)m/, /bət/ when unstressed). Dictionaries usually give the pronunciation used when the word is pronounced alone (in its isolation form) and with stress, although they may also note commonly occurring weak forms of pronunciation.

Difference between stem and lemma

The stem is the part of the word that never changes even when the word is morphologically inflected; a lemma is the base form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-". This is because there are words such as production.[1] In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed. When phonology is taken into account, the definition of the stem as the unchangeable part of the word is not useful, as can be seen in the pronunciation of the words in the preceding example: the written c is pronounced /s/ in "produced" but /k/ in "production".
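The orthographic notion of a stem as "the part that never changes" can be approximated by taking the longest prefix shared by a set of related word forms. The sketch below assumes that reading; the function name common_prefix and the list of forms are chosen for illustration, and the approach works on spelling only, so it is subject to the phonological caveat just mentioned.

```python
def common_prefix(words):
    """Return the longest string that every word in `words` starts with."""
    if not words:
        return ""
    shortest = min(words, key=len)
    for i, ch in enumerate(shortest):
        # Stop at the first position where any word disagrees.
        if any(w[i] != ch for w in words):
            return shortest[:i]
    return shortest


forms = ["produce", "produces", "produced", "producing", "production"]
stem = common_prefix(forms)
print(stem)
```

For a set of forms such as go, goes, going, went and gone, the same function returns an empty string, because the forms share no initial letters; a purely prefix-based definition of the stem breaks down there.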

Some lexemes have several stems but one lemma. For instance, the verb "to go" (lemma "go") has the stems "go" and "went". (The past tense is based on a different verb, "to wend"; the "-t" suffix may be considered equivalent to "-ed".)

References

  1. "Natural Language Toolkit — NLTK 3.0 documentation". 2015-09-05. Retrieved 2015-09-27.

This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.