On languages

Posted on September 9, 2023 in languages by Stefaan Meeuws

What defines a human language?

A human language is a communication system that allows people to share ideas, feelings, and information. Think of it like a complex code where you use sounds, symbols, or gestures to represent things. For example, when you say “apple,” you’re using a specific sequence of sounds to refer to a fruit that can be red, green, or yellow.

Now, languages aren’t just a random set of words. They have rules, known as grammar, that dictate how words should be arranged to make sense. Imagine you’re building a LEGO castle; you can’t just throw pieces together. You need to follow specific steps to make sure the castle stands strong. Similarly, grammar helps us construct sentences that are clear and understandable.

Lastly, languages are living things; they evolve. New words get added, old ones fade away, and the rules can change. It’s like a software update; you add new features or remove bugs to improve the system. Language changes to adapt to new ideas and technologies, ensuring it stays relevant and useful for communication.

How many human languages exist today?

According to Ethnologue, a comprehensive database of world languages, there were approximately 7,139 living languages as of 2021. However, this number is not static: languages evolve, some become extinct, and varieties once counted as dialects may gain recognition as separate languages. It’s a dynamic field, much like how software ecosystems are continually updated and expanded.

What are some practical, empirical ways to organize the human languages of the world?

When it comes to organizing human languages, there are several empirical approaches that can be quite effective, much like how you’d categorize libraries of code or software frameworks.

  1. Language Families: One of the most common ways is to group languages into families based on historical and linguistic relationships. For example, English, German, and Dutch are all part of the Germanic language family. Think of this as similar to inheritance in object-oriented programming: just as classes inherit features from a common base class, related languages inherit features from a common ancestor.
  2. Geographical Distribution: Languages can also be categorized based on the regions where they are predominantly spoken. This is akin to how you might organize servers geographically to optimize latency.
  3. Morphological Complexity: Some languages are isolating with very little morphology (like Mandarin), while others are highly inflected or agglutinative (like Finnish or Turkish). This is somewhat analogous to comparing procedural and functional programming paradigms based on how they handle state and data manipulation.
  4. Functional Use: Languages can be classified based on their use-cases, such as trade languages, liturgical languages, or sign languages. This is similar to how you’d differentiate between a database query language like SQL and a general-purpose language like Python.
  5. Vitality and Endangerment: Languages can be sorted based on how many people speak them and whether they are at risk of becoming extinct. This could be likened to software tools; some are widely used and supported, while others are deprecated or have a small user base.
  6. Phonological Features: Languages can also be grouped based on the sounds they use. For instance, tonal languages like Mandarin use pitch to distinguish meaning, while non-tonal languages like English do not. This is a bit like how different programming languages have their own syntax, so the same characters can mean different things depending on the language.

Each of these methods has its pros and cons, and often a combination is used for a more nuanced understanding, much like how you’d use multiple metrics to evaluate the performance or utility of a software application.
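To make the idea of combining several classification criteria concrete, here is a minimal Python sketch. The `Language` record, its field names, and the `group_by` helper are illustrative assumptions rather than any established schema; the sample entries use only the examples given above (Mandarin as isolating and tonal, Finnish and Turkish as agglutinative).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Hypothetical record combining several classification axes."""
    name: str
    family: str      # genealogical grouping, e.g. "Turkic"
    morphology: str  # e.g. "isolating" or "agglutinative"
    tonal: bool      # True if pitch distinguishes word meaning


# Sample data limited to examples mentioned in this post.
LANGUAGES = [
    Language("Mandarin", "Sinitic", "isolating", True),
    Language("Finnish", "Finno-Ugric", "agglutinative", False),
    Language("Turkish", "Turkic", "agglutinative", False),
]


def group_by(languages, attribute):
    """Group language names by the value of one attribute."""
    groups = defaultdict(list)
    for lang in languages:
        groups[getattr(lang, attribute)].append(lang.name)
    return dict(groups)


print(group_by(LANGUAGES, "morphology"))
# {'isolating': ['Mandarin'], 'agglutinative': ['Finnish', 'Turkish']}
```

Because every criterion is just another field, the same list can be regrouped by family, morphology, or tone without restructuring the data, which is the "combination of methods" idea in miniature.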

These are some of the language families and their branches

  • Germanic: English, German, Dutch, Swedish, Norwegian
  • Romance: French, Spanish, Italian, Portuguese, Romanian
  • Celtic: Irish, Welsh, Scottish Gaelic
  • Slavic: Russian, Polish, Czech, Serbian, Ukrainian
  • Indo-Iranian: Hindi, Urdu, Persian, Sanskrit
  • Semitic: Arabic, Hebrew, Amharic, Maltese
  • Berber: Kabyle, Tuareg (including Tamasheq)
  • Cushitic: Somali, Oromo, Afar
  • Sinitic: Mandarin, Cantonese, Shanghainese
  • Tibeto-Burman: Tibetan, Burmese, Dzongkha
  • Finno-Ugric: Finnish, Estonian, Hungarian
  • Sami: Northern Sami, Lule Sami, Southern Sami
  • Turkic: Turkish, Uzbek, Kazakh, Turkmen
  • Mongolic: Mongolian, Kalmyk, Buryat
  • Tungusic: Evenki, Manchu, Xibe
  • Malayo-Polynesian: Indonesian, Malay, Tagalog, Maori, Hawaiian
  • South Dravidian: Tamil, Malayalam, Kannada
  • South-Central Dravidian: Telugu
  • Bantu: Swahili, Zulu, Shona
  • Kwa: Akan, Ewe
  • Mande: Bambara, Malinke

Each of these families can have sub-families, and individual languages can have dialects, much like how a software library can have sub-modules and versions. Some languages, such as Basque, are isolates, meaning they don’t belong to any known family.
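The nesting of families, sub-families, and languages can be sketched as a small tree. This is only an illustration: the `FAMILY_TREE` layout and the `find_path` helper are hypothetical, and the grouping of Germanic, Romance, and Celtic under Indo-European is added here for the example and is not part of the list above.

```python
# Nested dicts are sub-families; lists hold the languages at the leaves.
FAMILY_TREE = {
    "Indo-European": {
        "Germanic": ["English", "German", "Dutch"],
        "Romance": ["French", "Spanish", "Italian"],
        "Celtic": ["Irish", "Welsh", "Scottish Gaelic"],
    },
    "Turkic": ["Turkish", "Uzbek", "Kazakh"],
}


def find_path(tree, language, path=()):
    """Return the chain of groupings leading to `language`, or None."""
    for name, child in tree.items():
        if isinstance(child, dict):
            # Descend into a sub-family and keep track of where we are.
            found = find_path(child, language, path + (name,))
            if found is not None:
                return found
        elif language in child:
            return path + (name,)
    return None


print(find_path(FAMILY_TREE, "Dutch"))   # ('Indo-European', 'Germanic')
print(find_path(FAMILY_TREE, "Uzbek"))   # ('Turkic',)
print(find_path(FAMILY_TREE, "Basque"))  # None: not in any family in this tree
```

A language that appears nowhere in the tree simply yields `None`, which is a convenient way to model an isolate in this toy representation.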

For each of these languages, what are some well-known dialects?

Dialects are like different versions or forks of a software project; they share a common base but may have unique features or optimizations. Here’s a rundown of some well-known dialects for languages from the families mentioned:

Germanic

  • English: American, British, Australian, Canadian
  • German: High German, Low German, Swabian, Bavarian
  • Dutch: Flemish, Brabantian, Limburgish

Romance

  • French: Parisian, Québécois, Acadian, Belgian
  • Spanish: Castilian, Latin American, Andalusian
  • Italian: Tuscan, Sicilian, Neapolitan

Celtic

  • Irish: Munster, Connacht, Ulster
  • Welsh: Northern, Southern
  • Scottish Gaelic: Hebridean, Mainland

Slavic

  • Russian: Northern, Southern, Central
  • Polish: Greater Polish, Lesser Polish, Silesian
  • Czech: Bohemian, Moravian

Indo-Iranian

  • Hindi: Khari Boli, Braj, Awadhi
  • Urdu: Dakhini, Rekhta
  • Persian: Farsi, Dari, Tajiki

Semitic

  • Arabic: Modern Standard, Egyptian, Levantine, Gulf
  • Hebrew: Modern, Ashkenazi, Sephardi
  • Amharic: Gondar, Gojjami, Showa

Sinitic

  • Mandarin: Standard (Putonghua), Taiwanese Mandarin, Sichuanese
  • Cantonese: Standard (Guangzhou), Hong Kong, Macau

Finno-Ugric

  • Finnish: Standard, Savo, Tavastian
  • Estonian: Northern, Southern
  • Hungarian: Western, Eastern, Palóc

Turkic

  • Turkish: Istanbul, Aegean, Eastern Anatolian
  • Uzbek: Northern (Tashkent), Southern (Samarkand)

Malayo-Polynesian

  • Indonesian: Standard, Colloquial Jakartan
  • Tagalog: Standard (Filipino), Batangas, Marinduque

South Dravidian

  • Tamil: Standard, Kongu, Madurai
  • Malayalam: Standard, Travancore, Malabar

Bantu

  • Swahili: Standard, Kiunguja (Zanzibar), Kimvita (Mombasa)
  • Zulu: Zululand, Natal

This is a simplified list, and each dialect can have its own sub-dialects or regional variations, much like a software library can have different branches or commits.
