Linguistic Curiosities

@dave-the-linguist / dave-the-linguist.tumblr.com

Tumblr shoots itself in the foot with an ego-trip of panopticist surveillance powered by dumb AI. This blog is no longer updated.

What is NLU in humans?

Today I was thinking about understanding in terms of information processing, circuits, and activation patterns. 

If we say that an average Google search corresponds to 1 kJ of energy consumption, can we also think of a human’s understanding of a sentence in terms of energy consumption? Lexicon, idioms, grammar structures, contextual references, analogies--some instances are harder to parse than others. A single sentence of 12 words may contain many dozens of these patterns that need to be understood. Can we assign a “level of difficulty” to each pattern? Pattern X would cost 1.2 kJ/mil, that is, the “average” brain consumes 1.2 kilojoules of energy over 1 million encounters with this pattern. But we know that neural networks reshape themselves; the network becomes more efficient, and understanding is a lot “easier” for an expert. We already talk about “level-appropriate materials”, but these are rough categories--what if they became very granular? Through which methodologies could we make level selection adaptive almost to the individual?
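Out of curiosity, this per-pattern cost idea can be sketched as a toy calculation. Everything here is made up for illustration -- the pattern names, the baseline costs, and the "expert discount" are all assumptions, not measurements:

```python
# Toy model (all numbers invented for illustration): assign each
# linguistic pattern a hypothetical energy cost in kJ per million
# encounters, discounted by the reader's proficiency with it.

PATTERN_COST_KJ_PER_MILLION = {  # hypothetical baseline costs
    "lexical_lookup": 0.4,
    "idiom": 1.2,
    "passive_structure": 0.9,
    "contextual_reference": 1.5,
}

def sentence_cost(patterns, proficiency):
    """Estimated cost (kJ per million readings) of one sentence.

    proficiency maps pattern -> 0.0 (novice) .. 1.0 (expert); the
    expert's reshaped network is assumed to cut the baseline by 80%.
    """
    total = 0.0
    for p in patterns:
        base = PATTERN_COST_KJ_PER_MILLION[p]
        total += base * (1.0 - 0.8 * proficiency.get(p, 0.0))
    return total

novice = sentence_cost(["idiom", "passive_structure"], {})
expert = sentence_cost(["idiom", "passive_structure"],
                       {"idiom": 1.0, "passive_structure": 1.0})
print(novice, expert)  # the expert's estimate is 80% cheaper
```

The point of the sketch is only that cost becomes a function of both the pattern inventory and the individual's proficiency profile, which is exactly what granular, adaptive level selection would need.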

What does it mean to understand the word ‘marmot’? For all individuals who are deemed to be capable of understanding this word, the activation patterns may be different, or they may be very similar when analysed down to principal components. I would guess the subjective, episodic experience of exposure to everything that contributes to the formulation of understanding of this word is different for each individual, so is perhaps how each instance of exposure is processed. But when it converts to semantic understanding, are the patterns materially different?


The problem with flashcards is that there isn’t much on the back of the card except a *shorthand*, so they’re not useful for ingesting a concept for the first time. Even for something as low in information density as a concrete noun for some common object--an apple is just an apple, right?--some demonstration in real context is still useful: can the apple be eaten? Does the apple decompose? You need this running-around of ontological exercises to solidly place the idea in a web of plausible combinatorics. Maybe in some natural languages an aeroplane does not fly, it can only be flown, I don’t know. An apple is rarely just an apple. The shorthand is only effective once this ingestion of ontology is largely complete; then upon each instance of repair (seeing the flashcard for active recall) the connection is available through “symbolic transfer”. The shorthand is the access symbol: what it brings to mind is already in your mind.


Meaning as an emergent phenomenon

At my level of ~A2 German, often I encounter a sentence where all the parts of the sentence are “known” words, but I just do not get the meaning of the sentence.

I can think of an analogous example in English, “He would have been let go...”

It’s easy to imagine someone new to English having trouble decoding the meaning. The phrase embeds so many layers of grammatical features:

  • Inflected “to have been + past participle”
  • Passive structure
  • Subjunctive “would have”
  • Strong forms: “let” being the past participle of “to let”
  • Plus idiom “to let go”
  • Strong verb “to let” followed by another verb

These features are not explained at the level of lexicon or morphology.


Non-trivial efforts

What does it mean to speak a language?

Let’s consider this question in the context that many language learning products claim to help users achieve such a goal. Actually, forget that context for a moment—very often people are just curious about how much effort is involved in reaching the “near-native” level.

Basically, some lasting changes in memory and cognition must occur—my inkling is that almost everyone on this topic grossly underestimates the profundity of such changes and the amount of “work” entailed.

Obviously, there are established frameworks for assessing the level (e.g. A1-C2), though I often think about other anecdotal or qualitative criteria…

For example, I consider my English to be “near-native”; here’s a list of what I’m able to do:

  • Get amused by standup comedy
  • Read text in intentionally obscure style and register, e.g. academic work, terms and conditions, fine print
  • Write structured paragraphs of logical arguments or articles that people consider well-written
  • Find reading entertaining (similar to watching TV)

I’m not able to do any of these in German, which I estimate to be at A2/B1 level… Note that I have already spent thousands of hours on German, and more in terms of casual exposure—the point is, attainment of functional abilities is not trivial.

What I’m able to do in German:

  • Understand 50%+ of what is said or written in general contexts (non-specialised)
  • Get some sentences in conversational contexts completely
  • Know approximately 2,000+ words (lemmas)
  • Understand the phonological structure, know what is plausible and what is not—the same with grammatical features, e.g. endings
  • Remember lyrics from songs etc.

Now, I’m not able to do most of this in Finnish… What can I do in Finnish?

  • Order food or beer (no customisation)
  • Recognise some words
  • Have a very incomplete understanding of phonology and word structure
  • Use very basic expressions
  • Imitate and memorise longer sentences (e.g. “Are your parents happily married?”)

Mind you, I was able to order KFC in Russia with practically zero hours spent on Russian, except for reading Cyrillic… so a lot of this is “cognitive problem-solving” instead of language acquisition—whereas I did take a course on Finnish and listened to some audio courses (hundreds of hours, estimated).

So that’s why I’m very sceptical towards claims that you can “speak” anything with trivial effort—this doesn’t mean the process has to take years—you just need a very systematic regimen. What I really want to do is to “mechanise” this process; an analogous illustration would be: https://www.quora.com/How-many-Chinese-characters-can-a-foreign-language-student-expect-to-learn-in-one-year-How-about-two-years/answer/David-Rosson

Areas that I want to explore:

  • Vocabulary: Not just 4,000 words, but 20,000 words, 50,000 words
  • Usage: Phrases; idioms and expressions; nuanced semantic gradients
  • Morphology: Word formation; conjugations and declensions
  • Sentence structures and patterns; “combinatorics”
  • Lots of graded reading: news, discussions, wikis
  • A tutor for intensive dialogue training

Formula-based Lexicography

Vocabulary lists often deal with target words and their meanings as one-to-one translations in gloss form. For example:

vertreiben ⇒ expel

This sometimes results in out-of-context errors, e.g. "I'm wearing a *clock", because the item-to-item mapping really doesn't offer more information -- the learner then conflates this tenuous link between the two items with equivalence in semantic values (which are often context-dependent).

The same semantic role may be filled by a single verb, a phrasal verb, or some long-winded expression. For example, "to call someone out on it" may very well translate to a single transitive verb -- just look at how many parts are stuffed into it!

What we want to find are real, natural, authentic expressions that fulfil such a role in either language -- rather than two items that happen to follow the same form or lexical category. Sometimes a verb may translate to an idiom.

Some people advocate using example sentences. They are good, especially pithy, gem-like ones. But they are not the minimal unit of demonstration. This minimal unit is a "formula".

{aus} [Land, Gebiet] vertreiben ⇒ to expel {from} [..., ...]
{sich} (DAT) [die Zeit] {mit} [etw] vertreiben ⇒ to pass [the time] {with} [sth]

Here on each side (for each language) there is a patterned template, that shows several pieces of information in addition to just the 'pivotal' item:

  1. Collocations: 'to dress oneself' or 'to dress a wound' -- the same pivotal item may map onto different words in the other language.
  2. Phrasal components: such as prepositions, which ones to use and where to place them.
  3. Thematic roles and cases: it also forms a template for who's doing what to whom -- and in what inflectional form should each component appear.
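For illustration, such a formula entry could be represented as a small data structure. The field names here are my own invention, not any established lexicographic standard:

```python
from dataclasses import dataclass, field

# A minimal representation of one bilingual "formula" entry.
# Field names are illustrative assumptions, not a standard schema.
@dataclass
class Formula:
    source: str           # patterned template in the source language
    target: str           # matching template in the target language
    collocates: list = field(default_factory=list)  # typical slot fillers
    case_frame: str = ""  # who does what to whom, in which case

vertreiben_time = Formula(
    source="{sich} (DAT) [die Zeit] {mit} [etw] vertreiben",
    target="to pass [the time] {with} [sth]",
    collocates=["die Zeit", "die Wartezeit"],
    case_frame="reflexive dative + accusative object",
)

print(vertreiben_time.source, "⇒", vertreiben_time.target)
```

A dictionary built this way can store several formulas under one pivotal item, so 'to dress oneself' and 'to dress a wound' become separate entries rather than one conflated gloss.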

I imagine the best way to make such a dictionary (or a vocabulary list) is to start with high-quality, curated corpora, and from there, real n-gram collocation data.
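As a sketch of that last step, here is the standard first pass at collocation extraction: score adjacent word pairs by pointwise mutual information. The toy corpus below is invented for illustration:

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information --
    a standard first pass at finding collocations in a corpus."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to trust
        p_pair = count / (n - 1)
        p1, p2 = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores

corpus = ("sich die zeit mit lesen vertreiben . "
          "sich die zeit mit spielen vertreiben . "
          "die zeit vergeht schnell .").split()
scores = bigram_pmi(corpus)
for pair in sorted(scores, key=scores.get, reverse=True):
    print(pair, round(scores[pair], 2))
```

On real data you would want a curated corpus and a stronger association measure (log-likelihood, t-score) on top of raw counts, since PMI alone over-rewards rare pairs.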


Verb reduction

"Ah, wrong again! It was a knife. But, 'stab' implies the blade was thrust into the victim, whereas this wound was produced by it being hurled into her chest."

Here the word 'to hurl' is probably a low-frequency word; to make the sentence easier to understand, the contrasted keywords may as well be replaced by words of higher frequency, for example 'pushed' and 'thrown'.

Now, if we analyse the elements within the semantic value:

to hurl = to throw + with force

If we drop the adverbial specification, it becomes a hypernym -- a "more general" verb, a higher-level or "easier" word.

In this particular case, it also happens that 'hurled' is regular and 'thrown' is irregular -- we say that the higher verb is "stronger".

If we make the verbs stronger still, both 'thrust->push' and 'hurl->throw' can be replaced by 'put'. Now we arrive at a level of what I would call "very strong verbs", which I imagine are the "essential slots" in almost every language, each to be filled by a common verb, phrasal verb, or some expression.
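A minimal sketch of this reduction as a lookup table -- the entries and the "remainder" glosses are illustrative assumptions, not a real lexical resource:

```python
# "Verb reduction": map a low-frequency verb onto a higher-frequency
# hypernym plus an adverbial remainder; optionally climb once more
# to a "very strong" verb.  Table entries are illustrative only.
REDUCTIONS = {
    "hurl":   ("throw", "with force"),
    "thrust": ("push",  "forcefully"),
    "stroll": ("walk",  "in a relaxed way"),
}
STRONGER = {"throw": "put", "push": "put"}  # the next level up

def reduce_verb(verb, levels=1):
    """Return (hypernym, adverbial remainder) for a known verb;
    unknown verbs pass through unchanged."""
    if verb in REDUCTIONS:
        hyper, remainder = REDUCTIONS[verb]
        if levels > 1:
            hyper = STRONGER.get(hyper, hyper)
        return hyper, remainder
    return verb, ""

print(reduce_verb("hurl"))            # ('throw', 'with force')
print(reduce_verb("hurl", levels=2))  # ('put', 'with force')
```

The interesting question for a learner's tool is the inverse direction: which "very strong" slot a new target-language verb attaches to, and what specification it adds.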

Just off random musing, I imagine they would fill these categories:

ATTRIBUTION: have (possess), exist, be at (location), be called (name)

CAUSATION: make (create, prepare), make (cause, transform, enforce), look like (appearance), undertake/undergo, try to

VOLITION: want, like (pleased), hope/wish, agree/accept/allow, be allowed, advocate, attack/reject, care/mind/be concerned about, command, promise

PROGRESS: go, come; come back (return), give back; move (self), move (object); start, stop/end, wait for, wait (inaction); continue, keep (object)

COGNITION: know, understand, consider, think/feel (opinion); keep in mind, recall, choose/decide/judge, intend/plan to; feel (emotion), believe

SENSES: look/behold, see (detect), listen to, sample (taste), touch/feel up

CUSTOMS: eat, drink, sleep, sit, walk, meet, read, write, buy, reside

TRANSACTION: fill/fit, obtain/collect, look for (search), take away, take with, give/pass to, proffer, lose/let go, put/leave, hide, tell (notify), demonstrate

[Some of these are reflective pairs]


Phylogenic Components of Language

  • Ritualised reference (lexicon)
  • Combinatorial encoding (phonology)
  • Central coherence (semantics: Gestalt meaning)
  • Procedural coordination (syntax)

More thoughts on central coherence

Central coherence = priming activation?

Consider these examples:

  • rise up / rise down*
  • fall down / fall up*
  • come down / come up
  • come here / come there*

The semantic implications inherent in a word impose constraints that spread onto the neighbouring environment. When we consider a verb in a second language, we can think about whether it implies motion, direction, change of state, agency and so on.

Chairs come in different shapes; birds fly but balls and time also fly. This is the native or folk ontology that differentiates one nuanced concept from another. There is a lot of detail that may not be immediately obvious.

If meaning were to be derived from mere association rather than the aggregate result of activation, "going to bed" would have two components: "walking up to" and the destination object "the bed". Only when the semantic network of "what a bed is" (which can range from a mattress to a pile of hay, as Plato would say) becomes activated (and proceduralised through idiomatic use) do we arrive at the actual meaning of "going to sleep".

In weak central coherence, the symbolic, encoded representation remains: the subject can still memorise a sequence such as 'cat' or 'orange', much the same way as memorising a phone number. There is item-to-item association, but very limited spreading. The associative mapping of meaning (on an isolating basis) is present, but there isn't much of the generalised "induction of meaning" that allows for fast and robust comprehension.


De/Re-generative Grammar

This is the notion that instances of actual language use emanate from Platonic representations, abstract models of the language, and go through a process of systematic decay, distortion, or realisation when they are produced.

Running speech is a de/re-generated product of idealised speech. Colloquial grammars are de/re-generated from formal models of expression. Hence it's difficult for learners to faithfully reproduce authentic expressions through mere imitation, because the output itself is the product of a process of decay -- they haven't got the original, and they cannot let it decay the right way.

An analogy is apple juice: you get it from apples, and that's fairly simple. But to concoct an artificial flavour that imitates apple juice is hard.


I always thought the Germans say 'den' like 'din', now I know I'm not alone.

Harding, S., & Meyer, G. (2003). Changes in the perception of synthetic nasal consonants as a result of vowel formant manipulations. Speech Communication, 39(3), 173-189.
"The nasal prototypes /m/ and /n/ were used in all experiments, together with a range of preceding vowels differing only in the frequency and transitions of their second formant (F2). When no explicit transitions were present between the vowel and nasal, the perception of each nasal changed from /m/ to /n/ as the vowel F2 increased. Introducing explicit formant transitions removed this effect, and listeners heard the appropriate percepts for each nasal prototype. However, if the transition and the nasal prototype were inconsistent, the percept was determined by the transition alone. In each experiment, therefore, the target frequency of the vowel F2 transition into the nasal consonant determined the percept, taking precedence over the formant structure of the nasal prototype."

The genesis of language: a summary based on Arbib's talk

1.1 Extracting meaning: example of visual processing, from edge detection to thematic analysis -- feature extraction and contextual probabilities -- snapped onto a schema of recognition.

1.2 Central coherence: from features to themes, with flexibility and tolerance for variations and noise => robust reduction.

1.3 Abstract representations: ability to generalise => robust induction.

2. The repertoire of manual operations: "reach -> grip -> retrieve" => a mental store of available options: sequential actions towards proximal and ultimate goals. See: Alstermark et al. (1981).

3. Mirror neurons: registering operations without performing them, i.e. a mental representation of actions/movement/gestures in others.

4. Implications for fitness: imitation, transmission of skills; competitive advantage in anticipating others' moves; empathy or theory of mind.

5. Ritualisation: the evolution and emergence of bodily signals -- the ability to achieve a function (e.g. determine hierarchy) without performing the full sequence of available actions (e.g. fighting to death).

6. Now the picture is almost complete:

  • Linking actions to meaning -> performable actions serving a goal.
  • Registering actions (gestures) -> mirrored recognition.
  • From meaning to gesture -> ritualisation.
  • Robustness in recognition -> allows abstraction.

7. Now the gesture or symbol referring to a meaning or idea can be far removed from the original sequence.

For example, when you pull out your smartphone and “dial” a number by touching the screen, the gestures with which you communicate with the computer are really many steps removed from the etymology: there's no dial and you are not really dialling anything -- except that you are performing an action signified by such a word.

And that's essentially what a lexis allows you to do: represent ideas using abstract symbols that are far removed from the original action sequence or quality or thing, or even its associated pantomimes.

Actually, the above only goes to the level of bonobos using lexigrams; that's only about one third of the story. The second step is to explain how speech is basically "audible gestures", and how a combinatorial encoding system takes over alongside the expansion of the lexicon (Acredolo & Goodwyn, 1985; Capirci et al., 1996; Butcher, 2000; Iverson & Goldin-Meadow, 2005), where it goes from one-word to one-word-plus-gesture to two-word. See also: Anisfeld, M., Rosenberg, E. S., Hoberman, M. J., & Gasparini, D. (1998). Lexical acceleration coincides with the onset of combinatorial speech. First Language, 18(53), 165-184.

Then the third part is explaining the emergence of generative grammar... a rule-based system for planning and executing sequences. Perhaps see: Fitch, W. T. (2011). The evolution of syntax: an exaptationist perspective. Frontiers in evolutionary neuroscience, 3.

[Video]: in slow-motion, you can see the cat modifying the "tactical positioning" of its footholds as well as various "action components" with high precision in executing a well-coordinated leap sequence.

[Continued]

Saying "Ahhh" can be just another gesture, it's no more "removed" or abstract than clapping hands (which happens to be an audible gesture) -- only that you are "clapping" your vocal folds to make the sound.

Consider these "units of meaning" with no sonorant components and seemingly non-conformative to how English phonology would define a word:

  • "Psst!"
  • "Pff..."
  • "Tsk tsk..."
  • "Shhh!"

They are closer to "audible gestures" than to lexical items with a re-combinatory encoding scheme (that is, made up by combining and re-arranging phonemes).

This difference, or threshold, is what I alluded to as the "switch" from referential gestures to linguistic phonology. I have two speculations about this:

1. This "phonology module" -- though this module may be psycholinguistically but not neurologically real i.e. it's actually an interplay of various exapted (rather than de novo) sub-systems, as Arbib would say -- emerged at some point of the evolutionary course. And it gave its bearers (our common ancestors) an advantage because the vastly expanded lexical capacity of a combinatorial system.

2. This "module" matures along some point of the developmental course, roughly corresponding to the sharp "kink" or inflection point you see in the vocabulary curve. The child would move from controlled gestures and gesture-like utterances to multiple gestuers and expanded one-word vocabulary and coordinated word-plus-gesture uses, and eventually to a switch onto a phonologically based model.


Thoughts of the Day:

  1. Why do diphthongs move along with monophthongs when sound shifts happen? It must be that the "vowel targets" underlying both categories are actually doing the moving; that is, the signposts defining the vowel space are shifting, rather than the exemplar positions or definitions of individual sounds.
  2. Transcription is a model. It should be useful but needs not be true. Approaching what is true is the work of theories vetted by empirical investigation. The (potential) danger of phonetic realism is that it conflates interpretation with documentation, and applicability with validity.

Questions of the Day:

  1. Assimilation has often been described in terms of how neighbouring segments affect each other -- and the outputs are thought of as segments with changed features -- what if the products are something else altogether?
  2. To what extent are segments real? We often assume they have abstract mental representations, each holding a bundle of features together -- they are seen as the units on which phonological rules operate -- are these units an illusion?

Topics of potential interest:

  • Multi-word Units on the Frequency List
  • Effect of Mass Exposure to Citations
  • Typology: Permutation of Constituents
  • Measuring Comprehension with Eyetracking
  • Efficient Sample Test of Vocabulary Size
  • Memory, Delusions, and Hypnosis
  • Dance Dance Revolution for Prosody
  • Platonic Models of Generative Register
  • Visual Feedback for Vowel Targeting
  • Neat, Informative Graphs with ggplot2
  • ...

So far it's only Tuesday...


The hiddenness of abstraction

What makes the analysis so confounding and "hidden" is the many layers of conversion from abstraction to realisation. What we can observe and collect is only the surface, many steps removed from its "top-down" origins. And as Anderson said quite cryptically: "Physical events are notoriously neutral." Phonologists tend to think of speech articulation (and phoneticians and researchers working on speech synthesis have come to realise this too) as a continuous stream of "gestures". It's kind of like interpretive dance, or choreography, where you are trying to convey contrastive meaning using perceivable motion. It's not what was conventionally thought of as a string of idealised "targets" stitched together.

Now imagine what you can capture are images; then you have to solve vision, starting from edge detection and all that, then the anatomy and stick figures, and then step by step you get to the system of movement, and then perhaps to meaning. I would speculate it's notoriously difficult for computers to "understand" interpretive dance. Phonological features and classes (the elements out of which we make speech, and the rules for doing so) are abstract. There used to be an impression that they must have some articulatory and then acoustic targets (as in, "this is the sound to produce"); but apparently not.

  • People who have lost front teeth or just had a dental anaesthetic can still speak, and we can still understand them to an extent.
  • The phoneme /r/ varies so greatly in manner and place that it's very hard to explain its variants' relations through acoustics.
  • Sign languages have concepts and processes analogous to phonemes, co-articulation, rhyming and so on; only they use hand shapes, positions, movements, and facial gestures instead.

Phonology is realised through anatomy, but not bound by it. You could imagine an alien species with a completely different set of organs as articulators, or imagine sci-fi implants giving us the ability to use very novel gestures -- it just so happened that our ancestors went down the path of utilising the vocal tract. And if you look at cross-linguistic data, even just the vocal tract offers a very diverse range of expressive possibilities, some less obvious than others, from clicks to tones to labial protrusion to breathiness to ingressives and more. The spectrogram gives us spectral and temporal resolution; depending on the maths applied to it, we can see the individual beats of the vocal folds, the harmonics, the resonance characteristics, and the patterns of acoustic energy. But after all, physical events are just pointers to the real thing, like moving shadows cast by hand puppets.
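The "depending on the maths" remark can be made concrete: with a plain short-time Fourier transform, the window length alone decides whether you resolve individual harmonics (long window, narrowband) or individual glottal pulses (short window, wideband). A minimal sketch on a synthetic 120 Hz voiced source (the signal is invented for illustration):

```python
import numpy as np

def stft_magnitude(signal, win_len, hop):
    """Plain short-time Fourier transform magnitude.  A long window
    (narrowband) resolves harmonics in frequency; a short window
    (wideband) resolves glottal pulses in time."""
    window = np.hanning(win_len)
    frames = [signal[i:i + win_len] * window
              for i in range(0, len(signal) - win_len, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 16000
t = np.arange(fs) / fs  # one second of a synthetic 120 Hz "voice"
voiced = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 10))

narrowband = stft_magnitude(voiced, win_len=1024, hop=256)
wideband = stft_magnitude(voiced, win_len=128, hop=32)

# Frequency resolution per bin is fs / win_len:
print(fs / 1024, fs / 128)  # 15.625 Hz vs 125.0 Hz per bin
```

At 15.6 Hz per bin the 120 Hz harmonics show up as separate horizontal bands; at 125 Hz per bin they smear together, and the energy instead pulses vertically with each glottal cycle.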

And naturally when we look at these, we interpret them as both physical phenomena and the "linguistically interesting cues" we are searching for. When coding in Praat, for example, you would hear people talk about voicing, formants, turbulence, closures, "energy dropping off", movements, glottalisation, periodicity, and so on -- multiple levels of representation and interpretation (acoustic, anatomical, phonetic) mixed together. What we want to extract is not intrinsic in the sound data, therefore it cannot be analysed in isolation. It must be interpreted with reference to a whole range of constraints and predictions, from phonetics to phonology, from anatomy to sociolinguistics to pragmatics and beyond. At this point I have pretty much started to ramble, so I'll just leave it there with some blog posts on the topic:


When a noun becomes a verb, the logical structure or construction of its semantic value is not always predictable:

  • to drug, to poison
  • to paint (vt.), to water (vt.)
  • to paint (vi.), to water (vi.)
  • to fire
  • to fish
  • to worm, to weed
