14 January 2014

Text Length in Malay and English

In my previous post, I compared two New Year messages sent out by the Dean of FASS, one in Malay and the other in English. I noted that although the Malay message seems to be longer, the English message actually has more words.

We can analyse this a bit further. First, the Malay has 61 words, but many of them are morphologically complex, so it has 81 morphemes. For example, diucapkan ('said') can be analysed as three morphemes: di+ucap+kan. In contrast, only four of the English words are obviously morphologically complex: 'going', 'taking', 'friends' and 'celebrating'. Each of these consists of two morphemes (for example, go+ing), so the 66 words in the English version have a total of 70 morphemes. The greater number of morphemes in Malay (81 vs 70) partly explains why the Malay text seems longer.
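To make the comparison easier to see, the counts above can be turned into a morphemes-per-word ratio. The short sketch below uses only the four figures given in this post; the function name morphemes_per_word is just a label chosen for illustration.

```python
# Word and morpheme counts as reported in the post: (words, morphemes).
counts = {"Malay": (61, 81), "English": (66, 70)}


def morphemes_per_word(words, morphemes):
    """Return the average number of morphemes per word."""
    return morphemes / words


for language, (words, morphemes) in counts.items():
    ratio = morphemes_per_word(words, morphemes)
    print(f"{language}: {ratio:.2f} morphemes per word")
# Malay: 1.33 morphemes per word
# English: 1.06 morphemes per word
```

So on these counts a Malay word carries about a third more morphemes than an English word.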

Next, we can consider word length. The average word length in Malay is 5.66 letters, while that in English is 3.89 letters. So the Malay text really does have longer words.
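For anyone who wants to try this kind of measurement on their own texts, here is a minimal sketch of how average word length might be computed. It assumes that a word is simply an unbroken run of letters, which may not match exactly how the figures above were counted, and the sample string is an invented phrase, not the Dean's message.

```python
import re


def average_word_length(text):
    """Return the mean number of letters per word in text.

    Assumption: a word is any unbroken run of letters, so hyphens,
    apostrophes and digits act as word boundaries.
    """
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


# Invented sample phrase for illustration only.
sample = "Selamat Tahun Baru kepada semua"
print(round(average_word_length(sample), 2))
```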

One other interesting contrast between these two texts is the extent of lexical repetition. Malay tends to tolerate repetition of words, while English does not. And we can see that kepada ('towards') occurs four times in the Malay, while there is no word that occurs so often in the English; and selamat occurs three times in the Malay, but the closest equivalent 'happy' only occurs twice in the English.
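Repetition of this kind is easy to check by counting how often each word occurs. The sketch below is one possible way to do it, assuming that words are runs of letters and that upper and lower case should be treated alike; the function name repeated_words and the sample sentence are invented for illustration and are not taken from the Dean's message.

```python
import re
from collections import Counter


def repeated_words(text, min_count=2):
    """Return the words that occur at least min_count times in text.

    Assumption: words are runs of letters, compared case-insensitively.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= min_count}


# Invented sample sentence for illustration only.
sample = "selamat tahun baru kepada semua staf dan kepada semua pelajar"
print(repeated_words(sample))
```

Raising min_count to 3 or 4 would pick out only the most heavily repeated words, such as kepada in the Malay message.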