Corpus of Khmer Inscriptions

About the Corpus of Khmer Inscriptions

No Southeast Asian country's transition to history is better documented than Cambodia's. More than 1,200 Khmer and Sanskrit inscriptions have been found across a broad swath of the Mekong delta region. Spanning the 6th to the 19th centuries, they provide first-hand evidence of ancient and medieval Khmer history, culture, language, and art.

The tools provided here serve two quite different functions, allowing:

- study of the texts that make up the corpus, and

- analysis of the corpus as a set of words.

This implementation has limited functionality: the texts are incomplete and have not been fully proofread, and the division into eras and regions is very rough. Nevertheless, with 669 texts comprising nearly a million words, and integrated tools for collocate search and analysis, even this demonstration version is a powerful research tool.

Using the Corpus Tools

Indic entry Insert characters as necessary. Note that (digit), (vowel), and (consonant) each stand for any single character of that kind in the actual search.
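One way to picture the placeholders is as character classes in a regular expression. The sketch below is illustrative only and is not the site's own code: the function name `compile_query`, the vowel inventory, and the rule that a consonant is any letter outside that inventory are all assumptions made for the example.

```python
import re
import unicodedata

# Assumed vowel inventory for a romanized transliteration (hypothetical).
VOWELS = unicodedata.normalize("NFC", "a\u0101i\u012bu\u016beo")

# Each placeholder stands for exactly one character of its class.
CLASSES = {
    "(digit)": r"\d",
    "(vowel)": f"[{VOWELS}]",
    # Any letter that is not a digit, an underscore, or an assumed vowel.
    "(consonant)": rf"[^\W\d_{VOWELS}]",
}


def compile_query(query: str) -> re.Pattern:
    """Turn a query containing placeholders into a compiled regex."""
    query = unicodedata.normalize("NFC", query)
    # Split on the placeholders but keep them in the result.
    parts = re.split(r"(\(digit\)|\(vowel\)|\(consonant\))", query)
    # Placeholders become classes; everything else is matched literally.
    pattern = "".join(CLASSES.get(p, re.escape(p)) for p in parts)
    return re.compile(pattern)
```

With these assumptions, `compile_query("k(vowel)(consonant)").fullmatch(word)` succeeds for any three-character word that begins with k, continues with one vowel, and ends with one consonant.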

Corpus search Return specific items in a context of +/- n words (default 5). Because spelling may vary, one option searches exactly as typed, while (vowels), (consonants), and (vowels or consonants) allow plausible substitutions, e.g. a long vowel for a short one. Returned contexts can be ordered by date or by inscription number.
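A search of this kind can be sketched as a keyword-in-context lookup. The code below is a minimal illustration, not the site's implementation: `kwic` and `fold_vowel_length` are hypothetical names, and the only substitution modeled is long versus short vowels written with a macron in romanized text.

```python
import unicodedata


def fold_vowel_length(word: str) -> str:
    """Drop macrons so that long and short vowels compare equal (assumed rule)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if ch != "\u0304")


def kwic(tokens, target, n=5, approximate=False):
    """Return (left context, hit, right context) for every match of target."""
    key = fold_vowel_length(target) if approximate else target
    hits = []
    for i, tok in enumerate(tokens):
        form = fold_vowel_length(tok) if approximate else tok
        if form == key:
            left = tokens[max(0, i - n):i]      # up to n words before the hit
            right = tokens[i + 1:i + 1 + n]     # up to n words after the hit
            hits.append((left, tok, right))
    return hits


# Invented tokens for illustration only; they are not taken from the corpus.
tokens = ["p", "d\u0101na", "q", "r", "dana", "s"]
exact = kwic(tokens, "dana", n=1)                    # one hit
loose = kwic(tokens, "dana", n=1, approximate=True)  # two hits
```

Sorting the returned hits by date or inscription number would only require carrying that metadata along with each token list.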

Collocates / neighbors Collect, count, and display only the immediate collocate, or the nearest neighbors, of any item. Summing the neighbors is helpful for ignoring intervening modifiers or numbers.
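The difference between the immediate collocate and the summed neighbors can be illustrated with a simple window count. This is a sketch under assumptions: the function name `collocates`, its `window` and `side` parameters, and the sample tokens are all invented for the example.

```python
from collections import Counter


def collocates(tokens, target, window=1, side="both"):
    """Count the words found within `window` positions of each target hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if side in ("left", "both"):
            counts.update(tokens[max(0, i - window):i])
        if side in ("right", "both"):
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Invented tokens: a numeral sometimes sits between "jar" and "rice".
tokens = ["jar", "2", "rice", "jar", "rice", "jar", "3", "rice"]

immediate = collocates(tokens, "jar", window=1, side="right")
summed = collocates(tokens, "jar", window=2, side="right")
# immediate["rice"] == 1, while summed["rice"] == 3
```

With a window of one, the numerals hide the association between the two words; widening the window and summing the counts brings it back.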

Summary distribution Show how a term's frequency varies over the entire corpus. The approximate spelling buttons track variants as well.
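A summary distribution of this sort amounts to counting hits per group and normalizing by group size. The sketch below assumes each text is a record with an `era` label and a `tokens` list; those field names, the function name `distribution`, and the per-1,000-words rate are illustrative choices, not a description of the site's data model.

```python
from collections import defaultdict


def distribution(texts, target):
    """Return {era: (raw count, rate per 1,000 words)} for target."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text in texts:
        totals[text["era"]] += len(text["tokens"])
        hits[text["era"]] += text["tokens"].count(target)
    return {
        era: (hits[era], 1000 * hits[era] / totals[era])
        for era in totals
        if totals[era] > 0  # skip eras that contain no words at all
    }


# Invented records for illustration only.
texts = [
    {"era": "A", "tokens": ["x", "y", "x", "z"]},
    {"era": "B", "tokens": ["y", "y"]},
    {"era": "A", "tokens": ["x"]},
]
result = distribution(texts, "x")
# result == {"A": (3, 600.0), "B": (0, 0.0)}
```

Normalizing by the number of words in each era keeps a heavily represented period from looking like a period of heavy usage.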

Corpus restriction / display These controls let one or more texts form a sub-corpus, whose members may be chosen by number, era, origin, etc. Once the sub-corpus has been selected, you may:

- search it using the Corpus search tools at the top.

- display full texts with the button.

- extract a lexicon of all words in all the selected texts. These may be ordered alphabetically, by frequency, or by date of actual appearance.
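The sub-corpus and lexicon steps above can be sketched as a filter followed by a sorted word list. Everything here is hypothetical: the record fields (`number`, `era`, `origin`, `date`, `tokens`), the function names `select` and `lexicon`, and the reading of "date of actual appearance" as the earliest date among the texts containing a word.

```python
from collections import Counter


def select(texts, numbers=None, era=None, origin=None):
    """Keep only the texts that satisfy every criterion that was given."""
    chosen = []
    for text in texts:
        if numbers is not None and text["number"] not in numbers:
            continue
        if era is not None and text["era"] != era:
            continue
        if origin is not None and text["origin"] != origin:
            continue
        chosen.append(text)
    return chosen


def lexicon(texts, order="alpha"):
    """List every distinct word, ordered alphabetically, by frequency, or by date."""
    freq = Counter()
    first_seen = {}
    for text in texts:
        for tok in text["tokens"]:
            freq[tok] += 1
            first_seen[tok] = min(first_seen.get(tok, text["date"]), text["date"])
    if order == "alpha":
        # Code-point order; a real tool would need an Indic collation.
        return sorted(freq)
    if order == "frequency":
        return sorted(freq, key=lambda w: (-freq[w], w))
    if order == "date":
        return sorted(freq, key=lambda w: (first_seen[w], w))
    raise ValueError(f"unknown order: {order!r}")


# Invented records for illustration only.
texts = [
    {"number": 1, "era": "early", "origin": "north", "date": 700, "tokens": ["b", "a", "b"]},
    {"number": 2, "era": "late", "origin": "south", "date": 600, "tokens": ["c", "a"]},
    {"number": 3, "era": "early", "origin": "south", "date": 650, "tokens": ["c", "c", "c"]},
]
early_words = lexicon(select(texts, era="early"), order="frequency")
# early_words == ["c", "b", "a"]
```

Ties in frequency or date fall back to alphabetical order so that the listing is stable from one run to the next.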