IEG Indian Epigraphical Glossary (Developer notes)

Date of digitization: 2014

Metadata

The original digitization is the file ieg_orig.txt, which is encoded in cp1252 (Windows-1252) and is best viewed in a text editor that supports this encoding. In Emacs, for example, one may use the command revert-buffer-with-coding-system and then select cp1252 as the coding system. The internet reference http://www.cp1252.com/ describes this encoding.

The file ieg_orig_utf8.txt is a conversion of ieg_orig.txt to the more common utf-8 encoding. The file ieg.txt is also in the utf-8 encoding, and incorporates various editing changes, such as corrections of typographical errors.
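
The conversion from cp1252 to utf-8 described above can be sketched in Python as follows. This is only an illustration, not the script that was actually used; the function name convert_cp1252_to_utf8 is hypothetical, and only the file names come from these notes.

```python
def convert_cp1252_to_utf8(src, dst):
    """Re-encode a cp1252 text file as utf-8 and return the character count.

    newline="" keeps the original line endings unchanged in both directions.
    Bytes that are undefined in cp1252 raise UnicodeDecodeError.
    """
    with open(src, "r", encoding="cp1252", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return len(text)

# Hypothetical usage, assuming ieg_orig.txt is in the current directory:
# convert_cp1252_to_utf8("ieg_orig.txt", "ieg_orig_utf8.txt")
```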

Several non-ASCII characters occur in ieg.txt. Each line below gives the character, its Unicode code point, the number of occurrences, and the Unicode name:

¤  (\u00a4)    46 := CURRENCY SIGN
º  (\u00ba)    27 := MASCULINE ORDINAL INDICATOR
×  (\u00d7)     1 := MULTIPLICATION SIGN
‘  (\u2018)  2310 := LEFT SINGLE QUOTATION MARK
’  (\u2019)  2322 := RIGHT SINGLE QUOTATION MARK
“  (\u201c)     2 := LEFT DOUBLE QUOTATION MARK
”  (\u201d)     2 := RIGHT DOUBLE QUOTATION MARK
…  (\u2026)    49 := HORIZONTAL ELLIPSIS
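
A tally like the one above could be produced along the following lines. This is a minimal sketch; the function name non_ascii_report is hypothetical, and the output layout only approximates the listing shown here.

```python
from collections import Counter
import unicodedata

def non_ascii_report(text):
    """Return one report line per distinct non-ASCII character in text."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    lines = []
    for ch, n in sorted(counts.items()):
        # Characters without a Unicode name get a placeholder description.
        name = unicodedata.name(ch, "NO DESCRIPTION")
        lines.append(f"{ch}  (\\u{ord(ch):04x})  {n:5d} := {name}")
    return lines

# Hypothetical usage, assuming ieg.txt is in the current directory:
# with open("ieg.txt", encoding="utf-8") as f:
#     print("\n".join(non_ascii_report(f.read())))
```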

The {X...X} style of coding serves several purposes:

{# #}  57882  : {#X#} devanagari text, coded with KH. Only in Preface
{% %}  11793  : italic text
{??}  2  : unreadable text
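
As an illustration of the brace codes above, the following sketch pulls the marked spans out of a single line. The function name brace_spans and the regular expressions are assumptions made for this example; in particular, a span that continues onto the next line is not handled.

```python
import re

ITALIC_RE = re.compile(r"\{%(.*?)%\}")      # {%...%} italic text
DEVANAGARI_RE = re.compile(r"\{#(.*?)#\}")  # {#...#} Devanagari text
UNREADABLE = "{??}"                         # unreadable text marker

def brace_spans(line):
    """Collect the brace-coded spans found within one line."""
    return {
        "italic": ITALIC_RE.findall(line),
        "devanagari": DEVANAGARI_RE.findall(line),
        "unreadable": line.count(UNREADABLE),
    }
```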

The <> style of coding is used as follows:

<P>             8829 := paragraph indentation
<Poem> </Poem>     3 := a poem
<>              9752 := begin ordinary (unindented) line
<H>               62 := a heading of some sort
<HI>            9939 := a heading, only in preface and index
<HS>               1 := only in preface
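
Counts of the angle-bracket codes, like those listed above, could be gathered with a sketch such as this one. The function name tag_counts and the pattern are assumptions for illustration; the pattern also matches the empty code <> and closing tags such as </Poem>.

```python
import re
from collections import Counter

# Matches <>, <P>, <Poem>, </Poem>, <H>, <HI>, <HS> and similar codes.
TAG_RE = re.compile(r"</?[A-Za-z]*>")

def tag_counts(text):
    """Count each angle-bracket code occurring in text."""
    return Counter(TAG_RE.findall(text))
```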
Page breaks are coded in the form [PageX nn], where nn is the number of following
lines on page X.
X has one of the regular expression forms:
-([ivx]+) for preface pages, X from ‘iii’ to ‘xvi’
-([0-9]+) 001-442 for the bulk of the dictionary, 556-564 for post-index matters
([0-9]+[ab]) 442-555 for index pages (which have two columns, ‘a’ or ‘b’)
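
The page-break forms above might be recognized as in the sketch below. It rests on assumptions not confirmed by these notes: the leading hyphen is treated as an optional literal character, and the sample strings in the comments are invented. The name parse_page_break is hypothetical.

```python
import re

# Assumed shape: [Page<X> <nn>], with an optional hyphen before X.
PAGE_RE = re.compile(r"\[Page-?([ivx]+|[0-9]+[ab]?)\s+([0-9]+)\]")

def parse_page_break(line):
    """Return (page, line_count, kind) for a page-break code, else None."""
    m = PAGE_RE.search(line)
    if m is None:
        return None
    page, count = m.group(1), int(m.group(2))
    if page[0] in "ivx":
        kind = "preface"    # roman-numeral page
    elif page[-1] in "ab":
        kind = "index"      # two-column index page
    else:
        kind = "numbered"   # dictionary body or post-index matter
    return page, count, kind

# Invented examples: "[Page-iii 20]", "[Page-001 40]", "[Page450a 30]"
```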

The lines of the digitization represent lines of the text.

Headword coding is exemplified by: <P>{%a1ba1dha%}
The general form is <P>{%X%},
where X (key1) is coded in Anglicized-Sanskrit (AS) transliteration, and
is normalized to remove such things as accents.
BUT some lines of this form are not headwords, for example:
<P>{%Same%} as {%Stha1n-a1ca1rya%} (EI 17), a temple priest.
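
A first pass at locating headword lines could look like the sketch below. The name headword_candidate is hypothetical, and the result is only a candidate: as noted above, some lines of the same form, such as the one beginning <P>{%Same%}, are not headwords and would need to be filtered separately.

```python
import re

# A line that starts with <P>{%X%} is a headword candidate.
HEADWORD_RE = re.compile(r"^<P>\{%(.+?)%\}")

def headword_candidate(line):
    """Return X from a leading <P>{%X%}, or None if the line has no such prefix."""
    m = HEADWORD_RE.match(line)
    return m.group(1) if m else None
```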

The headwords are ordered according to the alphabetical order of the AS transliteration, without regard to diacritical marks.

Sanskrit in the text appears in the European Indological form, which is coded in ieg.txt with the AS (Anglicized Sanskrit) coding.

The general AS scheme, as described in CDSL.pdf, uses Latin letters (a-z, A-Z), possibly followed by a number. With x standing for any letter, the letter-number combinations in the general scheme are:

x1 = macron
x2 = dot below
x3 = dot above
x4 = acute accent
x5 = tilde
x6 = dash below
x7 = umlaut
x10 = circumflex (hat)
x11 = grave accent
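
The general scheme above can be turned into Unicode with combining marks, as in the following sketch. It covers only the codes listed in the general scheme; the choice of combining characters, the name as_to_unicode, and the rule that a code must not be followed by a further digit are assumptions made for this example. Letter-number strings that are not AS codes (such as bibliographic references) would still be converted if they happen to match.

```python
import re
import unicodedata

# Combining characters assumed for each number of the general AS scheme.
AS_MARKS = {
    "1": "\u0304",   # macron
    "2": "\u0323",   # dot below
    "3": "\u0307",   # dot above
    "4": "\u0301",   # acute accent
    "5": "\u0303",   # tilde
    "6": "\u0331",   # dash (macron) below
    "7": "\u0308",   # umlaut (diaeresis)
    "10": "\u0302",  # circumflex
    "11": "\u0300",  # grave accent
}

# A letter plus one of the codes above, not followed by another digit,
# so that longer codes outside the general scheme are left unchanged.
AS_RE = re.compile(r"([A-Za-z])(10|11|[1-7])(?!\d)")

def as_to_unicode(text):
    """Replace general-scheme AS codes with precomposed Unicode letters."""
    marked = AS_RE.sub(lambda m: m.group(1) + AS_MARKS[m.group(2)], text)
    return unicodedata.normalize("NFC", marked)
```

For example, as_to_unicode("a1ba1dha") gives "ābādha", while a code such as e12 is returned unchanged because it lies outside the general scheme.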

Here are the characters that occur in ieg.txt in this coding, with their approximate frequency. Not all of these have a direct Unicode code point representation, for instance d8, s8, etc.

A1     223 := Ā  (\u0100)  LATIN CAPITAL LETTER A WITH MACRON
a1   23174 := ā  (\u0101)  LATIN SMALL LETTER A WITH MACRON
a2       1 := ạ  (\u1ea1)  LATIN SMALL LETTER A WITH DOT BELOW
a4       2 := á  (\u00e1)  LATIN SMALL LETTER A WITH ACUTE
a7       6 := ä  (\u00e4)  LATIN SMALL LETTER A WITH DIAERESIS
d2    3111 := ḍ  (\u1e0d)  LATIN SMALL LETTER D WITH DOT BELOW
D2       1 := Ḍ  (\u1e0c)  LATIN CAPITAL LETTER D WITH DOT BELOW
d8     151 := ? ḏ  (\u1e0f)  d with double dot below
e12    346 := ĕ  (\u0115)  LATIN SMALL LETTER E WITH BREVE
e3       1 := ė  (\u0117)  LATIN SMALL LETTER E WITH DOT ABOVE
e4       2 := é  (\u00e9)  LATIN SMALL LETTER E WITH ACUTE
h2     282 := ḥ  (\u1e25)  LATIN SMALL LETTER H WITH DOT BELOW
h8       1 := ? ẖ  (\u1e96)  h with double dot below = upaDmAnIya (in preface material)
I1       6 := Ī  (\u012a)  LATIN CAPITAL LETTER I WITH MACRON
i1    3695 := ī  (\u012b)  LATIN SMALL LETTER I WITH MACRON
I2       1 := Ị  (\u1eca)  LATIN CAPITAL LETTER I WITH DOT BELOW
I3       1 := İ  (\u0130)  LATIN CAPITAL LETTER I WITH DOT ABOVE
i7       3 := ï  (\u00ef)  LATIN SMALL LETTER I WITH DIAERESIS
l13    113 := NO DESCRIPTION  l with 3 dots below = Dravidian cerebral fricative
l2      44 := ḷ  (\u1e37)  LATIN SMALL LETTER L WITH DOT BELOW
l8     514 := ? ḻ  (\u1e3b)  NO DESCRIPTION  l with double dot below = (deva 'infinity') = slp1 |
l12      1 := ḹ  (\u1e39)  LATIN SMALL LETTER L WITH DOT BELOW AND MACRON
m3     842 := ṁ  (\u1e41)  LATIN SMALL LETTER M WITH DOT ABOVE
n2    5023 := ṇ  (\u1e47)  LATIN SMALL LETTER N WITH DOT BELOW
n3    1120 := ṅ  (\u1e45)  LATIN SMALL LETTER N WITH DOT ABOVE
n5     858 := ñ  (\u00f1)  LATIN SMALL LETTER N WITH TILDE
n8      74 := ? ṉ  (\u1e49)  LATIN SMALL LETTER N WITH LINE BELOW  (n with double dot below)
o1       1 := ō  (\u014d)  LATIN SMALL LETTER O WITH MACRON
o12     15 := ŏ  (\u014f)  LATIN SMALL LETTER O WITH BREVE
R2       7 := Ṛ  (\u1e5a)  LATIN CAPITAL LETTER R WITH DOT BELOW
r2    1210 := ṛ  (\u1e5b)  LATIN SMALL LETTER R WITH DOT BELOW
r21      2 := ṝ  (\u1e5d)  LATIN SMALL LETTER R WITH DOT BELOW AND MACRON
r3       1 := ṙ  (\u1e59)  LATIN SMALL LETTER R WITH DOT ABOVE
r8     379 := ? ṟ  (\u1e5f)  r with double dot below = Dravidian palatal alveolar
s2    3062 := ṣ  (\u1e63)  LATIN SMALL LETTER S WITH DOT BELOW
S2       5 := Ṣ  (\u1e62)  LATIN CAPITAL LETTER S WITH DOT BELOW
s4    3359 := ś  (\u015b)  LATIN SMALL LETTER S WITH ACUTE
S4     523 := Ś  (\u015a)  LATIN CAPITAL LETTER S WITH ACUTE
S8       3 := ? S8  S with double dot below
s8      15 := ? s8  s with double dot below
T2      32 := Ṭ  (\u1e6c)  LATIN CAPITAL LETTER T WITH DOT BELOW
t2    4166 := ṭ  (\u1e6d)  LATIN SMALL LETTER T WITH DOT BELOW
t6       2 := ? ṯ  (\u1e6f)  LATIN SMALL LETTER T WITH LINE BELOW  (double dot below)
t8      10 := ? ṯ  (\u1e6f)  t with double dot below
u1    1413 := ū  (\u016b)  LATIN SMALL LETTER U WITH MACRON
u7       9 := ü  (\u00fc)  LATIN SMALL LETTER U WITH DIAERESIS

Several letter-number codes are not AS representations:
A17        1 := reference
A9         1 := reference
E32        1 := reference
E8         2 := reference
I30        1 := reference
I31        1 := reference
I33        1 := reference
I4         1 := reference
I9         1 := reference
R5         1 := on line 1 of ieg.txt
L57375     1 := on 1st line only
S488       1 := on 'check-out' card text at end
Z23        1 := reference

DTD

ieg.dtd:

<?xml version="1.0" encoding="UTF-8"?>
<!-- ieg.dtd
 May 1, 2014

-->
<!ELEMENT  ieg (H1)*>
<!ELEMENT H1 (h,body,tail) >
<!ENTITY % body_elts "i  |lb |Poem |P |H" >
<!-- h element -->
<!ELEMENT h  (key1,key2,hom?)>
<!ELEMENT key1 (#PCDATA) > <!-- in slp1 -->
<!ELEMENT key2 (#PCDATA )><!-- in AS -->
<!ELEMENT hom (#PCDATA)> <!-- homonym -->

<!ELEMENT body (#PCDATA  | %body_elts;)*>
<!ELEMENT i (#PCDATA | lb)*> <!-- italic, Sanskrit, in AS transliteration -->
<!ELEMENT Poem (#PCDATA | lb | i)*>
<!ELEMENT lb EMPTY> <!-- line break  -->
<!ELEMENT P EMPTY> <!-- paragraph  -->
<!ELEMENT H EMPTY> <!-- headline of some sort  -->

<!-- tail -->
<!ELEMENT tail (#PCDATA | L | pc )*>
<!ELEMENT L (#PCDATA) >
<!ELEMENT pc (#PCDATA) >

<!-- attributes  -->
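
To show how a record shaped by this DTD might be read, here is a small sketch using Python's standard library. The sample record is invented for illustration only: its key values, body text, and the contents of L and pc are placeholders, not data from ieg.xml. The name read_entry is hypothetical, and the sketch does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# Invented record following the element structure of ieg.dtd.
SAMPLE = (
    "<H1><h><key1>AbADa</key1><key2>a1ba1dha</key2></h>"
    "<body><P/><i>a1ba1dha</i> placeholder definition<lb/></body>"
    "<tail><L>1</L><pc>1</pc></tail></H1>"
)

def read_entry(xml_text):
    """Extract the main fields of one H1 record into a dictionary."""
    h1 = ET.fromstring(xml_text)
    return {
        "key1": h1.findtext("h/key1"),
        "key2": h1.findtext("h/key2"),
        "hom": h1.findtext("h/hom"),  # None when no homonym number is present
        "body": "".join(h1.find("body").itertext()),
        "L": h1.findtext("tail/L"),
        "pc": h1.findtext("tail/pc"),
    }
```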