Table Of Contents

Previous topic

GRA Grassman Wörterbuch zum Rig Veda (Developer notes)

Next topic

IEG Indian Epigraphical Glossary (Developer notes)

This Page

GST Goldstücker Sanskrit-English Dictionary (Developer notes)

Date of digitization: 2014

Metadata

The original digitization is file gst_orig.txt, which is coded in the cp1252 (windows 1252) encoding, and is best viewed in a text editor which supports this encoding. For example, in Emacs, one may use the command revert-buffer-with-coding-system and then select cp1252 as the coding. The internet reference http://www.cp1252.com/ describes this coding system.

The file gst_orig_utf8.txt is a conversion of gst_orig.txt to the more common utf-8 encoding. The file gst.txt is also in the utf-8 encoding, and incorporates various editing changes, such as corrections of typographical errors.

There are several extended ascii codes occurring in gst.txt:

£  (\u00a3)     3 := POUND SIGN
¤  (\u00a4)     1 := CURRENCY SIGN
¦  (\u00a6)  6771 := BROKEN BAR
º  (\u00ba)   206 := MASCULINE ORDINAL INDICATOR
Æ  (\u00c6)     6 := LATIN CAPITAL LETTER AE
æ  (\u00e6)    54 := LATIN SMALL LETTER AE
ç  (\u00e7)     1 := LATIN SMALL LETTER C WITH CEDILLA
œ  (\u0153)    19 := LATIN SMALL LIGATURE OE
‘  (\u2018)  1613 := LEFT SINGLE QUOTATION MARK
’  (\u2019)  1569 := RIGHT SINGLE QUOTATION MARK
“  (\u201c)    13 := LEFT DOUBLE QUOTATION MARK
”  (\u201d)    13 := RIGHT DOUBLE QUOTATION MARK
…  (\u2026)   387 := HORIZONTAL ELLIPSIS

The {X...X} style of coding serves several purposes:

{#X#}  44859  : {#X#} devanagari text, coded with HK
{@X@}      5  : bold text: only in non-body pages
{%X%}   6506  : italic text
{|X|}     63  : widely spaced text
{??}      21  : unreadable text

The following <x> type tags are found:

<g></g>  16  : Greek text, not transcoded
<>    28229  : Line beginning
<H>       9  : Heading. In title sections
<HI>   6811  : Part of headword notation
<HS>      2  : In  title section(s)
<P>    1156  : 'Paragraph' beginning new Sense
Page breaks are coded as [Page...].
Page breaks are more specifically coded as
[Page001-a+ 41] occurring on line 13 of gst.txt
In general, a page number has form [Pagexxx-y+ nn] where x is page number and
y is column (a or b) and nn is the number of lines in the following column.
Other forms appear at beginning of non-body text (title, preface, corrections)

The lines of the digitization correspond to lines of the text.

Headword coding is exemplified by: <HI>{#a#}¦
The general form is <HI>{#X#}¦
where X (key1) is coded in Harvard-Kyoto transliteration.
A variant form is used when there are homonymns, as exemplified by:
<HI>I. {#apa#}¦ homonym (also II,III)

The headwords are ordered according to Sanskrit alphabet ordering.

Only words beginning with the first Sanskrit letter, ‘a’, appear in this dictionary.

Sanskrit in the text sometimes appears in the European Indological form, which is coded in gst.txt with the the AS (Anglicized Sanskrit) coding. The general AS scheme, as described in CDSL.pdf, uses Latin alphabetical letters ‘x (a-z,A-Z), possibly with suffixed numbers; the letter-number combinations are, in the general scheme:

x1 = macron
x2 = dot below
x3 = dot above
x4 = accent aigu
x5 = tilde
x6 = dash below
x7 = umlaut
x10 = circonflex (hat)
x11 = accent grave

Here are the characters that occur in gst.txt in this coding, with their approximate frequency:

Some consonants, like d, appear in the text with an acute accent.
In these cases there is no preformed unicode equivalent, and such a character
C is represented in displays with as C\u0301  COMBINING ACUTE ACCENT
The visual display of these combining characters varies in quality, and
is often either invisible or awkwardly placed.

a1     1 := ā  (\u0101)  LATIN SMALL LETTER A WITH MACRON
a11    1 := à  (\u00e0)  LATIN SMALL LETTER A WITH GRAVE
A3    10 := NO DESCRIPTION  In Devanagari
A4   116 := Á (\u00c1) LATIN CAPITAL LETTER A with udatta accent *
a4  6199 := á(\u00e1) LATIN SMALL LETTER A with udatta accent
d4   145 := d' (\u0064\u0301) LATIN SMALL LETTER D WITH COMBINING ACUTE *
D4     1 := D' (\u0044\u0301) LATIN CAPITAL LETTER D WITH COMBINING ACUTE *
e3     1 := ė  (\u0117)  LATIN SMALL LETTER E WITH DOT ABOVE
E4     1 := É  (\u00c9)  LATIN CAPITAL LETTER E WITH ACUTE
e4    14 := é  (\u00e9)  LATIN SMALL LETTER E WITH ACUTE
H3     1 := Ḣ  (\u1e22)  LATIN CAPITAL LETTER H WITH DOT ABOVE
h4    10 := h' (\u0068\u0301) LATIN SMALL LETTER H WITH COMBINING ACUTE *
i1     3 := ī  (\u012b)  LATIN SMALL LETTER I WITH MACRON
I4     3 := Í (\00cd) LATIN CAPTIAL LETTER I and udatta accent *
i4  1025 := í (\u00ed) LATIN SMALL LETTER I and udatta accent
j4     1 := j' (\u006a\u0301) LATIN SMALL LETTER J WITH COMBINING ACUTE *
l4     1 := ĺ  (\u013a)  LATIN SMALL LETTER L WITH ACUTE
m4     3 := m' (\u006d\u0301) LATIN SMALL LETTER J WITH COMBINING ACUTE *
n3     1 := ṅ  (\u1e45)  LATIN SMALL LETTER N WITH DOT ABOVE
n4  1563 := ń  (\u0144)  LATIN SMALL LETTER N WITH ACUTE
o3     2 := NO DESCRIPTION: In devanagari
o4     1 := ó  (\u00f3) LATIN SMALL LETTER O and udatta accent
o7    17 := ö  (\u00f6)  LATIN SMALL LETTER O WITH DIAERESIS
R4   328 := Ŕ  (\u0154)  LATIN CAPITAL LETTER R WITH ACUTE
r4  1951 := ŕ  (\u0155)  LATIN SMALL LETTER R WITH ACUTE
S4   717 := Ś  (\u015a)  LATIN CAPITAL LETTER S WITH ACUTE
s4   867 := ś  (\u015b)  LATIN SMALL LETTER S WITH ACUTE
T4     1 := T' (\u0054\u0301) LATIN CAPITAL LETTER T WITH COMBINING ACUTE *
t4   514 := t' (\u0074\u0301) LATIN SMALL LETTER T WITH COMBINING ACUTE *
U4     4 := Ú (\u00da) LATIN CAPITAL LETTER U and udatta accent *
u4   543 := ú (\u00fa) LATIN SMALL LETTER U and udatta accent
U7     4 := Ü   (\u00dc)  LATIN CAPITAL LETTER U WITH DIAERESIS

DTD

gst.dtd:

<?xml version="1.0" encoding="UTF-8"?>
<!-- gst.dtd
 April 7, 2014

-->
<!ELEMENT  gst (H1)*>
<!ELEMENT H1 (h,body,tail) >
<!ENTITY % body_elts "i |s| lb |P|g " >
<!-- h element -->
<!ELEMENT h  (key1,key2,hom?)>
<!ELEMENT key1 (#PCDATA) > <!-- in slp1 -->
<!ELEMENT key2 (#PCDATA )><!-- in AS -->
<!ELEMENT hom (#PCDATA)> <!-- homonym -->

<!ELEMENT body (#PCDATA  | %body_elts;)*>
<!ELEMENT i (#PCDATA | lb)*> <!-- italic, maybeSanskrit, in AS transliteration -->
<!ELEMENT s (#PCDATA | lb)*> <!-- Sanskrit, in slp1 transliteration  -->
<!ELEMENT g (#PCDATA)> <!-- Greek (placeholder), always empty  -->
<!ELEMENT lb EMPTY>
<!ELEMENT P EMPTY>

<!-- tail -->
<!ELEMENT tail (#PCDATA | L | pc )*>
<!ELEMENT L (#PCDATA) >
<!ELEMENT pc (#PCDATA) >

<!-- attributes  -->