GRA Grassmann Wörterbuch zum Rig Veda (Developer notes)

Date of digitization: 2007

Metadata

The original digitization is the file gra_orig.txt, which is encoded in cp1252 (Windows-1252) and is best viewed in a text editor that supports this encoding. For example, in Emacs, one may use the command revert-buffer-with-coding-system and then select cp1252 as the coding system. The web page http://www.cp1252.com/ describes this encoding.

The file gra_orig_utf8.txt is a conversion of gra_orig.txt to the more common utf-8 encoding. The file gra.txt is also in the utf-8 encoding, and incorporates various editing changes, such as corrections of typographical errors.
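
The conversion from cp1252 to utf-8 described above can be sketched in Python as follows. This is only an illustration, not necessarily the procedure that was used to produce gra_orig_utf8.txt; the function names are hypothetical.

```python
from pathlib import Path


def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode cp1252 bytes as utf-8 bytes.

    Decoding is strict: a byte that cp1252 leaves undefined
    raises UnicodeDecodeError instead of being silently replaced.
    """
    return data.decode("cp1252").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read src as cp1252 and write dst as utf-8 (hypothetical helper)."""
    Path(dst).write_bytes(cp1252_to_utf8(Path(src).read_bytes()))
```

With the file names from this page, the call would be convert_file("gra_orig.txt", "gra_orig_utf8.txt").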

Several non-ASCII characters occur in gra.txt. Each is listed below with its code point, number of occurrences, and Unicode name:

£  (\u00a3)  1086 := POUND SIGN
¤  (\u00a4)    47 := CURRENCY SIGN
§  (\u00a7)  1850 := SECTION SIGN
«  (\u00ab)     4 := LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
°  (\u00b0)     1 := DEGREE SIGN
µ  (\u00b5)    54 := MICRO SIGN
Ç  (\u00c7)    20 := LATIN CAPITAL LETTER C WITH CEDILLA
×  (\u00d7)     1 := MULTIPLICATION SIGN
ä  (\u00e4)  4660 := LATIN SMALL LETTER A WITH DIAERESIS
ç  (\u00e7) 11822 := LATIN SMALL LETTER C WITH CEDILLA
ö  (\u00f6)  3440 := LATIN SMALL LETTER O WITH DIAERESIS
ü  (\u00fc)  5641 := LATIN SMALL LETTER U WITH DIAERESIS
þ  (\u00fe)     1 := LATIN SMALL LETTER THORN
‘  (\u2018)     1 := LEFT SINGLE QUOTATION MARK
“  (\u201c)    59 := LEFT DOUBLE QUOTATION MARK
”  (\u201d)   982 := RIGHT DOUBLE QUOTATION MARK
„  (\u201e)   932 := DOUBLE LOW-9 QUOTATION MARK
…  (\u2026)  5897 := HORIZONTAL ELLIPSIS
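
A frequency list of this kind can be produced with a few lines of Python. The sketch below assumes the text has already been read as utf-8; the function names are hypothetical, and the counts it gives depend on the version of gra.txt it is run on.

```python
import unicodedata
from collections import Counter


def non_ascii_counts(text: str) -> Counter:
    """Count every character whose code point lies outside ASCII."""
    return Counter(ch for ch in text if ord(ch) > 127)


def format_report(text: str) -> list[str]:
    """Return one line per non-ASCII character, sorted by code point."""
    lines = []
    for ch, n in sorted(non_ascii_counts(text).items()):
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append(f"{ch}  (\\u{ord(ch):04x}) {n:5d} := {name}")
    return lines
```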

The {X...X} style of coding serves several purposes:

{@X@}  24330  : bold text
{%X%}  23889  : italic text
{µXµ}     26  : widely-spaced text
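
As an illustration, the three brace codes can be rewritten as XML elements with regular expressions. The element names b, i, and wide are taken from gra.dtd; whether the actual conversion program works this way is an assumption, and the function name is hypothetical.

```python
import re

# (pattern, replacement) pairs; non-greedy so adjacent spans stay separate.
_MARKUP = [
    (re.compile(r"\{@(.*?)@\}"), r"<b>\1</b>"),        # bold
    (re.compile(r"\{%(.*?)%\}"), r"<i>\1</i>"),        # italic
    (re.compile(r"\{µ(.*?)µ\}"), r"<wide>\1</wide>"),  # widely-spaced
]


def braces_to_tags(line: str) -> str:
    """Replace {@X@}, {%X%} and {µXµ} spans with XML elements."""
    for pattern, repl in _MARKUP:
        line = pattern.sub(repl, line)
    return line
```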

The following <x> type tags are found in gra.txt:

<P>    13454 := Part of headword coding
<g>X</g> 229 := Greek text (uncoded)
<H>     4452 := Centered text
<P1>   32753 := Sub-paragraph
<F>        1 := Footnote

Page breaks are coded as [PageX] in gra.txt, where X ranges from 1 to 1686.
Some pages have two parts, and the second part is indicated with an ‘a’:
30622 505a
30624 506a
45768 757a
45834 758a
81218 1365a
81276 1366a
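
A minimal sketch for locating page breaks follows. It assumes that the ‘a’ suffix is written directly after the page number inside the brackets (for example [Page505a]); this page does not spell out that detail, so the pattern may need adjusting. The function name is hypothetical.

```python
import re

# Assumed form: [Page<digits>] with an optional trailing 'a'.
PAGE_RE = re.compile(r"\[Page(\d+)(a?)\]")


def page_breaks(lines):
    """Yield (line_number, page, is_second_part) for each page break."""
    for lnum, line in enumerate(lines, start=1):
        for m in PAGE_RE.finditer(line):
            yield lnum, int(m.group(1)), m.group(2) == "a"
```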

Each scanned image shows two pages. The ‘Verzeichnisse nach dem Endlaute’ sections extend from pages 1686 to 1776, but are not included in the gra.txt digitization.

The lines of the digitization generally represent ‘sections’ of the text; the actual line-breaks of the text are not coded. Line breaks are sometimes represented by a vertical bar ‘|’ character.

There are several headword forms:
<P>5. ({@vas@})
<P>{@an3çu4,@}
<P>1. {@aks2a4,@}

In the original digitization, some of these forms represented non-headwords;
for example,
under verb ‘aj’ is shown <P>{@abhi4,@} ...;
this represents a preverb form used with ‘aj’.
In about 1000 such cases, the preverb form was changed to <P1>{@abhi4,@};
this was done by a program, so there are likely some errors and omissions
in the gra.txt headword list for this reason.
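
The three headword forms listed above can be recognized with a single regular expression, sketched below. The field names hom and key2 follow gra.dtd; the function name, the handling of the trailing comma, and the parenthesized flag are assumptions made for illustration. Lines beginning with <P1> do not match, so preverb forms recoded as sub-paragraphs are skipped.

```python
import re

# Optional homonym number ("5. "), optional "(", then the bold headword.
HEADWORD_RE = re.compile(r"^<P>(?:(\d+)\.\s*)?(\()?\{@(.+?)@\}")


def parse_headword(line):
    """Return a dict describing the headword, or None if line has none."""
    m = HEADWORD_RE.match(line)
    if m is None:
        return None
    hom, paren, key2 = m.groups()
    return {
        "hom": int(hom) if hom else None,
        "parenthesized": paren is not None,
        "key2": key2.rstrip(","),  # drop the comma that closes the headword
    }
```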

The headwords are ordered according to Sanskrit alphabet ordering.

Sanskrit in the text appears in the European Indological form, which is coded in gra.txt with the AS (Anglicized Sanskrit) coding.

The general AS scheme, as described in CDSL.pdf, uses Latin alphabetical letters ‘x’ (a-z, A-Z), possibly with suffixed numbers. In the general scheme, the letter-number combinations are:

x1 = macron
x2 = dot below
x3 = dot above
x4 = acute accent
x5 = tilde
x6 = dash below
x7 = umlaut
x10 = circumflex (hat)
x11 = grave accent
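
The letter-number scheme above maps naturally onto Unicode combining characters. The following sketch decodes a single AS-coded word by appending the combining mark and then composing with NFC normalization. It is an illustration only: the choice of U+0331 for "dash below" is an assumption, compound codes such as a14 and i13 are not treated specially, and the function should be applied to individual words rather than whole lines, since tags such as <P1> and page markers also contain letter-digit sequences.

```python
import re
import unicodedata

AS_DIACRITICS = {
    "1": "\u0304",   # macron
    "2": "\u0323",   # dot below
    "3": "\u0307",   # dot above
    "4": "\u0301",   # acute accent
    "5": "\u0303",   # tilde
    "6": "\u0331",   # dash below (assumed: COMBINING MACRON BELOW)
    "7": "\u0308",   # umlaut / diaeresis
    "10": "\u0302",  # circumflex
    "11": "\u0300",  # grave accent
}

# Two-digit codes are listed first so that "a10" is not read as "a1" + "0".
AS_RE = re.compile(r"([A-Za-z])(10|11|[1-7])")


def as_to_unicode(word: str) -> str:
    """Decode one AS-coded word to Unicode, composing where possible."""
    decoded = AS_RE.sub(lambda m: m.group(1) + AS_DIACRITICS[m.group(2)], word)
    return unicodedata.normalize("NFC", decoded)
```

Where Unicode has a precomposed character (for example s2), normalization yields a single code point; otherwise (for example n1) the base letter and combining mark remain as two code points.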

Note on combining diacriticals. Some AS codes in gra.txt have no one-character Unicode representation, but may be represented using a Unicode combining character. The visual display of these combining characters varies in quality; they may be either invisible or awkwardly placed.

Here are the characters that occur in gra.txt in this coding, with their approximate frequency:

a1 34711 := ā  (\u0101)  LATIN SMALL LETTER A WITH MACRON
a10 23907 := â  (\u00e2)  LATIN SMALL LETTER A WITH CIRCUMFLEX
a11    18 := à  (\u00e0)  LATIN SMALL LETTER A WITH GRAVE
a14     1 := ā́  (\u0101\u0301) LATIN SMALL LETTER A WITH MACRON AND COMBINING ACUTE
a2     3 := ạ  (\u1ea1)  LATIN SMALL LETTER A WITH DOT BELOW
a4 62711 := á (\u00e1) LATIN SMALL LETTER A WITH ACUTE
a5     5 := ã  (\u00e3) LATIN SMALL LETTER A WITH TILDE
d2   747 := ḍ  (\u1e0d)  LATIN SMALL LETTER D WITH DOT BELOW
e1   970 := ē  (\u0113)  LATIN SMALL LETTER E WITH MACRON
e10   665 := ê  (\u00ea)  LATIN SMALL LETTER E WITH CIRCUMFLEX
e4  6122 := é  (\u00e9)  LATIN SMALL LETTER E WITH ACUTE
h2    98 := ḥ  (\u1e25)  LATIN SMALL LETTER H WITH DOT BELOW
i1  7530 := ī  (\u012b)  LATIN SMALL LETTER I WITH MACRON
i10  2949 := î  (\u00ee)  LATIN SMALL LETTER I WITH CIRCUMFLEX
i13     1 := ī3  (\u012b\u0033)  LATIN SMALL LETTER I WITH MACRON and digit 3
i4 19034 := í (\u00ed) LATIN SMALL LETTER I WITH ACUTE
i7     4 := ï  (\u00ef)  LATIN SMALL LETTER I WITH DIAERESIS
l2     4 := ḷ  (\u1e37)  LATIN SMALL LETTER L WITH DOT BELOW
m2   390 := ṃ  (\u1e43)  LATIN SMALL LETTER M WITH DOT BELOW
m3    25 := ṁ  (\u1e41)  LATIN SMALL LETTER M WITH DOT ABOVE
N06     1 := N06  Not AS. In gra.txt metadata at line 2.
n1   475 := n̄   (\u006e\u0304)  LATIN SMALL LETTER N WITH COMBINING MACRON
n2  6811 := ṇ  (\u1e47)  LATIN SMALL LETTER N WITH DOT BELOW
n3  1669 := ṅ  (\u1e45)  LATIN SMALL LETTER N WITH DOT ABOVE
n5  1221 := ñ  (\u00f1)  LATIN SMALL LETTER N WITH TILDE
o1   777 := ō  (\u014d)  LATIN SMALL LETTER O WITH MACRON
o10   610 := ô  (\u00f4)  LATIN SMALL LETTER O WITH CIRCUMFLEX
o4  4940 := ó  (\u00f3) LATIN SMALL LETTER O WITH ACUTE
P1 34294 := P1  Not AS. Sub-paragraph
r1    56 := r̄   (\u0072\u0304)  LATIN SMALL LETTER R WITH COMBINING MACRON
r10   105 := r̂   (\u0072\u0302)  LATIN SMALL LETTER R WITH COMBINING CIRCUMFLEX
r2     9 := ṛ  (\u1e5b)  LATIN SMALL LETTER R WITH DOT BELOW
r3  7709 := ṙ  (\u1e59)  LATIN SMALL LETTER R WITH DOT ABOVE
r4  2827 := ŕ  (\u0155)  LATIN SMALL LETTER R WITH ACUTE
s2 14421 := ṣ  (\u1e63)  LATIN SMALL LETTER S WITH DOT BELOW
t2  2330 := ṭ  (\u1e6d)  LATIN SMALL LETTER T WITH DOT BELOW
u1  4179 := ū  (\u016b)  LATIN SMALL LETTER U WITH MACRON
u10  2195 := û  (\u00fb)  LATIN SMALL LETTER U WITH CIRCUMFLEX
u11     2 := ù  (\u00f9)  LATIN SMALL LETTER U WITH GRAVE
u4  9206 := ú (\u00fa) LATIN SMALL LETTER U WITH ACUTE
v4    35 := v́  (\u0076\u0301) LATIN SMALL LETTER V WITH COMBINING ACUTE
y10     2 := ŷ  (\u0177)  LATIN SMALL LETTER Y WITH CIRCUMFLEX
y2     1 := ỵ  (\u1ef5)  LATIN SMALL LETTER Y WITH DOT BELOW
y4   193 := ý  (\u00fd)  LATIN SMALL LETTER Y WITH ACUTE
z2     1 := ẓ  (\u1e93)  LATIN SMALL LETTER Z WITH DOT BELOW

DTD

gra.dtd:

<?xml version="1.0" encoding="UTF-8"?>
<!-- gra.dtd
 July 07-02, 2014

-->
<!ELEMENT  gra (H1)*>
<!ELEMENT H1 (h,body,tail) >
<!ENTITY % body_elts "i |b | P | P1 | H | g | wide | F" >
<!-- h element -->
<!ELEMENT h  (key1,key2,hom?)>
<!ELEMENT key1 (#PCDATA) > <!-- in slp1 -->
<!ELEMENT key2 (#PCDATA )><!-- in AS -->
<!ELEMENT hom (#PCDATA)> <!-- homonym -->

<!ELEMENT body (#PCDATA  | %body_elts;)*>
<!ELEMENT i (#PCDATA | wide | sic)*> <!-- italic -->
<!ELEMENT b (#PCDATA)> <!-- bold  -->
<!ELEMENT P EMPTY> <!-- Paragraph, often with headword -->
<!ELEMENT P1 EMPTY>  <!-- Sub-paragraph -->
<!ELEMENT H EMPTY>  <!-- Centered text -->
<!ELEMENT wide (#PCDATA)> <!-- widely-spaced text  -->
<!ELEMENT g (#PCDATA)> <!-- Greek text, usu. not coded  -->
<!ELEMENT F (#PCDATA)> <!-- Footnote (one instance)  -->

<!-- tail -->
<!ELEMENT tail (#PCDATA | L | pc )*>
<!ELEMENT L (#PCDATA) >
<!ELEMENT pc (#PCDATA) >

<!-- attributes  -->