Table Of Contents

Previous topic

INM Index to the Names in the Mahabharata (Developer notes)

Next topic

MCI Mahabharata Cultural Index (Developer notes)

This Page

KRM Kṛdantarūpamālā (Developer notes)

Date of digitization: 2014

Metadata

The original digitization is file krm_orig.txt, which is coded in the cp1252 (windows 1252) encoding, and is best viewed in a text editor which supports this encoding. For example, in Emacs, one may use the command revert-buffer-with-coding-system and then select cp1252 as the coding. The internet reference http://www.cp1252.com/ describes this coding system.

The file krm_orig_utf8.txt is a conversion of krm_orig.txt to the more common utf-8 encoding. The file krm.txt is also in the utf-8 encoding, and incorporates various editing changes, such as corrections of typographical errors.

There are several extended ascii codes occurring in krm.txt:

¤  (\u00a4)   729 := CURRENCY SIGN
º  (\u00ba)     1 := MASCULINE ORDINAL INDICATOR
×  (\u00d7)     1 := MULTIPLICATION SIGN
‘  (\u2018) 14931 := LEFT SINGLE QUOTATION MARK
’  (\u2019) 14927 := RIGHT SINGLE QUOTATION MARK
“  (\u201c)  2427 := LEFT DOUBLE QUOTATION MARK
”  (\u201d)  2424 := RIGHT DOUBLE QUOTATION MARK
…  (\u2026)   410 := HORIZONTAL ELLIPSIS

In title/preface material: £ (u00a3) 1 := POUND SIGN § (u00a7) 1 := SECTION SIGN © (u00a9) 4 := COPYRIGHT SIGN

The {X...X} style of coding serves several purposes:

Almost all text in body is Devanagari, coded in krm.txt
with Kyoto-Harvard transliteration.
{#X#}   2114 : {#X#} X is english, or X is a Roman-numeral
{@X@}   2067 : bold text.  X is Devanagari, coded with HK. Mostly within headwords.
{%X%}    137 : italic text
{??}      23 : unreadable text

The following <x> type tags are found in krm.txt:

<F>X</F>  9293 :  Footnote.  Footnotes are intepolated. Thus, lines of file
                  do not directly correspond to lines of digitization.
<Poem>X</Poem? 2 :  Poem
<> 25676 : ordinary line break
<H> 2084 : Headline.  Used in headwords
<NI> 1897 : Usage unclear
<P>  424 : Paragraph. Usage unclear

<HI>   8 : in preliminary passage
<HS>  13 : In volume title pages
<Picture> 1 : in a volume title page
Page breaks are coded as [Page...].
In general, a page number has form [PagePPPP+ n],
where PPPP is a page number (in pdf) from 0001 to 1489
and n is the number of lines in the following page.
The body of the work appears on pages 0003 to 1425.
Pages labeled 1429-1489 are title and preface pages.

The lines of the digitization generally represent lines of the text, with the exception that footnotes in the digitization are inserted at the point of reference.

Headwords are generally identified by the form
<H>(n) {@“X”@}
where ‘n’ is a sequence number and X is the dhatupatha root.
There are a few variations on this form.

The headwords are ordered according to Sanskrit alphabet ordering. In this text, a ‘headword’ is a root in dhatupatha form.

The only letter-number combinations in krm.txt that represent Anglicized Sanskrit occur in the title/preface sections of the five volumes. Such text in krm.txt appears in the scanned images as European Indological.

The general AS scheme, as described in CDSL.pdf, uses Latin alphabetical letters ‘x (a-z,A-Z), possibly with suffixed numbers; the letter-number combinations are, in the general scheme:

x1 = macron
x2 = dot below
x3 = dot above
x4 = accent aigu
x5 = tilde
x6 = dash below
x7 = umlaut
x10 = circonflex (hat)
x11 = accent grave

The only letter-number combinations that represent AS occur in the title/preface sections. Here they are with their approximate frequency:

A1    11 := Ā  (\u0100)  LATIN CAPITAL LETTER A WITH MACRON
a1   103 := ā  (\u0101)  LATIN SMALL LETTER A WITH MACRON
A4     2 := NO DESCRIPTION
d2     3 := ḍ  (\u1e0d)  LATIN SMALL LETTER D WITH DOT BELOW
i1    13 := ī  (\u012b)  LATIN SMALL LETTER I WITH MACRON
m3     1 := ṁ  (\u1e41)  LATIN SMALL LETTER M WITH DOT ABOVE
n2    21 := ṇ  (\u1e47)  LATIN SMALL LETTER N WITH DOT BELOW
n3     3 := ṅ  (\u1e45)  LATIN SMALL LETTER N WITH DOT ABOVE
n5     1 := ñ  (\u00f1)  LATIN SMALL LETTER N WITH TILDE
R2     5 := Ṛ  (\u1e5a)  LATIN CAPITAL LETTER R WITH DOT BELOW
r2    29 := ṛ  (\u1e5b)  LATIN SMALL LETTER R WITH DOT BELOW
s2     7 := ṣ  (\u1e63)  LATIN SMALL LETTER S WITH DOT BELOW
S4    11 := Ś  (\u015a)  LATIN CAPITAL LETTER S WITH ACUTE
s4     7 := ś  (\u015b)  LATIN SMALL LETTER S WITH ACUTE
t2    11 := ṭ  (\u1e6d)  LATIN SMALL LETTER T WITH DOT BELOW
U1     5 := Ū  (\u016a)  LATIN CAPITAL LETTER U WITH MACRON
u1    12 := ū  (\u016b)  LATIN SMALL LETTER U WITH MACRON

Some other letter number combinations, occurring in body of text are:
final '0' (zero) used in Devanagari abbreviations:
A0 9
a0 1
o0 1

Ending digit '2' occurs several times in body of text; it may be that these
should be preceded by a space:
A2 5
a2 3
m2 4

DTD

krm.dtd:

<?xml version="1.0" encoding="UTF-8"?>
<!-- krm.dtd
 May 28, 2014

-->
<!ELEMENT  krm (H1)*>
<!ELEMENT H1 (h,body,tail) >
<!ENTITY % body_elts "i |s |lb|P|H|NI|F|Poem|b|nd" >
<!-- h element -->
<!ELEMENT h  (key1,key2,hom?)>
<!ELEMENT key1 (#PCDATA) > <!-- in slp1 -->
<!ELEMENT key2 (#PCDATA )><!-- in AS -->
<!ELEMENT hom (#PCDATA)> <!-- homonym -->

<!ELEMENT body (#PCDATA  | %body_elts;)*>
<!ELEMENT i (s )> <!-- in body, Tamil coded with HK (twice)-->
<!ELEMENT b (#PCDATA |s | lb)*> <!-- bold-->
<!ELEMENT s (#PCDATA)> <!-- Sanskrit, in slp1 transliteration  -->
<!ELEMENT nd (#PCDATA)> <!-- non-Devanagari. {#X#}  -->
<!ELEMENT F (#PCDATA  | lb | s | i | Poem|nd)*>  <!-- footnote -->
<!ELEMENT Poem (#PCDATA  | lb | s)*>
<!ELEMENT lb EMPTY>  <!-- line break -->
<!ELEMENT P EMPTY>
<!ELEMENT H EMPTY>
<!ELEMENT NI EMPTY>

<!-- tail -->
<!ELEMENT tail (#PCDATA | L | pc )*>
<!ELEMENT L (#PCDATA) >
<!ELEMENT pc (#PCDATA) >

<!-- attributes  -->