This section of the documentation describes the flow of processing for the ‘general’ digitization. For convenience in the explanations, we use the capital letter ‘X’ to represent the prefix identifying the dictionary. You can think of ‘X’ as ‘skd’ for ‘Sabdakalpadruma’, or as ‘cae’ for ‘Cappeller Sanskrit-English Dictionary’, etc. The names of various files will incorporate that X; for instance, X_orig.txt corresponds, when ‘X’ is ‘skd’, to the particular file ‘skd_orig.txt’.
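As a small illustration of the naming convention, the sketch below builds the two file names mentioned in this section from a dictionary prefix. The function name dictionary_files and the dictionary labels are hypothetical and chosen only for this example; they are not part of the project code.

```python
def dictionary_files(prefix):
    """Return the file names this section mentions for a dictionary prefix.

    The labels 'original' and 'digitization' are illustrative only.
    """
    return {
        "original": f"{prefix}_orig.txt",   # X_orig.txt
        "digitization": f"{prefix}.txt",    # X.txt
    }


print(dictionary_files("skd")["original"])
```

Called with 'cae' instead of 'skd', the same function yields the corresponding Cappeller file names.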
For each dictionary, the various files are available to developers in the downloads section of the Cologne Sanskrit-Lexicon website. The data and program code are released under the Creative Commons Attribution Non-Commercial Share Alike license.
If you happen to find some required file unavailable for a particular dictionary, let us know and we will aim to make the file available for download. It is our intent to make available for download everything we have done that a developer might need.
The general flow of processing is shown below. Follow the links to find further details.
Digitization adjustment : Various alterations to the original digitization are made.
Headword identification : Headwords in the digitization X.txt are identified, normalized, and transcoded.
Construction of xml file : An xml representation is constructed from the digitization and the headwords.
Construction of sqlite : The xml representation is converted into a very simple SQL database form, which is useful in web displays.
Construction of query_dump file : The xml representation is used to generate a file specialized for the advanced search display.
For mw, the Monier-Williams Dictionary (of 1899), only the last two steps of the framework are applicable. See MW Monier-Williams Sanskrit-English Dictionary (Developer notes).
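To give a feel for the sqlite construction step, here is a minimal sketch using Python's standard sqlite3 module. It is not the project's actual conversion program. It assumes, purely for illustration, that the xml file holds one entry per line, with the headword in a key1 element and a record number in an L element, and that the table has three columns (key, lnum, data). The function name load_entries, the table name entries, and the sample record are all hypothetical.

```python
import re
import sqlite3

# Assumed record layout (illustration only): one entry per line,
# headword inside <key1>...</key1>, record number inside <L>...</L>.
KEY_RE = re.compile(r"<key1>(.*?)</key1>")
LNUM_RE = re.compile(r"<L>(.*?)</L>")


def load_entries(xml_lines, conn, table="entries"):
    """Store each entry line in a simple (key, lnum, data) table."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (key TEXT, lnum REAL, data TEXT)"
    )
    rows = []
    for line in xml_lines:
        key = KEY_RE.search(line)
        lnum = LNUM_RE.search(line)
        if key is None or lnum is None:
            continue  # not an entry line, e.g. the root element
        rows.append((key.group(1), float(lnum.group(1)), line.strip()))
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    # An index on the headword keeps web-display lookups fast.
    conn.execute(f"CREATE INDEX IF NOT EXISTS {table}_key ON {table} (key)")
    conn.commit()
    return len(rows)


# Made-up sample data, not taken from any real dictionary file.
sample = [
    "<skd>",
    "<H1><h><key1>agni</key1></h><body>fire</body><tail><L>12</L></tail></H1>",
    "</skd>",
]
conn = sqlite3.connect(":memory:")
load_entries(sample, conn)
row = conn.execute("SELECT data FROM entries WHERE key = ?", ("agni",)).fetchone()
print(row[0])
```

Keeping the whole xml record in a single data column means a web display can fetch an entry by headword with one indexed lookup and then render the stored xml directly.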