Construction of query_dump file


X.xml query_dump.txt

Method of construction

The query_dump.txt file is used by the Advanced Search Display for the dictionary, as discussed below. Technically, the reconstruction of this file is quite simple:

php init_query.php X.xml query_dump.txt

Location of init_query.php

The program is available for download from the download pages for each dictionary. Specifically, it is in the Xweb.zip and/or Xweb1.zip downloads. Within these downloads, it is in the ‘web/webtc2’ directory.

Relation to Search Engines

Developers will probably recognize a relation between the Advanced Search, which is based upon query_dump.txt, and search engines. The Advanced Search for a dictionary may be thought of as a search engine for that dictionary. In some ways, however, it is much more primitive than a general search engine.

  • Modern search engines, such as those based on Lucene, are document based. It is probably appropriate to think of a dictionary headword entry as such a document.
  • Search engines create inverted indices, represented in rather complex file-based data structures. An inverted index maps each word to the list of documents containing it; for instance, the entry for ‘dog’ lists all the dictionary entries containing the word ‘dog’. The Advanced Search uses no inverted index.
  • A search engine can respond to complex queries; for instance, find all dictionary entries containing the word ‘dog’ AND the word ‘cat’.
  • A properly configured search engine can find words in certain textual categories; for instance, find all entries containing a literary source reference to the Rg Veda.
  • A properly configured search engine does ‘stemming’; for instance, it might stem ‘carries’ to ‘carry’.
  • A properly configured search engine can provide the data needed to highlight the occurrences of a search term within a long document.
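
The inverted index and the ‘dog’ AND ‘cat’ query mentioned in the list above can be sketched in a few lines of Python. This is only a toy illustration of the idea, not the data structure used by Lucene or by any dictionary software; the entry keys (e1, e2, e3), the entry texts and the function names are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical "documents": entry key -> entry text (not real dictionary data).
entries = {
    "e1": "a dog; a hound",
    "e2": "a cat",
    "e3": "the dog chased the cat",
}

def build_inverted_index(docs):
    """Map each lower-cased word to the set of entry keys whose text contains it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(key)
    return index

def search_and(index, *words):
    """Return the keys of the entries that contain every one of the given words."""
    postings = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*postings) if postings else set()

index = build_inverted_index(entries)
dog_and_cat = search_and(index, "dog", "cat")
print(sorted(dog_and_cat))
```

The AND query is just a set intersection of two index entries, which is why such queries are cheap once the index exists.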

By contrast, there is at least one feature of the Advanced Search that may be difficult for a general search engine to provide:

  • Substring searches; for instance, find all words ending in ‘deva’, or containing ‘deva’ as a substring.
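
A word-keyed inverted index does not directly answer a query such as ‘ends in deva’, because the query does not name a complete word. A plain linear scan over the headwords does. The following Python sketch shows such a scan; it is an assumption-laden illustration and not a description of how init_query.php or the Advanced Search is implemented. The headword list and the function name find_headwords are made up for the example.

```python
# Illustrative headwords in SLP1 transliteration; real data would come
# from the dictionary itself.
headwords = ["deva", "mahAdeva", "devatA", "vAsudeva", "rAma"]

def find_headwords(words, pattern, mode="substring"):
    """Linear scan: test every headword against the pattern, no index needed."""
    if mode == "prefix":
        return [w for w in words if w.startswith(pattern)]
    if mode == "suffix":
        return [w for w in words if w.endswith(pattern)]
    if mode == "substring":
        return [w for w in words if pattern in w]
    raise ValueError("unknown mode: " + mode)

ending_in_deva = find_headwords(headwords, "deva", mode="suffix")
containing_deva = find_headwords(headwords, "deva", mode="substring")
print(ending_in_deva)
print(containing_deva)
```

The cost of the scan grows with the number of headwords, which is the price paid for not maintaining an index.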

Also, there are capabilities absent from both the Advanced Search approach and the hypothetical application of a modern search engine. For instance:

  • Taking into account variations in Sanskrit spelling, such as:

    • kAryya v. kArya,
    • gaMgA v. gaNgA (slp1 transliteration)
    • many others - a full list is not known.
  • Sanskrit stemming. There is no generally accepted stemming algorithm for Sanskrit.
  • In bilingual dictionaries, the distinction between Sanskrit words in IAST and words in English, French, German and Latin.
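
One conceivable way to handle the spelling variations listed above is to reduce each word to a normalized key before comparing. The Python sketch below covers only the two variations named in the list (kAryya v. kArya and gaMgA v. gaNgA); the two rules, the consonant set and the function name spelling_key are assumptions made for illustration. Since a full list of variations is not known, this is in no way a complete or accepted normalization scheme.

```python
import re

# SLP1 consonant letters (assumed set for this sketch).
CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"

def spelling_key(slp1_word):
    """Reduce an SLP1 word to a key under which some spelling variants coincide."""
    key = slp1_word
    # Illustrative rule 1: a doubled consonant after 'r' is reduced to a
    # single consonant (kAryya -> kArya).
    key = re.sub(r"r([" + CONSONANTS + r"])\1", r"r\1", key)
    # Illustrative rule 2: anusvara 'M' before a velar stop is rewritten
    # as the velar nasal 'N' (gaMgA -> gaNgA).
    key = re.sub(r"M(?=[kKgG])", "N", key)
    return key

for word in ["kAryya", "kArya", "gaMgA", "gaNgA"]:
    print(word, spelling_key(word))
```

Two spellings would then be treated as the same word when their keys are equal.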

In conclusion, it is our view that there are major research opportunities in the application and customization of search engine software principles to the task of searching the digitized Sanskrit dictionaries.