The query_dump.txt file is used by the Advanced Search display for the dictionary, as discussed below. Regenerating this file is straightforward:
php init_query.php X.xml query_dump.txt
The init_query.php program is available from the download page for each dictionary. Specifically, it is in the Xweb.zip and/or Xweb1.zip downloads, in the ‘web/webtc2’ directory.
The Advanced Search display for the dictionary allows additional kinds of searches within a dictionary:
- find headwords containing the substring ‘deva’.
- search for headwords whose definitions contain words matching certain patterns. For instance, in a Sanskrit-English dictionary, find headwords whose definitions contain the word ‘dog’.
The second type of search is similar in many ways to a full-text search of the dictionary. For dictionary X, the Advanced Search display searches the query_dump.txt file instead of X.sqlite, X.xml or X.txt. This file is similar to X.xml in that it contains a record for each headword. It differs from X.xml in that the markup has been removed. Also, depending on the dictionary, certain spelling simplifications are made; for instance, diacritical marks might be removed from IAST spellings.
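The kind of simplification described above can be sketched in Python. This is only an illustration, not the logic of init_query.php: the element names in the sample entry are hypothetical, and the actual record layout of query_dump.txt is not specified here.

```python
import re
import unicodedata


def strip_markup(text):
    """Replace XML-style tags with spaces, then collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()


def strip_diacritics(text):
    """Drop combining marks, so that an IAST letter such as 'Ṛ' becomes 'R'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def simplify_record(xml_entry):
    """Reduce one marked-up entry to plain, lower-case, ASCII-like text."""
    return strip_diacritics(strip_markup(xml_entry)).lower()


# Hypothetical entry; the real X.xml element names may differ.
sample = "<entry><hw>deva</hw> <def>a <i>Ṛg-veda</i> deity</def></entry>"
print(simplify_record(sample))
```

A search for ‘rg-veda’ would then match this record whether or not the user typed the diacritic.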
Developers will probably recognize that there is a relation between the Advanced Search, based upon query_dump.txt, and search engines. The Advanced Search for a dictionary may be thought of as a search engine for that dictionary. In some ways, however, it is much more primitive than a general-purpose search engine:
- Modern search engines, such as those based on Lucene, are document based. It is probably appropriate to think of a dictionary headword entry as such a document.
- Search engines create inverted indices, represented in rather complex file-based data structures. An inverted index would contain an entry with all the documents containing a particular word; for instance, all the dictionary entries containing the word ‘dog’. The Advanced Search uses no inverted index.
- A search engine can respond to complex queries; for instance, find all dictionary entries containing the word ‘dog’ AND the word ‘cat’.
- A properly configured search engine can find words in certain textual categories; for instance, find all entries containing a literary source reference to the Rg Veda.
- A properly configured search engine does ‘stemming’; for instance, it might stem ‘carries’ to ‘carry’.
- A properly configured search engine can provide the data needed to highlight the occurrences of a search term within a long document.
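To make the inverted-index idea concrete, here is a minimal sketch in Python, treating each headword entry as a document. The entries are invented toy data, and real engines such as Lucene use far more elaborate on-disk structures; this only shows the principle behind an AND query.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of entry ids whose text contains it."""
    index = defaultdict(set)
    for entry_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(entry_id)
    return index


def search_all(index, words):
    """Return the entry ids containing every one of the given words (AND)."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


# Invented toy entries, keyed by placeholder ids rather than real headwords.
docs = {
    "hw1": "a dog or hound",
    "hw2": "a cat",
    "hw3": "enmity like that of dog and cat",
}
index = build_inverted_index(docs)
print(search_all(index, ["dog", "cat"]))  # {'hw3'}
```

Once the index is built, each query word costs one dictionary lookup. An approach without such an index has to scan every record for every query.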
By contrast, there is at least one feature of the Advanced Search that may be difficult for a general search engine:
- It allows substring searches; for instance, find all words ending in ‘deva’, or containing ‘deva’ as a substring.
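A substring search of this kind is easy to express as a linear scan over the headwords, which is one reason a flat file suits it. The following Python sketch assumes the headwords are already available as a list of SLP1 strings; the function name and the `mode` parameter are illustrative, not part of the actual Advanced Search code.

```python
def find_headwords(headwords, fragment, mode="contains"):
    """Scan every headword and keep those matching the fragment.

    mode is 'contains' (fragment anywhere) or 'suffix' (fragment at the end).
    """
    if mode == "contains":
        return [hw for hw in headwords if fragment in hw]
    if mode == "suffix":
        return [hw for hw in headwords if hw.endswith(fragment)]
    raise ValueError("unknown mode: " + mode)


headwords = ["deva", "devatA", "mahAdeva", "rAma"]
print(find_headwords(headwords, "deva"))            # ['deva', 'devatA', 'mahAdeva']
print(find_headwords(headwords, "deva", "suffix"))  # ['deva', 'mahAdeva']
```

A word-based inverted index does not help directly here, because ‘deva’ is only part of the indexed term ‘mahAdeva’.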
Also, there are capabilities absent from both the Advanced Search approach and the hypothetical application of a modern search engine. For instance:
- Taking into account variations in Sanskrit spelling, such as:
  - kAryya v. kArya,
  - gaMgA v. gaNgA (SLP1 transliteration),
  - many others; a full list is not known.
- Sanskrit stemming. There is no generally accepted stemming algorithm for Sanskrit.
- In bilingual dictionaries, distinguishing Sanskrit words in IAST from words in English, French, German and Latin.
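As a rough illustration of how the two spelling variations cited above might be handled, the Python sketch below maps SLP1 spellings to a common form before comparison. The two rules are assumptions chosen only to cover those two examples. They are not a complete or authoritative treatment of Sanskrit spelling variation.

```python
import re

# SLP1 consonant letters, used by the illustrative doubling rule below.
SLP1_CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"


def normalize_slp1(word):
    """Map some spelling variants to one form (two hypothetical rules)."""
    # Rule 1: a consonant doubled after 'r' is reduced (kAryya -> kArya).
    word = re.sub(r"r([" + SLP1_CONSONANTS + r"])\1", r"r\1", word)
    # Rule 2: anusvara 'M' before a velar stop becomes the velar nasal 'N'
    # (gaMgA -> gaNgA).
    word = re.sub(r"M(?=[kKgG])", "N", word)
    return word


print(normalize_slp1("kAryya"))  # kArya
print(normalize_slp1("gaMgA"))   # gaNgA
```

Applying the same normalization to both the search term and the stored headwords would let either spelling find the other.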
In conclusion, it is our view that there are major research opportunities in the application and customization of search engine software principles to the task of searching the digitized Sanskrit dictionaries.