To create a rich, extensible VN-EN dictionary (with other languages to follow), I surveyed the current open source landscape.
Dictionary formats (rich text, XML metalanguage):
We intend our dictionaries to be extensible by the community, wiki style. For that to work, the dictionaries must be well structured so that software can recognize units such as NOUN, VERB, DEFINITIONS, EXAMPLES, etc.
XDXF (XML Dictionary Exchange Format) is one of the best-structured formats. It lets the computer understand what each element is, so each one can be rendered with its own color, size, etc. I hate seeing dictionaries in boring plain text all the time.
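As a rough illustration of that first advantage, the sketch below walks a small XML article and wraps each recognized element in a styled HTML span. The tag names (`ar`, `k`, `gr`, `def`, `ex`) are only loosely modelled on XDXF, and the `STYLES` table and `render` function are hypothetical names made up for this example; the XDXF specification is the authority on the real schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical style sheet: element name -> inline CSS (an assumption, not part of XDXF).
STYLES = {
    "k": "font-weight:bold; font-size:1.3em",   # headword
    "gr": "color:green; font-style:italic",     # grammar info (noun, verb, ...)
    "def": "color:black",                       # definition
    "ex": "color:gray",                         # usage example
}

def render(elem):
    """Recursively turn an element into HTML, styling the tags we know about."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child) + escape(child.tail or "")
    style = STYLES.get(elem.tag)
    if style is None:
        return inner  # unknown tags contribute their text only
    return f'<span style="{style}">{inner}</span>'

# Illustrative article, not taken from a real XDXF file.
ARTICLE = (
    "<ar><k>play</k><gr>noun</gr>"
    "<def>the act using a sword vigorously and skillfully</def></ar>"
)

html = render(ET.fromstring(ARTICLE))
print(html)
```

Because the styling lives in one table, changing how every noun label or example looks is a one-line edit rather than a pass over the dictionary text.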
The second advantage is that the community can extend the dictionaries. Consider this example:
play
A. noun
   1. play, swordplay
      the act using a sword (or other weapon) vigorously and skillfully
   2. play, child’s play
      play by children that is guided more by imagination than by fixed rules; “Freud believed in the utility of play to a small child”
Let’s say somebody wants to add the meaning of “play” when used as a verb. Instead of messing up the content by adding new lines, changing colours and so on, the dictionary software should provide the necessary tools (buttons) that let the user add new XML structures to the content.
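Behind such a button, the edit could be a small tree operation rather than a text edit. The following is a minimal sketch, assuming an entry layout in which each part of speech is a `def` group holding a `gr` label and nested `def` senses. That layout, the sample verb sense, and the `add_sense` helper are assumptions made for illustration and are not taken from the XDXF specification.

```python
import xml.etree.ElementTree as ET

# Illustrative entry in an XDXF-like layout (assumed, not the official schema).
ENTRY = """<ar><k>play</k>
  <def><gr>noun</gr>
    <def>play, swordplay: the act using a sword (or other weapon)</def>
    <def>play, child's play: play by children guided by imagination</def>
  </def>
</ar>"""

def add_sense(article, part_of_speech, definition, example=None):
    """Append a sense under the given part of speech, creating the group if needed."""
    for group in article.findall("def"):
        if group.findtext("gr") == part_of_speech:
            break  # reuse the existing group for this part of speech
    else:
        group = ET.SubElement(article, "def")
        ET.SubElement(group, "gr").text = part_of_speech
    sense = ET.SubElement(group, "def")
    sense.text = definition
    if example:
        ET.SubElement(sense, "ex").text = example
    return sense

article = ET.fromstring(ENTRY)
# Made-up verb sense, used only to exercise the helper.
add_sense(article, "verb", "participate in games or sport",
          example="We played hockey all afternoon")
print(ET.tostring(article, encoding="unicode"))
```

The existing noun senses are untouched; the contributor only supplies the part of speech, the definition and optionally an example, and the software decides where they go in the tree.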
StarDict format: The structure is not XML; everything is byte-coded. It can contain images, audio, XDXF, HTML and wiki links. I don’t really like it because it is tied to StarDict, and almost all of its dictionaries are plain text that takes no advantage of these features.
Here is a list of other formats.
Dictionary engines (compression, memory usage):
DICT: It is widely used because StarDict files are compressed with this engine. Info, index and content are stored in separate files. The StarDict project has optimized the engine by adding some extra files (cache, collation, etc.).
Sdictionary: Info, index and content are compressed into a single file. There are tools to create files from HTML content.
Both engines keep memory usage to a minimum: only the index is loaded into memory, and only the requested content is read from the compressed file.
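The idea can be sketched in a few lines. This is not the actual on-disk layout of DICT or Sdictionary; it is a simplified model in which every entry is compressed on its own with `zlib` and the index maps each headword to an (offset, length) pair. The `build` and `lookup` names are hypothetical.

```python
import io
import zlib

def build(entries):
    """Return (index, blob): index maps headword -> (offset, length) into blob."""
    index, blob = {}, io.BytesIO()
    for word, text in entries.items():
        chunk = zlib.compress(text.encode("utf-8"))
        index[word] = (blob.tell(), len(chunk))
        blob.write(chunk)
    return index, blob.getvalue()

def lookup(index, content, word):
    """Seek to one entry and decompress only its bytes; the rest is never read."""
    if word not in index:
        return None
    offset, length = index[word]
    content.seek(offset)
    return zlib.decompress(content.read(length)).decode("utf-8")

index, blob = build({
    "play": "the act using a sword (or other weapon) vigorously and skillfully",
    "sword": "a cutting or thrusting weapon with a long blade",
})
content = io.BytesIO(blob)  # stands in for an open content file on disk
print(lookup(index, content, "play"))
```

Only the small `index` dictionary has to stay in memory; with a real file in place of the `BytesIO` object, each lookup costs one seek and one short read.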
Open source dictionary applications (containers):
| Name | Language | Active | Features |
|---|---|---|---|
| STARDICT | C / GTK | Reactivated in 2007 (2 developers) | Many dictionaries, plain-text format. Supports the DICT, Sdictionary and Babylon formats. |
| JALINGO | Java | Inactive since 2006 (we’ll reactivate it) | Very nice interface. Supports multisearch. Formats: Sdictionary, MOVA, etc. (no DICT yet). Supports rich text. |
| SDIQT | Python / Qt | Inactive since 2006 | Interface is not nice. Only supports the Sdictionary format. |
| KTRANSLATOR | C++ / Qt | Inactive since 2006 | StarDict, FreeDict and DICT formats. KDE component. |
| QSTARDICT | C / Qt4 | Active (1 developer) | A clone of StarDict built on Qt4. |
Our decision: Unless somebody changes their mind, I think we’ll go with the XDXF/Sdictionary/Jalingo combination for the VN-EN project.