Archive for August, 2005

switching databases

Now that I’ve added preliminary Adso support, I needed an easy way to let the user switch databases, so I’ve added a database preference page that does just that. You do have to restart the app after changing it, but hopefully it’s not something you’ll do often enough for that to become annoying. Currently, I’m thinking of packaging the Adso dictionary as an optional download that a user can install if they desire. Switching over to it should then be a piece of cake.

I’m also playing around with the hsqldb web server, which could allow me to serve my chinese.script file over the web. This is especially useful if you have multiple computers and don’t want to synchronize your category lists on each one. Performance is a little slower (of course), but it doesn’t seem too bad so far. I’ll play around with it some more.
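To make the local-versus-served idea concrete, here is a minimal sketch of how a database preference could be turned into an hsqldb JDBC URL. The class name, the enum, and the example locations are all hypothetical and not taken from the zdt source; only the `jdbc:hsqldb:file:` and `jdbc:hsqldb:http://` URL prefixes come from hsqldb itself.

```java
// Hypothetical helper: the names here are illustrative, not the zdt's actual code.
final class DatabaseUrls {

    /** Where the dictionary database lives (assumed preference values). */
    enum Mode { LOCAL_FILE, WEB_SERVER }

    /**
     * Builds an hsqldb JDBC URL.
     * For LOCAL_FILE, location is the database base path (e.g. "data/chinese",
     * which hsqldb stores as data/chinese.script and related files).
     * For WEB_SERVER, location is "host[:port]/alias" of an hsqldb web server.
     */
    static String jdbcUrl(Mode mode, String location) {
        switch (mode) {
            case LOCAL_FILE:
                return "jdbc:hsqldb:file:" + location;
            case WEB_SERVER:
                return "jdbc:hsqldb:http://" + location;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}
```

Because the URL would only be read once at startup in a design like this, it also fits the restart-after-switching behavior described above.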

August 28th, 2005

Adso

The big thing I’ve been working on is getting Adso support into the zdt. Adso is the engine behind the cool newsinchinese website. I’ve been working off one of their flat file MySQL databases. It has over 135,000 entries, compared to maybe 27,000 in CEDICT. That’s not a totally fair comparison, though, since their dictionary is laid out a little differently. For example, a CEDICT entry can have multiple definitions, while Adso has only one definition per entry. Adso also captures data like parts of speech, which results in some duplicate entries, since some words can act differently in different contexts. The only limitation I see with Adso so far is that it does not have traditional characters.

To get Adso to work I have to convert their MySQL flat file into my schema. Luckily my schema is basically just a simple subset of theirs: traditional char, simplified char, pinyin, definition. I mentioned the duplicates above, and they’re what caused me the most trouble: first getting rid of them correctly, and second doing it in a reasonable time. On my first stab, the conversion algorithm took about 45 minutes to go through about 65,000 entries before I just stopped it. My second try got me almost there, about 10-15 minutes and maybe 90,000 entries before I actually ran out of memory. Finally, on my third try I got it down to 30 seconds and finished the whole file successfully. You definitely need to choose the right data structures and algorithms if you’re parsing such a huge file. :) The resulting chinese.script file is a little above 14 MB, compared to 2.7 MB for the CEDICT version. I think the results are pretty good so far. Still got to do some more testing though.
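As an illustration of why the data structure matters here, the sketch below removes duplicates in a single pass using a hash set. This is not the actual zdt conversion code: the `Entry` class, the choice of key (simplified characters, pinyin, and definition), and all names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: not the zdt's real converter.
final class AdsoDedup {

    /** Minimal stand-in for one converted dictionary row. */
    static final class Entry {
        final String simplified;
        final String pinyin;
        final String definition;

        Entry(String simplified, String pinyin, String definition) {
            this.simplified = simplified;
            this.pinyin = pinyin;
            this.definition = definition;
        }

        /** Assumed duplicate rule: all three fields match exactly. */
        String key() {
            return simplified + '\u0000' + pinyin + '\u0000' + definition;
        }
    }

    /** Keeps the first occurrence of each key, preserving input order. */
    static List<Entry> dedupe(Iterable<Entry> entries) {
        Set<String> seen = new HashSet<>();
        List<Entry> unique = new ArrayList<>();
        for (Entry e : entries) {
            // add() returns false when the key was already present.
            if (seen.add(e.key())) {
                unique.add(e);
            }
        }
        return unique;
    }
}
```

Comparing every entry against every other one is quadratic, which for 135,000 entries means billions of comparisons, while a hash set lookup keeps the whole pass roughly linear. Feeding `dedupe` from a line-by-line reader rather than loading the full file first would also help with memory, though whether that matches what fixed the original converter is a guess.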

August 24th, 2005

zdt and Firefox Part II

I got a couple of responses to my Firefox question on the Eclipse newsgroups. Basically, Firefox needs to be explicitly compiled with embedding support to work correctly with the SWT Browser widget. By default, the releases that you get from Mozilla.org don’t have this embedding support built in. Some Linux distributions do compile Firefox with this support, but not all. Billy Biggs suggests looking to see if your /usr/lib/mozilla-firefox folder contains the file ‘libgtkembedmoz.so’. However, it’s not a foolproof way to be sure. Anyway, this highlights the need for the zdt to gracefully handle systems that don’t have the right browser installed (like not crashing). :)
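The file check Billy Biggs suggests could be sketched roughly as follows. The class and method names are hypothetical, and as noted above a positive result is only a hint, not proof that embedding will work.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: a heuristic only, not the zdt's actual detection code.
final class EmbeddingCheck {

    /** Library whose presence hints that Firefox was built with embedding support. */
    static final String EMBED_LIB = "libgtkembedmoz.so";

    /** Returns true if the given Firefox install directory contains the embedding library. */
    static boolean looksEmbeddable(Path firefoxDir) {
        return Files.isRegularFile(firefoxDir.resolve(EMBED_LIB));
    }

    /** Convenience check against the directory mentioned in the post. */
    static boolean defaultLooksEmbeddable() {
        return looksEmbeddable(Paths.get("/usr/lib/mozilla-firefox"));
    }
}
```

Since the check is not foolproof, a safer complement (an assumption about how the zdt might do it) would be to wrap creation of the SWT Browser widget in a try/catch for `SWTError` and fall back to a plain text view when it fails.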

2 comments August 23rd, 2005
