Tuesday, November 24, 2009
Vocabulary Lists!
I mentioned that I've been working on vocabulary lists for those people who don't want/need/like flashcards. So now you can print out your flaschard deck as a study list! Give it a try!
- Go to flashcards at the top of this page
- Click "Print a flashcard deck".
- Choose a deck.
- Click the new button that says "Print Deck as Vocabulary List".
- Then simply go to File->Print (or File->Print Preview)!
Labels: features, flashcards, word lists
Regular Progress
I like to keep the news relatively fresh, even if nothing major is happening. But even though this is a somewhat slow time for Numen development, there are still a constant stream of small updates. I'd like to keep everybody apprised of the situation.
One of the benefits of having students who actively use this dictionary is their feedback. One of the things they noticed was that sometimes an error message pops up saying, "AJAX has timed out", and it would happen relatively often -- especially on the flashcard practice tool. So I dug into the code and found the issue: I designed the site so that -- if any request took more than a few hundred milliseconds -- it would give an error. Such timeouts always involve a balance between briefness and lengthiness of waiting, and I quickly realized that I had not struck that balance. So I bumped up the timeout duration to something reasonably middle-of-the-road and. Voila! Problem solved! Gratias vobis ago, discipuli.
More: My big project with Vergil and Livy is still paying off. I continue to correct dozens of tiny mistakes and errors in the data every week, and I was able to run some statistics. Excluding proper names and place names, the parsing engine can analyze and pin down around 98.5% of all the words in these two Augustan authors. So accuracy is definitely improving daily! I can't yet account for all false positives, but they seem to be less than a fraction of a percent (anecdotally).
Speaking of accuracy, there is still room for improvement in three key areas:
In other news -- I guess I have more than I had first assumed -- I've almost got a word list feature finished. This is for people who prefer to work with formatted word lists as opposed to flashcard decks (which I understand are sometimes referred to as index cards). The only major problem I have with this process is that the Lewis Elementary dictionary does not provide a "core" definition for most words, so the word list would have extremely lengthy definitions. I think one of my best options is to import the data from Whittaker's Words, which have more simple, more core-like definitions. But this could be problematic as a 1:1 mapping between Lewis' forms and Whittaker's forms would be difficult to achieve. Alas, I shall continue to think on this one.
Okay, so that's enough for now! Keep using it, and please keep reporting problems and errors! It may take a few days or weeks, but I eventually do fix all the errors!
One of the benefits of having students who actively use this dictionary is their feedback. One of the things they noticed was that sometimes an error message pops up saying, "AJAX has timed out", and it would happen relatively often -- especially on the flashcard practice tool. So I dug into the code and found the issue: I designed the site so that -- if any request took more than a few hundred milliseconds -- it would give an error. Such timeouts always involve a balance between briefness and lengthiness of waiting, and I quickly realized that I had not struck that balance. So I bumped up the timeout duration to something reasonably middle-of-the-road and. Voila! Problem solved! Gratias vobis ago, discipuli.
More: My big project with Vergil and Livy is still paying off. I continue to correct dozens of tiny mistakes and errors in the data every week, and I was able to run some statistics. Excluding proper names and place names, the parsing engine can analyze and pin down around 98.5% of all the words in these two Augustan authors. So accuracy is definitely improving daily! I can't yet account for all false positives, but they seem to be less than a fraction of a percent (anecdotally).
Speaking of accuracy, there is still room for improvement in three key areas:
- syncopated forms (laudaverunt => laudarunt), which Livy loves by the way!
- irregular forms (bobus, filiabus)
- proper and place names (which do not, for the most part, exist as a regular part of the Lewis Elementary dictionary).
In other news -- I guess I have more than I had first assumed -- I've almost got a word list feature finished. This is for people who prefer to work with formatted word lists as opposed to flashcard decks (which I understand are sometimes referred to as index cards). The only major problem I have with this process is that the Lewis Elementary dictionary does not provide a "core" definition for most words, so the word list would have extremely lengthy definitions. I think one of my best options is to import the data from Whittaker's Words, which have more simple, more core-like definitions. But this could be problematic as a 1:1 mapping between Lewis' forms and Whittaker's forms would be difficult to achieve. Alas, I shall continue to think on this one.
Okay, so that's enough for now! Keep using it, and please keep reporting problems and errors! It may take a few days or weeks, but I eventually do fix all the errors!
Labels: bugs, core definitions, features, feedback, flashcards, Lewis, Whittaker, word lists
Wednesday, October 14, 2009
J's and U's Updated / Speed Increases
I mentioned a few weeks ago that I planned on making I's/J's and U's/V's look the same on the back-end, while preserving their traditional orthographies on the front-end. I've just completed this task!
My main motivation for making this update is because certain passages stored in The Latin Library reflect the older conventions of using J's for consonantal I's or U's for both consonantal and vocalic V's. Numen's parsing engine was having trouble recognizing forms like jecit (iecit) and uuius (vivus). So now as a result -- after a bit of work -- the engine is updated and now recognizes more possibilities than ever. Incidentally, internally J's are stored as I's and U's are stored as V's.
Another project I completed at the same time is an order-of-magnitude speed improvement for parsing. I was trying to figure out ways to make the engine faster and I discovered a shortcut that boosts speed tremendously. When parsing a word, the engine used to spend between 250ms and 500ms parsing each word! That was always disappointing to me, but I had gotten around the problem by caching the results. Now, however, word parsing takes about 25ms!
Why bother improving the speed? Because soon I will be implementing word lists and frequency lists! A word list, of course, is just a "mini-lexicon" that defines only the words in your chosen passage, and a frequency list is a list of words in order of how often they appear in a passage. The word list will be helpful to quickly work on vocabulary for a passage, and a frequency list will help Latin students study more effectively by giving them the most frequent words first. I'm very excited about this feature, but I don't anticipate it will be done before January 10th (giving me the winter holiday to work on it).
That's all for now!
My main motivation for making this update is because certain passages stored in The Latin Library reflect the older conventions of using J's for consonantal I's or U's for both consonantal and vocalic V's. Numen's parsing engine was having trouble recognizing forms like jecit (iecit) and uuius (vivus). So now as a result -- after a bit of work -- the engine is updated and now recognizes more possibilities than ever. Incidentally, internally J's are stored as I's and U's are stored as V's.
Another project I completed at the same time is an order-of-magnitude speed improvement for parsing. I was trying to figure out ways to make the engine faster and I discovered a shortcut that boosts speed tremendously. When parsing a word, the engine used to spend between 250ms and 500ms parsing each word! That was always disappointing to me, but I had gotten around the problem by caching the results. Now, however, word parsing takes about 25ms!
Why bother improving the speed? Because soon I will be implementing word lists and frequency lists! A word list, of course, is just a "mini-lexicon" that defines only the words in your chosen passage, and a frequency list is a list of words in order of how often they appear in a passage. The word list will be helpful to quickly work on vocabulary for a passage, and a frequency list will help Latin students study more effectively by giving them the most frequent words first. I'm very excited about this feature, but I don't anticipate it will be done before January 10th (giving me the winter holiday to work on it).
That's all for now!
Labels: accuracy, database, development, features, frequency lists, google cache, orthography, parsing engine, performance, slowness, vergil, word lists
