Saturday, December 26, 2009
Perfect Syncopation
18204 total word(s)
17369 word(s) found
20 word(s) not found
815 word(s) ignored
0.11% of words not found
4.48% of words ignored
3264 unique word(s)
But what does it mean???
Well, I've just run the word analysis tool on Livy's Ab Urbe Condita, Book 2. The important thing to note is that out of more than eighteen thousand words, only 20 could not be parsed and found in the dictionary. That's pretty much amazing.
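For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the two percentages from the counts in the report. The variable names and the `percent` helper are just for illustration; this is not the analysis tool's actual code.

```python
# Counts copied from the word analysis report for Livy, Book 2.
total = 18204
found = 17369
not_found = 20
ignored = 815

# The three categories should account for every word in the text.
assert found + not_found + ignored == total


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


pct_not_found = percent(not_found, total)
pct_ignored = percent(ignored, total)

print(f"{pct_not_found}% of words not found")
print(f"{pct_ignored}% of words ignored")
```

Both figures agree with the report: 0.11% not found and 4.48% ignored.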
How did this happen? Well, two things had to happen. First, I now ignore capitalized words that aren't located in the dictionary. Essentially, I'm ignoring proper names and place names. Second, I programmed Numen to parse syncopated perfect verbs: laudasse (laudavisse), norat (noverat), et cetera.
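To give a rough idea of what parsing a syncopated perfect can involve, here is a minimal Python sketch. It is only one possible approach, not Numen's actual code: it assumes the dropped syllable is "vi" before s or "ve" before r, tries putting it back after a vowel, and checks each candidate against a tiny lookup set. `KNOWN_FORMS`, `expand_syncopated`, and `resolve` are all hypothetical names, and forms that lose only the v (audierunt for audiverunt) are not handled.

```python
# Toy stand-in for the real form lookup (an assumption for this sketch).
KNOWN_FORMS = {"laudavisse", "noverat", "laudaverunt"}


def expand_syncopated(word):
    """Return candidate full forms by restoring a dropped 'vi' or 've'."""
    candidates = []
    for i in range(1, len(word)):
        if word[i - 1] in "aeio":
            if word[i] == "r":
                # norat -> noverat, laudarunt -> laudaverunt
                candidates.append(word[:i] + "ve" + word[i:])
            elif word[i] == "s":
                # laudasse -> laudavisse
                candidates.append(word[:i] + "vi" + word[i:])
    return candidates


def resolve(word):
    """Return the known form for a word, trying syncopation last."""
    word = word.lower()
    if word in KNOWN_FORMS:
        return word
    for candidate in expand_syncopated(word):
        if candidate in KNOWN_FORMS:
            return candidate
    return None


for form in ("laudasse", "norat", "laudarunt"):
    print(form, "->", resolve(form))
```

Because every candidate is checked against the lookup, a wrong guess is simply discarded rather than reported as a match.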
I still have a bit of testing to do to make sure I didn't break anything, but this was one of the few major hurdles that I needed to overcome to get a nearly perfect parsing engine!
Labels: accuracy, features, paradigms, verbs
Monday, December 21, 2009
Visual Refresh
News is slow -- for most schools around the U.S., winter break is upon us. With students taking a break, the site is also slowing down. Hopefully when you all return from break there will be some interesting new features and bug fixes!
Happy break.
Labels: interface
Saturday, November 28, 2009
OpenID Repaired
Tuesday, November 24, 2009
Vocabulary Lists!
- Go to flashcards at the top of this page.
- Click "Print a flashcard deck".
- Choose a deck.
- Click the new button that says "Print Deck as Vocabulary List".
- Then simply go to File->Print (or File->Print Preview)!
Labels: features, flashcards, word lists
Regular Progress
One of the benefits of having students who actively use this dictionary is their feedback. One of the things they noticed was that an error message saying "AJAX has timed out" would pop up relatively often -- especially on the flashcard practice tool. So I dug into the code and found the issue: I had designed the site so that any request taking more than a few hundred milliseconds would give an error. Such timeouts always involve a balance between giving up too quickly and making the user wait too long, and I quickly realized that I had not struck that balance. So I bumped up the timeout duration to something reasonably middle-of-the-road and voila! Problem solved! Gratias vobis ago, discipuli.
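The trade-off is easy to see in a toy example. The sketch below is Python rather than the site's own AJAX code, and the delays are made-up numbers: the same slow "request" fails under a strict timeout and succeeds under a more generous one. `slow_request` and `fetch_with_timeout` are hypothetical helpers.

```python
import asyncio


async def slow_request(delay_s):
    """Pretend to be a server that takes delay_s seconds to answer."""
    await asyncio.sleep(delay_s)
    return "ok"


async def fetch_with_timeout(delay_s, timeout_s):
    """Wait for the request, but give up after timeout_s seconds."""
    try:
        return await asyncio.wait_for(slow_request(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "timed out"


# The same 0.2 s request under a strict and a relaxed timeout.
too_strict = asyncio.run(fetch_with_timeout(0.2, 0.05))
relaxed = asyncio.run(fetch_with_timeout(0.2, 2.0))

print("strict timeout: ", too_strict)
print("relaxed timeout:", relaxed)
```

A timeout that is too long has its own cost, of course: a request that really has failed leaves the user waiting that much longer before seeing an error.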
More: My big project with Vergil and Livy is still paying off. I continue to correct dozens of tiny mistakes and errors in the data every week, and I was able to run some statistics. Excluding proper names and place names, the parsing engine can analyze and pin down around 98.5% of all the words in these two Augustan authors. So accuracy is definitely improving daily! I can't yet account for all false positives, but they seem to amount to only a fraction of a percent (anecdotally).
Speaking of accuracy, there is still room for improvement in three key areas:
- syncopated forms (laudaverunt => laudarunt), which Livy loves by the way!
- irregular forms (bobus, filiabus)
- proper and place names (which do not, for the most part, exist as a regular part of the Lewis Elementary dictionary).
In other news -- I guess I have more than I had first assumed -- I've almost got a word list feature finished. This is for people who prefer to work with formatted word lists as opposed to flashcard decks (which I understand are sometimes referred to as index cards). The only major problem I have with this process is that the Lewis Elementary dictionary does not provide a "core" definition for most words, so the word list would have extremely lengthy definitions. I think one of my best options is to import the data from Whittaker's Words, which has simpler, more core-like definitions. But this could be problematic, as a 1:1 mapping between Lewis' forms and Whittaker's forms would be difficult to achieve. Alas, I shall continue to think on this one.
Okay, so that's enough for now! Keep using it, and please keep reporting problems and errors! It may take a few days or weeks, but I eventually do fix all the errors!
Labels: bugs, core definitions, features, feedback, flashcards, Lewis, Whittaker, word lists
Thursday, November 5, 2009
IE8 Flashcard Bug Fixed
I fixed a small bug that affected flashcard decks in Internet Explorer 8 (and presumably earlier versions). If you couldn't create a flashcard deck in that browser, it should be fixed now!
As a side note, I've been working on a big project with Livy and Vergil. I've essentially been correcting all the mistakes and unfound words in those authors. This is especially useful in Livy because we have a corpus of about 1 million words! So the accuracy of this dictionary is creeping up to the highest possible levels! With the exception of proper names and place names, I'll ballpark its accuracy with common classical authors at about 95%.
Also, thanks to the people who have been reporting errors and bugs! It's really helpful to have your feedback!
Labels: accuracy, bugs, internet explorer, livy, vergil
Wednesday, October 14, 2009
J's and U's Updated / Speed Increases
My main motivation for making this update is that certain passages stored in The Latin Library reflect the older conventions of using J's for consonantal I's, or U's for both consonantal and vocalic V's. Numen's parsing engine was having trouble recognizing forms like jecit (iecit) and uiuus (vivus). So now -- after a bit of work -- the engine is updated and recognizes more possibilities than ever. Incidentally, internally J's are stored as I's and U's are stored as V's.
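The internal convention can be sketched in a few lines of Python. This is only an illustration of the idea, not the engine's code: `normalize` and the toy `INDEX` are hypothetical, and the point is that every spelling of a word collapses to the same lookup key.

```python
def normalize(word):
    """Map a spelling to an internal key: j becomes i, u becomes v."""
    return word.lower().replace("j", "i").replace("u", "v")


# Toy index keyed by normalized spelling (the entries are illustrative).
INDEX = {
    normalize("iecit"): "iacio",
    normalize("vivus"): "vivus",
}


def look_up(word):
    """Find the headword for any spelling variant, or None."""
    return INDEX.get(normalize(word))


for spelling in ("jecit", "iecit", "uiuus", "vivus"):
    print(spelling, "->", normalize(spelling), "->", look_up(spelling))
```

Note that the internal key for vivus comes out as "vivvs". It looks odd, but it never needs to be shown to anyone; it only has to be the same for every variant spelling.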
Another project I completed at the same time is an order-of-magnitude speed improvement for parsing. I was trying to figure out ways to make the engine faster and I discovered a shortcut that boosts speed tremendously. The engine used to spend between 250ms and 500ms parsing each word! That was always disappointing to me, but I had gotten around the problem by caching the results. Now, however, word parsing takes about 25ms!
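For the curious, caching parse results is the kind of workaround that looks roughly like this in Python. This is a sketch under assumptions, not how the site actually stores its cache: `parse` is a placeholder that fakes a slow analysis, and `functools.lru_cache` stands in for whatever storage the real engine uses.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def parse(word):
    """Placeholder parser: pretend the analysis is slow, then return it."""
    time.sleep(0.05)  # stand-in for the expensive parsing work
    return f"analysis of {word}"


start = time.perf_counter()
first = parse("norat")   # computed the slow way and stored
second = parse("norat")  # served straight from the cache
elapsed = time.perf_counter() - start

info = parse.cache_info()
print(f"hits={info.hits} misses={info.misses} elapsed={elapsed:.3f}s")
```

A cache only helps with words that have been seen before, which is why making the parse itself faster still matters for new passages.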
Why bother improving the speed? Because soon I will be implementing word lists and frequency lists! A word list, of course, is just a "mini-lexicon" that defines only the words in your chosen passage, and a frequency list is a list of words in order of how often they appear in a passage. The word list will be helpful to quickly work on vocabulary for a passage, and a frequency list will help Latin students study more effectively by giving them the most frequent words first. I'm very excited about this feature, but I don't anticipate it will be done before January 10th (giving me the winter holiday to work on it).
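Here is a minimal Python sketch of both ideas, using a made-up passage and a toy glossary. It is a simplification under stated assumptions: it counts surface forms exactly as they appear, whereas a real tool would presumably run each word through the parser and group forms under their dictionary entries. `GLOSSES`, `frequency_list`, and `word_list` are hypothetical names.

```python
import re
from collections import Counter

# Made-up passage and toy glossary (both are assumptions for this sketch).
passage = "et pater et mater et filius in urbe et in agro"
GLOSSES = {
    "et": "and",
    "in": "in, on",
    "pater": "father",
    "mater": "mother",
    "filius": "son",
    "urbe": "city (abl.)",
    "agro": "field (abl.)",
}

tokens = re.findall(r"[a-z]+", passage.lower())

# Frequency list: words ordered by how often they appear in the passage.
frequency_list = Counter(tokens).most_common()

# Word list: a mini-lexicon covering only the words in the passage.
word_list = {word: GLOSSES.get(word, "?") for word in sorted(set(tokens))}

print("Frequency list:")
for word, count in frequency_list:
    print(f"  {count:2d}  {word}")

print("Word list:")
for word, gloss in word_list.items():
    print(f"  {word}: {gloss}")
```

Both lists need every word in the passage to be parsed, so the cost grows with the length of the text, and that is where the faster parsing pays off.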
That's all for now!
Labels: accuracy, database, development, features, frequency lists, google cache, orthography, parsing engine, performance, slowness, vergil, word lists
