Saturday, December 26, 2009
Perfect Syncopation
18204 total word(s)
17369 word(s) found
20 word(s) not found
815 word(s) ignored
0.11% of words not found
4.48% of words ignored
3264 unique word(s)
But what does it mean???
Well, I've just run the word analysis tool on Book 2 of Livy's Ab Urbe Condita. The important thing to note is that, out of more than eighteen thousand words, only 20 could not be parsed and found in the dictionary. That's pretty much amazing.
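The two percentages in the report follow directly from the raw counts. Here is a minimal sketch of that arithmetic, using only the counts printed above; the function name `summarize` is made up for illustration and is not part of Numen.

```python
def summarize(total, found, not_found, ignored):
    """Return the not-found and ignored percentages, rounded to two decimals."""
    # Every word should land in exactly one of the three buckets.
    if found + not_found + ignored != total:
        raise ValueError("counts do not add up to the total")
    return {
        "pct_not_found": round(100 * not_found / total, 2),
        "pct_ignored": round(100 * ignored / total, 2),
    }


# Counts from the Livy Book 2 report.
report = summarize(total=18204, found=17369, not_found=20, ignored=815)
print(report)  # {'pct_not_found': 0.11, 'pct_ignored': 4.48}
```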
How did this happen? Well, two things had to happen. First, I now ignore capitalized words that aren't in the dictionary; essentially, I'm ignoring proper names and place names. Second, I programmed Numen to parse syncopated perfect verbs: laudasse (laudavisse), norat (noverat), et cetera.
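For the curious, here is a rough sketch of how syncopated perfects could be expanded back into their full forms. It is not Numen's actual code: the names `expand_syncopated`, `resolve` and `KNOWN_FORMS` are invented for illustration, the tiny set of known forms stands in for the real dictionary, and only stems in -av-, -ev- and -ov- are handled (forms like audisse or audierunt would need extra rules).

```python
import re

# Toy stand-in for the dictionary's list of recognized forms (hypothetical data).
KNOWN_FORMS = {"laudavisse", "laudaverunt", "noverat"}


def expand_syncopated(form):
    """Return candidate full spellings for a possibly syncopated perfect form."""
    candidates = []
    # Restore "vi" before "ss" or "st": laudasse -> laudavisse.
    for match in re.finditer(r"[aeo](?=s[st])", form):
        cut = match.end()
        candidates.append(form[:cut] + "vi" + form[cut:])
    # Restore "ve" before "r": norat -> noverat, laudarunt -> laudaverunt.
    for match in re.finditer(r"[aeo](?=r)", form):
        cut = match.end()
        candidates.append(form[:cut] + "ve" + form[cut:])
    return candidates


def resolve(form, known_forms=KNOWN_FORMS):
    """Return the form itself if known, else the first known expansion, else None."""
    if form in known_forms:
        return form
    for candidate in expand_syncopated(form):
        if candidate in known_forms:
            return candidate
    return None
```

The expansion step deliberately over-generates (it may propose spellings that are not Latin at all), so every candidate is checked against the known forms before it is accepted.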
I still have a bit of testing to do to make sure I didn't break anything, but this was one of the few major hurdles that I needed to overcome to get a nearly perfect parsing engine!
Labels: accuracy, features, paradigms, verbs
Tuesday, November 24, 2009
Vocabulary Lists!
- Go to flashcards at the top of this page.
- Click "Print a flashcard deck".
- Choose a deck.
- Click the new button that says "Print Deck as Vocabulary List".
- Then simply go to File->Print (or File->Print Preview)!
Labels: features, flashcards, word lists
Regular Progress
One of the benefits of having students who actively use this dictionary is their feedback. One thing they noticed was that an error message saying "AJAX has timed out" would pop up relatively often -- especially in the flashcard practice tool. So I dug into the code and found the issue: I had designed the site so that any request taking more than a few hundred milliseconds would produce an error. Such timeouts always involve a balance between waiting too briefly and waiting too long, and I quickly realized that I had not struck that balance. So I bumped up the timeout duration to something reasonably middle-of-the-road and, voila, problem solved! Gratias vobis ago, discipuli.
More: My big project with Vergil and Livy is still paying off. I continue to correct dozens of tiny mistakes and errors in the data every week, and I was able to run some statistics. Excluding proper names and place names, the parsing engine can analyze and pin down around 98.5% of all the words in these two Augustan authors. So accuracy is definitely improving daily! I can't yet account for all false positives, but they seem to amount to only a fraction of a percent (anecdotally).
Speaking of accuracy, there is still room for improvement in three key areas:
- syncopated forms (laudaverunt => laudarunt), which Livy loves by the way!
- irregular forms (bobus, filiabus)
- proper and place names (which do not, for the most part, exist as a regular part of the Lewis Elementary dictionary).
In other news -- I guess I have more than I had first assumed -- I've almost got a word list feature finished. This is for people who prefer to work with formatted word lists as opposed to flashcard decks (which I understand are sometimes referred to as index cards). The only major problem I have with this process is that the Lewis Elementary dictionary does not provide a "core" definition for most words, so the word list would have extremely lengthy definitions. I think one of my best options is to import the data from Whitaker's Words, which has simpler, more core-like definitions. But this could be problematic, as a 1:1 mapping between Lewis's forms and Whitaker's forms would be difficult to achieve. Alas, I shall continue to think on this one.
Okay, so that's enough for now! Keep using it, and please keep reporting problems and errors! It may take a few days or weeks, but I eventually do fix all the errors!
Labels: bugs, core definitions, features, feedback, flashcards, Lewis, Whittaker, word lists
Wednesday, October 14, 2009
J's and U's Updated / Speed Increases
My main motivation for this update is that certain passages stored in The Latin Library reflect the older conventions of using J's for consonantal I's, or U's for both consonantal and vocalic V's. Numen's parsing engine was having trouble recognizing forms like jecit (iecit) and uuius (vivus). So -- after a bit of work -- the engine is updated and now recognizes more possibilities than ever. Incidentally, J's are stored internally as I's, and U's as V's.
Another project I completed at the same time is an order-of-magnitude speed improvement for parsing. I was trying to figure out ways to make the engine faster and I discovered a shortcut that boosts speed tremendously. The engine used to spend between 250ms and 500ms parsing each word! That was always disappointing to me, but I had gotten around the problem by caching the results. Now, however, word parsing takes about 25ms!
Why bother improving the speed? Because soon I will be implementing word lists and frequency lists! A word list, of course, is just a "mini-lexicon" that defines only the words in your chosen passage, and a frequency list is a list of words in order of how often they appear in a passage. The word list will be helpful to quickly work on vocabulary for a passage, and a frequency list will help Latin students study more effectively by giving them the most frequent words first. I'm very excited about this feature, but I don't anticipate it will be done before January 10th (giving me the winter holiday to work on it).
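To make the two kinds of list concrete, here is a small sketch. It is a simplification under stated assumptions: it counts surface forms exactly as they appear, whereas the real feature would presumably run each word through the parser first and group by dictionary entry. The function names and the `glossary` mapping are invented for the example.

```python
import re
from collections import Counter


def tokenize(passage):
    """Split a passage into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z]+", passage.lower())


def frequency_list(passage):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(passage)).most_common()


def word_list(passage, glossary):
    """Return an alphabetical mini-lexicon for the words the glossary knows."""
    words = sorted(set(tokenize(passage)))
    return {word: glossary[word] for word in words if word in glossary}


passage = "Et pater et mater et pater."
print(frequency_list(passage))  # [('et', 3), ('pater', 2), ('mater', 1)]
print(word_list(passage, {"et": "and", "pater": "father"}))
```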
That's all for now!
Labels: accuracy, database, development, features, frequency lists, google cache, orthography, parsing engine, performance, slowness, vergil, word lists
Monday, May 25, 2009
Flashcards, UTF8 and XSS
I've been a busy beaver since the semester ended. I've got two main things going on in my life right now: my reading list and this web site. I read Lombardo's translation of the Aeneid and now I'm reading Ferry's Georgics. I'm also working through Discourse, Consciousness and Time by Wallace Chafe.
But I've also been working on this site! If you've tried to visit in the last week, you might have noticed that the site was a bit flakey from time to time. It's true, and I apologize, but it was all temporary and for a good cause.
First, I rewrote the flashcards feature entirely using AJAX technology. Check them out! They're completely awesome. They should work at the very least in IE8, Firefox 3, Chrome and Safari 3. That should cover 98% of the people out there. Maybe I'll test them in Opera later. There is one major feature missing: printing. But I added two super-awesome features: custom flashcard decks and practicing those decks online! Two other minor features are missing: timed slideshows when practicing and searching by tags. Those are minor additions that I'll get to later.
I also made the site more uniformly UTF8 compatible. This is a technical, backend feature that won't affect you at all, most likely. I used to send all the Latin characters to your browser in HTML entities, but now I'm sending them directly in UTF8 encodings. Surprisingly, that was a really easy feature to enable.
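For the technically curious, the difference between the two approaches looks roughly like this. The sketch assumes the "Latin characters" in question are things like vowels with macrons, and the function names are made up; it only shows the same text fragment in both representations, not how the site actually produces its pages.

```python
import html


def entities_to_utf8(fragment):
    """Turn a text fragment written with HTML entities into UTF-8 bytes."""
    # Meant for plain text fragments: unescape also decodes &lt; and &amp;.
    return html.unescape(fragment).encode("utf-8")


def text_to_entities(text):
    """Go the other way: replace every non-ASCII character with a numeric entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(text_to_entities("amāre"))       # am&#257;re
print(entities_to_utf8("am&#257;re"))  # b'am\xc4\x81re'
```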
Another big improvement is the site security. I've been looking for holes and security breach-points. I discovered a big one: XSS (Cross Site Scripting). It's kind of an ugly loophole on websites, one which has been around for ages. Essentially I fixed my back-end library code to disallow these so-called XSS attacks. With a bit of luck and some salt thrown over the shoulder, I've hopefully closed all the loopholes.
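One common defense against XSS is to escape anything a visitor typed before putting it back on a page. I am not claiming this is exactly what the back-end library now does; the sketch below just illustrates the general idea with a hypothetical `render_heading` helper.

```python
import html


def render_heading(query):
    """Build a results heading with the visitor's query safely escaped."""
    # html.escape turns <, >, &, and quote characters into harmless entities.
    return "<h1>Results for " + html.escape(query) + "</h1>"


print(render_heading("amo"))
print(render_heading("<script>alert(1)</script>"))
```

With the escaping in place, the second call prints the script tag as visible text instead of letting the browser run it.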
As usual, I'll add a promise to try and update the news regularly. But if I don't, just remember that this site is continually improving behind the scenes.
Labels: ajax, features, flashcards, security, utf8, vergil, xss
Wednesday, December 24, 2008
New Server and Speed Increases
I bought a new server. Did you know you can get slightly older but still really powerful computers for super cheap? People and businesses upgrade and then basically give their computers to discounters for nothing! I got this server for $144, with tax, shipping and an extra year's warranty. I'm very impressed.
Also, UNM (University of New Mexico) gave me a static IP address on their network, so we have a super-fast internet connection.
So, if you're used to this site being slow, get ready for serious changes! In general, moving to this new server on this new internet connection has increased the speed by an order of magnitude (from 300ms per request to 15ms per request). Wow!
But that's not all, folks! I've also done some back-end coding to cache the results of morphology lookups. So now the speed of morphology lookups should increase by another order of magnitude (as long as a word is cached). If the word is not cached, the lookup will still be 2-3 times faster.
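If you're wondering what caching means here, this is the general idea in miniature. It is only a sketch: the real cache presumably lives on the server and persists between visits, while this version keeps results in memory, and `analyze` is a hypothetical stand-in for the expensive morphology lookup.

```python
from functools import lru_cache


def analyze(word):
    """Hypothetical stand-in for the slow morphology lookup."""
    return "analysis of " + word


@lru_cache(maxsize=None)
def cached_analyze(word):
    """Do the slow lookup once per distinct word, then reuse the stored result."""
    return analyze(word)


cached_analyze("laudavisse")  # first request: does the real work
cached_analyze("laudavisse")  # second request: answered from the cache
print(cached_analyze.cache_info())
```

Only the first request for a word pays the full cost, which is why the speedup applies "as long as a word is cached".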
I apologize for geeking out a bit here, but I hope you notice the speed improvements.
As always, I'm still developing The Latin Lexicon, but since I'm on winter break, expect to see some serious improvements for January!
Oh, one more thing. I also set up some bug-tracking software (Bugzilla) to keep track of issues and improvements. So if you find all this technical stuff interesting, feel free to check it out!
Ok, one more thing! OpenID logins will be down for a day or two. Also, if you created an account or any flashcards between the 16th of December and today, I'm afraid that information is lost because I upgraded the database on the 16th and didn't get it moved until today. Sorry about that, if you're affected.
Labels: bugs, development, features, flashcards, openid, slowness, UNM, web server
Monday, July 28, 2008
Into the Great Wide Open
Another cool feature, one which is in development, is the flashcards feature. Anytime you see a word you want to study, just check the "I want a flashcard" option. Then, you can print out a list of your flashcards on Avery Business Cards! A planned feature is to be able to study your flashcards online, and keep "sets" of flashcards.
One feature which is not ready yet -- but coming soon -- is the paradigm creator. When completed, The Latin Lexicon will create a full paradigm for any word in the dictionary!
Currently under development is The Latin Lexicon for iPhone/iPod touch. If you have one of these fantastic little devices, give it a try!
There's a lot to come, so this application will remain in beta for a while. Even so, I hope you find it useful!
Labels: browse, development, features, flashcards, iphone, ipod touch, paradigms, search
