Saturday, January 21, 2006

On the Archive: An interesting aspect of moving to a dynamic site was how to support old links. There are links out there to pages like

That page is now defined as

I wanted to be able to achieve backwards support for the old links.

To achieve this, I defined a Redirecting servlet and deployed it under the name of each top-level folder. So there's a Redirecting servlet deployed as JenMark. When a URL like the example above comes in, the Redirecting servlet forwards the request to

The path parameter is the old location.

The archive servlet reads the path and maps it back to the ID of the element at that location.
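As a rough illustration of that path-to-ID lookup, here is a minimal sketch in plain Java. The class name `LegacyPathResolver`, its methods, and the example path and ID are all hypothetical; the real archive servlet may store and normalize paths quite differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps an old static folder path back to an element ID.
class LegacyPathResolver {
    private final Map<String, String> idByPath = new HashMap<>();

    // Record the location an element currently occupies.
    void register(String path, String id) {
        idByPath.put(normalize(path), id);
    }

    // Look up the ID for an old-style path; empty if the path is unknown.
    Optional<String> resolve(String oldPath) {
        if (oldPath == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(idByPath.get(normalize(oldPath)));
    }

    // Strip leading and trailing slashes so "/a/b/" and "a/b" match.
    private static String normalize(String path) {
        String p = path.trim();
        while (p.startsWith("/")) {
            p = p.substring(1);
        }
        while (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }
}
```

A servlet could call `resolve` with the incoming path parameter and fall back to a "not found" page when the result is empty, which is what happens once someone moves the element and the old path goes stale.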

Eventually this remapping should be removed since links to the static folder path are not immutable anyway. Someone could move the element in their file system, sync up, and then the path is no longer valid.

That is why it is always a good practice to link by ID.

Old links also used to be made to the showelementpage servlet, as in the following example.

The new servlet takes those links and remaps them to the new format.

It was sort of a pain to do all this backwards link mapping, but it's necessary since so many family blogs link into the archive for photos.

I thought I might blog a little bit about my archive project.

Some background. The archive is a web site automatically generated from the family's pictures. I think most geeks have built the same thing. I build and maintain this to work on my Java/Servlet/Tomcat skills and because it's fun. The unique points about mine are:
  • Reads the photo information and uses the photo date to map to a calendar. (This has now been picked up by other projects, but I was doing this a couple years ago.) You can see all photos taken on a particular date.
  • Each family member maintains their section on their computer. There is a sync operation they run to upload (or remove) any changes to sync up with the archive. They are in control of their section. This sync operation also acts as a backup for the archive. We even use it in the same house. The sync operation also publishes a follow-up email with links to the new photos.
  • Has useful functionality like Blog This, which sets up a Blogger post with a link and a thumbnail of the image.
  • Maintains a recent changes section to see newly added elements.
In version 6 and before, the archive was built by running a 20-minute process over the photo collection. The process generated a 998 MB static site from the 30,000+ elements, consisting of about 100,000 HTML pages. The page count was that high because there were index pages for every folder, date, and recent changes set, and at least 3 versions of every page, since pages had previous / next / up links.

When new photos were added, a process scanned the archive folder structure and generated a static web site to match. This was fine at the beginning, and I made some major performance gains in the generation, but with 30,000 photos we were still looking at about a 20-minute publish time at best.

Each element has an associated XML file with additional attributes: attributes set from reading the photo information, added by the process, or modified by user edits. I like this design instead of a database because the storage is all file based and the organization is all file based. Everything matches what is synced. All XML processing is done using Castor.

Tod came along and suggested we add tagging support. I'm still working on that at this moment and have some new ideas not seen on other projects. But regardless, I realized that if we were going to add tags, I could not keep generating the site in a static manner. It would take a lot more pages to index and display elements under all of the tags. It would also be nearly impossible to regenerate specific pages appropriately when the user added a tag.

So I decided to switch to a totally dynamic generation design.

Around version 5, I added the concept of an ID to the elements. I did this so links from other blogs or sites would work, even if the photo was moved. Any photo could be accessed by its unique ID. Whenever the system published the static site, it published an additional master list of IDs and locations in an XML file.

This meant that when the server started up, it read the ID map and was able to resolve requests for a photo by ID by remapping to the location of the element.
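To make the startup step concrete, here is a minimal sketch of loading such a master list into an in-memory map. The XML layout (`<ids>` containing `<element id="..." location="..."/>`) and the class name `IdMapLoader` are assumptions made for illustration, and the sketch uses the JDK's built-in DOM parser rather than Castor, which is what the archive itself uses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical loader for the master list of IDs and locations.
class IdMapLoader {
    // Parses XML of the assumed form
    //   <ids><element id="..." location="..."/></ids>
    // into an ID -> location map.
    static Map<String, String> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("element");
            Map<String, String> locationById = new HashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                locationById.put(e.getAttribute("id"), e.getAttribute("location"));
            }
            return locationById;
        } catch (Exception ex) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalArgumentException("Could not read ID map", ex);
        }
    }
}
```

With the map in memory, resolving a request by ID is a single lookup, and a move in the file system only changes the location value, not the ID that outside links point at.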

In version 7, two major changes were made. All links were replaced by links based on IDs. In addition, all pages are now generated dynamically based on their ID. The ID mapping information now contains some additional information about the element, including its children (for folders in the file system hierarchy view), its date (for the calendar), and its title, among other things. The primary source of the mapping information is still the individual XML files associated with each element. There's a 1-2 minute process that can rescan the file system and rebuild the cache.

With the cache in place, the servlet can dynamically render all pages in the archive without having to read in the element's XML attributes, except when the element is viewed on its own page (in any section). All of the related previous, next, and up characteristics can be determined from the cache. The entire calendar map is pulled from the cache.
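A minimal sketch of how a cache like this could answer previous / next / up and calendar queges without touching the per-element XML follows. Everything here is illustrative: `ArchiveCache`, `Entry`, and the field names are hypothetical, and the sketch assumes a folder is added to the cache before its children.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory cache keyed by element ID.
class ArchiveCache {
    // One cached element; the fields are illustrative, not the real schema.
    static final class Entry {
        final String id;
        final String title;
        final String parentId;   // null for a top-level folder
        final LocalDate date;    // null for elements without a photo date
        final List<String> children = new ArrayList<>();

        Entry(String id, String title, String parentId, LocalDate date) {
            this.id = id;
            this.title = title;
            this.parentId = parentId;
            this.date = date;
        }
    }

    private final Map<String, Entry> byId = new HashMap<>();
    private final Map<LocalDate, List<String>> byDate = new TreeMap<>();

    // Assumes the parent folder was added before its children.
    void add(Entry e) {
        byId.put(e.id, e);
        Entry parent = e.parentId == null ? null : byId.get(e.parentId);
        if (parent != null) {
            parent.children.add(e.id);
        }
        if (e.date != null) {
            byDate.computeIfAbsent(e.date, d -> new ArrayList<>()).add(e.id);
        }
    }

    // "Up" is simply the parent folder's ID, or null at the top.
    String up(String id) {
        Entry e = byId.get(id);
        return e == null ? null : e.parentId;
    }

    String next(String id) {
        return sibling(id, 1);
    }

    String previous(String id) {
        return sibling(id, -1);
    }

    // All element IDs whose photo date is the given day.
    List<String> onDate(LocalDate day) {
        return byDate.getOrDefault(day, Collections.emptyList());
    }

    // Finds the neighbor at the given offset within the parent's child list.
    private String sibling(String id, int offset) {
        Entry e = byId.get(id);
        if (e == null || e.parentId == null) {
            return null;
        }
        Entry parent = byId.get(e.parentId);
        if (parent == null) {
            return null;
        }
        int index = parent.children.indexOf(id);
        if (index < 0) {
            return null;
        }
        int target = index + offset;
        return target >= 0 && target < parent.children.size()
                ? parent.children.get(target)
                : null;
    }
}
```

Because navigation is computed at request time from the child list, there is no need for the several pre-rendered versions of each page that the static site required.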

At this point, I'm ready to work on the next phase, which is being able to tell the cache that either a new element has been added or an element has been modified. This will support the ability to add a tag and see the changes immediately. This will also have a nice benefit: when the sync is happening, as soon as an element is uploaded, it can be added to the cache and become immediately visible on the site (pending thumbnail generation).
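One way an in-place update might look, purely as a sketch: a tag index that is modified directly when a tag is added, so the next page request sees it without a rescan. `TagIndex` and its methods are hypothetical names, and this is not the tagging design planned for the archive, just a generic concurrent map of tag to element IDs.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical live tag index: tag -> IDs of elements carrying that tag.
class TagIndex {
    private final Map<String, Set<String>> idsByTag = new ConcurrentHashMap<>();

    // Adds the tag in place; returns true if the element did not have it yet.
    boolean addTag(String elementId, String tag) {
        return idsByTag
                .computeIfAbsent(tag, t -> ConcurrentHashMap.newKeySet())
                .add(elementId);
    }

    // Read-only view of the elements under a tag; empty if the tag is unused.
    Set<String> elementsWithTag(String tag) {
        Set<String> ids = idsByTag.get(tag);
        return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
    }
}
```

The concurrent collections are there because page requests and a sync upload could touch the index at the same time inside one servlet container.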

Finally, in version 7.1, I have moved the syncing process, which used to be an independent application, into the website. The entire archive runs as a single WAR file under Tomcat. This avoids having to do any inter-process communication when updating the cache.

I'll chat more as things progress. I think the Archive system is very interesting. There are dozens of ways to do a lot of these things, but it's fun to play around with the design and still be pragmatic about how things are done.

An additional note: now that the application is set up in Tomcat as a single WAR, I have an Ant task in Eclipse that deploys new versions with a single click. Nothing like making a code change and putting it in production with a single click! Luckily, with the more recent changes, this deployment can be made and the web site be back up in about 20 seconds.

Tuesday, January 10, 2006

I was listening to the Bernstein Mass tonight. I also found this:

Timeline: School of Music: Indiana University

Under 1988 it mentions "Leonard Bernstein’s Mass at Tanglewood, as part of Boston Symphony's celebration of his seventieth birthday". I was in that performance. We performed the piece several times in Bloomington, then went on the road to Tanglewood. My parents and other family members all attended the concert. I met Bernstein after the concert while he was sitting there with a drink and a cigarette. Mom got him to sign a program. It's in my scrapbook.

Under 1989 it also mentions "Philharmonic Orchestra participates in opening festivities of the Bastille Opera House". I was also in that performance, and I made my orchestral debut as a celesta player. I convinced them they could save airfare for one person by just letting me play the part. The instrument in Paris was amazing. We traveled to Paris for this concert. I have a photo album with lots of pictures.

Finally, it mentions "Thomas Mathiesen, the nation’s foremost authority on ancient Greek and Medieval Latin music theory texts, appointed", who taught me music history one semester.

Wow. Lots of memories.