Fluid Engage Server-side Technology Notes

Overview

This page documents a series of ongoing, informal conversations amongst members of the Fluid community about the Engage services layer technology. It provides notes and background thinking about the criteria and architectural considerations when choosing a particular suite of technologies.

Server-side versus client-side markup

Rendering markup on the server has the following advantages:

Easily Googleable: the page appears similarly to a web agent as to a user
Clearly bookmarkable: content is much more likely to enjoy a good correspondence of URL structure to document structure
Easier to repurpose or read offline: browser's "Save web page" can save a good rendition of the content
More performant: initial page loads will be faster, and not stress low-powered devices

On the other hand, client-side markup rendering has the following desirable points:

Broadly approachable: client-side technologies are ubiquitous, whereas any one choice for server-side language/framework is likely to alienate potential contributors
"Quick Start" development: In the short term, it is easier to dive into HTML mockups and add JavaScript interactivity to get quick results.
Easier dynamism: client-side development—especially with Infusion—simplifies the work of data and event binding, which can be complex between client and server.

It's fairly safe to predict that, in the long term, Engage will want a server-side markup idiom, especially if we want to take advantage of progressive enhancement to support a wide range of mobile devices. However, in the interests of rapid initial prototyping, a good deal of work can be done client-side. This is particularly true for those parts of the application (data entry, configuration, etc.) that are highly coupled to the need for dynamic schema (see below) and represent more clearly "application" rather than "content."

Coupled with this, we imagine starting with a server side that largely exposes CRUD-like functionality over a set of JSON feeds. Over the life of the project, we imagine gradually migrating towards a stronger reliance on server-side markup to improve load time, battery performance, and compatibility with a wider range of mobile devices.
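
As a rough illustration of "CRUD-like functionality over a set of JSON feeds", here is a minimal sketch in Python, one of the candidate languages discussed on this page. The class name ArtifactStore, its method names, and the shape of the feed are all assumptions made for illustration; nothing here reflects a decided Engage API, and an in-memory dictionary stands in for the real persistence layer.

```python
import json


class ArtifactStore:
    """Hypothetical in-memory stand-in for the persistence layer."""

    def __init__(self):
        self._docs = {}

    def create(self, doc_id, doc):
        # Refuse to overwrite an existing document on create.
        if doc_id in self._docs:
            raise KeyError(f"{doc_id} already exists")
        self._docs[doc_id] = dict(doc)

    def read(self, doc_id):
        # Return a copy so callers cannot mutate stored state.
        return dict(self._docs[doc_id])

    def update(self, doc_id, changes):
        # Merge the changed fields into the stored document.
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        del self._docs[doc_id]

    def feed(self):
        """Serialise every document as one JSON feed, ordered by id."""
        items = [{"id": key, **doc} for key, doc in sorted(self._docs.items())]
        return json.dumps({"items": items})
```

A client-side component could then consume the string returned by feed() without knowing anything about how the documents are stored.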

In this direction, we plan to rely strongly on the services of the Fluid Renderer, which is a powerful solution for associating pure HTML markup with a potentially highly dynamic interface. The Renderer is unobtrusive, avoiding the need to mix up code and markup, such as in familiar templating languages like TAL or Smarty.

Anatomy of the Server-Side

The technology that will power the Engage services layer can be broadly divided into three areas:

Server-side programming language
Web framework, including deployment structure and URL routing
Persistence technologies for storing data, sharing data as JSON feeds, and providing free text search

Framework and Language Criteria

Some important criteria we will consider for a suitable choice are as follows:

Ease of deployment and hosting
Familiarity
Community
Quick start
Performance

Candidates We Considered

We explored a variety of potential technologies for the Engage services layer. In each case, we considered both a programming language and an accompanying Web framework, recognizing that much of the advantage of a particular language comes from good tools we can reuse. Here's a short list of the major technologies we considered:

PHP
Ruby + Merb
Python + CherryPy
Java + RSF
Server-side JavaScript: a) JVM + Servlets + JSGI, and b) V8CGI or other next-generation runtime + Apache module

We explored each of these options through a series of conversations with the Engage development team and the wider Fluid community.

Notes on Languages and Frameworks for Engage

Our Process

Making technology decisions early in a cutting-edge project can be fuzzy and difficult. The goal here was to look at a range of potential server-side languages and Web frameworks, narrowing in on an approach that provides our community with enough flexibility to adapt and change over the course of the project.

Arguably, our evaluation wasn't particularly scientific, though we did perform some fairly comprehensive performance benchmarks as we started to narrow down our choices. It's hard to quantify many of the criteria of a successful development environment; our aim was to find a language and accompanying framework that fit both the technological needs of our users and the culture of our development community. Throughout the process, we carefully examined both the features and the context of a particular technology: what it has to offer both in terms of technical solutions as well as the associated community, documentation, and support.

Persistence

Engage is going to have to embrace an incredible amount of diversity in terms of how data is structured. Each museum's collection is different, and it is a daunting task to attempt to unify all types of collections with a single schema. Rather than creating a "one size fits all" approach at the database level, the approach we'll take in Engage is to enable flexible schemas.

Each institution is liable to have a custom workflow, along with corresponding institution-specific fields, that needs to be fitted smoothly into the Fluid Engage markup and application system. These will arise through the inheritance of legacy data, or any number of other local priorities. Because the desired application schema cannot be determined in closed form before the design phase, a standard relational idiom for persistence (that is, one based on SQL) would be inappropriate.
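
To make the idea of flexible schemas concrete, the following sketch shows two records that share a few fields but otherwise differ, as they might for two different institutions. The field names, the sample records, and the choice of "core" fields are invented purely for illustration and are not drawn from any real collection.

```python
# Assumed set of fields shared by every institution (illustrative only).
CORE_FIELDS = ("title", "institution")


def split_record(record):
    """Separate the assumed core fields from institution-specific extras."""
    core = {key: record[key] for key in CORE_FIELDS if key in record}
    extras = {key: value for key, value in record.items() if key not in CORE_FIELDS}
    return core, extras


# Two made-up records with different institution-specific fields.
records = [
    {"title": "Beaded moccasins", "institution": "Museum A", "accession_no": "M123"},
    {"title": "Film projector", "institution": "Museum B", "loan_status": "on loan", "gallery": 4},
]

for record in records:
    core, extras = split_record(record)
    print(core["title"], "->", sorted(extras))
```

Neither record has to declare its extra fields in advance, which is the property a fixed relational table definition would make awkward.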

Schema-less Databases

We'll also need to support free text searches. The most viable open source technology for this is Apache's Lucene and its associated Solr project. Solr is a standalone packaging of the underlying Lucene search engine. Whilst Solr, like CouchDB, is also capable of searching semi-structured data, it particularly excels at supporting free-text searches amongst larger document sets. Solr defines a standard wire protocol (with both XML and JSON APIs) for search and update, and so, whilst it depends on Java hosting, adopting it need not commit us to a Java-based core engine.
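
Because Solr is reached over HTTP, a services layer written in any language can query it. The sketch below only builds a search URL using Solr's select handler with the q, wt, and rows parameters; it sends nothing over the network. The base URL, the core name "engage", and the field name "title" are assumptions for illustration.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, rows=10):
    """Build a Solr search URL that asks for a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local Solr instance and field name.
print(solr_select_url("http://localhost:8983/solr/engage", "title:vase"))
```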

Sketching the Architecture

Here's a sketch of what the overall shape of the Engage architecture might look like:

Database: CouchDB
Search engine: Solr or Lucene
Binary file storage: filesystem or Fedora
Web services: JavaScript + Infusion or Python + framework

CouchDB would serve as the core database, providing a central store for saving and sharing data out. With Couch's RESTful interfaces, the database would essentially act as the main data feed for all information about an exhibit. For implementations where a database is already in place (for example, McCord's excellent CMS), a Couch-compatible RESTful API need only be created.
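
The RESTful style referred to here maps each document to a URL of the form /database/document-id, read with GET and written with PUT. As a minimal sketch, the helpers below construct such requests with Python's standard library without sending them. The host, the database name "engage", and the document id are assumptions for illustration, and a Couch-compatible API in front of an existing CMS would need to answer the same kind of requests.

```python
import json
from urllib.request import Request


def couch_put_request(base_url, db, doc_id, doc):
    """Build (but do not send) a PUT that creates or updates one document."""
    return Request(
        url=f"{base_url}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def couch_get_request(base_url, db, doc_id):
    """Build (but do not send) a GET that fetches one document as JSON."""
    return Request(url=f"{base_url}/{db}/{doc_id}", method="GET")
```

Passing either request to urllib.request.urlopen would perform the call against a running server.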

Full-text searches and structured queries would be provided by Solr placed in a Java Servlet container (e.g. Tomcat). A repository for binary files would be provided by either something simple that reads and writes the filesystem directly, or a more built-up solution such as Fedora Repository.

Finally, a basic engine would organise and render views and data feeds for the Web services. This might be expressed directly in JavaScript using Rhino on the same JVM as the Solr app, or might instead be implemented in Python using a conventional framework.

Some Draft Scenarios for Engage Services Layer

Hosting the Services Layer

A goal of the Engage services layer is to support a range of hosting options. Some museums will inevitably have existing in-house technical infrastructure, and they'll want Engage to fit as seamlessly as possible into it. On the other hand, many institutions have little in the way of in-house technical resource, and will rely on contractors or publicly-available services to help with hosting and maintenance.

Engage Hosting Ideas