Friday, December 25, 2009

RunJS: GitHub, build options, features, file sizes

A few updates on RunJS, a JavaScript file/module loader (see the README for more documentation):
  • RunJS is now on GitHub
  • Plugins for RunJS are supported. i18n bundles have been pulled out as a plugin, and a new text plugin allows you to set text files (think HTML/XML/SVG files) as dependencies for a module. The plugin will use async XMLHttpRequest (XHR) to fetch those files and will pass the text of those files as an argument to a module's module definition function. The RunJS build system will then *inline* those text files with the module, so that the XHR calls can be removed in deployed code, which also allows cross-domain use of those text files.
  • Any function return type is allowed from the module definition function. Before, only objects and functions were allowed, and functions had to be called out in a special way. Now that special call out is removed and any return type is allowed. The cost was an extra call, run.get(), that needs to be used in circular dependency cases. See the Circular Dependencies section in the README.
  • The build system that comes with RunJS now supports build pragmas.
The build pragma support was used to build RunJS in a couple of different configurations. I am trying to get a handle on where the bulk of the implementation lies, and what features add to its file size. Here is the breakdown (originally an embedded Google Doc; the interesting numbers are inlined in this post below):



Let's look at the non-license sizes, since they give a better indication of code density. Google's Closure Compiler did the minification for this evaluation.

The normal config, with no plugins included (but with plugin support) is 7,970 bytes minified, 3,167 gzipped. Including both the i18n and text plugins with run.js bumps it up to 11,759 minified, 4,655 gzipped.

The interesting number for me is the version of run.js without plugin support, no run.modify, no multiversion support and no page load (run.ready/DOMContentLoaded callbacks). This version of run has just the following features:
  • support for the run() module format
  • nested dependency resolution
  • configure paths to modules
  • load just plain .js files that do not define run.js modules (scripts that do not call run(), for example jQuery, or plugins for jQuery).
That bare bones loader comes in at 5,086 minified and 2,204 gzipped. The one you should use, the one with the license, is 5,245 minified and 2,317 bytes gzipped. I need to work on the size of that license block!

That size could probably be brought down a tiny bit (probably reaching the 2,000 gzip size) if I were to really be aggressive and remove all context references, but that would be a mess to maintain and there would be no easy upgrade path to multiversion support.

I believe that is the lower limit for a functional loader that does nested dependencies via run() module calls. I view run.ready/DOMContentLoaded support as more of a necessity for a loader, so unless you already have an implementation for that, I suggest the version that has run.ready() support, which comes in (with license) at 5,867 minified, 2,522 gzipped.

The nice thing about the build pragma setup for RunJS is that you can upgrade run without having to change your code if you find you want more features, like plugin support, or i18n/text dependency support via plugins.

I am interested in trying to sell more front-end JavaScript toolkits on this loader. For some, I can see the bare-bones 2.3K gzipped loader as a nice way to step into it, and then their users have the option to swap in a more powerful version via a different RunJS build output.

I have put up the different build outputs for 0.0.6 if you want to grab one of the minified versions and play with it. Here is the minimum set of compliance tests which use the smallest loader (no modify/plugins/page load/context support) mentioned above. See the README for documentation.

Right now I believe around 2KB gzipped is close to the lower bound for a stand-alone code loader in the browser. At least for a loader I would consider using: anything that uses XHR and eval is dead to me. Using plain script src="" tags helps the xdomain case, and just fits better with debugging. While Dojo has used an XHR-based loader for quite a while (and it will continue to be supported), it just does not work as well with the browser as a script-tag based loader. Any loader should also do nested dependency loading -- if a module in a script has dependencies in other modules, be sure to evaluate the dependencies in the right order.

As a point of comparison, consider LABjs. I feel a kinship with the author of LABjs, Kyle Simpson, even though we have never talked. We are both focusing on efficient code loading in the browser. I recommend LABjs if it fits your style.

While LABjs does not quite do nested dependency resolution, it does something related where you can tell it to wait to load a script before continuing to load other scripts. LABjs is not trying to push a module format like run is, but is targeted more at existing code that does not have the concept of a module format.

By the way, RunJS can also handle loading these types of files. Where LABjs has a wait() call for holding off loading scripts that depend on another script being loaded (like a framework), RunJS uses nested run calls.

Example from the LABjs page:

$LAB
    .script("framework.js").wait()
    .script("plugin.framework.js")
    .script("myplugin.framework.js")
    .wait(function(){
        myplugin.init();
        framework.init();
        framework.doSomething();
    });

Equivalent example with RunJS:

run(["run", "framework.js"],
    function(run) {
        run(["plugin.framework.js", "myplugin.framework.js"],
            function() {
                myplugin.init();
                framework.init();
                framework.doSomething();
            }
        );
    }
);

Taking the 1.0.2rc1 version of LABjs and using Closure Compiler on it (without the license) gives LABjs a size of 4,360 bytes minified and 2,170 gzipped. As a reminder, the equivalent RunJS file is 5,086 minified and 2,204 gzipped. I may be able to do better with making the structure of the RunJS code more amenable to minification, but the gzip sizes come up fairly close. I do not believe the code tricks I would do to help minification will help the gzip size any.

Both LABjs and RunJS end up around 2KB gzipped. So, about 2KB gzipped seems close to the lower limit on a standalone loader, one that uses script tags/plays nice with the browser and can do nested dependencies. I would like to be proven wrong though, and ideally by modifying RunJS to fit that lower limit. :) I am sure the code can be improved.

But remember the guidelines: no goofy XHR stuff, something that works well with the browser, and something that can handle nested dependencies. No script tags with inlined source/eval tricks. Even though Firefox and WebKit make eval debugging easier, it is still not as nice as regular script src tags.

Irakli Gozalishvili believes web workers might help, but I do not see it. The workers are restricted to message passing, and anything interesting in a web browser will likely need to touch the DOM, so a web worker solution will just be another async-XHR-like approach, where you will need to eval the scripts or inline-script inject to get all the scripts for them to see each other and the DOM.

Irakli does have an async-XHR based loader for CommonJS modules. As of today, it comes in at 1,527 minified, 838 gzipped (license not included). But it uses XHR, so limited to the same domain as the page, and debugging support is just not as nice across browsers. It also uses CommonJS module syntax, but I have decided CommonJS modules do not play well out of the box in the browser, and I believe the format's "module", "exports", and "require.main" parts are unnecessary.

Thursday, December 10, 2009

Dojo 1.4 Favorite Features

Dojo 1.4 is out! There is a metric ton of changes. Here are some of my favorite things about the release. I focus mostly on Dojo Core, and mostly in the non-animation parts of it, so my list is skewed for that focus. However, there are lots of other changes, some in the animation functionality, and in Dijit and Dojox. Check out the 1.4 release notes to get a more complete picture.

One of the things I want to do for Dojo Core is to bring the DOM APIs, particularly the methods on dojo.NodeList (the return object for dojo.query() calls, Dojo's CSS selector method) more in-line with what is available in jQuery. jQuery has demonstrated that its APIs resonate strongly with developers. Where it makes sense and fits Dojo's philosophy, we should also provide those APIs, to make it easier for developers. These Dojo 1.4 changes reflect that goal:
  • dojo.ready(), just an alias for dojo.addOnLoad().
  • dojo.NodeList-traverse: A helper module that adds methods to dojo.NodeList. Its goal is to bring in some methods to NodeList that exist in jQuery for DOM traversal, specifically: children, closest, parent, parents, siblings, next, nextAll, prev, prevAll, andSelf, first, last, even, odd.
  • dojo.NodeList-manipulate: A helper module that adds methods to dojo.NodeList. Its goal is to bring in some methods to NodeList that exist in jQuery for DOM manipulation, specifically: innerHTML, html, text, val, append, appendTo, prepend, prependTo, after, insertAfter, before, insertBefore, remove, wrap, wrapAll, wrapInner, replaceWith, replaceAll, clone.
  • IO pipeline topics: get notifications of IO events via dojo.subscribe/dojo.publish. Handy for putting up a generic "loading" indicator when any sort of IO call happens. These topics are not strictly how jQuery exposes this functionality, but we can leverage the power of dojo.publish/subscribe to implement this feature.
Some other new Dojo Core 1.4 features that are really sweet:
  • dojo.cache(): allows you to reference external HTML files and use them as if they are strings. It is integrated into the build system, so you can avoid the XHR calls to get the external text files by just doing a build. No extra build option is needed. This is a great way to construct HTML -- by writing plain HTML instead of building awkward strings in code or using JS DOM-building calls, which can obscure what the HTML actually looks like.
  • dojo.position(): A faster, more understandable replacement for dojo.coords(). If you were using dojo.coords() before, odds are good that you probably want to switch to dojo.position(). Douglas Hays stepped up and put in this great new method.
  • dojo.declare(): It is faster and more robust. Many thanks to Eugene Lazutkin for doing this work. It took a lot of patience and perseverance to get this new version up to snuff and keep it backward compatible.
  • dojo.hash(): An easy way to set the URL hash (fragment ID) and to watch changes to the hash. This allows you to create pages that reflect the proper state as shown by the URL in the browser. This was a contribution from community member Rob Retchless and other IBM Jazz team members.
For the build system, support was added for Google's Closure Compiler, so you can experiment with using it for minifying your code. Right now we just support the "simple" minification done by Closure Compiler, not the advanced features.

It was a little while coming, but it is great to have Dojo 1.4 out. Thanks to the community for making the toolkit better!

Wednesday, November 25, 2009

JavaScript module loading, the browser and CommonJS

JavaScript module syntax and loading seems like a hot topic at the moment, and here are some thoughts about how to construct module syntax and a loader, with the goal of trying to get to a more universal approach for it. This will be discussed in the context of CommonJS, but browser-based module loaders have existed for a while. All are constrained by the browser in some fashion as listed below. I have a preferred solution, also described in this post.

First, a look at module syntax.

CommonJS is an umbrella for a few different things, including a spec for a module syntax and a standard library of modules. This post is just interested in its module syntax. A simple example of CommonJS syntax, an "increment" module defined in increment.js:

var add = require('math').add;
exports.increment = function(val) {
    return add(val, 1);
};
How could we build a module loader with this syntax? Here are a couple of options:

1) You parse the module before executing it, looking for require calls. You make sure to fetch those modules and work out the right dependency order in which to execute the modules.

2) You just run the module and when require() is hit, do a synchronous IO operation to load the required module.

For both approaches, use a sandbox or specific context to make sure things like "exports" are defined separately for each module.
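As a rough illustration of option 2 and of the per-module "exports" requirement, here is a minimal sketch. It assumes the module sources are already available in an in-memory table (the hypothetical moduleSources object), so the "synchronous IO" is just a property lookup; requireSync is a made-up name and not part of any CommonJS implementation.

```javascript
// Hypothetical in-memory "file system": module name -> source text.
var moduleSources = {
    math: "exports.add = function (a, b) { return a + b; };",
    increment: "var add = require('math').add;\n" +
               "exports.increment = function (val) { return add(val, 1); };"
};

// One exports object per module, cached so each module runs only once.
var moduleCache = {};

function requireSync(name) {
    if (moduleCache.hasOwnProperty(name)) {
        return moduleCache[name];
    }
    if (!moduleSources.hasOwnProperty(name)) {
        throw new Error("Module not found: " + name);
    }
    // Each module gets its own fresh exports object.
    var moduleExports = moduleCache[name] = {};
    // Wrapping the source in a function gives the module a private scope.
    // Inside the module, "require" and "exports" are just the arguments.
    var moduleFn = new Function("require", "exports", moduleSources[name]);
    moduleFn(requireSync, moduleExports);
    return moduleExports;
}

var incremented = requireSync("increment").increment(41);
```

Note that new Function() is a form of eval, which is exactly the kind of trade-off discussed for the browser below.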

Both of those options are easy to implement on the server side since you have more control over the IO layer, and can create separate contexts for each module. However, in the browser, things are different. Creating a separate context for each module is tricky. For IO, there are two realistic approaches, and each has its difficulties:
  • XMLHttpRequest (XHR)
  • script tags
XHR allows us to do either approach, #1 or #2. It can get the contents of the module and parse it into a structure that pulls out the dependencies, and we can sort out the right order to execute things. We could use sync XHR calls to accomplish #2, blocking when each require call is seen. However, sync XHR calls in the browser really hurt performance.
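To make approach #1 concrete, here is a small sketch of the "parse, then order" step. It assumes the module text has already been fetched into a hypothetical fetchedSources table, and it uses a simple regular expression instead of a real parser, so it would also match require calls inside comments or strings. The function names are made up for this example.

```javascript
// Hypothetical table of already-fetched module text.
var fetchedSources = {
    math: "exports.add = function (a, b) { return a + b; };",
    increment: "var add = require('math').add;\n" +
               "exports.increment = function (val) { return add(val, 1); };",
    app: "var increment = require('increment').increment;\n" +
         "var add = require('math').add;\n" +
         "exports.run = function () { return add(increment(1), 1); };"
};

// Scan module text for require("name") calls with a string literal argument.
function findDependencies(source) {
    var deps = [],
        requireRegExp = /require\(\s*["']([^"']+)["']\s*\)/g,
        match;
    while ((match = requireRegExp.exec(source))) {
        if (deps.indexOf(match[1]) === -1) {
            deps.push(match[1]);
        }
    }
    return deps;
}

// Depth-first walk: dependencies are placed before the module that needs them.
function loadOrder(name, sources, order, visiting) {
    order = order || [];
    visiting = visiting || {};
    if (order.indexOf(name) !== -1 || visiting[name]) {
        // Already ordered, or a circular reference: do not recurse again.
        return order;
    }
    if (!sources.hasOwnProperty(name)) {
        throw new Error("Module not found: " + name);
    }
    visiting[name] = true;
    findDependencies(sources[name]).forEach(function (dep) {
        loadOrder(dep, sources, order, visiting);
    });
    order.push(name);
    return order;
}

var executionOrder = loadOrder("app", fetchedSources);
```

With the sample table above, "math" is ordered before "increment", and both come before "app".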

This is actually what the default Dojo loader has done for a very long time, and I believe some pathways in Google's Closure Library do the same thing. It is always recommended you do a custom build to combine all the modules you need into one file to cut out those XHR calls when you want to go to production.

So path #1 would make more sense with an XHR-based loader. However, for an XHR loader to work, it has to use eval() to bring the module into being. Some environments, like Adobe AIR, do not allow eval(), and it makes debugging hard to do across browsers. Firefox and WebKit have a convention to allow easier eval-based debugging, but it is still not what I consider to be in keeping with traditional script loading in a browser.

Instead of eval, after the XHR call finishes its parsing and module wrapping for context, you could try to create a script tag that has a body set to the modified module source, but this really hurts debugging: if there is an error, the error line number will be some weird line in a gigantic HTML file instead of the line number of the actual module.

Dojo has a djConfig.debugAtAllCosts option that will use sync XHR to pull down all the modules, parse them for dependencies, work out the right load order, then load each module via a dynamically added script src="" tag. However, IE and WebKit will evaluate dynamically added script tags out of DOM order -- they evaluate them in network receive order (which is nice for long-polling comet apps, but does not help module loading). So, each script tag has to be added one at a time, waiting for it to finish before adding the next one. Not so speedy.
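The one-at-a-time loading can be sketched roughly as below. This is not Dojo's actual code: loadInOrder and loadScriptTag are hypothetical names, and the ordering logic is kept separate from the DOM work so it can be exercised without a browser.

```javascript
// Load urls strictly one after another. loadOne(url, callback) is whatever
// function actually fetches and evaluates a single script.
function loadInOrder(urls, loadOne, done) {
    var index = 0;
    function next() {
        if (index >= urls.length) {
            done();
            return;
        }
        // Only ask for the next script once the previous one has finished.
        loadOne(urls[index++], next);
    }
    next();
}

// Browser-only loadOne: adds a script src="" tag and waits for it to load.
function loadScriptTag(url, callback) {
    var script = document.createElement("script");
    script.src = url;
    // Older IE reports completion via onreadystatechange instead of onload.
    script.onload = script.onreadystatechange = function () {
        var state = script.readyState;
        if (!state || state === "loaded" || state === "complete") {
            script.onload = script.onreadystatechange = null;
            callback();
        }
    };
    document.getElementsByTagName("head")[0].appendChild(script);
}

// In a page: loadInOrder(["a.js", "b.js"], loadScriptTag, function () {});
```

Each script waits for the previous one to finish, so nothing downloads in parallel, which is where the speed goes.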

XHR is also normally limited to just accessing the same host as the web page. This makes it hard to use CDNs to load content, and get performance benefits with that approach. There is now support for xdomain XHR in most recent browsers, but IE prefers to use a non-standard XDomainRequest object, making our module loader more complicated. And xdomain XHR just plain does not work in older browsers like IE6.

So, an XHR-based loader is not so great.

Script tags are nice because they keep with the known script pathway in browsers -- easy to debug, and we can get parallel loading. However, we cannot do approach #1 as written in the browser: our JavaScript in the browser cannot access the module contents before they are evaluated. And since dynamically added script src="" tags via head.appendChild() are not a synchronous operation, approach #2 will not work either.

So, really we need to do a variant of #1: pull out the dependencies needed by the module, then after those dependencies are loaded, execute the module. The way to do this in script: put a function wrapper around the module contents, and call a module loader function with a list of dependencies and the module function wrapper. Here is a syntax (call it Variant A) for defining a module "c" that has dependencies "a" and "b" with this approach:

loader(
    "c",
    ["a", "b"],
    function(a, b) {
        //The module definition of "c" in here.
        //return an object to define what "c" is.
        return {};
    }
);

or, another variant, call it Variant B:

loader({
    name: "c",
    dependencies: ["a", "b"],
    module: function(a, b) {
        //The module definition of "c" in here.
        //return an object to define what "c" is.
        return {};
    }
});

Ideally, we would not have to tell the loader that this structure defines module "c" (the first arg in Variant A and the name: property in Variant B) -- the loader could work this out. Unfortunately, since script tags can load asynchronously and at least IE can trigger script.onload events out of order when compared to when the script is actually evaluated, we need to keep the module name as part of the module definition. This also helps with custom builds, where you can combine a few of these module definition calls into one script.
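To show why the explicit name helps, here is a toy, in-memory sketch of a Variant A style loader() function. It is not the RunJS or Dojo implementation and it does no script fetching; it only shows that when each definition carries its own name, the definitions can arrive in any order and still be executed in dependency order. The definedModules and waitingDefinitions names are made up for this example.

```javascript
var definedModules = {};       // module name -> value returned by its function
var waitingDefinitions = [];   // definitions still missing a dependency

function loader(name, deps, moduleFn) {
    waitingDefinitions.push({ name: name, deps: deps, moduleFn: moduleFn });
    resolveWaiting();
}

// Run every waiting definition whose dependencies are all defined, and
// repeat until no more progress can be made.
function resolveWaiting() {
    var progressed = true;
    while (progressed) {
        progressed = false;
        for (var i = 0; i < waitingDefinitions.length; i++) {
            var def = waitingDefinitions[i];
            var ready = def.deps.every(function (dep) {
                return definedModules.hasOwnProperty(dep);
            });
            if (ready) {
                waitingDefinitions.splice(i, 1);
                var args = def.deps.map(function (dep) {
                    return definedModules[dep];
                });
                definedModules[def.name] = def.moduleFn.apply(null, args);
                progressed = true;
                break;
            }
        }
    }
}

// "c" arrives first, as it might if its script happened to evaluate first.
loader("c", ["a", "b"], function (a, b) {
    return { sum: a.value + b.value };
});
loader("a", [], function () { return { value: 1 }; });
loader("b", [], function () { return { value: 2 }; });
```

The same structure works for a custom build: several loader() calls can be concatenated into one file, because each call says which module it defines.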

This approach is actually what Dojo's xdomain loader has done for a very long time, but with more verbose syntax. However, it requires a custom build to convert modules into this structure. The other option is to use a server-side process to convert the modules on the fly, but I do not feel that is keeping with the simplicity of normal browser development: just open a text editor, write some script, save, reload, no extra server config/process needed, besides maybe a vanilla web server.

So, I believe that modules should be coded by the developer in this module wrapper format. YUI 3 has taken this approach, and it is the approach I have taken for RunJS too. However, YUI 3 is limited to needing some module dependency metadata files to help it out. It also uses module names that do not map to the actual module's defined name/functions.

OK, back to CommonJS.

As it stands now, I believe the CommonJS format is not suitable for modules in the browser. There have been attempts to get it to work, but the attempts either use a sync XHR loader, or a "transform-on-the-fly" server process to convert the code to a module wrapper similar to Variant B.

I would rather see a module wrapper format that works with the browser natively, that can be hand-authored by developers and that will work with CommonJS modules. CommonJS started out as ServerJS. As ServerJS, the case could be made that supporting browsers may not be an aim of a ServerJS module format. However, with the name change to CommonJS, I believe supporting browsers as a first class citizen is important for CommonJS to get more traction.

So the trick is to come up with a module syntax that has a function wrapper, but is not too wordy with boilerplate. We need some boilerplate, since we need a function wrapper. I believe RunJS has the right approach. The boilerplate is very terse, basically Variant A mentioned above:

run(
    "c",
    ["a", "b"],
    function(a, b) {
        //The module definition of "c" in here.
        //return an object to define what "c" is.
        return {};
    }
);

I can see where there is some bikeshedding on the name "run". I think script() instead of run() is a viable alternative, and I may switch to that in the near future (and rename RunJS to ScriptJS).

I have attempted to engage the CommonJS community by putting up a proposal for an Alternate Module Format.

Progress has been slow, but to be expected: the CommonJS group is trying to do lots of other things like define a standard library and build out implementations. However, I am hopeful we can get something that works for the browser front end developers.

The ideal scenario is that some variant of the above syntax is just adopted as the only CommonJS module format. That would save a lot of conversion work, and I believe it makes things much simpler for a CommonJS-compliant loader. Right now, for CommonJS loaders there is a concept of a require() and require.async() and having to expose Promises for the async stuff. The above format neatly avoids the issue of whether the modules are loaded async or sync and avoids any need for Promises in the module loader. I think it is fine though for modules themselves to use Promises as part of individual module APIs, but at least the loader and module syntax stays simple.

I also do not believe a "module" variable needs to be defined for each module, and an exports variable is avoided by returning an object from the module function wrapper.

I can appreciate that the CommonJS folks with modules already written may not like moving to the above syntax. I think it helps in the long run if we can just have one syntax, but in the meantime, I plan on doing the following:
  • Continue to engage the CommonJS community.
  • build out RunJS, probably rename to ScriptJS in the near future, and use script() instead of run()
  • Write a converter that converts Dojo modules to the RunJS/ScriptJS module syntax. I have something basic working, here is an example of Dojo's themeTester.html using RunJS-formatted dojo/dijit/dojox modules. That example is not bulletproof yet (I used a built version of Dojo which removes some dependency info) and i18n modules have not been converted either. RunJS also has built-in support for i18n modules.
  • Convert Raindrop to use RunJS-formatted dojo and convert the Raindrop modules to that format.
  • Override run.load()/script.load() in server environments so it could be used in CommonJS server implementations.
  • Work on a converter for existing CommonJS modules.
  • Use RunJS/ScriptJS as the module syntax for Blade and/or Dojo 2.0 efforts.
If module syntax/loading is important to you, then please join the discussion list for CommonJS, so we can sort this out. It would be great to get consensus on JavaScript module syntax and loading, and I think CommonJS is the area to do that.

I am happy to adjust some of the syntax in RunJS/ScriptJS to match some consensus, but I strongly prefer a terse format. The existing ones I have seen for server-converted CommonJS modules are too verbose for me, particularly for the common case of defining a module with some dependencies.

Friday, November 20, 2009

Raindrop, CouchDB and data models

Raindrop uses CouchDB for data storage. We are starting to hit some tough issues with how data is stored and queried. This is my attempt to explain them. I am probably not the best person to talk about these things. Mark Hammond, Raindrop's back-end lead, is a better candidate for it. I am hoping by trying to write it out myself, I can get a better understanding of the issues and trade-offs. Also note that this is my opinion/view, may not be the view of my employer and work colleagues, etc...

First, what are our requirements for the data?
  • Extensible Data: we want people to write extensions that extend the data.
  • Rollback: we want it easy for people to try extensions, but this means some may not work out. We need to roll back data created by an extension by easily removing the data they create.
  • Efficient Querying: We need to be able to efficiently query this data for UI purposes. This includes possibly filtering the data that comes back.
  • Copies: Having copies of the data helps with two things:
    • Replication: beneficial when we think about a user having a Raindrop CouchDB on the client as well as the server.
    • Backup: for recovering data if something bad happens.
How Raindrop tries to meet these goals today

Extensible Data: each back-end data extension writes a new "schema" for the type of data it wants to emit. A schema for our purposes is just a type of JSON object. It has a "rd_schema_id" on it that tells us the "type" of the schema. For instance a schema object with rd_schema_id == "rd.msg.body" means that we expect it to have properties like "from", "to" and "body" on it. Details on how schemas relate to extensions:
  • An extension specifies what input schema it wants to consume, and the extension is free to emit no schemas (if the input schema does not match some criteria), or one or more schemas.
  • Each schema written by an extension is stamped with a property rd_schema_provider = "extension name".
  • All the message schemas are tied together via an rd_key value, a unique, per-message value. Schemas that have the same rd_key value all relate to the same message.
More info is on the Document Model page.

Rollback: Right now each schema is stored as a couch document. To roll back an extension, we just select all documents with rd_schema_provider = "extension name" that we want to remove, and remove them. As part of that action, we can re-run extensions that depended on that data to have them recalculate their values, or to just remove the schemas generated by those extensions.
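As a rough picture of the document model and of rollback, here is a small in-memory sketch. The rd_key, rd_schema_id and rd_schema_provider property names and the "rd.msg.body" schema come from the description above; the "rd.msg.seen" schema, the provider names, the rd_key values and the helper functions are assumptions made up for the example. A real rollback would delete the documents in CouchDB instead of filtering an array.

```javascript
// Three schema documents; the first two describe the same message.
var schemaDocs = [
    { _id: "1", rd_key: "msg-1", rd_schema_id: "rd.msg.body",
      rd_schema_provider: "ext.email",
      from: "alice@example.com", to: "bob@example.com", body: "Hello" },
    { _id: "2", rd_key: "msg-1", rd_schema_id: "rd.msg.seen",
      rd_schema_provider: "ext.seen", seen: false },
    { _id: "3", rd_key: "msg-2", rd_schema_id: "rd.msg.body",
      rd_schema_provider: "ext.email",
      from: "carol@example.com", to: "bob@example.com", body: "Hi" }
];

// All schemas that share an rd_key relate to the same message.
function schemasForMessage(docs, rdKey) {
    return docs.filter(function (doc) {
        return doc.rd_key === rdKey;
    });
}

// Rolling back an extension: split off everything that extension wrote.
function rollBackExtension(docs, providerName) {
    var result = { kept: [], removed: [] };
    docs.forEach(function (doc) {
        if (doc.rd_schema_provider === providerName) {
            result.removed.push(doc);
        } else {
            result.kept.push(doc);
        }
    });
    return result;
}

var messageOneSchemas = schemasForMessage(schemaDocs, "msg-1");
var afterRollback = rollBackExtension(schemaDocs, "ext.seen");
```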

Having each schema as a separate document also helps with the way CouchDB stores data -- if you make a change to a document and save it back, then it appends the new document to the end of the storage. The previous version is still in storage, but can be removed via a compaction call.

If we store all the schemas for a message in one CouchDB document, then it results in more frequent writes of larger documents to storage, making compaction much more necessary.

Efficient Querying: Querying in CouchDB means writing Views. However, a view is like a query that is run as data is written, not when the UI may actually want to retrieve the information. The views can then be very efficient and fast when actually called.

However, the down side is that you must know the query (or a pretty good idea of it) ahead of time. This is hard since we want extensible data. There may be some interesting things that need to be queried later, but adding a view after there are thousands of documents is painful: you need to wait for couch to run all the documents through the view when you create the view.

Our solution to this, started by Andrew Sutherland and refined by Mark, was to create what we call "the megaview". It essentially tries to emit every piece of interesting data in a document as a row in the view. Then, using the filtering capabilities of CouchDB when calling the view (which are cheap), we can select the documents we want to get.
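A rough sketch of the megaview idea follows. CouchDB map functions are JavaScript functions that receive a document and call emit(key, value); here emit is a local stand-in so the sketch runs outside CouchDB. The key shape [schema id, property name, value] and the sample documents are assumptions for illustration, not necessarily what Raindrop's real megaview emits.

```javascript
// Stand-in for the emit() function CouchDB provides to map functions.
var megaviewRows = [];
function emit(key, value) {
    megaviewRows.push({ key: key, value: value });
}

// Map function: one row per interesting property of each schema document.
function megaviewMap(doc) {
    if (!doc.rd_schema_id) {
        return;
    }
    for (var prop in doc) {
        // Skip CouchDB internals (_id, _rev) and the rd_* bookkeeping fields.
        if (doc.hasOwnProperty(prop) && prop.charAt(0) !== "_" &&
                prop.indexOf("rd_") !== 0) {
            emit([doc.rd_schema_id, prop, doc[prop]], doc.rd_key);
        }
    }
}

// Simulates calling the view with key=[schema, property, value].
function queryMegaview(key) {
    var wanted = JSON.stringify(key);
    return megaviewRows.filter(function (row) {
        return JSON.stringify(row.key) === wanted;
    }).map(function (row) {
        return row.value;
    });
}

var sampleDocs = [
    { _id: "a", rd_key: "msg-1", rd_schema_id: "rd.msg.seen", seen: false },
    { _id: "b", rd_key: "msg-2", rd_schema_id: "rd.msg.seen", seen: true },
    { _id: "c", rd_key: "msg-1", rd_schema_id: "rd.msg.body",
      from: "alice", body: "hi" }
];
sampleDocs.forEach(megaviewMap);

var unseenKeys = queryMegaview(["rd.msg.seen", "seen", false]);
```

Each query selects on one property of one schema, which is the limitation that shows up as soon as a join is needed.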

Copies: While we have not actively tested it, we planned on using CouchDB's built-in replication support. This was seen as particularly valuable for master-master use cases: when I have a Raindrop CouchDB on my laptop and one in the cloud.

Problems Today

It feels like the old saying, "Features, Quality or Time, pick two", except for us it is "Extensible, Rollback, Querying or Copies, pick three". What we have now is an extensible system with rollback and copies, but the querying is really cumbersome.

One of the problems with the megaview: no way to do joins. For instance, "give me all twitter messages that have not been seen by the user". Right now, knowledge of a message being from twitter is in a different schema document than the schema document that knows if it has been seen by the user. And the structure of the megaview means we can really only select one property at a time on a schema.

So it means doing multiple megaview calls and then doing the join in application code. We recently created a server-side API layer in python to do this. So the browser only makes one call to the server API and that API layer does multiple network calls to CouchDB to get the data, then does the join merging in memory.
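The in-memory join amounts to something like the sketch below. The real API layer is written in Python; this version is in JavaScript only to stay consistent with the other examples in this post. The two input arrays stand in for the results of two separate megaview calls, and their shape is an assumption.

```javascript
// Result of a hypothetical "messages from twitter" megaview call.
var twitterRows = [
    { rd_key: "msg-1" }, { rd_key: "msg-2" }, { rd_key: "msg-3" }
];
// Result of a hypothetical "messages not seen by the user" megaview call.
var unseenRows = [
    { rd_key: "msg-2" }, { rd_key: "msg-3" }, { rd_key: "msg-4" }
];

// Keep the rd_key values that appear in both result sets.
function joinOnRdKey(rowsA, rowsB) {
    var inB = {};
    rowsB.forEach(function (row) {
        inB[row.rd_key] = true;
    });
    return rowsA.filter(function (row) {
        return inB[row.rd_key] === true;
    }).map(function (row) {
        return row.rd_key;
    });
}

var unseenTwitterKeys = joinOnRdKey(twitterRows, unseenRows);
```

Every request repeats this work, and each extra condition in the query means another network call to CouchDB before the join can run.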

Possible solutions

Save all schemas for a message in one document and more CouchDB views
Saving all schemas for a message in one document makes it possible to then at least consult one document for both the "type=twitter, seen=false" sort of data, but we still cannot query that with the megaview. It most likely means using more CouchDB views to get at the data. But views are expensive to generate after data has been written. So this approach does not seem to scale for our extensible platform.

This approach means taking a bit more care on rollbacks, but it is possible. It also increases the size of data stored on disk via Couch's append-only model, and will require compaction. With our existing system, we could consider just never compacting.

This is actually the approach we are starting to take. Mark is looking at creating "summary documents" of the data, but the summary documents are based on the API entry points, and the kind of data the API wants to consume. These API entry points are very application-specific, so the summary document generation will likely operate like just another back-end extension. Mark has mentioned possibly just going to one document to store all schemas for a message too.

However, what we have not sorted out is an easier join model: "type=twitter and seen=false". What we really want is "type=twitter and seen=false, ordered by time with most recent first". Perhaps we can get away with a small set of CouchDB views that are very specific and that we can identify up-front. Searching on message type and being seen or unseen, ordered by time seems like a fairly generic need for a messaging system.
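One of those very specific views might look roughly like this sketch. It assumes a single combined document per message with hypothetical type, seen and timestamp properties, which is not Raindrop's current layout. As in CouchDB, the map function calls emit(key, value); emit is stubbed here, and the query function only imitates what a descending key-range query on the view would return.

```javascript
// Stand-in for the emit() function CouchDB provides to map functions.
var timeViewRows = [];
function emit(key, value) {
    timeViewRows.push({ key: key, value: value });
}

// Map function: key on [type, seen, timestamp] so that one key range
// selects "this type, this seen state" already sorted by time.
function byTypeSeenTimeMap(doc) {
    if (doc.type && typeof doc.seen === "boolean" && doc.timestamp) {
        emit([doc.type, doc.seen, doc.timestamp], doc.rd_key);
    }
}

// Imitates querying the view for one type/seen pair, newest first.
function queryNewestFirst(type, seen) {
    return timeViewRows.filter(function (row) {
        return row.key[0] === type && row.key[1] === seen;
    }).sort(function (a, b) {
        return b.key[2] - a.key[2];
    }).map(function (row) {
        return row.value;
    });
}

var messageDocs = [
    { rd_key: "msg-1", type: "twitter", seen: false, timestamp: 100 },
    { rd_key: "msg-2", type: "twitter", seen: true, timestamp: 200 },
    { rd_key: "msg-3", type: "twitter", seen: false, timestamp: 300 },
    { rd_key: "msg-4", type: "email", seen: false, timestamp: 400 }
];
messageDocs.forEach(byTypeSeenTimeMap);

var unseenTweets = queryNewestFirst("twitter", false);
```

The cost is the one described above: the key order is baked into the view, so a different question (say, ordering by sender) needs another view and another pass over all the documents.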

However, it means that the system as a whole is less extensible. Other applications on the Raindrop platform need to either use our server API model of using the megaview then doing joins in their app API code (may not be so easy to learn/perform), or tell the user to take the hit waiting for their custom views to get up to date with all the old messages.

Something that could help: Make CouchDB views less painful to create after the fact. Right now, creating a new view, then changing any document means waiting for that view to index all the documents in the couch, and it seems to take a lot of resources for this to happen. I think we would be fine with something that started with most recent documents first and worked backwards in time, using a bit more resources at first, but then tailing off and doing it in the background more, and allow the view to return data for things it has already seen.

Do not use CouchDB
It would be very hard for us to move away from CouchDB, and we would likely try to work with the CouchDB folks to make our system work best with couch and vice versa. It is helpful though to look at alternatives, and make sure we are not using a hammer for a screwdriver.

Schema-less storage is a requirement for our extensible platform. Something that handles ad-hoc queries better might be nice, since we basically are running ad-hoc queries with our API layer now, in that they have to do all the join work each time, for each request.

Dan Goldstein in the Raindrop chat mentioned MongoDB. Here is a comparison of MongoDB and CouchDB. Some things that might be useful:
  • Uses update-in-place, so the file system impact/need for compaction is less, and storing all of our schemas in one document is likely to work better.
  • Queries are done at runtime. Some indexes are still helpful to set up ahead of time though.
  • Has a binary format for passing data around. One of the issues we have seen is the JSON encode/decode times as data passes around through couch and to our API layer. This may be improving though.
  • Uses language-specific drivers. While the simplicity of REST with CouchDB sounds nice, due to our data model, the megaview and now needing a server API layer means that querying the raw couch with REST calls is actually not that useful. The harder issue is trying to figure out the right queries to do and how to do the "joins" effectively in our API app code.
What we give up:
1) easy master-master replication. However, for me personally, this is not so important. In my mind, the primary use case for Raindrop is in the cloud, given that we want to support things like mobile devices and simplified systems like Chrome OS. In those cases it is not realistic to run a local couch server. So while we need backups, we probably are fine with master-slave. To support the sometimes-offline case, I think it is more likely that using HTML5 local storage is the path there. But again, that is just my opinion.

2) ad-hoc query cost may still be too high. It is nice to be able to pass back a JavaScript function to do the query work. However, it is not clear how expensive that really is. On the other hand, at least it is a formalized query language -- right now we are on the path to inventing our own with the server API with a "query language" made up of other API calls.

Persevere might be a possibility. Here is an older comparison with CouchDB. However, I have not looked in depth at it. I may ask Kris Zyp more about it and how it relates to the issues above. I have admired it from afar for a while. While it would be nice to get other features like built-in comet support, I am not sure it will address our fundamental issues any differently than say, MongoDB. It seems like an update-in-place model is used with queries run at runtime. But definitely worth more of a look.

Something else?

What did I miss? Bad formulation of the problem? Missing design solution with the tools we have now?

Wednesday, October 28, 2009

Blade, a JavaScript toolkit experiment

I am playing around with a different way (at least for me) to construct a JavaScript toolkit. It is called Blade, and you can follow it via the Blade GitHub repo.

I have had this on my local drive for a few weeks now, and the germ of it started with this post. I wanted to get it more polished, but best to get it up somewhere to get some feedback at least on the principles.

There is not much there now, basically a tiny amount of spaghetti code that is not really usable. However, I list out the guiding principles in the README.md, visible on the GitHub source tab.

Thursday, October 22, 2009

Raindrop, Open Messaging for the Open Web

I work for Mozilla Messaging, and we just opened the doors on Raindrop. Raindrop is the reason I had the opportunity to move to Vancouver, BC. It has been fun building it, seeing if we could get something to work.

Raindrop is still very much an experiment and not useful for any day-to-day work. However, it has potential and we need community help to take it further.

Why I like it:
  • It is driven by product design. We want an extensible platform, but a strong, simple product design will be driving much of the development.
  • It is not trying to be a message service in itself, but collect messages from existing services.
  • It is web-based: the default UI is plain HTML/JavaScript/CSS goodness.
  • It is frickin awesome to be able to play with your messages: data mine them, and do interesting display things using simple script languages like JavaScript and Python.
  • It is open: open source and motivated by the Mozilla Manifesto.
  • The Mozilla Messaging team is talented and smart. They are motivated and care about what is best for users.
I am driving the front end development for Raindrop. I used Dojo to create the infrastructure for the pages, using Dijit's dijit._Widget and dijit._Templated as a base for many of the UI widgets.

Dojo's dynamic code loader and Dijit's well-defined methods on widgets have enabled the slick things we are doing with in-place extension editing and updates. jQuery is also included in the page, mostly for extension developers, so you have a choice on what to use, Dojo or jQuery.

While I expect many of the decisions I made about how the front end works might change over time, it has been a joy to make what is there so far.

There is lots to do though. If you want to get involved with the code, the Hacking page is a good place to get started. There is a screencast of the architecture on the Raindrop home page. There is a Community page too.

Sunday, October 18, 2009

RunJS updated: module modifiers and function modules

I pushed some changes to allow lazy-loaded/lazy-evaluated module modifiers and also better support for modules that just define a function. The changes are documented in the documentation page.

I am experimenting with using JSLint as a code formatter. We'll see how it goes.

All these changes bring the size up to 3.1 KB minified and gzipped. I would like to be under 3 KB, but I want to be sure the right functionality is in place first before squeezing it down.

The module modifiers are a bit of an experiment. I wanted a way to keep a bunch of ugly, wordy code out of a module for the normal cases, and load that code only in the cases that need it. The example I give in the documentation is a module that gets DOM node dimensions and position. In standards mode, it is fairly compact, but in quirks mode it gets uglier. So I only want to load the quirks mode code if the page is in quirks mode. I never want to develop in a quirks mode page, but for a general JavaScript library it might be important.
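
The idea can be sketched without any loader at all. What follows is a minimal illustration, not the RunJS API: `applyModifiers`, `quirksModifier` and the plain-data `box` objects are hypothetical names, and `load()` stands in for fetching and evaluating a separate file.

```javascript
// Base module: the compact standards mode code. The box objects here are
// plain data stand-ins for real DOM measurements.
var position = {
    getContentWidth: function (box) {
        return box.cssWidth;
    }
};

// Hypothetical modifier. In a real loader, load() would fetch and evaluate
// a separate file; here it just returns the overrides.
var quirksModifier = {
    loaded: false,
    appliesTo: function (env) {
        return env.compatMode === "BackCompat";
    },
    load: function () {
        this.loaded = true;
        return {
            // Simplified quirks mode math: the CSS width includes padding
            // and border, so subtract them to get the content width.
            getContentWidth: function (box) {
                return box.cssWidth - 2 * box.padding - 2 * box.border;
            }
        };
    }
};

// Apply only the modifiers whose condition holds for this page.
function applyModifiers(module, modifiers, env) {
    modifiers.forEach(function (modifier) {
        if (modifier.appliesTo(env)) {
            var overrides = modifier.load();
            Object.keys(overrides).forEach(function (name) {
                module[name] = overrides[name];
            });
        }
    });
    return module;
}

// Standards mode page: the quirks code is never loaded.
applyModifiers(position, [quirksModifier], {compatMode: "CSS1Compat"});
```

In a browser, the `env` object would come from `document.compatMode`, which reports "BackCompat" for quirks mode pages.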

So I am still not sure if the module modifier approach is the right way to go, but I have used it a little bit so far on another project, and I will see how that works out.

Monday, October 12, 2009

RunJS updated: simple modules and i18n bundles

I pushed some changes to RunJS, the stand-alone JavaScript file/module loader.

Newest changes are support for simple module definitions and internationalization (i18n) bundles.

Simple module definitions are possible when there are no dependencies for the module. In that case the function wrapper for the module is not needed:

run(
    "my.simplething",
    {
        color: "red",
        size: "large"
    }
);
i18n bundle support is handy for separating out strings that might need to be translated into other languages. Quick example for a my/nls/colors.js that will define a bundle:

run(
    "my.nls.colors",
    [{
        "root": {
            "red": "red",
            "blue": "blue",
            "green": "green"
        },
        "fr-fr": "my.nls.fr-fr.colors"
    }]
);
Then define a file at my/nls/fr-fr/colors.js that has the following contents:

run(
    "my.nls.fr-fr.colors",
    {
        "red": "rouge",
        "blue": "bleu",
        "green": "vert"
    }
);
See the documentation for more information.

If you want to try out the latest code, you can use one of the following URLs to fetch the code:
In addition to the documentation, some of the test files might be of interest to see how RunJS can be used.

RunJS is still nice and small with these new features, around 2.7KB, when minified via YUICompressor and gzipped.

Tuesday, September 29, 2009

RunJS

I just created what I hope to be the next generation JavaScript module loader here: RunJS. The documentation has some history and reasoning behind it.

I would like to see it used for any project that needs JS code loading, particularly since it handles dependencies, can load regular JavaScript files, handles multiple versions of modules, and is a compact 2.2KB (minified and gzipped). It is JavaScript toolkit-agnostic and has no dependencies.

Enjoy!

Wednesday, September 23, 2009

Chrome Frame, or Browser vs. Renderer

Chrome Frame came out yesterday. I like the conversation it is trying to start. It makes a big difference having working code to get the conversation to seriously happen. There are some mechanics of the specific approach that probably need tweaking, but the general idea is worth consideration.

The big point is about separating the innovation over organizing the user's experience of the web in general (the Browser) vs. innovation in the display of web sites (the Renderer).

The Chrome Frame approach allows browser makers to preserve their revenue models and their UI interaction models, at least in the ideal -- there are some specific issues to work out with the Chrome Frame model, like password/form data storage. But I think the direction is a good one.

It also solves the issue where a web application was developed a few years ago and is not going to be maintained any more. If the browser allows multiple rendering engines it makes it possible for those old web apps to continue to work.

Old enterprise/business applications in particular can continue to work, but we still get to move the new web experience forward.

From a browser market share perspective, this clean split of responsibility could allow a newer browser like Google Chrome to get more market share. It could use the reverse idea: embed the IE render engine in Google Chrome. Then, take that to all the IT administrators and say, here is a newer, safer, more secure browser for your company that will work on older machines, is free, and can still allow your old business web apps to work.

That may not work, since most people are attached to their browser chrome vs. the actual renderer. In those cases, the plain Chrome Frame approach of installing a plugin for another renderer works to allow better, faster web app experiences.

One concern: user choice or control. I think it actually leads to more user control: the user gets to keep the browser chrome, their organizing model for the whole web, intact, but gets to use more of the better parts of the web. And for areas where the user does not have control now (in business environments that need old apps to work), it gives a way out to use more of the web.

Of course, there should be user controls so they can set preferences and overrides for the renderers used. At a minimum, business IT groups will need it to configure the browsers to use old renderers for older in-house business web apps.

So, some tweaks to the basic mechanics I would like to see (realizing that I have no concept how hard this work would be):

1) Make sure the Renderer works well with the Browser: for instance make sure that saved passwords/form data work well no matter what Renderer is used. Make sure the split is clean.

2) Change the UA-Compatible thing to be more Renderer-based feature sets vs. browser version numbers. So, something where the developer mentions the capabilities that are desired:

<meta http-equiv="X-UA-Compatible" content="addEventListener=1,svg=1">

If the developer has a suggested browser renderer, it could place that at the end, sort of how CSS font names can start with generic names, then get more specific:

<meta http-equiv="X-UA-Compatible" content="addEventListener=1,svg=1;gecko=1.9.2">

It would be good to *not* use browser names in the tag, but rather the renderer engine names/versions. Ideally though, just list capabilities.

The above syntax is not exactly right, but just to demonstrate the idea: focus on telling the browser the capabilities the page wants, and use render engine names, not browser names.
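
To show how a browser might read such a value, here is a minimal sketch that parses the content string into wanted capabilities and suggested renderers. It assumes the made-up syntax above (comma-separated name=value pairs, with a semicolon before the renderer suggestions); `parseCompat` is a hypothetical helper, not anything a browser implements.

```javascript
// Parse "name=value,name=value" into an object of strings.
function parsePairs(text) {
    var result = {};
    text.split(",").forEach(function (pair) {
        var parts = pair.split("=");
        var name = parts[0].trim();
        if (name) {
            result[name] = (parts[1] || "").trim();
        }
    });
    return result;
}

// Everything before the first semicolon is a capability list; anything
// after it is treated as a list of suggested renderers.
function parseCompat(content) {
    var sections = content.split(";");
    return {
        capabilities: parsePairs(sections[0]),
        renderers: parsePairs(sections.slice(1).join(","))
    };
}

var wanted = parseCompat("addEventListener=1,svg=1;gecko=1.9.2");
console.log(wanted.capabilities, wanted.renderers);
```

A browser could then pick the first installed renderer that supports every listed capability, and only fall back to the named renderer as a hint.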

3) Make sure the user can override the choices made by the browser. The pref control does not have to be obvious, but should be there, so the user has the final say.

In summary, as a conversation starter, I like that Chrome Frame has really tried to highlight the difference between upgrades in renderers vs. browser interface.


Sunday, August 02, 2009

Developer Tools and JavaScript Syntax Checking

I was using Coda to do web development, primarily because of the simple interface, nice Panic Sans font and integrated FTP and Subversion support. I normally just use TextWrangler for Dojo Core development.

However, while doing development at the new job, FTP/Subversion support was not needed, and I have grown tired of having to run my JavaScript in the browser to find basic syntax errors.

I saw David Ascher using Komodo Edit, the free editor from ActiveState, which has JavaScript syntax checking built in. Searching in the Tools, Add-Ons menu for Dojo also brought up a Dojo API Catalogs extension that allowed for autocomplete of Dojo APIs.

Komodo Edit has been mentioned in the Dojo community before as a nice option particularly for doing Dojo development, but I can get severe tunnel-vision, so it was not until recently that I really tried out Komodo Edit.

After I changed the key bindings to use the normal OS X COMMAND+] and COMMAND+[ for multi-line indenting, I was ready to go!

It makes a difference having syntax checking and Dojo API autocompletion. I believe other tools, like Emacs with js2-mode, can get at least syntax checking, but I am not an Emacs user. It would be neat to see Coda (and TextWrangler) add JavaScript syntax checking support. I can see JavaScript syntax checking being a minimum requirement for JS coders going forward.

As Bespin comes along, I can see it as my next developer tool upgrade, particularly since FTP support has been implicitly needed for some of my projects. With Bespin, I might be able to edit directly on the server.

For now I am enjoying the bump in productivity with Komodo Edit. Thanks ActiveState for making a nice tool available for free!

Wednesday, April 01, 2009

Custom local variables using the Dojo Loader

Recently, Dion Almaer was wishing for a way to avoid typing "dojo" so many times in the Bespin code. He appreciated the namespace protection that Dojo gave, but did not like the typing tax that came with it. He talked to Alex and Pete about it. Alex suggested a way, but it is very experimental.

Here is another way that leverages the strength of the Dojo Loader.

The Dojo Loader handles the work of loading your JavaScript modules. When you do a dojo.require("foo.bar"), the loader figures out that you want to load the foo/bar.js file and loads it for you. Normally the loader uses a synchronous XMLHttpRequest (XHR) call to load the file and uses eval() to bring the code into existence.
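
The name-to-file step can be sketched in a couple of lines. This is a simplified illustration, not the loader's real implementation: `moduleNameToPath` and its `baseUrl` parameter are hypothetical, and the actual loader also takes its own base path and any registered module paths into account.

```javascript
// Simplified mapping from a dotted module name to a file path.
// baseUrl is a hypothetical parameter; it defaults to the empty string.
function moduleNameToPath(moduleName, baseUrl) {
    return (baseUrl || "") + moduleName.split(".").join("/") + ".js";
}

console.log(moduleNameToPath("foo.bar")); // logs "foo/bar.js"
```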

There is an xdomain loader that does not use eval, and allows your code to be loaded from any domain (with dependencies properly loaded), but it requires a build step using the Dojo build tools.

Both versions of the loader (normal and xdomain) allow you to namespace your code -- so you can map dojo to mydojo, and even your own code to some other name. This allows you to load multiple versions of dojo and/or your code in a page. Nifty.

The Loader is able to support this scope mapping by wrapping your module in a function call like so:
(function(dojo, dijit, dojox){
    //Module code injected in here
})(dojo, dijit, dojox);
We can use this new scope created by this function to create local variables for your module.

Dojo's Loader can be modified to allow you to specify a set of local variable names that get injected with this function wrapping. And we can do this on a per-module prefix basis, so your code can have your own local variables, but some other module can specify a different set.

I did a prototype that works as follows: assume "coolio" is the namespace I use for my modules. I create a coolio.locals that has the following content:
dojo.provide("coolio.locals");

dojo.setLocalVars("coolio", {
    trim: "dojo.hitch(dojo, 'trim')",
    $: "dojo.hitch(dojo, 'query')",
    id: "dojo.hitch(dojo, 'byId')"
});
This code will create local variables called trim, $ and id that map to dojo.trim, dojo.query and dojo.byId respectively, but only for coolio* modules.
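
One way a loader could turn that map into local variables is to generate var declarations at the top of the wrapper function. The following is a minimal sketch under that assumption; `wrapModule` and the stand-in `fakeDojo` object are hypothetical and are not the patch attached to the ticket.

```javascript
// Stand-in for the real dojo object, with only what this sketch needs.
var fakeDojo = {
    trim: function (text) {
        return text.replace(/^\s+|\s+$/g, "");
    },
    // Simplified hitch: bind a named method to its scope object.
    hitch: function (scope, methodName) {
        return function () {
            return scope[methodName].apply(scope, arguments);
        };
    }
};

// Build the wrapper source: one var declaration per local, then the module.
function wrapModule(moduleText, localVars) {
    var declarations = Object.keys(localVars).map(function (name) {
        return "var " + name + " = " + localVars[name] + ";";
    });
    return "(function(dojo){\n" +
        declarations.join("\n") + "\n" +
        moduleText + "\n" +
        "})(dojo);";
}

// Module text that uses trim without declaring it.
var moduleText = "output.value = trim('  hello  ');";
var wrapped = wrapModule(moduleText, {trim: "dojo.hitch(dojo, 'trim')"});

// Evaluate the wrapped text with dojo and an output object in scope.
var output = {};
new Function("dojo", "output", wrapped)(fakeDojo, output);
console.log(output.value); // logs "hello"
```

Because the declarations live inside the wrapper function, they stay local to that one module and never touch the global scope.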

Then I have another module, called coolio.actions that uses these variables:
dojo.provide("coolio.actions");

coolio.actions = {
    init: function(){
        $("#trimButton").onclick(coolio.actions, function(evt){
            id("trimOutput").value = trim(id("trimOutput").value);
        });
    }
};
Notice that trim, id and $ were not declared in here. The final bit of magic is to use djConfig.require in the HTML page to auto-load the coolio.locals module before any other coolio code, so that the loader knows to create the local variables for any coolio.* module:
<script type="text/javascript" src="dojo/dojo.js" djConfig="require: ['coolio.locals']"></script>

See it in action with this sample page.

I created ticket #9032 to track the possibility of allowing this in the future. It also has the patch that can be applied to the Dojo source to get it to work. Or, if you are using Dojo 1.3.0, you can grab these built files with the changes: dojo.js or dojo.js.uncompressed.js

There are some caveats to this prototype:
1) We really need a build step that would inline the local variables for your build layers, so the code as-is will not work with custom build layers.
2) Does not work with xdomain builds yet; another tweak to the build system is needed for that.

If you think this might be useful for you, feel free to add your comments to the issue tracker entry or leave a comment.

Sunday, March 29, 2009

Job Transition

Friday was my last day at AOL, just shy of 13 years. There were good times, bad times. Some really cool things. Neat people and teachers. A great learning experience.

I am being a bit melodramatic, but Sinead O'Connor's Thank You for Hearing Me expresses the sentiment best, including the end of the song. For work and for life.

Monday I start at Mozilla Messaging. I am really excited about the work. It is still being defined, so nothing to share yet, but I look forward to helping people take charge of their messaging.

Saturday, March 28, 2009

Namespaces, Subjects and Verbs in JavaScript

It has been interesting to follow Dion Almaer's comments about using Dojo in Bespin. This post mentioning call conventions highlighted something I have wanted to talk more about.

The basic issue is namespacing and how you like to call functions. Here are some choices:
  1. subject.verb()
  2. namespace.verb(subject)
  3. namespace(subject).verb(), which can lead to namespace(subject).verb().verb().verb()
subject.verb()
#1 is the Prototype/Ruby way. You define a verb on the object's class, or in JavaScript, on its prototype.

In this model the namespace is really the subject's namespace. There is a possibility of collision with other code that wants to use the same verb. This can be mitigated by keeping your project small and focused, and managing your dependencies. This also works well when your code is a leaf node, or one step from the leaf node -- your code is not consumed by other code, except maybe a top level web application.

The benefit is a nice call structure, and I think fits better with normal English. The subject is identified and then a verb/action is performed with that subject.

Unfortunately, in JavaScript, there can be problems adding things to basic prototypes, like Array, String (and shudder, Object). In Dojo we have had to put in some protections in our code in case the page also uses code that modifies the built-in prototypes.

I believe the situation will improve in future JavaScript releases if added properties/verbs can be marked as not enumerable. That will help a bit, but namespace collision is still an issue.
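
A sketch of what that could look like, using Object.defineProperty from ECMAScript 5. The `Subject` constructor and its verbs are made up for illustration; a local constructor is used here instead of a built-in prototype so the example does not modify Array or String itself.

```javascript
function Subject() {
    this.name = "subject";
}

// Plain assignment: the verb is enumerable and leaks into for...in loops.
Subject.prototype.shout = function () {
    return this.name.toUpperCase();
};

// Object.defineProperty: the verb is callable but hidden from for...in.
Object.defineProperty(Subject.prototype, "whisper", {
    value: function () {
        return this.name.toLowerCase();
    },
    enumerable: false,
    writable: true,
    configurable: true
});

// Collect every key a for...in loop sees, inherited ones included.
function enumerableKeys(obj) {
    var keys = [];
    for (var key in obj) {
        keys.push(key);
    }
    return keys;
}

console.log(enumerableKeys(new Subject())); // "whisper" is not listed
```

Note that this only hides the verb from enumeration. Two libraries defining the same verb name would still collide.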

Some browsers now have native String.prototype.trim implementations and Dojo now delegates to them for Dojo's trim method. Recently, there was a bug filed for Dojo where the core issue was some other code adding a String.prototype.trim() method that did not strip out beginning whitespace.
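
One defensive option is to check that whatever trim is installed actually strips both ends before delegating to it. This is a minimal sketch of that idea, not how Dojo's trim is implemented; `makeTrim` is a hypothetical helper.

```javascript
// Return a trim function that delegates to candidateTrim only if it
// passes a quick sanity check; otherwise use a regular expression.
function makeTrim(candidateTrim) {
    var fallback = function (text) {
        return text.replace(/^\s+|\s+$/g, "");
    };
    if (typeof candidateTrim === "function" &&
            candidateTrim.call("  x  ") === "x") {
        return function (text) {
            return candidateTrim.call(text);
        };
    }
    return fallback;
}

// A trim like the one in the bug report: it only strips trailing whitespace.
var brokenTrim = function () {
    return String(this).replace(/\s+$/, "");
};

var safeTrim = makeTrim(brokenTrim);
console.log("[" + safeTrim("  hi  ") + "]");
```

The check costs one extra call at setup time, and it only catches the failure it tests for.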

namespace.verb(subject)
#2 takes the position that verb collision is bad and can cause hard to trace bugs, so always use your own namespace. It also feels more "functional" or procedural: use small functions that do not maintain interior state but operate on their arguments.

This has the benefit of being safer, but it can be more verbose than #1, and therefore more of a constant tax on the developer.

This is mitigated somewhat in Dojo, where you can assign Dojo to another namespace. So you could map "dojo" to "$" to cut down on the typing.

namespace(subject).verb()
#3 is a nice compromise: Define a function that wraps the real subject in an object and provides verbs to act on that subject, without directly modifying the object/class structure of the subject. jQuery took this to new heights by allowing chaining of the verbs. Similarly, dojo.query() returns a dojo.NodeList object that has chainable verbs on it.

An explicit namespace is involved, but the idea is to make it as small as possible, "$". So the overhead for having the namespace is "$()". Coupled with the chainable verbs, it can lead to short code, less of a tax on the developer.

For Dojo, I think #3 is the right approach in general: it gives nice namespace protection, but still gives short call structures.

I want to explore a dojo() function that does this verb mapping in general for all of Dojo. Dijit might be able to use a dijit() to get something similar for the verbs it exposes in its namespace.

So for instance: dojo({foo: "bar"}).clone() would act the same as dojo.clone({foo: "bar"}). Scopemap dojo to $ and you get $({foo: "bar"}).clone().
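
A minimal sketch of that verb mapping follows. It assumes a stand-in namespace `ns` with two simplified verbs (this `clone` is a shallow copy, unlike dojo.clone), and `makeWrapper` is a hypothetical helper. The sketch wraps each result so verbs can chain, which means it needs a `value()` call to unwrap at the end; that is a choice made for this sketch.

```javascript
// Stand-in namespace written in the namespace.verb(subject) style.
var ns = {
    clone: function (obj) {
        var copy = {};
        Object.keys(obj).forEach(function (key) {
            copy[key] = obj[key];
        });
        return copy;
    },
    mixin: function (obj, props) {
        Object.keys(props).forEach(function (key) {
            obj[key] = props[key];
        });
        return obj;
    }
};

// Build a namespace(subject) function whose result has one chainable
// method per verb found on the namespace.
function makeWrapper(namespace) {
    function Wrapped(subject) {
        this.subject = subject;
    }
    Object.keys(namespace).forEach(function (verb) {
        if (typeof namespace[verb] !== "function") {
            return;
        }
        Wrapped.prototype[verb] = function () {
            var extra = Array.prototype.slice.call(arguments);
            var result = namespace[verb].apply(namespace, [this.subject].concat(extra));
            return new Wrapped(result);
        };
    });
    // Unwrap the current subject at the end of a chain.
    Wrapped.prototype.value = function () {
        return this.subject;
    };
    return function (subject) {
        return new Wrapped(subject);
    };
}

var $ = makeWrapper(ns);
var original = {foo: "bar"};
var copied = $(original).mixin({baz: 1}).clone().value();
console.log(copied);
```

The subject's own prototype is never touched, so the collision problems of #1 do not apply, and the only global name needed is the wrapper function itself.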

The difficult part is how to deal with verbs/methods that deal with Strings. I want dojo("div") to actually call dojo.query("div"). But how to allow dojo(" some string ").trim()?

Maybe not map dojo("string") to dojo.query("string"), but instead, allow scope mapping of "d" (or even "_"?) to dojo and another mapping of "$" to dojo.query. That would probably match best with the expectations of what $ does today in other toolkits.

I will have to do more exploration. Eugene Lazutkin's work on providing adaptAs* functions for mixing in dojo methods into an object prototype as chainable methods might point the way to do this.