Javascript compressor
Let’s see if you are a real hacker.
Your problem: a somewhat slow web page, with lots of javascript code.
You can:
- ignore the problem
- activate mod_deflate in the server for javascript code (be careful with old browsers!)
- use a javascript compressor to remove any extra spaces, new lines, comments, etc.
- take an existing javascript parser, and make it rewrite your javascript code as above (without comments, spaces, etc.), also safely renaming internal variables.
- download the ECMA standard, build a full javascript parser from scratch, and make it rewrite your javascript code as above. Extra points if you implement some extra size optimizations.
After a full week-end working on this thing, I had a parser “almost” working. Some more evenings and I had a compressor, but without renaming variables (yet).
Now I’m trying to finish the parser. As always, the last 5% takes 90% of the time. My parser is compliant except for:
- Virtual semicolons
- Regular expressions
We all know that in javascript you have to separate statements with semicolons, but you can omit them in some cases. Among others, you can omit one if you separate the current statement from the next one with at least one new line, and the two statements combined as one would raise a syntax error. And actually somebody thought *that* thing would make javascript easier to understand (?!)
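To make the rule concrete, here is a small runnable illustration. The function name `demo` and the helper `f` are made up for this example; the point is only to show one newline that does get a semicolon and one that does not.

```javascript
function demo() {
  var calls = [];
  var f = function (x) { calls.push(x); return x * 10; };

  // A newline is enough here: "var a = 1 var b = 2" would be a syntax
  // error on one line, so a semicolon is inserted after "1".
  var a = 1
  var b = 2

  // No semicolon is inserted here: "f (a + b)" is a valid call, so the
  // two lines are read as "var c = f(a + b)".
  var c = f
  (a + b)

  return { c: c, calls: calls };
}

console.log(demo().c); // 30, because f was called with 3
```

The second case is the one that bites: the newline looks like a statement boundary, but since the combined text parses fine, it is not one.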
I have only modified my grammar so that it can add virtual semicolons before ‘}’ and before the end of file. These are the two most useful places where you can unambiguously omit the semicolon.
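As a rough sketch of that restricted rule (not the actual grammar of my parser, which is generated), imagine a hand-written helper that is called wherever a statement-ending ‘;’ is expected. The names `makeState`, `peek` and `consumeSemicolon`, and the plain-string token format, are assumptions made for the example.

```javascript
// Hypothetical parser state: an array of token strings plus a cursor.
function makeState(tokens) {
  return { tokens: tokens, pos: 0 };
}

// Returns the next token without consuming it, or null at end of file.
function peek(state) {
  return state.pos < state.tokens.length ? state.tokens[state.pos] : null;
}

// Called wherever the grammar expects the ';' that ends a statement.
// Returns 'real' if a semicolon token was consumed, 'virtual' if one was
// assumed before '}' or end of file, and throws otherwise.
function consumeSemicolon(state) {
  var next = peek(state);
  if (next === ';') {
    state.pos++;
    return 'real';
  }
  if (next === '}' || next === null) {
    return 'virtual'; // do not consume: '}' still closes the block
  }
  throw new SyntaxError("Expected ';' but found '" + next + "'");
}

console.log(consumeSemicolon(makeState(['}']))); // virtual
```

The newline case is deliberately missing here, because handling it needs the lexer to report line breaks between tokens.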
Regular expressions are also a bit hard to parse with a LALR(1) grammar. I’m thinking of matching a ‘/’ or ‘/=’ token where a primary expression is expected, and then switching the lexer so that it can scan the rest of the regular expression, all in the action of these two tokens. (At least that is what Rhino does.)
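A minimal sketch of the lexer side of that hack could look like the following. It assumes the parser has already decided that the ‘/’ at position `start` begins a regular expression (for a ‘/=’ token, the lexer would rewind to the ‘/’). The function name `scanRegExp` and its return shape are invented for the example, and this is not Rhino's code. Treating ‘/’ inside a `[...]` class as an ordinary character follows the later ECMAScript 5 grammar, which matches what browsers accept.

```javascript
// Scans a regular expression literal that starts at src[start] === '/'.
// Returns its body, its flags, and the index just past the literal.
function scanRegExp(src, start) {
  if (src[start] !== '/') throw new Error("scanRegExp must start at '/'");
  var i = start + 1;
  var inClass = false; // true while inside [...]
  for (;;) {
    var ch = src[i];
    if (i >= src.length || ch === '\n' || ch === '\r') {
      throw new SyntaxError('Unterminated regular expression');
    }
    if (ch === '\\') {
      i += 1;            // step onto the escaped character;
      continue;          // the next iteration validates it, then skips it
    }
    if (src[i - 1] === '\\' && isEscaped(src, start, i)) {
      i += 1;            // escaped character: never special
      continue;
    }
    if (ch === '[') inClass = true;
    else if (ch === ']') inClass = false;
    else if (ch === '/' && !inClass) break;
    i += 1;
  }
  var body = src.slice(start + 1, i);
  i += 1; // skip the closing '/'
  var flagsStart = i;
  while (i < src.length && /[A-Za-z]/.test(src[i])) i += 1;
  return { body: body, flags: src.slice(flagsStart, i), end: i };
}

// True if the character at index i is preceded by an odd number of
// backslashes (counting back no further than the opening '/').
function isEscaped(src, start, i) {
  var count = 0;
  for (var j = i - 1; j > start && src[j] === '\\'; j--) count++;
  return count % 2 === 1;
}

var src = 'x = /a\\/b[/]c/gi;';
console.log(scanRegExp(src, 4)); // { body: 'a\\/b[/]c', flags: 'gi', end: 16 }
```

After the call, the normal lexer resumes at `end`, so the parser never sees the inside of the literal as division operators.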
If I fail, I will rewrite the parser as an LL(1) one. I will have the same problems, but this time the parser will be hand made, and thus I should be able to put these hacks in wherever I need them.
The good news is that my code is fully parsed and written back correctly, except for the two regular expressions I use. I will then start working on more advanced compression features, not yet available anywhere else.
I will keep you posted!
Am I the only one around who thinks option 2 is the only good choice of the above? Options 3 onward don’t primarily address the speed problem, they put an end to the openness and reusability feature that makes the web such a good place to be in and part of in the first place.
Have you measured gains (okay, drops) in page weight, using both techniques? Just starting out with some imperfect sluggish hack such as the BrainJar crunchinator (which does not treat regexps, or even comments, very well) and comparing the results with what gzip -9 does to the same page? Last time I tried, gzip won by a margin.
Hi Johan,
Nope, I guess you are not the only one who thinks so
But I do not think removing extra stuff in javascript substantially changes the openness of the web. You are already required to respect the copyright of javascript code, and there are a lot of pretty printers for javascript out there. In fact, I will ship a pretty printer along with the compressor.
Sure, it will not bring back comments and the original names, but it is more than enough to make javascript code understandable. For instance, I regularly read / modify google maps code without even running it through a prettifier.
As you said, compressing javascript code with mod_deflate wins over removing extra stuff hands down. But these are cumulative wins. Whether you should do the “removing extra stuff” pass depends on your particular situation. And to some people the slight obfuscation provided by this pass is a feature instead of a bug.
The thing is that I am mostly interested in the “extra size optimizations” I hinted at above, in point 5. I am using prototype.js and script.aculo.us, and these libraries are huge. I want my tool to remove the unused parts of these libraries, and collapse all the used files into a single one. That is why I went all the way to write my own parser.
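To give an idea of what “remove unused parts” could mean, here is a minimal sketch, assuming the parser has already produced a table of which function refers to which. The table format, the function names in it and the helper `reachable` are all hypothetical. It is also only safe if nothing is looked up dynamically (through `eval` or computed property names), which real libraries do not guarantee.

```javascript
// deps maps each function name to the names it references.
// roots are the names the page itself calls.
// Returns the Set of names that must be kept.
function reachable(deps, roots) {
  const seen = new Set();
  const stack = roots.slice();
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue;
    seen.add(name);
    for (const dep of deps[name] || []) stack.push(dep);
  }
  return seen;
}

// Made-up dependency table, for illustration only.
const deps = {
  initPage: ['findById', 'fadeIn'],
  fadeIn: ['setOpacity'],
  findById: [],
  setOpacity: [],
  dragAndDrop: ['findById'], // never called from the page
};

const keep = reachable(deps, ['initPage']);
console.log(Object.keys(deps).filter((name) => !keep.has(name))); // [ 'dragAndDrop' ]
```

Everything outside the returned set could then be dropped before the remaining code is written out as one file.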
Currently, downloading 4 or 5 javascript files at the same time puts a bit of stress on browsers, and I want the front page to show as fast as possible.
I know there is a balance to keep here. Having javascript split into several files helps if the base ones are cached across the whole website. However, I hope the end result will be so small that the total load time is dominated by the number of files rather than their size, but I will certainly have to measure it.
Lastly, and I should confess this was the main reason to start the parser, it looks like a cool project with some potential. I may grow it into a full javascript engine, and coupled with an html parser it may be a nice tool to debug some javascript problems that are not easy to debug with today’s tools, such as memory leaks in some browsers, or to get warnings when you use features that will not run on old browsers, and so on.
Btw, I have already fixed the parsing of the regular expressions with the hack I outlined above, and this morning I had an idea to insert virtual semicolons as needed, let’s see if it works…
Thank you for your comments Johan!
Cheers,
Oh, lookie; there is a comments feed; hooray!
*subscribes*
(I sloppily missed it earlier as there was no autodetect code in place for it.)
I’ve been poring through and tweaking the Google Maps code numerous times too, but it has always been a high time cost endeavour, at least on the levels I have been digging; the widget creation code, event, image+transparency and maptile handling subsystems, to name a few places where very few publicly named methods, and above all parameter names, are strewn about. It’s just as painful every time. I probably would not be as passionate about this if I had not been through any of that.
But you are right about your tools in development being interesting from a higher level of development perspective; there are lots of things they could do that presently available javascript creation supporting toolkits do not offer.
JavaScript Tools find at http://www.yaodownload.com/web-authoring/javascript/, it contains the hot tools.
Did you try the Dojo compressor? http://dojotoolkit.org/docs/compressor_system.html - it’s based on the Rhino engine. From a pragmatic point of view, I have to agree with Johan: gzip is the ultimate solution to the JavaScript size problem.
JavaScript library in a rather big project (using prototype and scriptaculous):
original script file: 369 K
compressed with gzip -9: 83 K
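Those figures work out to 83 / 369, roughly 22% of the original size. For anyone who wants to measure their own files the same way, here is a small sketch for Node.js using its built-in `zlib` module; the function name `gzipReport` and the sample input are made up for the example.

```javascript
const zlib = require('zlib');

// Returns the size of a script before and after gzip at level 9,
// plus the ratio between the two.
function gzipReport(source) {
  const original = Buffer.byteLength(source, 'utf8');
  const gzipped = zlib.gzipSync(Buffer.from(source, 'utf8'), { level: 9 }).length;
  return { original: original, gzipped: gzipped, ratio: gzipped / original };
}

// Made-up, highly repetitive input; a real script will compress less well.
const sample = 'function add(a, b) { return a + b; }\n'.repeat(200);
console.log(gzipReport(sample));
```

Running it once on the original file and once on the output of a compressor gives the comparison discussed above.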
Hi Philipp,
yes, it’s pretty good. Let’s say I had more fun building mine
As for the gzip suggestion, the problem is that if you gzip your javascript, IE (at least up to version 6; I don’t know about version 7) will not always be able to use it. Sometimes it works, sometimes it’s not able to execute the javascript, and sometimes it even crashes.
So even if you gzip your javascript, that will be of no use to most of your users.