Monday 2 March 2015

Don't worry about finishing, focus on starting.

Big projects are so hard to finish.

That's why I never try to finish them. I focus on starting them, and if I stop, I focus on starting again.

The truth is that you'll never be able to work on something and guarantee that you won't be stopped or interrupted by something. That's just life.

So concentrating on finishing something is counterproductive from the get-go, because you're doomed to fail.

You're a hundred times better off if you focus on starting and allowing the progress you make each time you start to accumulate.

It's been said that truly great projects are never finished anyway, only abandoned. So starting often and building upon your previous work is the best way to get a big project over the line, because as the project gets more and more complete it becomes harder to find new things to start. And when you get to the point that there is no task to start, you can happily say the project is as done as it needs to be.

One benefit of multiple little starts is that you get to set yourself small achievable goals... like setting up source control, wireframing the project, setting coding standards, etc. (yes, I'm talking about programming jobs here, but the theory applies to anything).

Another benefit of starting often is that each time you get to look at the project with fresh eyes. The project and the requirements may have changed since you first started it, and at each new start you get to re-set your game plan in line with reality.

The software development methodology called "Scrum" is based almost entirely on the principle of re-evaluating and re-starting your project repeatedly over its life cycle.

Starting lots of times also gets you into the habit of thinking in terms of tasks you actually have a chance to finish before life gets in the way. This is good because it helps you think about your work in achievable chunks, or bite-sized pieces if you prefer.

If you find yourself starting a project at the same point all the time, then you're not carrying over your progress and you need to start thinking about ways to work on smaller tasks with persistent benefits.

Finally, remember that it's OK to restart a project after a long hiatus. I know so many personal projects that get abandoned because the person stops and then thinks of the project as having failed or expired.

But starting a project is like getting onto a horse, if you get bucked off, you get on the horse again just like you start a project again. The reason you stopped doesn't matter.

I've worked on many projects that finished while I wasn't looking: you look round and realise that you've met all the fitness criteria and just hadn't noticed till now. Starting a project, however, is not something that happens without personal engagement and choice.

Start now, start often, let finishing take care of itself.



 

Client-Side Includes without JQuery

So the other day I was working on a flat web project, with no server-side functionality and no option to publish/compile, but I really wanted to use includes to help manage my code and allow me to edit common code in a single location instead of almost twenty.

What to do?

After a little bit of complaining about how unfair the universe is I sat down and looked for a solution.

I wasn't the first person to see the need for HTML client-side includes, and a quick google will show up several projects out there, but they all require a framework (AngularJS, JQuery, etc.) and I really wanted to avoid using a framework if I could. Partly for size constraints, but mostly because I couldn't see why a simple client-side include should need a framework.

So I ended up writing my own called Syringe because it injects client-side code.

The first shock was that synchronous ajax calls had been deprecated on the main page rendering thread. No more render-blocking ajax for all of us who liked to live dangerously and thumb our noses at best-practice JavaScript!

So realistically I had to post process the DOM after the page load, make an asynchronous call for the content and paste it in.

I started with a standard scope wrapper:


(function(global) {
    // global is window when running in the usual browser environment.

    "use strict";

    if (global.synject) { return; } // Syringe already loaded

    global.synject = function () {
        return true;
    }


})(this);


This is where I start a lot of my libs: it wraps up scope so I don't have to worry about clashing vars, and gives me a way to stop the lib from being loaded more than once. Don't worry, I had other plans for the synject function, but that's for later...

The next step for me was to create an XMLHttp object (syntax depending on the browser). As always when you need to run code in different environments, don't attempt to identify the environment (the browser in this case); focus on testing for the functionality, as it's a much more stable approach.


    ...

    // Generate xmlHttp
    function getXmlhttp() {
        if (typeof XMLHttpRequest !== 'undefined') {
            return new XMLHttpRequest();
        }
        var xobjects = [
            "MSXML2.XmlHttp.5.0",
            "MSXML2.XmlHttp.4.0",
            "MSXML2.XmlHttp.3.0",
            "Microsoft.XmlHttp"
        ];
        var xmlhttp;
        for(var i = 0; i < xobjects.length; i++) {
            try {
                xmlhttp = new ActiveXObject(xobjects[i]);
                break;
            } catch (e) {
            }
        }
        return xmlhttp;
    };

    // Get include content via ajax
    function callXmlhttp(url, callback) {
        var x = getXmlhttp();
        x.open("get", url);
        x.onreadystatechange = function() {
            if (x.readyState == 4) {
                callback(x.responseText)
            }
        };
        x.send()
    };

    ...


Quite frankly this code is a little overkill and could be smaller, but most browsers will never get past the first two lines of the detection code, and the ajax wrapper just means I can pass in a URL to get and a function to call when the response comes through. I should add some error handling later on, but for the time being this will do.
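
The error handling mentioned above could look something like the sketch below. It is not part of Syringe as shown here: `callXmlhttpChecked`, the `onError` callback and the injected `createXhr` factory are hypothetical names, and it assumes the includes are served over HTTP, so that a 2xx status means success.

```javascript
// Hypothetical variant of callXmlhttp with basic error handling.
// createXhr is any function returning an XMLHttpRequest-like object
// (in the code above that would be getXmlhttp).
function callXmlhttpChecked(createXhr, url, onSuccess, onError) {
    var x = createXhr();
    x.open("get", url);
    x.onreadystatechange = function () {
        if (x.readyState !== 4) { return; } // request not finished yet
        if (x.status >= 200 && x.status < 300) {
            onSuccess(x.responseText);
        } else if (typeof onError === "function") {
            onError(x.status, url); // let the caller decide what to do
        }
    };
    x.send();
}
```

Passing the factory in as a parameter is only there to make the function easy to try with a stand-in object; inside the library it could just call getXmlhttp() directly.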

Now I wanted users to be able to use any kind of path that the browser could recognise, so I dropped in a function that uses the browser's brains to give me the fully qualified URLs.


    ...

    // Get URL path information
    function getFullURL(link) {
        var parser = document.createElement('a');
        parser.href = link;
        return parser.href;
    }

    ...


You can pass that code "/somefile.html", "../../somefile.html", "somefile.html", "//www.somedomain.com/somefile.html" or any other valid asset path and it will return the full path the browser would use to get to it.
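
In browsers that provide the standard `URL` constructor, the same resolution can be sketched without touching the DOM. This is an alternative shown for illustration, not what Syringe does: `resolveURL` and the example base address are made up, and Internet Explorer has no `URL` constructor, which is one reason the anchor element trick above is still handy.

```javascript
// Resolve a link against a base address, the way a browser would.
// In a page, document.baseURI is the natural base; here it is passed in.
function resolveURL(link, base) {
    return new URL(link, base).href;
}

var base = "http://example.com/a/b/page.html"; // hypothetical page address
console.log(resolveURL("/somefile.html", base));      // http://example.com/somefile.html
console.log(resolveURL("../../somefile.html", base)); // http://example.com/somefile.html
console.log(resolveURL("somefile.html", base));       // http://example.com/a/b/somefile.html
console.log(resolveURL("//www.somedomain.com/somefile.html", base)); // http://www.somedomain.com/somefile.html
```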


The last bit of browser juggling was a function to get DOM elements. Again, check for the function, not the browser.


    ...

    // Get elements
    function getElements(root, type) {
        if( !root.querySelectorAll ) {
            return root.getElementsByTagName(type);
        }else{
            return root.querySelectorAll(type);
        }
    }

    ...


Here I pass a root element and an element name, e.g. var domset = getElements(document, 'div');

This is where the rubber met the road. I added a DOMContentLoaded listener to the document and a method to search for any divs with a "synject" attribute and process them.


    ...

    // Add an event listener to the document
    document.addEventListener('DOMContentLoaded', function () {
        loadSynjectors();
    });


    // Search root for divs with synject tagging
    function loadSynjectors(root) {
        var synjectors = getSynjectors(root)
        if (synjectors != null) {
            for (var i = 0; i < synjectors.length; i++) {
                var url = getFullURL(synjectors[i].getAttribute("synject"));
                if (synjectors[i].getAttribute("cache") == "false" || !cache[url] ) {
                    // get file
                    fetchContent(url, synjectors[i]);
                }
                else {
                    // use cache
                    synjectors[i].innerHTML = cache[url];
                    synjectors[i].setAttribute("synjected", "true");
                    loadSynjectors(synjectors[i]);
                }
            }
        }
    }

    // Build call to fetch and handle content
    function fetchContent(url, target) {
        callXmlhttp(url, function() {
            return function(content) {
                target.innerHTML = content;
                target.setAttribute("synjected", "true");
                loadSynjectors(target);
                cache[url] = content;
            }
        }());
    }

    // Get elements by tag and attributes
    function getSynjectors(root) {
        var divs = [];
        var synjectors = [];
        root = (root != null)?root:document;
        divs = getElements(root, 'div');
        for (var i = 0; i < divs.length; i++) {
            if (divs[i].hasAttribute("synject") && !divs[i].hasAttribute("synjected")) {
                synjectors.push(divs[i])
            }
        }
        if (synjectors.length > 0) {
            return synjectors;
        }
        else {
            return null;
        }
    }


    ...


Crude but effective: the HTML was retrieved and pasted into the body inside the div elements. JavaScript includes were a problem, however, because script tags inserted via innerHTML are not executed. So I added a new function I could call after pasting in the HTML; it examines the innerHTML of an element (div), detects script tags and evals their contents.

    ...

    // Simulate the normal processing of Javascript
    function executeJS(target){
        var scriptreg = '(?:<script(.*)?>)((\n|.)*?)(?:</script>)'; //Regex <script> tags.
        var match = new RegExp(scriptreg, 'img');
        var scripts = target.innerHTML.match(match);
        if (scripts) {
            for (var s = 0; s < scripts.length; s++) {
                var js = '';
                var match = new RegExp(scriptreg, 'im');
                js = scripts[s].match(match)[2];
                window.eval('try{' + js + '}catch(e){}');
                target.innerHTML = target.innerHTML.replace(scripts[s], '')
            }
        }
    }

    ...


That seemed like a good first step, but it broke whenever it encountered a document.write... well, "broke" may be the wrong word: calling document.write after the page has loaded replaces the entire contents of the page... but I consider that "broke".

And yes, I know document.write should be avoided, but there are still libs that rely on it and I didn't want my code breaking because other people hadn't updated their approach. The solution is quite simple: I monkey patch the offending function by storing it in a local variable and redefining it with a function that simply appends the string it is given to a variable. Then, after the eval, I replace the script tag in the source with the collected output, flush the variable for the next script, and finally restore document.write.

    ...

    // Simulate the normal processing of Javascript
    function executeJS(target){
        var scriptreg = '(?:<script(.*)?>)((\n|.)*?)(?:</script>)'; //Regex <script> tags.
        var match = new RegExp(scriptreg, 'img');
        var scripts = target.innerHTML.match(match);
        var dwoutput = '';
        var doc = document.write; // Store document.write for overloading.
        document.write = function (content) {
            dwoutput += content;
        };
        if (scripts) {
            for (var s = 0; s < scripts.length; s++) {
                var js = '';
                var match = new RegExp(scriptreg, 'im');
                js = scripts[s].match(match)[2];
                window.eval('try{' + js + '}catch(e){}');
                target.innerHTML = target.innerHTML.replace(scripts[s], dwoutput)
                dwoutput = '';
            }
        }
        document.write = doc; // Restore document.write
    }

    ...
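
As an aside, the same store, replace and restore idea can be written so that the original function always comes back, even if the wrapped code throws. The helper below is only a sketch: `captureWrites` is a hypothetical name, and it takes the document-like object as a parameter so it can be tried outside a browser.

```javascript
// Temporarily replace doc.write, run fn, and return whatever fn "wrote".
function captureWrites(doc, fn) {
    var original = doc.write; // keep the real function
    var output = "";
    doc.write = function (content) {
        output += content;    // collect instead of writing to the page
    };
    try {
        fn();
    } finally {
        doc.write = original; // always restore, even on error
    }
    return output;
}

// Usage with a stand-in object; in a page you would pass document.
var fakeDoc = { write: function () { throw new Error("page replaced!"); } };
var html = captureWrites(fakeDoc, function () {
    fakeDoc.write("<b>hello</b>");
    fakeDoc.write(" world");
});
console.log(html); // <b>hello</b> world
```

Unlike the try/catch inside the eval string above, this version lets errors from the wrapped code propagate to the caller; whether that is desirable depends on how forgiving you want the include to be.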


The patched executeJS was closer but still not perfect. Not all script tags contain code to be executed directly; some simply point to a file to include.

Now the reflex solution here is to add a script element to the head, where they belong, but again I found myself thinking about legacy code and the best way to support it. So I added a method that would drop in a script element as a child of the element the code was being inserted into, and I upgraded my executeJS code to correctly detect remote scripts.

    ...

    // Append a script element to the target element
    function appendScript(target, path, type) {
        type = (type)? type:"text/javascript";
        var js = document.createElement("script");
        js.type = type;
        js.src = path;
        target.appendChild(js);
    }

    // Simulate the normal processing of Javascript
    function executeJS(target){
        var scriptreg = '(?:<script(.*)?>)((\n|.)*?)(?:</script>)'; //Regex <script> tags.
        var match = new RegExp(scriptreg, 'img');
        var scripts = target.innerHTML.match(match);
        var dwoutput = '';
        var doc = document.write; // Store document.write for overloading.
        document.write = function (content) {
            dwoutput += content;
        };
        if (scripts) {
            for (var s = 0; s < scripts.length; s++) {
                var js = '';
                var match = new RegExp(scriptreg, 'im');
                var src = null;
                var type = '';
                var tag = scripts[s].match(match)[1];
                if (tag) {
                    type = tag.match(/type=[\"\']([^\"\']*)[\"\']/);
                    if (type) {
                        type = type[1];
                    }
                    src = tag.match(/src=[\"\']([^\"\']*)[\"\']/);
                }
                if (src) {
                    src = src[1];
                    appendScript(target, src, type);
                    target.innerHTML = target.innerHTML.replace(scripts[s], '')
                }
                else {
                    js = scripts[s].match(match)[2];
                    window.eval('try{' + js + '}catch(e){}');
                    target.innerHTML = target.innerHTML.replace(scripts[s], dwoutput)
                    dwoutput = '';
                }
            }
        }
        document.write = doc; // Restore document.write
    }

    ...


And my code was almost done. I went back and added in a queue so that if the same code was being included in 20 different places and the first include fetch had not finished processing it would simply add the other destinations in without kicking off another 20 http calls.
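
The queue can be pictured roughly like this. It is a simplified sketch rather than the code in Syringe: `requestInclude`, `pending` and the `fetcher` parameter are hypothetical names, and `fetcher(url, callback)` stands in for the ajax wrapper shown earlier.

```javascript
var pending = {}; // url -> list of targets waiting for that url

// Ask for url to be injected into target, fetching each url only once
// while a request for it is still in flight.
function requestInclude(url, target, fetcher) {
    if (pending[url]) {
        pending[url].push(target); // a fetch is already running, just wait
        return;
    }
    pending[url] = [target];
    fetcher(url, function (content) {
        var targets = pending[url];
        delete pending[url];       // request finished, stop queueing
        for (var i = 0; i < targets.length; i++) {
            targets[i].innerHTML = content;
        }
    });
}
```

Twenty calls for the same URL made before the response arrives result in a single call to fetcher, and all twenty targets are filled when it returns.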

Finally I added spans to the supported elements (they have less formatting impact than divs) and I threw in one last method to allow includes to be programmatically called on any element by id.

    ...

    // Force an include to a target DOM element
    global.synject = function (target, path) {
        if (!target) {return;}
        if (typeof target == 'string' || target instanceof String) {
            target = document.getElementById(target);
        }
        var url = getFullURL(path);
        if (que[url]) {
            que[url]["targets"].push(target);
        }
        else {
            que[url] = {};
            que[url]["targets"] = [];
            que[url]["targets"].push(target)
            fetchContent(url);
        }
    }

    ...


Told you I had plans for the global.synject() function...

The project is up on GitHub if you're interested, and I already have a couple of enhancements in mind (recursive scripts put in place by document.write calls will not work, for instance). Plus, of course, I welcome any suggestions or bug fixes.

Once you minify the code it's only about 2.2kb and closer to 1.2kb zipped. While it's far from perfect or the most beautiful code I've ever written, it does get the job done and I've hit no problems with it so far.

Sunday 1 March 2015

Don't grind down, build up...

I have a friend...

(hard to believe I know, but some people have low standards and don't find me too irritating...)

This friend is facing a 5 week development "death march" working nights and weekends to reach a very difficult deadline.

I sympathise, I really do; my own backlog has tripled in size in the last two working days.

But there is a difference in how we approach things, and it reflects in our outlook.

My friend "grinds down", I "build up".

By "grind down" I mean that my friend sees the work as a grind, lots of the same tasks repeated over and over again. At the end of a day they'll turn round and count how many lines of code they wrote and lament that they'll have to do the same tomorrow.

I'm different.

While far from being an optimist, I see each little task as "building up" towards my goal. Each step no matter how small is progress towards something. I let myself get a kick out of completing small tasks and I don't think about how many more I have to do.

When planning a project you need to be far-sighted to see the big picture and have a sense of the history of what you're doing... that's a given.

However, when actually working on a project, short-sighted is good, and it's emotionally much more helpful if you don't think about how much more work there is to do, but rather gratify yourself on how much work you've just done.

I'm not "chipper" about it, and I don't sit there giggling to myself and smiling over a simple task completed, but I do allow myself a moment of self satisfaction before going onto the next task.

It's not about being proud of the small things. It's about giving yourself permission to feel good a few times during the day - especially when you have lots of work still to do.

It's like the difference between building a road or building a tower:
When you're building a road, each day you turn and see empty ground going off to the horizon that you still have to pave, never feeling that you've made any progress. When you're building a tower you can't see all the tower you haven't built yet, you only see that the view is better every day...
It's not denial or self-delusion. I know there is more work to do, I know how much more work there is to do, I just choose to enjoy the view from up here.

Tuesday 24 February 2015

Life beyond JQuery...

I knocked up a little project the other day (here) and threw it up onto GitHub but the first response I got from anyone was "Why didn't you do it in JQuery?"


* To keep the size down

In the best of all possible worlds you can get JQuery down to 28k; the simple minified version, however, is closer to 84k. For the functionality I actually wanted this was way over the top, and I didn't need anything else that would drive me to include it.
Also, while my code would have been smaller had I used JQuery, it would not have been that much smaller - mainly because I wanted some very specific behaviours.

* To reduce complexity

Every additional file you need to add into a project is one more thing the user of your code needs to keep track of. It sounds silly, but I've handled more than my share of support requests from people who never bothered to check dependencies.

* To make it load independent

When trying to optimise your code for speed, it becomes a pain when you need to block rendering just to make sure your JS files load in the order they require so you won't get any errors.

* Because I didn't need it

In all honesty the syntax difference between using the JQuery element selector and doing it manually is tiny. If you've been programming JQuery since you started then you might find it difficult, but if you know how JQuery does things, it's actually not that hard to do without it. 


Really the last answer is the only one I need. There was no reason for me to use JQuery, no functionality that could not be easily done without it and no issue of cross-browser compatibility that could not be solved with some very simple code.

I'm not a purist. I use JQuery all the time in various projects, but sometimes it's the proverbial sledgehammer to crack a walnut. People need to remember that JQuery is a library that extends the DOM, not a default upgrade to all browsers.



Thursday 27 November 2014

Pain is the mother of invention...

The last few weeks have been "interesting" for me. I have some problems with my right shoulder causing me a great deal of pain and immobility.

But nothing comes from nothing, and not being able to use my right hand for much led me to write a typing trainer to help me learn to type with my left hand.

The tool had several keyboard layouts including a left-hand only layout.

Comments/improvements welcome.

Tuesday 18 November 2014

Big Data is not just about size...

The term "Big Data" gets used a lot these days, and some people think you only have Big Data problems if you have billions of records and petabytes or even zettabytes of information in your system. In fact most articles I've seen start off by saying how much data modern systems have to deal with.

However, Big Data is not only the domain of the Googles of this world. You can have Big Data problems in projects that don't have that much data.

In 2001 a paper on data management coined the 3 V's: Volume, Velocity, and Variety.

Volume is simply the size of the data, in raw bytes or number of records. A million records is not that many if they are simple and small; if they are large and hard to index, however, multimillion-record databases can become troublesome.

Velocity is the speed at which the data is being collected. Website logs are a great example of this: they can generate thousands and thousands of new records every hour, even for relatively small sites.

Variety is the complexity of the data, anyone who has written SQL knows that even small databases can become sluggish if you try to link too many tables in a single query.

Today some people like to add other "Vs" to the mix:

Veracity is about biases and "noise" in the data, and how meaningful the method of analysis is to the data. Any statistical analysis will have the odd result that falls way outside the line of best fit for the rest of the data. How often this occurs and how you have to deal with it can affect your data solution.

Validity is similar to veracity and tends to deal with the applicability of the data to the question being asked. For example, web proxies and caches mean server logs are not a true reflection of actual page views.

Volatility may sound like speed but rather than how fast it's coming in, volatility deals with how fast the data gets out of date. Trading prices for shares are out of date almost before you get the data, with some companies doing micro trades and relying on millisecond differences in buy and sell times.

and finally...

Value is a measure of how much the data is worth in real world terms. This is often the most important factor because it drives the business.

Now if any of these factors reaches the point where it requires developers to introduce special strategies, processes, or systems in order to manipulate and store the data, then you have just crossed into the world of Big Data.

So the "big" in Big Data refers more to the requirements of the data than the data itself.

That's it, Big Data explained. You're welcome.

Now as far as solutions to the problems of Big Data...

You can take your pick, because frankly there are as many Big Data solutions as there are Big Data problems.

The technologies that most projects encounter first are storage and retrieval specific, like clustering, sharding, distributed storage, cloud based infrastructure, search based applications, key-value stores and other NoSQL approaches to data storage.

Then as you approach the more complex tasks of analysis and manipulation you start to hit distributed computation, genetic and machine learning algorithms, signal processing, time series analysis, wavelets and other semi predictive techniques.

In the end Big Data is simple: it happens when the requirements of your data application go beyond the systems you're currently using, and the solution is that you need to evaluate those systems, starting with how you store and organise your data.

Thursday 13 November 2014

Back in the saddle...

After what seems like ages I finally have the time to sit down and blog again without running out of time or covering topics that are sensitive to my clients... Let's hope I get a solid run before circumstances change again :)