Saturday, September 29, 2018

Agile vs. waterfall

Another comment reply that outgrew the comment box.  Earl comments:
This speaks to my prejudice in favor of technique over technology. And the concept of agility seems to be just the attitude of any good designer that your best weapon is your critical sense, and your compulsion to discard anything that isn't going to work.
To which I would reply:

Sort of ... "agile" is a term of art that refers to a collection of practices aimed at reducing the lag between finding out people want something and giving it to them.  Arguably the core of it is "launch and iterate", meaning "put your best guess out there, find out what it still needs, fix the most important stuff and try again".

This is more process than design, but there are definitely some design rules that tend to go with agile development, particularly "YAGNI", short for "You ain't gonna need it", which discourages trying to anticipate a need you don't yet know that you have.  In more technical terms, this means not trying to build a general framework for every possible use case, but being prepared to "refactor" later on if you find out that you need to do more than you thought you did.  Or, better, designing in such a way that later functionality can be added with minimum disruption to what's already there, often by having less to disrupt to begin with, because ... YAGNI.
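YAGNI in miniature might look something like this (the function names are made up for illustration): write the concrete thing you need today, and refactor toward generality only when a second use case actually shows up.

```python
# Today's actual requirement: a CSV report.  No speculative
# "every format ever" framework -- YAGNI.
def report_csv(rows):
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)

# Later, if another format turns out to be a real need, refactor then:
def report(rows, fmt="csv"):
    if fmt == "csv":
        return report_csv(rows)
    raise NotImplementedError(fmt)  # add formats when they're actually needed
```

The second function is the kind of "minimum disruption" refactoring mentioned above: the original code is wrapped, not rewritten.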

Downey refers to "agile" both generally and in the more specific context of "agile vs. waterfall".  The "waterfall" design process called for exhaustively gathering all requirements up front, then producing a design to meet those requirements, then implementing the design, then independently testing the implementation against the requirements, fixing any bugs, retesting and eventually delivering a product to the customer.  Each step of the process flows into the next, and you only go forward, much like water flowing over a series of cascades.  Only the test/fix/retest/... cycle was meant to be iterative, and ideally with as few iterations as possible.  Waterfall projects can take months at the least and more typically years to get through all the steps, at which point there's a significant chance that the customer's understanding of what they want has evolved -- but don't worry, we can always gather more requirements, produce an improved design ...

(As an aside, Downey alludes to discussion over whether "customer" is an appropriate term for someone, say, accessing a public data website.  A fair point.  I'm using "customer" here because in this case it's someone paying money for the service of producing software.   The concept of open source cuts against this, but that's a whole other discussion.)

The waterfall approach can be useful in situations like space missions and avionics.  In the first case, when you launch, you're literally launched and there is no "iterate".  In the second, the cost of an incomplete or not-fully-vetted implementation is too high to risk.  However, there's a strong argument to be made that "launch and iterate" works in more cases than one might think.

In contrast to waterfall approaches, agile methodologies think more in terms of weeks.  A series of two-week "sprints", each producing some number of improvements from a list, is a fairly common approach. Some web services go further and use a "push on green" process where anything that passes the tests (generally indicated by a green bar on a test console) goes live immediately.  Naturally, part of adding a new feature is adding tests that it has to pass, but that should generally be the case anyway.
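A toy sketch of the "push on green" gate (not any particular shop's pipeline; the names here are made up): run every test, and only if all of them come back green does the deploy step fire.

```python
def push_on_green(tests, deploy):
    """Run each test; call deploy() only if every one passes (the green bar)."""
    green = all(test() for test in tests)
    if green:
        deploy()
    return green

# A toy feature ships together with the test that gates it:
def add(a, b):
    return a + b

deployed = []
push_on_green([lambda: add(2, 2) == 4], lambda: deployed.append("v2"))
print(deployed)  # ['v2'] -- the deploy ran, because the bar was green
```

The point of the sketch is the ordering: the tests are part of the change, and a red bar means nothing goes live.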

Superficially, a series of two-week sprints may seem like a waterfall process on a shorter time scale, but I don't think that's a useful comparison.  In a classic waterfall, you talk to your customer up front and then go dark for months, or even a year or more while the magic happens, though the development managers may produce a series of progress reports with an aggregate number of requirements implemented or such.  Part of the idea of short sprints, on the other hand, is to stay in contact with your customer in order to get frequent feedback on whether you're doing the right thing.   Continuous feedback is one of the hallmarks of a robust control system, whether in software or steam engines.

There are also significant differences in the details of the processes.  In an agile process, the list of things to do (often organized by "stories") can and does get updated at any time.  The team will generally pick a set of things to implement at the beginning of a sprint in order to coordinate their efforts, but this is more a tactical decision, and "requirements gathering" is not blocked while the developers are implementing.

Work in agile shops tends to be estimated in relative terms like "small", "medium" or "large", since people are much better at estimating relative sizes than absolute ones, and there's generally an effort to break "large" items into smaller pieces, which are easier to estimate.  Since this is done frequently, everyone ends up doing a bunch of fairly small-scale estimates on a regular basis, and hopefully skills improve.

Waterfall estimates are generally done up front by specialists.  By the end of the design phase, you should have a firm estimate of how long the rest will take (and, a cynic might add, a firm expectation of putting in serious overtime as the schedule begins to slip).

It's not clear how common a true waterfall process is in practice.  I've personally only seen it once up close, and the result was a slow-motion trainwreck the likes of which I hope never to see again.  Among other things, the process called for designers to reduce their designs to "pseudocode", which is basically a detailed description of an algorithm using words instead of a formal computer language.

This was to be done in such detail that the actual coder hired to produce the code would not have to make any decisions in translating the pseudocode to actual code.  This was explicitly stated in the (extensive) process documentation.  But if you can explain something in that much detail, you've essentially coded it and you're just using the coder as an expensive human typewriter, not a good proposition for anyone involved.  You've also put a layer of scheduling and paperwork between designing an algorithm and finding out whether it works.

We did, however, produce an impressive volume of paper binders full of documentation.  I may still have a couple somewhere.  I'm not sure I or anyone else has ever needed to read them.

This is an extreme case, but the mindset behind it is pervasive enough to make "agile vs. waterfall" a real controversy.  As with all such controversies, at least some of the waterfallish practices actually out there have more merit than the extreme case.  The extreme case, even though it does exist in places, functions more as a strawman.  Nonetheless, I tend to favor the sort of "admirable impatience" that Downey exemplifies.  Like anything else it can be taken too far, but not in the case at hand.

Friday, September 28, 2018

One CSV, 30 stories (for small values of 30)

While re-reading some older posts on anonymity (of which more later, probably), and updating the occasional broken link, I happened to click through on the credit link on my profile picture.  Said link is still in fine fettle and, while it hasn't been updated in a while (one of the more recent posts is Paul Downey chastising himself for just that), there's still plenty of interesting material there, including the (current) last post, now nearly three years old, with a brilliant observation on "scope creep".

What caught my attention in particular was the series One CSV, thirty stories, which took on the "do 30 Xs in 30 days" kind of challenge in an effort to kickstart the blog.  Taken literally, it wasn't a great success -- there only ended up being 21 stories, and there hasn't been much on the blog since -- but purely from a blogging point of view I'd say the experiment was indeed a success.

Downey takes a single, fairly large, CSV file containing records of land sales transactions from the UK and proceeds to turn this raw data into useful and interesting information.  The analysis starts with basic statistics such as how many transactions there are (about 19 million), how many years they cover (20) and how much money changed hands (about £3 trillion) and ends up with some nifty visualizations showing changes in activity from day to day within the week, over the course of the year and over decades.
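The opening statistics need nothing fancier than a single pass over the file.  Here's a sketch using Python's standard csv module on a three-row stand-in; the column names are assumptions for illustration, not the actual Land Registry schema:

```python
import csv
import io

# A tiny stand-in for the land sales CSV; the real one has ~19 million rows.
data = """price,date
250000,1995-04-01
180000,2005-09-15
320000,2015-01-30
"""

rows = list(csv.DictReader(io.StringIO(data)))
count = len(rows)                                  # how many transactions
total = sum(int(r["price"]) for r in rows)         # how much money changed hands
years = {r["date"][:4] for r in rows}
span = int(max(years)) - int(min(years)) + 1       # how many years covered

print(count, total, span)  # -> 3 750000 21
```

The same questions answered in the stories with wc, cut and friends reduce to one loop and a couple of aggregates; the scale is the only thing that changes.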

This is all done with off-the-shelf tools, starting with old-school Unix commands that date back to the 70s and then pulling together various free software from around the web.  Two of Downey's recurring themes, which were very much evident to me when we worked together on standards committees, um, a few years ago, are also very much in evidence here: A deep commitment to open data and software, and an equally strong conviction that one can and should be able to do significant things with data using basic and widely available tools.

A slogan that pops up a couple of times in the stories is "Making things open makes them better".  In this spirit, all the code and data used is publicly available.  Even better, though, the last story, Mistakes were made, catches the system in the act of improving itself due to its openness.  On a smaller scale, reader suggestions are incorporated in real time and several visualizations benefit from collaboration with colleagues.

There's even a "hack day" in the middle.  If anything sums up Downey's ideal of how technical collaboration should work, it's this: "My two favourite hacks had multidisciplinary teams build something, try it with users, realise it was the wrong thing, so built something better as a result. All in a single day!"  It's one thing to believe in open source, agile development and teamwork in the abstract.  The stories show them in action.

As to the second theme, the whole series, from the frenetic "30 things in 30 days" pace through to the actual results, shows an admirable sort of impatience:  Let's not spend a lot of time spinning up the shiniest tools on a Big Data server farm.  I've got a laptop.  It's got some built-in commands.  I've got some data.  Let's see what we can find out.

Probably my favorite example is the use of geolocation in Postcodes.  It would be nice to see sales transactions plotted on a map of the UK.  Unfortunately, we don't have one of those handy, and they're surprisingly hard to come by and integrate with, but never mind.  Every transaction is tagged with a "northing" and "easting", basically latitude and longitude, and there are millions of them.  Just plot them spatially and, voila, a map of England and Wales, incidentally showing clearly that the data set doesn't cover Scotland or Northern Ireland.
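The trick is that with enough points, the scatter itself is the map.  A toy version of the idea (made-up coordinates, and a character grid standing in for a real plot):

```python
# Bin (easting, northing) pairs into a coarse character grid.  With
# millions of real points, the filled cells trace the coastline.
points = [(1, 1), (2, 1), (2, 2), (3, 2), (5, 4), (5, 5), (6, 5)]

width, height = 8, 6
grid = [[" "] * width for _ in range(height)]
for e, n in points:
    grid[height - 1 - n][e] = "*"  # northing grows upward, rows grow downward

picture = "\n".join("".join(row) for row in grid)
print(picture)
```

No base map, no projection library: the data set is dense enough to draw itself.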

I wouldn't say that just anyone could do the same analyses in 30 days, but neither is there any deep wizardry going on.  If you've taken a couple of courses in computing, or done a moderate amount of self-study, you could almost certainly figure out how the code in the stories works and do some hacking on it yourself (in which case, please contribute anything interesting back to the repository).  And then go forth and hack on other interesting public data sets, or, if you're in a position to do so, make some interesting data public yourself (but please consult with your local privacy expert first).

In short, these stories are an excellent model of what the web was meant to be: open, collaborative, lightweight and fast.

Technical content aside, there are also several small treasures in the prose, from Wikipedia links on a variety of subjects to a bit on the connection between the cover of Joy Division's Unknown Pleasures and the discovery of pulsars by Jocelyn Bell Burnell et al.

Finally, one of the pleasures of reading the stories was their sheer Englishness (and, if I understand correctly, their Northeast Englishness in particular).   The name of the blog is whatfettle.  I've already mentioned postcodes, eastings and northings, but the whole series is full of Anglicisms -- whilst, a spot of breakfast, cock-a-hoop, if you are minded, splodgy ... Not all of these may be unique to the British Isles, but the aggregate effect is unmistakeable.

I hesitate to even mention this for fear of seeming to make fun of someone else's way of speaking, but that's not what I'm after at all.   This isn't cute or quaint, it's just someone speaking in their natural manner.  The result is located or even embodied.  On the internet, anyone could be anywhere, and we all tend to pick up each other's mannerisms.  But one fundamental aspect of the web is bringing people together from all sorts of different backgrounds.  If you buy that, then what's the point if no one's background shows through?