
Report Writing for Occupational Therapists

Over the past 18 months I’ve been working on Report Hero, report-writing software to help Paediatric Occupational Therapists write reports. I’ve built it in partnership with one of the best Paediatric OTs I know, and I’m happy to see it being used by a number of OTs. If that’s your thing, take a look at the Report Hero website and sign up for a trial.

9 Things I learnt while moving data from Redshift into AWS Elasticsearch with AWS Lambda

The Amazon infrastructure is amazing and allows for interesting and cool scaling without the use of servers. It’s exciting to see what can be done. The trick with much of this is that many of the elements are asynchronous, so it is easy to flood services, particularly when pulling data out of your Redshift data warehouse and putting it into Elasticsearch. I learnt a bunch of things while doing this; the salient points are below.

  1. Don’t gzip the data unloaded.
  2. Use the bulk load on Elasticsearch.
  3. Use a large number of records in each bulk load (>5000) – fewer large bulk loads are better than many smaller ones. When working with AWS Elasticsearch there is a risk of hitting the limits of the bulk queue size.
  4. Process a single file in the lambda and then recursively invoke the lambda function with an event.
  5. Before recursing, wait a couple of seconds with setTimeout.
  6. When waiting, make sure you aren’t idle for 30 seconds, because your lambda will stop.
  7. Don’t use S3 object creation to trigger your lambda – you’ll end up with multiple lambda functions being invoked at the same time.
  8. Don’t bother trying to put Kinesis in the middle – unloading your data into Kinesis is almost certain to hit Kinesis load limits.
  9. Monitor your Elasticsearch bulk queue size with something like this:
    curl https://%ES-SERVER:PORT%/_nodes/stats/thread_pool | jq '.nodes | to_entries[].value.thread_pool.bulk'

1 Unloading from Redshift

Doing the gunzip in the lambda costs time and resources in the lambda function. Avoid this by storing the plain CSV in S3 and then streaming it out with S3.getObject(params).createReadStream().
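As a sketch with the Node AWS SDK (the bucket and key names are assumptions):

```javascript
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Stream the plain CSV straight out of S3 instead of buffering
// and gunzipping a compressed file inside the lambda
s3.getObject({ Bucket: 'my-redshift-unloads', Key: 'events/part_000' })
  .createReadStream()
  .pipe(process.stdout);
```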

Here is an unload function that works well for me.

credentials 'aws_access_key_id=%AWS_KEY_ID%;aws_secret_access_key=%AWS_ACCESS_KEY%'
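That credentials clause sits inside an UNLOAD statement; a sketch of the full statement (the table name, bucket and option choices are assumptions – note the absence of GZIP, per point 1):

```sql
UNLOAD ('select * from events')
TO 's3://my-redshift-unloads/events/part_'
credentials 'aws_access_key_id=%AWS_KEY_ID%;aws_secret_access_key=%AWS_ACCESS_KEY%'
DELIMITER ',' ADDQUOTES ESCAPE
ALLOWOVERWRITE;
```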

2 Use the bulk load in Elasticsearch

The Elasticsearch bulk operation is your friend. Don’t index each record one at a time and consume lots of resources; instead send up batches of records at the same time using the bulk operation.
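The bulk API takes newline-delimited JSON: an action line followed by a source line for each record. A minimal sketch of building such a body in Node (the index and type names are assumptions):

```javascript
// Build an Elasticsearch bulk request body: an action line followed by
// a source line for each record, all newline-delimited
function buildBulkBody(index, type, records) {
  return records.map(function (record) {
    var action = JSON.stringify({ index: { _index: index, _type: type, _id: record.id } });
    return action + '\n' + JSON.stringify(record) + '\n';
  }).join('');
}

var body = buildBulkBody('events', 'event', [{ id: 1, name: 'a' }, { id: 2, name: 'b' }]);
// body is now four newline-terminated JSON lines, ready to POST to /_bulk
```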

3 Use a large number of records in the bulk load

Send more than 5000 records at a time in each bulk load. Fewer big loads are better than many small ones; tune the batch count and size to what your cluster can handle.
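A small sketch of splitting the records into large batches (the 5000 figure comes from the point above):

```javascript
// Split records into batches for the bulk load; fewer, larger batches
// put less pressure on the Elasticsearch bulk queue
function batchRecords(records, batchSize) {
  var batches = [];
  for (var i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}

var batches = batchRecords(new Array(12000).fill({}), 5000);
// batches.length is 3: two batches of 5000 and one of 2000
```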

4 Process a single file in each lambda function

To ensure that you don’t consume too many resources, process a single file in each lambda function then recurse using

    var lambda = new AWS.Lambda();
    lambda.invoke({
      FunctionName: context.invokedFunctionArn,
      InvocationType: 'Event',
      Payload: JSON.stringify(payload)
    }, callback);

Use either the promise version or the callback version, as you prefer. Keep track of where you are in the payload.

5 Wait before recursing

Before doing the recursive invoke above, wait a couple of seconds to give Elasticsearch a chance to catch up.
setTimeout(function(){recurseFunction(event, context, callback)}, 2000);

6 Keep the wait short

If you don’t do anything for 30 seconds, Lambda will time out. Keep the wait short – the 2 seconds chosen above wasn’t completely arbitrary.

7 Don’t use s3 object creation to trigger your lambda

One theme we are seeing consistently is the need to control the rate of data flowing into Elasticsearch. Using S3 object-creation triggers for the lambda will result in multiple concurrent invocations of your lambda function, and therefore too much data arriving at the same time. Trigger the lambda some other way.

8 Kinesis isn’t the answer to this problem

Putting the records to index into Kinesis is not a good way to control the massive flow of data from Redshift to Elasticsearch. While Kinesis is great for controlling streams of data over time, it isn’t the right component for this scenario of loading lots of records at once. The recursive lambda approach outlined throughout this post is better suited.

9 Monitor your elastic resources with curl and jq

Unix command-line tools rock.

curl and jq are great tools for working with HTTP data: curl for getting data, jq for processing the JSON that comes back.

Elasticsearch provides JSON APIs for inspecting this data. The command below looks up the information on the bulk queue size:

curl https://%ES-SERVER:PORT%/_nodes/stats/thread_pool |jq '.nodes |to_entries[].value.thread_pool.bulk'


Serverless plus the AWS stack is nice. You need to think about how to use it, and knowing the tools and capabilities of the platform is important, but with care you can do amazing things. Go build some great stuff.

Open Badges

I’ve recently been looking into the Open Badges Framework, with a goal of being able to understand what it is from a high-level technical standpoint.

The Open Badge Specification provides a standard for issuing badges.

The key participants in the open badge system are:

  • the badge issuer
  • badge displayers
  • badge earners
  • a badge backpack

A badge issuer will create a badge and issue it to a badge earner. The badge consists of a number of cryptographically verifiable assertions. With the earner’s consent, an issuer may publish the badge to a badge backpack.
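For a flavour of what those assertions look like, here is a rough sketch of a hosted badge assertion in the spirit of the 1.0 specification (all values are made up):

```json
{
  "uid": "abc-123",
  "recipient": {
    "type": "email",
    "hashed": true,
    "identity": "sha256$c7ef86405ba71b85acd8e2e95166c4b111448089f2e1599f42fe1bba46e865c5"
  },
  "badge": "https://example.org/badges/html-wizard.json",
  "verify": {
    "type": "hosted",
    "url": "https://example.org/assertions/abc-123.json"
  },
  "issuedOn": 1359217910
}
```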

There is a reference implementation of a badge backpack from Mozilla. This reference implementation is hosted out of the United States, and is probably the default place to publish badges. The source code for the reference implementation has also been made available for download and deployment.

In a healthy open badge ecosystem, there would be a small number of badge backpacks, a larger number of issuers, and an even larger number of earners.

Every organisation that wants to issue badges would need to be an issuer, but most organisations would (and should) be able to use a standard backpack. That said, when dealing with children, legal rules may lead to the creation of regional badge backpacks.

Cool stuff you can do with rails has_one

Over the past three months, has_one has shown itself to be the answer to questions that I was struggling to find an elegant answer for.

The two cases I found were:

1. Specifying a belongs_to :through
2. Accessing one item of a list of related records

The second case generalises to any time you want a named scope that returns a single record.

Specifying a belongs_to :through with has_one

Say you have a model of the form:
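A minimal sketch of the kind of model being described (the class and association names are assumptions):

```ruby
class Organization < ActiveRecord::Base
  has_many :departments
  has_many :employees, through: :departments
end

class Department < ActiveRecord::Base
  belongs_to :organization
  has_many :employees
end

class Employee < ActiveRecord::Base
  belongs_to :department
end
```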

You can see that we’ve used the standard has_many and belongs_to methods to model the relationships between the entities. We’ve also thrown in a has_many :through to model the fact that an organization has_many employees. Conceptually we also want to be able to specify the inverse of this relationship on the Employee: something like a belongs_to :through.

The way to implement the belongs_to :through concept is to use has_one :through.
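For example (assuming the Employee/Department/Organization names sketched above):

```ruby
class Employee < ActiveRecord::Base
  belongs_to :department
  # belongs_to :organization, through: :department is not valid Rails;
  # has_one :through expresses the same concept
  has_one :organization, through: :department
end
```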

Accessing one item of a list of related records

There is often a desire to specify a scope that returns a single record. This isn’t possible in Rails, as scopes are by definition chainable and return subclasses of ActiveRecord::Relation, which is a list-like object.

The way to do this is to setup a has_one with a block specifying a scope. For example:

has_one :head_of_department, -> { where role: 'department_head' }, class_name: 'Employee'

For bonus points, the proc and conditions can also be parameterised, so we can end up with a has_one relationship with a parameter. To see the coolness of this we can extend the model a little to include a many-to-many entity and a has_many :through relationship.

With the above model it would be handy to have an easy way of accessing the ProjectDepartment when you are working with projects. has_one to the rescue.

So with our collection of has ones all together, we end up with code something like:
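A sketch of what the combined model might look like (the head_of_department scope comes from above; the Project-side names and the primary flag are assumptions):

```ruby
class Department < ActiveRecord::Base
  belongs_to :organization
  has_many :employees
  has_one :head_of_department, -> { where role: 'department_head' }, class_name: 'Employee'
end

class Employee < ActiveRecord::Base
  belongs_to :department
  has_one :organization, through: :department
end

class Project < ActiveRecord::Base
  has_many :project_departments
  has_many :departments, through: :project_departments
  has_one :primary_project_department, -> { where primary: true }, class_name: 'ProjectDepartment'
end
```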


has_one is a powerful tool to have in your rails modelling. Take advantage of it to give some really nice clean looking code.

Running Thin in a Thread to serve static files (or how to seed carrierwave images)

Published by Rob on March 27th, 2014 – in Ruby

I recently wanted a short-term local webserver, specifically to be able to seed some images with CarrierWave using the handy remote_image_url functionality. The project Gemfile has Thin in it, so it made a bunch of sense to use it. Another important property is that I want to be able to run this simple server from within a rails process. There are three bits of information that I learnt and particularly want to highlight:

  • starting a thin server to serve a directory
  • running thin in a thread
  • the CarrierWave remote_#{attribute_name}_url setter (where attribute_name is the name of the attribute you have mounted a CarrierWave uploader on – used as remote_image_url throughout the rest of this post)

Here is the code that I ended up with for the thin server:
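A sketch of what that can look like (the directory and port are assumptions):

```ruby
require 'thin'
require 'rack'

# Serve a local directory of seed images in a background thread
SEED_IMAGE_DIR = File.expand_path('db/seed_images')

server_thread = Thread.new do
  # Thin::Server#start blocks, which is why it runs in its own thread
  Thin::Server.new('127.0.0.1', 9292, Rack::Directory.new(SEED_IMAGE_DIR)).start
end
```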

For the carrierwave I ended up with:
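Something along these lines, assuming a Product model with a mounted :image uploader (both names are assumptions):

```ruby
Product.find_each do |product|
  # remote_image_url is the CarrierWave setter for the :image attribute;
  # CarrierWave downloads the file from the URL and stores it
  product.remote_image_url = "http://127.0.0.1:9292/#{product.id}.jpg"
  product.save!
end
```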

Then at the end I clean up with:
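Killing the background thread is enough (server_thread being the thread that is running Thin):

```ruby
server_thread.kill   # stop the short-lived static file server
server_thread.join
```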

Putting it all together, the result is a nice little script for programmatically uploading images to your CarrierWave models from rails code.

Things to think about when building a new web project

I’ve been thinking and talking with people about building webapps. Every project has its own set of contexts to consider and questions to work through. Here are some questions worth working through.

  • Single page vs multi-page
  • JSON HTTP API for data and resources vs generating server side html
  • Big framework vs roll your own
  • Angular/Ember/Knockout
    • testing/learning/state today vs tomorrow
  • javascript module system
    • requirejs/browserify/framework based
  • What server-side frameworks/technologies do you want to use
  • What do you want to learn?
  • CSS Framework for getting started
    • foundation/bootstrap/pure css
  • mobile strategy (responsive design)

For many of the questions and options above, I think that I have strong opinions about the answer (as I’m sure many people do). Thinking through the questions above is an important thing to be doing on any project.

What questions do you think about?

How to use any JavaScript library with RequireJS 2.1

I’ve been using RequireJS for a while, having cut my teeth on 0.27.0, and gotten quite familiar with the order plugin that was a part of life when using non-RequireJS libraries.

RequireJS 2.1 provides a different tool for including third party libraries, providing a “shim” object that forms a part of the requirejs config.

With the shim it is easy to add dependencies on non-RequireJS pieces of code. Take a look at the documentation on the shim config for the details, or see some examples below.
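For context, each shim entry in the examples below lives inside the requirejs.config call; a combined sketch (the paths values are assumptions):

```javascript
requirejs.config({
  paths: {
    jquery: 'lib/jquery',       // assumed file locations
    typeahead: 'lib/typeahead',
    d3: 'lib/d3',
    rickshaw: 'lib/rickshaw'
  },
  shim: {
    typeahead: ['jquery'],
    d3: { exports: 'd3' },
    rickshaw: { exports: 'Rickshaw', deps: ['d3'] }
  }
});
```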

Using Twitter Typeahead with Require.js

shim: {
  "typeahead": ["jquery"]
}

Typeahead depends on jQuery being loaded first. Because its API is exposed through jQuery, we don’t need to worry about exporting anything.

Using Twitter Hogan with Require.js
shim: {
  "hogan": { exports: "Hogan" }
}

Hogan doesn’t depend on anything being loaded before it. It exposes itself on the global namespace as Hogan.

Using Twitter Bootstrap with Require.js

shim: {
  "bootstrap": ["jquery"]
}

Twitter Bootstrap depends on jQuery being loaded first. Because its API is exposed through jQuery, we don’t need to worry about exporting anything.

URI with Require.js

shim: {
  "URI": ["jquery"]
}

This pushes a jQuery dependency into URI. It has magic in it to detect whether AMD is being used, and will define itself as an AMD module.

SerializeJSON jQuery plugin with Require.js

shim: {
  "jquery.serializeJSON": ["jquery"]
}

serializeJSON depends on jQuery being loaded first. Because its API is exposed through jQuery, we don’t need to worry about exporting anything.

D3 with Require.js

shim: {
  "d3": { exports: "d3" }
}

D3 doesn’t have any dependencies and exports itself with the d3 global object.

Rickshaw with Require.js

shim: {
  "d3": { exports: "d3" },
  "rickshaw": { exports: "Rickshaw", deps: ["d3"] }
}

rickshaw depends on d3, and exposes itself with the Rickshaw global namespace.

Getting Started with D3 By Mike Dewar

I picked up a review copy of this book quite a while ago, being intrigued by D3 but not really knowing what it was. Only more recently have I been working on a codebase that uses D3, and so had a need to read the book. I’d been tinkering around the edges, but then had a strong need to do some D3 coding, so I picked up my copy of “Getting Started with D3”.

It’s a good, thin little book that does a really good job of introducing D3 and how to work and think the D3 way. I found it a really useful tool for learning D3, and it gave me enough to do what I wanted. It’s a good entry point, and helps give an idea of how to work with the D3 APIs. The book shows good examples of consuming JSON data and rendering it into the DOM with D3. It additionally goes through some SVG and charting examples. It is worth noting that the book is a short read; if you are expecting a detailed reference to D3, this isn’t the book for you.

I’d recommend the book to any developer who wants to know more about D3. I’d strongly recommend it to someone who wants or needs to get started writing D3 code quickly.

[This book was reviewed as a part of the O’Reilly Blogger Review Program]

Notes From Yehuda Katz’s visit to Brisbane

Yehuda Katz had a brief visit to Brisbane, Australia, doing a public presentation and a more private breakfast meeting. In this blog post I go over some of the things that struck me as particularly interesting or worth thinking about.

For those of you who don’t know Yehuda, he is an opinionated and very active developer. Yehuda is a member of the jQuery and Ruby on Rails core teams, and he is also one of the founders of Ember.js, a framework for building rich MVC JavaScript applications.

In the public talk Yehuda went through Ember.js, talking through the paradigm of the framework and walking through a demo of some of its key features. It looks like an interesting option for JavaScript applications. It’s on the brink of going 1.0, but already has some high-profile applications using the framework. Apart from seeing the interesting elements of Ember and how it’s used, it was very interesting to see the way people are using Ember.js with D3.

I’m definitely going to keep my eye on the framework as it moves forward.

In his spare time Yehuda is working to push the future of the web in ways that help facilitate rich applications. He has got a couple of ways that he is doing this:

  1. he is a member of the W3C TAG
  2. he is working to influence members of the chrome team to build things well.


W3C TAG

The Technical Architecture Group works to specify and guide the architecture of the web moving forward. It has the feel of a good internal architecture group in an organisation, filled with smart people trying to make the web better (its membership includes Tim Berners-Lee and representatives of the community, large organisations and browser vendors).

Chrome Team

Through some of the work Yehuda has done, he has had the opportunity to spend time with some of the people building new versions of Chrome, helping to guide their thinking towards APIs and decisions that work well for web developers.

So I should have convinced you that Yehuda has some stuff worth listening to. While he was here I had some good opportunities to hear both his public talks and some of the more informal conversations. Here are some of the things that I found particularly interesting around where he sees the web heading.

WebRTC looks like a cool technology for real-time communications. The support for peer connections looks particularly interesting.

The new world being demonstrated by Google Polymer looks to be very exciting, and is definitely well worth a look for web developers who want to get an idea of the way they will be writing applications in the future. Model Driven Views and custom elements are extremely exciting, and the shadow DOM looks like a good tool for helping to support and customise the new features. HTML + CSS is currently the language of the web, with many people speaking it, and with these tools I think the language is moving in good directions.

The mechanisms for doing asynchronous JavaScript are moving on from the straight callback approach that has become familiar, particularly through node. There has been much discussion around promises and futures, with things heading towards promises. Martin Fowler has an article describing JavaScript promises, which is where the W3C TAG is currently headed. I look forward to having this come into play, giving a standard option that doesn’t involve the deep nesting that comes from callbacks.
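As a small sketch of why promises help, compare flat chaining with nesting one callback inside another (the helper names here are made up):

```javascript
// Two helpers that return promises rather than taking callbacks
function getUser(id) {
  return Promise.resolve({ id: id, name: 'user' + id });
}

function getPosts(user) {
  return Promise.resolve([user.name + '-post-1']);
}

// Flat chaining: each step reads in sequence, no pyramid of nesting
getUser(1)
  .then(getPosts)
  .then(function (posts) {
    console.log(posts[0]); // logs "user1-post-1"
  });
```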

It was interesting hearing Yehuda’s perspective on computer science and functional topics like Monads and functional reactive programming. The binding approach used in Ember.js takes inspiration from FRP, and Promises allow a transformation to a monadic approach.

One of the interesting new things coming to JavaScript in browsers is Object.observe, a feature which will make it possible to observe any object for modifications to it or its attributes.
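A sketch of the proposed API (the observer is handed a batch of change records asynchronously):

```javascript
var model = { count: 0 };

// Register an observer; change records are delivered after the
// current turn of the event loop completes
Object.observe(model, function (changes) {
  changes.forEach(function (change) {
    console.log(change.type, change.name, change.oldValue);
  });
});

model.count = 1; // later delivers an "update" change record for "count"
```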

All in all there is a bunch of interesting stuff in the web future. It’s a great time to be doing web development, and I look forward to what the future holds.