Magic sprinkles for Capybara and PDF

One of the frequent things I end up doing these days is generating reports. When you create a startup called Report Hero, it comes with the territory. The two most common outputs I'm generating are PDF and Word (yeah, Word, I'm sorry).

I've recently been going through some pain getting testing working with PDFs and capybara. It took me way too long to get to the answers, so I'm going to document what I ended up with.

First, it’s worth noting that at the time of writing, poltergeist doesn’t support downloading files:
https://github.com/teampoltergeist/poltergeist/issues/485
http://stackoverflow.com/questions/35585994/downloading-a-csv-with-capybara-poltergeist-phantomjs

This means that you'll need to use a different driver; capybara-webkit works well for me (https://github.com/thoughtbot/capybara-webkit). As mentioned on that github project page, if you add capybara-webkit you'll probably need to run your tests on CI with an xvfb server.

If you are on rails 3 (did I mention I was doing this work on a legacy project), the next thing to consider is that, depending on your setup, you'll need to make it possible to have two threads running at the same time so that the wkhtmltopdf wrapper gem (wicked_pdf or PDFKit) can work. So in your test.rb environment you'll need to set config.threadsafe!
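
In code that's a one-line change (the application class name here is illustrative):

# config/environments/test.rb (rails 3 only)
LegacyApp::Application.configure do
  # allow more than one request to be served at the same time, so
  # wkhtmltopdf can fetch pages/assets while a request is in flight
  config.threadsafe!
end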

Once you've got PDFs being downloaded, there are a number of options for testing them. They essentially boil down to reading the content using something like PDF::Reader, and then performing assertions on it.

Pivotal have a decent blog post talking about this: https://content.pivotal.io/blog/how-to-test-pdfs-with-capybara, and prawn have extracted a gem to help: https://github.com/prawnpdf/pdf-inspector. For what it's worth, in this project I went with the prawn pdf-inspector gem.

So, the final steps I ended up with are:

  • use the webkit driver so you can download the PDF (with appropriate CI settings)
  • (in rails 3) set config.threadsafe! so the PDF generation can happen
  • use pdf-inspector to extract content from the PDF and perform some assertions, as sketched below.
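
As a rough sketch, a feature spec using pdf-inspector can look like this (the route, link text and expected strings are assumptions for illustration):

# spec/features/report_pdf_spec.rb
require 'spec_helper'
require 'pdf/inspector'

feature 'Report PDF', driver: :webkit do
  scenario 'downloading the report' do
    visit '/reports/1.pdf'

    # page.source holds the raw PDF body here; how you get at the bytes
    # can vary by driver and by how your app serves the file
    text = PDF::Inspector::Text.analyze(page.source).strings.join(' ')
    expect(text).to include('Report for Client')
  end
end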

With this setup you can write meaningful tests against the PDFs your application generates.

Creating a Deliverable HTML Email on AWS Lambda with SES

Creating deliverable rich HTML emails is a great goal for many web applications: it lets you communicate with your customers, and helps you send beautiful messages that make your marketing/design people happy.

As discussed in this SendGrid article, there are quite a number of approaches for making sure images in emails actually display. The leaders today in 2017 are referencing attachments with cid, and using data urls.

According to the comments in this Campaign Monitor blog post, the cid method is supported by all the major clients today (ironically, the post is talking about using the cooler approach of data urls for images).

With this background, the question is how to do this with AWS Lambda and SES.

Thankfully it's really straightforward.

The simple steps are:

  • create a simple html email that references images using cid: as the protocol.
  • create a raw rfc822 email string that can be sent with the SES api.
  • use the ses.sendRawEmail method to send the email.

1 Create a simple html email

For example:

<html><body><p>Hello world</p><img src="cid:world"></body></html>

Note that the source of the image is of the format cid:world; this cid is what you specify when attaching the image to the email.

2 Create a raw rfc822 email string

The mailcomposer package, part of Nodemailer (https://nodemailer.com/extras/mailcomposer/), provides a simple, easy-to-use api for creating rfc822 emails with attachments. When creating attachments you can specify cids to refer to them by, and you can specify the contents of an attachment with a local filename, a buffer, or even an http resource. It's a great api. Take a look at the npm page to see more. One example of using this package is:

// assumes the standalone mailcomposer npm package
const MailComposer = require('mailcomposer');

let from = 'from@example.com';
let to = 'to@example.com';
let subject = 'Subject';
let htmlMessage = '<html><body><p>Hello world</p><img src="cid:world"></body></html>';
let mail = new MailComposer({
  from: from, to: to, subject: subject, html: htmlMessage,
  attachments: [{
    filename: 'hello-world.jpg',
    path: 'https://cdn.pixabay.com/photo/2015/10/23/10/55/business-man-1002781_960_720.jpg',
    cid: 'world'
  }]
});
mail.build(function(err, res) {console.log(res.toString())});

3 Send the email with SES

Take the buffer that you create and send it with SES.

// assumes an SES client from aws-sdk v2, e.g. const ses = new AWS.SES();
// `message` is the rfc822 buffer produced by mail.build above
let sesParams = {
  RawMessage: {
    Data: message
  },
};
ses.sendRawEmail(sesParams, function(err, res){console.log(err, res)});

Full example using promises

Let's put it all together, and pull in some of the promise code that I talked about in an earlier blog post (http://www.rojotek.com/blog/2017/04/11/create-a-promise-wrapper-for-a-standand-node-callback-method/):

const AWS = require('aws-sdk');
const MailComposer = require('mailcomposer'); // the standalone mailcomposer package
const ses = new AWS.SES();

function createEmail(){
  let from = 'from@example.com';
  let to = 'to@example.com';
  let subject = 'Subject';
  let htmlMessage = '<html><body><p>Hello world</p><img src="cid:world"></body></html>';
  let mail = new MailComposer({
    from: from, to: to, subject: subject, html: htmlMessage,
    attachments: [{
      filename: 'hello-world.jpg',
      path: 'https://cdn.pixabay.com/photo/2015/10/23/10/55/business-man-1002781_960_720.jpg',
      cid: 'world'
    }]
  });

  return new Promise((resolve, reject) => {
    mail.build(function(err, res) {
      err ? reject(err) : resolve(res);
    });
  });
}
createEmail().then(message =>{
  let sesParams = {
    RawMessage: {
      Data: message
    },
  };
  return ses.sendRawEmail(sesParams).promise();
});

Creating emails that include attachments is really quite easy with Node, Lambda and SES. Doing this is a great step towards delivering rich emails that look like what your designers want.


Create a Promise Wrapper For a Standard Node Callback Method

JavaScript Promises are the future, and a great pattern for writing asynchronous JavaScript code (allegedly async/await is an awesome way to do async JavaScript as well, but I'm not there yet). There are great APIs for working with promises, and many standard libraries support them.

Unfortunately not all libraries support promises. Fortunately it isn't hard to wrap a standard JavaScript callback-style api in a promisified version.

The Node.js way is to use error-first callbacks: apis are passed a callback function with the signature function(error, success). For a good description see the blog post The Node.js Way – Understanding Error-First Callbacks.

The classic example they provide is readFile:

fs.readFile('/foo.txt', function(error, data) {
  // TODO: Error Handling Still Needed!
  console.log(data);
});

To convert this to a promise, create a new Promise which calls reject with the error and resolve with the data. If this is wrapped in a function, you'll end up with a nice promisified readFile, as per the following:

const fs=require('fs');

function readFilePromise(fileName) {
  return new Promise(function(resolve, reject){
    fs.readFile(fileName, function(err, data){
      if (err) {
        reject(err);
      } else {
        resolve(data);
      }
    });
  });
}

For extra cool kid points, use arrow functions:

const fs=require('fs');

let readFilePromise = fileName => {
  return new Promise((resolve, reject) => {
    fs.readFile(fileName, (err, data) => {
      if (err) {
        reject(err);
      } else {
        resolve(data);
      }
    });
  });
};


or to shrink it a little bit more:

const fs=require('fs');

let readFilePromise = fileName => {
  return new Promise((resolve, reject) => {fs.readFile(fileName, (err, data)=> {err ? reject(err) : resolve(data)})})
}

or to go a bit crazy with the inlining, and make your JavaScript look almost like Haskell 🙂

const fs=require('fs');

let readFilePromise = fileName => new Promise((res, rej) => fs.readFile(fileName, (e, d) => e ? rej(e) : res(d)));

So it's easy to see that any asynchronous node callback-style api can be wrapped in a promise api with 10 lines of readable code, or 1 line of terse JavaScript.

Adding rubocop to a legacy project

To add rubocop to a legacy project, first grab a .rubocop.yml that specifies your project's code style, and then do:

rubocop -c .rubocop.yml --auto-gen-config --exclude-limit 500

Then you'll want to inherit from the automatically generated todo file in your .rubocop.yml:

inherit_from: .rubocop_todo.yml

Run rubocop. Any violations that you now see will be caused by your config overriding the todo exclusions. Find the cops causing problems using rubocop -D.

Then fix them by doing things like:

increasing Metrics/LineLength in the .rubocop.yml,

or perhaps setting Enabled to false for the offending cop.
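
For example, in .rubocop.yml (the numbers and cops here are just illustrative):

# .rubocop.yml
inherit_from: .rubocop_todo.yml

Metrics/LineLength:
  Max: 120

Style/Documentation:
  Enabled: false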

By doing this you can relatively quickly add rubocop to a legacy project with settings matching an organisation's coding style, ready for you to really start making the codebase better.

Report Writing for Occupational Therapists

Over the past 18 months I've been working on Report Hero, report writing software to help Paediatric Occupational Therapists write reports. I've built it in partnership with one of the best Paediatric OTs I know, and I'm happy to see it being used by a number of OTs. If that's your thing, take a look at the Report Hero website and sign up for a trial.

9 Things I learnt while moving data from RedShift into AWS Elastic Search with AWS Lambda

The Amazon infrastructure is amazing and allows for interesting and cool scaling without the use of servers. It's exciting to see what can be done. The trick with much of this is that many of the elements are asynchronous, so it can be easy to flood services, particularly when pulling data out of your RedShift data warehouse and putting it into Elasticsearch. I've learnt a bunch of things while doing this; the salient points are below.

  1. Don't gzip the data you unload.
  2. Use the bulk load on elastic.
  3. Use a large number of records in the bulk load (>5000) – fewer large bulk loads are better than more smaller ones. When working with AWS Elasticsearch there is a risk of hitting the limits of the bulk queue size.
  4. Process a single file in the lambda and then recursively call the lambda function with an event.
  5. Before recursing, wait a couple of seconds with setTimeout.
  6. When waiting, make sure that you aren't idle for 30 seconds, because your lambda will stop.
  7. Don't use s3 object creation to trigger your lambda – you'll end up with multiple lambda functions being called at the same time.
  8. Don't bother trying to put kinesis in the middle – unloading your data into kinesis is almost certain to hit load limits in kinesis.
  9. Monitor your Elasticsearch bulk queue size with something like this:
    curl https://%ES-SERVER:PORT%/_nodes/stats/thread_pool | jq '.nodes | to_entries[].value.thread_pool.bulk'

1 Unloading from RedShift

Gunzipping the data in the lambda takes time and resources. Avoid this by just storing plain CSV in s3 and then streaming it out with S3.getObject(params).createReadStream().

Here is an UNLOAD statement that works well for me.

UNLOAD ('%SOME_AMAZING_QUERY_FROM_A_BIG_TABLE%')
TO 's3://%BUCKET%/%FOLDER%'
credentials 'aws_access_key_id=%AWS_KEY_ID%;aws_secret_access_key=%AWS_ACCESS_KEY%'
DELIMITER AS ',' NULL AS '' ESCAPE ADDQUOTES;

2 Use the bulk load in elastic

The elastic bulk load operation is your friend. Don't index each record one at a time and consume lots of resources; instead send up batches using the bulk operation.

3 Use a large number of records in the bulk load

Use more than 5000 records at a time in the bulk load. Fewer big loads are better than more small ones. See https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html#_how_big_is_too_big for guidance on tuning the number and size of records.

4 Process a single file in each lambda function

To ensure that you don't consume too many resources, process a single file in each lambda function, then recurse with something like:

// assumes the aws-sdk v2 client: const lambda = new AWS.Lambda();
lambda.invoke({
    FunctionName: context.invokedFunctionArn,
    InvocationType: 'Event',
    Payload: JSON.stringify(payload)
}).promise();

Use either the promise version or callback version as preferred. Keep track of where you are in the payload.

5 Wait before recursing

Before doing the recursive call above, wait a couple of seconds to give elastic a chance to catch up:

setTimeout(function(){recurseFunction(event, context, callback)}, 2000);

6 Keep the wait short

If you don't do anything for 30 seconds, Lambda will time out. Keep the wait short. Two seconds (as chosen above) wasn't completely arbitrary.

7 Don’t use s3 object creation to trigger your lambda

The theme running through all of this is controlling the rate at which data flows into elastic. Using the s3 object creation triggers for the lambda will result in multiple concurrent calls to your lambda function, which means too much data arriving at the same time. Trigger the lambda some other way.

8 Kinesis isn’t the answer to this problem

Putting the records to index into kinesis is not a good way to control the massive flow of data from RedShift to elastic. While kinesis is great for controlling streams of data over time, it's not really the right component for loading lots of records at once. The approach outlined throughout this document is more suitable.

9 Monitor your elastic resources with curl and jq

Unix commandline tools rock.

curl and jq are great tools for working with http data: curl for getting data, jq for processing json (https://stedolan.github.io/jq/).

elastic provides json apis for seeing the data. The command below looks up the information on the bulk queue size.

curl https://%ES-SERVER:PORT%/_nodes/stats/thread_pool |jq '.nodes |to_entries[].value.thread_pool.bulk'

Conclusion

Serverless plus the AWS stack is nice. You need to think about how to use it, and knowing the tools and capabilities of the platform is important, but with care you can do amazing things. Go build some great stuff.

Open Badges

I’ve recently been looking into the Open Badges Framework, with a goal of being able to understand what it is from a high-level technical standpoint.

The Open Badge Specification provides a standard for issuing badges.

The key participants in the open badge system are:

  • the badge issuer
  • badge displayers
  • badge earners
  • a badge backpack

A badge issuer will create a badge and issue it to a badge earner. The badge will consist of a number of cryptographically verifiable assertions. With the earner's consent, an issuer may publish the badge to a badge backpack.

There is a reference implementation of a badge backpack implemented by Mozilla. This reference implementation is hosted out of the United States, and is probably the default way to publish badges. The source code for the reference implementation has also been made available for download and deployment (https://github.com/mozilla/openbadges-backpack).

In a healthy open badge ecosystem, there would be a small number of badge backpacks, a larger number of issuers, and an even larger number of earners.

Every organisation that wants to issue badges would need to be an issuer, but most organisations would (and should) be able to use a standard backpack. That said, when dealing with children, legal rules may lead to the creation of regional badge backpacks.

Cool stuff you can do with rails has_one

Over the past three months has_one has shown itself to be the answer to questions that I was struggling to get an elegant answer for.

The two cases I found were:

1. Specifying a belongs_to :through
2. Accessing one item of a list of related records

The second case ends up expanding to be anytime you want to have a named scope that returns a single record.

Specifying a belongs_to :through with has_one

Say you have a model of the form:
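
# illustrative models – the names here are assumptions
class Organization < ActiveRecord::Base
  has_many :departments
  has_many :employees, through: :departments
end

class Department < ActiveRecord::Base
  belongs_to :organization
  has_many :employees
end

class Employee < ActiveRecord::Base
  belongs_to :department
end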

You can see that we’ve used the standard has_many and belongs_to methods to model the relationships between the entities. We’ve also thrown in a has_many through to help model the fact that an organization has_many employees. Conceptually we also want to be able to specify the inverse of this relationship on the Employee. Something like:
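
class Employee < ActiveRecord::Base
  belongs_to :department
  # what we conceptually want – but rails has no belongs_to :through
  belongs_to :organization, through: :department
end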

The way to implement the belongs_to :through concept is to use has_one :through:
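
class Employee < ActiveRecord::Base
  belongs_to :department
  has_one :organization, through: :department
end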

Accessing one item of a list of related records

There can often be a desire to specify a scope that returns a single record. This isn't possible in rails, as scopes by definition are chainable and return subclasses of ActiveRecord::Relation, which is a list-like object.

The way to do this is to setup a has_one with a block specifying a scope. For example:

has_one :head_of_department, -> { where role: 'department_head' }, class_name: 'Employee'

For bonus points the proc and conditions can also be parameterised, so we can end up with a has_one relationship with a parameter. To see the coolness of this we can extend the model a little bit to include a many-to-many entity and a has_many :through relationship:
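
# a sketch of the extended model
class Project < ActiveRecord::Base
  has_many :project_departments
  has_many :departments, through: :project_departments
end

class ProjectDepartment < ActiveRecord::Base
  belongs_to :project
  belongs_to :department
end

class Department < ActiveRecord::Base
  belongs_to :organization
  has_many :employees
  has_many :project_departments
  has_many :projects, through: :project_departments
end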

With the above model it would be handy to have an easy way of accessing the ProjectDepartment when you are working with projects. has_one to the rescue:
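
class Project < ActiveRecord::Base
  has_many :project_departments
  has_many :departments, through: :project_departments

  # a sketch – the `lead` flag on the join table is an assumption; note that
  # the scope lambda can also take the owning record: ->(project) { ... }
  has_one :lead_project_department, -> { where(lead: true) },
          class_name: 'ProjectDepartment'
  has_one :lead_department, through: :lead_project_department, source: :department
end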

So with our collection of has ones all together, we end up with code something like:
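
# everything together (the `role` and `lead` columns are assumptions)
class Organization < ActiveRecord::Base
  has_many :departments
  has_many :employees, through: :departments
end

class Department < ActiveRecord::Base
  belongs_to :organization
  has_many :employees
  has_many :project_departments
  has_many :projects, through: :project_departments

  has_one :head_of_department, -> { where role: 'department_head' }, class_name: 'Employee'
end

class Employee < ActiveRecord::Base
  belongs_to :department
  has_one :organization, through: :department
end

class ProjectDepartment < ActiveRecord::Base
  belongs_to :project
  belongs_to :department
end

class Project < ActiveRecord::Base
  has_many :project_departments
  has_many :departments, through: :project_departments

  has_one :lead_project_department, -> { where(lead: true) }, class_name: 'ProjectDepartment'
  has_one :lead_department, through: :lead_project_department, source: :department
end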


has_one is a powerful tool to have in your rails modelling. Take advantage of it to get some really nice, clean-looking code.

Running Thin in a Thread to serve static files (or how to seed carrierwave images)


I recently wanted to have a short-term local webserver, specifically to be able to seed some images with carrierwave, using the handy remote_image_url functionality. The project Gemfile has Thin in it, so it made a bunch of sense to use it. Another important property is that I want to be able to run this simple server from within a rails process. There are three bits of information that I learnt and particularly want to highlight:

  • starting a thin server to serve a directory
  • running thin in a thread
  • the CarrierWave remote_#{attribute_name}_url attribute (where attribute_name is the name of the attribute you have CarrierWave uploadified – used as remote_image_url throughout the rest of this post)

Here is the kind of code I ended up with for the thin server (the directory and port here are illustrative):
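
require 'thin'

# serve a local directory of images over http in a background thread
app = Rack::File.new(Rails.root.join('db', 'seed_images').to_s)
server = Thin::Server.new('127.0.0.1', 9292, app)
server_thread = Thread.new { server.start }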

For the carrierwave side I ended up with something like:
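
# assumes a Photo model with `mount_uploader :image, ImageUploader`
Dir.glob(Rails.root.join('db', 'seed_images', '*.jpg')).each do |path|
  Photo.create!(remote_image_url: "http://127.0.0.1:9292/#{File.basename(path)}")
end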

Then at the end I clean up with something like:
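
# stop the server and wait for the background thread to finish
server.stop
server_thread.join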

So all in one spot the code is something like:
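
require 'thin'

# a sketch: the Photo model, directory and port are all assumptions
app = Rack::File.new(Rails.root.join('db', 'seed_images').to_s)
server = Thin::Server.new('127.0.0.1', 9292, app)
server_thread = Thread.new { server.start }

Dir.glob(Rails.root.join('db', 'seed_images', '*.jpg')).each do |path|
  Photo.create!(remote_image_url: "http://127.0.0.1:9292/#{File.basename(path)}")
end

server.stop
server_thread.join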

The above code is a nice little script for programmatically uploading images to your carrierwave models in rails code.