Challenge #2 – 5 Minute Read

12 November 2018 Solved Twig Easy

In this chal­lenge you will design an algorithm that pre­dicts how long a piece of text takes to read. This chal­lenge was inspired by the amus­ing dis­cov­ery that this excel­lent, rather lengthy art­icle was hard-coded (as were all oth­er blog art­icles on the site) to be a 5 minute read.

Chal­lenge

The challenge is to write a Twig macro called “readTime” that accepts a single parameter: text (a string of plaintext or markdown). The macro should output a predicted length of time that the text will take to read, using medium.com as a measure of accuracy. For example, calling:

{% set entry = craft.entries.slug('our-credentials').one() %}

{{ readTime(entry.text) }}

Might out­put:

1 minute

Where­as calling:

{% set entry = craft.entries.slug('an-annotated-webpack-4-config').one() %}

{{ readTime(entry.text) }}

Might out­put:

50 minutes

How you choose to cal­cu­late the read time based on the con­tent of the text is entirely up to you. The goal is to come up with a solu­tion that pre­dicts read times sim­il­ar to (or bet­ter than) those on medi​um​.com.

Rules

The macro must out­put a pre­dicted read time in minutes, giv­en the para­met­er as described above. It should not rely on any plu­gins and the code will be eval­u­ated based on the fol­low­ing cri­ter­ia in order of priority:

  1. Ori­gin­al­ity
  2. Read­ab­il­ity
  3. Accur­acy

It can use whatever ingeni­ous algorithm you come up with for pre­dict­ing an article’s read time, the more cre­at­ive the bet­ter. The code should nev­er­the­less be read­able and easy to under­stand and the res­ult should out­put read times sim­il­ar to medi​um​.com.

Tips

Begin with the easiest metric, word count, then experiment and tweak from there. Get inventive and add other metrics to the mix. If you feel up to the challenge, see what you can do with markdown text as the input (an HTML to markdown converter such as this one may help in testing).

Your res­ults don’t need to be identic­al to every art­icle on medi​um​.com, but the fol­low­ing art­icles will be used as an ini­tial meas­ure of accuracy:

Finally, when you have a work­ing macro, see what read time your macro out­puts for An Annot­ated webpack 4 Con­fig for Fron­tend Web Devel­op­ment and then go tell Andrew Welch!

Solution

Depend­ing on your source, the aver­age read­ing speed of most adults is around 250 words per minute. Accord­ing to this page on medi​um​.com, read time is cal­cu­lated as follows:

Read time is based on the aver­age read­ing speed of an adult (roughly 265 WPM). We take the total word count of a post and trans­late it into minutes, with an adjust­ment made for images.

So to get us star­ted let’s split the text into words using the split fil­ter with the assump­tion that words are sep­ar­ated by spaces. We can then get the num­ber of words using the length fil­ter and divide it by 265 words per minute to give us a read time.

{% macro readTime(text) %}

    {% set words = text|split(' ') %}
    {% set readTime = words|length / 265 %}
    {{ readTime }} minutes

{% endmacro %}

It is pos­sible to solve this with a single line of code by simply chain­ing the twig fil­ters together.

{% macro readTime(text) %}

    {{ text|split(' ')|length / 265 }} minutes

{% endmacro %}

We’ll go back to the first, more readable solution, and fix two potential issues. The first occurs if the read time turns out to be a fraction such as 3.25, in which case we’ll round it up using Craft’s ceil function. The second occurs if the read time is 1, in which case we’ll leave the “s” out of “minutes” to make it singular.

{% macro readTime(text) %}

    {% set words = text|split(' ') %}
    {% set readTime = ceil(words|length / 265) %}
    {{ readTime }} minute{{ readTime > 1 ? 's' }}

{% endmacro %}
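A Twig macro has to be imported before it can be called by its bare name, even when it is defined in the same template. The following is a minimal sketch of how the macro above could be wired up. It assumes the macro lives in the current template, and it reuses the `our-credentials` entry and its `text` field from the challenge description purely as an illustration.

```twig
{# Define the macro (same logic as the version above). #}
{% macro readTime(text) %}
    {% set words = text|split(' ') %}
    {% set readTime = ceil(words|length / 265) %}
    {{ readTime }} minute{{ readTime > 1 ? 's' }}
{% endmacro %}

{# Import the macro from the current template so it can be called by name. #}
{% from _self import readTime %}

{# `our-credentials` and `entry.text` come from the challenge example. #}
{% set entry = craft.entries.slug('our-credentials').one() %}
{{ readTime(entry.text) }}
```

If the macro were kept in a separate template instead, the same `from ... import` tag would take that template's path in place of `_self`.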

Sim­il­ar solu­tions: Quentin Del­court, Spenser Han­non, Doug St. John, Philip Thy­gesen.


Another way to determine whether to output “minute” or “minutes” is to use Yii’s internationalization (I18N) plural feature along with Craft’s translate or t filter.

{% macro readTime(text) %}

    {% set words = text|split(' ') %}
    {% set readTime = ceil(words|length / 265) %}
    {{ readTime }} {{ '{n,plural,=1{minute} other{minutes}}'|t({n: readTime}) }}

{% endmacro %}

We could also take advantage of the duration feature to have the duration output automatically in words. The parameter is expected in seconds, so we multiply readTime by 60 to accommodate this.

{% macro readTime(text) %}

    {% set words = text|split(' ') %}
    {% set readTime = ceil(words|length / 265) %}
    {{ '{n,duration,%with-words}'|t({n: readTime * 60}) }}

{% endmacro %}

Up until now we’ve assumed that the text is provided as plain text, yet the chal­lenge stated that it can also be provided as mark­down text. In order to deal with this, we use the markdown fil­ter provided by Craft to con­vert it into HTML code, fol­lowed by the striptags fil­ter to remove all HTML tags, thereby con­vert­ing it to plain text.

We can also get smarter about how we identi­fy words, repla­cing all dashes and new lines with spaces using the replace fil­ter before split­ting the text.

{% macro readTime(text) %}

    {% set html = text|markdown %}
    {% set plaintext = html|striptags|replace({'—': ' ', '–': ' ', '-': ' ', '\n': ' '}) %}
    {% set words = plaintext|split(' ') %}
    {% set readTime = ceil(words|length / 265) %}
    {{ readTime }} minute{{ readTime > 1 ? 's' }}

{% endmacro %}
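One thing the space-based split does not handle is consecutive whitespace: two spaces in a row produce an empty "word" that still gets counted. A possible refinement, sketched below, collapses runs of whitespace first. It relies on Craft's replace filter treating a search string of the form `/.../` as a regular expression (the same feature used for image matching later in this article). The `normalized` variable and the empty-text guard are additions for illustration, not part of any submitted solution.

```twig
{% macro readTime(text) %}
    {# Convert markdown to HTML, strip the tags, and turn dashes into spaces. #}
    {% set plaintext = text|markdown|striptags|replace({'—': ' ', '–': ' ', '-': ' '}) %}

    {# Collapse any run of whitespace (spaces, tabs, new lines) into one space. #}
    {# The backslash is doubled because Twig unescapes string literals. #}
    {% set normalized = plaintext|replace('/\\s+/', ' ')|trim %}

    {# Splitting an empty string would still yield one empty item, so guard against it. #}
    {% set words = normalized|length > 0 ? normalized|split(' ') : [] %}

    {% set readTime = ceil(words|length / 265) %}
    {{ readTime }} minute{{ readTime > 1 ? 's' }}
{% endmacro %}
```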

Sim­il­ar solu­tions: Patrick Har­ring­ton.


If we assume that the text is provided as markdown, we can take the number of images into account in the read time calculation. We use the replace filter with a regular expression that looks for all instances of ![...](...) in the markdown text, or for all instances of <img ...> in the HTML, followed by the split filter. That allows us to count the images in the text and divide that count by an arbitrary “images per minute” of 12, assuming that people spend an average of 5 seconds looking at each image (this will of course depend on the type of image: a cat versus a complex graph).

{% macro readTime(text) %}

    {% set html = text|markdown %}
    {% set imageCount = html|replace('/<img ([^>]+?)>/', '%%IMAGE%%')|split('%%IMAGE%%')|length - 1 %}
    {% set plaintext = html|striptags|replace({'—': ' ', '–': ' ', '-': ' ', '\n': ' '}) %}
    {% set words = plaintext|split(' ') %}
    {% set readTime = ceil((words|length / 265) + (imageCount / 12)) %}
    {{ readTime }} minute{{ readTime > 1 ? 's' }}

{% endmacro %}
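To make the arithmetic concrete, here is the same formula with made-up counts plugged in. The figures of 2,650 words and 24 images are hypothetical and chosen only because they divide evenly.

```twig
{# Hypothetical counts, for illustration only. #}
{% set wordCount = 2650 %}
{% set imageCount = 24 %}

{# 2650 / 265 = 10 minutes of reading, 24 / 12 = 2 minutes of looking at images. #}
{% set readTime = ceil((wordCount / 265) + (imageCount / 12)) %}

{# Outputs: 12 minutes #}
{{ readTime }} minutes
```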

Sim­il­ar solu­tions: Nate Iler, Alex Rop­er.


At this stage we’ve got a pretty accurate solution that takes words and images per minute into account. I even ran the code above on the An Annotated webpack 4 Config for Frontend Web Development article and got an amazingly rounded “90 minutes”!

So, what oth­er approaches could we take?

This solu­tion by Matt Stein counts the num­ber of unique words in the text using Craft’s unique fil­ter and does some maths to take the com­plex­ity of the text into account when cal­cu­lat­ing a read time.
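To give a feel for the idea of weighting by vocabulary, here is a rough sketch. It is not Matt Stein's actual formula: the `uniqueRatio` variable and the 20% slowdown factor are invented for illustration, and only the use of Craft's unique filter is taken from the description above.

```twig
{% macro readTime(text) %}
    {# Lowercase first so that "The" and "the" count as the same word. #}
    {% set words = text|lower|split(' ') %}

    {# Share of distinct words: close to 1 means a very varied vocabulary. #}
    {% set uniqueRatio = (words|unique|length) / (words|length) %}

    {# Hypothetical weighting: reduce reading speed by up to 20% for varied text. #}
    {% set wordsPerMinute = 265 * (1 - 0.2 * uniqueRatio) %}

    {% set readTime = ceil(words|length / wordsPerMinute) %}
    {{ readTime }} minute{{ readTime > 1 ? 's' }}
{% endmacro %}
```

The weighting would need tuning against real articles before it could be trusted; the point is only that a second metric can be folded into the words-per-minute figure.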

This solu­tion by Andrew Welch takes a dif­fer­ent approach to cal­cu­lat­ing the num­ber of words in the text. It counts the num­ber of char­ac­ters in the entire text and divides that by an aver­age word length of 5.1 char­ac­ters. The solu­tion also caches the cal­cu­la­tion glob­ally using Craft’s {% cache %} tags with a key con­struc­ted from the URL and the name of the entry field. This avoids hav­ing to cal­cu­late the entry’s read time on every request.

{% cache globally using key craft.app.request.url~"entry.someRichText" %}
    {{ readTime(entry.someRichText) }}
{% endcache %}
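The character-count approach described above can be sketched roughly as follows. This is an approximation of the idea rather than Andrew Welch's code. It assumes plain text input, and it counts every character including spaces, which may differ from the original solution.

```twig
{% macro readTime(text) %}
    {# Estimate the word count from the character count, using the 5.1 average cited above. #}
    {% set estimatedWords = text|length / 5.1 %}

    {% set readTime = ceil(estimatedWords / 265) %}
    {{ readTime }} minute{{ readTime > 1 ? 's' }}
{% endmacro %}
```

Because it never splits the text into an array, this variant avoids building a potentially very large list of words for long articles.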

Submitted Solutions

  • Quentin Delcourt
  • Patrick Harrington
  • Nate Iler
  • Andrew Welch
  • Spenser Hannon
  • Doug St. John
  • Matt Stein
  • Alex Roper
  • Philip Thygesen