Adam Caudill

Security Leader, Researcher, Developer, Writer, & Photographer

Generating Content Stats for Hugo

Producing useful insight into your content

Image: Photo by Stephen Dawson on Unsplash

I recently became curious just how much time I had spent working on content for this site, which led me to an idea: it would be great to have a page that listed some useful data about the content, and how much effort was put into it. I had some hope that I could pull some of this directly out of Hugo, though unfortunately it didn’t expose the information I wanted (and certainly not in an efficient way).

Building JSON Stats Data

I quickly decided that the simplest solution was a small Python script to collect the data I needed. This would be easy to integrate into the Cloudflare Pages build process, and would make it simple to add new information over time.

In this case, I wanted a few data elements:

  • Total Number of Posts
  • Total Number of Words
  • Total Number of Characters
  • Total Estimated Reading Time
  • Total Estimated Writing Time

These are simple enough to gather, so I put a script together.

This script simply looks for all of the .md files in content/posts/, uses the frontmatter library to load the content and return the body without the frontmatter (so it doesn’t skew the numbers), and then collects the information from each file.
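What follows is a minimal sketch of such a script, not the original. It assumes the python-frontmatter package (imported as frontmatter), counts words by splitting on whitespace, and counts characters with len(); the function names summarize, load_bodies, and write_stats are hypothetical, and the recursive search is also an assumption.

```python
import json
from pathlib import Path

READING_WPM = 200  # reading speed used in this post
WRITING_WPM = 5    # writing speed used in this post


def summarize(bodies):
    """Aggregate stats over post bodies that already have front matter removed."""
    bodies = list(bodies)
    count = len(bodies)
    # Assumption: a "word" is any whitespace-separated token.
    words = sum(len(body.split()) for body in bodies)
    chars = sum(len(body) for body in bodies)
    return {
        "blog_post_count": count,
        "blog_total_chars": chars,
        "blog_total_words": words,
        "blog_average_words": words / count if count else 0,
        "blog_reading_time": words / READING_WPM,  # minutes
        "blog_writing_time": words / WRITING_WPM,  # minutes
    }


def load_bodies(posts_dir):
    """Return the body of every .md file under posts_dir, without front matter."""
    # Third-party dependency, imported lazily: pip install python-frontmatter
    import frontmatter

    # Assumption: search subdirectories too, so page bundles are included.
    paths = sorted(Path(posts_dir).rglob("*.md"))
    return [frontmatter.load(str(path)).content for path in paths]


def write_stats(posts_dir, out_path):
    """Collect the stats and write them out as JSON."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(summarize(load_bodies(posts_dir)), indent=3))
```

Calling write_stats("content/posts", "content/stats/stats.json") from the build step would then regenerate the file on every deploy.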

There are a couple of somewhat arbitrary choices in here that can impact the results:

  • Reading Speed: The general consensus from a few minutes of searching is that the standard value used to estimate reading speed is 200 words per minute.
  • Writing Speed: According to one source, the average writing speed varies quite dramatically: from 5 words per minute for “in-depth essays or articles” to 40-70 words per minute for other writing. When factoring in editing, revisions, collecting feedback on drafts, and the other add-on time required to go from a raw collection of words to a ready-to-post article, the 5 words per minute number seems most accurate for me and my workflow. That said, it could be completely different for you.
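To see how much the writing speed choice matters, here is a small sketch that applies the three rates mentioned above to a total of 197,629 words (the figure from this site's sample output). The helper name estimated_minutes is hypothetical.

```python
def estimated_minutes(total_words, words_per_minute):
    """Minutes needed to get through total_words at the given pace."""
    return total_words / words_per_minute


TOTAL_WORDS = 197_629  # sample total from this site's stats

# Compare the low end (5 wpm) with the 40-70 wpm range quoted above.
for wpm in (5, 40, 70):
    minutes = estimated_minutes(TOTAL_WORDS, wpm)
    print(f"{wpm:>2} wpm: {minutes:,.0f} minutes (~{minutes / 60:,.0f} hours)")
```

The spread is large: roughly 659 hours at 5 words per minute versus about 47 hours at 70, so the resulting number is only as good as the rate you pick.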

When this script is run, it will produce a stats.json file in the content/stats/ directory. It should look something like this (both time values are in minutes, since they are derived from words-per-minute rates):

{
   "blog_post_count":342,
   "blog_total_chars":1284579,
   "blog_total_words":197629,
   "blog_average_words":577.8625730994152,
   "blog_reading_time":988.145,
   "blog_writing_time":39525.8
}

This file can then be read from content/stats/index.md and displayed however you like.

Displaying the JSON Data

Hugo does have options for displaying data, such as Data Templates or, more manually, the getJSON function, but these seemed inelegant for my needs. Thankfully I found another option: a shortcode that reads in a JSON file and makes the values easy to use with a simple syntax: {{< jsondata src="data.json" var="max_date" >}}. This makes it extremely easy to embed JSON data wherever it’s needed.

I did need to make a change to the shortcode to suit my needs, however: the way some larger numbers were displayed wasn’t quite what I wanted, and the format parameter wasn’t able to resolve it. So I added a check for numeric data types when a format isn’t specified, and used Hugo’s lang.NumFmt to format the number in a more human-readable way.

{{- $json_filename := .Get "src" | default "data.json" -}}
{{- $json_data_filepath := path.Join "content" (path.Dir .Page.File) $json_filename -}}
{{- if fileExists $json_data_filepath -}}
  {{- $json_data := getJSON $json_data_filepath -}}
  {{- $json_varname := .Get "var" -}}
  {{- $var_value := index $json_data $json_varname -}}
  {{- if $var_value -}}
    {{- $json_format := .Get "format" -}}
    {{- if $json_format -}}
      {{ printf $json_format $var_value }}
    {{- else -}}
      {{- $type := (printf "%T" $var_value) -}}
      {{- if or (eq "int" $type) (eq "int64" $type) (eq "float64" $type) -}}
        {{ $var_value | lang.NumFmt 0 }}
      {{- else -}}
        {{ $var_value }}
      {{- end -}}
    {{- end -}}
  {{- else -}}
    {{ errorf "Cannot get the value of the variable %s from the data file: %s" $json_varname $json_data_filepath }}
  {{- end -}}
{{- else -}}
  {{ errorf "Cannot find the file: %s" $json_data_filepath }}
{{- end -}}
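For readers more comfortable with Python than Go templates, the formatting branch of the shortcode can be approximated as follows. This is only an illustration of the logic: render_value is a hypothetical name, Python's % operator stands in for Hugo's printf, and the "," format spec stands in for lang.NumFmt 0 with its default separators, so rounding and locale details may differ.

```python
def render_value(value, fmt=None):
    """Mimic the shortcode: explicit format first, then numeric grouping."""
    if fmt:
        # Corresponds to the `printf $json_format $var_value` branch.
        return fmt % value
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Zero decimal places with thousands separators.
        return f"{value:,.0f}"
    # Anything else is printed unchanged.
    return str(value)


print(render_value(1284579))            # 1,284,579
print(render_value(577.8625730994152))  # 578
print(render_value(988.145, "%.1f"))    # 988.1
```

With this change, a value such as blog_total_chars shows up with grouping separators without needing a format parameter on every shortcode call.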

Hopefully this is as useful for you as it has been for me, and saves someone else some time.
