KISS Beats Accidental Complexity

In the summer of 2019, COVID-19 was looming on the horizon, and we all anticipated the need to work remotely for the foreseeable future. An enterprise customer asked me to build an app for their internal users so they’d never need to leave home to sell their products.

That is not what happened. In fact, COVID-19 wasn’t detected in the USA until January 2020.

For reference, I’ll call this enterprise customer SerraFran Devices.

SerraFran Devices

An enterprise customer with a global presence in over 50 countries. They sell IoT devices, software, firmware, and licenses to multiple verticals.

Problem Statement

The sales team uses a leading CRM (Customer Relationship Management) platform to run CPQ (Configure, Price, Quote). CPQ is a CRM module that lets salespeople configure quotes with various prices, currencies, and configurations, and run different pricing scenarios, before a quote is sent to customers or prospects. The CPQ solution, however, had (and still has) some drawbacks: it does not allow for modeling services, the existing installed base (IB), or service contracts. We needed a solution that would let the sales folks model various pricing scenarios, overlay market dynamics, correlate them with the current IB, and visualize the impact on topology.

Solution

My team and I built the solution on the following stack:

  1. Django 2.2 (models and Django Admin)
  2. DRF 3.9.4
  3. Python 3.7
  4. Postgres 11
  5. Jinja 2.10.1
  6. Gunicorn 19.9.0

For Version 0.1, we relied heavily on Django Admin for CRUD operations. We onboarded 20 users.

Version 0.1 Architecture

About 4 weeks in, we got a request to build a fancier user interface and add features:

  1. SPA for modeling scenarios
  2. What-if analysis
  3. Overlay market conditions
  4. RBAC
  5. Reporting requests querying data from an external data store

Changes to the tech stack:

  1. Celery 4.3 for async tasks
  2. VueJS 2.6

Version 1.0 Architecture

September 2019

We go live with the above stack for 30 users, and I hear nothing for a month.

October 2019

Customer: Can we add more users to the application?

Me: How many users are you thinking of? What is the load going to be? How frequently are people going to use it?

Customer: About 200 users. This is not going to be an HFT (high-frequency trading) kind of app. The app should not go down; that’s all we are asking.

I look at what we built and say, "Go for it! Let me know if something goes wrong."

Fast-forward almost five years: I get a call from this client. "We need to deploy another app and want you to take care of it."

I log in to their PaaS dashboard and am greeted by the screen below.

Version 1.0 Architecture

57 months in, the original app I built is still chugging along.

I reach out to the client and ask, "Hey, has this server really not been restarted in over 4 years?"

The response I get: "I do not have the PEM file, so I never bothered to restart it. I do not know how much it is being used, and I did not want to break whatever is running on it."

Me: Let me look for the PEM file.

I log in to the server, and here is what I discover:

  1. 300+ daily users on average
  2. Disk utilization hovering at about 60%
  3. CPU peaks at 40%
  4. Memory at about 60%

I dig deeper and realize that this application keeps chugging along even though the database has grown 20x over the same period, thanks to a few housekeeping jobs:

Auto-archival bash scripts, scheduled as cron jobs, take care of:

  1. Backing up the database
  2. Offloading old data from Postgres into CSVs, compressing them, and purging that data from the Postgres tables
  3. Log file compression and deletion
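The scripts themselves aren’t reproduced here, but the offload-compress-purge pattern in item 2 is simple enough to sketch. Below is a minimal Python version, assuming the standard-library sqlite3 module as a stand-in for Postgres so the example runs anywhere; the `archive_old_rows` helper, the `quotes` table, the `created` column, and the cutoff date are all hypothetical.

```python
import csv
import gzip
import sqlite3
import tempfile
from pathlib import Path


def archive_old_rows(conn: sqlite3.Connection, table: str, date_column: str,
                     cutoff: str, out_path: Path) -> int:
    """Hypothetical helper: copy rows older than `cutoff` into a gzip-compressed
    CSV, then delete them from the table. Returns the number of rows archived.

    `table` and `date_column` are interpolated into SQL, so they must come from
    trusted code, never from user input.
    """
    select_sql = f"SELECT * FROM {table} WHERE {date_column} < ? ORDER BY {date_column}"
    delete_sql = f"DELETE FROM {table} WHERE {date_column} < ?"

    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        cursor = conn.execute(select_sql, (cutoff,))
        rows = cursor.fetchall()
        if not rows:
            return 0  # nothing old enough: no file is written
        header = [col[0] for col in cursor.description]

        # Write the archive first; purge only after the file is fully written,
        # so a failed write leaves the data in the database.
        with gzip.open(out_path, "wt", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(rows)

        conn.execute(delete_sql, (cutoff,))
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, created TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO quotes (created, amount) VALUES (?, ?)",
        [("2019-10-01", 100.0), ("2020-03-15", 250.0), ("2024-05-01", 75.0)],
    )
    conn.commit()

    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "quotes_before_2021.csv.gz"
        moved = archive_old_rows(conn, "quotes", "created", "2021-01-01", archive)
        remaining = conn.execute("SELECT COUNT(*) FROM quotes").fetchone()[0]
        print(moved, remaining)  # 2 1
```

Writing the compressed file before issuing the DELETE is the one design choice that matters here: if the disk fills up mid-write, the rows stay in the database. Against a real Postgres instance, the export step would more typically use `COPY ... TO`, and the purge would run in batches to keep transactions short.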

All of the above runs on a single Ubuntu instance with 8 GB of RAM. Co-located with this app are an instance of Redash and ETL jobs that run for hours. What stands out is that I had not heard a single complaint about the app in nearly 5 years. FWIW, enhancement requests do come in, but none are urgent. There are a few bugs here and there, but no showstoppers. Even more astounding, this app is running on an EOL version of Django and has not crashed. To be fair, its usage is moderate, not heavy.

The latest update is that someone is rewriting this app in React and Node.js. Below is the architecture; the rewrite itself is a story for another day.

Rewrite Architecture

Node Architecture

The components below are the only ones running on the 16 GB RAM server, and average memory usage sits above 80% for 50 users.