KISS Beats Accidental Complexity
In the summer of 2019, COVID-19 was looming on the horizon, and we all foresaw the need to work remotely for the foreseeable future. An enterprise customer asked me to build an app for their internal users so they’d never need to leave home to sell their products.
That is not what happened; in fact, COVID-19 wasn’t detected in the USA until January 2020.
For the sake of reference, let me call this enterprise customer SerraFran Devices.
SerraFran Devices
An enterprise customer with a global presence in over 50 countries. They sell IoT devices, software, firmware, and licenses to multiple verticals.
Problem Statement
The sales team uses a leading CRM (Customer Relationship Management) system to run CPQ (Configure, Price, Quote). CPQ is a CRM module that lets salespeople configure quotes with various prices, currencies, and configurations, and run different pricing scenarios, before a quote is sent to customers or potential customers. The CPQ solution, however, had (and still has) some drawbacks: it does not allow for modeling services, the existing installed base (IB), or service contracts. There was a need for a solution that would let the sales folks model various pricing scenarios, overlay market dynamics, correlate with the current IB, and visualize the impact on topology.
Solution
We (my team and I) built a solution with the following stack:
- Django 2.2 (Model, and Django Admin)
- DRF 3.9.4
- Python 3.7
- Postgres 11
- Jinja 2.10.1
- Gunicorn 19.9.0
We relied heavily on Django Admin for a lot of CRUD operations in Version 0.1. We onboarded 20 users.
About 4 weeks in, we got a request to build a fancier user interface and some additional features:
- SPA for modeling scenarios
- What-if analysis
- Overlay market conditions
- RBAC
- Reporting requests querying data from an external data store
Changes to the tech-stack
- Celery 4.3 for async tasks
- VueJS 2.6
September-2019
We go live with the above stack for 30 users, and I hear nothing else for a month.
October-2019
Customer: Can we add more users to the application?
Me: How many users are you thinking of? What is the load going to be? How frequently are people going to use it?
Customer: About 200 users. This is not going to be an app of the HFT (High Frequency Trading) kind. The app should not go down, and that's all we are asking
I look at what we built and say: Go for it! Let me know if something goes wrong.
Some time back I get a call from this client: We need to deploy another app and want you to take care of it
I log in to their PaaS dashboard and am greeted with the below screenshot
57 months in, and the original app I built is chugging along.
I reach out to the client and ask Hey, hasn't this server been restarted in 4+ years?
And the response I get: I do not have the pem file so I never bothered to restart it. I do not know how much it is being used. I did not want to break whatever is running on it
Me: Let me look for the pem file
I log in to the server, and here is what I discover:
- 300+ daily users on average
- Disk utilization hovering at about 60%
- CPU peaks at 40%
- Memory at about 60%
I dig deeper and realize that this application is chugging along even though the database has grown 20x in size over the same period.
Auto-archival Bash scripts are scheduled as cron jobs:
- Backing up database
- Offloading old data from Postgres into CSVs, compressing them and purging data from Postgres tables
- Log file compression and deletion
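The original jobs are Bash scripts running against Postgres, and I am not reproducing them here. As a rough illustration of the offload-then-purge step only, here is a minimal Python sketch. It makes several assumptions: it uses the standard-library sqlite3 module as a stand-in for Postgres so it runs without a server, timestamps are stored as ISO 8601 strings, and the function name archive_and_purge, the table name, and the column name are all hypothetical.

```python
import csv
import gzip
import sqlite3
from datetime import datetime
from pathlib import Path


def archive_and_purge(conn, table, ts_column, cutoff, out_dir):
    """Export rows older than `cutoff` to a gzipped CSV, then delete them.

    Returns the number of rows archived. `table` and `ts_column` are
    interpolated into the SQL text, so they must come from trusted
    configuration, never from user input.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{table}_before_{cutoff:%Y%m%d}.csv.gz"

    # Assumption: timestamps are ISO 8601 strings, so string comparison
    # matches chronological order.
    cutoff_value = cutoff.isoformat()
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {ts_column} < ?", (cutoff_value,)
    )
    header = [col[0] for col in cur.description]
    rows = cur.fetchall()
    if not rows:
        return 0

    # Write the archive first; purge only after the file is on disk.
    with gzip.open(out_path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)

    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.execute(
            f"DELETE FROM {table} WHERE {ts_column} < ?", (cutoff_value,)
        )
    return len(rows)
```

A cron job could call something like archive_and_purge(conn, "quotes", "created_at", datetime(2023, 1, 1), "/var/archive") once a night. Writing the compressed file before deleting means a failed export never loses data, which is the property that matters most for a job nobody is watching.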
All of the above runs on a single Ubuntu instance with 8GB RAM. Co-located with this app are an instance of Redash and ETL jobs that run for hours. I had not heard a single complaint about the app in nearly 5 years. FWIW, enhancement requests do come in, but none are urgent. There are a few bugs here and there, but no showstoppers. Even more astounding, this app is running on an EOL version of Django without having crashed. The app has moderate (not heavy) usage.
The latest update is that someone is rewriting this app in React and NodeJS. Below is the architecture. That’s a story for another day.
Rewrite Architecture
The components below are the only ones running on the 16GB RAM server, with average memory usage at over 80% for 50 users.