Hierarchical Time Series With Prophet and PyMC3 by Matthijs Brouns

Talk Abstract

When doing time-series modelling, you often end up in a situation where you want to make long-term predictions for multiple related time series. In this talk, we’ll build a hierarchical version of Facebook’s Prophet package to do exactly that.
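
To make “hierarchical” concrete: the core idea is partial pooling, where each series gets its own parameters but those parameters are drawn from a shared distribution. Here is a plain-NumPy simulation of that generative structure, for linear trends only. The function and its parameters (global_slope, group_sd, noise_sd) are illustrative assumptions. It does not fit anything and does not reflect how timeseers is implemented.

```python
import numpy as np


def simulate_hierarchical_trends(t, n_groups, global_slope=0.5, group_sd=0.1,
                                 noise_sd=0.05, seed=0):
    """Simulate noisy linear trends whose slopes are partially pooled.

    Hypothetical helper: all argument names and defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    # Each group's slope is drawn around the shared global slope,
    # so the groups differ but remain related
    slopes = rng.normal(global_slope, group_sd, size=n_groups)
    # Independent per-group offsets
    offsets = rng.normal(0.0, 1.0, size=n_groups)
    # Broadcast to shape (n_groups, len(t)): one row per series
    trends = offsets[:, None] + slopes[:, None] * t[None, :]
    noise = rng.normal(0.0, noise_sd, size=trends.shape)
    return slopes, trends + noise


t = np.linspace(0.0, 1.0, 100)
slopes, y = simulate_hierarchical_trends(t, n_groups=3)
print(y.shape)  # (3, 100)
```

Shrinking group_sd toward zero forces all groups onto one shared slope (full pooling), while a very large group_sd lets every group vary freely (no pooling).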

Talk

Matthijs Brouns

Matthijs is a data scientist based in Amsterdam, the Netherlands. His current work involves training junior data scientists at Xccelerated.io. This means he divides his time between building new training materials and exercises, giving live trainings, and acting as a sparring partner for the Xccelerators at his partner firms, as well as doing some consulting work on the side.

Matthijs has spent a fair amount of time contributing to the open scientific computing ecosystem through various means. He maintains open source packages (scikit-lego, seers), co-chairs the PyData Amsterdam conference and meetup, and vice-chairs the PyData Global conference.

In his spare time he likes to go mountain biking, bouldering, do some woodworking or go scuba diving.

This is a PyMCon 2020 talk

Learn more about PyMCon!

PyMCon is an asynchronous-first virtual conference for the Bayesian community.

We posted all the talks here on Discourse on October 24th, one week before the live PyMCon session, for everyone to see and discuss at their own pace.

If you are available on October 31st, you can register for the live session here! If you are not, don’t worry: all the talks are already available here on Discourse (keynotes will be posted after the conference), and you can network here on Discourse and on our Zulip.

We value the participation of each member of the PyMC community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the conference and at all conference events. Everyone taking part in PyMCon activities must abide by the PyMCon Code of Conduct. You can report any incident through this form.

If you want to support PyMCon and the PyMC community but you can’t attend the live session, consider donating to PyMC.

Do you have suggestions to improve PyMCon? We have an anonymous suggestion box waiting for you.

Have you enjoyed PyMCon? Please fill out our PyMCon attendee survey. It is open to both async PyMCon attendees and people taking part in the live session.

I LOL at the title, also, excellent breakdown of the library API design. Could you share the materials during your talk (Streamlit link, etc)?

One limitation I see in the current API is that the change points in the LinearTrend model are at arbitrary, constant intervals. Granted, this makes computation much faster since the input is static, but it would be much more powerful if we could have uneven intervals, with the number of breakpoints (and their locations) inferred automatically.
Also, I really like the parameterization of the linear trend, which forces the change points to be “smooth” (i.e., the segments are connected). It might be useful, however, to allow the trend line to be disconnected to account for sudden jumps in the time series (e.g., a pandemic that almost switches off the time series). I guess a strongly regularized additional constant added to g in the LinearTrend would do.
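
For reference, the connected (“smooth”) parameterization discussed here is the Prophet-style piecewise-linear trend g(t) = (k + a(t)ᵀδ)·t + (m + a(t)ᵀγ), where a_j(t) = 1 once t passes changepoint s_j and γ_j = -s_j·δ_j keeps adjacent segments joined. What follows is a minimal NumPy sketch of that formula. The optional `jumps` argument is a hypothetical extension showing the suggested extra constant, which deliberately breaks continuity. This is not the LinearTrend code from timeseers.

```python
import numpy as np


def piecewise_linear_trend(t, k, m, changepoints, deltas, jumps=None):
    """Prophet-style piecewise-linear trend.

    k: initial growth rate, m: initial offset,
    deltas: rate change at each changepoint,
    jumps: hypothetical level shift at each changepoint (breaks continuity).
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    # A[i, j] = 1 once time t[i] has reached changepoint s[j]
    A = (t[:, None] >= s[None, :]).astype(float)
    # gamma_j = -s_j * delta_j cancels the offset the rate change would cause,
    # so adjacent segments meet at each changepoint
    gammas = -s * deltas
    trend = (k + A @ deltas) * t + (m + A @ gammas)
    if jumps is not None:
        # Hypothetical extension: add a constant level shift after each changepoint
        trend = trend + A @ np.asarray(jumps, dtype=float)
    return trend


# Slope 1 until t = 5, then slope -1; the segments stay connected at t = 5
print(piecewise_linear_trend([4.0, 5.0, 6.0], k=1.0, m=0.0,
                             changepoints=[5.0], deltas=[-2.0]))  # [4. 5. 4.]
```

Leaving `jumps` out (or setting it to zeros) gives the connected trend. In a Bayesian model, the jump sizes would presumably get a tight prior so that only genuine breaks survive.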

Very nice combination of a Python computational model with PyMC3 probabilistic programming, great job.

Good questions @junpenglao!

I LOL at the title, also, excellent breakdown of the library API design. Could you share the materials during your talk (Streamlit link, etc)?

Thanks! The name was a joint effort with @koaning. I came up with seers as the plural of a single prophet, and Vincent then suggested making it time-seers, which added the great pun.

The streamlit dashboard can be found on http://prophet.mbrouns.com/
The full repo is here: GitHub - MBrouns/timeseers: Time should be taken seer-iously
The materials I worked through in the talk are here. The second attachment is actually an .ipynb file, but I’m not allowed to upload those, so you’ll have to rename it yourself:
plotting.py (3.7 KB)
pymcon.py (599.6 KB)

One limitation I see in the current API is that the change points in the LinearTrend model are at arbitrary, constant intervals

Correct: the current approach is to seed the domain with “enough” changepoints and strongly regularize them. That doesn’t feel like the most elegant way to solve this, and I’m eager to work on better solutions. The first step will be allowing users to manually define changepoints, but I’m also interested in figuring out more automatic methods. For the next three weeks I’m up to my neck in work for PyData Global, and when that’s done I want to push for a first timeseers release.
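
The “seed enough changepoints and regularize them” approach can be sketched in NumPy: place changepoints evenly over the first part of the history, then give each rate change a Laplace prior with a small scale so most of them stay near zero. The defaults used here (25 changepoints over the first 80% of the range, scale 0.05) are loosely modelled on Prophet’s defaults and are assumptions, not timeseers settings. Drawing prior samples stands in for the actual PyMC3 fit.

```python
import numpy as np


def seed_changepoints(t, n_changepoints=25, changepoint_range=0.8):
    """Place changepoints evenly over the first part of the observed time range."""
    t = np.asarray(t, dtype=float)
    t_end = t.min() + changepoint_range * (t.max() - t.min())
    # n_changepoints + 1 evenly spaced points; drop the first one, which
    # would sit at the very start of the series
    return np.linspace(t.min(), t_end, n_changepoints + 1)[1:]


def sample_rate_changes(n_changepoints, scale=0.05, seed=0):
    """Draw rate changes from a Laplace(0, scale) prior.

    A small scale concentrates most draws near zero, so only a few
    changepoints end up noticeably bending the trend.
    """
    rng = np.random.default_rng(seed)
    return rng.laplace(0.0, scale, size=n_changepoints)


t = np.linspace(0.0, 100.0, 1001)
changepoints = seed_changepoints(t)
deltas = sample_rate_changes(len(changepoints))
print(len(changepoints), changepoints.max())  # 25 80.0
```

In an actual model the deltas would be parameters carrying this Laplace prior rather than fixed draws, and the piecewise trend would use them as the rate changes at the seeded changepoints.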

it might be useful, however, to allow the trend line to be disconnected to account for sudden jumps in the time series

Definitely, my thinking there at the moment was to model this as an additional regressor component. Would you prefer it in the trend?
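
One way the “additional regressor” idea could look: a step indicator that is 0 before the jump and 1 from it onward, added as an extra column next to the trend. In this hypothetical sketch the jump time is known, the data is synthetic and noise-free, and ordinary least squares stands in for a Bayesian fit with a prior on the jump coefficient.

```python
import numpy as np


def step_regressor(t, t_jump):
    """Indicator column: 0 before t_jump, 1 from t_jump onward (hypothetical helper)."""
    return (np.asarray(t, dtype=float) >= t_jump).astype(float)


# Synthetic, noise-free series: linear trend with a level drop of 4 at t = 6
t = np.linspace(0.0, 10.0, 101)
y = 1.0 + 0.5 * t - 4.0 * step_regressor(t, 6.0)

# Design matrix: intercept, slope and the step regressor as an extra component
X = np.column_stack([np.ones_like(t), t, step_regressor(t, 6.0)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 3))
```

Because the synthetic data has no noise, least squares recovers the intercept (1.0), slope (0.5) and jump size (-4.0) that generated it. The trend itself stays connected, and the discontinuity lives entirely in the regressor.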

I think both ways work, depending on how much users would like this to be handled automatically.

Happy to chat more - I use structural time series from tensorflow probability at my work, would be great to compare notes.

Part of why I really like the name time-seers … puns aside … is that it allows you to import the tool like:

import timeseers as ts 

It’s a nitpick. But having a timeseries library that can be abbreviated to ts feels very appropriate.

Yeah, that would be awesome. My agenda is completely full until PyData Global is over, but I’m happy to chat after that!

Thanks for this really awesome talk. You are a good teacher: you covered PyMC3, matrix algebra, Python AND hierarchical time series modelling in an accessible way in such a short amount of time. I think a lot of people are really going to take this new module seer-iously… Is Vincent a dad by any chance? =P

Thanks for the neat tutorial Matthijs, it’s been very useful as an introduction to pymc3 and a great way to dig into what Prophet does under the hood.

Thanks for the really interesting talk @MBrouns! Just wondering if the dataset you used for this demonstration is available?

Glad you liked it! The dataset I used was a generated one. I’ve attached it below and you can find the code used to generate it here

timeseries.csv (138.3 KB)

Hi Matthijs, thank you so much for the great and detailed explanation. My question is: how can we use this method to appropriately model different hierarchies? For example, the dataset you used has a column called “group” with two groups, “summer” and “winter”, which means the forecasts must be consistent with the aggregate forecasts for summer and winter. How can we take care of this bottom-up aggregation using this library?
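
For context, one common way to get forecasts that add up across a hierarchy is bottom-up reconciliation: forecast only the bottom-level series and derive every aggregate by multiplying with a summing matrix S. The NumPy sketch that follows is a generic illustration of that technique, not a timeseers API. The helper name, the four series, their group labels and the forecast values are all made up for the example.

```python
import numpy as np


def summing_matrix(groups):
    """Build S such that S @ bottom stacks [total; one row per group; bottom series].

    Hypothetical helper for illustration only.
    """
    labels = sorted(set(groups))
    n = len(groups)
    total = np.ones((1, n))
    # One aggregation row per group label, selecting that group's series
    group_rows = np.array([[1.0 if g == label else 0.0 for g in groups]
                           for label in labels])
    return labels, np.vstack([total, group_rows, np.eye(n)])


# Made-up forecasts: 4 bottom-level series (rows) over 2 future steps (columns)
groups = ["summer", "summer", "winter", "winter"]
bottom = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0],
                   [7.0, 8.0]])

labels, S = summing_matrix(groups)
coherent = S @ bottom
print(labels)        # ['summer', 'winter']
print(coherent[:3])  # rows: total, summer, winter
```

Because every aggregate is computed as a sum of the bottom-level forecasts, the result is coherent by construction. Other reconciliation approaches, such as top-down or optimal-combination methods, reuse the same S matrix but weight the levels differently.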

@MBrouns great talk. It seems that the Streamlit app is down. Could you please share the code for it as well?
This would be very helpful for learning and explaining the theory.

Best,
Jan

@Jan_Berthold ah yes, the app was hosted on Heroku, and they discontinued their free tier. Let me check whether there’s anything in the repo that shouldn’t be public, and I’ll share it afterwards.