I have never encountered a dataset involving people or places where seasonality had no effect. Wikipedia puts it well:
In time series data, seasonality refers to the trends that occur at specific regular intervals less than a year, such as weekly, monthly, or quarterly. Seasonality may be caused by various factors, such as weather, vacation, and holidays and consists of periodic, repetitive, and generally regular and predictable patterns in the levels of a time series.
In corporate real estate, the easiest seasonality to identify in a dataset involves holidays. Holidays are not spread evenly across the year, so occupancy patterns are not consistent from one part of the year to the next. In the US, the period from Thanksgiving through New Year's Day is the most obvious example. During this 6-week window, at least four business days are general public holidays when no one is in the office. There are also holiday parties and end-of-year activities that drive higher-than-normal occupancy on specific days. As a result, this 6-week window often has the lowest average occupancy of the year while also containing its peak occupancy days.
Seasonality also encompasses:
- Vacation seasons (look at European data to see how the end of July through August window is always significantly lower than average)
- School calendars (having kids out of school often leads to higher rates of PTO)
- Business cycles (many businesses have predictable patterns, with higher renewal and sales rates in one quarter than in the others)
- Start/End of quarter activity (for example, Finance teams often see the highest in-office activity at the end of a financial reporting period)
- Weather patterns (notably hot or cold seasons may change occupancy patterns as employees' home or commute situations struggle with the extremes)
- Retail sales trends (such as ramping up for the holiday sales season)
I began my career in supply chain and logistics, where part of my job was designing processes to manage the flow of products into, within, and out of warehouses. This meant looking at inventory and ordering patterns quite a bit. It's one thing to have a warehouse that works on average, but if you aren't prepared for every client to hit their peak at the same time, you are going to have problems on the days that really matter. It was there that I learned the value of a 13-month analysis.
In my opinion, 13 months is the minimum acceptable time horizon for data used to make long-term decisions. It lets you see a full year of seasonality, and the 13th month provides a one-month baseline: comparing it with the same month a year earlier shows whether the period was one of growth, normality, or decline. Anything less leaves open questions about what happened during the rest of the year.
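To make the 13-month idea concrete, here is a minimal sketch, assuming you have one average-occupancy value per month for 13 consecutive months. The function name `thirteen_month_summary`, the 5% tolerance band for calling a year "normal", and the sample numbers are all hypothetical choices for illustration, not a standard method. The first 12 values show the seasonal cycle (peak and low months); the 13th is compared against the 1st to classify the year.

```python
from typing import Dict, Sequence, Union


def thirteen_month_summary(
    monthly_avg: Sequence[float], tolerance: float = 0.05
) -> Dict[str, Union[int, float, str]]:
    """Summarize 13 consecutive months of average occupancy.

    Assumes monthly_avg[0] and monthly_avg[12] are the same calendar month,
    one year apart. The +/- tolerance band used to call the year "normal"
    is a hypothetical choice, not an industry standard.
    """
    if len(monthly_avg) != 13:
        raise ValueError("expected exactly 13 monthly values")
    if min(monthly_avg) <= 0:
        raise ValueError("monthly averages must be positive")

    # The first 12 months cover one full seasonal cycle.
    year = list(monthly_avg[:12])
    peak = max(range(12), key=lambda i: year[i])
    low = min(range(12), key=lambda i: year[i])

    # Month 13 repeats month 1, giving a like-for-like baseline comparison.
    change = (monthly_avg[12] - monthly_avg[0]) / monthly_avg[0]
    if change > tolerance:
        trend = "growth"
    elif change < -tolerance:
        trend = "decline"
    else:
        trend = "normal"

    return {
        "peak_month_index": peak,
        "low_month_index": low,
        "peak_to_low_ratio": year[peak] / year[low],
        "baseline_change": change,
        "trend": trend,
    }


if __name__ == "__main__":
    # Illustrative values only, not real occupancy data.
    months = [52, 55, 58, 57, 54, 50, 44, 42, 56, 59, 53, 38, 54]
    s = thirteen_month_summary(months)
    print(s["trend"], s["peak_month_index"], s["low_month_index"])  # normal 9 11
```

Months are zero-indexed here, so index 0 and index 12 are the same calendar month a year apart. The tolerance is arbitrary; a band that reflects how noisy your own data is would be a better choice.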
I was once given workplace occupancy data from a four-month window and asked to use it to predict how much space a relocated workplace would need. Fortunately, I had worked with data from other offices in the same region and knew what to look for. The first thing that caught my eye was that the four-month dataset happened to fall during what is usually the slowest period of the year for occupancy. When I pointed this out, I was met with two points in reply:
- They had to measure during this period because otherwise it would have been too busy for them to install the sensors. (This should have been a red flag to them, but oh well.)
- It’s probably fine, just throw 20% on top of it and it will probably work out.
If I had gone along with their direction, they would have ended up with an undersized office. Once peaks were accounted for, the real gap between what the slow-period data suggested and the space actually needed was closer to 40%. Simply by picking data from the wrong part of the year, and assuming occupancy was consistent throughout the year, they would have sized the office for their quietest months.
The problem with datasets shorter than 13 months is that you can't know what is missing from the other months. You can guess. You can try to fill in the blanks with data from other sources. You can apply factors and trends to estimate it. But missing data is missing data.
Yes, there are times when having 13 months of data is not possible, and we must do our best to work around that fact. But the best approach is not to treat the data as whole and complete; instead, approach it with a critical eye and question everything. What if this data actually reflects the busiest period? What if it is the slowest? What if it is average? What if it is a complete misrepresentation of occupancy? (More than a few times, I have seen managers try to game short-term studies by requiring their employees to be in the office more than usual during the measurement window.) Smaller datasets often require MORE analysis, because there are more open questions about what they are missing than there would be with a more complete dataset.
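One way to ask those what-if questions explicitly is a small scenario calculation. This is a minimal sketch, not the author's method: it assumes you can borrow, from a comparable office that does have 13 months of data, a ratio between the peak seen in a window like the one measured and the annual peak. The function name `scenario_peaks`, the scenario labels, and every factor value are hypothetical and only illustrate the mechanics.

```python
from typing import Dict, Mapping


def scenario_peaks(
    observed_peak: float, window_factors: Mapping[str, float]
) -> Dict[str, float]:
    """Estimate the annual peak under several what-if scenarios.

    Each factor is a hypothetical ratio of (peak seen in a window like the
    measured one) / (annual peak), ideally borrowed from a comparable office
    with a full 13 months of data.
      factor == 1.0  -> the window already contains the annual peak
      factor <  1.0  -> the window was quieter than the annual peak
      factor >  1.0  -> the window overstates it (e.g. attendance was
                        pushed up during the study)
    """
    estimates: Dict[str, float] = {}
    for name, factor in window_factors.items():
        if factor <= 0:
            raise ValueError(f"factor for {name!r} must be positive")
        estimates[name] = observed_peak / factor
    return estimates


if __name__ == "__main__":
    observed = 200  # peak headcount seen in the short study (illustrative)
    scenarios = {  # all factors are hypothetical
        "busiest period": 1.0,
        "average period": 0.85,
        "slowest period": 0.70,
        "inflated by managers": 1.15,
    }
    for name, peak in scenario_peaks(observed, scenarios).items():
        print(f"{name}: {peak:.0f}")
    print(f"flat +20% rule: {observed * 1.2:.0f}")
```

With these illustrative factors, a flat 20% uplift (240) covers the average-period scenario (about 235) but falls short if the window was actually the slowest part of the year (about 286), and it oversizes if attendance was inflated during the study (about 174). A single blanket uplift hides exactly this spread.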