Setting up the Historic Data

Forecast Pro works with historic time series, i.e., sequences of values of a variable measured at regular time intervals. Business series are usually expressed in terms of the calendar and can use any available frequency (e.g., hourly, daily, weekly, monthly, or quarterly). Forecast Pro can work with any type of time series but is oriented towards daily, weekly, monthly, and quarterly calendar data.

Forecast Pro requires historic data for your target variable (the variable you want to forecast) to be input in a specific format. This section explains the data formats supported by Forecast Pro and describes how to create the files. If you read the material carefully and examine the sample data files that are shipped with Forecast Pro, you will be up and running quickly.

Collection and maintenance of reliable historic data is up to you and your IT department and can be implemented in many ways, depending on information flow conditions in your organization.

Data length

Forecast Pro works by fitting a statistical model to your historic data and extrapolating it via the fitted model. Thus, your data must be long enough to provide reasonably stable estimates of the most important features of the data. Very short or very noisy historic data usually yields very simple models, because the data are too short to support statistical estimates of important features like seasonality.

If the data are very short, say four points or fewer, Forecast Pro cannot detect seasonality or trend and reverts to a Simple Moving Average model.

For more than four points but less than two years’ worth of data, Forecast Pro can fit and forecast trends but not seasonality. If your data are in fact nonseasonal, your forecasts are likely reasonable. But if your data are in fact seasonal, the forecasts are likely to be poor—Forecast Pro cannot detect or forecast the seasonality.

However, seasonal forecasts from short data sets are feasible using some of Forecast Pro’s customized approaches. For example, when the short data sets are nested within aggregate product groups with longer histories, top-down forecasting can be used. If there are other longer series which exhibit the same seasonal pattern then the pattern can be estimated from the longer data set and applied to the short data sets using the INDEXES modifier, a custom component model or the forecast by analogy model. Finally, you can map history from one or more items to your short data history item to create a longer history to use for forecasting.

Seasonality can be estimated and forecasted from two to three years of data, but this amount of data is marginal, especially when your data are noisy or intermittent. Patterns in the noise may be mistaken for seasonality, yielding inappropriate “seasonal” forecasts.

Robust capture of seasonality requires three or more years of data. Four to seven years is even better since there is more information from which the program can separate seasonality and trend from the noise.

Missing values and zeros

Each of the statistical forecasting methods requires an unbroken stream of historic values. Unfortunately, missing values are common in business data. You therefore need a well-formulated approach to missing data. There are two parts to the approach. You must first decide how to encode missing values in the input data file. Then you must decide how these missing values are to be treated by Forecast Pro. This is controlled by several settings on the Formats tab of the Options dialog.

The encoding of missing values depends upon the data format. However, one problem is common to all the formats—the distinction between missing values and zeroes. Forecast Pro ordinarily considers zeroes to be actual data values and treats them as such. If they were really intended as missing values, then the forecasts can be badly biased.

Forecast Pro treats a sequence of leading zeroes as missing values if you check “Ignore leading zeroes” on the Formats tab of the Options dialog, and it treats trailing zeroes as missing if you check “Ignore trailing zeroes”. If you can avoid using this strategy, do so. Its dependence upon these flag settings makes it a little risky.

When historic data is read into Forecast Pro, the program establishes a global ending date for the history which corresponds to the most recent observation available for any item being read in. If a given item does not have a historic observation for the global ending date, the item is flagged as “dead.” Dead items are not forecasted but their demand history goes into the group totals. This can be an issue if your data source code’s zeroes as missing (e.g., blank cells in Excel, no transaction in a table or query, etc.). If you check “Set trailing blanks to zero” on the Formats tab of the Options dialog, Forecast Pro will replace missing trailing observations with zeroes. If you can avoid using this strategy, do so. Its dependence upon these flag settings makes it a little risky.

Zeroes in the middle of your data are always treated as numerical values. Therefore, embedded missing values must be explicitly coded as missing. Since the way you do this depends upon the data format, it will be addressed as we discuss each specific data format.

Forecast Pro interprets your input data and missing values as the data are read. To change treatment of missing data, you must reread the data from scratch. The interpretation of zeros and missing values depends upon the “Ignore leading zeroes” and “Ignore trailing zeroes” flags and upon the “Missing values” setting on the Formats tab of the Options dialog box. There are three possible settings for missing values—Truncate, Impute or Zero.

Truncate directs Forecast Pro to use only the most recent unbroken stream of data. All data up to and including the last missing value are discarded.

Impute directs Forecast Pro to discard leading and trailing missing values but to impute embedded missing values via linear interpolation.

Zero directs Forecast Pro to reset missing values to zeroes and to treat them as numerical values. This setting is appropriate when the data file is written from a database in which only the non-zero records are stored. In other cases, it might lead to badly biased forecasts.

Header information

Once you have collected your time series data, you will need to define the header information. The required header information includes the following.

Variable name. The name by which the program refers to an individual item (often an SKU). An item name can contain any Unicode characters except \ (backslash) and | (pipeline) are prohibited. The word NULL is a reserved keyword and is also prohibited.

However, you may also be limited by restrictions imposed by the software to which you export your data. SQL databases, for instance, sometimes require that the first character in a name is a letter and do not regard upper and lower cases as distinct.

Variable description. A short description of the variable. Variable descriptions must not exceed 256 characters in length. Variable descriptions are displayed within Forecast Pro and can be included in Numeric Output Files and Formatted Forecast Report Files.

Starting year. The year and period corresponding to the first available data point. If the first available observation for a monthly series was April 2016, then the starting year would be 2016 and the starting period would be 4. The starting year must be 100 or later. For data that are not calendar-oriented, enter 100 for the starting year and 1 for the starting period.

Periods per year. 12 for monthly data, 4 for quarterly data, etc. For data that is not calendar oriented, use 1. Note that most weekly corporate calendars involve occasional years of 53 weeks—these occurrences will cause the forecast date tags to be off by a week.

Periods per cycle. The number of periods per seasonal cycle, usually equal to the number of periods per year. An exception might be daily data, where periods per cycle could be set to 7 to capture weekly patterns and periods per year set to 365. Or, if weekend days are excluded, periods per cycle might be 5 and periods per year 260. For nonseasonal data, periods per cycle should be 1. This is a critical entry because Forecast Pro uses it for seasonal adjustments.

Optionally, you may include attribute columns before the variable name. Attributes are used to define the data hierarchy, as described in Defining the Hierarchy.

Formatting rules

Keep the following points in mind when you are creating your spreadsheet.

If you are not using attribute fields each variable name must be unique.
If you are using attribute fields to define product groupings within a hierarchy, each variable name within a given group must be unique.
If you use zeroes to pad the data prior to an item’s availability, make sure that “Ignore leading zeros” is selected on the Formats tab of the Settings Options dialog box.
It is not necessary that the individual item histories begin and end at the same time.

TRAC General Operations Tutorials

FP100 & Extended General Operations Tutorials

Forecasting Operations Tutorials

Reference

Setting up the Historic Data

Data length

Missing values and zeros

Header information

Formatting rules