Short Term Model Diagnostics at WPC is the assessment of model output from forecast hour (fhr) 00 (initialization) to fhr 84.  The goal is to provide guidance to internal and external NWS users on which models are most accurately predicting the evolution of the atmosphere over the contiguous US (CONUS).  You are NOT simply trying to determine which model is performing best overall.  Rather, you are conveying where and at what forecast hour a particular model may be favored - with as much objective justification as possible.

Each Numerical Weather Prediction (NWP) model is a very complex system comprised of numerous components.  The more components of a modeling system that perform well, the better the chance the model will provide a reasonably accurate solution.

How does one determine if a model is providing a reasonable depiction of the atmosphere's evolution?  How does one know which parameters to key in on to provide such an assessment - especially given the myriad of parameters provided by model output AND the short time one is expected to make such a diagnosis?

These are great questions, and the task may seem somewhat daunting, if not impossible, given this introduction.  However, the best way to attack this problem is to first gain an understanding of a model's construct.  Once you obtain this foundation, you will be able to focus your diagnostic efforts and ultimately provide a knowledgeable diagnosis to your customers.

Using a rather typical case (12Z July 15, 2004) as an example, this tutorial will provide the following:

Completing this tutorial will provide you the fundamentals required to produce a model diagnostic discussion.  Ultimately, experience will increase your ability to anticipate model performance.  Keep in mind, modelers can benefit from knowledgeable diagnostic assessments because it helps focus where and how to make improvements in a model.

Model Primer
This section will ensure you have gained sufficient understanding of both model structure and the limitations of NWP.

GENERICALLY, model output is produced using the following components:

  1. Analysis System - an observational data collection and analysis scheme used to determine the "initial state" of the atmosphere
  2. Dynamic Core - a set of predictive equations* that simulate atmospheric evolution (given an initial state of the atmosphere)
  3. Post Processing System - the production of usable output (data sets and images) from the raw model output

*These equations require no more than five parameters to simulate the motion of the atmosphere: temperature, moisture, pressure, wind speed, and wind direction.


Obviously a lot of detail has been omitted, but simplistically: initial conditions are run through the model, and the output is post processed before being made available for use.

Inherent Model Errors
To date, every NWP model has been structured in the fashion described above, and this approach has served us well.  However, it is steeped in error and will NEVER provide a perfect forecast.  This is due to at least four primary reasons - and one additional "gotcha" resulting from post processing:

1. Initial Data - Availability
In order to capture the TRUE state of the atmosphere, we would require observations horizontally and vertically down to a subatomic scale over the ENTIRE GLOBE.  No such observational network exists.  Even satellite based observations do not fill in all the gaps (we are forced to fill in the gaps using interpolation schemes).  Further, for a variety of reasons, observations are not always taken and/or transmitted.  This means our ability to capture the TRUE state of the atmosphere is limited.  We NEVER know the atmosphere's TRUE initial state.  The bottom line is that any analysis system introduces error into a modeling system!
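The gap-filling idea can be sketched with a toy inverse-distance weighting scheme.  This is purely illustrative - operational analysis systems are far more sophisticated, and the station locations and temperatures below are made up:

```python
import math

# Hypothetical sparse observations: (x_km, y_km, temperature_C)
obs = [(0, 0, 20.0), (100, 0, 24.0), (0, 100, 18.0)]

def idw(x, y, power=2):
    """Estimate a value at (x, y) by inverse-distance weighting of the
    surrounding observations - a toy stand-in for an analysis scheme."""
    num = den = 0.0
    for ox, oy, val in obs:
        d = math.hypot(x - ox, y - oy)
        if d == 0:
            return val          # exactly at an observation site
        w = 1.0 / d**power
        num += w * val
        den += w
    return num / den

# Fill a "gap" between observing sites
print(round(idw(50, 50), 2))    # → 20.67
```

The estimate in the gap is only a weighted blend of surrounding stations - whatever actually happened between them is invisible to the scheme, which is exactly the error source described above.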

Wind data from rawinsondes used to help initialize the then AVN (now GFS) model from 1000 to
700 mb for the 12z cycle May 23, 2001.  The red dots represent data that was deemed unusable by
the analysis scheme.  Note the gaps over the oceans and near the poles.

2. Initial Data - Accuracy
Did you ever hear of a guy named Lorenz?  Back in 1961, Edward Lorenz was running a simple model out to a certain forecast hour.  After studying the output, he wanted to extend the run further into the future.  Back then, even simple models took a VERY long time to produce output.  So instead of rerunning the model from the beginning, he restarted it from a forecast hour a little earlier than where it had left off.  In order to do this, he manually entered output from the model back into the computer.  Trying to save time, he entered the data out to 3 decimal places versus the 6 that had been provided by the model.

The model was restarted, and he fully expected the overlapping portion of the "new" run to match the "old" run.  However, he noted how quickly the new run diverged from the previous one.  Ultimately, he was able to attribute the difference to having entered the data out to only 3 decimal places.  Lorenz had assumed the extra 3 decimal places were inconsequential and could be dropped.  He discovered in dramatic fashion how sensitive the model was to such a small change.

The implication of his finding is that observational data used by a model would have to be accurate out to an infinite number of decimal places as a prerequisite for a perfect forecast.  Currently, our technology does not allow us to measure any atmospheric quantity to that level of precision.
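Lorenz's experiment is easy to reproduce with his classic three-variable system from 1963.  The sketch below is a standard textbook demonstration, not Lorenz's original code: the starting values are hypothetical, and simple forward-Euler stepping stands in for his integration scheme.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the classic Lorenz (1963) system."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)

def run(state, steps):
    for _ in range(steps):
        state = lorenz_step(state)
    return state

full = (1.0, 1.0, 1.000127)                  # hypothetical 6-decimal start
rounded = tuple(round(v, 3) for v in full)   # re-entered to 3 decimals

a, b = run(full, 3000), run(rounded, 3000)
print(a)
print(b)  # the two trajectories no longer resemble each other
```

A difference of about one part in ten thousand in a single variable grows until, after a few thousand steps, the two runs bear no resemblance to each other - the same behavior Lorenz stumbled onto.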

3. Model Resolution (Space and Time) - The model cannot resolve atmospheric processes and features smaller than its resolution
Consider a model with grid points 10 km apart.  When we try to capture the initial state of the atmosphere in a 10 km model, we cannot reliably resolve any feature smaller than about 5x the grid spacing (in this case, features smaller than 50 km).  This also means the model misses any change that occurs between grid points.  The same is true for the vertical resolution of the model.  Over time these errors accumulate and eventually become the dominant signal in the model output!

The model takes initial conditions and steps them out into the future in small increments of time.  The duration of the time step is governed by the horizontal resolution of the model.  Generally, the higher the resolution of the model, the smaller the time step required.  For example, a 12 km horizontal resolution requires a time step of about 30 seconds.  This means a 12 km model will miss changes in the atmosphere occurring between those 30 second increments.  By themselves, these represent minuscule errors.  But summed over the course of an 84 hour run, the errors accumulate and can become a dominant signal in the model output.
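The link between grid spacing and time step comes from a numerical stability requirement often called the CFL condition: a signal must not cross more than roughly one grid cell per time step.  A minimal sketch, where the fastest signal speed (~350 m/s, a rough gravity-wave speed) and the safety factor are assumed values for illustration:

```python
def max_time_step(dx_km, wave_speed_ms=350.0, courant=0.9):
    """Largest stable time step (seconds) from the CFL condition
    dt <= C * dx / c, for an assumed fastest signal speed c."""
    return courant * (dx_km * 1000.0) / wave_speed_ms

for dx in (90, 40, 12):
    print(f"{dx:3d} km grid -> dt ~ {max_time_step(dx):5.1f} s")
```

For a 12 km grid this gives roughly 31 seconds, consistent with the ~30 second time step quoted above; halving the grid spacing roughly halves the allowable time step, which is why higher resolution is so computationally expensive.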

Topography at 10 km resolution. Although a 10 km model is high in resolution, any changes occurring between grid points will go unnoticed by the model.

4. Dynamic Core - The equations in the model do not fully capture ALL processes occurring in the atmosphere
The equations in a model deal primarily with wind, temperature, and moisture.  Some of the equations are so highly non-linear that even the most powerful supercomputers take too long to work through the full calculations and produce output in a timely fashion for operational use.  Therefore, some of the more complex terms in the equations are parameterized.  This means some of the complex terms are replaced by less complex and less accurate equations - or even by a climatological or representative value.  Typically, atmospheric processes in the boundary layer like convection, friction, heat exchange, etc. are parameterized in operational models with horizontal grid spacing coarser than about 5 km.

5. Post Processing - Model output posted to a grid lower in resolution than the model may provide misleading results
This is a very important issue.  For example, a 12 km model can be made available on a variety of output grids not matching its native 12 km resolution (such as 90 km, 40 km, and 20 km grids).  Viewing a field like QPF from a high resolution model on a relatively low resolution grid may appear to smear low QPF amounts and water down maxima actually predicted by the model.  Likewise 500 mb height output from a 12 km model plotted on a 90 km grid may smooth out the height pattern.  A few very important vorticity maxima may not get plotted on a low resolution grid.

Be VERY mindful of what you are viewing.

For large scale diagnosis, viewing mass fields on a grid lower in resolution than the model will actually filter out unwanted small scale details.  For fields like QPF, vorticity, snow cover, etc., however, you want to view the output on a grid identical in resolution to the model.

Comparison of 12 km Eta 6h QPF and PMSL forecast on a 90 km grid (left) and a 20 km grid (right).  Note the difference in QPF maximum values in southern Nebraska (a difference of over 0.25").  This is not a model error, but rather an interpolation difference.  Note how the 90 km depiction blends precipitation areas in Nebraska, Missouri, and Oklahoma, whereas the higher resolution depiction does not.  Also note the difference in the mountainous terrain of New England.  This example also shows the ADVANTAGE gained by viewing MASS fields on a grid lower in resolution than the model (the synoptic signal in the PMSL field is much easier to glean).
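The watering-down effect can be sketched with a toy one-dimensional example.  Block averaging below is a crude stand-in for the interpolation actually used in post processing, and the precipitation values are made up:

```python
# Toy 1-D "QPF" field (inches) on a 12-point fine grid with a sharp maximum
fine = [0.0, 0.0, 0.1, 0.2, 0.9, 1.2, 0.8, 0.2, 0.1, 0.0, 0.0, 0.0]

def block_average(field, factor):
    """Regrid by averaging each block of `factor` fine-grid points onto
    one coarse-grid point - a crude stand-in for interpolation."""
    return [sum(field[i:i + factor]) / factor
            for i in range(0, len(field), factor)]

coarse = block_average(fine, 4)
print(max(fine), max(coarse))  # the coarse grid waters down the maximum
```

The fine-grid maximum of 1.2" drops to 0.775" on the coarse grid - the model predicted the higher amount, but a viewer of the coarse grid would never know it.  This is the same interpolation difference seen in the Nebraska QPF maximum above.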

Summarizing, a model forecast is prone to error because:
1. Our ability to capture the initial state of the atmosphere is imperfect.
2. Model resolution is not sufficient to capture all features in the atmosphere.
3. Equations used by a model do not fully capture processes in the atmosphere.

Even if we had perfect model resolution and perfect equations, our ability to assess the present state of the atmosphere is limited... and contains error.

Model Diagnosis Strategy
Given the construct and limitations of a modeling system, it is logical to perform a model assessment in two phases:

The best place to begin your model diagnosis is with its initialization.  Once this has been completed, a knowledgeable assessment of the model forecast can be made.

REMEMBER - be cognizant of the difference in resolution between the model and the grid to which its output has been posted.