Relatively speaking, the initialization assessment was the easy phase of the model diagnosis, because you had observations to compare against the model initialization. A prognostic assessment is a bit more nebulous: you need to determine whether or not the model is correctly predicting the pattern before the pattern verifies!
So how do you determine whether a model is predicting a pattern properly? Your meteorological experience, combined with your initialization assessment, will serve as a guide.
Comparisons
A good starting point in prognostic assessment is a
model comparison (valid at the same lead time). Often you will encounter
situations where model solutions differ from each other yet seem to be equally
likely. You can more confidently rule out solutions whose differences can
be traced back to initialization discrepancies.
But which of the myriad model output fields should you elect to compare? That is a very good question, and here is a good tip.
Parameterization schemes and post processing algorithms require raw model data to provide output. This implies that an IDENTICAL parameterization scheme or post processing algorithm run in DIFFERENT MODELS will most likely NOT produce the same result.
"Why?" you ask. The differences can arise solely from model architecture (differences in horizontal or vertical resolution or in coordinate systems).
For example, the same precipitation type algorithm used in the meso and global models will most likely offer a slightly different solution at a given lead time because the meso model has a higher horizontal and vertical resolution.
Similarly, model output generated from microphysical processes (packages) typically differs from one model to the next. An example would be moisture and QPF. The microphysics packages and parameterization schemes are different between the NAM and GFS. A difference in QPF may result solely from parameterization differences rather than from differences in the forecast height, PMSL, temperature and wind fields (collectively termed "mass fields").
Although the microphysics packages and parameterization schemes are not truly independent of the mass fields (and vice versa), your best strategy in a model comparison is to view output that is not directly derived from these algorithms and schemes, in other words the mass fields themselves.
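As a rough illustration of what a mass field comparison amounts to, the sketch below subtracts one model's 500 mb height field from another's at the same forecast hour and reports where they disagree most. The grids, the height values and the function names (`difference_field`, `largest_difference`) are all made up for illustration. They are not real Eta or GFS output and not part of NAWIPS, and the sketch assumes both fields have already been placed on a common grid.

```python
def difference_field(field_a, field_b):
    """Return the grid-point difference field_a - field_b for two 2-D grids."""
    same_shape = len(field_a) == len(field_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(field_a, field_b)
    )
    if not same_shape:
        raise ValueError("both fields must be on the same grid")
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(field_a, field_b)
    ]


def largest_difference(diff):
    """Return (row, col, value) of the largest absolute difference."""
    return max(
        ((i, j, v) for i, row in enumerate(diff) for j, v in enumerate(row)),
        key=lambda item: abs(item[2]),
    )


# Hypothetical 500 mb heights (meters) on a tiny 3 x 4 grid,
# both valid at the same forecast hour.
model_a_z500 = [
    [5520.0, 5580.0, 5640.0, 5700.0],
    [5460.0, 5520.0, 5610.0, 5690.0],
    [5400.0, 5490.0, 5580.0, 5670.0],
]
model_b_z500 = [
    [5520.0, 5580.0, 5640.0, 5700.0],
    [5490.0, 5540.0, 5610.0, 5690.0],
    [5460.0, 5510.0, 5580.0, 5670.0],
]

diff = difference_field(model_a_z500, model_b_z500)
row, col, value = largest_difference(diff)
print(f"largest difference: {value:+.0f} m at grid point ({row}, {col})")
```

A negative value here means model A is deeper (lower heights) than model B at that point. Repeating the calculation for each forecast hour shows when the two solutions begin to diverge.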
The scale at which you view comparisons should be larger than the CONUS. Ideally you should view comparisons at the North American scale. This will allow evaluation of all systems most likely to impact the Continental U.S.
The levels at which you view mass field comparisons should generally match both those for which you provided an initialization assessment and those most likely to matter for impending significant events (snow, heavy rain, wind, etc.).
To date, one of the most efficient systems for performing a prognostic assessment at HPC is NAWIPS (NMAP and NTRANS).
Below is a comparison of the Eta and GFS
solutions from fhr 00 to fhr 72 at 500 mb...
... and at 700 mb.
It is a good idea to take a step back and look at the big picture. Both models are predicting an unusual winter-like or late-spring-like blocked pattern to persist over the next 72 hours. Both models agree on supporting a blocking ridge just south of the Four Corners area of the American Southwest, rising heights in the North Atlantic, replacing the initial low in the East with more energy, maintaining fast flow from Hudson Bay on north, and even the location of the remnants of Blas.
The most notable differences at these levels are in the timing and strength of the Pacific cutoff low and the 700 mb low in the East. More specifically, in the Pacific the GFS has a deeper and slower solution with the cutoff after fhr 36, while the Eta has a more open and therefore more progressive solution.
A look at the SREF mean and spread output at
500 mb shows that the greatest uncertainty is with this system.
There is also good clustering of the solutions of
the SREF components (no clustering of only the "like" [RSM or Eta]
members), so the mean solution would be a good first guess. Historically,
verification shows that the ensemble mean tends to outperform individual model
solutions for mass fields.
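
To make "mean and spread" concrete, here is a minimal sketch that computes both at each grid point from a handful of ensemble members and then finds the point of greatest spread. The member values are invented, the function name `ensemble_mean_and_spread` is hypothetical, and spread is taken here to be the population standard deviation about the mean, which is an assumption about the definition rather than a description of how the SREF products are actually built.

```python
from statistics import mean, pstdev


def ensemble_mean_and_spread(members):
    """Return (means, spreads) per grid point.

    members: list of equal-length lists, one list of values per member.
    Spread is the population standard deviation across members.
    """
    if len({len(m) for m in members}) != 1:
        raise ValueError("all members must cover the same grid points")
    columns = list(zip(*members))  # regroup values by grid point
    means = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    return means, spreads


# Hypothetical 500 mb heights (meters) at four grid points
# from three ensemble members.
members = [
    [5700.0, 5640.0, 5520.0, 5580.0],
    [5700.0, 5650.0, 5460.0, 5590.0],
    [5700.0, 5630.0, 5580.0, 5570.0],
]

means, spreads = ensemble_mean_and_spread(members)
most_uncertain = spreads.index(max(spreads))
print(f"greatest spread: {spreads[most_uncertain]:.1f} m "
      f"at grid point {most_uncertain}")
```

Where the members agree the spread is near zero. The grid point with the largest spread marks the feature the ensemble is least certain about.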

IMPORTANT: Ensemble MEAN fields typically provide a solution weaker in amplitude compared to operational deterministic runs. Pay more attention to the phase (location) of features in a mean field. More information on the ensemble system can be found at http://meted.ucar.edu/nwp/pcu1/ensemble and http://www.hpc.ncep.noaa.gov/ensembletraining
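The reason a mean field looks weaker can be shown with an idealized case. In the sketch below every member is a sine wave of identical amplitude, but the members disagree slightly on phase. Averaging them yields a wave that is still centered on the middle member yet has a smaller amplitude than any single member. The wave shape, the 60 m amplitude and the 30 degree phase offsets are arbitrary assumptions chosen for illustration, not values from any real ensemble.

```python
import math


def wave(amplitude, phase_deg, n_points=360):
    """Idealized height anomaly (m) at each degree of longitude."""
    return [
        amplitude * math.sin(math.radians(lon - phase_deg))
        for lon in range(n_points)
    ]


AMPLITUDE = 60.0               # every member has the same amplitude
phases = [-30.0, 0.0, 30.0]    # members differ only in phase (degrees)

members = [wave(AMPLITUDE, phase) for phase in phases]

# Ensemble mean at each longitude.
ens_mean = [sum(values) / len(values) for values in zip(*members)]

mean_amplitude = max(ens_mean)
peak_longitude = ens_mean.index(mean_amplitude)

print(f"member amplitude: {AMPLITUDE:.1f} m")
print(f"mean amplitude:   {mean_amplitude:.1f} m")
print(f"mean ridge axis at longitude index {peak_longitude}")
```

The ridge axis of the mean lines up with the middle member, which is why the phase of a mean field is the more trustworthy piece of information, while its amplitude should be read as a lower bound on what individual solutions show.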
Recall that in the initialization assessment both models seemed to under-initialize the strength of the Pacific shortwave. If you step the loops above backward, you can see that the differences between the two model solutions arise from the area where that shortwave had been identified on satellite. Further, recall that the GFS, although underdone, seemed to have a better handle on the initialization of this feature than the Eta.
Given that information, the GFS should immediately be the preferred solution for the evolution of this system in the Pacific. However, although the GFS is preferred, it was underdone with the strength of the shortwave at initialization. The actual system will very likely be a bit stronger, and therefore less progressive, than what the GFS is depicting.
Let's take a look at the surface comparisons
between the Eta and GFS.
Since we will buy off on the GFS-like solution in the Pacific (actually slower and stronger than the GFS is depicting), we can focus on the differences in the East. We had noted differences at 700 mb and are now noting differences in the PMSL field.
Here is a case where, in the East, both models' solutions at mid levels seem equally likely, but differences arise as you get lower in the atmosphere. Not having noted any serious issues with initialization leaves us with little objective information to help determine which solution is more likely.
Further, the SREF is unavailable, as it only runs out to fhr 63. At this point it's a good idea to look at some trends.