Difference between revisions of "Template:GUM18"

(17.4 Monte Carlo Runs)
 
Line 9: Line 9:
 
=18.3 Automated Calibration with Shuffled Complex Evolution=
 
=18.3 Automated Calibration with Shuffled Complex Evolution=
 
{{Alternate_Run_Modes:Automated Calibration with Shuffled Complex Evolution}}
 
{{Alternate_Run_Modes:Automated Calibration with Shuffled Complex Evolution}}
 
 
=18.4 Monte Carlo Runs=
 
=18.4 Monte Carlo Runs=
 
{{Alternate_Run_Modes:Monte Carlo Runs}}
 
{{Alternate_Run_Modes:Monte Carlo Runs}}
 +
=18.5 ERDC Automated Model Calibration Software=
 +
{{Alternate_Run_Modes:ERDC Automated Model Calibration Software}}
 +
==18.5.1 Efficient Local Search==
 +
{{Alternate_Run_Modes:Efficient Local Search}}
 +
==18.5.2 Effective and Efficient Stochastic Global Optimization==
 +
{{Alternate_Run_Modes:Effective and Efficient Stochastic Global Optimization}}

Revision as of 22:35, 3 April 2012


There are several modes of running GSSHA. Each of these is for a particular situation, and some require specific hardware in order to be useful. Each of these alternate run modes is set from the command line when you run a project. These alternate run modes are:

Parallelization Models

  • OpenMP - Run a shared-memory parallelized version of GSSHA.
  • MPI - Run a distributed-memory parallelized version of GSSHA. Works on 32- or 64-bit Microsoft(R) Windows(R) machines. The MPICH routines must be installed and running. Compiled as a special executable for 32- and 64-bit systems.

Control Models

  • Batch - Run several versions of a single simulation. Can run in serial, OpenMP, or MPI mode. Use -b command line option.
  • Calibration - Run GSSHA using the Shuffled Complex Evolution calibration routine as the controller. Can only run in serial mode. Use -c command line option.
  • Monte Carlo - Run GSSHA using a Monte Carlo calibration routine as the controller. Can run in serial, OpenMP, or MPI mode. Use -m command line option.
  • Efficient Local Search - Run GSSHA using the Levenberg-Marquardt (LM) local search method, or the Secant LM (SLM) method (an efficiency enhancement to the LM method), as the calibration routine and controller. Use -slm command line option.
  • Multistart - Run GSSHA using the Multistart stochastic global optimization calibration routine, which uses the LM/SLM method for local searches as the controller. Use -ms command line option.
  • Trajectory Repulsion - Run GSSHA using the Trajectory Repulsion stochastic global optimization calibration routine, which uses the LM/SLM method for local searches as the controller. Use -tr command line option.
  • Effective and Efficient Stochastic Global Optimization - Run GSSHA using the Multilevel Single Linkage (MLSL) stochastic global optimization calibration routine, which uses the LM/SLM method for local searches as the controller. Use -mlsl command line option.

Inset Models

GSSHA is able to share data between individual models.


18.1 MPI and OpenMP Parallelization

GSSHA™ supports three parallelization modes: serial, OpenMP, and MPI. When running a simulation, a control mode and a parallelization mode are both selected; the default for most runs is a single-threaded (serial), run-once mode. Apart from the single-thread or serial mode, GSSHA™ can be run in OpenMP or MPI mode.

OpenMP

OpenMP is a parallelization paradigm that assumes you are running on a machine with multiple logical processors and a shared pool of memory. When running, OpenMP splits up some portions of the code to run in parallel on multiple threads, while others are run on a single thread or processing unit.

The following is a list of the portions of the code that have been parallelized using OpenMP.

  • Overland boundary conditions
  • Rain gage nearest neighbor algorithm (for Thiessen polygon interpolation)
  • Rain gage inverse distance algorithm (for IDW interpolation)
  • Green and Ampt infiltration algorithm
  • Multi-layer Green and Ampt infiltration algorithm
  • Green and Ampt w/ Soil Moisture Redistribution algorithm
  • Evapotranspiration algorithm
  • NSM stream kinetics
  • NSM overland kinetics
  • NSM soil-overland interaction
  • Explicit, ADE, and ADE-PC overland flow algorithms
  • Some parts of the SCE routine

Some significant portions of the code have not yet been parallelized with OpenMP, such as groundwater, Richards' infiltration, and constituent transport.

The speed-up from using OpenMP will depend on various factors such as:

  • How big is the simulation (how many cells)
  • What portion of work is done by the parallelized codes versus the non-parallelized codes
  • How many threads (logical processors) are you using (versus how many actual processing units you have)
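
The effect of the second factor can be estimated with Amdahl's law. The following is a minimal sketch, not part of GSSHA: the function name and the 80% parallel fraction are illustrative assumptions, and the estimate ignores threading overhead and hyperthreading effects.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speed-up when only part of the run time is parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical case: 80% of the run time is spent in parallelized routines.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.8, n), 2))
```

Under this assumption, adding threads gives diminishing returns because the non-parallelized portions (for example groundwater) still run on a single thread.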

The one factor you have control over for OpenMP, if you are running in single-run mode, is the number of threads you want to dedicate to a particular process. For batch, SCE, and Monte Carlo methods, GSSHA sets the thread count to the total number of logical processors available. Note that if you have a CPU with hyperthreading, then each core on the CPU appears as two logical processors. Thus a dual-core CPU with hyperthreading turned on would appear as four logical processors. There is some improvement from hyperthreading, especially if a lot of writing to output files is being done.

To set the thread count for single-run modes, use the following project file card.

NUM_THREADS ##

where ## is the number of threads to set; must be greater than or equal to 1. If you run a version of GSSHA™ that was not compiled with OpenMP, the NUM_THREADS project card will result in a warning that it was not compiled with OpenMP; GSSHA™ will then continue to run in single-thread mode.

MPI

Unlike OpenMP, MPI mode currently is only useful for running multiple versions of a single simulation. This is best suited for Monte Carlo runs on supercomputers. To run in MPI mode, you must use an external program that handles the setup and communication processes, such as MPICH. Typically these programs require that you specify the GSSHA™ command line as part of the command line for the MPI program. The GSSHA™ command line for running the various control modes is specified as part of the control mode description in the following sections.


18.2 Batch Mode Runs

Batch Mode

Batch mode is designed for running many runs of the same basic model, with each run using a few different parameter values. To run in batch mode, a project should be set up with replacement parameters and the parameters file. The other file needed is the values file, described below. Once the values file and the parameters file are created, they must be defined in the project file by using the following cards:

REPLACE_PARAMS  "params.in"
REPLACE_VALS  "values.in"

where “params.in” is the name of the parameters file, and “values.in” is the name of the values file. If it is desired to run a single simulation from the many that have been set up, you can put

REPLACE_LINE   ##

in the project file before the REPLACE_PARAMS and REPLACE_VALS lines, where ## is the line number of the values file (starting with 1 for the first line).

It should be noted that any part of any file can be changed once the REPLACE_PARAMS and REPLACE_VALS lines are read, but not before. Since the first file read in is the project file, any part of any line after these lines can be replaced. The only other files that are read in as the project file is being read in are the time series files, so any time series file that is specified in the project file before the REPLACE_PARAMS and REPLACE_VALS lines cannot be replaced. In practice, it is best to simply put the REPLACE_PARAMS and REPLACE_VALS lines after the watershed mask and WMS headers.

If you wish to put all output files into a separate folder for easier viewing and managing, then add the card

REPLACE_FOLDER  "path/to/folder/"

where “path/to/folder/” is either an absolute path (if no project file path) or a path relative to the project file path (if specified using PROJECT_FILE) or a path relative from where GSSHA was run (if no project file path). If you are running on Windows, use DOS-style paths (like\this\path\); on a Unix or Linux machine, use Unix-style paths (like/this/path/).

To run GSSHA™ in batch mode, for non-MPI runs, use the following command:

gssha -b## projectfile.prj

For example, the command gssha -b11 myproj.prj would run 11 runs of GSSHA using the first 11 lines of the values file. Each of these runs would be done one after the other.

For runs with MPI enabled, you must also specify how many processors should be dedicated to a single run, so there is an extra value in the command line to specify this number.

gssha -b## ## myproj.prj

For example, the command gssha -b11 10 myproj.prj will dedicate 10 processors to each run, and the total number of processors it is running on should be some multiple of 10, such as 40 or 100. If there are more processors available than the minimum specified for a run, then the runs will run in parallel. For example, if you are able to run the command gssha -b11 2 myproj.prj on 22 processors, then all 11 runs will run simultaneously.
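
The processor arithmetic above can be sketched as follows. This is only an illustration of the counting, assuming that runs are grouped into equal-sized sets that execute one set after another; the function name is hypothetical and the actual GSSHA scheduling may differ.

```python
import math


def batch_layout(num_runs: int, procs_per_run: int, total_procs: int):
    """Return (runs executing at once, number of sequential sets of runs)."""
    if num_runs < 1 or procs_per_run < 1:
        raise ValueError("num_runs and procs_per_run must be >= 1")
    if total_procs < procs_per_run:
        raise ValueError("not enough processors for a single run")
    concurrent = min(num_runs, total_procs // procs_per_run)
    sets = math.ceil(num_runs / concurrent)
    return concurrent, sets


# gssha -b11 2 myproj.prj on 22 processors: all 11 runs at once.
print(batch_layout(11, 2, 22))
# gssha -b11 10 myproj.prj on 40 processors: 4 runs at a time.
print(batch_layout(11, 10, 40))
```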

Values File

The values file is essentially a list of all parameter values for individual runs. The values for each run go across a line, so each line should contain all values for a run. The order of the values across the line (from left to right) should be the same as the order in the parameters file (top to bottom.) If any string is empty or has a space or other whitespace in it, it should be enclosed in double quotes.

For example, a values file corresponding to the parameters file above could be:

5  2.6  ADE  0.4
10  2.6  ADE  0.4
15  2.6  ADE  0.4
5  1.6  ADE  0.3
10  1.6  ADE  0.3
15  1.6  ADE  0.3
5  2.6  EXPLICIT  0.4
10  2.6  EXPLICIT  0.4
15  2.6  EXPLICIT  0.4

This values file has 9 lines, so to run with it you would use a command-line option of -b9.

The filenames for all output from a batch mode run are prepended with a number, starting at 0 and corresponding to each line of the values file.
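
A values file like the one above can be generated with a short script. The following is a minimal sketch, not a GSSHA utility: the function names are hypothetical, and the two-space column separator is an assumption based on the example. It applies the quoting rule stated above for empty strings and strings containing whitespace.

```python
def format_value(value) -> str:
    """Quote empty strings and strings containing whitespace."""
    text = str(value)
    if text == "" or any(ch.isspace() for ch in text):
        return '"' + text + '"'
    return text


def values_file_text(rows) -> str:
    """One run per line; values ordered as in the parameters file."""
    return "\n".join("  ".join(format_value(v) for v in row) for row in rows) + "\n"


# First three runs of the example values file.
rows = [(first, 2.6, "ADE", 0.4) for first in (5, 10, 15)]
print(values_file_text(rows), end="")
# The batch count on the command line matches the number of lines.
print(f"gssha -b{len(rows)} myproj.prj")
```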

Parameters File

The parameters file lists the names of the parameters, in square brackets, and the C-style replacement string. For more information, see the Simulation Setup for Alternate Run Modes.


18.3 Automated Calibration with Shuffled Complex Evolution

Calibration Mode

The automated calibration method is intended both to ease the burden of manual calibration and to increase the user's ability to explore the parameter space and search for the optimal solution as defined by the user. GSSHA™ employs the Shuffled Complex Evolution method (SCE) (Duan et al., 1992). The SCE method was originally coded in FORTRAN as a stand-alone program using text files to communicate with other models, but was recoded by Fred Ogden and Aaron Byrd into C++ as a class that can wrap around any functional model, and was incorporated into the GSSHA™ source code.

The SCE method uses the result of a cost function to assess the progress of the calibration. The cost is a numerical value that indicates goodness of fit of a simulation to an observed data set. As of the date of this update to the wiki, the automated calibration method contained within GSSHA™ can be used to calibrate parameters for discharge and for sediments. Calibration to outlet discharge can be performed for overland flow and channel discharge. For sediments, automated calibration can only be used for channel sediment discharge. Sediment parameters are calibrated to the wash load, or fines, which is represented by values of Total Suspended Solids (TSS). Calibration to internal points can only be done for channel discharge. The cost is derived from a comparison of storm event peak discharges and volumes between simulated and observed values. For sediments, peak sediment discharge and event sediment discharge volume are used. Calibration is typically conducted for long term simulations with multiple events, but calibrations to a single event can also be performed.

The absolute difference between observed and simulated for each storm event peak discharge and volume is calculated and then normalized by dividing the absolute error by the observed peak discharge or storm volume. This results in errors being specified as fractions of the observed values. The user specifies the importance of each observation with weighting factors that are applied to each of the observed values. The cost is the sum of the errors multiplied by the weighting factors. The SCE method will attempt to minimize this cost by varying parameter values within a range specified by the user in the SCE control file. The definition of the cost will play a large role in the final set of parameter values derived by the SCE process. For the SCE method as incorporated into GSSHA™, the weights placed on each event peak discharge and volume determine how the cost function is calculated. Thus, the weights specified by the user have a significant effect on the final set of parameter values, and the resulting parameter values may change dramatically as the weights are modified. Therefore, if the calibration converges on results deemed inadequate for the purposes of the study, different results may be obtained by changing the weights and repeating the calibration. Additional details on the method can be found in Senarath et al. (2000).
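
The cost calculation described above can be sketched as follows. This is an illustration of the stated formula, not the GSSHA source code: the function name and the simulated values are hypothetical, and skipping zero-weight entries (so that events with an observed value of zero do not cause a division by zero) is an assumption.

```python
def sce_cost(events) -> float:
    """Sum of weighted, normalized absolute errors.

    events: iterable of
        (obs_peak, sim_peak, peak_weight, obs_vol, sim_vol, vol_weight)
    """
    total = 0.0
    for obs_peak, sim_peak, w_peak, obs_vol, sim_vol, w_vol in events:
        for obs, sim, weight in ((obs_peak, sim_peak, w_peak),
                                 (obs_vol, sim_vol, w_vol)):
            if weight == 0.0:
                continue  # assumption: zero-weight observations are ignored
            if obs == 0.0:
                raise ValueError("weighted observation must be non-zero")
            total += weight * abs(obs - sim) / obs
    return total


# Hypothetical simulated values: event 1 is off by 10% in peak and volume,
# event 2 matches the observations exactly.
events = [
    (16.5, 14.85, 0.25, 103.4, 113.74, 0.25),
    (8.3, 8.3, 0.25, 75.2, 75.2, 0.25),
]
print(round(sce_cost(events), 4))
```

With equal weights of 0.25, the two 10% errors contribute 0.025 each, giving a cost of about 0.05.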

The SCE method is very good at finding the absolute minimum cost while avoiding becoming trapped in local minima. However, the methods employed result in numerous simulations being required. Depending on the number of parameters to be calibrated, the parameter ranges, and the required convergence criteria, the method will require hundreds to thousands of simulations to converge on a solution. Somewhere between 500 and 5,000 simulations are typically required. More simulations are required for more parameters, for larger parameter ranges, and for smaller convergence criteria. For this reason, the number of calibration parameters should be minimized, with only the most sensitive parameters being calibrated.

To utilize the SCE method for calibration in GSSHA™, GSSHA™ is run in the calibration mode by typing the following on the command line

gssha -c calib_file.in

where calib_file.in is the name of the calibration control file.

The calibration control file, as described below, contains all of the information needed to control the SCE simulation. It should be noted that for automated calibration the REPLACE_PARAMS and REPLACE_VALS cards are NOT used in the GSSHA™ project file.

Calibration Input File

The SCE control file contains the file names for the project file (Section 3), the parameter list (Section 18.2), and the observed data file (described below), followed by the control parameters, and then a list of initial, minimum and maximum values for the parameter list. The SCE method will search the parameter space for the supplied parameter list within the range of values specified under the conditions described in the control parameters until the method converges on a solution or until the maximum number of simulations is exceeded. The calibration file contains the following information in the format displayed below:

Projname.prj
Params.in
Observed.dat
[maxn]	[kstop]	[pcto]	[ngs]	[iseed]	[use_defaults]	[No. params]
[npg]	[nps]	[nspl]	[mings]	[iniflg]	[iprint]	
[init val]	[low val]	[high val]
[init val]	[low val]	[high val]
…
[init val]	[low val]	[high val]

Where: in line 1

Projname.prj is the project file name
Params.in is the parameters file name (described in section 18.?)
Observed.dat is the observed data file name (described below)
maxn = maximum number of simulations
kstop = stop if, over the last kstop simulations, no change greater than pcto has occurred
pcto = percent change in cost function that must occur for the simulation to continue
ngs = number of complexes in the initial population
iseed = random number seed
use_defaults = use internal default values for the values on the second line
= 0, use defaults
= 1, use values on second line
No. params = number of parameters to vary

And in line 2

npg = number of points in each complex (default is 2*(No. params)+1)
nps = number of points in a sub-complex (default is No. params +1)
nspl = number of evolution steps allowed for each complex before complex shuffling (default is npg)
mings = minimum number of complexes required, if the number of complexes is allowed to reduce as the optimization proceeds (default is ngs)
iniflg = flag on whether to include the initial point in population
= 0, not included (default)
= 1, included
iprint = flag for controlling print-out after each shuffling loop
= 0, print information on the best point of the population (default)
= 1, print information on every point of the population

Followed by one line for each calibration parameter:

Init_val is the initial value of the parameter
Low_val is the lowest possible value of the parameter
High_val is the highest possible value of the parameter.


An example file is shown below.

inf_calib1.prj						
params1.in						
observed.dat						
1000	5	1	4	2986	1	1
3	2	3	3	0	0	
0.5	0.010	2.0				

This calibration file, for one parameter, allows up to 1000 simulations and specifies that the best result must change by 1% or more in five simulations (kstop) to keep going. A random seed (iseed) of 2986 is used to initialize the random number generator, and there are 4 complexes (ngs) in the initial population. The initial parameter value is 0.5 and the parameter will be allowed to vary between 0.01 and 2.0. Default values (generated from the number of complexes and number of parameters) will be used for the other control values.
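
The default formulas listed for the second control line can be evaluated with a small helper. This is a sketch based only on the defaults stated in the parameter descriptions above; the function name is hypothetical and GSSHA's internal handling may differ.

```python
def sce_defaults(num_params: int, ngs: int) -> dict:
    """Default second-line control values, per the formulas listed above."""
    if num_params < 1 or ngs < 1:
        raise ValueError("num_params and ngs must be >= 1")
    npg = 2 * num_params + 1          # points in each complex
    return {
        "npg": npg,
        "nps": num_params + 1,        # points in a sub-complex
        "nspl": npg,                  # evolution steps before shuffling
        "mings": ngs,                 # minimum number of complexes
    }


# One calibration parameter and four complexes.
print(sce_defaults(1, 4))
```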

Observed Data File

The observed data file describes the peak discharge and discharge volume for each event, as well as a weighting for each peak and volume. Unless you use the QOUT_CFS flag, make sure that peak discharges are in m³ s⁻¹ and event volumes are in m³. If you use the QOUT_CFS flag in your project file, make sure that peak discharges are in ft³ s⁻¹ and event volumes are in ft³.

The following format is used:

[# of events]
[Event #1 Peak]  [Weight on Peak 1]  [Event #1 Volume]  [Weight on Vol 1]
[Event #2 Peak]	 [Weight on Peak 2]  [Event #2 Volume]  [Weight on Vol 2]
[Event #3 Peak]	 [Weight on Peak 3]  [Event #3 Volume]  [Weight on Vol 3]
…
[Event #N Peak]	 [Weight on Peak N]  [Event #N Volume]  [Weight on Vol N]


The events must correspond to the events in the precipitation file, and the volume calculations should end once the EVENT_MIN_Q flow rate (user specified in the project file) is reached in the observed data. The event start and end times can be found in the SUMMARY file. The weights can be any positive real number, but for the most easily interpreted results all of the weights should add up to 1.0. If all the weights sum to one, then the cost function results can be interpreted as the mean absolute error.

For example, the observed file

5
0.0  0.0    0.0 0.0
2.4  0.0   53.3 0.0
16.5 0.25 103.4 0.25
0.0  0.0    0.0 0.0
8.3  0.25  75.2 0.25

contains information for five events, but only two will be used for calibration. For this example the first event that produces runoff is given a weight of zero and the other two event peaks and volumes are equally weighted.
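
Before starting a long calibration, it can be useful to check the observed data file. The following is a minimal sketch of such a check for the single-gage format shown above; the function names are hypothetical and this is not a GSSHA utility.

```python
def parse_observed(text: str):
    """Return a list of (peak, peak_weight, volume, volume_weight) tuples."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    count = int(lines[0][0])
    records = [tuple(float(x) for x in ln[:4]) for ln in lines[1:1 + count]]
    if len(records) != count:
        raise ValueError(f"expected {count} events, found {len(records)}")
    return records


def weight_sum(records) -> float:
    """Total of all peak and volume weights."""
    return sum(w_peak + w_vol for _, w_peak, _, w_vol in records)


example = """5
0.0  0.0    0.0 0.0
2.4  0.0   53.3 0.0
16.5 0.25 103.4 0.25
0.0  0.0    0.0 0.0
8.3  0.25  75.2 0.25
"""
records = parse_observed(example)
used = [r for r in records if r[1] > 0 or r[3] > 0]
print(len(records), "events,", len(used), "used, weights sum to", weight_sum(records))
```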

Because of the uncertainty in defining initial moistures the typical approach is to not use the first large event for calibration but to instead use that event and the subsequent period after the event to initialize the soil moistures for the next event. Senarath et al. (1995) demonstrated that the importance of the value of the initial moisture estimation diminishes significantly after the first rainfall event.

Although the weights are equally distributed between peaks and volumes for this example, the weight on the peaks and volumes can be different. Senarath et al. (2000) suggest weighting peaks and volumes equally or near equally, with no more than 70% of the emphasis on either the peak or the volume.

The weighting can also vary from storm to storm. Since very small events are typically subject to large errors in measurement, are difficult to reproduce, and are not usually important in terms of flooding, total discharge, or sediment and contaminant transport, they are typically weighted very little, if at all (Ogden et al., 2001). Emphasis is usually placed on larger events, or events of the magnitude relevant to the problem at hand.

For the example data depicted in the figure below, the weighting on the first two small events should be zero or minimal, with the majority of the weight placed on the following large events. A good rule of thumb is to weight events in proportion to the magnitude of the discharge from the event. To allow the model to properly initialize the soil moistures, the weight on the first large event occurring around July 22 should also be zero.

[Figure: Obs flows.jpg]

If internal measurements are available in the watershed, they can also be incorporated into the calibration. Multiple measurement points result in more robust calibrations, but multiple gages can only be used when simulating channel flow. To calibrate to multiple gages, include the observed discharge locations in the input hydrograph location file (IN_HYD_LOCATION), and then include those data in the observed data file on individual lines after each set of outlet data for each event. The values in the observed data file must correspond to the number and order of gages specified in the IN_HYD_LOCATION file, and you must include a line for every link/node pair in the IN_HYD_LOCATION file, even if the weight is zero. For four gages and three events, the number of records would be 15. For example, if there are four internal gages, the observed data file should have the following format.


[# of records]
[Event #1 (outlet) Peak]  [Peak Weight]	[Event #1 (outlet) Volume]  [Volume Weight]
[Event #1 (ihl#1) Peak]   [Peak Weight]	[Event #1 (ihl#1) Volume]   [Volume Weight]
[Event #1 (ihl#2) Peak]   [Peak Weight]	[Event #1 (ihl#2) Volume]   [Volume Weight]
[Event #1 (ihl#3) Peak]   [Peak Weight]	[Event #1 (ihl#3) Volume]   [Volume Weight]
[Event #1 (ihl#4) Peak]   [Peak Weight]	[Event #1 (ihl#4) Volume]   [Volume Weight]
[Event #2 (outlet) Peak]  [Peak Weight]	[Event #2 (outlet) Volume]  [Volume Weight]
[Event #2 (ihl#1) Peak]   [Peak Weight]	[Event #2 (ihl#1) Volume]   [Volume Weight]
[Event #2 (ihl#2) Peak]   [Peak Weight]	[Event #2 (ihl#2) Volume]   [Volume Weight]
[Event #2 (ihl#3) Peak]   [Peak Weight]	[Event #2 (ihl#3) Volume]   [Volume Weight]
[Event #2 (ihl#4) Peak]   [Peak Weight]	[Event #2 (ihl#4) Volume]   [Volume Weight]
…
[Event #N (outlet) Peak]  [Peak Weight]	[Event #N (outlet) Volume]  [Volume Weight]
[Event #N (ihl#1) Peak]   [Peak Weight]	[Event #N (ihl#1) Volume]   [Volume Weight]
[Event #N (ihl#2) Peak]   [Peak Weight]	[Event #N (ihl#2) Volume]   [Volume Weight]
[Event #N (ihl#3) Peak]   [Peak Weight]	[Event #N (ihl#3) Volume]   [Volume Weight]
[Event #N (ihl#4) Peak]   [Peak Weight]	[Event #N (ihl#4) Volume]   [Volume Weight]
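
The record count and ordering for the multi-gage format can be sketched as follows. The function name and label strings are hypothetical; the sketch only reproduces the ordering shown above (outlet first, then each internal hydrograph location, repeated per event).

```python
def record_labels(num_events: int, num_internal_gages: int):
    """Labels for each observed-data record, in file order."""
    labels = []
    for event in range(1, num_events + 1):
        labels.append(f"event {event} outlet")
        for gage in range(1, num_internal_gages + 1):
            labels.append(f"event {event} ihl#{gage}")
    return labels


labels = record_labels(3, 4)
print(len(labels))   # number to put on the first line of the file
print(labels[:6])
```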

Project File

There are no specific project cards required for use with the SCE method. The required control parameters are contained in the SCE control file, as described above. The REPLACE_PARAMS and REPLACE_VALS cards described in Section 18.2 should NOT be used. The METRIC card should be specified in the project file. If the QOUT_CFS card is used make sure your observed values are in English units. In order to speed up the automated calibration process only required output files should be generated. In particular, make sure not to produce any time series maps. Use the QUIET or SUPER_QUIET card to suppress printing output to the screen. If you want to use sediment data to optimize sediment parameters include the OPTIMIZE_SED card in the project file.

Output Files

GSSHA produces multiple additional output files when run in the automated calibration mode. A brief description of the most useful files for users is presented below.

Best.out - contains a list of the final parameter values determined by the calibration.

sce_output.out - contains the detailed output from the SCE model including start up information and warnings, information from each shuffling loop, as well as the final results.

log_file.txt - the log file lists the cost, the minimum cost for the simulation, and the associated parameter values for each simulation. The log_file.txt file can be used to track the progress of the calibration. In case of power failure or a system crash during the calibration, the best value from the log_file.txt file can be used as the initial value in the SCE control file for a repeated simulation.

Forward Run

To see the results of the calibration you need to use the parameter set determined by the calibration to simulate the calibration period. This is called the forward run. The simplest way to produce the forward run is to make a simulation with your calibration input files using the replacement method (Section 18.2). As described above, the best parameter set can be found in the "best.out" file. Use these parameter values to make the REPLACE_VALS file. The REPLACE_PARAMS file already exists from your calibration. With these minor changes to your project file complete, run the project file in normal GSSHA mode, not in calibration mode, to see the simulation results with the optimized parameter set.

Calibration of Sediment Parameters

You can calibrate to sediment output in exactly the same manner as with flow. Everything is the same except that in the project file you must include the card OPTIMIZE_SED, and in the observed data file you must substitute the sediment discharge values: peak sediment discharge (m³ s⁻¹) and event sediment volume (m³). The observed data should be the wash load (fines smaller than the specified size of sand, i.e., silt and clay). To optimize the sediments you must be working in metric units. In general, the only sediment parameter that is calibrated is the overland flow erodibility factor, SED_K. In certain circumstances, including the rainfall detachment coefficient, SPLASH_K, may improve the calibration results. Suggested starting parameter values for sediment simulations are given in Chapter 10 of the manual.

Summary of Automated Calibration Mode

1.  Calibrations are based on comparisons to event peak discharge and discharge volume.
2.  The importance of each event peak and volume is determined by the weight assigned
    to that measurement.
3.  Calibrations are best performed in the continuous mode for a period of record that
    includes multiple events of various sizes, including events similar to the ones of
    interest.
4.  The first large event should be used for system initialization, and assigned a
    weight of zero.
5.  Weights for subsequent observations should be proportional to the value of the
    observation.
6.  To simplify interpretation of results, all weights should sum to 1.0.
7.  Internal gages can be used if channel routing is being performed.
8.  Use of internal gages increases robustness of the calibration.
9.  Automated calibrations require hundreds to thousands of simulations.
10.  To reduce simulation time minimize the number of parameters being calibrated.
11.  Perform a sensitivity study to determine the important parameters for calibration.
12.  Suppress printing output to the screen and to files.
13.  If the final parameter values are at or near the maximum or minimum values
     consider relaxing the values and repeating the calibration.
14.  If the final parameter values do not produce the desired results modify the
     weights on the observations.
15.  Always use the best values from the previous calibration as the initial values for
     subsequent calibration attempts.
16.  To see model results with the calibrated parameters, make a forward run using the
     values in the "best.out" file while running your project with value replacement.
17.  GSSHA can be calibrated to sediment data by including the OPTIMIZE_SED card in
     the project file and using the observed sediment data in the observed data file.
18.  Sediment calibrations are based on the wash load, fines smaller than the user specified
     sand size.

18.4 Monte Carlo Runs

Monte Carlo Runs

The Monte Carlo mode is most useful for calibrating on a supercomputer where you can dedicate hundreds or thousands of processors to running various versions of a GSSHA™ simulation. The Monte Carlo run mode is set up similarly to the SCE calibration mode.

To run GSSHA™ in Monte Carlo mode, use the command line format

gssha -m[# runs] [mc.in]

where [mc.in] is the name of the monte carlo input file and [# runs] is the number of monte carlo runs to make. For running in MPI mode, use the command line

gssha -m[# runs] [# proc per run] [mc.in]

where [# proc per run] is the number of logical processors to dedicate to an individual run. Currently this number should be set to 1. For both command lines, there is no space between the -m and the number of runs.

For example, to run 1000 runs in either serial or OpenMP mode, use the command line:

gssha -m1000 mc.in

The monte carlo input file has the following format:

[Projname.prj]
[params.in]
[observed.dat]
[# of parameters]
[lower bound 1] [upper bound 1]
[lower bound 2] [upper bound 2]
...
[lower bound N] [upper bound N]


Where

Projname.prj is the name of the GSSHA project file
params.in is the name of the parameters file
observed.dat is the name of the observed data file.

As with the SCE calibration file, the parameter bounds in the Monte Carlo input file must be listed in the same order as the parameters in the parameters file. The observed data file has the same format as in the SCE calibration mode.
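
The input file and command line described above can be assembled with a short script. This is a minimal sketch, not a GSSHA utility: the function names are hypothetical, and the single space between the lower and upper bound is an assumption based on the format shown.

```python
def mc_input_text(project: str, params: str, observed: str, bounds) -> str:
    """Build the Monte Carlo input file: three file names, count, then bounds."""
    lines = [project, params, observed, str(len(bounds))]
    for low, high in bounds:
        if low > high:
            raise ValueError("lower bound exceeds upper bound")
        lines.append(f"{low} {high}")
    return "\n".join(lines) + "\n"


def mc_command(num_runs: int, mc_file: str, procs_per_run=None) -> str:
    """Build the command line; no space between -m and the run count."""
    parts = ["gssha", f"-m{num_runs}"]
    if procs_per_run is not None:  # MPI form
        parts.append(str(procs_per_run))
    parts.append(mc_file)
    return " ".join(parts)


# Bounds listed in the same order as the parameters file.
print(mc_input_text("Projname.prj", "params.in", "observed.dat", [(0.01, 2.0)]), end="")
print(mc_command(1000, "mc.in"))
print(mc_command(1000, "mc.in", procs_per_run=1))
```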

Single-run mode

The single-run monte carlo mode is used for manual calibration when you wish GSSHA to compute the cost function and other objective functions. The monte carlo input file is as follows:

[Projname.prj]
[params.in]
[observed.dat]
[# of parameters]
[value for parameter 1]
[value for parameter 2]
...
[value for parameter N]

You must run GSSHA in the command line like this:

gssha -m1 mc.in


Comments

You can also include comments after the values lines in the monte carlo input file. Comments should begin with a '#'.

[Projname.prj]
[params.in]
[observed.dat]
[# of parameters]
[lower bound 1] [upper bound 1] #parameter comment
[lower bound 2] [upper bound 2] #parameter comment
...
[lower bound N] [upper bound N] #parameter comment

[Projname.prj]
[params.in]
[observed.dat]
[# of parameters]
[value for parameter 1] #parameter comment
[value for parameter 2] #parameter comment
...
[value for parameter N] #parameter comment
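
When reading such a file in a pre-processing script, the comments can be stripped before the values are parsed. The following is a sketch of that step only; the function names and the comment text in the example are hypothetical, and it does not describe how GSSHA itself reads the file.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first '#' onward and trim whitespace."""
    return line.split("#", 1)[0].strip()


def read_bounds(lines):
    """Return (low, high) pairs from bound lines that may carry comments."""
    bounds = []
    for line in lines:
        fields = strip_comment(line).split()
        if not fields:
            continue  # skip lines that are blank or comment-only
        bounds.append((float(fields[0]), float(fields[1])))
    return bounds


example = ["0.01 2.0 #first parameter", "0.1 0.9 #second parameter"]
print(read_bounds(example))
```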


18.5 ERDC Automated Model Calibration Software

ERDC Automated Model Calibration Software

Research at the U.S. Army Engineer Research and Development Center (ERDC) has focused on the development of methodologies, or improvement of the efficiency of native algorithms, for the computer-based calibration of hydrologic and environmental models (wherein by efficiency we mean the number of forward model calls necessary for the calibration algorithm to converge on a solution). Our software is written to accommodate a popular model independent and input control file protocol. Two ERDC Technical Reports published in early 2012 demonstrate, by way of examples, how to use the ERDC implementations of (1) the Levenberg-Marquardt (LM) local search method, and also the Secant LM (SLM) method, an efficiency enhancement to the LM method, and (2) the stochastic global optimization method MLSL, which uses our LM/SLM method for local searches, to calibrate, in a model independent manner, a Gridded Surface Subsurface Hydrologic Analysis (GSSHA) hydrologic model. The two noted technical reports, their related appendix material, and all of the files associated with the examples discussed in each respective report are provided below.

Following the initial efforts documented in the two noted technical reports, the LM/SLM and MLSL methods, as well as the stochastic global optimization methods multistart (MS) and trajectory repulsion (TR), which also use the ERDC LM/SLM method implementations for local searches, were directly interfaced with the GSSHA model such that they can be treated as alternate GSSHA run modes. Hence, there are four alternate GSSHA run modes that employ ERDC model calibration software, and their practical use is discussed in the section below titled “Four Alternate GSSHA Run Modes”.

Although the ERDC automated model calibration software was written, as previously mentioned, to accommodate a popular model independent protocol, the GSSHA value replacement functionality was used instead when the software was directly interfaced with the GSSHA model to develop the four alternate GSSHA run modes. This keeps them consistent with the other alternate GSSHA run modes (such as Batch, SCE, and Monte Carlo). For clarity, one can calibrate a GSSHA hydrologic model either in a model independent manner, as specified in the two technical reports, or by using one of the four alternate GSSHA run modes.

We recommend calibrating a GSSHA hydrologic model with one of the four alternate GSSHA run modes, given that the other alternate GSSHA run modes (such as Batch, SCE, and Monte Carlo) also use the GSSHA value replacement functionality. The two technical reports, which describe in a clear and practical way how to calibrate a GSSHA hydrologic model in a model independent manner, are provided not only for completeness but also because that documentation is a primary basis for preparing to use any one of the four alternate GSSHA run modes that employ ERDC automated model calibration software.

The section below titled “Four Alternate GSSHA Run Modes” describes an initial approach that one can take to use any one of the four alternate GSSHA run modes. The individual sections (i.e., 18.6.1 – 18.6.4) contain any additional run mode specific information that must be supplied to calibrate a GSSHA hydrologic model with that run mode. Please note that later this calendar year (2012), a published ERDC Technical Note will be provided at this location. Like the two ERDC TRs provided here, which document how to use the independent ERDC LM/SLM and MLSL implementations to calibrate a GSSHA hydrologic model in a model independent manner, it will briefly describe and document, in a clear and practical way, how to use the four alternate GSSHA run modes for practice driven application.

Available project resources more often than not limit the time that one can devote to model calibration. If so, we recommend the SLM method, possibly with prior information (please see example 11 in the technical report ERDC-CHL-TR-12-3, below, for an example problem that could serve as a go by). However, if resources permit a more thorough exploration of model parameter space, then, of the three available stochastic global optimization methods (MS, TR, and MLSL), we recommend MLSL.

For further assistance with using the independent ERDC LM, SLM, and MLSL implementations to calibrate a GSSHA hydrologic model in a model independent manner, or with using any one of the four alternate GSSHA run modes, please contact Brian Skahill at Brian.E.Skahill@usace.army.mil or 503-808-3973, or the principal GSSHA hydrologic model developer Charles W. Downer at charles.w.downer@usace.army.mil.

Model Independent Calibration

A Practical Guide to Calibration of a GSSHA Hydrologic Model Using ERDC Automated Model Calibration Software - Efficient Local Search

ERDC-CHL-TR-12-3 A Practical Guide to Calibration of a GSSHA Hydrologic Model Using ERDC Automated Model Calibration Software - Efficient Local Search

ERDC-CHL-TR-12-3 Appendix Material

Example problems for ERDC-CHL-TR-12-3

A Practical Guide to Calibration of a GSSHA Hydrologic Model Using ERDC Automated Model Calibration Software - Effective and Efficient Stochastic Global Optimization

ERDC-CHL-TR-12-2 A Practical Guide to Calibration of a GSSHA Hydrologic Model Using ERDC Automated Model Calibration Software - Effective and Efficient Stochastic Global Optimization

ERDC-CHL-TR-12-2 Appendix Material

Example problems for ERDC-CHL-TR-12-2

Four Alternate GSSHA Run Modes

The four alternate GSSHA run modes are (1) Efficient Local Search, (2) Multistart, (3) Trajectory Repulsion, and (4) Effective and Efficient Stochastic Global Optimization. As previously mentioned, their practical use is discussed here and in sections 18.6.1 - 18.6.4, respectively. The "Efficient Local Search" and "Effective and Efficient Stochastic Global Optimization" GSSHA run modes refer to the SLM and MLSL methods, respectively. To be consistent with the other alternate GSSHA run modes (such as Batch, SCE, and Monte Carlo), each of the four alternate GSSHA run modes that employ ERDC automated model calibration software uses the value replacement functionality (please see section 18.2 Simulation Setup for Alternate Run Modes for guidance regarding the GSSHA value replacement functionality). Note the assumption that the parameter names and their associated values in the GSSHA value replacement files designated by REPLACE_PARAMS and REPLACE_VALS, which must be specified in the GSSHA project file, are listed in the same order as in the control file that will likely be prepared beforehand, as presented below.
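
The ordering assumption can be checked from a script before a run. The sketch below is illustrative only. It assumes the REPLACE_PARAMS layout described later in this section (a count on the first line, then one bracketed parameter name per line) and that the parameter names from the control file have already been collected into a list. All function names, parameter names, and file contents are hypothetical.

```python
import re


def replace_param_names(replace_params_text):
    """Return the bracketed names listed after the count line."""
    names = []
    for line in replace_params_text.splitlines()[1:]:
        match = re.match(r"\s*\[([^\]]+)\]", line)
        if match:
            names.append(match.group(1))
    return names


def first_order_mismatch(replace_names, control_names):
    """Return the index of the first position where the lists differ, or None."""
    for index, (a, b) in enumerate(zip(replace_names, control_names)):
        if a != b:
            return index
    if len(replace_names) != len(control_names):
        return min(len(replace_names), len(control_names))
    return None


# Hypothetical REPLACE_PARAMS content and control file parameter order.
replace_params_text = "3\n[rough1] %lf\n[rough2] %lf\n[hydcond] %lf\n"
control_names = ["rough1", "hydcond", "rough2"]

names = replace_param_names(replace_params_text)
mismatch = first_order_mismatch(names, control_names)
print(names, mismatch)
```

A result of None means the two lists agree in both order and length. Any other value is the position of the first disagreement.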

Before describing the general path forward to using any one of the four alternate GSSHA run modes that employ the ERDC automated model calibration software, please also note that in the control file that must be prepared to guide the automated model calibration process using any one of the four GSSHA run modes, one still has the flexibility to designate parameters as log transformed, none, fixed, or tied (please see the ERDC-CHL-TR-12-3 technical report for further discussion, in particular, examples 8 and 9 in the report).

To employ any one of the four alternate GSSHA run modes to calibrate a given GSSHA hydrologic model deployment, we strongly recommend that one first follow the steps listed in example 1 in the technical report ERDC-CHL-TR-12-3, provided above, which will result in preparation of all of the files that are necessary to calibrate the GSSHA hydrologic model in a model independent manner using ERDC software.

Of course, when the time comes during this process to specify adjustable GSSHA model parameters (viz., step 2 in example 1 in ERDC-CHL-TR-12-3), it is presumed that one has some sense as to what GSSHA model parameters one wants to designate as adjustable for the specific GSSHA application under consideration. Please do not blindly mimic what is provided in example 1 in ERDC-CHL-TR-12-3. The example depicts the general sequence of steps required to employ automated model calibration software. It does not serve as general guidance for which GSSHA model parameters to specify as adjustable when calibrating a GSSHA hydrological model. In general, what hydrologic model parameters to specify as adjustable is a function of the complexity that one has expressed in the model, the predictions of interest for the deployed model, and their sensitivities to that specified model complexity.

Upon completing the necessary steps to calibrate the GSSHA hydrologic model in a model independent manner using ERDC software, as outlined in example 1 in the technical report ERDC-CHL-TR-12-3, provided above, a few modifications to the prepared files, plus the preparation of an additional input file, are required to enable use of any one of the four alternate GSSHA run modes. Each individual section (viz., 18.6.1 – 18.6.4) includes any additional GSSHA run mode specific information that must be prepared, the syntax for using that specific run mode, and an example problem that one can follow. The relatively minor changes that are required regardless of which alternate GSSHA run mode is used include the following:

  1. Include on a new line at the end of the control data section of the prepared control file the name of the GSSHA project file (for further clarity, if needed, then please inspect the control data section of the control file in the example provided in any one of the individual sections (viz., 18.6.1 – 18.6.4) and compare it with the contents of appendix 10 that list the contents of the control file upon completion of step 7 in example 1 presented in the technical report ERDC-CHL-TR-12-3).
  2. Following the steps listed in example 1 in the technical report ERDC-CHL-TR-12-3, provided above, the model command line section of the prepared control file will include one line with the entry “model.bat”. Replace that one line with two lines, the first containing “preGSSHA.bat” and the second containing “postGSSHA.bat”. The existence and contents of a file named “preGSSHA.bat” are currently of no consequence. It was used temporarily during source code development but later discarded for the current implementations of the four alternate GSSHA run modes. What is of importance, however, is the file named “postGSSHA.bat”. Its contents are a subset of those in the previously prepared batch file named “model.bat”: as its name suggests, it contains the commands from “model.bat” that follow the actual forward model execution (i.e., the GSSHA call).
  3. Prepare the two GSSHA value replacement files designated by REPLACE_PARAMS and REPLACE_VALS, which must be specified in the GSSHA project file to support use of the GSSHA alternate run modes. Preparing these two files is relatively straightforward once the steps necessary to calibrate the GSSHA hydrologic model in a model independent manner using ERDC model calibration software are complete. For the REPLACE_PARAMS file, specify the number of parameters on the first line of a new file. This number is already known: it is the first entry, an integer value, on the fourth line of the control file (the second line of the control data section). Then copy the parameter names, as they are listed in the parameter data section of the control file, onto the following lines. Place a ‘[’ character before and a ‘]’ character after each parameter name, with no space between these characters and the beginning and end of the name. At this point, each line after the first is of the form [parameter_name_i], for i = 1, …, n, where n is the number of parameters. Modify each of these lines to be of the form [parameter_name_i] “%lf”, to designate that each parameter will be treated as a double during model calibration. The REPLACE_VALS file is quickly and easily prepared: copy the initial value specified for each parameter, as listed in the fourth column of the parameter data section of the control file, into a new file, placing the values on a single line with one or more spaces between them. Assistance with preparing these two files can also be found, as previously mentioned, in section 18.2 Simulation Setup for Alternate Run Modes. Moreover, the respective REPLACE_PARAMS and REPLACE_VALS files are included in each of the example problems provided in sections 18.6.1 – 18.6.4.
  4. If the steps in example 1 in the technical report ERDC-CHL-TR-12-3 were followed as recommended, then there now exists one or more model input template files that were prepared to support model independent model calibration using ERDC software. Make a copy of each of the current one or more model input template files, and ensure that the name of each file copy is the appropriate model input file specified in the GSSHA project file. For example, if the model input template file for the GSSHA MAPPING_TABLE file is currently named input.cmt.tpl, and it is the basis for generating the actual GSSHA MAPPING_TABLE file specified in the GSSHA project file, named gssha_input.cmt, then rename the file copy of the template file named input.cmt.tpl to gssha_input.cmt. For each of the appropriately renamed template file copies, subsequently (1) delete the first line that starts with “ptf”, and (2) replace the existing template file specified adjustable model parameter designator; for example, ‘$’ upon inspection of the template files listed in appendices 1 and 2 associated with the technical report ERDC-CHL-TR-12-3, with that which is used for the GSSHA value replacement functionality; viz. ‘[’ and ‘]’ placed at the very beginning and end of the specified adjustable model parameter name (i.e., ensure that there are no spaces between ‘[’ and ‘]’ and the beginning and ending of the name of the specified adjustable model parameter).
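
Steps 3 and 4 above can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool. The parameter names, values, and template content are hypothetical. The template is assumed to mark each parameter as a name enclosed between two ‘$’ characters, possibly padded with spaces. Whether the quotation marks around %lf in step 3 are literal is not clear from this section, so the format token is passed in by the caller and should be checked against section 18.2.

```python
import re


def build_replace_params(names, format_token):
    """Count on the first line, then one '[name] format' line per parameter."""
    lines = [str(len(names))]
    lines += ["[{}] {}".format(name, format_token) for name in names]
    return "\n".join(lines) + "\n"


def build_replace_vals(values):
    """Initial parameter values on a single line, separated by spaces."""
    return " ".join(repr(float(value)) for value in values) + "\n"


def template_to_gssha_input(template_text, names, marker="$"):
    """Drop the leading 'ptf' line and rewrite marker-delimited names as [name]."""
    lines = template_text.splitlines()
    if lines and lines[0].lstrip().startswith("ptf"):
        lines = lines[1:]
    body = "\n".join(lines)
    for name in names:
        # Match marker, optional padding, the name, optional padding, marker.
        pattern = (re.escape(marker) + r"[ \t]*" + re.escape(name)
                   + r"[ \t]*" + re.escape(marker))
        body = re.sub(pattern, lambda m, n=name: "[" + n + "]", body)
    return body + "\n"


# Hypothetical parameter names, initial values, and template content.
names = ["rough1", "rough2"]
params_text = build_replace_params(names, "%lf")  # format token is an assumption
vals_text = build_replace_vals([0.035, 0.1])
template_text = "ptf $\nROUGHNESS  $rough1    $\nROUGHNESS  $rough2    $\n"
converted = template_to_gssha_input(template_text, names)
print(params_text, vals_text, converted, sep="\n")
```

Replacing a padded field with a bracketed name changes the width of the line, so the converted file should be inspected before it is used as a GSSHA input file.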

As previously indicated, each individual section (18.6.1 – 18.6.4) includes (1) any additional run mode specific information that must be prepared to support use of that specific alternate GSSHA run mode, (2) the syntax for the given alternate GSSHA run mode, and (3) a test problem that one can use as a go by. As also noted above, an ERDC Technical Note that documents, in a clear and practical way, how to use the four alternate GSSHA run modes for practice driven application is to be published later this calendar year (2012) and will be provided at this location.

GSSHA Executables with the Four Alternate GSSHA Run Modes

GSSHA 5.7a Windows 32-bit, Debug Mode. With the Four New ERDC Alternate Run Modes. Date: April 10, 2012

GSSHA 5.7a Windows 32-bit, Release Mode. With the Four New ERDC Alternate Run Modes. Date: April 10, 2012


18.5.1 Efficient Local Search

As previously mentioned, this GSSHA alternate run mode employs the independent ERDC implementations of the LM/SLM local search methods. There is no additional information that must be prepared for this specific GSSHA alternate run mode. The syntax required to use this alternate GSSHA run mode for GSSHA model calibration is:

gssha -slm case.pst

where case.pst is the name of the modified control file. The active user is referred to the technical report ERDC-CHL-TR-12-3, and its related appendix material and example problem files, for several examples which demonstrate in a clear and practical manner how to use various functionalities associated with the independent ERDC implementations of the LM/SLM local search methods, and also brief descriptions of the various output files that are associated with a LM/SLM supervised GSSHA model calibration run.

Example problem files prepared for a SLM supervised GSSHA alternate run mode model calibration run, which are supplied for use as a go by

Biased efficient local search

Spatially explicit physics-based models such as GSSHA support a more realistic characterization of the physical aspects of the watershed system and a more transparent simulation and evaluation of project alternatives than is possible with traditional hydrologic simulation models (viz., lumped and semi-distributed model structures). They also have the potential to predict with greater reliability than lumped hydrologic model structures. However, they can easily become highly parameterized, particularly when they are deployed to simulate heterogeneous watershed systems on a continuous basis. Moreover, their model run times are often far greater than those of lumped and semi-distributed hydrologic models. This combination of computationally intensive forward model runs and a potentially high-dimensional adjustable model parameter space presents a unique challenge for the computer-based calibration of spatially explicit physics-based hydrologic models. In particular, it necessitates a calibration method that is as efficient as possible. Moreover, highly parameterized model deployments can make calibration problematic in that the information content of the available observation dataset may not support unique estimation of each adjustable model parameter, resulting in poor fits between the observations and their model simulated counterparts and/or non-physical models (i.e., estimated parameter sets).

This draft document describes how to use two separate but closely related methods of computer-based parameter estimation, either of which can serve as an effective and efficient means to support the practical calibration of a GSSHA hydrologic model. The two methods are adaptations of the “efficient local search” alternate GSSHA run mode calibration methodology. The example problem files provided below relate to this draft document.

Efficient local search with prior information

Example 1 problem files prepared for a SLM with prior information supervised GSSHA alternate run mode model calibration run, which are supplied for use as a go by

Example 2 problem files prepared for a SLM with prior information supervised GSSHA alternate run mode model calibration run, which are supplied for use as a go by

Efficient local search version of the Tikhonov solution

Example 1 problem files prepared for a SLM version of the Tikhonov solution supervised GSSHA alternate run mode model calibration run, which are supplied for use as a go by

Example 2 problem files prepared for a SLM version of the Tikhonov solution supervised GSSHA alternate run mode model calibration run, which are supplied for use as a go by


18.5.2 Effective and Efficient Stochastic Global Optimization

Effective and Efficient Stochastic Global Optimization - Multilevel Single Linkage

As previously mentioned, this GSSHA alternate run mode employs the independent ERDC implementation of the stochastic global optimization method Multilevel Single Linkage (MLSL), which uses the independent ERDC implementations of the LM/SLM local search methods for local searches. Please see the technical report ERDC-CHL-TR-12-2, and its related appendix material and example problem files, for a discussion of the MLSL method and four examples that explain in a clear and practical way how to calibrate a (GSSHA) model in a model independent manner using the MLSL method. Before running this alternate GSSHA run mode, in addition to the activities that apply uniformly to all four alternate GSSHA run modes (previously discussed in section 18.6), one must also prepare a file named mlsl.in (for “multilevel single linkage input”). The file named mlsl.in contains eleven entries, each specified on its own line:

  1. The MLSL parameter N.
  2. The MLSL parameter γ.
  3. The MLSL parameter σ.
  4. The input parameter d2, a floating point value specifying a distance threshold used for comparison during execution of our implementation of MLSL. If, during a given local search with MLSL, the distance between the current location in adjustable model parameter space and any previously computed parameter upgrade vector, obtained either during the current local search or during any previously performed local search, is less than this threshold, then the current local search is terminated early on the assumption that it has progressed into the region of attraction of a previously visited local minimum. The impact of this entry has not been explored in much detail, and it is often set to zero.
  5. A character, either ‘Y’ or ‘N’, indicating whether the program will write (‘Y’) or read (‘N’) the file named “RandomNumberSeeds.prn” (more often than not this entry will be ‘Y’).
  6. The initial seed, an integer between 0 and 4,294,967,295.
  7. The maximum number of MLSL iterations to perform.
  8. The maximum number of local searches to perform.
  9. The maximum number of local searches to perform with no improvement.
  10. The maximum number of MLSL iterations to perform with no improvement.
  11. The objective function improvement fraction judged to be negligible.

Please refer to the technical report ERDC-CHL-TR-12-2, and its related appendix material and example problem files, for a discussion of these MLSL input parameters and, via the examples, implicit guidance regarding potential values to specify for your given application. The required syntax to use this alternate GSSHA run mode for model calibration is:

gssha -mlsl case.pst

where case.pst is the modified control file. Please see the technical report ERDC-CHL-TR-12-2, and its related appendix material and example problem files, for a discussion of the MLSL method outputs, the primary model output being named “slm_chl_mlsl.rec”.
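
As a minimal sketch, the eleven-line mlsl.in file described above could be assembled as follows. The function and argument names are hypothetical. Apart from d2 = 0 and ‘Y’, which the text above describes as common choices, the numeric values are arbitrary placeholders and not recommendations; see ERDC-CHL-TR-12-2 for guidance on actual values.

```python
MAX_SEED = 4294967295  # upper limit for the initial seed, as stated above


def build_mlsl_input(n, gamma, sigma, d2, seed_file_mode, seed,
                     max_iterations, max_local_searches,
                     max_local_searches_no_improvement,
                     max_iterations_no_improvement,
                     negligible_improvement_fraction):
    """Return the text of mlsl.in: eleven entries, one per line, in order."""
    if seed_file_mode not in ("Y", "N"):
        raise ValueError("seed file entry must be 'Y' (write) or 'N' (read)")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError("initial seed must lie between 0 and 4,294,967,295")
    entries = [
        n, gamma, sigma, d2, seed_file_mode, seed,
        max_iterations, max_local_searches,
        max_local_searches_no_improvement,
        max_iterations_no_improvement,
        negligible_improvement_fraction,
    ]
    return "\n".join(str(entry) for entry in entries) + "\n"


# Placeholder values for illustration only.
mlsl_text = build_mlsl_input(20, 0.1, 4.0, 0.0, "Y", 12345,
                             10, 50, 5, 3, 0.01)
print(mlsl_text)
```

The returned text can be written to a file named mlsl.in alongside the modified control file before the run is started.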

Example problem files (both input and output) associated with a Multilevel Single Linkage (MLSL) supervised GSSHA alternate run mode model calibration run, which are supplied for use as a go by