Frequently Asked Questions

ANN technology, a form of artificial intelligence modeled on the human “learning” paradigm, is a compelling and often superior alternative to physical-based and statistical modeling approaches. An ANN, through proper development and training, “learns” the system behavior of interest by processing representative data patterns through its architecture. One of the powerful features of ANN technology is that, as more data become available, ANN models can easily be updated and improved with additional training to more fully capture subtle tendencies. ANNs have been shown to outperform other advanced modeling techniques in a variety of applications and are used extensively in many sectors, including the military, NASA, and Wall Street.

Unlike physical-based (e.g., numerical) models, ANNs do not rely upon governing physical laws (e.g., Conservation of Momentum). Consequently, difficult-to-estimate parameters (e.g., hydraulic conductivity, streambed thickness, etc.) are typically not required for their development and operation. Instead, more easily measurable and less uncertain variables like water levels and air temperature can be used as inputs or predictor variables. Additionally, unlike physical-based and statistical models, ANNs are not constrained by simplifying mathematical assumptions (e.g., linear system, normal distribution, etc.) or physical assumptions (e.g., laminar flow). Because of their powerful non-linear modeling capability (see below), ANNs can accurately model highly non-linear and complex phenomena. In addition, unlike numerical models, ANNs can easily be initialized to real-time conditions, improving prediction accuracy.

Advanced physical-based models are based upon equations that explicitly represent the physics of interest. Even assuming that the model, inherently a simplification of reality, accurately represents the physics of the system of interest, the model will include physical parameters whose values are often highly uncertain. In reality, physical parameters often vary significantly spatially (i.e., heterogeneity), directionally (i.e., anisotropy), and temporally. A classic example is hydraulic conductivity, a fundamental material property for sediments and rocks, which helps control the rate of water movement in the subsurface. In the real world, this parameter exhibits extreme spatial variability, by many orders of magnitude. In addition, because hydraulic conductivity also varies significantly as a function of soil moisture (i.e., degree of saturation), it also exhibits high temporal variability (e.g., wet versus dry conditions) in the unsaturated zone. In groundwater flow modeling, due to a general lack of data over space, this parameter is typically assigned a constant value or a limited number of values, corresponding to geologic zones that the modeler has discretized within the simplified model domain. Invariably, we are limited by partial and imprecise information and data in characterizing often highly complex natural systems. In addition, no matter how complex we attempt to make our model, it still remains a gross simplification of reality. Consequently, during model development, a calibration process is typically employed, where select model parameters are iteratively adjusted in an attempt to minimize the prediction errors of the model against historical data (e.g., water levels). At times, automatic calibration tools that employ optimization algorithms are used. However, physical-based models are plagued by non-uniqueness, where many different combinations of model parameter values yield essentially the same solution.
Mathematically, there can be an infinite number of such non-unique solutions. As a simple example, 2A × B = 6 is an equation with two parameters, A and B, and there are an infinite number of combinations of values for these two parameters that satisfy it. Similarly, a set of equations that represents a numerical model with uncertain parameter values will have an infinite number of solutions for obtaining a targeted model calibration output. What further complicates calibration of physics-based models is parameter compensation, where adjustment of different parameter values can yield similar results. A classic example in groundwater models is areal recharge into the model area from precipitation and groundwater flux out of the system. Increasing the rate of recharge will increase water levels (i.e., more water enters the system) in the aquifer. Similarly, decreasing the outward flux of the system (e.g., reducing hydraulic conductivity values near the boundary) will also increase water levels. Furthermore, some combination of increasing recharge and reducing outward groundwater flux will increase water levels. Therefore, there will always be an infinite number of ways to achieve a given solution (e.g., higher water levels) by adjusting the physical parameters (and boundary conditions) within the model. With data-driven ANN models, the “data is the data” and the model “learns” the system behavior in accordance with what has been measured in the real world. Although the ANN model does not explicitly represent the physics of the system, this is an advantage for avoiding modeler bias and a priori assumptions, which are based on partial and often highly imprecise and inaccurate information. And again, ANN models are not constrained by simplifying mathematical assumptions (e.g., linearity) and physical assumptions (e.g., laminar flow, isotropy) as traditional physics-based models are.
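
The non-uniqueness described above can be illustrated with a minimal Python sketch built on the 2A × B = 6 example. The function name `toy_model` and the sampled values of A are hypothetical and chosen only for illustration; a real calibration problem involves far more parameters.

```python
# Hypothetical toy "model" with two uncertain parameters, A and B.
# Its single output is 2 * A * B, and the calibration target is 6.
def toy_model(a: float, b: float) -> float:
    return 2.0 * a * b

TARGET = 6.0

# For any non-zero A, choosing B = TARGET / (2 * A) reproduces the target,
# so every pair below is an equally "well calibrated" parameter set.
candidates = [(a, TARGET / (2.0 * a)) for a in (0.5, 1.0, 1.5, 3.0, 10.0)]

for a, b in candidates:
    print(f"A={a:6.2f}  B={b:6.2f}  output={toy_model(a, b):.2f}")
```

Every parameter pair matches the calibration target, yet the pairs differ widely, which is why a good calibration fit alone does not identify the true parameter values.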

What distinguishes ANN technology from regression is Kolmogorov’s famous theorem. The theorem asserts that any continuous function can be represented *exactly* by a three-layer feedforward neural network with n elements in the input layer, 2n+1 elements in the hidden layer, and m elements in the output layer, where n and m are arbitrary positive integers. By contrast, regression is guaranteed to provide only an approximation by computing the best fit from a given function family. In addition, unlike regression, which treats all output variables as independent of each other, the presence of common arcs in the ANN architecture allows it to identify important interrelationships that may exist between output variables.
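
For reference, the result usually cited under this name is the Kolmogorov superposition (Kolmogorov-Arnold) theorem. Its usual statement for a single output is given below as a hedged aid to the reader; the symbols \Phi_q and \phi_{q,p} are notation introduced here and do not come from the text above.

```latex
% Any continuous f : [0,1]^n \to \mathbb{R} can be written as
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
% where each \Phi_q and \phi_{q,p} is a continuous function of one variable.
```

The 2n+1 outer terms correspond to the 2n+1 hidden-layer elements mentioned above. Note that the theorem is an existence result: the outer functions depend on the function being represented and are generally not the smooth activation functions used when training an ANN in practice.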

Selection of an appropriate set of inputs (i.e., predictor variables) during initial ANN model development requires a basic conceptual, if not theoretical, understanding of the governing system dynamics. During model development, the ANN-provided sensitivity analysis, in conjunction with trial and error iterations, helps the modeler converge to the most appropriate feasible set of predictor variables, not only increasing prediction capability, but also improving the understanding of the governing factors that drive and/or influence the system behavior of interest. One of the fundamental advantages of ANN models is that they often can use more easily measurable “surrogate” variables (e.g., temperature and precipitation) in lieu of difficult-to-estimate parameters typically required by physical-based models (e.g., areal recharge). In addition, other methods can be used for identifying the optimal set of input variables, including genetic algorithms, self-organizing neural networks, and principal component analysis.

Robust ANN development is dependent upon the quantity and quality of the data used to train the models. The appropriate training set size used for ANN learning depends upon a number of factors, including the required ANN accuracy, the probability distribution of system behavior, the level of “noise” in the system, the complexity of the system, and the size of the ANN model (i.e., number of nodes). Because many hydrologic and energy systems are relatively “well-behaved”, where small changes in input values do not produce significantly different or even contradictory output values, relatively few data patterns are often sufficient. That said, the range of data should ideally span the expected range of system behavior or performance.

Each ANN-derived state-transition equation expresses the output variable(s) explicitly in terms of the known values of the input variables and connection weights formed during learning (i.e., training). Even when there are multiple state-transition equations used to simultaneously predict common output variables (e.g., water levels), because each state-transition equation is independent of the others, only simple arithmetic operations are required to solve for the unknown output variable(s). This computational ease of solution differs greatly from that of numerical models, which must use advanced numerical algorithms to simultaneously solve a set of dependent equations. In addition, the condensed nature of the ANN approach can result in a number of state-transition equations that is orders of magnitude smaller than the number of equations constituting a typical numerical model. By comparison then, computing solutions with the ANN-derived state-transition equations is typically orders of magnitude faster than with a corresponding numerical model. Consequently, ANN models can be used to simulate large numbers of possible scenarios that might otherwise be infeasible with a large numerical groundwater model. In addition, as discussed further below, this computational efficiency lends significant advantages to mathematical optimization.
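
As an illustration of why such an equation needs only simple arithmetic, the following Python sketch evaluates a tiny feedforward network as one explicit formula. The weights, the network size (2 inputs, 3 hidden nodes, 1 output), the tanh activation, and the name `state_transition` are all assumptions made for this example, not values from any actual trained model.

```python
import math

# Hypothetical connection weights, as if fixed by a completed training run.
W_HIDDEN = [[0.8, -0.4], [0.3, 0.9], [-0.5, 0.2]]  # one row per hidden node
B_HIDDEN = [0.1, -0.2, 0.05]                        # hidden-node biases
W_OUT = [1.2, -0.7, 0.5]                            # hidden-to-output weights
B_OUT = 0.3                                         # output bias

def state_transition(inputs):
    """Return the output as an explicit function of the inputs and weights."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(W_HIDDEN, B_HIDDEN)
    ]
    return sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT

# Example inputs, e.g. two scaled predictor variables (illustrative values).
print(state_transition([0.5, -0.25]))
```

No system of simultaneous equations is solved here: each prediction is a fixed sequence of multiplications, additions, and activation-function evaluations.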

ANN models can perform a number of important tasks, including automatic quality assurance/quality control by flagging spurious outliers, improving system understanding through sensitivity analyses that quantify important input-output relationships, improving data collection strategies, serving as “meta-models” for more complex physics-based models, and serving as the basis for more efficient and accurate real-time optimization.

There are many different types of ANNs, each differentiated by its architecture and learning algorithm, which determine the types of problems to which it can be applied. Multilayer perceptron neural networks are the most commonly used, perhaps accounting for as much as 80 percent or more of the applications, and are used to estimate values. Radial basis function or RBF neural networks are often used for classification problems. For example, instead of explicitly estimating a single output value (e.g., 17 algae counts per milliliter), an RBF network estimates the class or bin (e.g., between 10 and 20 algae counts per milliliter). In this case, the RBF network is explicitly provided with training data that consist of a priori assigned classes or bins as output variables, rather than explicit values. Another popular ANN type is the self-organizing map, commonly referred to as an SOM, which is used in cluster analysis. SOMs are quite distinct from more traditional ANN models in that, instead of being provided with specific output values or classes, they are used to identify clusters of data. Therefore, the user does not specify a priori the types or ranges of output variables, but uses the SOM to identify natural groupings or clusters in the data; for example, differentiating on the basis of input variables three different clusters that could then be classified as poor, moderate, and high water quality.

Mathematical optimization is used to maximize benefits and/or minimize costs under a given set of constraints or operating limits. In mathematics, computer science, and economics, optimization or mathematical programming refers to selecting the best element or solution from some set of available alternatives. Mathematical optimization formulates a complex management problem within a logical and transparent mathematical structure that can be solved using a variety of rigorous optimization algorithms. The optimization formulation consists of an objective function and constraint set, expressed in terms of the decision variables for which the optimal values are unknown. The decision variables are not only the most basic component of the optimization formulation, but are also the motivation. They constitute the human controls for which the decision maker is seeking to identify optimal values, such as the optimal pumping rates for minimizing energy consumption or optimal chemical dosing for achieving maximum water quality. The optimization program not only computes the optimal values for the decision variables, but also generates a sensitivity analysis of how the optimal solution changes with different constraint limits and coefficients; for example, how much further energy costs can be reduced by a unit increase in the allowable water level decline at a particular location. In effect, the optimal solution provides the optimal values for each of the decision variables that collectively produce the lowest feasible objective function value for a minimization problem (e.g., minimize costs) or the highest feasible objective function value for a maximization problem (e.g., maximize profits) without violating any constraints.
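
The formulation described above can be written compactly as follows. This is a generic sketch of a minimization problem; the symbols x, f, g_i, b_i, x^L, and x^U are notation assumed here for illustration and are not taken from any specific application.

```latex
\begin{aligned}
\min_{x} \quad & f(x) && \text{objective function (e.g., total cost)} \\
\text{subject to} \quad & g_i(x) \le b_i, \quad i = 1, \dots, m && \text{constraints with limits } b_i \\
& x^{L} \le x \le x^{U} && \text{bounds on the decision variables } x
\end{aligned}
```

In this notation, the sensitivity analysis mentioned above reports how the optimal objective value f^* changes with a constraint limit, that is, the rate \partial f^{*} / \partial b_i, which is often called a shadow price or Lagrange multiplier (sign conventions vary between solvers).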

The origin of mathematical optimization is often traced back to Carl Friedrich Gauss (1777-1855), considered by many to be among the greatest mathematicians in history. Gauss is often credited with an early form of the steepest descent method (its first formal statement is usually attributed to Augustin-Louis Cauchy in 1847), an algorithm in which a local minimum of a mathematical function is approached by repeatedly computing the gradient of the function and stepping in the direction of steepest descent. Because many resource allocation and decision-making problems seek to minimize an objective (e.g., cost) or maximize an objective (e.g., profit) subject to various constraints, the basic idea behind this algorithm has been used to solve many classes of optimization problems. A number of other prominent mathematicians have contributed to this rich and important field. Leonid Vitaliyevich Kantorovich is a famous Russian mathematician known for his theory and development of techniques for optimal resource allocation, the earliest form of linear programming, for which he was awarded the Nobel Prize in Economics. George Dantzig, who earned his doctorate at the University of California, Berkeley, independently invented and improved linear programming (the simplex method). John von Neumann, considered by many to be among the greatest mathematicians of the 20th century, developed the duality theory for linear programming. Other noteworthy contributors to the field include Richard Ernest Bellman, who invented dynamic programming, and Albert Tucker and Harold Kuhn, who made seminal contributions to non-linear programming.

Often so-called “optimal solutions” are far from optimal. Rather than being identified by sophisticated optimization algorithms, they are identified by an inferior trial and error approach whereby the decision variables in a model are systematically varied until a solution deemed acceptable is selected. Not only is this process highly inefficient and time consuming, it may only succeed in identifying the least poor of a limited number of solutions simulated by the modeler. As the system becomes more complex, with more decision variables, constraints, and multiple and even conflicting objectives, identifying a good solution by trial and error becomes increasingly difficult. Mathematically, there often exist an infinite number of possible solutions within the feasible decision space. For complex systems, it is beyond human intuition to efficiently converge to even a good solution. In contrast, mathematical optimization uses sophisticated algorithms to efficiently and rapidly search the feasible decision space to converge to local optima (for non-convex non-linear problems) if not the global optimum (for linear or convex problems). In addition, multiobjective optimization generates the formal trade-off curve for multiple and conflicting objectives from which the optimal compromise solution can be identified using a variety of methods.
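
The distinction between local optima and the global optimum can be seen in a minimal steepest descent sketch. The one-dimensional function, step size, iteration count, and starting points below are assumptions chosen for illustration; practical solvers use more sophisticated step-size and stopping rules.

```python
# Hypothetical non-convex objective with two separate minima.
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Analytical derivative of f.
    return 4 * x**3 - 6 * x + 1

def steepest_descent(x0, step=0.01, iterations=5000):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(iterations):
        x -= step * grad_f(x)
    return x

left = steepest_descent(-2.0)   # converges to the minimum with negative x
right = steepest_descent(2.0)   # converges to the minimum with positive x

print(f"start=-2.0 -> x={left:.4f}, f(x)={f(left):.4f}")
print(f"start=+2.0 -> x={right:.4f}, f(x)={f(right):.4f}")
```

The two starting points converge to different minima with different objective values, so on a non-convex problem a gradient search can only be relied upon to find a local optimum near its starting point.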

The large savings that can be realized through formal optimization are underscored by a study performed by the United States Environmental Protection Agency (EPA), which assessed the potential cost savings achievable at Superfund sites by formal optimization of pumping rates for contaminant remediation. The EPA concluded that a “20 percent in reduction of the objective function (i.e., cost) is typical” and that “improved pumping strategies at some sites could yield millions of dollars in life-cycle cost savings at some sites” (EPA 542-F-04-002, February, 2004). Savings through optimization can be realized in many different areas, from water quality treatment costs to resource management and protection. When combined with ANN technology, the relative savings achieved with formal optimization can be even more significant.

Management decision and resource allocation problems often have multiple and even conflicting objectives, for which the decision maker is interested in identifying the optimal trade-off or compromise solution. A typical example is to maximize water quality while minimizing water treatment costs. Multi-objective optimization utilizes conventional optimization techniques for generating a formal trade-off curve, or Pareto frontier, between multiple and conflicting objectives. Following generation of the Pareto frontier, the optimal compromise solution or trade-off point can be identified using a variety of methods in accordance with the preferences and priorities of the decision makers. The Toms River, New Jersey case study provides a real-world water management example of multi-objective optimization.
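
A minimal Python sketch of the Pareto frontier idea follows. The candidate solutions are invented (cost, quality) pairs used purely for illustration, and the helper names `dominates` and `pareto_front` are hypothetical; a real study would generate candidates with an optimization algorithm rather than list them by hand.

```python
# Hypothetical candidate solutions: (treatment cost, water-quality score).
# Cost is to be minimized and quality is to be maximized.
candidates = [(10, 60), (12, 75), (15, 74), (18, 90), (20, 88), (25, 95)]

def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))

front = pareto_front(candidates)
print(front)  # [(10, 60), (12, 75), (18, 90), (25, 95)]
```

The points (15, 74) and (20, 88) drop out because a cheaper candidate achieves higher quality. Choosing among the remaining points is the compromise step, which depends on the decision makers' priorities.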

For any properly posed optimization problem, there is a bounded feasible decision space in which the optimal solution exists (i.e., optimal values of the decision variables). For linear optimization, if a feasible solution exists, it is mathematically guaranteed that the global optimum exists on one or more of the vertex points that bound the feasible space. The SIMPLEX method is the most commonly used method for linear optimization problems. Using matrix algebra-based techniques, the SIMPLEX method efficiently pivots along the vertex points until the global optimum is identified. For non-linear optimization problems, because the problem structure is often non-convex, it is generally not possible to guarantee identification of the global optimum; instead, identification of very good local optima is sought. Non-linear optimization methods traditionally use gradient-based algorithms, where the algorithm iteratively searches along the gradient of the objective function within the feasible space until the desired minimum or maximum value within a given criterion is identified. Also widely used for non-linear optimization problems are genetic algorithms. Genetic algorithms do not perform a gradient search, but randomly generate a number of potential solutions, rank them, and “mutate and combine” the most promising solutions until a given performance criterion is achieved (e.g., minimal improvement in successive mutated optimization solutions). Comparative published studies show that gradient-based algorithms and genetic algorithms perform relatively similarly to each other. While genetic algorithms are based more on heuristics, gradient-based optimization algorithms are grounded more in theory. Regardless of the optimization algorithm used (e.g., SIMPLEX, genetic algorithm, etc.), in order to perform optimization, a prediction model such as an ANN simulating the physical system behavior of interest is necessary.
Therefore, an ANN-based prediction/simulation model can be combined with any optimization algorithm of choice.
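
The vertex property of linear optimization can be checked on a small two-variable problem. The sketch below is not the SIMPLEX method, which pivots between adjacent vertices; it simply enumerates every vertex by brute force, which is only practical for tiny problems. The objective and constraint coefficients are hypothetical.

```python
from itertools import combinations

# Hypothetical linear program: maximize 3x + 2y
# subject to a*x + b*y <= c for each (a, b, c) row below.
OBJECTIVE = (3.0, 2.0)
CONSTRAINTS = [
    (1.0, 1.0, 4.0),    # x + y  <= 4
    (1.0, 2.0, 6.0),    # x + 2y <= 6
    (1.0, 0.0, 3.0),    # x      <= 3
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]

def intersection(c1, c2):
    """Point where two constraint boundaries cross (Cramer's rule)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel boundaries never cross
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)

def feasible(point, tol=1e-9):
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in CONSTRAINTS)

def objective(point):
    return OBJECTIVE[0] * point[0] + OBJECTIVE[1] * point[1]

# Vertices are the feasible intersections of pairs of constraint boundaries.
vertices = []
for c1, c2 in combinations(CONSTRAINTS, 2):
    point = intersection(c1, c2)
    if point is not None and feasible(point):
        vertices.append(point)

best = max(vertices, key=objective)
print(best, objective(best))
```

For this problem the best vertex is x = 3, y = 1 with an objective value of 11, and no interior point of the feasible region can do better.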

The ANN approach not only offers a computationally efficient and numerically stable optimization alternative, but can also provide superior solutions. ANNs help avoid identification of erroneous solutions, which can occur when the commonly used response coefficient methodology is applied to non-linear (e.g., unconfined aquifer) optimization problems. For example, Riefler and Ahlfeld (1996) found that perturbation values for an unconfined problem that are either too large or too small can produce an erroneous solution. To avoid these problems, the generally less efficient embedding optimization approach can be used, where the simulation model is embedded into the optimization formulation as constraints. The set of ANN-derived state-transition equations, used in lieu of a numerical model, can avoid inefficiencies by reducing the number of equations in the constraint set by orders of magnitude. Fewer mathematical operations are then required during optimization, minimizing round-off and precision errors that result from large numbers of mathematical operations (Szidarovszky and Yakowitz, 1978). Just as important, because ANN models can achieve accuracy exceeding that of physical-based models, the optimal solution computed with the ANN model is, by extension, more accurate: the optimization is only as good as the model that predicts and simulates the system behavior of interest.

ANNs often serve as so-called meta-models (i.e., surrogate models), where they are trained directly from one or more complex numerical or other physical-based models. The ANN meta-model in effect constitutes a much more efficient version of the larger physics-based model, and, because of its more condensed form and simpler mathematical structure, is computationally orders of magnitude faster. Consequently, the meta-model is developed to perform large numbers of simulations/predictions that would otherwise be infeasible with the far less efficient physics-based model. In addition, as discussed above, the meta-model can overcome computational problems typically associated with numerical models for optimization problems. The Toms River, New Jersey case study serves as a meta-model example, where a large numerical groundwater flow model was replaced with a much more condensed and efficient ANN model for conducting multi-objective optimization.

ANN-derived state-transition equations can be combined directly with physical-based and/or interpolation equations to expand the range and domain of predictions. For example, if the objective is to accurately predict a groundwater flow field within a study area, ANN models can be developed directly from real-world data to predict groundwater levels at select monitoring well locations of interest. The spatial domain of the ANN-predicted groundwater elevations can be increased by interpolating additional groundwater elevations at locations in the vicinity of the ANN predictions using any number of interpolation methods (e.g., inverse-distance weighted method). The potentiometric surface (i.e., contoured head surface) generated by the ANN-predicted groundwater levels in conjunction with the interpolated groundwater levels can then be combined with a physical-based equation (i.e., Darcy’s Law) to predict the groundwater flow field (i.e., groundwater velocity vectors) for the system. The expanded ANN predictions can similarly be used to increase the range and domain of optimization capability.
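
The workflow above can be sketched in a few lines of Python. Everything in the example is assumed for illustration: the well coordinates and heads stand in for ANN predictions, the hydraulic conductivity and effective porosity are made-up values, and the function names `idw_head` and `seepage_velocity` are hypothetical. The head gradient is approximated by central differences on the interpolated surface.

```python
import math

# Hypothetical ANN-predicted heads (m) at four monitoring wells: (x, y, head).
wells = [(0.0, 0.0, 10.0), (100.0, 0.0, 9.0), (0.0, 100.0, 9.5), (100.0, 100.0, 8.5)]

def idw_head(x, y, points, power=2.0):
    """Inverse-distance weighted head at (x, y)."""
    num = den = 0.0
    for px, py, head in points:
        dist = math.hypot(x - px, y - py)
        if dist == 0.0:
            return head  # exactly at a well, use its value directly
        weight = 1.0 / dist ** power
        num += weight * head
        den += weight
    return num / den

def seepage_velocity(x, y, points, conductivity, porosity, step=1.0):
    """Darcy's Law: v = -(K / n) * grad(h), with a central-difference gradient."""
    dhdx = (idw_head(x + step, y, points) - idw_head(x - step, y, points)) / (2 * step)
    dhdy = (idw_head(x, y + step, points) - idw_head(x, y - step, points)) / (2 * step)
    return (-conductivity * dhdx / porosity, -conductivity * dhdy / porosity)

# Assumed aquifer properties: K = 1e-4 m/s, effective porosity = 0.25.
vx, vy = seepage_velocity(50.0, 50.0, wells, conductivity=1e-4, porosity=0.25)
print(idw_head(50.0, 50.0, wells), vx, vy)
```

Because heads fall toward increasing x and y in this made-up data set, both velocity components come out positive, meaning water moves from high head toward low head as Darcy’s Law requires.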

NOAH invented a methodology that explicitly combines physics-based numerical models with ANN models. By assigning the ANN-predicted values to the vector of future state values, the numerical model has an overdetermined system of equations, which can be solved using a variety of mathematical techniques. In effect, the ANN models provide predictions that are used to constrain the companion numerical model to a more accurate solution. This combination of methods utilizes the full spatial power of the physics-based numerical model with the location- and time-specific accuracy of the ANN technology to achieve a more accurate numerical model prediction that can be used in support of real-time management.