Technologies

**Artificial Neural Networks**

Artificial neural network (ANN) technology, a form of artificial intelligence, is a powerful and often superior alternative to physical-based and statistical modeling approaches. An ANN, through proper development and training, “learns” the system behavior of interest by processing representative data patterns through its architecture. Unlike a physical-based model, an ANN does not rely upon the governing physical laws, so information regarding physical parameters is often not required for its development and operation. In addition, unlike physical-based and statistical models, ANNs are not constrained by simplifying mathematical assumptions (e.g., linear system, normal distribution) or physical assumptions (e.g., laminar flow).

Figure 1 depicts a conceptual schematic of how an ANN uses real-world data to predict future system states. In this case, temperature, precipitation, and the initial water level of a pond are used to predict the future water level in the pond (and indirectly, the state of the ducks!).

**Figure 1. Conceptual Schematic of ANN**

Because of its empirical nature, ANN technology is sometimes erroneously referred to as an “advanced” type of regression analysis. What distinguishes ANN technology from regression is Kolmogorov’s theorem (Hecht-Nielsen, 1987; Sprecher, 1965). Specifically, in the form given by Hecht-Nielsen (1987), this theorem asserts that any continuous function from the $n$-dimensional unit cube $[0,1]^n$ to $\mathbb{R}^m$ can be represented *exactly* by a three-layer feedforward neural network with $n$ elements in the input layer, $2n+1$ elements in the hidden layer, and $m$ elements in the output layer, where $n$ and $m$ are arbitrary positive integers. By contrast, regression is guaranteed to provide only an approximation by computing the best fit from a given function family. In addition, unlike regression, which treats all output variables independently of each other, the presence of common arcs in the ANN architecture allows it to identify important inter-relationships that may exist between output variables.
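For reference, the underlying Kolmogorov superposition theorem is usually written for a single continuous output $f:[0,1]^n \to \mathbb{R}$ (a map to $\mathbb{R}^m$ is handled one output component at a time). The $2n+1$ outer terms correspond to the $2n+1$ hidden-layer elements:

$$
f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{q,p}(x_p) \right)
$$

where the inner functions $\psi_{q,p}$ are continuous and independent of $f$, and the outer functions $\Phi_q$ are continuous and depend on $f$. The theorem is an existence result; it does not specify how to construct these functions from data.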

**Figure 2. Architecture for a Simple Multilayer Perceptron ANN**

Figure 2 depicts a sample three-layer feedforward ANN architecture. Each ANN layer consists of individual nodes (elements). Nodes in adjacent layers are interconnected by weighted connections, and each hidden node transforms its weighted input through a special non-linear (usually non-rational) transfer function, expressed in terms of the nodal input variables and connection weights. During training, data patterns are processed through the ANN, and the connection weights are adaptively adjusted until a minimum acceptable error between the ANN-predicted output and the actual output is achieved. At this point, the ANN has “learned” to predict the system behavior of interest (i.e., values of output variables) in response to the values of the input variables.

For many projects, the commonly employed non-linear hyperbolic tangent transfer function was used:

$$
f(\mathrm{Sum}_j) = \tanh(\mathrm{Sum}_j) = \frac{e^{\mathrm{Sum}_j} - e^{-\mathrm{Sum}_j}}{e^{\mathrm{Sum}_j} + e^{-\mathrm{Sum}_j}}
$$

where $\mathrm{Sum}_j$ represents the weighted sum for node $j$ in the hidden layer, and $e$ denotes the base of the natural logarithm. In $\mathrm{Sum}_j$, each input value received by the hidden node is multiplied by an associated connection weight, whose value is identified during learning. This weighted sum can be formally represented as:

$$
\mathrm{Sum}_j = \sum_{i=1}^{n} w_{ji}\, x_i + w_{jb}
$$

where $w_{ji}$ represents the connection weight between the $i$th node in the input layer and the $j$th node in the hidden layer, and $n$ is the number of input nodes. The input $x_i$ is known and represents the value of the input variable for node $i$ in the input layer. A bias unit, which helps to provide numerical stability, is merely added as the connection weight $w_{jb}$ because it has a constant input value of 1.0.
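The weighted sum and hyperbolic tangent transfer function described above can be sketched in Python as a forward pass through a small three-layer network. This is a minimal illustration, not the authors' implementation: the weights are hypothetical, and the output node is assumed to apply a linear (identity) transfer, since the text specifies only the hidden-layer function.

```python
import math

def weighted_sum(x, w_j, w_jb):
    """Sum_j = sum_i w_ji * x_i + w_jb * 1.0 (the bias unit has constant input 1.0)."""
    return sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)) + w_jb * 1.0

def transfer(s):
    """Hyperbolic tangent (e^s - e^-s) / (e^s + e^-s); math.tanh avoids overflow for large |s|."""
    return math.tanh(s)

def forward(x, w_hidden, b_hidden, v_out, b_out):
    """Input layer -> tanh hidden layer -> output layer.

    Assumption: output nodes apply an identity transfer to their weighted sum.
    """
    h = [transfer(weighted_sum(x, w_j, w_jb)) for w_j, w_jb in zip(w_hidden, b_hidden)]
    return [weighted_sum(h, v_k, v_kb) for v_k, v_kb in zip(v_out, b_out)]

# Hypothetical weights: 3 scaled inputs (e.g., temperature, precipitation,
# initial pond level), 2 hidden nodes, 1 output (future pond level, scaled).
w_hidden = [[0.2, -0.1, 0.4],
            [0.5, 0.3, -0.2]]
b_hidden = [0.1, -0.3]
v_out = [[0.7, -0.4]]
b_out = [0.05]

x = [1.0, 0.0, 0.5]
print(weighted_sum(x, w_hidden[0], b_hidden[0]))  # Sum_1 for hidden node 1
print(forward(x, w_hidden, b_hidden, v_out, b_out))
```

For hidden node 1 the weighted sum is $0.2(1.0) - 0.1(0.0) + 0.4(0.5) + 0.1 = 0.5$, which the transfer function then maps to $\tanh(0.5)$.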

There are various kinds of ANN learning algorithms, and the interested reader is referred to Poulton (2001) for more details. For robustness and efficiency, a combination of backpropagation and conjugate gradient algorithms is often used. The prediction accuracy of an ANN is measured by the mean squared difference between the actual and predicted output values. For a preselected ANN model and corresponding data set, this mean squared error depends only on the values of the connection weights. During learning, the ANN processes training patterns (paired input and output values) through the network and systematically adjusts the connection weights so that the measure of the overall goodness of the ANN model, defined as the root mean squared error (RMSE) between the ANN-estimated output values and the actual values, is minimized. The minimization (learning) algorithm is always iterative, and each step is considered “learning”.
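The iterative weight adjustment can be sketched as backpropagation with plain batch gradient descent on the mean squared error, for a network with one tanh hidden layer and one linear output node. This is a simplified stand-in: it omits the conjugate gradient stage mentioned above, and the network size, learning rate, epoch count, and toy data are assumptions chosen for illustration.

```python
import math
import random

def predict(params, x):
    """Forward pass: tanh hidden layer, linear output node (assumed). Returns (output, hidden)."""
    W, b, v, c = params
    h = [math.tanh(sum(w_ji * x_i for w_ji, x_i in zip(W[j], x)) + b[j]) for j in range(len(b))]
    return sum(v_j * h_j for v_j, h_j in zip(v, h)) + c, h

def rmse(params, X, y):
    """Root mean squared error over all patterns."""
    return math.sqrt(sum((predict(params, x)[0] - t) ** 2 for x, t in zip(X, y)) / len(X))

def train(X, y, n_hidden=4, lr=0.02, epochs=3000, seed=0):
    """Batch gradient-descent backpropagation; each epoch is one learning step."""
    rng = random.Random(seed)
    n_in, N = len(X[0]), len(X)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    c = 0.0
    for _ in range(epochs):
        gW = [[0.0] * n_in for _ in range(n_hidden)]
        gb, gv, gc = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, t in zip(X, y):
            out, h = predict((W, b, v, c), x)
            d_out = 2.0 * (out - t) / N                   # d(MSE)/d(output) for this pattern
            gc += d_out
            for j in range(n_hidden):
                gv[j] += d_out * h[j]
                d_hid = d_out * v[j] * (1.0 - h[j] ** 2)  # chain rule back through tanh
                gb[j] += d_hid
                for i in range(n_in):
                    gW[j][i] += d_hid * x[i]
        # Move every weight and bias a small step against its gradient.
        c -= lr * gc
        for j in range(n_hidden):
            v[j] -= lr * gv[j]
            b[j] -= lr * gb[j]
            for i in range(n_in):
                W[j][i] -= lr * gW[j][i]
    return W, b, v, c

# Toy data (assumed): learn y = sin(x) on 21 evenly spaced points in [-2, 2].
X = [[-2.0 + 0.2 * k] for k in range(21)]
y = [math.sin(x[0]) for x in X]
before = rmse(train(X, y, epochs=0), X, y)   # initial (untrained) weights
after = rmse(train(X, y), X, y)
print(f"RMSE before: {before:.3f}  after: {after:.3f}")
```

All gradients are accumulated over the full pattern set before any weight changes, so every update uses a consistent set of weights; a production code would typically replace this fixed-step update with a conjugate gradient or similar line-search method.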

The RMSE is mathematically defined as:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left(\hat{C}_k - C_k\right)^2}
$$

where $\hat{C}_k$ is the ANN-estimated (predicted) state value for the $k$th training event, $C_k$ is its corresponding measured state value, and $N$ denotes the total number of such events.
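As a worked example of the RMSE described above, the short sketch below compares hypothetical predicted and measured pond levels for $N = 4$ events; the data values are invented for illustration.

```python
import math

def rmse(predicted, measured):
    """Root mean squared error between ANN-estimated and measured state values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(predicted, measured)) / n)

# Hypothetical pond levels (ft) for four events.
measured = [10.0, 10.5, 11.0, 10.8]
predicted = [10.2, 10.4, 11.3, 10.8]
print(f"RMSE = {rmse(predicted, measured):.3f} ft")
```

By hand: the errors are 0.2, -0.1, 0.3, and 0.0 ft, so the mean squared error is $(0.04 + 0.01 + 0.09 + 0)/4 = 0.035$ and the RMSE is $\sqrt{0.035} \approx 0.187$ ft.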

In selecting the most appropriate ANN model, a variety of factors must be considered. These include the functional form of the ANN transfer functions, the number of hidden layers and nodes, the most appropriate set of input variables, and the method used to minimize the objective function. This process is typically conducted iteratively, guided by professional judgment and modeling experience. For example, selecting an appropriate set of input variables during initial ANN development requires a basic understanding of the governing system dynamics. However, a sensitivity analysis in conjunction with trial and error can help the modeler converge to the most appropriate feasible set of predictor variables. The sensitivity analysis, which quantifies the relative importance of each input variable for accurately predicting each output variable, can be used in lieu of common statistical methods.
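The text does not specify how the sensitivity analysis is computed. One common approach, sketched here as an assumption, perturbs each input one at a time around every data pattern, averages the absolute central-difference change in each output, and normalizes the results into relative importances. The `model` below is a hypothetical stand-in for a trained ANN's forward pass, and inputs are assumed to be scaled to comparable ranges so their sensitivities can be compared.

```python
def input_sensitivity(model, patterns, delta=0.01):
    """Mean absolute central-difference sensitivity |d y_k / d x_i| over the data patterns.

    Returns sens[i][k] for input i and output k.
    """
    n_in, n_out = len(patterns[0]), len(model(patterns[0]))
    sens = [[0.0] * n_out for _ in range(n_in)]
    for x in patterns:
        for i in range(n_in):
            up, down = list(x), list(x)
            up[i] += delta
            down[i] -= delta
            y_up, y_down = model(up), model(down)
            for k in range(n_out):
                sens[i][k] += abs(y_up[k] - y_down[k]) / (2.0 * delta) / len(patterns)
    return sens

def relative_importance(sens):
    """Scale each output's sensitivities so they sum to 1 across inputs."""
    n_in, n_out = len(sens), len(sens[0])
    totals = [sum(sens[i][k] for i in range(n_in)) for k in range(n_out)]
    return [[sens[i][k] / totals[k] if totals[k] > 0 else 0.0 for k in range(n_out)]
            for i in range(n_in)]

# Hypothetical stand-in for a trained ANN: the output depends strongly on input 0,
# weakly on input 1, and not at all on input 2 (a spurious candidate predictor).
def model(x):
    return [3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]]

patterns = [[0.1 * k, 1.0 - 0.1 * k, 0.5] for k in range(5)]
sens = input_sensitivity(model, patterns)
rel = relative_importance(sens)
for i, row in enumerate(rel):
    print(f"input {i}: relative importance {row[0]:.3f}")
```

Here inputs 0 and 1 receive importances of 6/7 and 1/7, and input 2 receives zero, flagging it as a candidate for exclusion.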

During ANN development, learning often proceeds in a series of training and verification steps. The ANN is presented with training data, during which patterns are processed through the network and the learning algorithm adaptively adjusts the network connection weights to minimize the RMSE between actual and estimated output values. Periodically, the training phase is interrupted, and a separate (verification) data set is processed through the ANN to verify progressive learning, as indicated by a declining RMSE on the verification data set. Verification guards against overtraining, in which the ANN memorizes the training patterns by over-fitting the connection weights to them. Training proceeds until the verification RMSE either stabilizes or begins to increase. At this point, ANN training is terminated, and the ANN can be validated with a third data set not previously used for training or verification. Validation determines whether the ANN has learned the system behavior of interest over the range of expected conditions. To provide robust training, verification, and validation, statistically similar data sets spanning the expected range of system behavior are used for ANN development and validation. Typically, one half of the available data is used for training, one quarter for verification, and the remaining quarter for validation (testing).
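The data split and stopping rule can be sketched as follows. This is a minimal illustration under stated assumptions: a seeded random shuffle stands in for assembling the subsets (it does not by itself guarantee statistically similar sets; in practice the split may need to be stratified over the range of system behavior), and the `patience` and `tol` parameters of the hypothetical `stopping_point` helper are illustrative choices for deciding when the verification RMSE has "stabilized or begun to increase".

```python
import random

def split_patterns(n_patterns, seed=0):
    """Shuffle pattern indices and split them 1/2 : 1/4 : 1/4 into
    training, verification, and validation sets."""
    idx = list(range(n_patterns))
    random.Random(seed).shuffle(idx)
    n_train, n_verify = n_patterns // 2, n_patterns // 4
    return idx[:n_train], idx[n_train:n_train + n_verify], idx[n_train + n_verify:]

def stopping_point(verify_rmse, patience=3, tol=1e-4):
    """Index of the checkpoint to keep. Training stops once the verification RMSE has
    failed to improve by more than `tol` for `patience` consecutive checks (it has
    stabilized or begun to rise); the weights from the best checkpoint are retained."""
    best_i, best = 0, verify_rmse[0]
    stalled = 0
    for i in range(1, len(verify_rmse)):
        if verify_rmse[i] < best - tol:
            best_i, best, stalled = i, verify_rmse[i], 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return best_i

train_idx, verify_idx, valid_idx = split_patterns(200)
print(len(train_idx), len(verify_idx), len(valid_idx))

# Hypothetical verification RMSE recorded each time training is interrupted.
history = [0.90, 0.62, 0.48, 0.41, 0.39, 0.40, 0.42, 0.45, 0.50]
print("keep weights from checkpoint", stopping_point(history))
```

In this hypothetical history the verification RMSE bottoms out at 0.39 (checkpoint 4) and then rises for three consecutive checks, so training stops and the checkpoint 4 weights are carried forward to validation.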

Because ANNs are “data-driven” models, robust ANN development depends critically upon the quantity and quality of the data used to train them. As discussed by Coppola and others (2003), “appropriate training set size for an ANN depends upon a number of factors, including its dimension (i.e. number of connection weights), the required ANN accuracy, the probability distribution of behavior, the level of noise in the system, and the complexity of the system.” Complexity within the context of ANN modeling refers to a system in which small changes in model input values produce large and even contradictory changes in model output values. A system that does not exhibit this type of complexity may then be referred to as a “well-behaved” system.

Sensitivity analysis and selective inclusion and exclusion of input variables can help the modeler converge to an appropriate set of predictor variables. Ideally, however, the modeler should have a basic understanding of the system, which not only reduces the risk of excluding an important variable but also promotes more expedient convergence to a robust model. This understanding can help the modeler assess the potential strengths and weaknesses of the model and identify situations in which its predictive capability may be suspect, or in which inclusion of a certain variable is critical. In addition, the modeler can help design a sampling program to include important or potentially important variables that to date have been sampled either infrequently or not at all. This type of analysis is an accepted modeling methodology to reduce the dimensionality of the problem and to eliminate spurious input variables (Swingler, 1996).

**When developed properly, ANNs provide a number of significant advantages including:**

- Superior real-time predictive accuracy that exceeds that of state-of-the-art physical-based models (e.g., numerical models) and statistical-based models.
- Because they are “data-driven” models that excel with large data sets, they can use continuous data streams in real time and are therefore ideally suited for SCADA-type data collection systems.
- Easily initialized to real-time conditions, further increasing predictive accuracy and providing solutions that reflect existing system states.
- Condensed and efficient mathematical form ideal for performing a large number of simulations that might otherwise be infeasible with traditional numerical models.
- Condensed and efficient mathematical form ideal for performing formal optimization, reducing round-off errors and avoiding perturbation problems associated with physical-based numerical models.
- Superior prediction accuracy produces more accurate optimal solutions.
- Valuable insights into cause-and-effect relationships, improving understanding of system dynamics.
- Improved data collection strategies through identification of important variables that influence system behavior of interest.
- Easily combined with physical-based equations and/or interpolation methods, expanding the domain of prediction and optimization.
- Easily combined with other ANN-derived state-transition equations to simulate and optimize a complex system consisting of various distinct but inter-related components (e.g., wellfield extraction, surface water extraction, water treatment and distribution, etc.).

**Combining ANN Prediction with Formal Optimization**

Over the last 40 years or so, various numerical models and optimization algorithms have been applied conjunctively to water resources management problems, ranging from water supply to contaminant remediation. The numerical model simulates the physical behavior of the water resources system of interest, and the optimization algorithm identifies the optimal management solution(s). For example, for a water supply problem, an optimal solution would allow the manager to extract as much water from an aquifer as possible while minimizing adverse environmental and/or socioeconomic impacts, such as dewatering of wetlands or land subsidence. Typical management constraints include maintaining groundwater levels and/or hydraulic gradients at desired levels to minimize adverse environmental impacts, such as saltwater intrusion/upconing or wetlands dewatering.

Formal optimization requires formulating the management problem as an equivalent mathematical model that consists of an objective function, constraints, and decision variables. The decision variables are not only the most basic component of the optimization problem but also its motivation. Decision variables are the human controls whose optimal values the decision maker seeks to identify, such as pumping rates for individual wells in a public supply wellfield that minimize energy consumption, or chemical dosing that maximizes a production process. Mathematically, there may be an infinite number of possible combinations of values for the decision variables but typically only one true optimal solution.

The objective function mathematically quantifies the objective that the decision maker is seeking to maximize (i.e., a benefit, such as profit) or minimize (i.e., a cost, such as energy consumption). For some problems, when competing and conflicting objectives exist, it may be valuable and even necessary to formulate the objective function in a multiobjective form (see the Toms River case study for a more detailed example). For a multiobjective optimization formulation, a formal trade-off curve or Pareto frontier is generated, from which the optimal trade-off or compromise solution is identified.
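The core of generating a Pareto frontier is the dominance test. The sketch below filters a hypothetical set of already-evaluated candidate pumping plans down to the non-dominated ones for two minimized objectives (cost and wetland drawdown). It is only the filtering step; generating candidates (e.g., by repeatedly solving weighted or constrained single-objective problems) and choosing the final compromise are separate steps not shown here.

```python
def pareto_front(solutions):
    """Return the non-dominated solutions when both objectives are minimized.

    `solutions` holds (name, f1, f2) tuples. A solution is dominated if some other
    solution is no worse in both objectives and strictly better in at least one.
    """
    front = []
    for name, f1, f2 in solutions:
        dominated = any(g1 <= f1 and g2 <= f2 and (g1 < f1 or g2 < f2)
                        for _, g1, g2 in solutions)
        if not dominated:
            front.append((name, f1, f2))
    return sorted(front, key=lambda s: s[1])

# Hypothetical candidate plans: (name, cost in $1,000s, wetland drawdown in ft).
candidates = [("A", 100, 2.0), ("B", 120, 1.5), ("C", 130, 1.8),
              ("D", 90, 2.5), ("E", 150, 1.0), ("F", 95, 2.6)]
front = pareto_front(candidates)
for name, cost, drawdown in front:
    print(f"{name}: cost {cost}, drawdown {drawdown} ft")
```

Plans C and F drop out because B and D, respectively, are both cheaper and cause less drawdown; the remaining plans D, A, B, and E trace the trade-off curve from which a compromise is chosen.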

The constraints represent the bounds of the physical problem, such as maximum possible pumping rates, as well as management/operational constraints that the decision maker may impose, such as a minimum required water supply volume per unit time. Mathematically, for a well-posed problem, the constraints bound the solution within a feasible space in which the optimal solution exists.

Optimization problems may be linear or non-linear in form, depending in some cases upon the mathematical form of the equations used to simulate the physical system and, in all cases, upon the mathematical form of the equations used to define the optimization problem (i.e., objective function and constraints). If the problem is linear (or weakly non-linear and can be approximated by a linear system), convexity guarantees that the global optimum (i.e., absolute best solution) will be identified. For non-linear optimization problems, which are often non-convex, the solution is usually a local optimum, with no guarantee of identifying the global optimum. Still, because of the rigorous non-linear search, a local optimum typically provides a very good solution to the formulated optimization problem. Available optimization algorithms range from the simplex method for linear problems to the conjugate gradient method and genetic algorithms for non-linear problems.
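The local-versus-global distinction can be seen on a one-variable non-convex function. In this sketch, plain gradient descent stands in for a non-linear search algorithm, and restarting it from several points (multistart, a common heuristic that improves the odds of finding the global optimum but still does not guarantee it) shows different starting points converging to different local minima. The objective is hypothetical.

```python
def gradient_descent(df, x0, lr=0.01, iters=2000):
    """Plain gradient descent from x0; converges to a nearby local minimum."""
    x = x0
    for _ in range(iters):
        x -= lr * df(x)
    return x

def f(x):
    """Hypothetical non-convex objective with two local minima."""
    return x ** 4 - 3 * x ** 2 + x

def df(x):
    """Exact derivative of f."""
    return 4 * x ** 3 - 6 * x + 1

starts = [-2.0, -0.5, 0.5, 2.0]
minima = [gradient_descent(df, s) for s in starts]
best = min(minima, key=f)
for s, m in zip(starts, minima):
    print(f"start {s:+.1f} -> x = {m:+.3f}, f = {f(m):.3f}")
print(f"best of multistart: x = {best:+.3f}")
```

Starts to the right of the local maximum near $x \approx 0.17$ settle in the shallower minimum near $x \approx 1.13$, while starts to the left reach the global minimum near $x \approx -1.30$.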

As indicated above, even when one is solving a non-linear problem, optimization has been demonstrated to have significant inherent advantages over less rigorous trial and error approaches. For many problems in which the decision variables are continuous, there are an infinite number of solutions, most if not all of which are inferior to the optimal solution. Even for problems where the decision variables are discrete, there may be billions or more possible solutions. In complex systems, the modeler’s intuition is limited, and the set of good solutions may not be obvious. As the system grows in complexity, with more decision variables, constraints, and even multiple and conflicting objective functions, so does the decision space. The modeler using a trial and error approach may succeed only in identifying the least poor solution from a limited number of simulated solutions.
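To put the discrete case in perspective, consider a hypothetical wellfield of 10 wells, each of which can be operated at any of 10 discrete pumping rates. The number of possible pumping schedules for a single management period is

$$
10^{10} = 10{,}000{,}000{,}000,
$$

and adding a second management period squares this count to $10^{20}$.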

Optimization algorithms not only supplant tedious and inefficient trial and error approaches that may fail to identify even a good solution, but are also fast and robust, and generally converge to at least a good local optimum. In addition, the optimal solution provides important qualitative and quantitative insights into the physical system and optimization formulation. For example, the user is provided with a sensitivity matrix that quantifies how much the optimal objective function value would increase or decrease for a unit change in each constraint (e.g., how much less pumping is realized by raising the minimum required water level constraint in neighboring wetlands by one foot). This not only provides valuable insights into system behavior but also allows the decision maker to quantify accurately the potential costs and benefits of each imposed constraint on the management solution.
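The formulation elements (decision variables, objective function, constraints) and the constraint sensitivity can be sketched with a tiny hypothetical linear program: two wells, an objective of maximizing total pumping, capacity limits, and a wetland drawdown limit built from assumed response coefficients. For two variables, enumerating the vertices of the feasible region is an exact stand-in for the simplex method (it does not scale to real problems), and the sensitivity is estimated by re-solving with the drawdown limit relaxed by one foot, a finite-difference stand-in for the dual values an LP solver reports.

```python
from itertools import combinations

def solve_lp_2d(c, A, b, tol=1e-9):
    """Maximize c . q subject to A q <= b, for exactly two decision variables.

    Every vertex of the feasible polygon lies where two constraints hold with equality,
    so for a feasible, bounded problem the best feasible vertex is an optimal solution.
    """
    best_q, best_z = None, float("-inf")
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue                            # parallel constraints: no single vertex
        q = ((b1 * a2[1] - a1[1] * b2) / det,   # Cramer's rule
             (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * q[0] + a[1] * q[1] <= bi + tol for a, bi in zip(A, b)):
            z = c[0] * q[0] + c[1] * q[1]
            if z > best_z:
                best_q, best_z = q, z
    return best_q, best_z

def wellfield(max_drawdown):
    """Hypothetical two-well problem with decision variables q = (Q1, Q2)."""
    c = (1.0, 1.0)                              # objective: maximize Q1 + Q2
    A = [(0.02, 0.01),                          # wetland drawdown (ft) per unit pumping
         (1.0, 0.0), (0.0, 1.0),                # well capacities
         (-1.0, 0.0), (0.0, -1.0)]              # pumping cannot be negative
    b = [max_drawdown, 100.0, 200.0, 0.0, 0.0]
    return solve_lp_2d(c, A, b)

q, z = wellfield(3.0)
_, z_relaxed = wellfield(4.0)
print(f"optimal pumping Q1={q[0]:.1f}, Q2={q[1]:.1f}, total={z:.1f}")
print(f"allowing 1 ft more drawdown adds {z_relaxed - z:.1f} units of pumping")
```

With a 3 ft limit, well 2 runs at capacity and well 1 is held to 50 units by the drawdown constraint, for a total of 250. Relaxing the limit by one foot adds 50 units ($1/0.02$), which is the shadow price of the drawdown constraint over this range.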

Many groundwater management problems require some type of hydraulic management of the system, such as maintaining water levels or hydraulic gradients within certain thresholds or limits (i.e., constraints) in order to capture contaminant plumes or protect environmentally sensitive areas. Gorelick (1983) classified groundwater hydraulic management models into two groups: the embedding method and the response matrix approach. In the embedding method, the numerical approximations of the governing groundwater flow equations are embedded into a linear program as part of the constraint set. This approach has been applied to both steady-state (single management period) and transient (multiple management period) cases. A major disadvantage of this method is that the constraint matrix can become extremely large, complicating solution of the optimization problem.

Because of its computational efficiency, the response matrix approach, first used by Maddock (1972), is the most common technique. With this technique, the response to a unit pulse of stress is numerically computed for each potential stress at all points of interest in the aquifer, and these unit responses are assembled into a response matrix. Particular care, however, must be exercised when applying the response matrix approach to non-linear systems (e.g., unconfined aquifers). Unlike the confined aquifer case, where system linearity ensures that response coefficient values are independent of the perturbation value (e.g., pumping increment), response coefficients for the unconfined aquifer are sensitive to the perturbation value. The same limitations apply to any non-linear system.

As pointed out by Ahlfeld and Riefler (1999), because of system non-linearity, perturbation in the unconfined case requires small pumping increments to achieve reasonable accuracy. However, the increment must be large enough to produce several significant digits in the response coefficients; if it is too small, round-off error may limit the precision of the response coefficient. Riefler and Ahlfeld (1996) found that perturbation values that are either too large or too small can result in an erroneous or infeasible solution during optimization, even when an optimal solution exists.
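With one well and one monitoring point, the response matrix reduces to a single coefficient, which makes the perturbation issue easy to see. The sketch below estimates that coefficient by perturbation for a confined aquifer (Thiem equation, linear in pumping) and an unconfined aquifer (Dupuit-Thiem equation, non-linear in pumping). The aquifer parameters, geometry, and base pumping rate are hypothetical, and these closed-form solutions stand in for a numerical flow model.

```python
import math

# Hypothetical aquifer and geometry (units: m, m/day, m^3/day).
K, H0 = 10.0, 30.0       # hydraulic conductivity; initial saturated thickness
R, r = 500.0, 50.0       # radius of influence; well-to-monitoring-point distance
LN = math.log(R / r)

def drawdown_confined(q):
    """Thiem equation, confined aquifer (T = K * H0): drawdown is linear in q."""
    return q * LN / (2.0 * math.pi * K * H0)

def drawdown_unconfined(q):
    """Dupuit-Thiem equation, unconfined aquifer: H0^2 - h^2 = q ln(R/r) / (pi K)."""
    return H0 - math.sqrt(H0 ** 2 - q * LN / (math.pi * K))

def response_coefficient(drawdown, q0, dq):
    """Perturbation estimate of d(drawdown)/dq about the base pumping rate q0."""
    return (drawdown(q0 + dq) - drawdown(q0)) / dq

q0 = 2000.0
for dq in (1000.0, 100.0, 1.0, 1e-9):
    # The tiny last increment may lose significant digits to floating-point round-off.
    print(f"dq = {dq:g}: confined {response_coefficient(drawdown_confined, q0, dq):.6e}, "
          f"unconfined {response_coefficient(drawdown_unconfined, q0, dq):.6e}")
```

For the confined case, the large and moderate increments give the same coefficient; for the unconfined case the coefficient grows with the increment (about 2.5 percent larger at 1000 than at 1 in this setup). The smallest increment illustrates the opposite risk, where round-off can erode the precision of either coefficient.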

Because of their condensed mathematical structure, ANN models offer a computationally efficient and numerically stable optimization alternative. The embedding method can be used, whereby the ANN-derived state-transition equations are embedded directly into the optimization program, avoiding the potential numerical problems associated with the response matrix approach described above. Furthermore, because the number of ANN-derived state-transition equations can be orders of magnitude smaller than the number of numerical groundwater flow model equations (for the Toms River case, 32 ANN-derived state-transition equations versus almost 80,000 finite-difference equations for the corresponding numerical flow model), the constraint set is significantly reduced. Because a smaller constraint set requires fewer mathematical operations, the increased computational efficiency minimizes the round-off and precision errors that often arise when performing a large number of mathematical operations on larger constraint sets, errors that may result in an erroneous optimization solution (Szidarovszky and Yakowitz, 1978).
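The embedding idea can be sketched with a hypothetical trained ANN (fixed, invented weights) that predicts a wetland water level from a single pumping rate. The ANN equation is used directly as the constraint function, and because it is closed-form its exact derivative follows from the chain rule, so no perturbation increment has to be chosen. Bisection stands in for a non-linear programming solver here; it is valid only because this one-variable constraint is monotonic, and a multi-well formulation would instead pass the ANN equations (and their exact gradients) to a general NLP solver.

```python
import math

# Hypothetical trained ANN (one input, two tanh hidden nodes, linear output) that
# predicts wetland water level (ft) from wellfield pumping q (scaled units).
W = [-0.9, -0.4]   # input-to-hidden weights
B = [0.3, 0.1]     # hidden biases
V = [1.5, 0.8]     # hidden-to-output weights
C = 10.0           # output bias

def ann_level(q):
    """ANN-derived state-transition equation used directly as the constraint function."""
    return C + sum(v * math.tanh(w * q + b) for w, b, v in zip(W, B, V))

def ann_level_slope(q):
    """Exact derivative via the chain rule: d tanh(u)/du = 1 - tanh(u)^2."""
    return sum(v * w * (1.0 - math.tanh(w * q + b) ** 2) for w, b, v in zip(W, B, V))

def max_pumping(level_min, q_lo=0.0, q_hi=5.0, tol=1e-9):
    """Maximize q subject to ann_level(q) >= level_min and q_lo <= q <= q_hi.

    With every V > 0 and W < 0 the predicted level falls monotonically as q rises,
    so the optimum lies on the constraint boundary and bisection can locate it.
    """
    if ann_level(q_lo) < level_min:
        raise ValueError("constraint infeasible even at minimum pumping")
    if ann_level(q_hi) >= level_min:
        return q_hi                      # the pumping bound, not the level, is binding
    while q_hi - q_lo > tol:
        mid = 0.5 * (q_lo + q_hi)
        if ann_level(mid) >= level_min:
            q_lo = mid                   # still feasible: push pumping higher
        else:
            q_hi = mid
    return q_lo

q_star = max_pumping(level_min=9.0)
print(f"max pumping {q_star:.4f}, predicted level {ann_level(q_star):.4f} ft")
print(f"constraint slope at optimum {ann_level_slope(q_star):.4f} ft per unit pumping")
```

The returned rate always satisfies the level constraint, and the exact slope can be checked against a central finite difference if desired.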

For many optimization problems, the greatest advantage offered by the ANN-based optimization approach is the superior accuracy of the ANN-generated predictions/simulations. The accuracy of any optimization solution obviously depends upon the accuracy of the model used to simulate the system of interest. If the prediction accuracy of the simulation model is relatively poor, by extension, the so-called optimal solution will be sub-optimal; simply put, “garbage in, garbage out.” Because ANN models developed with real-world data can achieve greater accuracy than physical-based models, the optimal solution computed with the ANN model will be closer to the true optimum. In addition, using real-time data streams, the ANN models can automatically be initialized to real-time conditions, not only improving the predictive capability of the models but also improving the accuracy of the computed optimal solutions by reflecting existing conditions. NOAH’s methodology also includes combining physics-based and/or interpolation equations as necessary to expand the range and domain of prediction and optimization capability. In short, NOAH’s patented ANN-optimization system constitutes the most advanced real-time prediction and optimization management program available to the water and energy management sectors.

The large savings that can be realized through formal optimization are underscored by a study performed by the United States Environmental Protection Agency (EPA), which assessed the potential cost savings at Superfund sites achieved by formal optimization of pumping rates for contaminant remediation. The EPA concluded that a “20 percent in reduction of the objective function (i.e., cost) is typical” and that “improved pumping strategies at some sites could yield millions of dollars in life-cycle cost savings at some sites” (EPA 542-F-04-002, February 2004). Savings through optimization can be realized in many different areas, from water quality treatment costs to resource management and protection. When combined with ANN technology, the relative savings achieved with formal optimization can be even more significant.