Genetic Algorithms (GAs) belong to the field of evolutionary computation, which is inspired by biological evolution. From an engineering perspective, a GA is a heuristic tool that can approximately solve problems whose search space is so large that exhaustive search is intractable. The appeal of GAs is that they can be parallelized and can give us "good" solutions to hard problems.
In the GA framework, a population (or species) is a collection of individuals (or chromosomes), usually generated at random at the outset. A predefined fitness function guides selection, while operators such as crossover and mutation are applied probabilistically to emulate reproduction.
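The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names (`run_ga`, `tournament`, `crossover`, `mutate`), the choice of tournament selection, one-point crossover and bit-flip mutation, and all default parameter values are hypothetical and chosen only for brevity. One-max (counting 1 bits) serves as the fitness function.

```python
import random

def one_max(bits):
    """Fitness for the one-max problem: the number of 1 bits."""
    return sum(bits)

def tournament(pop, fitness, k, rng):
    """Return the fittest of k individuals drawn at random.
    A larger k means stronger selection pressure (assumed selection scheme)."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """One-point crossover: swap the tails of two parents at a random cut."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, p_m, rng):
    """Flip each bit independently with probability p_m."""
    return [1 - b if rng.random() < p_m else b for b in bits]

def run_ga(n_bits=32, pop_size=40, p_c=0.8, p_m=0.01,
           generations=60, k=3, seed=0, fitness=one_max):
    """Evolve a random bit-string population; return the best individual seen."""
    rng = random.Random(seed)
    # Initial population: random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            a = tournament(pop, fitness, k, rng)
            b = tournament(pop, fitness, k, rng)
            if rng.random() < p_c:          # crossover applied with probability p_c
                a, b = crossover(a, b, rng)
            nxt.extend([mutate(a, p_m, rng), mutate(b, p_m, rng)])
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)  # keep the best found so far
    return best

if __name__ == "__main__":
    winner = run_ga()
    print(one_max(winner), "ones out of", len(winner))
```

The parameters exposed by `run_ga` (population size, `p_c`, `p_m`, number of generations, tournament size `k`, and the fitness function) correspond to the design choices a GA user has to make for each problem.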
One of the difficulties in working with GAs is choosing parameters (the population size, the crossover and mutation probabilities, the number of generations, the selection mechanism, the fitness function) appropriate to a particular problem. Beyond the difficulty of the application problem itself, an additional difficulty arises because the quality of the solution found, or the total computational resources required to find it, depends on the choice of GA parameters; that is, finding a suitable fitness function, appropriate operators, and other parameters for solving a problem with a GA is itself a difficult problem. The contributions of this dissertation, then, are: to show that there is no linear correlation between diversity in the initial population and the performance of GAs; to show that fitness functions that use information from the problem itself perform better than fitness functions that need external tuning; and to propose a relationship between selection pressure and the probabilities of crossover and mutation that improves the performance of GAs in the context of two extreme kinds of schema: small schemata, where the building block under consideration is small (each bit can individually be considered part of the general solution), and long schemata, where the building block under consideration is long (a set of interrelated bits forms part of the general solution).
The dissertation proposes three general hypotheses. The first, in an attempt to measure the impact of the input on the output, is that there is no linear correlation between diversity in the initial population and the performance of GAs. The second proposes using parameters that belong to the problem itself to combine objective and constraint in the fitness function. The third uses Holland's Schema Theorem to find an interrelation between selection pressure and the probabilities of crossover and mutation that, if obeyed, is expected to yield better GA performance than if it is not obeyed, where performance is measured as the solution quality found within a given number of generations and/or the number of generations needed to find a solution of a given quality.
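For reference, the Schema Theorem invoked in the third hypothesis is commonly stated as below for a GA with fitness-proportionate selection, one-point crossover, and bitwise mutation. This is the standard textbook form, not the specific interrelation the dissertation proposes, and the symbols are the conventional ones rather than notation taken from this abstract: $m(H,t)$ is the number of instances of schema $H$ at generation $t$, $f(H,t)$ their mean fitness, $\bar{f}(t)$ the mean fitness of the population, $\delta(H)$ the defining length of $H$, $o(H)$ its order, $l$ the chromosome length, and $p_c$, $p_m$ the crossover and mutation probabilities.

```latex
\[
  E\bigl[m(H,t+1)\bigr] \;\ge\;
  m(H,t)\,\frac{f(H,t)}{\bar{f}(t)}
  \left[\,1 - p_c\,\frac{\delta(H)}{l-1} - o(H)\,p_m \right]
\]
```

The ratio $f(H,t)/\bar{f}(t)$ reflects selection pressure, while the bracketed factor bounds the disruption caused by crossover and mutation. A schema with small defining length is rarely cut by crossover, whereas a long one is disrupted with probability approaching $p_c$, which is why the two extremes of small and long schemata are natural test cases for a relation among these three quantities.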
The specific hypotheses of the dissertation are tested on theoretical and practical problems such as the one-max problem and the intrusion detection problem (treated as problems with small schemata) and the snake-in-the-box problem (treated as a problem with long schemata).