Genetic Algorithms and Genetic Programming Modern Concepts and Practical Applications
© 2009 by Taylor & Francis Group, LLC
Numerical Insights

Series Editor
A. Sydow, GMD-FIRST, Berlin, Germany

Editorial Board
P. Borne, École de Lille, France; G. Carmichael, University of Iowa, USA; L. Dekker, Delft University of Technology, The Netherlands; A. Iserles, University of Cambridge, UK; A. Jakeman, Australian National University, Australia; G. Korn, Industrial Consultants (Tucson), USA; G.P. Rao, Indian Institute of Technology, India; J.R. Rice, Purdue University, USA; A.A. Samarskii, Russian Academy of Science, Russia; Y. Takahara, Tokyo Institute of Technology, Japan

The Numerical Insights series aims to show how numerical simulations provide valuable insights into the mechanisms and processes involved in a wide range of disciplines. Such simulations provide a way of assessing theories by comparing simulations with observations. These models are also powerful tools which serve to indicate where both theory and experiment can be improved. In most cases the books will be accompanied by software on disk demonstrating working examples of the simulations described in the text.

The editors will welcome proposals using modelling, simulation and systems analysis techniques in the following disciplines: physical sciences; engineering; environment; ecology; biosciences; economics.

Volume 1: Numerical Insights into Dynamic Systems: Interactive Dynamic System Simulation with Microsoft® Windows™ and NT™, by Granino A. Korn
Volume 2: Modelling, Simulation and Control of Non-Linear Dynamical Systems: An Intelligent Approach Using Soft Computing and Fractal Theory, by Patricia Melin and Oscar Castillo
Volume 3: Principles of Mathematical Modeling: Ideas, Methods, Examples, by A.A. Samarskii and A.P. Mikhailov
Volume 4: Practical Fourier Analysis for Multigrid Methods, by Roman Wienands and Wolfgang Joppich
Volume 5: Effective Computational Methods for Wave Propagation, by Nikolaos A. Kampanis, Vassilios A. Dougalis, and John A. Ekaterinaris
Volume 6: Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications, by Michael Affenzeller, Stephan Winkler, Stefan Wagner, and Andreas Beham
Genetic Algorithms and Genetic Programming Modern Concepts and Practical Applications
Michael Affenzeller, Stephan Winkler, Stefan Wagner, and Andreas Beham
Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-1-58488-629-7 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Genetic algorithms and genetic programming : modern concepts and practical applications / Michael Affenzeller ... [et al.].
p. cm. -- (Numerical insights ; v. 6)
Includes bibliographical references and index.
ISBN 978-1-58488-629-7 (hardcover : alk. paper)
1. Algorithms. 2. Combinatorial optimization. 3. Programming (Mathematics) 4. Evolutionary computation. I. Affenzeller, Michael. II. Title. III. Series.
QA9.58.G46 2009
006.3'1--dc22  2009003656

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com
Contents
List of Tables  xi
List of Figures  xv
List of Algorithms  xxiii
Introduction  xxv
1 Simulating Evolution: Basics about Genetic Algorithms  1
  1.1 The Evolution of Evolutionary Computation  1
  1.2 The Basics of Genetic Algorithms  2
  1.3 Biological Terminology  3
  1.4 Genetic Operators  6
    1.4.1 Models for Parent Selection  6
    1.4.2 Recombination (Crossover)  7
    1.4.3 Mutation  9
    1.4.4 Replacement Schemes  9
  1.5 Problem Representation  10
    1.5.1 Binary Representation  11
    1.5.2 Adjacency Representation  12
    1.5.3 Path Representation  13
    1.5.4 Other Representations for Combinatorial Optimization Problems  13
    1.5.5 Problem Representations for Real-Valued Encoding  14
  1.6 GA Theory: Schemata and Building Blocks  14
  1.7 Parallel Genetic Algorithms  17
    1.7.1 Global Parallelization  18
    1.7.2 Coarse-Grained Parallel GAs  19
    1.7.3 Fine-Grained Parallel GAs  20
    1.7.4 Migration  21
  1.8 The Interplay of Genetic Operators  22
  1.9 Bibliographic Remarks  23
2 Evolving Programs: Genetic Programming  25
  2.1 Introduction: Main Ideas and Historical Background  26
  2.2 Chromosome Representation  28
    2.2.1 Hierarchical Labeled Structure Trees  28
    2.2.2 Automatically Defined Functions and Modular Genetic Programming  35
    2.2.3 Other Representations  36
  2.3 Basic Steps of the GP-Based Problem Solving Process  37
    2.3.1 Preparatory Steps  37
    2.3.2 Initialization  39
    2.3.3 Breeding Populations of Programs  39
    2.3.4 Process Termination and Results Designation  41
  2.4 Typical Applications of Genetic Programming  43
    2.4.1 Automated Learning of Multiplexer Functions  43
    2.4.2 The Artificial Ant  44
    2.4.3 Symbolic Regression  46
    2.4.4 Other GP Applications  49
  2.5 GP Schema Theories  50
    2.5.1 Program Component GP Schemata  51
    2.5.2 Rooted Tree GP Schema Theories  52
    2.5.3 Exact GP Schema Theory  54
    2.5.4 Summary  59
  2.6 Current GP Challenges and Research Areas  59
  2.7 Conclusion  62
  2.8 Bibliographic Remarks  62
3 Problems and Success Factors  65
  3.1 What Makes GAs and GP Unique among Intelligent Optimization Methods?  65
  3.2 Stagnation and Premature Convergence  66

4 Preservation of Relevant Building Blocks  69
  4.1 What Can Extended Selection Concepts Do to Avoid Premature Convergence?  69
  4.2 Offspring Selection (OS)  70
  4.3 The Relevant Alleles Preserving Genetic Algorithm (RAPGA)  73
  4.4 Consequences Arising out of Offspring Selection and RAPGA  76

5 SASEGASA – More than the Sum of All Parts  79
  5.1 The Interplay of Distributed Search and Systematic Recovery of Essential Genetic Information  80
  5.2 Migration Revisited  81
  5.3 SASEGASA: A Novel and Self-Adaptive Parallel Genetic Algorithm  82
    5.3.1 The Core Algorithm  83
  5.4 Interactions among Genetic Drift, Migration, and Self-Adaptive Selection Pressure  86
6 Analysis of Population Dynamics  89
  6.1 Parent Analysis  89
  6.2 Genetic Diversity  90
    6.2.1 In Single-Population GAs  90
    6.2.2 In Multi-Population GAs  91
    6.2.3 Application Examples  92
7 Characteristics of Offspring Selection and the RAPGA  97
  7.1 Introduction  97
  7.2 Building Block Analysis for Standard GAs  98
  7.3 Building Block Analysis for GAs Using Offspring Selection  103
  7.4 Building Block Analysis for the Relevant Alleles Preserving GA (RAPGA)  113

8 Combinatorial Optimization: Route Planning  121
  8.1 The Traveling Salesman Problem  121
    8.1.1 Problem Statement and Solution Methodology  122
    8.1.2 Review of Approximation Algorithms and Heuristics  125
    8.1.3 Multiple Traveling Salesman Problems  130
    8.1.4 Genetic Algorithm Approaches  130
  8.2 The Capacitated Vehicle Routing Problem  139
    8.2.1 Problem Statement and Solution Methodology  140
    8.2.2 Genetic Algorithm Approaches  147

9 Evolutionary System Identification  157
  9.1 Data-Based Modeling and System Identification  157
    9.1.1 Basics  157
    9.1.2 An Example  159
    9.1.3 The Basic Steps in System Identification  166
    9.1.4 Data-Based Modeling Using Genetic Programming  169
  9.2 GP-Based System Identification in HeuristicLab  170
    9.2.1 Introduction  170
    9.2.2 Problem Representation  171
    9.2.3 The Functions and Terminals Basis  173
    9.2.4 Solution Representation  178
    9.2.5 Solution Evaluation  182
  9.3 Local Adaption Embedded in Global Optimization  188
    9.3.1 Parameter Optimization  189
    9.3.2 Pruning  192
  9.4 Similarity Measures for Solution Candidates  197
    9.4.1 Evaluation-Based Similarity Measures  199
    9.4.2 Structural Similarity Measures  201
10 Applications of Genetic Algorithms: Combinatorial Optimization  207
  10.1 The Traveling Salesman Problem  208
    10.1.1 Performance Increase of Results of Different Crossover Operators by Means of Offspring Selection  208
    10.1.2 Scalability of Global Solution Quality by SASEGASA  210
    10.1.3 Comparison of the SASEGASA to the Island-Model Coarse-Grained Parallel GA  214
    10.1.4 Genetic Diversity Analysis for the Different GA Types  217
  10.2 Capacitated Vehicle Routing  221
    10.2.1 Results Achieved Using Standard Genetic Algorithms  222
    10.2.2 Results Achieved Using Genetic Algorithms with Offspring Selection  226

11 Data-Based Modeling with Genetic Programming  235
  11.1 Time Series Analysis  235
    11.1.1 Time Series Specific Evaluation  236
    11.1.2 Application Example: Design of Virtual Sensors for Emissions of Diesel Engines  237
  11.2 Classification  251
    11.2.1 Introduction  251
    11.2.2 Real-Valued Classification with Genetic Programming  251
    11.2.3 Analyzing Classifiers  252
    11.2.4 Classification Specific Evaluation in GP  258
    11.2.5 Application Example: Medical Data Analysis  263
  11.3 Genetic Propagation  285
    11.3.1 Test Setup  285
    11.3.2 Test Results  286
    11.3.3 Summary  288
    11.3.4 Additional Tests Using Random Parent Selection  289
  11.4 Single Population Diversity Analysis  292
    11.4.1 GP Test Strategies  292
    11.4.2 Test Results  293
    11.4.3 Conclusion  297
  11.5 Multi-Population Diversity Analysis  300
    11.5.1 GP Test Strategies  300
    11.5.2 Test Results  301
    11.5.3 Discussion  303
  11.6 Code Bloat, Pruning, and Population Diversity  306
    11.6.1 Introduction  306
    11.6.2 Test Strategies  307
    11.6.3 Test Results  309
    11.6.4 Conclusion  318

Conclusion and Outlook  321

Symbols and Abbreviations  325

References  327
List of Tables
7.1 Parameters for test runs using a conventional GA.  99
7.2 Parameters for test runs using a GA with offspring selection.  104
7.3 Parameters for test runs using the relevant alleles preserving genetic algorithm.  113
8.1 Exemplary edge map of the parent tours for an ERX operator.  138
9.1 Data-based modeling example: Training data.  160
9.2 Data-based modeling example: Test data.  164
10.1 Overview of algorithm parameters.  209
10.2 Experimental results achieved using a standard GA.  209
10.3 Experimental results achieved using a GA with offspring selection.  209
10.4 Parameter values used in the test runs of the SASEGASA algorithms with single crossover operators as well as with a combination of the operators.  211
10.5 Results showing the scaling properties of SASEGASA with one crossover operator (OX), with and without mutation.  211
10.6 Results showing the scaling properties of SASEGASA with one crossover operator (ERX), with and without mutation.  212
10.7 Results showing the scaling properties of SASEGASA with one crossover operator (MPX), with and without mutation.  212
10.8 Results showing the scaling properties of SASEGASA with a combination of crossover operators (OX, ERX, MPX), with and without mutation.  213
10.9 Parameter values used in the test runs of an island model GA with various operators and various numbers of demes.  215
10.10 Results showing the scaling properties of an island GA with one crossover operator (OX) using roulette-wheel selection, with and without mutation.  215
10.11 Results showing the scaling properties of an island GA with one crossover operator (ERX) using roulette-wheel selection, with and without mutation.  216
10.12 Results showing the scaling properties of an island GA with one crossover operator (MPX) using roulette-wheel selection, with and without mutation.  216
10.13 Parameter values used in the CVRP test runs applying a standard GA.  223
10.14 Results of a GA using roulette-wheel selection, 3-tournament selection and various mutation operators.  226
10.15 Parameter values used in CVRP test runs applying a GA with OS.  228
10.16 Results of a GA with offspring selection and population sizes of 200 and 400 and various mutation operators. The configuration is listed in Table 10.15.  232
10.17 Results of a GA with offspring selection and a population size of 500 and various mutation operators. The configuration is listed in Table 10.15.  234
11.1 Linear correlation of input variables and the target values (NOx) in the NOx data set I.  240
11.2 Mean squared errors on training data for the NOx data set I.  241
11.3 Statistic features of the identification relevant variables in the NOx data set II.  246
11.4 Linear correlation coefficients of the variables relevant in the NOx data set II.  248
11.5 Statistic features of the variables in the NOx data set III.  250
11.6 Linear correlation coefficients of the variables relevant in the NOx data set III.  250
11.7 Exemplary confusion matrix with three classes.  253
11.8 Exemplary confusion matrix with two classes.  254
11.9 Set of function and terminal definitions for enhanced GP-based classification.  264
11.10 Experimental results for the Thyroid data set.  270
11.11 Summary of the best GP parameter settings for solving classification problems.  271
11.12 Summary of training and test results for the Wisconsin data set: Correct classification rates (average values and standard deviation values) for 10-fold CV partitions, produced by GP with offspring selection.  279
11.13 Comparison of machine learning methods: Average test accuracy of classifiers for the Wisconsin data set.  280
11.14 Confusion matrices for average classification results produced by GP with OS for the Melanoma data set.  280
11.15 Comparison of machine learning methods: Average test accuracy of classifiers for the Melanoma data set.  281
11.16 Summary of training and test results for the Thyroid data set: Correct classification rates (average values and standard deviation values) for 10-fold CV partitions, produced by GP with offspring selection.  282
11.17 Comparison of machine learning methods: Average test accuracy of classifiers for the Thyroid data set.  283
11.18 GP test strategies.  285
11.19 Test results.  286
11.20 Average overall genetic propagation of population partitions.  287
11.21 Additional test strategies for genetic propagation tests.  289
11.22 Test results in additional genetic propagation tests (using random parent selection).  290
11.23 Average overall genetic propagation of population partitions for random parent selection tests.  290
11.24 GP test strategies.  293
11.25 Test results: Solution qualities.  294
11.26 Test results: Population diversity (average similarity values; avg., std.).  295
11.27 Test results: Population diversity (maximum similarity values; avg., std.).  296
11.28 GP test strategies.  302
11.29 Multi-population diversity test results of the GP test runs using the Thyroid data set.  303
11.30 Multi-population diversity test results of the GP test runs using the NOx data set III.  304
11.31 GP parameters used for code growth and bloat prevention tests.  307
11.32 Summary of the code growth prevention strategies applied in these test series.  308
11.33 Performance of systematic and ES-based pruning strategies.  310
11.34 Formula size progress in test series (d).  311
11.35 Quality of results produced in test series (d).  311
11.36 Formula size and population diversity progress in test series (e).  312
11.37 Formula size and population diversity progress in test series (f).  313
11.38 Quality of results produced in test series (f).  313
11.39 Formula size and population diversity progress in test series (g).  314
11.40 Quality of results produced in test series (g).  314
11.41 Formula size and population diversity progress in test series (h).  315
11.42 Quality of results produced in test series (h).  316
11.43 Comparison of best models on training and validation data (bt and bv, respectively).  317
11.44 Formula size and population diversity progress in test series (i).  320
11.45 Quality of results produced in test series (i).  320
List of Figures
1.1 1.2 1.3
1.4 1.5 2.1 2.2 2.3 2.4
2.5 2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.13 2.14 2.15 2.16 2.17
The canonical genetic algorithm with binary solution encoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Schematic display of a single point crossover. . . . . . . . . Global parallelization concepts: A panmictic population structure (shown in left picture) and the corresponding master– slave model (right picture). . . . . . . . . . . . . . . . . . . Population structure of a coarsegrained parallel GA. . . . . Population structure of a ﬁnegrained parallel GA; the special case of a cellular model is shown here. . . . . . . . . . . . . Exemplary programs given as rooted, labeled structure trees. Exemplary evaluation of program (a). . . . . . . . . . . . . Exemplary evaluation of program (b). . . . . . . . . . . . . Exemplary crossover of programs (1) and (2) labeled as parent1 and parent2, respectively. Child1 and child2 are possible new oﬀspring programs formed out of the genetic material of their parents. . . . . . . . . . . . . . . . . . . . . . . . . . . Exemplary mutation of a program: The programs mutant1, mutant2, and mutant3 are possible mutants of parent. . . . Intronaugmented representation of an exemplary program in PDGP [Pol99b]. . . . . . . . . . . . . . . . . . . . . . . . . . Major preparatory steps of the basic GP process. . . . . . . The genetic programming cycle [LP02]. . . . . . . . . . . . . The GPbased problem solving process. . . . . . . . . . . . GA and GP ﬂowcharts: The conventional genetic algorithm and genetic programming. . . . . . . . . . . . . . . . . . . . The Boolean multiplexer with three address bits; (a) general black box model, (b) addressing data bit d5 . . . . . . . . . . A correct solution to the 3address Boolean multiplexer problem [Koz92b]. . . . . . . . . . . . . . . . . . . . . . . . . . . The Santa Fe trail. . . . . . . . . . . . . . . . . . . . . . . . A Santa Fe trail solution. The black points represent nodes referencing to the Prog3 function. . . . . . . . . . . . . . . . 
A symbolic regression example. . . . . . . . . . . . . . . . . Exemplary formulas. . . . . . . . . . . . . . . . . . . . . . . Programs matching Koza’s schema H=[(+ x 3), y]. . . . . .
4 8
18 19 20 30 31 32
34 35 38 38 40 41 42 44 44 45 46 48 49 51
xv © 2009 by Taylor & Francis Group, LLC
xvi
Genetic Algorithms and Genetic Programming 2.18 2.19
2.20 2.21
2.22
2.23 4.1 4.2
4.3
4.4
5.1
5.2 5.3 5.4 6.1
The rooted tree GP schema ∗(=, = (x, =)) and three exemplary programs of the schema’s semantics. . . . . . . . . . . The GP schema H = +(*(=,x),=) and exemplary u and l schemata. Cross bars indicate crossover points; shaded regions show the parts of H that are replaced by “don’t care” symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The GP hyperschema ∗(#, = (x, =)) and three exemplary programs that are a part of the schema’s semantics. . . . . The GP schema H = +(∗(=, x), =) and exemplary U and L hyperschema building blocks. Cross bars indicate crossover points; shaded regions show the parts of H that are modiﬁed. Relation between approximate and exact schema theorems for diﬀerent representations and diﬀerent forms of crossover (in the absence of mutation). . . . . . . . . . . . . . . . . . . . Examples for bloat. . . . . . . . . . . . . . . . . . . . . . . . Flowchart of the embedding of oﬀspring selection into a genetic algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . Graphical representation of the gene pool available at a certain generation. Each bar represents a chromosome with its alleles representing the assignment of the genes at the certain loci. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The left part of the ﬁgure represents the gene pool at generation i and the right part indicates the possible size of generation i + 1 which must not go below a minimum size and also not exceed an upper limit. These parameters have to be deﬁned by the user. . . . . . . . . . . . . . . . . . . . . . . . Typical development of actual population size between the two borders (lower and upper limit of population size) displaying also the identical chromosomes that occur especially in the last iterations. . . . . . . . . . . . . . . . . . . . . . .
53
56 56
57
58 60
71
74
74
76
Flowchart of the reuniﬁcation of subpopulations of a SASEGASA (light shaded subpopulations are still evolving, whereas dark shaded ones have already converged prematurely). . . . . . 84 Quality progress of a typical run of the SASEGASA algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Selection pressure curves for a typical run of the SASEGASA algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Flowchart showing the main steps of the SASEGASA. . . . 87 Similarity of solutions in the population of a standard GA after 20 and 200 iterations, shown in the left and the right charts, respectively. . . . . . . . . . . . . . . . . . . . . . . .
© 2009 by Taylor & Francis Group, LLC
93
List of Figures 6.2
6.3
6.4 6.5
7.1 7.2 7.3 7.4
7.5 7.6 7.7 7.8 7.9
7.10
7.11
7.12
Histograms of the similarities of solutions in the population of a standard GA after 20 and 200 iterations, shown in the left and the right charts, respectively.
Average similarities of solutions in the population of a standard GA over the first 2,000 and 10,000 iterations, shown in the upper and lower charts, respectively.
Multi-population specific similarities of the solutions of a parallel GA's populations after 5,000 generations.
Progress of the average multi-population specific similarity values of a parallel GA's solutions, shown for 10,000 generations.
Quality progress for a standard GA with OX crossover for mutation rates of 0%, 5%, and 10%.
Quality progress for a standard GA with ERX crossover for mutation rates of 0%, 5%, and 10%.
Quality progress for a standard GA with MPX crossover for mutation rates of 0%, 5%, and 10%.
Distribution of the alleles of the globally optimal solution over the run of a standard GA using OX crossover and a mutation rate of 5% (remaining parameters set according to Table 7.1).
Quality progress for a GA with offspring selection, OX, and a mutation rate of 5%.
Quality progress for a GA with offspring selection, MPX, and a mutation rate of 5%.
Quality progress for a GA with offspring selection, ERX, and a mutation rate of 5%.
Quality progress for a GA with offspring selection, ERX, and no mutation.
Quality progress for a GA with offspring selection using a combination of OX, ERX, and MPX, and a mutation rate of 5%.
Success progress of the different crossover operators OX, ERX, and MPX with a mutation rate of 5%. The plotted graphs represent the ratio of successfully produced children to the population size over the generations.
Distribution of the alleles of the globally optimal solution over the run of an offspring selection GA using ERX crossover and a mutation rate of 5% (remaining parameters set according to Table 7.2).
Distribution of the alleles of the globally optimal solution over the run of an offspring selection GA using ERX crossover and no mutation (remaining parameters set according to Table 7.2).
7.13 Quality progress for a relevant alleles preserving GA with OX and a mutation rate of 5%.
7.14 Quality progress for a relevant alleles preserving GA with MPX and a mutation rate of 5%.
7.15 Quality progress for a relevant alleles preserving GA with ERX and a mutation rate of 5%.
7.16 Quality progress for a relevant alleles preserving GA using a combination of OX, ERX, and MPX, and a mutation rate of 5%.
7.17 Quality progress for a relevant alleles preserving GA using a combination of OX, ERX, and MPX, and mutation switched off.
7.18 Distribution of the alleles of the globally optimal solution over the run of a relevant alleles preserving GA using a combination of OX, ERX, and MPX, and a mutation rate of 5% (remaining parameters set according to Table 7.3).
7.19 Distribution of the alleles of the globally optimal solution over the run of a relevant alleles preserving GA using a combination of OX, ERX, and MPX without mutation (remaining parameters set according to Table 7.3).
8.1 Exemplary nearest neighbor solution for a 51-city TSP instance ([CE69]).
8.2 Example of a 2-change for a TSP instance with 7 cities.
8.3 Example of a 3-change for a TSP instance with 11 cities.
8.4 Example of a partially matched crossover.
8.5 Example of an order crossover.
8.6 Example of a cyclic crossover.
8.7 Exemplary result of the sweep heuristic for a small CVRP.
8.8 Exemplary sequence-based crossover.
8.9 Exemplary route-based crossover.
8.10 Exemplary relocate mutation.
8.11 Exemplary exchange mutation.
8.12 Example of a 2-opt mutation for the VRP.
8.13 Example of a 2-opt* mutation for the VRP.
8.14 Example of an or-opt mutation for the VRP.
9.1 Data-based modeling example: Training data.
9.2 Data-based modeling example: Evaluation of an optimally fit linear model.
9.3 Data-based modeling example: Evaluation of an optimally fit cubic model.
9.4 Data-based modeling example: Evaluation of an optimally fit polynomial model (n = 10).
9.5 Data-based modeling example: Evaluation of an optimally fit polynomial model (n = 20).
9.6 Data-based modeling example: Evaluation of an optimally fit linear model (evaluated on training and test data).
9.7 Data-based modeling example: Evaluation of an optimally fit cubic model (evaluated on training and test data).
9.8 Data-based modeling example: Evaluation of an optimally fit polynomial model (n = 10) (evaluated on training and test data).
9.9 Data-based modeling example: Summary of training and test errors for varying numbers of parameters n.
9.10 The basic steps of system identification.
9.11 The basic steps of GP-based system identification.
9.12 Structure tree representation of a formula.
9.13 Structure tree crossover and the functions basis.
9.14 Simple examples for pruning in GP.
9.15 Simple formula structure and all included pairs of ancestors and descendants (genetic information items).
10.1 Quality improvement using offspring selection and various crossover operators.
10.2 Degree of similarity/distance for all pairs of solutions in a SGA's population of 120 solution candidates after 10 generations.
10.3 Genetic diversity in the population of a conventional GA over time.
10.4 Genetic diversity of the population of a GA with offspring selection over time.
10.5 Genetic diversity of the entire population over time for a SASEGASA with 5 subpopulations.
10.6 Quality progress of a standard GA using roulette wheel selection on the left and 3-tournament selection on the right side, applied to instances of the Taillard CVRP benchmark: tai75a (top) and tai75b (bottom).
10.7 Genetic diversity in the population of a GA with roulette wheel selection (shown on the left side) and 3-tournament selection (shown on the right side).
10.8 Box plots of the qualities produced by a GA with roulette and 3-tournament selection, applied to the problem instances tai75a (top) and tai75b (bottom).
10.9 Quality progress of the offspring selection GA for the instances (from top to bottom) tai75a and tai75b. The left column shows the progress with a population size of 200, while in the right column the GA with offspring selection uses a population size of 400.
10.10 Influence of the crossover operators SBX and RBX on each generation of an offspring selection algorithm. The lighter line represents the RBX; the darker line represents the SBX.
10.11 Genetic diversity in the population of a GA with offspring selection and a population size of 200 on the left and 400 on the right for the problem instances tai75a and tai75b (from top to bottom).
10.12 Box plots of the offspring selection GA with a population size of 200 and 400 for the instances tai75a and tai75b.
10.13 Box plots of the GA with 3-tournament selection against the offspring selection GA for the instances tai75a (shown in the upper part) and tai75b (shown in the lower part).
11.1 Dynamic diesel engine test bench at the Institute for Design and Control of Mechatronical Systems, JKU Linz.
11.2 Evaluation of the best model produced by GP for test strategy (1).
11.3 Evaluation of the best model produced by GP for test strategy (2).
11.4 Evaluation of models for particulate matter emissions of a diesel engine (snapshot showing the evaluation of the model on validation / test samples).
11.5 Error distributions of models for particulate matter emissions.
11.6 Cumulative errors of models for particulate matter emissions.
11.7 Target NOx values of NOx data set II, recorded over approximately 30 minutes at 20 Hz recording frequency, yielding ∼36,000 samples.
11.8 Target Horiba NOx values of NOx data set III.
11.9 Target Horiba NOx values of NOx data set III, samples 6000–7000.
11.10 Two exemplary ROC curves and their area under the ROC curve (AUC).
11.11 An exemplary graphical display of a multi-class ROC (MROC) matrix.
11.12 Classification example: Several samples with original class values C1, C2, and C3 are shown; the class ranges result from the estimated values for each class and are indicated as cr1, cr2, and cr3.
11.13 An exemplary hybrid structure tree of a combined formula including arithmetic as well as logical functions.
11.14 Graphical representation of the best result we obtained for the Thyroid data set, CV-partition 9: Comparison of original and estimated class values.
11.15 ROC curves and their area under the curve (AUC) values for classification models generated for Thyroid data, CV-set 9.
11.16 MROC charts and their maximum and average area under the curve (AUC) values for classification models generated for Thyroid data, CV-set 9.
11.17 Graphical representation of a classification model (formula), produced for 10-fold cross-validation partition 3 of the Thyroid data set.
11.18 pctotal values for an exemplary run of series I.
11.19 pctotal values for an exemplary run of series II.
11.20 pctotal values for an exemplary run of series III.
11.21 Selection pressure progress in two exemplary runs of test series III and V (extended GP with gender-specific parent selection and strict offspring selection).
11.22 Distribution of similarity values in an exemplary run of NOx test series A, generation 200.
11.23 Distribution of similarity values in an exemplary run of NOx test series A, generation 4000.
11.24 Distribution of similarity values in an exemplary run of NOx test series (D), generation 20.
11.25 Distribution of similarity values in an exemplary run of NOx test series (D), generation 95.
11.26 Population diversity progress in exemplary Thyroid test runs of series (A) and (D) (shown in the upper and lower graph, respectively).
11.27 Exemplary multi-population diversity of a test run of Thyroid series F at iteration 50, grayscale representation.
11.28 Code growth in GP without applying size limits or complexity punishment strategies (left: standard GP, right: extended GP).
11.29 Progress of formula complexity in one of the test runs of series (1g), shown for the first ∼400 iterations.
11.30 Progress of formula complexity in one of the test runs of series (1h) (shown left) and one of series (2h) (shown right).
11.31 Model with best fit on training data: Model structure and full evaluation.
11.32 Model with best fit on validation data: Model structure and full evaluation.
11.33 Error distributions of best models: Charts I, II, and III show the error distributions of the model with best fit on training data evaluated on training, validation, and test data, respectively; charts IV, V, and VI show the error distributions of the model with best fit on validation data evaluated on training, validation, and test data, respectively.
11.34 A simple workbench in HeuristicLab 2.0.
List of Algorithms
1.1 Basic workflow of a genetic algorithm.
4.1 Definition of a genetic algorithm with offspring selection.
9.1 Exhaustive pruning of a model m using the parameters h1, h2, minimizeModel, cpmax, and detmax.
9.2 Evolution strategy inspired pruning of a model m using the parameters λ, maxUnsuccRounds, h1, h2, minimizeModel, cpmax, and detmax.
9.3 Calculation of the evaluation-based similarity of two models m1 and m2 with respect to data base data.
9.4 Calculation of the structural similarity of two models m1 and m2.
Introduction
Essentially, this book is about algorithmic developments in the context of genetic algorithms (GAs) and genetic programming (GP); we also describe their applications to significant combinatorial optimization problems as well as structure identification, using HeuristicLab as a platform for algorithm development. The main goal of the theoretical considerations is to obtain a better understanding of the basic workflow of GAs and GP, in order to establish new bionic, problem independent theoretical concepts and to substantially increase the achievable solution quality.

The book is structured into a theoretical and an empirical part. The aim of the theoretical part is to describe the important and characteristic properties of the basic genetic algorithm as well as the main characteristics of the algorithmic extensions introduced here. The empirical part of the book elaborates two case studies: On the one hand, the traveling salesman problem (TSP) and the capacitated vehicle routing problem (CVRP) are used as representatives for GAs applied to combinatorial optimization problems. On the other hand, GP-based nonlinear structure identification applied to time series and classification problems is analyzed to highlight the properties of the algorithmic measures in the field of genetic programming.

The borderlines between theory and practice become indistinct in some parts, as it is also necessary to describe theoretical properties on the basis of practical examples in the first part of the book. For this purpose we go back to some small-dimensioned TSP instances that are perfectly suited for theoretical GA considerations.

Research concerning the self-adaptive interplay between selection and the applied solution manipulation operators (crossover and mutation) is the basis for the algorithmic developments presented in this book.
The ultimate goal in this context is to avoid the disappearance of relevant building blocks and to support the combination of those alleles from the gene pool that carry solution properties of highly fit individuals. As we show in comparative test series, in conventional GAs and GP this relevant genetic information is likely to get lost quite early in the standard variants of these algorithms and can only be reintroduced into the population's gene pool by mutation. This dependence on mutation can be drastically reduced by new generic selection principles based upon either self-adaptive selection pressure steering (offspring selection, OS) or self-adaptive population size adjustment as proposed in the relevant alleles preserving genetic algorithm (RAPGA). Both algorithmic extensions ensure the survival of essential genetic information by supporting the survival of relevant alleles rather than the survival of above-average chromosomes. This is achieved by defining the survival probability of a new child chromosome depending on the child's fitness in comparison to the fitness values of its own parents. With these measures it becomes possible to channel the relevant alleles, which are initially scattered in the entire population, to single chromosomes at the end of the genetic search process.

The SASEGASA algorithm is a special coarse-grained parallel GA; the acronym "SASEGASA" stands for Self-Adaptive Segregative Genetic Algorithm including aspects of Simulated Annealing. SASEGASA combines offspring selection with enhanced parallelization concepts in order to avoid premature convergence, one of the major problems with GAs. As we will show for the TSP, it becomes possible to scale the robustness, and particularly the achievable solution quality, by the number of subpopulations.

Due to the high focus on sexual recombination, evolution strategies (ES) are not considered explicitly in this book. Nevertheless, many of the theoretical considerations are heavily inspired by evolution strategies, especially the aspect of selection after reproduction and (self-)adaptive selection pressure steering. Aside from other variants of evolutionary computation, further inspirations are borrowed from fields such as population genetics. The implementation of bionic ideas for algorithmic developments is quite pragmatic and ignores debates on principles that are discussed in the natural sciences. Of course, we are always aware of the fact that artificial evolution as performed in an evolutionary algorithm is situated on a high level of abstraction compared to the biological role model.

The problem-oriented part of the book is dedicated to the application of the algorithmic concepts described in this book to benchmark as well as real-world problems.
Concretely, we examine the traveling salesman problem and the capacitated vehicle routing problem (which is thematically related to the TSP, but closer to practical applications) as representatives of combinatorial optimization problems. Time series and classification analysis are used as application areas of data-based structure identification with genetic programming working with formula trees representing mathematical models.

As a matter of principle, we use standard problem representations and the appropriate problem-specific genetic operators known from GA and GP theory for the experiments shown in these chapters. The focus is set on the comparison of results achievable with standard GA and GP implementations to the results achieved using the extended algorithmic concepts described in this book. These enhanced concepts do not depend on a concrete problem representation and its operators; their influences on population dynamics in GA and GP populations are analyzed as well.
Additional material related to the research described in this book is provided on the book's homepage at http://gagp2009.heuristiclab.com. Among other information, this website provides some of the software used as well as dynamic presentations of representative test runs.
Chapter 1 Simulating Evolution: Basics about Genetic Algorithms
1.1 The Evolution of Evolutionary Computation
Work on what is nowadays called evolutionary computation started in the sixties of the 20th century in the United States and Germany. There have been two basic approaches in computer science that copy evolutionary mechanisms: evolution strategies (ES) and genetic algorithms (GAs).

Genetic algorithms go back to Holland [Hol75], an American computer scientist and psychologist who developed his theory not only under the aspect of solving optimization problems but also to study self-adaptiveness in biological processes. Essentially, this is the reason why genetic algorithms are much closer to the biological model than evolution strategies. The theoretical foundations of evolution strategies were formed by Rechenberg and Schwefel (see for example [Rec73] or [Sch94]), whose primary goal was optimization. Although these two concepts have many aspects in common, they developed almost independently from each other in the USA (where GAs were developed) and Germany (where research was done on ES).

Both approaches work with a population model in which the genetic information of each individual of a population is in general different. Among other things, this genotype includes a parameter vector which contains all necessary information about the properties of a certain individual. Before the intrinsic evolutionary process takes place, the population is initialized arbitrarily; evolution, i.e., replacement of the old generation by a new generation, proceeds until a certain termination criterion is fulfilled.

The major difference between evolution strategies and genetic algorithms lies in the representation of the genotype and in the way the operators are used (which are mutation, selection, and possibly recombination). In contrast to GAs, where the main role of the mutation operator is simply to avoid stagnation, mutation is the primary operator of evolution strategies.
Genetic programming (GP), an extension of the genetic algorithm, is a domainindependent, biologically inspired method that is able to create computer programs from a highlevel problem statement. In fact, virtually all problems in artiﬁcial intelligence, machine learning, adaptive systems, and
automated learning can be recast as a search for a computer program; genetic programming provides a way to search for a computer program in the space of computer programs (as formulated by Koza in [Koz92a]). Similar to GAs, GP works by imitating aspects of natural evolution, but whereas GAs are intended to find arrays of characters or numbers, the goal of a GP process is to search for computer programs (or, for example, formulas) solving the optimization problem at hand. As in every evolutionary process, new individuals (in GP's case, new programs) are created. They are tested, and the fitter ones in the population succeed in creating children of their own, whereas unfit ones tend to disappear from the population.

In the following sections we give a detailed description of the basics of genetic algorithms in Section 1.2, take a look at the corresponding biological terminology in Section 1.3, and characterize the operators used in GAs in Section 1.4. Then, in Section 1.5 we discuss problem representation issues, and in Section 1.6 we summarize the schema theory, an essential concept for understanding not only how, but also why GAs work. Parallel GA concepts are given in Section 1.7, and finally we discuss the interplay of genetic operators in Section 1.8.
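The idea of searching the space of programs or formulas becomes concrete once an individual is represented as an evaluable structure tree. The following minimal sketch is our own illustration (not an implementation from this book or from HeuristicLab): a formula is a nested tuple of operators and terminals that can be evaluated recursively, here shown for x·x + 2·x.

```python
def evaluate(node, x):
    """Recursively evaluate a formula structure tree at point x.

    A node is either the terminal 'x', a numeric constant, or a tuple
    (operator, left_subtree, right_subtree).
    """
    if node == 'x':
        return x
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    a, b = evaluate(left, x), evaluate(right, x)
    if op == '+':
        return a + b
    if op == '-':
        return a - b
    if op == '*':
        return a * b
    raise ValueError(f"unknown operator: {op}")

# Structure tree for the formula x*x + 2*x
tree = ('+', ('*', 'x', 'x'), ('*', 2, 'x'))
```

A GP fitness function would evaluate such trees on training data and score them by their error; crossover swaps subtrees between two parent trees.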
1.2 The Basics of Genetic Algorithms
Concerning its internal functioning, a genetic algorithm is an iterative procedure which usually operates on a population of constant size and is basically executed in the following way: An initial population of individuals (also called "solution candidates" or "chromosomes") is generated randomly or heuristically. During each iteration step, also called a "generation," the individuals of the current population are evaluated and assigned a certain fitness value. In order to form a new population, individuals are first selected (usually with a probability proportional to their relative fitness values) and then produce offspring candidates, which in turn form the next generation of parents. This ensures that the expected number of times an individual is chosen is approximately proportional to its relative performance in the population.

For producing new solution candidates genetic algorithms use two operators, namely crossover and mutation:

• Crossover is the primary genetic operator: It takes two individuals, called parents, and produces one or two new individuals, called offspring, by combining parts of the parents. In its simplest form, the operator works by swapping (exchanging) substrings before and after a randomly selected crossover point.

• The second genetic operator, mutation, is essentially an arbitrary modification which helps to prevent premature convergence by randomly sampling new points in the search space. In the case of bit strings, mutation is applied by simply flipping bits randomly in a string with a certain probability called the mutation rate.

Genetic algorithms are stochastic iterative algorithms which cannot guarantee convergence; termination is commonly triggered by reaching a maximum number of generations, by finding an acceptable solution, or by more sophisticated termination criteria indicating premature convergence. We will discuss this issue in further detail in Chapter 3. The so-called standard genetic algorithm (SGA), which represents the basis of almost all variants of genetic algorithms, is given in Algorithm 1.1 (which is formulated as in [Tom95], for example).
Algorithm 1.1 Basic workflow of a genetic algorithm.
  Produce an initial population of individuals
  Evaluate the fitness of all individuals
  while termination condition not met do
    Select fitter individuals for reproduction and produce new individuals (crossover and mutation)
    Evaluate fitness of new individuals
    Generate a new population by inserting some new "good" individuals and by erasing some old "bad" individuals
  end while
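Algorithm 1.1 translates almost line by line into executable code. The sketch below is a simplified, problem-agnostic illustration (all names and the OneMax demo problem are ours, not taken from the book): the caller supplies the initialization, fitness, selection, crossover, and mutation functions.

```python
import random

def genetic_algorithm(init, fitness, select, crossover, mutate,
                      pop_size=50, generations=100):
    """Basic GA workflow following Algorithm 1.1 (simplified sketch)."""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):                      # termination criterion
        scored = [(fitness(ind), ind) for ind in population]
        # Produce a complete new generation (generational replacement).
        population = []
        while len(population) < pop_size:
            parent1, parent2 = select(scored), select(scored)
            population.append(mutate(crossover(parent1, parent2)))
    return max(population, key=fitness)

# Demo on the OneMax problem: maximize the number of 1-bits in a string.
def init():
    return [random.randint(0, 1) for _ in range(20)]

def select(scored):                  # 3-tournament on (fitness, individual)
    return max(random.sample(scored, 3))[1]

def crossover(a, b):                 # single point crossover
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(individual, rate=0.05):   # bit flip mutation
    return [1 - bit if random.random() < rate else bit for bit in individual]

best = genetic_algorithm(init, sum, select, crossover, mutate)
```

On this toy problem the loop reliably drives the population toward the all-ones string, illustrating how selection, crossover, and mutation interact.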
A special and quite restricted GA variant, which has represented the basis for theoretical considerations for a long period of time, is given in Figure 1.1. This chart sketches a GA with binary representation operating with generational replacement, a population of constant size, and the following genetic operators: roulette wheel selection, single point crossover, and bit flip mutation. This special type of genetic algorithm, which is the basis for theoretical GA research such as the well-known schema theorem and the building block hypothesis, is also called the canonical genetic algorithm (CGA).
FIGURE 1.1: The canonical genetic algorithm with binary solution encoding.

1.3 Biological Terminology

The approximative way of solving optimization problems by genetic algorithms holds a strong analogy to the basic principles of biological evolution. The fundamentals of the natural evolution theory, as it is considered nowadays, mainly refer to the theories of Charles Darwin, which were published in 1859 in his well-known work "The Origin of Species By Means of Natural Selection or the Preservation of Favoured Races in the Struggle for Life" (revised edition: [Dar98]). In this work Darwin states the following five major ideas:

• Evolution, change in lineages, occurs and occurred over time.

• All creatures have common descent.

• Natural selection determines changes in nature.

• Gradual change, i.e., nature changes somehow successively.
• Speciation, i.e., Darwin claimed that the process of natural selection results in populations diverging enough to become separate species.

Although some of Darwin's proposals were not new, his ideas (particularly those on common descent and natural selection) provided the first solid foundation upon which evolutionary biology has been built. At this point it may be useful to formally introduce some essential parts of the biological terminology which are used in the context of genetic algorithms:

• All living organisms consist of cells containing the same set of one or more chromosomes, i.e., strings of DNA. A gene can be understood as an "encoder" of a characteristic, such as eye color. The different possibilities for a characteristic (e.g., brown, green, blue, gray) are called alleles. Each gene is located at a particular position (locus) on the chromosome.

• Most organisms have multiple chromosomes in each cell. The sum of all chromosomes, i.e., the complete collection of genetic material, is called the genome of the organism, and the term genotype refers to the particular set of genes contained in a genome. Therefore, if two individuals have identical genomes, they are said to have the same genotype.

• Organisms whose chromosomes are arranged in pairs are called diploid, whereas organisms with unpaired chromosomes are called haploid. In nature, most sexually reproducing species are diploid. Humans, for instance, have 23 pairs of chromosomes in each somatic cell in their body.

Recombination (crossover) occurs during sexual reproduction in the following way: For producing a new child, the genes of the parents are combined to eventually form a new diploid set of chromosomes. Offspring are subject to mutation, where elementary parts of the DNA (nucleotides) are changed. The fitness of an organism (individual) is typically defined as its probability to reproduce, or as a function of the number of offspring the organism has produced.
For the sake of simplification, in genetic algorithms the term chromosome refers to a solution candidate (in the first GAs encoded as a bit string). The genes are either single bits or small blocks of neighboring bits that encode a particular element of the solution. Even if an allele usually is either 0 or 1, for larger alphabets more alleles are possible at each locus. As a further simplification to the biological role model, crossover typically operates by exchanging genetic material between two haploid parents, whereas mutation is implemented by simply flipping the bit at a randomly chosen locus. Finally, it is remarkable that most applications of genetic algorithms employ haploid single-chromosome individuals, although the evolution of mankind has
inspired the GA community the most. This is most probably due to the easier and more effective representation and implementation of single-chromosome individuals.
1.4 Genetic Operators
In the following, the main genetic operators, namely parent selection, crossover, mutation, and replacement, are described. The focus hereby lies on a functional description of the principles rather than on giving a complete overview of operator concepts; for more details about genetic operators the interested reader is referred to textbooks such as [DLJD00].
1.4.1 Models for Parent Selection
In genetic algorithms a fitness function assigns a score to each individual in a population; this fitness value indicates the quality of the solution represented by the individual. The fitness function is often given as part of the problem description or based on the objective function; developing an appropriate fitness function may also involve the use of simulation, heuristic techniques, or the knowledge of an expert. Evaluating the fitness function for each individual should be relatively fast due to the number of times it will be invoked. If the evaluation is likely to be slow, then concepts of parallel and distributed computing, an approximate function evaluation technique, or a technique that only considers elements that have changed may be employed.

Once a population has been generated and its fitness has been measured, the set of solutions that are selected to be "mated" in a given generation is produced. In the standard genetic algorithm (SGA) the probability that a chromosome of the current population is selected for reproduction is proportional to its fitness. In fact, there are many ways of accomplishing this selection. These include:

• Proportional selection (roulette wheel selection): The classical SGA utilizes this selection method, which has been proposed in the context of Holland's schema theorem (which will be explained in detail in Section 1.6). Here the expected number of descendants for an individual i is given as p_i = f_i / f̄, with f : S → R+ denoting the fitness function and f̄ representing the average fitness of all individuals. Therefore, each individual of the population is represented by a space proportional to its fitness. By repeatedly spinning the wheel, individuals are chosen using random sampling with replacement. In order to make proportional selection independent from the dimension of the fitness values, so-called windowing techniques are usually employed.
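Proportional selection with p_i = f_i / f̄ amounts to sampling from the cumulative fitness distribution. A minimal sketch (our own illustration, assuming strictly positive fitness values as proportional selection requires):

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Select one individual with probability proportional to its fitness.

    Assumes all fitness values are strictly positive.
    """
    total = sum(fitnesses)
    pick = random.uniform(0, total)        # spin the wheel
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if pick <= cumulative:
            return individual
    return population[-1]                  # guard against rounding effects
```

In Python the same behavior is also available via `random.choices(population, weights=fitnesses)`; the explicit loop above just makes the "spinning wheel" picture visible.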
© 2009 by Taylor & Francis Group, LLC
Simulating Evolution: Basics about Genetic Algorithms
Further variants of proportional selection aim to reduce the dominance of a single or a group of highly fit individuals (“super individuals”) by stochastic sampling techniques (as for example explained in [DLJD00]).
• Linear-rank selection: In the context of linear-rank selection the individuals of the population are ordered according to their fitness and copies are assigned in such a way that the best individual receives a predetermined multiple of the number of copies the worst one receives [GB89]. On the one hand rank selection implicitly reduces the dominating effects of “super individuals” in populations (i.e., individuals that are assigned a significantly better fitness value than all other individuals), but on the other hand it warps the difference between close fitness values, thus increasing the selection pressure in stagnant populations. Even if linear-rank selection has been used with some success, it ignores the information about fitness differences of different individuals and violates the schema theorem.
• Tournament selection: There are a number of variants on this theme. The most common one is k-tournament selection, where k individuals are selected from a population and the fittest of the k selected ones is considered for reproduction. In this variant the selection pressure can be scaled quite easily by choosing an appropriate value for k.
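Two of the selection schemes described above can be sketched in a few lines of Python (an illustrative sketch, not from the book; all function names are ours):

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Fitness-proportional selection: pick one individual with
    probability proportional to its fitness (all fitnesses > 0)."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if pick <= cumulative:
            return individual
    return population[-1]  # guard against floating-point rounding

def tournament_select(population, fitnesses, k, rng):
    """k-tournament selection: draw k individuals at random and return
    the fittest one; larger k means higher selection pressure."""
    indices = rng.sample(range(len(population)), k)
    best = max(indices, key=lambda i: fitnesses[i])
    return population[best]

rng = random.Random(42)
pop = ["a", "b", "c", "d"]
fit = [1.0, 2.0, 3.0, 10.0]
parents = [tournament_select(pop, fit, k=3, rng=rng) for _ in range(10)]
```

Note how k directly scales the selection pressure: with k equal to the population size the best individual always wins, while k = 1 degenerates to uniform random selection.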
1.4.2 Recombination (Crossover)
In its easiest formulation, which is suggested in the canonical GA for binary encoding, crossover takes two individuals and cuts their chromosome strings at some randomly chosen position. The produced substrings are then swapped to produce two new full-length chromosomes. Conventional crossover techniques for binary representation include:
• Single point crossover: A single random cut is made, producing two head sections and two tail sections. The two tail sections are then swapped to produce two new individuals (chromosomes); Figure 1.2 schematically sketches this crossover method, which is also called one point crossover.
• Multiple point crossover: A natural extension of the single point crossover is the multiple point crossover: In an n-point crossover there are n crossover points and substrings are swapped between the n points. According to some researchers, multiple-point crossover is more suitable for combining good features present in strings because it samples uniformly along the full length of a chromosome [Ree95]. At the same time, multiple-point crossover becomes more and more disruptive with an increasing number of crossover
FIGURE 1.2: Schematic display of a single point crossover.
points, i.e., the evolvement of longer building blocks becomes more and more difficult. Decreasing the number of crossover points during the run of the GA may be a good compromise.
• Uniform crossover: Given two parents, each gene in the offspring is created by copying the corresponding gene from one of the parents. The selection of the corresponding parent is undertaken via a randomly generated crossover mask: At each index, the offspring gene is taken from the first parent if there is a 1 in the mask at this index, and otherwise (if there is a 0 in the mask at this index) the gene is taken from the second parent. Due to this construction principle, uniform crossover does not support the evolvement of higher order building blocks.
The choice of an appropriate crossover operator depends very much on the representation of the search space (see also Section 1.5). Sequencing problems such as routing problems, for example, often require operators different from the ones described above, as almost all generated children may be situated outside the space of valid solutions. In higher order representations, a variety of real-number combination operators can be employed, such as the average and the geometric mean. Domain knowledge can be used to design local improvement operators which sometimes allow more efficient exploration of the search space around good solutions. For instance, knowledge could be used to determine appropriate locations for crossover points. As the number of proposed problem-specific crossover techniques has grown so much over the years, it would go beyond the scope of the present book even to discuss the more important ones. For a good discussion of crossover-related issues and further references the reader is referred to [Mic92] and [DLJD00].
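The single point and uniform variants described above can be sketched as follows (an illustrative sketch for binary chromosomes represented as Python lists; names are ours):

```python
import random

def single_point_crossover(p1, p2, rng):
    """Cut both parents at one random position and swap the tails."""
    point = rng.randrange(1, len(p1))          # cut strictly inside the string
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_crossover(p1, p2, rng):
    """Copy each gene from parent 1 or parent 2 according to a
    randomly generated binary crossover mask."""
    mask = [rng.randrange(2) for _ in p1]
    c1 = [a if m else b for a, b, m in zip(p1, p2, mask)]
    c2 = [b if m else a for a, b, m in zip(p1, p2, mask)]
    return c1, c2

rng = random.Random(1)
a = [0] * 8
b = [1] * 8
c1, c2 = single_point_crossover(a, b, rng)
```

Observe that both operators conserve the parents’ gene material: every allele of the two children stems from one of the two parents, which is exactly the property demanded of crossover later in Section 1.5.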
1.4.3 Mutation
Mutations allow undirected jumps to slightly different areas of the search space. The basic mutation operator for binary coded problems is bitwise mutation. Mutation occurs randomly and very rarely with a probability pm; typically, this mutation rate is less than ten percent. In some cases mutation is interpreted as generating a new bit, and in others it is interpreted as flipping the bit.
In higher order alphabets, such as integer numbering formulations, mutation takes the form of replacing an allele with a randomly chosen value in the appropriate range with probability pm. However, for combinatorial optimization problems, such mutation schemes can cause difficulties with chromosome legality; for example, multiple copies of a given value can occur, which might be illegal for some problems (including routing). Alternatives suggested in the literature include pairwise swap and shift operations as for instance described in [Car94].
In addition, adaptive mutation schemes similar to mutation in the context of evolution strategies are worth mentioning. Adaptive mutation schemes vary the rate, the form of mutation, or both during a GA run. For instance, mutation is sometimes defined in such a way that the search space is explored uniformly at first and more locally towards the end, in order to perform a kind of local improvement of candidate solutions [Mic92].
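Bitwise mutation and a simple adaptive rate schedule can be sketched as follows (the linear annealing schedule and all names are our illustrative choices, not a scheme from the book):

```python
import random

def bit_flip_mutation(chromosome, pm, rng):
    """Flip each bit independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chromosome]

def decreasing_rate(generation, max_generations, pm_start=0.1, pm_end=0.01):
    """A simple adaptive scheme: linearly anneal the mutation rate so
    the search explores broadly at first and more locally towards the
    end of the run."""
    frac = generation / max_generations
    return pm_start + (pm_end - pm_start) * frac

rng = random.Random(7)
individual = [0, 1, 1, 0, 1, 0, 0, 1]
mutant = bit_flip_mutation(individual, pm=0.05, rng=rng)
```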
1.4.4 Replacement Schemes
After having generated a new generation of descendants (offspring) by crossover and mutation, the question arises which of the new candidates should become members of the next generation. In the context of evolution strategies this choice determines the life span of the individuals and substantially influences the convergence behavior of the algorithm. A further strategy influencing replacement quite drastically is offspring selection, which will be discussed separately in Chapter 4. The following schemes are possible replacement mechanisms for genetic algorithms:
• Generational replacement: The entire population is replaced by its descendants. Similar to the (µ, λ) evolution strategy it might therefore happen that the fitness of the best individual decreases at some stage of evolution. Additionally, this strategy puts the dominance of a few individuals into perspective, which might help to avoid premature convergence [SHF94].
• Elitism: The best individual (or the n best individuals, respectively) of the previous generation is retained for the next generation, which theoretically allows immortality similar to the (µ + λ) evolution strategy and might
be critical with respect to premature convergence. The special and commonly applied strategy of retaining just one (the best) individual of the last generation is also called the “golden cage model,” which is a special case of n-elitism with n = 1. If mutation is applied to the elite in order to prevent premature convergence, the replacement mechanism is called “weak elitism.”
• Delete-n-last: The n weakest individuals are replaced by n descendants. If n ≪ |POP| we speak of a steady-state replacement scheme; for n = 1 the changes between the old and the new generation are certainly very small, and n = |POP| gives the already introduced generational replacement strategy.
• Delete-n: In contrast to the delete-n-last replacement strategy, here not the n weakest but rather n arbitrarily chosen individuals of the old generation are replaced, which on the one hand reduces the convergence speed of the algorithm but on the other hand also helps to avoid premature convergence (compare elitism versus weak elitism).
• Tournament replacement: Competitions are run between sets of individuals from the previous and the current generation, with the winners becoming part of the new population.
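Some of the replacement schemes above can be sketched in a toy setting where individuals are numbers and fitness is the identity function (an illustrative sketch; all names are ours):

```python
def generational_replacement(population, offspring):
    """Replace the entire population by its descendants."""
    return list(offspring)

def elitism_replacement(population, offspring, fitness, n=1):
    """Keep the n best individuals of the old generation and fill the
    rest of the next generation with the best offspring."""
    elite = sorted(population, key=fitness, reverse=True)[:n]
    rest = sorted(offspring, key=fitness, reverse=True)[:len(population) - n]
    return elite + rest

def delete_n_last(population, offspring, fitness, n):
    """Replace the n weakest individuals by n descendants
    (a steady-state scheme for n much smaller than |POP|)."""
    survivors = sorted(population, key=fitness, reverse=True)[:len(population) - n]
    return survivors + list(offspring[:n])

fitness = lambda x: x
old = [5, 9, 1, 7]
new = [4, 8, 2, 6]
```

With elitism (n = 1) the old best individual 9 survives into the next generation, whereas generational replacement may discard it.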
A detailed description of replacement schemes and their eﬀects can be found for example in [SHF94], [Mic92], [DLJD00], and [Mit96].
1.5 Problem Representation
As already stated before, the first genetic algorithm presented in the literature [Hol75] used binary vectors for the representation of solution candidates (chromosomes). Consequently, the first solution manipulation operators (single point crossover, bit mutation) were developed for binary representation. Furthermore, this very simple GA, also commonly known as the canonical genetic algorithm (CGA), represents the basis for extensive theoretical inspections, resulting in the well-known schema theorem and the building block hypothesis ([Hol75], [Gol89]). This background theory will be examined separately in Section 1.6, as it defines the scope of almost any GA as it should ideally be and distinguishes GAs from almost any other heuristic optimization technique. The unique selling point of GAs is to compile so-called building blocks, i.e., somehow linked parts of the chromosome which become larger as the
algorithm proceeds, advantageously with respect to the given fitness function. In other words, one could define the claim of a GA as being an algorithm which is able to assemble the basic modules of highly fit or even globally optimal solutions (which the algorithm of course does not know about). These basic modules are with some probability already available in the initial population, but spread over many individuals; the algorithm therefore has to compile these modules in such a clever way that continuously growing sequences of highly qualified alleles, the so-called building blocks, are formed.
Compared to heuristic optimization techniques based on neighborhood search (such as tabu search [Glo86] or simulated annealing [KGV83]), the methodology of GAs to combine partial solutions (by crossover) is potentially much more robust with respect to getting stuck in locally but not globally optimal solutions; this tendency of neighborhood-based searches denotes a major drawback of these heuristics. Still, when applying GAs the user has to pay much more attention to the problem representation in order to help the algorithm fulfill the claim stated above. In that sense the problem representation must allow the solution manipulation operators, especially crossover, to combine alleles of different parent individuals. This is because crossover is responsible for combining the properties of two solution candidates, which may be located in very different regions of the search space, so that valid new solution candidates are built. This is why the problem representation has to be designed in such a way that crossover operators are able to build valid new children (solution candidates) with a genetic makeup that consists of the union set of their parents’ alleles. Furthermore, as a tribute to the general functioning of GAs, the crossover operators also have to support the potential development of higher-order building blocks (longer allele sequences).
Only if the genetic operators for a certain problem representation have these necessary solution manipulation properties can the corresponding GA be expected to work as it should, i.e., in the sense of a generalized interpretation of the building block hypothesis. Unfortunately, many more or less established problem representations are not able to fulfill these requirements, as they do not support the design of potentially suited crossover operators. Some problem representations will be considered exemplarily in the following, with attention to their ability to allow meaningful crossover procedures. Even if mutation, the second solution manipulation concept of GAs, is also of essential importance, the design of meaningful mutation operators is much less challenging, as it is a lot easier to fulfill the requirements of a suited mutation operator (which in fact is to introduce a small amount of new genetic information).
1.5.1 Binary Representation
In the early years of GA research there was a strong focus on binary encoding of solution candidates. To some extent, an outgrowth of these ambitions is certainly the binary representation for the TSP. There have been diﬀerent
ways of using binary representation for the TSP, the most straightforward one being to encode each city as a string of log2 n bits and a solution candidate as a string of n(log2 n) bits. Crossover is then simply performed by applying single-point crossover as proposed by Holland [Hol75]. Further attempts using binary encoding have been proposed using binary matrix representation ([FM91], [HGL93]). In [HGL93], Homaifar and Guan for example defined a matrix element in the i-th row and the j-th column to be 1 if and only if in the tour city j is visited after city i; they also applied one- or two-point crossover on the parent matrices, which for one-point crossover means that the child tour is created by taking the column vectors left of the crossover point from one parent and the column vectors right of the crossover point from the other parent. Obviously, these strategies lead to highly illegal tours which then have to be repaired by additional repair strategies [HGL93], which is exactly the point where a GA can no longer act as it is supposed to: As the repair strategies have to introduce a high amount of genetic information which is neither from the one nor from the other parent, child solutions emerge whose genetic makeup has only little in common with their own parents; this counteracts the general functioning of GAs as given in a more general interpretation of the schema theorem and the corresponding building block hypothesis.
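The straightforward encoding described above can be sketched as follows (a minimal, hypothetical helper pair; note that a real implementation would additionally need the repair strategies discussed above, since crossover on such bit strings readily produces illegal tours):

```python
from math import ceil, log2

def encode_tour(tour, n):
    """Encode a TSP tour as a bit string: each city index is written
    with ceil(log2(n)) bits, giving n * ceil(log2(n)) bits in total."""
    bits = ceil(log2(n))
    return "".join(format(city, f"0{bits}b") for city in tour)

def decode_tour(bitstring, n):
    """Inverse operation; after crossover the decoded list may contain
    duplicates or out-of-range indices, i.e., an illegal tour."""
    bits = ceil(log2(n))
    return [int(bitstring[i:i + bits], 2)
            for i in range(0, len(bitstring), bits)]

tour = [0, 2, 1, 3]
s = encode_tour(tour, 4)   # 4 cities -> 2 bits per city
```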
1.5.2 Adjacency Representation
Using the adjacency representation for the TSP (as described in [LKM+99], e.g.), a city j is listed in position i if and only if the tour leads from city i to city j. Based on the adjacency representation, the so-called alternating edges crossover has been proposed, for example, which basically works as follows: First it chooses an edge from one parent and continues with the position of this edge in the other parent representing the next edge, etc.; the partial tour is thus built up by choosing edges from the two parents alternatingly. In case this strategy would produce a cycle, the edge is not added; instead, the operator randomly selects an edge from the edges which do not produce a cycle and continues in the way described above. Compared to the crossover operators based on binary encoding, this strategy has the obvious advantage that a new child is built up from edges of its own parents. However, this strategy is not very well suited either, as a further claim to crossover is not fulfilled at all: The alternating edges crossover cannot inherit longer tour segments, and therefore longer building blocks cannot establish themselves. As a further development of the alternating edges crossover, the so-called subtour chunks crossover aims to put things right by alternating not single edges but subtours of the two parental solutions. However, the capabilities of this strategy are also rather limited.
1.5.3 Path Representation
The most natural representation of a TSP tour is given by the path representation. Within this representation, the n cities of a tour are put in order in a list of length n, so that the order of cities to be visited is given by the list entries, with an imaginary edge from the last to the first list entry. A lot of crossover and mutation operators have been developed based upon this representation, and most of the nowadays used GA-based TSP solution methods are realized using path representation. Despite obvious disadvantages like the ambiguity of this representation (the same tour can be described in 2n different ways for a symmetrical TSP and in n different ways for an asymmetrical TSP), it has allowed the design of quite powerful operators like the order crossover (OX) or the edge recombination crossover (ERX), which are able to pass parent subtours on to child solutions with only a rather small ratio of edges stemming from neither of the parents, which is essential for GAs. A detailed description of these operators is given in Chapter 8.
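The ambiguity mentioned above is easy to see in a small sketch (the distance matrix is an illustrative toy example): rotations of the list, and for a symmetric TSP also reversals, describe the same tour and therefore yield the same length.

```python
def tour_length(tour, dist):
    """Length of a tour in path representation: the sum of distances
    along the list entries plus the imaginary edge from the last city
    back to the first one."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

# symmetric 4-city distance matrix (illustrative values)
dist = [[0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 1],
        [3, 5, 1, 0]]
length = tour_length([0, 1, 2, 3], dist)
```

The rotated list [1, 2, 3, 0] and the reversed list [3, 2, 1, 0] evaluate to the same length, illustrating the 2n equivalent descriptions of one symmetric tour.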
1.5.4 Other Representations for Combinatorial Optimization Problems
Combinatorial optimization problems that are more in step with actual practice than the TSP require more complex problem representations, which makes it even more difficult for the designer of genetic solution manipulation operators to construct crossover operators that fulfill the essential requirements. Challenging optimization tasks arise in the field of logistics and production planning optimization, where the capacitated vehicle routing problem with time windows (CVRPTW, [Tha95]) and without time windows (CVRP, [DR59]) as well as the job shop scheduling problem (JSSP, [Tai93]) denote abstracted standard formulations which are used for the comparison of optimization techniques on the basis of widely available standardized benchmark problems. Tabu search [Glo86] and genetic algorithms are considered the most powerful optimization heuristics for these rather practical combinatorial optimization problems [BR03]. Cheng et al. as well as Yamada and Nakano give comprehensive reviews of problem representations and corresponding operators for applying genetic algorithms to the JSSP in [CGT99] and [YN97], respectively. For the CVRP, Bräysy and Gendreau give a detailed overview of the application of local search algorithms in [BG05a] and of the application of metaheuristics in [BG05b]; concrete problem representations and crossover operators for GAs are outlined in [PB96] and [Pri04]. Furthermore, the application of extended GA concepts to the CVRP will be covered in the practical part of this book within Chapter 10.
1.5.5 Problem Representations for Real-Valued Encoding
When using real-valued encoding, a solution candidate is represented as a real-valued vector in which the dimension of the chromosomes is constant and equal to the dimension of the solution vectors. Crossover concepts are distinguished into discrete and continuous recombination, where the discrete variants copy the exact allele values of the parent chromosomes to the child chromosome, whereas the continuous variants perform some kind of averaging. Mutation operators for real-valued encoding either slightly modify all positions of the gene or introduce major changes to only some (often just one) positions. Often a mixture of different crossover and mutation techniques leads to the best results for real-valued GAs. A comprehensive review of crossover and mutation techniques, including more sophisticated techniques like multi-parent recombination, is given in [DLJD00].
Although real-valued encoding is a problem representation which is especially suited for evolution strategies or particle swarm optimization rather than for GAs, a lot of operators have also been established for GAs which are quite similar to modern implementations of ES that make use of recombination [Bey01]. Real-valued encoding for GAs distinguishes itself from typical discrete representations for combinatorial optimization problems in that the evolvement of longer and longer building block sequences in terms of adjacent alleles is of minor or no importance. Nevertheless, GA-based techniques like offspring selection have proven to be a very powerful optimization technique also for this kind of problem representation, especially in the case of highly multimodal fitness landscapes [AW05].
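The discrete and continuous (intermediate) recombination variants mentioned above, together with an ES-style Gaussian mutation of all positions, can be sketched as follows (an illustrative sketch; names are ours):

```python
import random

def discrete_recombination(p1, p2, rng):
    """Discrete variant: copy each allele value exactly from one of
    the two parents."""
    return [rng.choice(pair) for pair in zip(p1, p2)]

def intermediate_recombination(p1, p2, alpha=0.5):
    """Continuous variant: average the parents position-wise
    (alpha = 0.5 gives the arithmetic mean)."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]

def gaussian_mutation(x, sigma, rng):
    """Slightly modify all positions by adding Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in x]

rng = random.Random(3)
p1, p2 = [1.0, 2.0, 3.0], [3.0, 0.0, 1.0]
child = intermediate_recombination(p1, p2)
```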
1.6 GA Theory: Schemata and Building Blocks
Researchers working in the field of GAs have put a lot of effort into the analysis of the genetic operators (crossover, mutation, selection). In order to achieve better analysis and understanding, Holland introduced a construct called schema [Hol75]: Under the assumption of a canonical GA with binary string representation of individuals, the symbol alphabet {0, 1, #} is considered, where # (“don’t care”) is a special wild card symbol that matches both 0 and 1. A schema is a string with fixed and variable symbols. For example, the schema [0#11#01] is a template that matches the following four strings: [0011001], [0011101], [0111001], and [0111101]. The symbol # is never actually manipulated by the genetic algorithm; it is just a notational device that makes it easier to talk about families of strings.
Essentially, Holland’s idea was that every evaluated string actually gives partial information about the fitness of the set of possible schemata of which
the string is a member. Holland analyzed the influence of selection, crossover, and mutation on the expected number of schemata when going from one generation to the next. A detailed discussion of the related analysis can be found in [Gol89]; in the context of the present work we only outline the main results and their significance.
Assuming fitness-proportional replication, the number m of individuals of the population belonging to a particular schema H at time t + 1 is related to the same number at time t as

m(H, t + 1) = m(H, t) · fH(t) / f̄(t)    (1.1)

where fH(t) is the average fitness value of the strings matching schema H, while f̄(t) is the average fitness value over all strings within the population. Assuming that a particular schema remains above the average by a fixed amount c · f̄(t) for a number t of generations, the solution of the equation given above can be formulated as the following exponential growth equation:

m(H, t) = m(H, 0) · (1 + c)^t    (1.2)
where m(H, 0) stands for the number of individuals matching schema H in the population at time 0, c denotes a positive constant, and t ≥ 0. The importance of this result is the exponentially increasing number of trials given to above-average schemata.
The effect of crossover operators, which break strings apart (at least in the case of canonical genetic algorithms), is that they reduce this exponential increase by a quantity that is proportional to the crossover rate pc and depends on the defining length δ of a schema on a string of length l:

pc · δ(H) / (l − 1)    (1.3)

The defining length δ of a schema is the distance between the first and the last fixed string position; for example, for the schema [###0#0101], δ = 9 − 4 = 5. Obviously, schemata with short defining lengths are less likely to be disrupted by a single point crossover operator. The main result is that above-average schemata with short defining lengths will still be sampled at an exponentially increasing rate. These schemata with above-average fitness and short defining length are the so-called building blocks and play an important role in the theory of genetic algorithms.
The effects of mutation are described in a rather straightforward way: If the bit mutation probability is pm, then the probability of survival of a single bit is 1 − pm; since single bit mutations are independent, the total survival probability is therefore (1 − pm)^l, with l denoting the string length. In the context of schemata, however, only the fixed, i.e., non-wildcard, positions matter. Their number is called the order o(H) of schema H and equals l minus the number of “don’t care” symbols. The probability of surviving a mutation for a
certain schema H is (1 − pm)^o(H), which can be approximated by 1 − o(H) · pm for pm ≪ 1.
Summarizing the described effects of mutation, crossover, and reproduction, we end up with Holland’s well-known schema theorem [Hol75]:

m(H, t + 1) ≥ m(H, t) · (fH(t) / f̄(t)) · [1 − pc · δ(H)/(l − 1) − o(H) · pm]    (1.4)
The result essentially says that the number of schemata with short defining length, low order, and above-average quality grows exponentially in subsequent generations of a genetic algorithm.
Still, even if the schema theorem is a very important result in GA theory, it is obtained under idealized conditions that do not hold for most practical GA applications. Both the individual representation and the genetic operators are often different from those used by Holland. The building block hypothesis has been found reliable in many cases, but it also depends on the representation and on the genetic operators; it is therefore easy to find or to construct problems for which it is not verified. These so-called deceptive problems are studied in order to find out the inherent limitations of GAs, and which representations and operators can make them more tractable. A more detailed description of the underlying theory can for instance be found in [Raw91] or [Whi93].
The major drawback of the building block theory is given by the fact that the underlying GA (binary encoding, proportional selection, single-point crossover, strong mutation) is applicable only to very few problems, as tackling challenging real-world problems requires more sophisticated problem representations and corresponding operators. Therefore, a more general theory has been an intense topic in GA research since its beginning. Some theoretically interesting approaches, like the forma theory of Radcliffe and Surry [RS94], who consider a so-called forma as a more general schema for arbitrary representations, state requirements for the operators which cannot be fulfilled for practical problems with their respective constraints. By the end of the last millennium, Stephens and Waelbroeck ([SW97], [SW99]) developed an exact GA schema theory.
The main idea is to describe the total transmission probability α of a schema H so that α(H, t) is the probability that at generation t the individuals of the GA’s population will match H (for a GA working on fixed-length bit strings). Assuming a crossover probability pxo, α(H, t) is calculated as¹:

α(H, t) = (1 − pxo) · p(H, t) + (pxo / (N − 1)) · Σ_{i=1}^{N−1} p(L(H, i), t) · p(R(H, i), t)    (1.5)

with L(H, i) and R(H, i) being the left and right parts of schema H, respectively, and p(H, t) the probability of selecting an individual matching H to

¹ We here give the slightly modified version as stated in [LP02]; it is equivalent to the results in [SW97] and [SW99] assuming pm = 0.
become a parent. The “left” part of a schema H is thereby produced by replacing all elements of H at the positions from the given index i to N with “don’t care” symbols (with N being the length of the bit strings); the “right” part of a schema H is produced by replacing all elements of H from position 1 to i with “don’t care.” The summation is over all positions from 1 to N − 1, i.e., over all possible crossover points. Stephens later generalized this GA schema theory to variable-length GAs; see for example [SPWR02].
Keeping in mind that the ultimate goal of any heuristic optimization technique is to approximately and efficiently solve highly complex real-world problems rather than to state a mathematically provable theory that holds only under very restricted conditions, our intention with an extended building block theory is a less strict formulation that, in return, can be interpreted for arbitrary GA applications. At the same time, the enhanced variants of genetic algorithms and genetic programming proposed in this book aim to support the algorithms in their intention to operate in the sense of the extended building block interpretation discussed in the following chapters.
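The schema quantities used throughout this section are straightforward to make concrete; a minimal Python sketch (all function names are ours, not from the literature):

```python
def matches(schema, string):
    """True if the bit string is an instance of the schema; the wild
    card '#' matches both 0 and 1."""
    return len(schema) == len(string) and all(
        s == "#" or s == c for s, c in zip(schema, string))

def order(schema):
    """o(H): the number of fixed (non-wild-card) positions."""
    return sum(1 for s in schema if s != "#")

def defining_length(schema):
    """delta(H): distance between the first and last fixed position."""
    fixed = [i for i, s in enumerate(schema) if s != "#"]
    return fixed[-1] - fixed[0]

def crossover_disruption(schema, pc):
    """Disruption bound for single-point crossover: pc*delta(H)/(l-1)."""
    return pc * defining_length(schema) / (len(schema) - 1)

def mutation_survival(schema, pm):
    """Probability that no fixed position is mutated: (1 - pm)**o(H)."""
    return (1 - pm) ** order(schema)
```

For the schema [###0#0101] from the text this yields δ(H) = 5 and o(H) = 5, reproducing the numbers used in the derivation above.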
1.7 Parallel Genetic Algorithms
The basic idea behind many parallel and distributed programs is to divide a task into partitions and solve them simultaneously using multiple processors. This divide-and-conquer approach can be used in different ways, and leads to different methods of parallelizing GAs, some of which change the behavior of the GA whereas others do not. Some methods (as for instance fine-grained parallel GAs) can exploit massively parallel computer architectures, while others (coarse-grained parallel GAs, e.g.) are better qualified for multicomputers with fewer and more powerful processing elements. Detailed descriptions and classifications of distributed GAs are given in [CP01], [CP97], [AT99], and [Alb05]; the scalability of parallel GAs is discussed in [CPG99]. A further and newer variant of parallel GAs, which is based on offspring selection (see Chapter 4), is the so-called SASEGASA algorithm, which is discussed in Chapter 5.
In a rough classification, the parallel GA concepts established in GA textbooks (as for example [DLJD00]) can be divided into global parallelization, coarse-grained parallel GAs, and fine-grained parallel GAs, where the most popular model for practical applications is the coarse-grained model, also very well known as the island model.
FIGURE 1.3: Global parallelization concepts: A panmictic population structure (shown in left picture) and the corresponding master–slave model (right picture).
1.7.1 Global Parallelization
Similar to the sequential GA, in the context of global parallelization there is only one single panmictic² population and selection considers all individuals, i.e., every individual has a chance to mate with any other. The behavior of the algorithm remains unchanged, and the global GA has exactly the same qualitative properties as a sequential GA. The most commonly parallelized operation is the evaluation of the individuals, as the calculation of the fitness of an individual is independent of the rest of the population. Because of this, the only necessary communication during this phase is the distribution and collection of the workload. One master node executes the GA (selection, crossover, and mutation), and the evaluation of fitness is divided among several slave processors: Parts of the population are assigned to each of the available processors, which return the fitness values for the subsets of individuals they have received. Due to their centralized and hierarchical communication structure, global parallel GAs are also known as single-population master–slave GAs. Figure 1.3 shows the population structure of a master–slave parallel GA: This panmictic GA has all its individuals (indicated by the black spots) in the same population. The master stores the population, executes the GA operations, and distributes individuals to the slaves; the slaves compute the fitness of the individuals. As a consequence, global parallelization can be efficient only if the bottleneck in terms of runtime consumption is the evaluation of the fitness function. Globally parallel GAs are quite easy to implement, and they can be a quite efficient method of parallelization if the evaluation requires considerable computational effort compared to the effort required for the operations carried out by the master node. However, they do not influence the qualitative properties of the corresponding sequential GA.
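The master–slave evaluation scheme can be sketched as follows (an illustrative sketch using worker threads as stand-ins for the slave processors; a production system would distribute the work across processes or cluster nodes, and the toy fitness function is our assumption):

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(individual):
    """Stand-in for an expensive evaluation (e.g., a simulation run)."""
    return sum(g * g for g in individual)

def evaluate_population(population, workers=4):
    """Master-slave evaluation: the master distributes individuals to
    the workers and collects the fitness values in population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

pop = [[1, 2], [3, 4], [0, 0]]
scores = evaluate_population(pop)
```

Since each evaluation is independent, the result is identical to a sequential loop over the population, which is exactly the statement that global parallelization does not change the qualitative behavior of the GA.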
² In general, a population is called panmictic when all individuals are possible mating partners.
1.7.2 Coarse-Grained Parallel GAs
FIGURE 1.4: Population structure of a coarse-grained parallel GA.
In the case of a coarse-grained parallel GA, the population is divided into multiple subpopulations (also called islands or demes) that evolve mostly isolated from each other and only occasionally exchange individuals during phases called migration. This process is controlled by several parameters which will be explained later in Section 1.7.4. In contrast to the global parallelization model, coarse-grained parallel GAs introduce fundamental changes in the structure of the GA and show a different behavior than a sequential GA. Coarse-grained parallel GAs are also known as distributed GAs because they are usually implemented on computers with distributed memories. The literature also frequently uses the term "island parallel GAs" because there is a model in population genetics called the island model that considers relatively isolated demes. Figure 1.4 schematically shows the design of a coarse-grained parallel GA: Each circle represents a simple GA, and there is (infrequent) communication between the populations. The qualitative performance of a coarse-grained parallel GA is influenced by the number and size of its demes and also by the information exchange between them (migration). The main idea of this type of parallel GA is that the relatively isolated demes will converge to different regions of the solution space, and that migration and recombination will combine the relevant solution parts [SWM91]. However, at present there is only one model in the theory of coarse-grained parallel GAs that considers the concept of selection pressure for recombining the favorable attributes of solutions evolved in the different demes, namely the SASEGASA algorithm (which will be described later in Chapter 5). Coarse-grained parallel GAs are the most frequently used parallel GA concept, as they are quite easy to implement and are a natural extension of the general concept of sequential GAs that makes use of commonly available cluster computing facilities.
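A coarse-grained (island model) run can be sketched as follows; the individual representation, the `evolve` stand-in, and the ring migration are simplifying assumptions of ours:

```python
import random

def evolve(deme):
    # Stand-in for one generation of selection, crossover, and mutation.
    return [g + random.random() for g in deme]

def island_ga(n_demes=4, deme_size=10, generations=20, migration_interval=5):
    """Each deme evolves mostly in isolation; every migration_interval
    generations the best individual of each deme replaces the worst
    individual of its ring neighbor."""
    demes = [[random.random() for _ in range(deme_size)]
             for _ in range(n_demes)]
    for gen in range(1, generations + 1):
        demes = [evolve(d) for d in demes]
        if gen % migration_interval == 0:
            for i in range(n_demes):
                neighbor = demes[(i + 1) % n_demes]
                neighbor[neighbor.index(min(neighbor))] = max(demes[i])
    return demes
```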
Genetic Algorithms and Genetic Programming

1.7.3 Fine-Grained Parallel GAs
FIGURE 1.5: Population structure of a fine-grained parallel GA; the special case of a cellular model is shown here.
Fine-grained models consider a large number of very small demes; Figure 1.5 sketches a fine-grained parallel GA. This class of parallel GAs has one spatially distributed population; it is suited for massively parallel computers, but it can also be implemented on other supercomputing architectures. A typical example is the diffusion model [Müh89], which represents an intrinsically parallel GA model.
The basic idea behind this model is that the individuals are spread throughout the global population like molecules in a diﬀusion process. Diﬀusion models are also called cellular models. In the diﬀusion model a processor is assigned to each individual and recombination is restricted to the local neighborhood of each individual.
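The neighborhood restriction of the diffusion model can be sketched on a 2-D toroidal grid, one individual per cell (the von Neumann neighborhood and the averaging "crossover" in the test usage are illustrative assumptions of ours):

```python
import random

def cellular_step(grid, crossover):
    """One generation of a cellular GA: every cell recombines only with
    a randomly chosen individual from its local (von Neumann) neighborhood."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [grid[(r - 1) % rows][c], grid[(r + 1) % rows][c],
                         grid[r][(c - 1) % cols], grid[r][(c + 1) % cols]]
            new_grid[r][c] = crossover(grid[r][c], random.choice(neighbors))
    return new_grid
```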
A recent research topic in the area of parallel evolutionary computation is the combination of certain aspects of the different population models, resulting in so-called hybrid parallel GAs. Most hybrid parallel GAs are coarse-grained at the upper level and fine-grained at the lower levels. Another way to hybridize parallel GAs is to use coarse-grained GAs at the high as well as at the low levels in order to force stronger mixing at the low levels, using high migration rates there and a low migration rate at the high level [CP01]. Using this strategy, computer cluster environments at different locations can collectively work on a common problem with only little communication overhead (due to the low migration rates at the high level).
1.7.4 Migration
Especially for coarse-grained parallel GAs, the concept of migration is considered to be the main success criterion in terms of achievable solution quality. The most important parameters for migration are:
• the communication topology, which defines the interconnections between the subpopulations (demes),
• the migration scheme, which controls which individuals (best, random) migrate from one deme to another and which individuals should be replaced (worst, random, doubles),
• the migration rate, which determines how many individuals migrate, and
• the migration interval or migration gap, which determines the frequency of migrations.
The most essential question concerning migration is when and to which extent migration should take place. Much theoretical work considering this has already been done; for a survey of these efforts see [CP97] or [Alb05]. Migration usually occurs synchronously in parallel GAs, meaning that it occurs at predetermined constant intervals. However, synchronous migration is known to be slow and inefficient in some cases [AT99]. Asynchronous migration schemes perform communication between demes only after specific events. The migration rate, which determines how many individuals undergo migration at every exchange, can be expressed as a percentage of the population size or as an absolute value. The majority of articles in this field suggest migration rates between 5% and 20% of the population size. However, the choice of this parameter is considered to be very problem dependent [AT99]. A recent overview of various migration techniques is given in [CP01]. Recent theory of self-adaptive selection pressure steering (see Chapters 4 and 5) plays a major role in defying the conventions of established parallel GA theory. Within these models it becomes possible to detect local premature convergence, i.e., premature convergence in a certain deme.
Thus, local premature convergence can be detected independently in all demes, which promises a high potential in terms of efficiency, especially for parallel implementations. Furthermore, the fact that selection pressure is adjusted self-adaptively with respect to the potential of the genetic information stored in the individual demes makes the concept of a parallel GA much more independent of the migration parameters (see [Aff05] and Chapter 5).
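The migration parameters listed above can be made concrete in a small sketch (ring topology, best-migrate/replace-worst scheme; the function name and the fitness ordering are our assumptions, and the migration interval is left to the caller):

```python
def migrate(demes, rate=0.1, key=lambda ind: ind):
    """One migration event: from every deme, the best rate*deme_size
    individuals migrate to the next deme in the ring, where they replace
    that deme's worst individuals. A caller implements the migration
    interval by invoking this only every few generations."""
    n = max(1, int(rate * len(demes[0])))
    migrants = [sorted(d, key=key, reverse=True)[:n] for d in demes]
    for i, deme in enumerate(demes):
        incoming = migrants[(i - 1) % len(demes)]   # from the ring predecessor
        deme.sort(key=key)                          # worst individuals first
        deme[:n] = incoming                         # replace worst by immigrants
    return demes
```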
1.8 The Interplay of Genetic Operators
In order to allow an efficient performance of a genetic algorithm, a beneficial interplay of exploration and exploitation should be possible. Critical factors for this interplay are the genetic operators selection, crossover, and mutation. The job of crossover is to advantageously combine alleles of selected (above average) chromosomes which may stem from different regions of the search space. Therefore, crossover is considered to mainly support the aspect of breadth search. Mutation slightly modifies certain chromosomes at times and thus brings new alleles into the gene pool of a population in order to avoid stagnation. As mutation modifies the genetic makeup of certain chromosomes only slightly, it is primarily considered a depth search operator. However, genetic information newly introduced via mutation also heavily supports the aspect of breadth search if crossover is able to "transport" this new genetic information to chromosomes in other search space regions. As we will show later in this book, this aspect of mutation is of prime importance for the efficient functioning of a GA. The aspect of migration in coarse-grained parallel GAs should also be mentioned in our considerations about the interplay of operators. In this kind of parallel GA, migration functions somewhat like a meta-model of mutation: it introduces new genetic information into the demes at the chromosome level, whereas mutation introduces new genetic information at the allele level. Concerning migration, a well-adjusted interplay between breadth and depth search is intended to function in the way that breadth search is supported in the intra-migration phases by allowing the individual demes to drift to different regions of the search space until a certain stage of stagnation is reached; the demes have then expanded over the search space.
Then migration comes into play by introducing new chromosomes stemming from other search space regions in order to avoid stagnation in the individual demes; this then causes the demes to contract again slightly, which from a global point of view tends to support the aspect of depth search in the migration phases. The reason for this is that migration increases genetic diversity in the specific demes on the one hand, but on the other hand decreases the diversity over all islands. This global loss of genetic diversity can be interpreted as an exploitation of the search space. This overall strategy is especially beneficial in the case of highly multimodal search spaces, as is the case for complex combinatorial optimization problems.
1.9 Bibliographic Remarks
There are numerous books, journals, and articles available that survey the field of genetic algorithms. In this section we summarize some of the most important ones. Representatively, the following books are widely considered very important sources of information about GAs (in chronological order):
• J. H. Holland: Adaptation in Natural and Artificial Systems [Hol75]
• D. E. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning [Gol89]
• Z. Michalewicz: Genetic Algorithms + Data Structures = Evolution Programs [Mic92]
• D. Dumitrescu et al.: Evolutionary Computation [DLJD00]
The following journals are dedicated to either theory and applications of genetic algorithms or evolutionary computation in general:
• IEEE Transactions on Evolutionary Computation (IEEE)
• Evolutionary Computation (MIT Press)
• Journal of Heuristics (Springer)
Moreover, several conference and workshop proceedings include papers related to genetic and evolutionary algorithms and heuristic optimization. Some examples are the following ones:
• Genetic and Evolutionary Computation Conference (GECCO), a recombination of the International Conference on Genetic Algorithms and the Genetic Programming Conference
• Congress on Evolutionary Computation (CEC)
• Parallel Problem Solving from Nature (PPSN)
Of course there is a lot of GA-related information available on the Internet, including theoretical background and practical applications, course slides, and source code. Publications of the Heuristic and Evolutionary Algorithms Laboratory (HEAL) (including several articles on GAs and GP) are available at http://www.heuristiclab.com/publications/.
Chapter 2 Evolving Programs: Genetic Programming
In the previous chapter we summarized and discussed genetic algorithms; it has been illustrated how this kind of algorithm is able to produce high quality results for a variety of problem classes. Still, a GA is by itself not able to handle one of the most challenging tasks in computer science, namely getting a computer to solve problems without programming it explicitly. As Arthur Samuel stated in 1959 [Sam59], this central task can be formulated in the following way: How can computers be made to do what needs to be done, without being told exactly how to do it? In this chapter we give a compact description and discussion of an extension of the genetic algorithm called genetic programming (GP). Similar to GAs, genetic programming works on populations of solution candidates for a given problem and is based on the Darwinian principles of survival of the fittest (selection), recombination (crossover), and mutation; it is a domain-independent, biologically inspired method that is able to create computer programs from a high-level problem statement.¹ Research activities in the field of genetic programming started in the 1980s; still, it took some time until GP was widely received by the computer science community. Since the beginning of the 1990s GP has been established as a human-competitive problem solving method. The main factors for its widely accepted success in the academic world as well as in industry can be summarized in the following way [Koz92b]:
• Virtually all problems in artificial intelligence, machine learning, adaptive systems, and automated learning can be recast as a search for computer programs, and
• genetic programming provides a way to successfully conduct the search in the space of computer programs.
¹ Please note that we here in general see computer programs as entities that receive inputs, perform computations, and produce output.
In the following we
• give an overview of the main ideas and foundations of genetic programming in Sections 2.1 and 2.2,
• summarize basic steps of the GP-based problem solving process (Section 2.3),
• report on typical application scenarios (Section 2.4),
• explain theoretical foundations (GP schema theories, Section 2.5),
• discuss current GP challenges and research areas in Section 2.6,
• summarize this chapter on GP in Section 2.7, and finally
• refer to a range of outstanding literature in the field of theory and praxis of GP in Section 2.8.
2.1 Introduction: Main Ideas and Historical Background
As has already been mentioned, one of the central tasks in artificial intelligence is to make computers do what needs to be done without telling them exactly how to do it. This does not seem unnatural, since it demands of computers that they mimic the human reasoning process: humans are able to learn what needs to be done, and how to do it. In short, interactions of networks of neurons are nowadays believed to be the basis of human brain information processing; several of the earliest approaches in artificial intelligence aimed at imitating this structure using connectionist models and artificial neural networks (ANNs, [MP43]). Suitable network training algorithms enable ANNs to learn and generalize from given training examples; ANNs are in fact a very successful distributed computation paradigm and are frequently used in real-world applications where exact algorithmic approaches are too difficult to implement or not even known at all. Pattern recognition, classification, and data-based modeling (regression) are some examples of AI areas in which ANNs have been applied in numerous ways. Unlike this network-based approach, genetic algorithms were developed using main principles of natural evolution. As has been explained in Chapter 1, GAs are population-based optimization algorithms that imitate natural evolution: Starting with a primordial ooze of thousands of randomly created solution candidates appropriate to the respective problem, populations of solutions are progressively evolved over many generations using the Darwinian principles.
Similar to the GA, GP is an evolutionary algorithm inspired by biological evolution that finds computer programs performing a user-defined computational task. It is therefore a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform the given task; it is a domain-independent, biologically inspired method that is able to create computer programs from a high-level problem statement (with computer programs being defined here as entities that receive inputs, perform computations, and produce output). The first research activities in the context of GP were reported in the early 1980s. For example, Smith reported on a learning system based on GAs [Smi80], and in [For81] Forsyth presented a computer package producing decision rules (i.e., small computer programs) in forensic science for the UK police by induction from a database (where these rules are Boolean expressions represented by tree structures). In 1985, Cramer presented a representation for the adaptive generation of simple sequential programs [Cra85]; it is widely accepted that this article on genetic programming is the first paper to describe the tree-like representation and operators for manipulating programs by genetic algorithms. Even though there was noticeable research activity in the field of GP going on by the middle of the 1980s, it still took some time until GP was widely received by the computer science community. GP is computationally very intensive, and so it was mainly used to solve relatively simple problems until the 1990s.
But thanks to the enormous growth in CPU power that has been going on since the 1980s, the field of applications for GP has been extended immensely, yielding human-competitive results in areas such as data-based modeling, electronic design, game playing, sorting, searching, and many more; examples (and respective references) are given in the following sections. One of the most important GP publications was "Genetic Programming: On the Programming of Computers by Means of Natural Selection" [Koz92b] by John R. Koza, professor of computer science and medical informatics at Stanford University, who has since been one of the main proponents of the GP idea. Based on extensive theoretical background as well as test results in many different problem domains he demonstrated GP's ability to serve as an automated invention machine producing novel and outstanding results for various kinds of problems. By now there have been three more books on GP by Koza (and his team), as well as several other very important publications (for example by Banzhaf, Langdon, Poli, and many others); a short summary is given in Section 2.8. Along with these ad hoc engineering approaches there was an increasing interest in how and why GP works. Even though GP was applied successfully for solving problems in various areas, the development of a GP theory was considered rather difficult even through the 1990s. Since the early 2000s it has finally been possible to establish a theory of GP, which has shown rapid development since then. A book that has to be mentioned in this context is clearly
"Foundations of Genetic Programming" [LP02] by Langdon and Poli, since it presents exact GP schema analysis. As we have now summarized the historical background of GP, it is high time to describe how it really works and how typical applications are designed. This is exactly what the reader will find in the following sections.
2.2 Chromosome Representation
As in the context of any GA-based problem solving process, the representation of problem instances and solution candidates is a key issue in genetic programming, too. On the one hand, the representation scheme should enable the algorithm to find suitable solutions for the given problem class; on the other hand, the algorithm should be able to directly manipulate the coded solution representation. The use of fixed-length strings (of bits, characters, or integers, e.g.) enables the conventional GA to solve a wide range of problems and also allows the construction of a solid theoretical foundation, namely the schema theorem. Still, in the context of GP the most natural representation for a solution is a hierarchical computer program of variable size [Koz92b].
2.2.1 Hierarchical Labeled Structure Trees

2.2.1.1 Basics
So, how can hierarchical computer programs be represented? The representation that is most common in the literature and is used by Koza ([Koz92b], [Koz94], [KIAK99], [KKS+03b]), Langdon and Poli ([LP02]), and many other authors is the point-labeled structure tree. Originally, these structure trees were for example seen as graphical representations of so-called S-expressions of the programming language LISP ([McC60], [Que03], [WH87]), which have for example been used by Koza in [Koz92b] and [Koz94].² Here we do not strictly stick to LISP syntax for the examples given, but the main paradigms of S-expressions are used. The following key facts are relevant in the context of structure tree based genetic programming:
• All tree nodes are either functions or terminals.
² In fact, of course, any higher programming language is suitable for implementing a GP framework and for representing hierarchical computer programs. Koza, for example, switched to the C programming language as described in [KIAK99]; the HeuristicLab framework and the GP implementation, which is realized as plugins for it, are programmed in C# using the .NET framework – this is to be explained in further detail later.
© 2009 by Taylor & Francis Group, LLC
• Terminals are evaluated directly, i.e., their return values can be calculated and returned immediately.
• All functions have child nodes, which are evaluated before the children's calculated return values are used as inputs for the parents' evaluation.
• The probably most convenient string representation is the prefix notation, also called Polish or Łukasiewicz notation: Function nodes are given before the child nodes' representations (optionally using parentheses). Evaluation is executed recursively, in a depth-first way, starting from the left; operators are thus placed to the left of their operands. In case of fixed arities of the functions (i.e., if the number of a function's inputs is fixed and known), no parentheses or brackets are needed.
In a more formal way this program representation schema can be summarized as follows [ES03]:
• Symbolic expressions can be defined using
– a terminal set T, and
– a function set F.
• The following general recursive definition is applied:
– every t ∈ T is a correct expression,
– f(e_1, ..., e_n) is a correct expression if f ∈ F, arity(f) = n, and e_1, ..., e_n are correct expressions, and
– there are no other forms of correct expressions.
• In general, expressions in GP are not typed (closure property: any f ∈ F can take any g ∈ F as argument). Still, as we see in the discussion of genetic operators in Section 2.2.1.3, this might not be true in certain cases depending on the function and terminal sets chosen.
In the following we give exemplary simple programs, in conventional as well as prefix (not exactly following LISP notation) textual notations:
• (a) IF (Y>X OR Y(Y,X),(Y,X),(3,7),

... t_{i,k} and as class i if e_j < t_{i,k}; freq_a is the frequency of class a, i.e., the number of samples that are originally classified as belonging to class a.
The thresholds' contribution to the classifier's fitness, thresh, can now be calculated in two different ways: We can consider the minimum sum of punishments for each pair of contiguous classes as
thresh = \sum_{i=1}^{c-1} \min_{k \in [1;m]} p(i,k) \qquad (11.35)
or consider all thresholds, which are weighted using threshold weights tw_{1...m}, as

thresh = \sum_{i=1}^{c-1} \frac{1}{\sum_{k=1}^{m} tw_k} \sum_{k=1}^{m} p(i,k) \cdot tw_k \qquad (11.36)
Normally, we define the threshold weights tw using minimum and maximum weights, weighting the thresholds near the original class values minimally and those in the "middle" maximally:

tw_1 = tw_{min}, \quad tw_m = tw_{min}; \quad tw_{range} = tw_{max} - tw_{min} \qquad (11.37)

m \bmod 2 = 0 \Rightarrow \quad l = m/2, \quad tw_l = tw_{l+1} = tw_{max},
\quad \forall (i \in [2; l-1]): tw_i = tw_{min} + \frac{tw_{range} \cdot (i-1)}{l-1},
\quad \forall (i \in [l+1; m-1]): tw_i = tw_{m-i+1} \qquad (11.38)

m \bmod 2 = 1 \Rightarrow \quad l = (m+1)/2, \quad tw_l = tw_{max},
\quad \forall (i \in [2; l-1]): tw_i = tw_{min} + \frac{tw_{range} \cdot (i-1)}{(m-1)/2},
\quad \forall (i \in [l+1; m-1]): tw_i = tw_{m-i+1} \qquad (11.39)
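A direct transcription of Eqs. (11.37)-(11.39) into code (the function name is ours, and the guard for m = 1, which the equations do not cover, is our assumption):

```python
def threshold_weights(m, tw_min, tw_max):
    """Symmetric threshold weights: minimal near the class values,
    maximal in the middle, linearly interpolated in between."""
    if m == 1:
        return [tw_max]                      # single threshold: weight maximally
    tw = [tw_min] * m                        # tw_1 = tw_m = tw_min
    tw_range = tw_max - tw_min
    if m % 2 == 0:
        l = m // 2
        tw[l - 1] = tw[l] = tw_max           # tw_l = tw_{l+1} = tw_max
        for i in range(2, l):                # i in [2; l-1]
            tw[i - 1] = tw_min + tw_range * (i - 1) / (l - 1)
    else:
        l = (m + 1) // 2
        tw[l - 1] = tw_max                   # tw_l = tw_max
        for i in range(2, l):
            tw[i - 1] = tw_min + tw_range * (i - 1) / ((m - 1) / 2)
    for i in range(l + 1, m):                # mirror: tw_i = tw_{m-i+1}
        tw[i - 1] = tw[m - i]
    return tw
```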
(M)ROC Analysis
Finally, we also consider the area under the (M)ROC curves as described in Section 11.2.3.3: For each class we calculate the AUC values for ROC curves and sets of MROC curves (with a given number of thresholds checked for each class), and then we can either use the average AUC or the maximum AUC for each class, weighted with the weighting factors already mentioned before:

auc = \begin{cases} \sum_{i=1}^{c} AvgAUC(C_i) \cdot w_i & \text{consider average AUCs} \\ \sum_{i=1}^{c} MaxAUC(C_i) \cdot w_i & \text{consider maximum AUCs} \end{cases} \qquad (11.40)

11.2.4.4 Combined Classifier Evaluation
As we have now compiled all information needed for estimating the quality of a classifier model in GP, we calculate the final overall quality CLASS using the respective weighting factors:

a_1 = mee \cdot c_1 \quad (c_1 = w_{mee})
a_2 = vaf \cdot c_2 \quad (c_2 = w_{vaf})
a_3 = r^2 \cdot c_3 \quad (c_3 = w_{r^2})
a_4 = error_{min} \cdot c_4 \quad (c_4 = w_{errmin})
a_5 = error_{max} \cdot c_5 \quad (c_5 = w_{errmax})
a_6 = cr \cdot c_6 \quad (c_6 = w_{cr})
a_7 = thresh \cdot c_7 \quad (c_7 = w_{thresh})
a_8 = auc \cdot c_8 \quad (c_8 = w_{auc})

CLASS(o, e) = \frac{\sum_{i=1}^{8} a_i}{\sum_{i=1}^{8} c_i} \qquad (11.41)
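In code, the combined evaluation might look as follows (a sketch under the assumption that CLASS is the weighted average Σa_i / Σc_i of the partial measures; the metric names and example values below are ours):

```python
def class_quality(metrics, weights):
    """Combine partial quality measures: a_i = metric_i * c_i,
    CLASS = sum(a_i) / sum(c_i)."""
    a = {name: metrics[name] * weights[name] for name in metrics}
    return sum(a.values()) / sum(weights.values())

# Hypothetical metric values and weighting factors:
metrics = {"mee": 0.10, "vaf": 0.90, "r2": 0.85, "err_min": 0.05,
           "err_max": 0.20, "cr": 0.80, "thresh": 0.30, "auc": 0.95}
weights = {"mee": 1.0, "vaf": 1.0, "r2": 1.0, "err_min": 0.5,
           "err_max": 0.5, "cr": 2.0, "thresh": 1.0, "auc": 2.0}
quality = class_quality(metrics, weights)
```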
Data-Based Modeling with Genetic Programming

11.2.5 Application Example: Medical Data Analysis

11.2.5.1 Benchmark Data Sets
For testing GP-based training of classifiers we have picked the following data sets: the Wisconsin Breast Cancer, the Melanoma, and the Thyroid data sets.
• The Wisconsin data set is a part of the UCI machine learning repository⁵. In short, it represents medical measurements which were recorded while investigating patients potentially suffering from breast cancer. The number of features recorded is 9 (all being continuous numeric ones); the file version we have used contains 683 recorded examples (by now, 699 examples are already available since the data base is updated regularly).
• The Thyroid data set represents medical measurements which were recorded while investigating patients potentially suffering from hypo- or hyperthyroidism; this data set has also been taken from the UCI repository. In short, the task is to determine whether a patient is hypothyroid or not. Three classes are formed: euthyroid (the state of having normal thyroid gland function), hyperthyroid (overactive thyroid), and hypothyroid (underactive thyroid). In total, the data set contains 7200 samples. The samples of the Thyroid data set are not equally distributed over the three given classes; in fact, 166 samples belong to class "1" ("subnormal functioning"), 368 samples are classified as "2" ("hyperfunction"), and the remaining 6666 samples belong to class "3" ("euthyroid"); a good classifier therefore has to be able to correctly classify significantly more than 92% of the samples, simply because 92 percent of the patients are not hypo- or hyperthyroid. 21 attributes (15 binary and 6 continuous ones) are stored in this data set.
• The Melanoma data set represents medical measurements which were recorded while investigating patients potentially suffering from skin cancer. It contains 1311 examples for which 30 features have been recorded; each of the 1311 samples represents a pigmented skin lesion which has to be classified as a melanoma or a non-hazardous nevus.
This data set has been provided to us by Prof. Dr. Michael Binder from the Department of Dermatology at the Medical University Vienna, Austria. A comparison of machine learning methods for the diagnosis of pigmented skin lesions (i.e., detecting skin cancer based on the analysis of visual data) can be found in [DOMK+01]; in this paper the authors describe the quality of classifiers produced for a comparable data
⁵ http://www.ics.uci.edu/~mlearn/.
Table 11.9: Set of function and terminal definitions for enhanced GP-based classification.

Functions:
Name     Arity  Description
+        2      Addition
-        2      Subtraction
*        2      Multiplication
/        2      Division
e^x      1      Exponential function
IF       3      If [Arg0] then return [Then] branch ([Arg1]), otherwise return [Else] branch ([Arg2])
<=, >=   2      Less or equal, greater or equal
&&, ||   2      Logical AND, logical OR

Terminals:
Name   Parameters  Description
var    x, c        Value of attribute x multiplied with coefficient c
const  d           A constant double value d
collection using kNN classification, ANNs, decision trees, and SVMs. The difference is that in the data collection used in [DOMK+01] all lesions were separated into three classes (common nevi, dysplastic nevi, or melanoma); here we use data representing lesions that have been classified as benign or malign, i.e., we are facing a binary classification problem.
All three data sets were investigated via 10-fold cross-validation. This means that each original data set was divided into 10 disjoint sets of (approximately) equal size. Thus, 10 different pairs of training (90% of the data) and test data sets (10% of the data) can be formed and used for testing the classification algorithm.
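The 10-fold cross-validation splitting described above can be sketched as follows (index-based; the function name and the fixed shuffle seed are our assumptions):

```python
import random

def cross_validation_folds(n_samples, k=10, seed=42):
    """Partition sample indices into k disjoint folds of (approximately)
    equal size; each fold serves once as the test set (10% of the data),
    the remaining folds form the training set (90%)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds if f is not folds[i] for j in f]
        yield train, folds[i]
```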
11.2.5.2 Solution Candidate Representation Using Hybrid Tree Structures
The selection of the function library is an important part of any GP modeling process because this library should be able to represent a wide range of systems; Table 11.9 gives an overview of the function set as well as the terminal nodes used for the classification experiments documented here. As we can see in Table 11.9, mathematical functions and terminal nodes are used as well as Boolean operators for building complex arithmetic expressions. Thus, the concept of decision trees is included in this approach together with the standard structure identification concept that tries to evolve nonlinear mathematical expressions. An example showing the structure tree representation of a combined formula including arithmetic as well as logical functions is displayed in Figure 11.13.
FIGURE 11.13: An exemplary hybrid structure tree of a combined formula including arithmetic as well as logical functions.
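A minimal sketch of how such a hybrid structure tree could be represented and evaluated (the tuple encoding and the dispatch below are our own illustration; the symbols follow Table 11.9):

```python
import math

def evaluate(node, sample):
    """Recursive, depth-first evaluation: terminals return their value
    directly, functions first evaluate all of their children."""
    op = node[0]
    if op == "var":                      # terminal: attribute x times coefficient c
        _, x, c = node
        return sample[x] * c
    if op == "const":                    # terminal: constant double value
        return node[1]
    args = [evaluate(child, sample) for child in node[1:]]
    if op == "+":   return args[0] + args[1]
    if op == "-":   return args[0] - args[1]
    if op == "*":   return args[0] * args[1]
    if op == "/":   return args[0] / args[1]
    if op == "exp": return math.exp(args[0])
    if op == "<=":  return args[0] <= args[1]
    if op == ">=":  return args[0] >= args[1]
    if op == "&&":  return bool(args[0]) and bool(args[1])
    if op == "||":  return bool(args[0]) or bool(args[1])
    if op == "IF":  return args[1] if args[0] else args[2]
    raise ValueError("unknown symbol: " + op)

# IF (2.0*x0 <= 3.0) THEN 1.0*x1 ELSE 5.0   (a hypothetical model)
tree = ("IF", ("<=", ("var", 0, 2.0), ("const", 3.0)),
        ("var", 1, 1.0), ("const", 5.0))
```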
11.2.5.3 Evaluation of Classification Models
There are several possible functions that can serve as fitness functions within the GP process. For example, the ratio of misclassifications (using optimal thresholds) or the area under the corresponding ROC curves ([Zwe93], [Bra97]) could be used. Another function frequently used for quantifying the quality of models is the R² function that takes into account the sum of squared errors as well as the sum of squared target values; an alternative, the so-called adjusted R² function, is also utilized in many applications. We have decided to use a variant of the squared errors function for estimating the quality of a classification model. There is one major difference between this modified mean squared errors function and the standard implementation of this function: The errors of predicted values that are lower than the lowest class value or greater than the greatest class value do not have a totally quadratic, but partially only linear contribution to the fitness value. To be a bit more precise: Given N samples with original classifications o_i divided into n classes c_1, ..., c_n (with c_1 being the lowest and c_n the greatest class value), the fitness value F of a classification model producing the estimated classification values e_i is evaluated as follows:

\forall (i \in [1, N]):
\quad (e_i < c_1) \Rightarrow f_i = (o_i - c_1)^2 + |c_1 - e_i|,
\quad (c_1 \le e_i \le c_n) \Rightarrow f_i = (e_i - o_i)^2,
\quad (e_i > c_n) \Rightarrow f_i = (o_i - c_n)^2 + |c_n - e_i| \qquad (11.42)

F = \frac{1}{N} \sum_{i=1}^{N} f_i \qquad (11.43)
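The modified mean squared error of Eqs. (11.42)-(11.43) translates directly into code (the function and argument names are ours):

```python
def classification_fitness(original, estimated, classes):
    """Squared error inside [c_1, c_n]; beyond the outermost class
    values the contribution is quadratic only up to the boundary and
    linear for the overshoot."""
    c_lo, c_hi = min(classes), max(classes)
    total = 0.0
    for o, e in zip(original, estimated):
        if e < c_lo:
            total += (o - c_lo) ** 2 + abs(c_lo - e)
        elif e > c_hi:
            total += (o - c_hi) ** 2 + abs(c_hi - e)
        else:
            total += (e - o) ** 2
    return total / len(original)
```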
The reason for this is that values that are greater than the greatest class value or below the lowest value are anyway classified as belonging to the class having the greatest or the lowest class number, respectively; using a standard implementation of the squared error function would punish a formula producing such values more than necessary.

11.2.5.4 Finding Appropriate Class Thresholds: Dynamic Range Selection
Of course, a mathematical expression alone does not yet define a classification model; thresholds are used for dividing the output into multiple ranges, each corresponding to exactly one class. In static range selection (SRS, see for example [LC05] for explanations), these regions are defined before starting the training algorithm, which brings along the difficulty of determining appropriate range boundaries a priori. In the GP-based classification framework discussed here we have therefore used dynamic range selection (DRS), which attempts to overcome this problem by evolving the range thresholds along with the classification models: Thresholds are chosen so that the sum of class-wise ratios of misclassifications over all given classes is minimized (on the training data, of course). In detail, let us consider the following: Given N (training) samples with original classifications oi divided into n classes c1, . . . , cn (with c1 being the lowest and cn the greatest class value), models produced by GP can in general be used for calculating estimated values ei for all N samples. Assuming thresholds T = t1, . . . , tn−1 (with cj < tj < cj+1 for j ∈ [1; n − 1]), each sample k is classified as eck:

ek < t1 ⇒ eck(T) = c1   (11.44)
tj < ek < tj+1 ⇒ eck(T) = cj+1   (11.45)
ek > tn−1 ⇒ eck(T) = cn   (11.46)
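The threshold-based class assignment can be sketched as follows (a hypothetical helper mirroring equations (11.44)–(11.46)):

```python
def classify(e, thresholds, classes):
    """Map a model output e to a class: below t_1 -> first class,
    between t_j and t_{j+1} -> class c_{j+1}, above t_{n-1} -> last class.
    `thresholds` must be sorted and one element shorter than `classes`."""
    for t, c in zip(thresholds, classes):
        if e < t:
            return c
    return classes[-1]
```

With classes [1, 2, 3] and thresholds [1.5, 2.5], an output of 2.0 falls between the thresholds and is assigned class 2.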
DataBased Modeling with Genetic Programming
267
Thus, assuming a set of thresholds Tm , for each class ck we get the ratio of correctly classiﬁed samples crk as totalk (Tm ) = {a : (∀(x ∈ a) : ox = eck (Tm ))} correctk (Tm ) = {b : (∀(x ∈ b) : ox = eck (Tm ) ∧ ox = ck )} correctk (Tm ) crk (Tm ) = totalk (Tm )
(11.47) (11.48) (11.49)
The sum of ratios of correctly classiﬁed samples is – dependent on the set of thresholds Tm – calculated as cr(Tm ) =
n X
cri (Tm )
(11.50)
i=1
So, ﬁnally we can deﬁne the set of thresholds applied as that set Topt so that each other set of thresholds leads to equal or lower sums of classiﬁcation accuracies6 : Td 6= Topt ⇒ cr(Td ) ≤ cr(Topt )
(11.51)
These thresholds, which are optimal for the training samples, are fixed and also applied to the test samples. Please note that this sum of class-wise classification accuracies is not equal to the total ratio of correctly classified samples used later on in Sections 11.2.5.5 and 11.2.5.8; the total classification accuracy for a set of thresholds acc(Tm) (assuming original and estimated values o and e) is defined as

z(Tm) = |{x : ox = ecx(Tm)}|   (11.52)
acc(Tm) = z(Tm) / N   (11.53)

11.2.5.5 First Results, Identification of Optimal Operators and Parameter Settings
As first reported in detail in [WAW07], during our thorough test series we have identified the following GP-relevant parameter settings as the best ones for solving classification problem instances:

• GP algorithm: Enhanced GP using strict offspring selection.
• Mutation rate: 10% – 15%.
• Population size: 500 – 2,000.

⁶ Please note here that more than one combination of thresholds can be optimal, simply because there could be more than one optimal threshold for any given pair of class values. This is why we give an inequality in (11.51).
• Selection operators: Whereas standard GA implementations use only one selection operator, the SASEGASA requires two, namely the so-called female selection operator as well as the male selection operator. Similar to our experience gained during the tests on the identification of mechatronical systems, it seems best to choose roulette-wheel selection in combination with the random selection operator. The reason for this is that apparently merging the genetic information of rather good individuals (models, formulas) with randomly chosen ones is the best strategy when using the SASEGASA for solving identification problems.

• Success ratio and selection pressure: As for instance described in [AW04b], there are some additional parameters of the SASEGASA regarding the selection of those individuals that are accepted as part of the next generation's population. These are the success ratio and the maximal selection pressure that steer the algorithm's behavior regarding offspring selection. For model structure identification tasks in general, and especially when dealing with classification problems, the following parameter settings seem to be the best ones:
– Success ratio = 1.0, and
– Maximum selection pressure = 100 – 500 (this value has to be defined before starting an identification process, depending on other settings of the genetic algorithm used and the problem instance which is to be solved).
As has already been explained in further detail in previous chapters, these settings have the effect that in each generation only offspring survive that are really better than their parent individuals (since the success ratio is set to 1.0, only better children are inserted into the next generation's population). This is why the selection pressure becomes very high as the algorithm is executed, and therefore the maximum selection pressure has to be set to a rather high value (e.g., 100 or 500) to avoid premature termination.
• Crossover operators: We have implemented and tested three different singlepoint crossover procedures for GPbased model structure identiﬁcation: One that exchanges rather big subtrees, one that is designed to exchange rather small structural parts (e.g., only one or two nodes), and one that replaces randomly chosen parts of the respective structure trees. Moreover, for each crossover operator we have also implemented an extended version that additionally randomly mutates all terminal nodes (i.e., manipulates the parameters of the represented formula). The following 6 structure identiﬁcation crossover operators are available: StandardSPHigh, StandardSPMedium, StandardSPLow, ExtendedSPHigh, ExtendedSPMedium, and ExtendedSPLow.
Since arbitrarily many crossover operators can be selected when applying the SASEGASA⁷, the task was not to find out which single operator produces the best results but rather which subset of operators is to be chosen. According to what we experienced, the following set of crossover operators should be applied: All three standard operators (StandardSPHigh, StandardSPMedium, and StandardSPLow) plus one of the extended ones, for instance ExtendedSPLow.

• Mutation operators: The basic mutation operator for GP structure identification we have implemented and tested, GAStandard, works as already described in Chapter 9: A function symbol could become another function symbol or be deleted; the value of a constant node or the index of a variable could be modified. Furthermore, we have also implemented an extended version (GAExtended) that additionally randomly mutates all terminal nodes (in analogy to the extended crossover operators). As the latest test series have shown, the choice of the crossover operators influences which mutation operator should be applied within the SASEGASA: If one of the extended crossover operators is selected, it seems to be best to choose the standard mutation operator. But if only standard crossover methods are selected, picking the extended mutation method yields the best results.

Selected experimental results of the standard GP implementation and the SASEGASA algorithm for the Thyroid data set using various parameter settings are presented in Table 11.10. For each parameter settings version, 10-fold cross-validation test runs were executed; the resulting average results are listed. In all cases, the population size was 1,000; furthermore, the following parameter settings were used:

(1) crossover: ExtendedSPMedium; mutation: GAStandard; selection: roulette.
(2) crossover: StandardSPMedium; mutation: GAExtended; selection: roulette.
(3) crossover: all 6 available operators; mutation: GAExtended; selection: random and roulette (maximum selection pressure: 500).
(4) crossover: all 6 available operators; mutation: GAStandard; selection: random and roulette (maximum selection pressure: 500).

⁷ Using more than one crossover operator within the SASEGASA does not mean using a combination of several operators for creating one new solution, but rather the following: Every time a new child is to be produced using two parent individuals, one of the given crossover operators is chosen randomly; the chance of being applied is equal for each operator.
Table 11.10: Experimental results for the Thyroid data set.

Using the standard GP implementation
Parameter settings   Correct classifications (Evaluation)   Correct classifications (Prognosis)
(1)                  92.80%                                 92.13%
(2)                  93.91%                                 93.25%

Using the SASEGASA
Parameter settings   Correct classifications (Evaluation)   Correct classifications (Prognosis)
(3)                  97.15%                                 96.34%
(4)                  98.21%                                 98.07%
(5)                  97.70%                                 97.25%
(6)                  98.93%                                 98.53%
(5) crossover: all 3 standard operators plus ExtendedSPLow; mutation: GAStandard; selection: roulette and roulette (maximum selection pressure: 500).
(6) crossover: all 3 standard operators plus ExtendedSPLow; mutation: GAStandard; selection: random and roulette (maximum selection pressure: 500).
As an example, the model produced for cross validation partition 3 using the parameter settings combination (6) is shown in Figure 11.17.
These insights have also been used in the more extensive test series documented later in this chapter.

11.2.5.6 Graphical Classifier Analysis
Graphical analysis can often help in analyzing results achieved for any kind of problem; this is of course also the case in machine learning and in data-based classification. The most common and also simplest way to illustrate classification results is to plot the target values and the estimated values into one chart; Figure 11.14 shows a graphical representation of the best result obtained for the Thyroid data set, cross-validation set 9. In Figure 11.15 we show 4 ROC chart examples that were generated for the classes '0' and '2' of the Thyroid data set, 10-fold cross-validation set number 9: (a) ROC curve for an unsuitable classifier for class '2', evaluated on training data;
Table 11.11: Summary of the best GP parameter settings for solving classification problems.

Parameter                      Optimal Value
GP algorithm                   SASEGASA (single population, i.e., GP with offspring selection)
Mutation rate                  10% – 15%
Population size                1,000
Maximum selection pressure     100 – 1,000
Parent selection operators     Random, roulette
Crossover operators            StandardSPLow, StandardSPMedium, StandardSPHigh, ExtendedSPLow
Mutation operator              GAStandard
Ratio of weighting the evaluation contributions
(SumOfSquaredErrors : separability : class ranges)   4 : 1 : 1
(b) ROC curve for the best identified classifier for class '0', evaluated on training data;
(c) ROC curve for the best identified classifier for class '0', evaluated on test data;
(d) ROC curve for the best identified classifier for class '2', evaluated on test data.

In Figure 11.16, finally, we show 4 MROC chart examples that were generated for the intermediate class '1' of the Thyroid data set, again on the basis of 10-fold CV-set number 9:
(a) MROC curve for an unsuitable classifier for class '1', evaluated on training data;
(b) MROC curve for an unsuitable classifier for class '1', evaluated on test data;
(c) MROC curve for the best identified classifier for class '1', evaluated on training data;
(d) MROC curve for the best identified classifier for class '1', evaluated on test data.

On the webpage of this book⁸, interested readers can find a collection of 10 example models (exactly one for each partition of the 10-fold cross-validation) for the Thyroid data set, produced by GP; optimal thresholds are given as well as resulting confusion matrices for each data partition.

⁸ http://gagp2009.heuristiclab.com/.

FIGURE 11.14: Graphical representation of the best result we obtained for the Thyroid data set, CV-partition 9: Comparison of original and estimated class values.

11.2.5.7 Classification Methods Applied in Detailed Test Series
For comparing GP-based classification with other machine learning methods, the following techniques for training classifiers were examined: Genetic programming (enhanced approach using extended parents and offspring selection), linear modeling, neural networks, the k-nearest-neighbor method, and support vector machines.

GP-Based Training of Classifiers

We have used the following parameter settings for our GP test series:
• Single population approach; population size: 500 – 1,000
• Mutation rate: 10%
• Maximum formula tree height: 8
• Parent selection: Gender specific (random and roulette)
FIGURE 11.15: ROC curves and their area under the curve (AUC) values for classiﬁcation models generated for Thyroid data, CVset 9.
• Offspring selection: Strict offspring selection (success ratio as well as comparison factor set to 1.0)
• 1-elitism
• Termination criteria:
– Maximum number of generations: 1,000 (not reached; all executions were terminated via the maximum selection pressure criterion)
– Maximum selection pressure: 100
• Function set: All functions as described in Table 11.9.
• Fitness functions:
FIGURE 11.16: MROC charts and their maximum and average area under the curve (AUC) values for classification models generated for Thyroid data, CV-set 9. [The four panels plot true vs. false classifications; the recoverable AUC values are: (a) class 1, training: avg. 0.3076477, max. 0.3647628; (b) class 1, test: avg. 0.3607539, max. 0.4361191; (c) class 1, training: avg. 0.9721313, max. 0.9981604; (d) class 1, test: avg. 0.9740631, max. 0.9976785.]
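AUC values like those reported for Figures 11.15 and 11.16 can be computed directly from a classifier's raw outputs; a minimal rank-based (Mann-Whitney) sketch, with the function name being ours:

```python
def auc(scores_pos, scores_neg):
    """Probability that a randomly drawn positive sample receives a higher
    model output than a randomly drawn negative one; ties count as 1/2.
    Equivalent to the area under the ROC curve."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

A perfect separation yields 1.0, a random one about 0.5, and values clearly below 0.5 (as in panels (a) and (b) above) indicate a classifier whose ranking is worse than random for that class.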
– In order to keep the computational effort low, the mean squared error function with early abortion was used as fitness function for the GP training process.
– The eventual selection of models is done by choosing those models that perform best on validation data (or, if no validation samples are specified, the models' performance on training data is considered). For this selection we have used the classification-specific evaluation function described in Section 11.2: The mean squared error is considered as well as class ranges, threshold qualities, and AUC values; all other possible contributions have been neglected in the test series reported and discussed here. Thus, c1 = 4.0, ck = 1.0 for k ∈ {6, 7, 8}, and ck = 0.0 for k ∈ {2, 3, 4, 5}.
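The weighted combination of evaluation contributions described above can be sketched as follows (a hypothetical helper; the contribution values in the usage example are illustrative, only the weighting scheme with c1 = 4.0, c6 = c7 = c8 = 1.0, and c2 . . . c5 = 0.0 follows the text):

```python
def combined_fitness(contributions, weights):
    """Weighted sum of evaluation contributions. With the weights used in
    the text, only the mean squared error and the three contributions
    weighted 1.0 (class ranges, threshold qualities, AUC) influence the
    result; the remaining contributions are suppressed by weight 0.0."""
    return sum(w * c for w, c in zip(weights, contributions))
```

For example, with weights [4, 0, 0, 0, 0, 1, 1, 1], the contributions at positions 2–5 drop out of the combined value entirely.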
FIGURE 11.17: Graphical representation of a classification model (formula), produced for 10-fold cross-validation partition 3 of the Thyroid data set. [The figure shows a structure tree combining an IF / THEN / ELSE node, arithmetic and comparison operators, constants, and weighted variable terminals such as 0.38 * Var16 and 2.648 * Var7.]
In addition to splitting the given data into training and test data, extended GP-based training is implemented in such a way that a part of the given training data is not used for training models and instead serves as validation set; in the end, when it comes to returning classifiers, the algorithm returns those models that perform best on validation data. This approach has been chosen because it is assumed to help cope with overfitting; it is also applied in other GP-based machine learning algorithms, as for example described in [BL04]. In fact, this was also done in our standard GP tests for the Melanoma data set.

Linear Modeling

Given a data collection including m input features storing the information about N samples, a linear model is defined by the vector of coefficients θ1...m. For calculating the vector of modeled values e using the given input values matrix u1...m, these input values are multiplied with the corresponding coefficients and added:

e = u1...m · θ   (11.54)
The vector of coefficients can be computed by simply applying matrix division. For conducting the test series documented here we have used the matrix division function provided by MATLAB®:

theta = InputValues \ TargetValues;

If a constant additive factor is to be included into the model (i.e., the coefficients vector), this command has to be extended:
r = size(InputValues,1);
theta = [InputValues ones(r,1)] \ TargetValues;
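The MATLAB backslash operator above solves a least squares problem; the same fit can be sketched in plain Python via the normal equations θ = (UᵀU)⁻¹Uᵀy (a two-feature toy version with function names of our own choosing; for real data a QR- or SVD-based solver is numerically preferable):

```python
def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def linear_model(U, y):
    """Least squares coefficients theta = (U^T U)^{-1} U^T y
    for an input matrix U with exactly two features per row."""
    UtU = [[sum(row[i] * row[j] for row in U) for j in range(2)]
           for i in range(2)]
    Uty = [sum(row[i] * yi for row, yi in zip(U, y)) for i in range(2)]
    return solve2(UtU, Uty)
```

For data generated exactly by y = 2·u1 + 3·u2, the fit recovers the coefficients [2, 3].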
Theoretical background of this approach can be found in [Lju99].

Neural Networks

For training artificial neural network (ANN) models, three-layer feedforward neural networks with one output neuron were created using the backpropagation as well as the Levenberg-Marquardt training method. Theoretical background and details can be found in [Nel01] (Chapter 11, "Neural Networks"), [Mar63], [Lev44], or [GMW82]. The following two approaches have been applied for training neural networks:
• On the one hand we have trained networks with 5 neurons in the hidden layer (referred to as "NN1" in the test series documentation in Section 11.2.5.8) as well as networks with 10 hidden neurons (referred to as "NN2" in the test series documentation); the number of iterations of the training process was set to 100 (in the first variant, "NN1") and 300 (in the second variant, "NN2"). In the context of analyzing the benchmark problems used here, higher numbers of nodes or iterations are likely to lead to overfitting (i.e., a better fit on the training data, but worse test results). The ANN training framework used to collect the results reported in this book is the NNSYSID20 package, a neural network toolbox implementing the Levenberg-Marquardt training method for MATLAB®; it has been implemented by Magnus Nørgaard at the Technical University of Denmark [Nør00].
• On the other hand, the multilayer perceptron training algorithm available in WEKA [WF05] has also been used for training classifiers. In this case the number of hidden nodes was set to (a + c)/2, where a is the number of attributes (features) and c the number of classes. The number of iterations was not predefined; instead, 10% of the training data were designated to be used as validation data, and in order to combat the danger of overfitting, the training algorithm was terminated as soon as the error on the validation data got worse in 20 consecutive iterations. This training method, which applies backpropagation learning, is in the following referred to as the "NN3" method.
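The validation-based stopping rule used for "NN3" can be sketched independently of WEKA (the trainer interface here is hypothetical; only the patience of 20 consecutive worsening iterations follows the text):

```python
def train_with_early_stopping(step, validation_error, max_patience=20):
    """Run training iterations until the validation error has worsened in
    max_patience consecutive iterations; return the best error observed.
    `step` advances the training by one iteration, `validation_error`
    evaluates the current model on the validation set (both hypothetical
    callables supplied by the caller)."""
    best = float("inf")
    patience = 0
    while patience < max_patience:
        step()
        err = validation_error()
        if err < best:
            best = err
            patience = 0   # improvement resets the counter
        else:
            patience += 1
    return best
```

Any single improvement resets the counter, so training only stops after an uninterrupted run of 20 non-improving iterations.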
kNN Classification

Unlike other data-based modeling methods based on linear models, neural networks, or GP, k-nearest-neighbor classification works without creating any explicit models. During the training phase, the data are simply collected; when it comes to classifying a new, unknown sample xnew, the sample-wise distance between xnew and all other training samples xtrain is calculated, and the classification is done on the basis of those k training samples (xNN) showing the smallest distances from xnew. The distance between two samples is calculated as follows: First, all features are normalized by subtracting the respective mean values and dividing the remaining samples by the respective variables' standard deviation. Given a data matrix x including m features storing the information about N samples, the normalized values xnorm are calculated as

∀(i ∈ [1, m]) ∀(j ∈ [1, N]) : xnorm(i, j) = (x(i, j) − (1/N) · Σ(k=1..N) x(i, k)) / σ(x(i, 1 . . . N))   (11.55)

where the standard deviation σ of a given variable x storing N values is calculated as

σ(x) = sqrt((1/(N − 1)) · Σ(i=1..N) (xi − x̄)²)   (11.56)

with x̄ denoting the mean value of x. Then, on the basis of the normalized data, the distance between two samples a and b, d(a, b), is calculated as the mean squared variable-wise distance:

d(a, b) = (1/n) · Σ(i=1..n) (anorm(i) − bnorm(i))²   (11.57)
where n again is the number of features stored for each sample. In the context of classiﬁcation, the numbers of instances (of the k nearest neighbors) are counted for each given class and the algorithm automatically predicts that class that is represented by the highest number of instances. In the test series documented in this book we have applied weighting to kNN classiﬁcation: The distance between xnew and any sample xz is relevant for the classiﬁcation statement, and the weight of “nearer” samples is higher than that of samples that are “further” away from xnew . There is a lot of literature that can be found for kNN classiﬁcation; very good explanations and compact overviews of kNN classiﬁcation (including several possible variants and applications) are for example given in [DHS00] and [RN03].
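Putting equations (11.55)–(11.57) and the distance-weighted vote together, a minimal kNN sketch (our own simplification, not the book's implementation) could look like this:

```python
from collections import Counter

def normalize(data):
    """Column-wise z-score normalization (equations 11.55 and 11.56)."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1)) ** 0.5
            for j in range(m)]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in data]

def knn_classify(train, labels, sample, k):
    """Distance-weighted kNN vote using the mean squared distance (11.57)."""
    m = len(sample)
    norm = normalize(train + [sample])   # normalize sample with the data
    xs, x_new = norm[:-1], norm[-1]
    dist = lambda a: sum((ai - bi) ** 2 for ai, bi in zip(a, x_new)) / m
    nearest = sorted(zip(xs, labels), key=lambda p: dist(p[0]))[:k]
    votes = Counter()
    for x, label in nearest:
        votes[label] += 1.0 / (dist(x) + 1e-9)   # nearer samples weigh more
    return votes.most_common(1)[0][0]
```

The reciprocal-distance weighting realizes the idea described above: among the k nearest neighbors, "nearer" samples contribute more to the classification statement than samples that are "further" away.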
Support Vector Machines

Support vector machines (SVMs) are a widely used approach in machine learning based on statistical learning theory [Vap98]; an example of the application of SVMs in the medical domain has been reported in [MIB+00], for example. The most important aspect of SVMs is that it is possible to give bounds on the generalization error of the models produced, and to select the respectively best model from a set of models following the principle of structural risk minimization [Vap98]. SVMs are designed to calculate hyperplanes that separate the data from each other and maximize the margin between sets of data points. While the basic training algorithm is only able to construct linear separators, so-called kernel functions can be used to calculate scalar products in higher-dimensional spaces; if the kernel functions used are nonlinear, then the separating boundaries will be nonlinear, too. In this work we have used the SVM implementation described in [Pla99] and [KSBM01]; we have used the implementation of this algorithm which is available for the WEKA machine learning framework [WF05]. Polynomial kernels have been used as well as Gaussian radial basis function kernels with the γ parameter (defining the inverse variance) set to 0.01 and the complexity parameter c set to 10,000.

11.2.5.8 Detailed Test Series Results
The results summarized in this section have been partially published in [WAW06b], [WAW06e], and [WAW07]. Since the Wisconsin and the Thyroid data sets are publicly available, the results produced by GP are compared to those that have been published previously for various machine learning methods; the Melanoma data set is not openly available, therefore we have used all of the machine learning approaches mentioned for training classifiers for this data set. All three data sets were investigated via 10-fold cross-validation (CV). For each data collection, each of the resulting 10 pairs of training and test data partitions has been used in 5 independent GP test runs; for the Melanoma data set, all machine learning algorithms mentioned previously have also been applied to all pairs of training and test data, the stochastic algorithms again applied 5 times independently.

Results for the Wisconsin Data Set

Table 11.12 summarizes the results for the 10-fold cross-validation produced by GP with offspring selection; these figures boil down to the fact that extended GP has in this case been able to produce classifiers that on average correctly classify 97.91% of training samples and 97.53% of test samples.
Table 11.12: Summary of training and test results for the Wisconsin data set: Correct classification rates (average values and standard deviation values) for 10-fold CV partitions, produced by GP with offspring selection.

Partition   Training Avg.   Training Std.Dev.   Test Avg.   Test Std.Dev.
0           97.69%          0.27                97.06%      1.04
1           97.69%          0.85                97.65%      2.23
2           98.40%          0.72                97.94%      1.32
3           98.37%          0.56                98.24%      1.23
4           97.52%          0.78                97.06%      2.08
5           97.95%          0.77                97.94%      1.32
6           98.05%          0.43                97.05%      1.47
7           98.05%          0.47                97.65%      1.68
8           97.75%          0.62                97.65%      1.32
9           97.62%          0.74                97.06%      1.47
Avg.        97.91%          0.62                97.53%      1.51
In order to compare the quality of these results to those reported in the literature, Table 11.13 summarizes test accuracies that have been obtained using 10-fold cross-validation. For each method listed we give the references to the respective articles in which these results have been reported⁹. Obviously, the results summarized in Table 11.12 have to be considered surprisingly good, as they outperform all other algorithms reported in the literature listed here. In [LC05], for example, recent results for several classification benchmark problems are documented; the Wisconsin data set was there analyzed using standard GP as well as three other GP-based classification variants (POPEGP, DecMOGP, and DecMOPGP), and the respective results are also listed in Table 11.13. Of course, for the sake of honesty we have to admit that the effort of GP to produce these classifiers is higher than the runtime or memory consumed by most other machine learning algorithms; in our GP tests using the Wisconsin data set and populations with 500 individuals, the average number of generations executed was 51.6 and the average number of solutions evaluated ∼1,296,742.
⁹ An even more detailed listing of test results for this data set can be found in [JHC04].

Results for the Melanoma Data Set

For the Melanoma data set no results are available in the literature; therefore we have tested all machine learning algorithms mentioned previously in order to get an objective evaluation of our GP methods.
Table 11.13: Comparison of machine learning methods: Average test accuracy of classifiers for the Wisconsin data set.

Algorithm                       Test Accuracy
GP with OS                      97.53%
Probit [WHMS03]                 97.20%
RLP [BU95]                      97.07%
SVM [WHMS03]                    96.70%
C4.5 (decision tree) [HSC96]    96.0%
ANN [TG97]                      95.61%
DecMOPGP [LC05]                 95.60%
DecMOGP [LC05]                  95.19%
POPEGP [LC05]                   95.08%
StandardGP [LC05]               93.82%
First, in Table 11.14 we summarize original vs. estimated classiﬁcations obtained by applying the classiﬁers produced by GP with oﬀspring selection; in total, 97.17% of the training and 95.42% of the test samples are classiﬁed correctly (with standard deviations 0.87 and 2.13, respectively). These GP tests using the Melanoma data set were done with populations containing 1,000 individuals; the average number of generations executed was 54.4 and the average number of solutions evaluated ∼2,372,629.
Table 11.14: Confusion matrices for average classification results produced by GP with OS for the Melanoma data set.

Training                        Original Classification
                                [0] (Benign)         [1] (Malign)
Estimated Classification [0]    1,043.21 (88.41%)    9.09 (0.77%)
Estimated Classification [1]    24.28 (2.06%)        103.42 (8.76%)

Test                            Original Classification
                                [0] (Benign)         [1] (Malign)
Estimated Classification [0]    115.18 (87.92%)      2.67 (2.04%)
Estimated Classification [1]    3.33 (2.54%)         9.82 (7.50%)
Test results obtained using other machine learning algorithms are collected in Table 11.15. Support vector machine based training was done with radial as well as with polynomial kernel functions. Furthermore we used γ values 0.001 and 0.01. In standard GP (SGP) tests we used tournament parent selection (k = 3), 8% mutation, single point crossover and the same structural limitations as in GP with OS; in order to get a fair comparison, the population
size was set to 1,000 and the number of generations to 2,500 yielding 2,500,000 evaluations per test run. As we can see in Table 11.15, our GP implementation performs approximately as well as the support vector machines and neural nets applying those settings that are optimal in this test case: GP with OS was able to classify 95.42% of the test cases correctly, SVMs correctly classiﬁed 94.89% – 95.47% and neural nets (with validation set based stopping) 95.27% of the test cases evaluated. Standard GP as well as kNN, linear regression, and standard ANNs perform worse. Even though it is nice to see that the average accuracy recorded for models produced by GP with OS is quite ﬁne, the relatively high standard deviation of this method’s performance (2.13, compared to 0.41 recorded for optimal SVMs) has to be seen as a negative aspect of these results.
Table 11.15: Comparison of machine learning methods: Average test accuracy of classifiers for the Melanoma data set.

Algorithm                       Test Accuracy
                                Avg.      Std.Dev.
SVM (radial, γ = 0.01)          95.47%    0.41
GP with OS                      95.42%    2.13
SVM (polynomial, γ = 0.01)      95.40%    0.56
SVM (radial, γ = 0.001)         95.27%    0.74
NN3                             95.27%    1.91
SVM (polynomial, γ = 0.001)     94.89%    0.83
NN1                             94.35%    2.39
kNN (k = 3)                     93.59%    1.03
SGP                             93.52%    3.72
NN2                             92.90%    2.59
kNN (k = 5)                     92.85%    0.94
Lin                             92.45%    2.90
Results for the Thyroid Data Set

Finally, the results achieved for the Thyroid data set are reported here. Table 11.16 summarizes the results of the 10-fold cross-validation produced by GP with offspring selection. For each class we characterize the classification accuracy on training and test data, giving average as well as standard deviation values for each partition. These figures boil down to the fact that extended GP has in this case been able to produce classifiers that on average correctly classify 99.10% of training samples and 98.76% of test samples, the total standard deviation values being 0.73 and 0.92, respectively.
Table 11.16: Summary of training and test results for the Thyroid data set: Correct classification rates (average values and standard deviation values) for 10-fold CV partitions, produced by GP with offspring selection.

Partition              Training                          Test
                       Class 1   Class 2   Class 3       Class 1   Class 2   Class 3
0     avg.             94.67%    97.64%    99.63%        90.00%    95.68%    99.19%
      std.dev.         1.70      2.65      0.52          7.13      4.10      0.64
1     avg.             94.93%    98.67%    99.01%        88.75%    96.76%    99.43%
      std.dev.         3.58      4.23      0.46          5.23      5.86      0.44
2     avg.             96.67%    98.49%    99.49%        91.25%    96.22%    97.90%
      std.dev.         1.89      2.00      0.55          3.42      6.51      2.02
3     avg.             96.00%    98.19%    99.15%        90.00%    95.68%    99.46%
      std.dev.         2.87      1.56      0.42          5.59      4.10      0.31
4     avg.             95.33%    97.04%    99.19%        88.75%    96.22%    99.61%
      std.dev.         2.45      5.38      0.35          11.18     3.63      0.27
5     avg.             95.07%    96.62%    99.22%        95.00%    94.59%    99.37%
      std.dev.         1.92      5.29      0.40          5.23      4.27      0.29
6     avg.             93.47%    97.76%    99.16%        87.50%    94.59%    98.56%
      std.dev.         2.18      7.64      0.49          7.65      4.27      0.91
7     avg.             98.80%    98.97%    99.16%        87.50%    92.97%    99.40%
      std.dev.         2.18      5.92      0.49          7.65      4.52      0.30
8     avg.             94.40%    98.01%    99.23%        96.25%    94.05%    99.34%
      std.dev.         5.11      4.99      0.64          5.23      3.52      0.57
9     avg.             97.73%    96.62%    99.31%        91.25%    92.43%    99.55%
      std.dev.         2.69      2.65      0.52          3.42      3.52      0.15
Avg.  avg.             95.71%    97.80%    99.26%        90.63%    94.92%    99.18%
      std.dev.         2.66      4.23      0.48          6.17      4.43      0.59
In order to compare the quality of these results to those reported in the literature, Table 11.17 summarizes a selection of test accuracies that have been obtained using 10fold cross validation; again, for each method listed we give the references to the respective articles in which these results have been reported. Obviously, the results summarized in Table 11.16 have to be considered quite ﬁne, but not perfect as they are outperformed by results reported in [WK90] and [DAG01]. GP has also been repeatedly applied for solving the Thyroid problem; some of the results published are the following ones: In [LH06] (Table 8), results produced by a paretocoevolutionary GP classiﬁer system for the Thyroid problem are reported, and here in Table 11.17 these results are stated as the “PGPC” results; in fact, these results are not the mean accuracy values but rather the median value, which is why these results are not totally comparable to other results stated here. Loveard and Ciesielski
[LC01] reported that classiﬁers for the Thyroid problem could be produced using GP with test accuracies ranging from 94.9% to 98.2% (depending on the range selection strategy used). According to Banzhaf and Lasarczyk [BL04], GPevolved programs consisting of register machine instructions turned out to eventually misclassify on average 2.29% of the given test samples, and that optimal classiﬁers are able to correctly classify 98.64% of the test data. Furthermore, Gathercole and Ross [GR94] report classiﬁcation errors between 1.6% and 0.73% as best result using treebased GP, and that a classiﬁcation error of 1.52% for neural networks is reported in [SJW92]. In fact, Gathercole and Ross reformulated the Thyroid problem to classifying cases as “class 3” or “not class 3”; as is stated in [GR94], it turned out to be relatively straightforward for their GP implementation (DSSGP) to produce function tree expressions which could distinguish between classes “1” and “2” completely correctly on both the training and test sets. “To be fair, in splitting up the problem into two phases (class 3 or not, then class 1 or 2) the GP has been presented with an easier problem [. . .]. This could be taken in diﬀerent ways: Splitting up the problem is mildly cheating, or demonstrating the ﬂexibility of the GP approach.” (Taken from [GR94].)
Table 11.17: Comparison of machine learning methods: Average test accuracy of classifiers for the Thyroid data set.

  Algorithm                          Training     Test
  CART [WK90]                        99.80%       99.36%
  PVM [WK90]                         99.80%       99.33%
  Logical Rules [DAG01]              –            99.30%
  GP [GR94]                          –            98.4% – 99.27%
  GP with OS                         99.10%       98.76%
  GP [BL04]                          –            97.71% – 98.64%
  GP [LC01]                          –            94.9% – 98.2%
  BP + local adapt. rates [SJW93]    99.6%        98.5%
  ANN [SJW92]                        –            98.48%
  BP + genetic opt. [SJW93]          99.4%        98.4%
  Quickprop [SJW93]                  99.6%        98.3%
  RPROP [SJW93]                      99.6%        98.0%
  PGPC [LH06]                        –            97.44%
GP with strict oﬀspring selection was here applied with populations of 1000 individuals; on average, the number of generations executed in our GP tests
for the Thyroid test studies was 73.9, and on average 2,463,635.1 models were evaluated in each GP test run.

11.2.5.9  Conclusion
We have described an enhanced genetic programming method that was successfully used for investigating machine learning problems in the context of medical classification. The approach works with hybrid formula structures combining logical expressions (as used, for example, in decision trees) and classical mathematical functions; the enhanced selection scheme originally applied successfully for solving combinatorial optimization problems with genetic algorithms was also applied here, yielding high-quality results. We have intensively investigated GP in the context of learning classifiers for three medical data collections, namely the Wisconsin and Thyroid data sets taken from the UCI machine learning repository and the Melanoma data set, a collection of medical measurements recorded while investigating patients potentially suffering from skin cancer. The results presented in this section are indeed satisfying and make the authors believe that an application of the techniques presented here in a real-world medical data analysis framework is recommendable. As documented in the test results summary, our GP-based classification approach is able to produce results that are, in terms of classification accuracy, at least comparable to or even better than the classifiers produced by classical machine learning algorithms frequently used for solving classification problems, namely linear regression, neural networks, neighborhood-based classification, and support vector machines, as well as by other GP implementations that have been applied to the data sets investigated in our test studies.
11.3  Genetic Propagation

11.3.1  Test Setup
When speaking of the analysis of genetic propagation as described in Section 6.1, we analyze which parts of the population succeed in propagating their genetic material to the next generation, i.e., in producing offspring that will be included in the next generation's population. In this section we report on tests in this area; major parts have been published in our article on offspring selection and its effects on genetic propagation in GP-based system identification [WAW08] as well as in [Win08]. We have used the NOx data set II already presented and described in Section 11.1.2.3. Originally, this data set includes 10 variables, each storing approximately 36,000 samples; the first 10,000 samples are neglected in the tests reported on here, approximately 18,000 samples are used as training data, and 4,000 samples as validation (in this case equivalent to test) data. The last ∼4,000 samples are again neglected. In principle, we use conventional GP (with tournament and proportional selection) as well as extended GP (with gender-specific selection and offspring selection). The details of the test strategies used are given in Table 11.18.
Table 11.18: GP test strategies.

  Strategy               Properties
  (I) Conventional GP    Pop = 1000; tournament parent selection (k = 3);
                         nr. of rounds: 1000
  (II) Conventional GP   Pop = 1000; proportional parent selection;
                         nr. of rounds: 1000
  (III) Extended GP      Pop = 500; gender-specific parent selection
                         (proportional, random); offspring selection
                         (SuccessRatio = 1, MaxSelPres = 100)
In all three test strategies we used subtree exchange crossover, the time series analysis speciﬁc evaluation function (with early abortion as described
in Section 11.1.1) for evaluating solutions, and applied 1-elitism as well as a 15% mutation rate.
11.3.2  Test Results
We have executed independent test series with 5 executions for each test strategy; the results are summarized and analyzed here. With respect to solution quality and effort [10], the extended GP algorithm clearly outperforms the conventional GP variants (as summarized in Table 11.19).
Table 11.19: Test results.

                           I            II            III
  Best quality    min.     1,390.21     3,022.12      1,201.23
  (training)      avg.     1,513.84     5,014.96      1,481.69
                  max.     2,431.54    10,013.12      2,012.27
  Best quality    min.     8,231.76    12,312.83      4,531.56
  (test)          avg.    10,351.96    15,747.69      8,912.61
                  max.    13,945.23    21,315.23     16,123.34
  Generations              500          500           64.31
  Effort                   1,000,000    1,000,000     898,332.23
Regarding parent analysis, in all test runs we documented the propagation count pc_g(i) for each individual in each generation g and sum these counts over all generations. So we get

    pctotal(i) = Σ_{g ∈ [1; gen]} pc_g(i)                (11.58)
for each individual index i, where gen is the number of generations executed. Additionally, we form equally sized partitions of the population indices and sum up the pctotal values for each partition. In Table 11.20 we give the average pctotal values for percentiles of the populations of test series I, II, and III; for test series I and II we collected the pctotal values of 100 indices to form a partition, and for test series III we collected 50 indices for each partition. Figures 11.18 and 11.19 show pctotal values of exemplary test runs of series I and II, summed up for partitions of 10 solution indices each. Figure 11.20 shows pctotal values of exemplary test runs of series III, summed up for partitions of 5 solution indices each.

[10] The number of solutions evaluated is here interpreted as the algorithm's total effort.
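The bookkeeping behind Equation 11.58 and the partition analysis can be sketched as follows; the data layout `pc_per_gen[g][i]` (propagation count of the individual at index i in generation g) is an assumption for illustration, not the authors' implementation.

```python
# Sketch of the propagation-count bookkeeping of Equation 11.58.
# pc_per_gen[g][i] is ASSUMED to hold the number of offspring that the
# individual at population index i contributed in generation g.

def pc_total(pc_per_gen):
    """Sum propagation counts over all generations for each individual index."""
    n = len(pc_per_gen[0])
    return [sum(gen[i] for gen in pc_per_gen) for i in range(n)]

def partition_shares(totals, partition_size):
    """Sum pctotal values over equally sized index partitions and
    return each partition's share of the overall propagation count."""
    overall = sum(totals)
    shares = []
    for start in range(0, len(totals), partition_size):
        part = totals[start:start + partition_size]
        shares.append(sum(part) / overall)
    return shares

# Toy example: 3 generations, population of 4 (indices sorted by fitness).
pc = [[3, 2, 1, 0],
      [2, 2, 1, 1],
      [4, 1, 1, 0]]
totals = pc_total(pc)                 # [9, 5, 3, 1]
shares = partition_shares(totals, 2)  # shares of the two halves of the population
```

With partitions of two indices each, the fitter half of this toy population accounts for 14/18 of the total propagation count, mirroring the percentile analysis of Table 11.20.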
Table 11.20: Average overall genetic propagation of population partitions.

  Population      Test strategy
  percentile      I          II         III
  0               27.88%     10.31%     13.54%
  1               21.29%     10.35%     11.20%
  2               16.65%     10.31%     11.67%
  3               12.64%     10.26%     10.91%
  4                9.06%     10.25%     10.63%
  5                6.08%     10.28%      9.85%
  6                3.71%     10.24%      9.39%
  7                1.88%     10.16%      8.83%
  8                0.72%     10.10%      7.92%
  9                0.10%      7.74%      6.07%
FIGURE 11.18: pctotal values for an exemplary run of series I.
FIGURE 11.19: pctotal values for an exemplary run of series II.
FIGURE 11.20: pctotal values for an exemplary run of series III.
As we see from the results given in Tables 11.19 and 11.20 and Figure 11.18, there is a rather high selection pressure when using tournament selection; the results are rather good and, as expected, less fit individuals are by far not able to contribute to the population as much as fitter ones, leading to a quick and drastic reduction of genetic diversity. The results for test series II, as given in Tables 11.19 and 11.20 and Figure 11.19, are significantly different: The results are a lot worse (especially on training data) than those of algorithm variant I, and obviously there is no strong selection pressure, as almost all individuals (or rather, the individuals at the respective indices) are able to contribute to almost the same extent. Only the worst ones are not able to propagate their genetic material to the next generations as well as better ones. This is due to the fact that in the presence of very bad individuals, roulette wheel selection selects the best individuals approximately as often as those that perform only middlingly well. Especially in data-based modeling there are often individuals that score extremely badly (due to divisions by very small values, for example), and in comparison to those all other individuals are approximately equally fit. Finally, test series III obviously produced the best results with respect to training as well as validation data (see also Table 11.19). Moreover, the results given in Table 11.20, column III, and displayed in Figure 11.20 show that the combination of random and roulette parent selection with offspring selection results in a very moderate distribution of the pctotal values: Fitter individuals contribute more than less fit ones, but even the worst ones are still able to contribute to a significant extent. Thus, genetic diversity is increased, which also contributes positively to the genetic programming process.
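The flattening effect described above can be made concrete with a small sketch; the worst-based linear scaling used below to turn errors into roulette-wheel weights is an assumed stand-in for whatever scaling a concrete GP system uses.

```python
# Illustration of the flattening effect of extreme outliers on
# fitness-proportional (roulette wheel) selection. Errors are turned into
# weights via worst-based linear scaling (weight = worst_error - error),
# an ASSUMED scaling chosen for illustration.

def roulette_probs(errors):
    """Selection probabilities proportional to (worst_error - error)."""
    worst = max(errors)
    weights = [worst - e for e in errors]
    total = sum(weights)
    return [w / total for w in weights]

moderate = [10.0, 20.0, 30.0, 40.0]      # no outlier
with_outlier = [10.0, 20.0, 30.0, 1e9]   # one division-by-epsilon victim

p1 = roulette_probs(moderate)
p2 = roulette_probs(with_outlier)
# Without the outlier, the best individual is selected three times as
# often as the second-worst; with the outlier, the first three
# individuals are selected almost equally often (~1/3 each).
```

This is exactly the situation described in the text: one individual with an astronomically bad error makes all remaining individuals look approximately equally fit to roulette wheel selection.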
11.3.3  Summary
To sum up this section: Offspring selection in GP-based system identification significantly influences the algorithm's ability to create high-quality results as well as the genetic propagation dynamics: Not only fitter
individuals are able to propagate their genetic makeup; less fit ones are also able to contribute to the next population. This is to some extent also the case when using proportional selection, but in the presence of individuals with very bad fitness values the selection pressure is almost lost, which leads to solutions of rather bad quality. When using offspring selection, extremely bad individuals are eliminated immediately; when using OS in combination with gender-specific parent selection (applying random and proportional selection mechanisms), GP is able to produce significantly better results than when using standard techniques. Parent diversification, and thus increased genetic diversity in GP populations, is considered one of the most influential aspects in this context.
11.3.4  Additional Tests Using Random Parent Selection
In addition to the tests reported on in the previous parts of this section we have also tested conventional as well as extended GP using random parent selection. Thus, we have two more test cases to be analyzed.
Table 11.21: Additional test strategies for genetic propagation tests.

  Strategy               Properties
  (IV) Conventional GP   Pop = 2000; random parent selection;
                         nr. of rounds: 500
  (V) Extended GP        Pop = 500; random parent selection;
                         offspring selection
                         (SuccessRatio = 1, MaxSelPres = 100)
As we had expected, the test results obtained for standard GP with random parent selection were very bad; obviously, no suitable models were found. When using OS, on the contrary, the test results for random parent selection were not bad at all: The models are (on training data) not quite as good as those obtained using random/roulette selection with OS or conventional GP with tournament parent selection, but they still perform (surprisingly) well on test data [11]. In Table 11.22 we summarize the respective result qualities.
[11] Of course, these remarks are only valid for the tests reported on here; we do not make any general statement regarding result quality using random parent selection and OS.
Table 11.22: Test results in additional genetic propagation tests (using random parent selection).

                           IV            V
  Best quality    min.     >50,000.00    5,041.22
  (training)      avg.     >50,000.00    7,726.11
                  max.     >50,000.00    8,843.73
  Best quality    min.     >50,000.00    7,129.31
  (test)          avg.     >50,000.00    8,412.31
                  max.     >50,000.00   12,653.98
  Generations              500           102.86
  Effort                   1,000,000     1,324,302
In Table 11.23 we give the average pctotal values for percentiles of the populations of test series IV and V (collecting the pctotal values of 200 indices to form a partition for series IV, and 50 indices for each partition for series V). Obviously, and exactly as we had expected, random parent selection leads to all individuals having approximately the same success in propagating their genetic makeup. When using OS, the result is (even a little surprisingly) significantly different: Better individuals have a much higher chance to produce successful offspring than worse ones; the probability of the best 10%, for example, to produce successful children is almost twice as high as the probability of the worst 10%.
Table 11.23: Average overall genetic propagation of population partitions for random parent selection tests.

  Population      Test strategy
  percentile      IV          V
  0               10.03%      13.16%
  1               10.02%      12.07%
  2                9.98%      11.41%
  3                9.99%      10.66%
  4               10.02%      10.30%
  5                9.99%       9.33%
  6               10.00%       8.96%
  7                9.98%       8.62%
  8                9.99%       7.81%
  9               10.00%       7.68%
Obviously, random parent selection leads to an increased number of generations that have to be executed until a given selection pressure limit is reached. This is graphically shown in Figure 11.21, which gives the selection pressure progress for two exemplary test runs of the test series including OS, i.e., III and V. In the standard case using random/roulette parent selection and offspring selection (III), the selection pressure rises faster than when using random parent selection in combination with strict offspring selection. Still, even though it takes longer with random parent selection, the characteristics are very similar, i.e., the selection pressure rises steadily with some notable fluctuations.
FIGURE 11.21: Selection pressure progress in two exemplary runs of test series III and V (extended GP with gender speciﬁc parent selection and strict oﬀspring selection).
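The selection pressure plotted in Figure 11.21 is, per generation, essentially the number of offspring that had to be generated divided by the population size. A minimal sketch of one generation of strict offspring selection with random parent selection (as in series V) might look as follows; the fitness function and crossover operator are toy stand-ins, not the book's GP operators.

```python
import random

# Minimal sketch of one generation of strict offspring selection (success
# ratio 1.0) and the resulting selection pressure. "Success" here means
# the child is strictly better than both parents; fitness and crossover
# are TOY stand-ins for illustration only.

def one_generation(population, fitness, crossover, rng):
    successful = []
    generated = 0
    cap = 100 * len(population)              # guard akin to MaxSelPres
    while len(successful) < len(population) and generated < cap:
        a, b = rng.sample(population, 2)     # random parent selection
        child = crossover(a, b, rng)
        generated += 1
        if fitness(child) < min(fitness(a), fitness(b)):  # strict success
            successful.append(child)
    sel_pressure = generated / len(population)
    return successful, sel_pressure

# Toy problem: individuals are numbers, fitness is the distance to 0,
# crossover averages the parents with a little noise.
rng = random.Random(0)
pop = [rng.uniform(-10, 10) for _ in range(20)]
cx = lambda a, b, r: (a + b) / 2 + r.gauss(0, 0.1)
pop, sp = one_generation(pop, abs, cx, rng)
```

As the population converges, ever more offspring must be generated before enough "successful" children are found, which is exactly why the selection pressure curves in Figure 11.21 rise over the generations.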
11.4  Single Population Diversity Analysis

11.4.1  GP Test Strategies
Within our first series of empirical tests regarding solution similarity and diversity we analyzed the diversity of populations of single-population GP processes. For testing the population diversity analysis method described in Section 6.2 and for illustrating graphical representations of the results of these tests we have used the following two data sets:

• The NOx data set contains measurements taken from a 2-liter 4-cylinder BMW diesel engine at a dynamical test bench (simulated vehicle: BMW 320d Sedan); this data set has already been described in Section 11.1 as NOx data set III.

• The Thyroid data set is a widely used machine learning benchmark data set containing the results of medical measurements recorded while investigating patients potentially suffering from hypothyroidism; further details regarding this data set can be found in Chapter 11.2.

Both data collections have been split into training and validation/test partitions, taking the first 80% of each data set as training samples available to the identification algorithm; the rest of the data is considered validation data. We have used various GP selection strategies for analyzing the NOx and the Thyroid data sets:

• On the one hand, we have used standard GP with proportional as well as tournament selection (tournament size k = 3).

• On the other hand, we have also intensively tested GP using offspring selection and gender-specific parent selection (proportional and random selection).

In general, we have tested GP with populations of 1,000 solution candidates (with a maximum tree size of 50 and a maximum tree height of 5), standard subtree exchange crossover, structural as well as parametric node mutation, and a total mutation rate of 15%; the mean squared error function was used for evaluating the solutions on training as well as on validation (test) data. Other essential parameters vary depending on the test strategies; these are summarized in Table 11.24.
Table 11.24: GP test strategies.

  Strategy          Properties
  (A) Standard GP   Tournament parent selection (tournament size k = 3);
                    number of generations: 4000
  (B) Standard GP   Proportional parent selection;
                    number of generations: 4000
  (C) GP with OS    Gender-specific parent selection (random & proportional);
                    success ratio: 0.8; comparison factor: 0.8;
                    maximum selection pressure: 50 (not reached);
                    number of generations: 4000
  (D) GP with OS    Gender-specific parent selection (random & proportional);
                    success ratio: 1.0; comparison factor: 1.0;
                    maximum selection pressure: 100
11.4.2  Test Results
In Table 11.25 we summarize the quality of the best models produced using GP test strategies (A) – (D); for the NOx data set the quality is given as the mean squared error, and for the Thyroid data set we give the classification accuracy, i.e., the ratio of samples that are classified correctly. The models are evaluated on training as well as on validation data; as each test strategy was executed 5 times independently, we state average and standard deviation values. Obviously, the test series (A) and (D) perform best; the results produced using offspring selection are better than those using standard GP. The classification results for the Thyroid data set are not quite as good as those reported in [WAW06e] and Section 11.2; this is due to the fact that we here used smaller models and concentrated on the comparison of GP strategies with respect to population diversity.
Solution quality analysis is of course important and interesting, but here we are more interested in a comparison of population diversity during the execution of the GP processes. We have calculated the similarity among the GP populations during the execution of the GP test series described in Table 11.24: The multiplicative similarity approach (as defined in Equations 9.63 – 9.66) has been chosen; all coefficients c1 . . . c10 were set to 0.2, except the coefficient weighting the level difference contribution d1, which was set to 0.8.
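Equations 9.63 – 9.66 are not reproduced in this section, so the following sketch uses an assumed weighted multiplicative combination of per-aspect distance contributions; the functional form is hypothetical and only illustrates how coefficients such as c1 . . . c10 weight the individual contributions.

```python
# Generic sketch of a weighted multiplicative similarity combination.
# NOTE: the form (1 - c_k * d_k) multiplied over all aspects is an
# ASSUMPTION for illustration, not the book's Equations 9.63 - 9.66.
# Each distance d_k lies in [0, 1]; coefficient c_k controls how strongly
# aspect k may reduce the overall similarity (c_k = 0 ignores the aspect).

def multiplicative_similarity(distances, coefficients):
    """Combine per-aspect distances into a single similarity in [0, 1]."""
    sim = 1.0
    for d, c in zip(distances, coefficients):
        sim *= 1.0 - c * d     # each aspect can only shrink the similarity
    return sim

# Ten aspects as in the text: weight 0.8 for the level difference
# contribution, 0.2 for all others.
coeffs = [0.8] + [0.2] * 9
identical = multiplicative_similarity([0.0] * 10, coeffs)   # fully similar
somewhat = multiplicative_similarity([0.5] * 10, coeffs)
```

The multiplicative form means a single strongly weighted aspect (here the level difference) can dominate the overall similarity value even when all other aspects agree well.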
Table 11.25: Test results: Solution qualities.

  Results for NOx test series
  GP strategy                  (A)        (B)        (C)         (D)
  Training (mse)               2.518      5.027      2.674       1.923
  Training (std(mse))          1.283      2.142      2.412       0.912
  Validation (mse)             3.012      5.021      2.924       2.124
  Validation (std(mse))        1.431      3.439      2.103       1.042
  Evaluated solutions (avg.)   4·10^6     4·10^6     10.2·10^6   3.91·10^6
  Generations (avg.)           4,000      4,000      4,000       98.2

  Results for Thyroid test series
  GP strategy                  (A)        (B)        (C)         (D)
  Training (cl. acc., avg.)    0.9794     0.9758     0.9781      0.9812
  Training (cl. acc., std.)    0.0032     0.0017     0.0035      0.0012
  Validation (cl. acc., avg.)  0.9764     0.9675     0.9767      0.9804
  Validation (cl. acc., std.)  0.0029     0.0064     0.0069      0.0013
  Evaluated solutions (avg.)   4·10^6     4·10^6     12.2·10^6   5.1·10^6
  Generations (avg.)           4,000      4,000      4,000       167.8
In Table 11.26 we give the average population similarity values calculated using Equation 6.7; again, as each test series was executed several times, we give average and standard deviation values. As we see in the first row, the average similarity values are approximately in the interval [0.2; 0.25] at the beginning of the GP runs, i.e., after the initialization of the GP populations. In standard GP, as can be seen in the first column, the average similarity reaches values above 0.7 after 400 generations and stays at approximately this level until the end of the GP process; in the end, the average similarity was ∼0.87 in the NOx tests and ∼0.81 in the Thyroid test series. Analyzing the second and third columns we notice that this is not the case in test series (B) and (C): The similarity values in test series (B) by far do not rise as high as in series (A) (especially when working on the Thyroid data set), and also in test series (C) we have measured significantly lower similarities than in series (A) (i.e., the population diversity was higher during the whole GP process). Obviously, the use of offspring selection with rather soft parameter settings (i.e., success ratio and comparison factor set to values below 1.0) does not have the same effects on the GP process as strict ones. By far the highest similarity values are documented for test series (D) using maximally strict offspring selection (which also produced the best quality models, as documented in Table 11.25): As summarized in the far right column, during the whole evolutionary process the mutual similarity among the models increases steadily, while the selection pressure increases as well. In the end, when the selection pressure reaches a high level (in these cases, the
predefined limit was set to 100) and the algorithm stops, we see a very high similarity among the solution candidates, i.e., the population has converged and evolution is likely to have gotten stuck. This is consistent with the impression already stated, e.g., in [WAW06a] and [WAW06e]; here we see that this really happens.
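The population-level statistics referred to here (Equations 6.5, 6.7, and 6.8) can be sketched generically: each model's average and maximum similarity to all other models is computed, and these per-model values are averaged over the population. The bit-string similarity below is a placeholder stand-in for the structural similarity measure.

```python
# Generic sketch of population diversity statistics: per-model mean and
# maximum similarity with all other models, averaged over the population
# (cf. Equations 6.5, 6.7, 6.8). The similarity function used here is a
# PLACEHOLDER, not the book's structural similarity measure.

def diversity_stats(population, sim):
    per_model_avg, per_model_max = [], []
    for i, a in enumerate(population):
        sims = [sim(a, b) for j, b in enumerate(population) if j != i]
        per_model_avg.append(sum(sims) / len(sims))
        per_model_max.append(max(sims))
    n = len(population)
    return sum(per_model_avg) / n, sum(per_model_max) / n

# Toy models: bit strings; similarity = fraction of matching positions.
sim = lambda a, b: sum(x == y for x, y in zip(a, b)) / len(a)
converged = ["1010", "1010", "1011"]
diverse = ["0000", "1111", "0110"]
avg_c, max_c = diversity_stats(converged, sim)
avg_d, max_d = diversity_stats(diverse, sim)
# A converged population scores much higher than a diverse one.
```

High average similarity (low diversity) is exactly what Tables 11.26 and 11.27 report for the strict-OS runs at the end of their execution.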
Table 11.26: Test results: Population diversity (average similarity values; standard deviations in parentheses).

  NOx tests
  Gen.   (A)             (B)             (C)           | Gen.   (D)
  0      0.247 (0.041)   0.250 (0.031)   0.270 (0.037) | 0      0.197 (0.039)
  100    0.723 (0.073)   0.491 (0.051)   0.517 (0.038) | 10     0.397 (0.039)
  400    0.813 (0.035)   0.497 (0.058)   0.564 (0.059) | 20     0.603 (0.049)
  1000   0.859 (0.021)   0.510 (0.055)   0.520 (0.052) | 40     0.810 (0.039)
  4000   0.871 (0.019)   0.518 (0.059)   0.526 (0.053) | End    0.985 (0.032)

  Thyroid tests
  Gen.   (A)             (B)             (C)           | Gen.   (D)
  0      0.206 (0.041)   0.205 (0.040)   0.208 (0.036) | 0      0.197 (0.040)
  100    0.581 (0.047)   0.241 (0.043)   0.444 (0.035) | 10     0.397 (0.039)
  400    0.737 (0.032)   0.321 (0.058)   0.610 (0.026) | 20     0.602 (0.049)
  1000   0.808 (0.029)   0.341 (0.049)   0.692 (0.031) | 40     0.810 (0.041)
  4000   0.812 (0.038)   0.343 (0.056)   0.701 (0.030) | End    0.975 (0.019)
In Table 11.27 we summarize the maximum population similarity values calculated using Equation 6.8; again we give average and standard deviation values (the latter in parentheses).

Table 11.27: Test results: Population diversity (maximum similarity values).

  NOx tests
  Gen.   (A)             (B)             (C)           | Gen.   (D)
  0      0.919 (0.116)   0.934 (0.095)   0.904 (0.123) | 0      0.936 (0.109)
  100    0.995 (0.014)   0.825 (0.074)   0.944 (0.059) | 10     0.961 (0.049)
  400    0.998 (0.006)   0.809 (0.075)   0.978 (0.037) | 20     0.971 (0.033)
  1000   0.999 (0.005)   0.811 (0.059)   0.965 (0.044) | 40     0.995 (0.012)
  4000   0.999 (0.003)   0.819 (0.066)   0.969 (0.035) | End    0.996 (0.009)

  Thyroid tests
  Gen.   (A)             (B)             (C)           | Gen.   (D)
  0      0.823 (0.127)   0.771 (0.145)   0.766 (0.145) | 0      0.777 (0.157)
  100    0.958 (0.028)   0.749 (0.123)   0.840 (0.094) | 10     0.873 (0.101)
  400    0.973 (0.032)   0.752 (0.125)   0.883 (0.067) | 20     0.934 (0.049)
  1000   0.977 (0.022)   0.744 (0.117)   0.913 (0.061) | 40     0.976 (0.022)
  4000   0.977 (0.021)   0.754 (0.111)   0.909 (0.058) | End    0.999 (0.004)

As we see in the first (left) column, in standard GP with tournament selection the average maximum similarity reaches values above 0.95 rather fast, i.e., for all models in the population rather similar solutions can be found. This is not the case when using proportional selection. When using offspring selection the same effect as in standard GP with tournament selection can be seen, especially in the NOx test series. Figures 11.22 – 11.25 exemplarily show the average population diversity by giving the distribution of similarities among all individuals. Figures 11.22 and 11.23 show the similarity distributions of an exemplary test run of series (A) at generations 200 and 4000; obviously, most similarity calculations returned values between 0.7 and 1.0, and the distribution at generation 200 is comparable to the distribution at the end of the test run. For the GP runs incorporating offspring selection this is not the case, as we exemplarily see in Figures 11.24 and 11.25: After 20 generations most similarity values almost fit a Gaussian distribution with mean 0.8, and at the end of the run all models are very similar to each other (i.e., the population has converged, the selection pressure reaches the given limit and the algorithm
stops).

FIGURE 11.22: Distribution of similarity values in an exemplary run of NOx test series A, generation 200.

Finally, Figure 11.26 shows the average similarity values for each model (calculated using Equation 6.5) for exemplary test runs of the Thyroid test series (A) [12] and (D). Obviously, the average similarity in standard GP reaches values in the range [0.7; 0.8] very early and then stays at this level during the rest of the GP execution. When using gender-specific selection and offspring selection, in contrast, the average similarity steadily increases during the GP process and almost reaches 1.0 at the end of the run, when the maximum selection pressure is reached.
11.4.3  Conclusion
Structural similarity estimation has been used for measuring the genetic diversity among GP populations: Several variants of genetic programming using different types of selection schemata have been tested using fine-grained similarity estimation, and two machine learning data sets have been used for these empirical tests. The test results presented show that population diversity differs greatly between the test runs, depending on the selection schemata used.
[12] In fact, for the test run of series (A) we here only show the progress over the first 2000 generations.
FIGURE 11.23: Distribution of similarity values in an exemplary run of NOx test series A, generation 4000.
FIGURE 11.24: Distribution of similarity values in an exemplary run of NOx test series (D), generation 20.
FIGURE 11.25: Distribution of similarity values in an exemplary run of NOx test series (D), generation 95.
FIGURE 11.26: Population diversity progress in exemplary Thyroid test runs of series (A) and (D) (shown in the upper and lower graph, respectively).
11.5  Multi-Population Diversity Analysis
Our second series of empirical tests regarding solution similarity and diversity was dedicated to the diversity of populations in multi-population GP processes; for testing the multi-population diversity analysis method described in Section 6.2 and for illustrating graphical representations of the results of these tests we have again used two data sets: the NOx data set III described in Section 11.1 and the Thyroid data set. Both data collections have been split into training and validation/test partitions: In the case of the NOx data set the first 50% of the data were used as training samples; in the case of the Thyroid data set the first 80% were considered by the training algorithms.
11.5.1  GP Test Strategies
In general, 4 different strategies for parallel genetic programming have been applied:

• Parallel island GP without interaction between the populations, i.e., all populations evolve independently.

• Parallel island GP with occasional migration after every 100th generation in standard GP and every 5th generation in GP with offspring selection: The worst 1% of each population p_i is replaced by copies of the best 1% of solutions in population p_(i-1); the best solutions of the last population (in the case of n populations, that is p_n) replace the worst ones of the first population (p_1). The unidirectional ring migration topology has been used.

• Parallel island GP with migration after every 50th generation in standard GP and every 5th generation in GP with offspring selection: The worst 5% of each population p_i is replaced by copies of the best 5% of solutions in population p_(i-1). Again, the unidirectional ring migration topology has been used.

• Finally, the SASEGASA algorithm as described in Chapter 5 has been used as well.

In all cases the algorithms have been initialized with 5 populations, each containing 200 solutions (in our case representing formulas, of course). Additionally, each of the first 3 strategies has been tested with standard GP settings as well as with offspring selection; Table 11.28 summarizes the 7 test strategies that have been applied and whose results are discussed here.
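The unidirectional ring migration described in the second and third bullets can be sketched as follows; representing individuals as plain error values is a simplification for illustration.

```python
# Sketch of unidirectional ring migration: the worst fraction of each
# population p_i is replaced by copies of the best solutions of its
# predecessor p_(i-1), with the first population receiving migrants from
# the last one. Individuals are plain error values here (smaller = better)
# purely for illustration.

def ring_migration(populations, fraction, error):
    k = max(1, int(len(populations[0]) * fraction))
    # Snapshot the emigrants first so that replacements do not cascade.
    best = [sorted(pop, key=error)[:k] for pop in populations]
    migrated = []
    for i, pop in enumerate(populations):
        survivors = sorted(pop, key=error)[:-k]         # drop the worst k
        migrated.append(survivors + list(best[i - 1]))  # index i-1 wraps: ring
    return migrated

pops = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [0.5, 9.0, 9.5, 9.9]]
new_pops = ring_migration(pops, 0.25, error=lambda x: x)
# Population 0 now contains 0.5 (the best solution of the last
# population) instead of its former worst solution 4.0.
```

Taking the emigrant snapshot before any replacement mirrors the fact that migration happens simultaneously for all islands rather than sequentially.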
11.5.2  Test Results
All test strategies summarized in Table 11.28 have been executed 5 times using the NOx as well as the Thyroid data set. Multi-population diversity was measured using the equations given in Section 6.2.2: For each solution we calculate the average as well as the maximum similarity with the solutions of all other populations of the respective algorithm (in the following, these values are denoted as MPdiv values). Additionally, we have also collected all solutions of the algorithms' populations into temporary total populations and calculated the average as well as the maximum similarity of each solution compared to all other ones (hereafter denoted as SPdiv values). Again, the multiplicative structural similarity approach (as defined in Equations 9.63 – 9.66) has been used for estimating the similarity of model structures; all coefficients c1 . . . c10 were set to 0.2, except the coefficient weighting the level difference contribution d1, which was set to 0.8. In the following we summarize these values for all test runs by stating averages as well as standard deviations: Table 11.29 summarizes the results of the test runs using the Thyroid data set, Table 11.30 those of the test runs using the NOx data set. Figure 11.27 exemplarily illustrates the multi-population diversity in a test run of series F at iteration 50: The value represented in row i of column j in bar k gives the average similarity of model i of population k with all formulas stored in population j. Low multi-population similarity values are indicated by light cells; dark cells represent high similarity values.
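The two diversity views (MPdiv and SPdiv) can be sketched generically as follows; the similarity function is again a placeholder, not the multiplicative structural similarity actually used.

```python
# Sketch of the two multi-population diversity views described above:
# MPdiv compares each solution with the solutions of all *other*
# populations; SPdiv pools all populations and compares each solution with
# every other one. The similarity function is a PLACEHOLDER stand-in.

def mp_div(populations, sim):
    avgs, maxs = [], []
    for i, pop in enumerate(populations):
        others = [s for j, p in enumerate(populations) if j != i for s in p]
        for a in pop:
            sims = [sim(a, b) for b in others]
            avgs.append(sum(sims) / len(sims))
            maxs.append(max(sims))
    return sum(avgs) / len(avgs), sum(maxs) / len(maxs)

def sp_div(populations, sim):
    pool = [s for pop in populations for s in pop]
    avgs, maxs = [], []
    for i, a in enumerate(pool):
        sims = [sim(a, b) for j, b in enumerate(pool) if j != i]
        avgs.append(sum(sims) / len(sims))
        maxs.append(max(sims))
    return sum(avgs) / len(avgs), sum(maxs) / len(maxs)

sim = lambda a, b: sum(x == y for x, y in zip(a, b)) / len(a)
pops = [["000", "001"], ["111", "110"]]
mp_avg, mp_max = mp_div(pops, sim)
sp_avg, sp_max = sp_div(pops, sim)
```

In this toy setup the two islands hold mutually dissimilar solutions, so the MPdiv values are low while the pooled SPdiv values are pulled up by the high within-population similarities, which is the distinction the tables below exploit.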
Table 11.28: GP test strategies.

Strategy                                  Properties
(A) Parallel standard GP                  Tournament parent selection (tournament size k = 3);
                                          number of generations: 2000
(B) Parallel GP with OS                   Random & roulette parent selection;
                                          strict offspring selection (success ratio: 1.0,
                                          comparison factor: 1.0, maximum selection pressure: 200)
(C) Parallel standard GP, 1% migration    Tournament parent selection (tournament size k = 3);
                                          number of generations: 2000;
                                          1% best / worst replacement after every 100th generation
(D) Parallel GP with OS, 1% migration     Random & roulette parent selection;
                                          strict offspring selection (success ratio: 1.0,
                                          comparison factor: 1.0, maximum selection pressure: 200);
                                          1% best / worst replacement after every 5th generation
(E) Parallel standard GP, 5% migration    Tournament parent selection (tournament size k = 3);
                                          number of generations: 2000;
                                          5% best / worst replacement after every 50th generation
(F) Parallel GP with OS, 5% migration     Random & roulette parent selection;
                                          strict offspring selection (success ratio: 1.0,
                                          comparison factor: 1.0, maximum selection pressure: 200);
                                          5% best / worst replacement after every 5th generation
(G) SASEGASA                              Random & roulette parent selection;
                                          strict offspring selection (success ratio: 1.0,
                                          comparison factor: 1.0, maximum selection pressure: 200)
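The k% best / worst replacement used in the migration series (C) – (F) can be sketched as follows (a hypothetical implementation: the source does not specify the migration topology, so a unidirectional ring is assumed here, as is minimization of the fitness values):

```python
import math

def migrate(populations, fitnesses, rate):
    """k% best/worst replacement: the best `rate` fraction of each population
    replaces the worst solutions of the next population in a ring (assumption).
    Equal population sizes and fitness minimization are assumed; the stale
    fitness entries of replaced solutions are left for the next evaluation."""
    n = len(populations)
    m = max(1, math.ceil(rate * len(populations[0])))
    # pick all migrants first so replacements do not cascade around the ring
    migrants = []
    for pop, fit in zip(populations, fitnesses):
        order = sorted(range(len(pop)), key=lambda i: fit[i])
        migrants.append([pop[i] for i in order[:m]])
    for k in range(n):
        target = (k + 1) % n
        pop, fit = populations[target], fitnesses[target]
        worst = sorted(range(len(pop)), key=lambda i: fit[i], reverse=True)[:m]
        for idx, sol in zip(worst, migrants[k]):
            pop[idx] = sol
    return populations
```

With rate = 0.01 every 100th generation this corresponds to series (C); with rate = 0.05 every 5th generation to series (F).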
Table 11.29: Multipopulation diversity test results of the GP test runs using the Thyroid data set; values are given as mean (standard deviation) over the 5 runs.

Test series  Iteration    MPdiv (avg)      MPdiv (max)      SPdiv (avg)      SPdiv (max)
A            300          0.2433 (0.0514)  0.3301 (0.0496)  0.2048 (0.0612)  0.8973 (0.0291)
             2000         0.3592 (0.0613)  0.3925 (0.0610)  0.3628 (0.0593)  0.9027 (0.0351)
B            20           0.1698 (0.0497)  0.2356 (0.0317)  0.2130 (0.0317)  0.9182 (0.0852)
             End of run   0.3915 (0.0599)  0.4037 (0.0769)  0.3592 (0.0820)  0.9850 (0.0202)
C            300          0.1778 (0.0587)  0.2788 (0.0549)  0.1836 (0.0296)  0.6543 (0.0971)
             2000         0.4145 (0.0551)  0.4885 (0.0762)  0.3834 (0.0665)  0.9236 (0.0417)
D            20           0.3276 (0.0486)  0.4269 (0.1094)  0.3394 (0.0175)  0.9312 (0.0459)
             End of run   0.4412 (0.0734)  0.5822 (0.0635)  0.3866 (0.0772)  0.9736 (0.0283)
E            300          0.3395 (0.0441)  0.6271 (0.0975)  0.2715 (0.0139)  0.6116 (0.0811)
             2000         0.5329 (0.0833)  0.8710 (0.0509)  0.3991 (0.0921)  0.9129 (0.0821)
F            20           0.3721 (0.0629)  0.5024 (0.0822)  0.2711 (0.0981)  0.5192 (0.0601)
             End of run   0.5915 (0.1034)  0.8802 (0.0996)  0.4576 (0.0514)  0.9828 (0.0437)
G            20           0.4839 (0.0823)  0.5473 (0.0419)  0.3173 (0.0581)  0.5237 (0.0623)
             50           0.4325 (0.0518)  0.5512 (0.0920)  0.3228 (0.0672)  0.5828 (0.0660)
             100          0.5102 (0.0730)  0.7168 (0.0724)  0.3783 (0.0861)  0.7296 (0.0740)
             200          0.8762 (0.0505)  0.9314 (0.0458)  0.4206 (0.0792)  0.9512 (0.0249)
             End of run   –                –                0.9792 (0.0256)  0.9934 (0.0162)

11.5.3 Discussion
As we can see in Tables 11.29 and 11.30, the average diversity among populations in parallel island GP without interaction (i.e., in test series (A) and (B)) rises to values between 0.35 and 0.4, no matter whether or not OS is applied; the maximum values eventually reach values between 0.45 and 0.5. Considering all solutions collected in temporary total populations, the average similarities, as expected, stay below 0.4, while the maximum similarities almost reach 1.0. The similarity values monitored in test series (C) and (D) are slightly, but not dramatically, higher than those of series (A) and (B). This does not hold for the next pair of test series (with 5% migration): The similarity values calculated for test series (E) and (F) are significantly higher than those of test series (A) – (D); in other words, the exchange of only 5% of the populations' models can lead to a significant decrease of diversity among the populations of multipopulation GP. When using the SASEGASA, the diversity among populations is high in the beginning and then steadily decreases as the algorithm is executed. This
Table 11.30: Multipopulation diversity test results of the GP test runs using the NOx data set III; values are given as mean (standard deviation) over the 5 runs.

Test series  Iteration    MPdiv (avg)      MPdiv (max)      SPdiv (avg)      SPdiv (max)
A            300          0.3187 (0.0124)  0.3991 (0.0685)  0.2773 (0.0726)  0.8613 (0.0799)
             2000         0.3689 (0.0288)  0.4627 (0.0390)  0.3300 (0.0390)  0.9887 (0.0434)
B            20           0.1997 (0.0698)  0.1498 (0.0912)  0.2902 (0.0604)  0.8992 (0.0634)
             End of run   0.3723 (0.0233)  0.4811 (0.0244)  0.3440 (0.0254)  0.9743 (0.0482)
C            300          0.2515 (0.0968)  0.3323 (0.0685)  0.1935 (0.0607)  0.8293 (0.1062)
             2000         0.3329 (0.0365)  0.4741 (0.0323)  0.2821 (0.0402)  0.9311 (0.0441)
D            20           0.2985 (0.0870)  0.3922 (0.0825)  0.3791 (0.0487)  0.8862 (0.0829)
             End of run   0.5544 (0.0542)  0.6839 (0.1039)  0.4208 (0.0280)  0.9661 (0.0332)
E            300          0.5002 (0.0588)  0.6697 (0.0696)  0.3111 (0.0474)  0.6037 (0.0453)
             2000         0.6002 (0.0538)  0.8523 (0.0263)  0.4745 (0.0728)  0.9763 (0.0910)
F            20           0.3597 (0.0743)  0.5248 (0.0769)  0.3901 (0.0662)  0.5839 (0.0775)
             End of run   0.5607 (0.0931)  0.9080 (0.0799)  0.4877 (0.0249)  0.9906 (0.0181)
G            20           0.4471 (0.0619)  0.5303 (0.0897)  0.2694 (0.0802)  0.4670 (0.0522)
             50           0.4923 (0.0854)  0.6102 (0.0749)  0.3025 (0.0550)  0.6120 (0.0902)
             100          0.5889 (0.1184)  0.6939 (0.0835)  0.3923 (0.0812)  0.7972 (0.0805)
             200          0.9047 (0.0387)  0.9148 (0.0258)  0.5741 (0.1253)  0.9128 (0.0401)
             End of run   –                –                0.9683 (0.0412)  0.9932 (0.0319)
is of course due to the reunification of populations as soon as the maximum selection pressure is reached. By executing these test series and analyzing the results as given in this section we have demonstrated how multipopulation diversity can be monitored using similarity measures such as those described in Section 9.4.2. Reference values are given by parallel GP without migration; of course, the higher the migration rates become, the more migration affects the diversity among GP populations. When using the SASEGASA, a rather high multipopulation-specific diversity is given in the early stages of the parallel GP process; due to the merging of populations the diversity then decreases and in the end reaches values comparable to those of single-population GP with offspring selection.
FIGURE 11.27: Exemplary multipopulation diversity of a test run of Thyroid series F at iteration 50, grayscale representation.
11.6 Code Bloat, Pruning, and Population Diversity

11.6.1 Introduction
In Chapter 2.6 we have described one of the major problems of genetic programming, namely permanent code growth, often also referred to as bloat; evolution is then seen as "survival of the fattest," and, as Langdon and Poli put it, fitness-based selection leads to the fact that "fitness causes bloat" [LP97]. There are several approaches for combating this unwanted, unlimited growth of chromosome size, among them

• limiting the size and / or the height of the program trees,
• pruning programs, and
• punishing complex programs by decreasing their quality depending on the size and / or height of their respective tree representations.

Of course, there is no optimal strategy for fixing formula size parameters, population size, or pruning strategies a priori (see also the remarks in Chapter 2). Still, some code growth prevention strategies are surely more recommendable than others; we here report on an exemplary test series characterizing some of the possible approaches. In all other test series executed and reported on in other sections of this book we have used fixed complexity limits (limiting size and height of program trees); here we report on our tests regarding code growth in GP-based structure identification applying the pruning strategies presented in Section 9.3.2 as well as structure tree size dependent fitness manipulation and fixed size limits (partially with additional pruning). All these approaches have been tested using standard GP as well as extended GP including gender specific selection and offspring selection. As an example, we have tested these GP variants on the NOx data set II presented and described in Section 11.1.2.3; population diversity, formula complexity parameters, and additional pruning effort (only where pruning is applied, of course) have been monitored and are reported on here.
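The first of the approaches listed above, fixed limits on tree size and height, only requires the two complexity measures on structure trees. A minimal sketch (the `Node` type is hypothetical; the limit values 50 and 6 follow the settings used later in this section, and counting height in levels rather than edges is our assumption):

```python
class Node:
    """Minimal structure-tree node for measuring formula complexity."""
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)

def tree_size(node):
    """Number of nodes in the structure tree."""
    return 1 + sum(tree_size(c) for c in node.children)

def tree_height(node):
    """Number of levels of the structure tree (a single node has height 1)."""
    if not node.children:
        return 1
    return 1 + max(tree_height(c) for c in node.children)

def within_limits(node, max_size=50, max_height=6):
    """Fixed complexity limits as used by strategy (h) below."""
    return tree_size(node) <= max_size and tree_height(node) <= max_height
```

For example, the tree for x + y*z has size 5 and height 3 and therefore passes the default limits.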
We have again used 50% of the given data for training models (namely samples 10,000 – 28,000), 10% as validation data (samples 28,001 – 32,000, used by the pruning strategies), and ∼7.5% as test data (samples 32,001 – 35,000). As we are also aware of the problem of overfitting, we have systematically collected each GP run's best models with respect to best fit on training as well as on validation data (using the mse function for estimating the formulas' qualities). The algorithm is designed to optimize formulas with respect to training data; validation data are only used by pruning strategies (if at all). At the end of each test run, the models with best fit on training as well as on validation data are analyzed, and in order to fight overfitting we select the
best model on validation data as the result returned by the algorithm. Test data, which are not available to the algorithm, are used for demonstrating that this strategy is a reasonable one: Analyzing the evaluation of the best models on test data we see that those that are best on validation data perform better on test data than those that were optimally ﬁt to training data. During the GP process, the standard mean squared error function was used; the time series speciﬁc ﬁtness function considering plain values as well as diﬀerential and integral values was used for selecting those models that perform best on training and validation data. All three components (i.e., plain values, diﬀerentials, and integral values) have been weighted using equal weighting factors. When comparing the quality of the results documented in the following sections we again state the ﬁtness values calculated using the mean squared errors function.
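The bookkeeping described above, optimizing on training data while separately tracking the best-on-training and the best-on-validation models and returning the latter, can be sketched as follows (hypothetical names throughout: `evolve_step` and `evaluate_mse` are placeholders for the GP generation step and the mean squared error evaluation, not an actual API):

```python
def run_gp_with_validation(evolve_step, evaluate_mse, generations):
    """Track the best model on training AND on validation data during a run;
    return the best-on-validation model as the algorithm's result (the
    overfitting countermeasure described in the text)."""
    best_train = best_valid = None
    best_train_mse = best_valid_mse = float("inf")
    for _ in range(generations):
        for model in evolve_step():          # population of one generation
            mse_train = evaluate_mse(model, "training")
            if mse_train < best_train_mse:
                best_train, best_train_mse = model, mse_train
            mse_valid = evaluate_mse(model, "validation")
            if mse_valid < best_valid_mse:
                best_valid, best_valid_mse = model, mse_valid
    # result of the run is the best-on-validation model; the best-on-training
    # model is kept only for the comparison on test data
    return best_valid, best_train
```

Test data never enter this loop; they are only used afterwards to verify that the best-on-validation model indeed generalizes better.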
11.6.2 Test Strategies
In detail, the following test strategies have been applied: The parameters for standard and extended GP are summarized in Table 11.31, and the code growth prevention parameters in Table 11.32. In all tests the initial population was created using a size limit of 50 nodes and a maximum height of 6 levels for each structure tree.
Table 11.31: GP parameters used for code growth and bloat prevention tests.

Variant                  Parameters
1 (Standard GP, SGP)     Population size: 1000; 2000 generations;
                         single point crossover; structural and parametric node mutation;
                         parent selection: tournament selection (k = 3)
2 (Extended GP, EGP)     Population size: 1000;
                         single point crossover; structural and parametric node mutation;
                         parent selection: gender specific selection (random & proportional);
                         strict offspring selection (maximum selection pressure: 100)
In the following table and in the explanations given afterwards, md is the maximum deterioration limit and mc the maximum coefficient of deterioration and structure complexity reduction as described in Section 9.3.2. For ES-based pruning, mr denotes the maximum number of rounds, and mur the maximum number of unsuccessful rounds. In those tests including increased pruning (as applied in test series (h) and (i)) the initial pruning ratio is set to 0.3, i.e., in the beginning 30% of the population are pruned. Then, during process execution, the pruning rate steadily increases and finally reaches 0.8; in standard GP runs the rate is increased linearly, while in extended GP including offspring selection we compute the actual pruning ratio in relation to the actual selection pressure (so that in the end, when the selection pressure has reached its maximum value, the pruning rate has also reached its maximum, namely 0.8). Furthermore, fs stands for the formula's size (i.e., the number of nodes in the corresponding structure tree), and pf is the fitness punishment factor: If structure complexity based punishment is applied, then the fitness f of a model is modified to f' = f * (1 + pf) (if pf > 0).
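A minimal sketch of the two mechanisms just defined, the increasing pruning ratio and the complexity dependent fitness punishment of strategy (g). A single `progress` argument stands in for both scheduling variants, interpreted as generation ratio for standard GP or as the ratio of current to maximum selection pressure for GP with offspring selection (fitness is an error measure here, so larger means worse):

```python
def pruning_ratio(progress, initial=0.3, final=0.8):
    """Pruning rate schedule: starts at 30% of the population and rises to 80%.
    `progress` in [0, 1] is generation/max_generations (standard GP) or
    selection pressure / maximum selection pressure (GP with OS)."""
    p = min(max(progress, 0.0), 1.0)      # clamp to [0, 1]
    return initial + (final - initial) * p

def punished_fitness(f, formula_size, limit=50):
    """Structure-complexity punishment of strategy (g):
    pf = (fs - 50) / 50, applied as f' = f * (1 + pf) whenever pf > 0."""
    pf = (formula_size - limit) / limit
    return f * (1.0 + pf) if pf > 0 else f
```

For example, a formula with 75 nodes gets pf = 0.5, so its (error-based) fitness is worsened by 50%; formulas within the 50-node limit are left untouched.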
Table 11.32: Summary of the code growth prevention strategies applied in these test series.

Variant  Characteristics
a        No code growth prevention strategy
b        20% systematic pruning: md = 0, mc = 1
c        20% ES-based pruning: md = 0, mc = 1, λ = 5, mr = 5, mur = 1
d        50% ES-based pruning: md = 0.5, mc = 1, λ = 10, mr = 10, mur = 2
e        100% ES-based pruning: md = 2, mc = 1.5, λ = 20, mr = 10, mur = 2
f        Increasing ES-based pruning: md = 1, mc = 1.5, λ = 10, mr = 10, mur = 2
g        Quality punishment: pf = (fs − 50) / 50
h        Fixed limits: maximum tree height 6, maximum tree size 50
i        Fixed limits (maximum tree height 6, maximum tree size 50) combined with
         occasional ES-based pruning (standard GP: every 5th, extended GP: every
         2nd generation); md = 1, mc = 1, λ = 10, mr = 5, mur = 2
Please note that in strategies (b) and (c) pruning is done after each generation step, whereas in (d) – (g) it is done after each creation of a new model by crossover and / or mutation. In standard GP this does not make any diﬀerence, but when using oﬀspring selection the decision whether to prune after
each creation or after each generation has major effects on the algorithmic process. The mean squared errors function (with early stopping, see Section 9.2.5.5) was used here, since we mainly concentrate on aspects relevant to pruning and population dynamics. Furthermore, all variables (including the target variable) were linearly scaled to the interval [−100; +100].
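The linear scaling to [−100; +100] can be sketched as follows (a minimal sketch; mapping each variable's observed minimum and maximum to the interval bounds, and the handling of constant columns, are our assumptions):

```python
def scale_linear(values, lo=-100.0, hi=100.0):
    """Linearly map a variable's samples onto [lo, hi].
    Constant columns carry no information and are mapped to lo (assumption)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
```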
11.6.3 Test Results
Once again, all test strategies have been executed 5 times independently; formula complexity has been monitored (and recorded after each generation step), as has structural population diversity, which was recorded after every 10th generation. The multiplicative similarity approach (as defined in Equations 9.63 – 9.66) has again been chosen; all coefficients c1 . . . c10 were set to 0.2, except the coefficient weighting the level difference contribution d1, which was set to 0.8. The similarity of models was calculated symmetrically (as described in Equation 6.4).

11.6.3.1 No Formula Size Limitation
Exactly as we had expected, extreme code growth also occurs in GP-based structure identification; Figure 11.28 illustrates the progress of formula complexity in terms of formula size in exemplary test runs of series 1a and 2a: The average formula size is given as well as minimum and maximum values and the progress of the best individual's size. As we see here, formulas tend to grow very big rather quickly; when using offspring selection, this effect is even a bit more pronounced: On average, in standard GP the formula size reached 212.84 after 30 iterations; when using OS the average formula size was even higher after 30 generations (namely 276.35).

11.6.3.2 Light Pruning
FIGURE 11.28: Code growth in GP without applying size limits or complexity punishment strategies (left: standard GP, right: extended GP).

The results of test series (b) and (c) can be summarized in the following way: Without any further mechanisms that limit the structural complexity of formula trees, light pruning as described in strategies (b) and (c) is not an appropriate way to prevent GP from growing enormous formula structures. After 100 generations, the average formula size in standard GP has grown to 471.34 in test series (1b) and 333.65 in test runs of series (1c) (average standard deviations: 204.29 and 238.27, respectively); in extended GP the average formula size at generation 30 reached 293.26 and 276.12 in test runs (2b) and (2c), the respective standard deviations being 157.23 and 124.80. Systematically analyzing the results of the pruning phases performed in test runs (b) and (c), we can compare the performances of ES-based and systematic pruning. For this purpose we have collected the pruning performance statistics for tests (b) and (c) and summarize them in Table 11.33:
Table 11.33: Performance of systematic and ES-based pruning strategies.

Parameter                                      Systematic pruning   ES-based pruning
Solutions evaluated for pruning one solution   161.02               54.56
Runtime consumed (per iteration)               31.27 sec            12.23 sec
Average coefficient of deterioration and
reduction of structural complexity             0.2495               0.4053
Obviously, both pruning methods performed approximately equally well and were able to reduce the complexity of the formulas that were to be pruned. Additionally, we also see that, especially for bigger model structures, the runtime consumption is a lot higher when using systematic pruning; in the course of a GP process it is not considered necessary or even beneficial to reduce models as much as possible. Therefore we shall in the following
test runs concentrate on ES-based pruning phases. Thus, we suggest using systematic pruning as a preparation step for results analysis, but not during the execution of GP-based training processes.

11.6.3.3 Medium Pruning
Medium pruning, as applied in test series (d), is in fact able to reduce the size of the formulas stored in the GP populations signiﬁcantly.
Table 11.34: Formula size progress in test series (d).

Test series  Iteration    Formula size avg (std)
(1d)         20           21.83 (32.12)
             50           74.24 (111.84)
             100          123.67 (144.78)
             500          167.51 (156.89)
             2000         168.23 (147.56)
(2d)         10           10.77 (13.27)
             20           90.43 (52.79)
             50           228.02 (112.51)
             End of run   283.98 (172.33)
Table 11.35: Quality of results produced in test series (d); values are given as mean (standard deviation) for the models selected as best on training vs. best on validation data.

Test series  Evaluation data   Best on training data   Best on validation data
(1d)         Training data     1,178.13 (205.20)       8,231.38 (1,041.87)
             Validation data   17,962.78 (762.97)      15,850.49 (1,309.10)
             Test data         7,162.48 (690.10)       5,996.27 (927.09)
(2d)         Training data     1,823.43 (823.56)       6,005.74 (729.47)
             Validation data   14,590.83 (1,476.25)    10,506.30 (981.35)
             Test data         6,341.28 (770.42)       4,439.27 (918.72)
The best results obtained in the (d) test series are summarized in Table 11.35: For each test run we have collected the models with best ﬁt on training
data as well as those that perform best on validation data; average values are given as well as standard deviations. Obviously, rather strong overfitting has happened here; as we had expected, the production of very large formulas leads to overfit formulas that are not able to perform well on samples that were not used during the training phase.

11.6.3.4 Strong Pruning
Rather strong pruning was applied in test series (e), and as we see in Table 11.36, the formulas produced by GP are significantly smaller than those produced in the previous test series. Still, we observed that genetic diversity is lost very quickly: Already in early stages of the evolutionary processes, the average structural similarity of solutions reaches a very high level (documented in the two rightmost columns of Table 11.36). The quality of the best models produced is very bad (above 5,000), which is why we do not state any further details about the evaluation of these models on the given data partitions. We suppose that this low result quality is connected to the loss of population diversity (and, of course, also to the fact that the pruning operations applied were allowed to decrease the models' quality).
Table 11.36: Formula size and population diversity progress in test series (e).

Test series  Iteration    Formula size avg (std)   Solutions similarity avg (std)
(1e)         50           12.82 (15.76)            0.8912 (0.0912)
             100          18.27 (18.15)            0.9371 (0.0289)
             500          19.75 (23.52)            0.9685 (0.0187)
             2000         21.39 (20.87)            0.9891 (0.0095)
(2e)         10           15.77 (9.23)             0.9574 (0.0318)
             20           19.86 (10.83)            0.9825 (0.0247)
             50           21.64 (16.34)            0.9921 (0.0082)
             End of run   20.03 (18.27)            0.9943 (0.0093)
11.6.3.5 Increased Pruning
As light, medium, and strong pruning did not lead to the desired results, we have also tried increasing pruning as deﬁned in test strategy (f). As we see in Table 11.37, this strategy performs rather well: The size of the formulas produced by GP rises especially in early stages of the GP process, but then decreases and on average ﬁnally reaches values between 80 and 100. In addition to this, the population diversity stays higher in the beginning
than in GP tests including constantly strong pruning, but eventually decreases and the solutions ﬁnally show higher similarities due to the increased pruning in later algorithmic stages.
Table 11.37: Formula size and population diversity progress in test series (f).

Test series  Iteration    Formula size avg (std)   Solutions similarity avg (std)
(1f)         50           62.72 (95.76)            0.3674 (0.0943)
             100          91.27 (130.77)           0.3897 (0.1059)
             500          92.43 (107.41)           0.6820 (0.1124)
             2000         87.02 (90.68)            0.8035 (0.0861)
(2f)         10           40.78 (31.47)            0.5235 (0.0612)
             20           63.59 (59.34)            0.7052 (0.0803)
             50           80.26 (40.99)            0.9450 (0.0588)
             End of run   79.45 (47.67)            0.9967 (0.0156)
The quality values of the results produced in this test series are summarized in Table 11.38. Obviously, less overﬁtting has happened than in the tests with light or medium pruning.
Table 11.38: Quality of results produced in test series (f); values are given as mean (standard deviation) for the models selected as best on training vs. best on validation data.

Test series  Evaluation data   Best on training data   Best on validation data
(1f)         Training data     2,597.35 (542.04)       7,781.28 (827.83)
             Validation data   8,904.91 (611.02)       5,981.52 (974.31)
             Test data         3,786.51 (800.38)       2,830.78 (427.08)
(2f)         Training data     2,275.24 (649.11)       3,814.93 (850.89)
             Validation data   9,712.98 (767.56)       5,862.62 (518.53)
             Test data         4,912.38 (1,198.58)     2,275.03 (931.62)
11.6.3.6 Complexity Dependent Quality Punishment
In fact, our GP test runs including complexity dependent quality punishment, i.e., those of test strategy (g), were also able to produce acceptable
results for the NOx data set investigated here. As we see in Table 11.39, in standard GP the formula sizes are rather high in the beginning and then decrease steadily, whereas in GP with offspring selection the models on average comprise between 50 and 60 nodes during the whole execution of the GP processes. Population diversity values are comparable to those reported for GP tests without pruning or quality dependent punishment, as summarized for example in Section 11.4. Figure 11.29 illustrates the formula complexity progress of an exemplary GP run of test series (2g). The qualities of the models with best fit on training and validation data are summarized in Table 11.40.
Table 11.39: Formula size and population diversity progress in test series (g).

Test series  Iteration    Formula size avg (std)   Solutions similarity avg (std)
(1g)         50           140.76 (90.75)           0.3824 (0.0534)
             100          92.62 (71.23)            0.3916 (0.0620)
             500          73.73 (64.99)            0.6381 (0.0825)
             2000         79.07 (47.61)            0.7202 (0.0696)
(2g)         10           50.24 (64.67)            0.4873 (0.0836)
             20           60.71 (59.01)            0.5412 (0.0741)
             50           65.34 (48.33)            0.8904 (0.0852)
             End of run   58.82 (41.87)            0.9315 (0.0423)
Table 11.40: Quality of results produced in test series (g); values are given as mean (standard deviation) for the models selected as best on training vs. best on validation data.

Test series  Evaluation data   Best on training data   Best on validation data
(1g)         Training data     1,837.84 (526.10)       4,729.42 (480.36)
             Validation data   12,902.67 (767.35)      4,531.73 (588.30)
             Test data         2,597.73 (835.41)       2,708.36 (825.64)
(2g)         Training data     1,402.19 (593.84)       3,121.86 (773.91)
             Validation data   9,345.87 (738.60)       3,949.64 (962.03)
             Test data         2,853.62 (812.51)       2,618.94 (664.07)
FIGURE 11.29: Progress of formula complexity in one of the test runs of series (1g), shown for the ﬁrst ∼400 iterations.
11.6.3.7 Fixed Size Limits
In the case of fixed size limits, the crossover and mutation operators have to respect limits for the complexity of models. Model size and population diversity statistics for test series (h) are summarized in Table 11.41; in GP with offspring selection all formulas eventually reach the maximum size, and the solutions similarity values are comparable to those reported in Section 11.4. Table 11.42 summarizes the quality of the results produced, again evaluated on training, validation, and test data. Figure 11.30 illustrates the formula complexity progress of exemplary GP test runs of series (1h) and (2h).
Table 11.41: Formula size and population diversity progress in test series (h).

Test series  Iteration    Formula size avg (std)   Solutions similarity avg (std)
(1h)         50           37.4182 (6.3174)         0.4151 (0.0935)
             100          40.2866 (5.8133)         0.7231 (0.0729)
             500          41.7823 (4.3973)         0.8175 (0.0326)
             2000         44.2108 (5.0450)         0.8629 (0.0271)
(2h)         10           22.4965 (8.3763)         0.3973 (0.0386)
             20           27.6203 (4.2514)         0.6022 (0.0493)
             50           44.9120 (6.4871)         0.8907 (0.0371)
             End of run   50.0000 (0.0000)         0.9751 (0.0189)
Table 11.42: Quality of results produced in test series (h); values are given as mean (standard deviation) for the models selected as best on training vs. best on validation data.

Test series  Evaluation data   Best on training data   Best on validation data
(1h)         Training data     1,774.94 (300.51)       4,168.30 (1,186.62)
             Validation data   10,801.77 (923.04)      4,248.37 (858.02)
             Test data         5,791.25 (1,266.51)     2,610.64 (930.44)
(2h)         Training data     1,568.12 (382.04)       3,083.64 (502.75)
             Validation data   9,641.89 (833.71)       3,738.13 (504.89)
             Test data         4,802.30 (1,371.22)     1,374.61 (704.73)
FIGURE 11.30: Progress of formula complexity in one of the test runs of series (1h) (shown left) and one of series (2h) (shown right).
In addition to total statistics we shall also discuss two selected models returned by one of the test runs of series (2h): Model bt is the model that performs best on training data (shown in Figure 11.31), bv the one that performs best on validation data (shown in Figure 11.32). The error distributions on training, validation, and test data partitions are illustrated in Figure 11.33.
Table 11.43 characterizes the performance of bt and bv by means of mean squared errors as well as the integral values. For this we have calculated the sum of the target values on training, validation, and test data and compared these integral values to those calculated using the models under investigation. Obviously, bt shows a better integral ﬁt on training (and also validation) data, but when it comes to test data, the model that performed best on validation data (bv ) produces much more satisfying results (with an integral error of only 2.354% on test data).
Table 11.43: Comparison of the best models on training and validation data (bt and bv, respectively).

                                        bt                        bv
Training quality (MSE)                  1,434.65                  2,253.62
Validation quality (MSE)                9,187.53                  3,748.61
Test quality (MSE)                      2,936.40                  1,461.95
Target training values integral         6.010 * 10^6
Estimated training values integral      6.037 * 10^6 (−0.452%)    6.084 * 10^6 (+1.220%)
Target validation values integral       4.660 * 10^5
Estimated validation values integral    4.620 * 10^5 (+0.872%)    4.517 * 10^5 (+3.173%)
Target test values integral             3.978 * 10^5
Estimated test values integral          3.198 * 10^5 (+24.395%)   3.886 * 10^5 (+2.354%)
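The integral comparison in Table 11.43 boils down to comparing the sums of target and estimated values on a partition. A sketch (the percentage convention, deviation of the target sum relative to the estimated sum, is our reading of the printed numbers, e.g. the +24.395% entry for bt on test data):

```python
def integral_error(target, estimated):
    """Integral fit as reported in Table 11.43: sum both series and give the
    deviation of the target sum relative to the estimated sum, in percent
    (sign/denominator convention assumed from the printed values)."""
    t = sum(target)
    e = sum(estimated)
    return t, e, 100.0 * (t - e) / e
```

For bt on test data this reproduces (3.978e5 − 3.198e5) / 3.198e5 ≈ 24.4%, matching the table within rounding of the displayed values.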
11.6.3.8 Fixed Size Limits and Occasional Pruning
Finally, test series with ﬁxed size limits and occasional pruning have also been executed and analyzed; the results regarding formula complexity, population diversity, and results qualities are summarized in Tables 11.44 and 11.45.
Obviously, the results produced are (with respect to evaluation quality) comparable to those produced in the previous series. Still, the formula sizes are of course somewhat smaller (due to pruning), and overfitting also seems to have decreased: Even though the fit on training data is not as good as in previous test series, the quality on test data is still very good and comparable to the test performance reached in test series (g) and (h).

FIGURE 11.31: Model with best fit on training data: Model structure and full evaluation.

FIGURE 11.32: Model with best fit on validation data: Model structure and full evaluation.
11.6.4 Conclusion
In this section we have demonstrated the effects of code bloat and selected prevention strategies for GP. As expected and known from the literature, without any limitations or size-reducing strategies GP tends to produce ever bigger models that fit the given training data, which also increases the probability of producing overfit models. Pruning strategies have been analyzed, and the test results show that only strong pruning is able to prevent GP from producing bigger and bigger models; this, however, decreases population diversity and leads to results which are not optimal. Complexity dependent fitness punishment as well as fixed size limits enable GP to produce quite good results; occasional pruning in combination with fixed size limits can help to decrease overfitting.

FIGURE 11.33: Errors distributions of best models: Charts I, II, and III show the errors distributions of the model with best fit on training data evaluated on training, validation, and test data, respectively; charts IV, V, and VI show the errors distributions of the model with best fit on validation data evaluated on training, validation, and test data, respectively.
Table 11.44: Formula size and population diversity progress in test series (i).

Test series  Iteration    Formula size avg (std)   Solutions similarity avg (std)
(1i)         50           34.8365 (6.1534)         0.4682 (0.0852)
             100          37.1863 (4.9901)         0.7413 (0.0711)
             500          39.2217 (5.2673)         0.8388 (0.0450)
             2000         40.1260 (4.9724)         0.8992 (0.0251)
(2i)         10           18.5330 (6.6114)         0.4307 (0.0518)
             20           21.5286 (5.3083)         0.7202 (0.0772)
             50           38.5143 (5.6305)         0.9248 (0.0403)
             End of run   48.2051 (4.6228)         0.9859 (0.0178)
Table 11.45: Quality of results produced in test series (i); values are given as mean (standard deviation) for the models selected as best on training vs. best on validation data.

Test series  Evaluation data   Best on training data   Best on validation data
(1i)         Training data     2,258.22 (561.27)       5,869.40 (1,233.09)
             Validation data   6,608.26 (1,463.49)     4,819.26 (730.51)
             Test data         2,238.61 (983.57)       1,811.05 (834.83)
(2i)         Training data     1,723.07 (623.11)       4,209.57 (499.89)
             Validation data   6,361.46 (921.26)       3,607.13 (736.05)
             Test data         3,289.33 (945.79)       1,434.63 (739.22)
Conclusion and Outlook
In this book we have discussed basic principles as well as algorithmic improvements in the context of genetic algorithms (GAs) and genetic programming (GP); new problem independent theoretical concepts have been described which are used in order to substantially increase achievable solution qualities. The application of these concepts to signiﬁcant combinatorial optimization problems as well as structure identiﬁcation in time series analysis and classiﬁcation has also been described.
We have presented enhanced concepts for GAs which enable a self-adaptive interplay of selection and solution manipulation operators. By using these concepts we want to avoid the disappearance, and support the combination, of alleles from the gene pool that represent solution properties of highly fit individuals (introduced as relevant genetic information). As we have shown in several test series, relevant genetic information is often lost in conventional implementations of GAs and GP; if this happens, it can only be reintroduced into the population's gene pool by mutation. This dependence on mutation can be reduced by using generic selection principles such as offspring selection (which is also used in the SASEGASA) or self-adaptive population size adjustment (as used by the RAPGA). The survival of essential genetic information by supporting the survival of relevant alleles, rather than the survival of above-average chromosomes, is the main goal of both these approaches.
In the empirical part of this book we have documented and discussed our experiences in applying these new algorithmic concepts to benchmark as well as real world problems. Concretely, we have used traveling salesman problems as well as vehicle routing problems as representatives of combinatorial optimization problems; time series and classiﬁcation analysis problems have been used as application areas of databased structure identiﬁcation with genetic programming. We have compared the results achievable with standard implementations of GAs and GP to the results achieved using extended algorithmic concepts that do not depend on a concrete problem representation and its operators; the inﬂuences of the new concepts on population dynamics in GA and GP populations have also been analyzed.
Nevertheless, there is still a lot of work to be done in the context of the research areas we have dealt with in this book. Furthermore, there are a lot of potential synergies which should be explored.

• The most important aspect is the following one: As the enhanced algorithmic concepts discussed in this book are problem independent, they can be applied to any kind of optimization problem which can be tackled by a GA or GP. Of course, there are numerous kinds of optimization problems besides traveling salesman and vehicle routing problems which can be solved successfully by genetic algorithms; regarding GP we have up to now more or less only gained experience in using offspring selection in the context of data-based modeling, but there is a huge variety of other problems that should also be tackled using the approaches discussed in this book.

• HeuristicLab (HL) is our environment for developing and testing optimization methods, tuning parameters, and solving a multitude of problems. The development of HL was started in 2002 and has meanwhile led to a stable and productive optimization platform; it is continuously enhanced and the topic of several publications ([WA04c], [WA04a], [WA04b], [WA05a], and [WWB+07]). On the respective website [13] the interested reader can find information about the design of HeuristicLab, its development over the years, installable software packages, documentation, and publications in the context of HeuristicLab and the research group HEAL [14]. One of the most beneficial aspects of HeuristicLab is its plugin-based architecture. In software engineering in general, plugin-based software systems have become very popular; by not only splitting the source code into different modules, but compiling these modules into enclosed ready-to-use software building blocks, the development of a whole application or complex software system is reduced to the task of selecting, combining, and distributing the appropriate modules.
Due to the dynamic locating and loading techniques offered by modern application frameworks such as Java or .NET, the modules do not need to be statically linked during compilation, but can be loaded dynamically at runtime. Thus, the core application can be enriched by adding these building blocks, which are called "plugins" because they are additionally plugged into the program. Several problem representations, solution encodings, and numerous algorithmic concepts have so far been developed for HeuristicLab, realizing a large number of heuristic and evolutionary algorithms (genetic algorithms, genetic programming, evolution strategies, tabu search, etc.) for

[13] http://www.heuristiclab.com
[14] Heuristic and Evolutionary Algorithms Laboratory, Linz / Hagenberg, Austria.
© 2009 by Taylor & Francis Group, LLC
Conclusion and Outlook
a wide range of problem classes including the traveling salesman problem, the vehicle routing problem, real-valued test functions in different dimensions, and, last but not least, data-based modeling. Not only is the software platform itself flexible and extensible; the algorithms provided in HL are also (since version 2.0) not fixed and hard-coded, but can be parameterized and even designed by the user. This is made possible by realizing all solution generating and processing procedures as operators that work on single solutions or sets of solutions. By providing a set of plugins, each realizing a specific solution representation or operation, the process of developing new heuristic algorithms is revolutionized: algorithms no longer need to be programmed, but can be created by combining operators of different plugins. This approach has a huge advantage: since a graphical user interface is provided for selecting plugins and combining operators, no programming or software engineering skills are necessary for this process. As a consequence, algorithms can be modified, tuned, or developed by experts of different fields with little or even no knowledge of software development.
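The dynamic plugin loading described above can be sketched in a few lines. This is a minimal illustration of the principle only: HeuristicLab itself is a .NET application, and the `PluginRegistry` class and the use of a standard-library module as a stand-in plugin are our own assumptions, not its actual API.

```python
# Minimal sketch of a plugin-style architecture: modules are discovered and
# loaded at runtime instead of being statically linked at compile time.
import importlib

class PluginRegistry:
    """Core application component that loads plugin modules at runtime."""

    def __init__(self):
        self._plugins = {}

    def load(self, module_name):
        # Dynamic import, analogous to runtime assembly loading in .NET
        # or class loading in Java.
        module = importlib.import_module(module_name)
        self._plugins[module_name] = module
        return module

    def get(self, module_name):
        return self._plugins[module_name]

# Usage: the core application only decides *which* modules to load;
# the standard-library "math" module stands in for a real plugin here.
registry = PluginRegistry()
plugin = registry.load("math")
print(plugin.sqrt(16.0))  # prints 4.0
```

The core application thus stays small; enriching it with a new solution encoding or operator amounts to dropping in another module and loading it by name.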
FIGURE 11.34: A simple workbench in HeuristicLab 2.0.
In Figure 11.34 we show a screenshot of a simple HeuristicLab workbench (version HL 2.0): a structure identification problem is solved by a GP algorithm using offspring selection. All relevant parts of the algorithm (such as population initialization, crossover, generational replacement, and offspring selection) can be seen in the left part of the workbench GUI; these parts can easily be rearranged or replaced by users who are not necessarily experts in heuristic optimization or even
computer science. In this way we want to transfer algorithm development competence from experts in heuristic optimization to users working on concrete applications; users who work in domains other than heuristic optimization will thus no longer have to use heuristics as black box techniques (as is frequently done nowadays), but can treat them as algorithms that can be modified and easily tuned to specific problem situations.
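The operator-combination approach described above, in which an algorithm is merely a configured sequence of interchangeable operators acting on a population, can be sketched as follows. This is a toy example on the OneMax problem with illustrative operator names of our own choosing, not HeuristicLab code.

```python
import random

# Each step of the evolutionary algorithm is an interchangeable operator;
# the "algorithm" is just a configured composition of such operators.

def init_population(size, length):
    return [[random.randint(0, 1) for _ in range(length)] for _ in range(size)]

def one_point_crossover(parent1, parent2):
    point = random.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]

def fitness(individual):
    # Toy OneMax fitness: count of 1-bits.
    return sum(individual)

def tournament_selection(population, k=2):
    return max(random.sample(population, k), key=fitness)

def generation_step(population):
    # One generation assembled from operators: selection + crossover.
    return [one_point_crossover(tournament_selection(population),
                                tournament_selection(population))
            for _ in range(len(population))]

random.seed(0)
pop = init_population(size=20, length=16)
for _ in range(30):
    pop = generation_step(pop)
best = max(fitness(ind) for ind in pop)
print(best)  # fitness rises toward the optimum of 16
```

Swapping `tournament_selection` for another selection operator, or `one_point_crossover` for another crossover, changes the algorithm without touching the surrounding loop; the GUI-configurable operator graph in HL 2.0 follows the same principle.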
• One of our current research interests is to combine agent-based software development techniques with heuristic optimization methods. Here again genetic programming is one of the fields that would profit from the intrinsic parallelization of software agents and, at the same time, improve the quality and expressiveness of the models found. Agents could be programmed to identify different variables in the given data sets and to examine a broader range of correlations. Each of these agents represents a GP process evolving a population of formulas (models); at given synchronization points, these agents exchange information with each other. Unlike other parallel GP approaches, in which parts of populations are exchanged and all subpopulations in principle pursue the same goal (namely to solve a given identification task), we want to establish an information exchange mechanism by which partial information about relationships in the data is passed and shared among the identification agents. Probably the most important goal of such a parallel GP approach is to develop an automated mechanism that can identify not only singular relationships in data, but whole information networks describing the many relationships that can be found. This incorporates the use of GP agents that aim to identify models for different target variables. It should thus become possible to identify classes of equivalent models that differ only in the way certain input variables are described; these results will hopefully help to answer one of the most important questions in system identification, namely which of the potential models are best suited for further theoretical analyses. We hope that one of the results of this book will be an increased interest in population dynamics analysis as well as in generic algorithmic developments such as enhanced selection methods for evolutionary algorithms.
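The exchange mechanism proposed in the bullet above can be sketched in miniature. This is purely our own illustration of the idea (the `GPAgent` class and its methods are hypothetical): instead of migrating individuals between subpopulations, each agent publishes which input variables its models have found informative, and other agents bias their search toward those variables.

```python
import random

# Hypothetical sketch: GP "agents" evolving models for different targets
# share partial knowledge (relevant variables) at synchronization points,
# rather than exchanging individuals as in classical island models.

class GPAgent:
    def __init__(self, target, candidate_vars):
        self.target = target                # target variable this agent models
        self.candidate_vars = list(candidate_vars)
        self.preferred = set()              # hints received from other agents

    def evolve_step(self):
        # Stand-in for a real GP generation: pick a variable to focus on,
        # favoring variables other agents already found informative.
        pool = list(self.preferred & set(self.candidate_vars)) or self.candidate_vars
        return random.choice(pool)

    def receive_hint(self, variables):
        self.preferred |= set(variables)

agents = [GPAgent(t, ["x1", "x2", "x3"]) for t in ("y1", "y2")]
# Synchronization point: the first agent shares partial knowledge.
agents[1].receive_hint({"x2"})
print(agents[1].evolve_step())  # prints x2
```

What is passed here is metadata about relationships in the data rather than genetic material, which is the essential difference to conventional parallel GP.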
By showing the general applicability of enhanced selection concepts in GAs applied to combinatorial problems as well as in GP, we hope to have inspired readers to apply these concepts to other problems and to include them in other variants of evolutionary algorithms.
Symbols and Abbreviations
Symbol          Description

ANN             Artificial neural network
AUC             Area under a ROC curve
CGA             Canonical genetic algorithm
(C)VRP(TW)      (Capacitated) vehicle routing problem (with time windows)
CX              Cyclic crossover for the TSP
EA              Evolutionary algorithm
ERX             Edge recombination crossover for the TSP
ES              Evolution strategy
GA              Genetic algorithm
GP              Genetic programming
HL              HeuristicLab
kNN             k-nearest neighbor algorithm
NOx             Nitric oxides
OS              Offspring selection
OX              Order crossover for the TSP
PMX             Partially matched crossover for the TSP
RAPGA           Relevant alleles preserving genetic algorithm
RBX             Route-based crossover for the CVRP
ROC             Receiver operating characteristic
SASEGASA        Self-adaptive segregative genetic algorithm including aspects of simulated annealing
SBX             Sequence-based crossover for the CVRP
SGA             Standard genetic algorithm
TS              Tabu search
TSP             Traveling salesman problem
References
[AA05] W. Ashlock and D. Ashlock. Single parent genetic programming. In D. Corne et al., editors, Proceedings of the 2005 IEEE Congress on Evolutionary Computation, volume 2, pages 1172–1179, Edinburgh, UK, 2–5 September 2005. IEEE Press.
[AD95] J. Antes and U. Derigs. A new parallel tour algorithm for the vehicle routing problem with time windows. Technical report, Institut für Wirtschaftsinformatik und Operations Research, Universität Köln, 1995.
[AD04] E. Alba and B. Dorronsoro. Solving the vehicle routing problem by using cellular genetic algorithms. In J. Gottlieb and G. R. Raidl, editors, Evolutionary Computation in Combinatorial Optimization, volume 3004 of Lecture Notes in Computer Science, pages 11–20, Coimbra, Portugal, 2004. Springer.
[AD06] E. Alba and B. Dorronsoro. Computing nine new best-so-far solutions for capacitated VRP with a cellular genetic algorithm. Information Processing Letters, 98(6):225–230, June 2006.
[AdRWL05] D. Alberer, L. del Re, S. Winkler, and P. Langthaler. Virtual sensor design of particulate and nitric oxide emissions in a DI diesel engine. In Proceedings of the 7th International Conference on Engines for Automobile ICE 2005, number 2005-24-063, 2005.
[Aff01a] M. Affenzeller. A new approach to evolutionary computation: Segregative genetic algorithms (SEGA). In J. Mira and A. Prieto, editors, Connectionist Models of Neurons, Learning Processes, and Artificial Intelligence, volume 2084 of Lecture Notes in Computer Science, pages 594–601. Springer, 2001.
[Aff01b] M. Affenzeller. Segregative genetic algorithms (SEGA): A hybrid superstructure upwards compatible to genetic algorithms for retarding premature convergence. International Journal of Computers, Systems and Signals (IJCSS), 2(1):18–32, 2001.
[Aff01c] M. Affenzeller. Transferring the concept of selective pressure from evolutionary strategies to genetic algorithms. In Z. Bubnicki and A. Grzech, editors, Proceedings of the 14th International Conference on Systems Science, volume 2, pages 346–353. Oficyna Wydawnicza Politechniki Wroclawskiej, 2001.
[Aff02] M. Affenzeller. New variants of genetic algorithms applied to problems of combinatorial optimization. In R. Trappl, editor, Cybernetics and Systems 2002, volume 1, pages 75–80. Austrian Society for Cybernetic Studies, 2002.
[Aff03] M. Affenzeller. New Hybrid Variants of Genetic Algorithms: Theoretical and Practical Aspects. Schriften der Johannes Kepler Universität Linz. Universitätsverlag Rudolf Trauner, 2003.
[Aff05] M. Affenzeller. Population Genetics and Evolutionary Computation: Theoretical and Practical Aspects. Trauner Verlag, 2005.
[AK89] E. Aarts and J. Korst. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. John Wiley and Sons, 1989.
[Alb05] E. Alba. Parallel Metaheuristics: A New Class of Algorithms. Wiley Interscience, 2005.
[Alt94a] L. Altenberg. Emergent phenomena in genetic programming. In A. Sebald and L. Fogel, editors, Evolutionary Programming — Proceedings of the Third Annual Conference, pages 233–241, San Diego, CA, USA, 24–26 February 1994. World Scientific Publishing.
[Alt94b] L. Altenberg. The Schema Theorem and Price's Theorem. In D. Whitley and M. Vose, editors, Foundations of Genetic Algorithms 3, pages 23–49, Estes Park, Colorado, USA, 31 July–2 August 1994. Morgan Kaufmann. Published 1995.
[And71] T. W. Anderson. The Statistical Analysis of Time Series. Wiley, 1971.
[And76] O. D. Anderson. Time Series Analysis and Forecasting: The Box-Jenkins Approach. Butterworth, 1976.
[Ang93] P. J. Angeline. Evolutionary Algorithms and Emergent Intelligence. PhD thesis, Ohio State University, 1993.
[Ang94] P. J. Angeline. Genetic programming and emergent intelligence. In K. E. Kinnear, Jr., editor, Advances in Genetic Programming, chapter 4, pages 75–98. MIT Press, 1994.
[Ang98] P. J. Angeline.
Subtree crossover causes bloat. In J. R. Koza et al., editors, Genetic Programming 1998: Proceedings of the
Third Annual Conference, pages 745–752, University of Wisconsin, Madison, Wisconsin, USA, 22–25 July 1998. Morgan Kaufmann.
[AT99] E. Alba and J. M. Troya. A survey of parallel distributed genetic algorithms. Complexity (USA), 4(4):31–52, 1999.
[AW03] M. Affenzeller and S. Wagner. SASEGASA: An evolutionary algorithm for retarding premature convergence by self-adaptive selection pressure steering. In J. Mira and J. R. Alvarez, editors, Computational Methods in Neural Modeling, volume 2686 of Lecture Notes in Computer Science, pages 438–445. Springer, 2003.
[AW04a] M. Affenzeller and S. Wagner. Reconsidering the selection concept of genetic algorithms from a population genetics inspired point of view. In R. Trappl, editor, Cybernetics and Systems 2004, volume 2, pages 701–706. Austrian Society for Cybernetic Studies, 2004.
[AW04b] M. Affenzeller and S. Wagner. SASEGASA: A new generic parallel evolutionary algorithm for achieving highest quality results. Journal of Heuristics – Special Issue on New Advances on Parallel Meta-Heuristics for Complex Problems, 10:239–263, 2004.
[AW05] M. Affenzeller and S. Wagner. Offspring selection: A new self-adaptive selection scheme for genetic algorithms. In B. Ribeiro, R. F. Albrecht, A. Dobnikar, D. W. Pearson, and N. C. Steele, editors, Adaptive and Natural Computing Algorithms, Springer Computer Science, pages 218–221. Springer, 2005.
[AWW07] M. Affenzeller, S. Wagner, and S. Winkler. Self-adaptive population size adjustment for genetic algorithms. In Proceedings of Computer Aided Systems Theory: EuroCAST 2007, Lecture Notes in Computer Science, pages 820–828. Springer, 2007.
[AWW08] M. Affenzeller, S. Winkler, and S. Wagner. Advances in Evolutionary Algorithms, chapter Evolutionary Systems Identification: New Algorithmic Concepts and Applications. Artificial Intelligence. I-Tech, 2008.
[B+05] H.-G. Beyer et al., editors. GECCO 2005: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, Washington DC, USA, 25–29 June 2005. ACM Press.
[Bal93] N. Balakrishnan. Simple heuristics for the vehicle routing problem with soft time windows. Journal of the Operational Research Society, 44(3):279–287, 1993.
[BB94] M. Bohanec and I. Bratko. Trading accuracy for simplicity in decision trees. Machine Learning, 15:223–250, 1994.
[BD91] P. J. Brockwell and R. A. Davis. Time Series: Theory and Methods. Springer, 1991.
[BD96] P. J. Brockwell and R. A. Davis. A First Course in Time Series Analysis. Springer, 1996.
[Ber98] M. Bergmann. Tourenplanung mit Zeitfenstern – ein Überblick. Shaker Verlag, 1998.
[BES01] E. Bradley, M. Easley, and R. Stolle. Reasoning about nonlinear system identification. Artificial Intelligence, 133:139–188, December 2001.
[Bey01] H.-G. Beyer. The Theory of Evolution Strategies. Springer, 2001.
[BG05a] O. Bräysy and M. Gendreau. Vehicle routing problem with time windows, Part I: Route construction and local search algorithms. Transportation Science, 39(1):104–118, 2005.
[BG05b] O. Bräysy and M. Gendreau. Vehicle routing problem with time windows, Part II: Metaheuristics. Transportation Science, 39(1):119–139, 2005.
[BGK04] E. K. Burke, S. Gustafson, and G. Kendall. Diversity in genetic programming: An analysis of measures and correlation with fitness. IEEE Transactions on Evolutionary Computation, 8(1):47–62, 2004.
[BH74] M. Bellmore and S. Hong. Transformation of multisalesman problem to the standard traveling salesman problem. Journal of the Association for Computing Machinery, 21:500–504, 1974.
[BJ76] G. E. P. Box and G. M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, 1976.
[BJN+98] C. Barnhart, E. L. Johnson, G. L. Nemhauser, M. W. P. Savelsbergh, and P. H. Vance. Branch-and-price: Column generation for solving huge integer programs. Mathematical Programming: State of the Art, 46(3):316–329, 1998.
[BL04] W. Banzhaf and C. W. G. Lasarczyk. Genetic programming of an algorithmic chemistry. In U.-M. O'Reilly, T. Yu, R. L. Riolo, and B. Worzel, editors, Genetic Programming Theory and Practice II, chapter 11, pages 175–190. Springer, Ann Arbor, 13–15 May 2004.
[BMR93] L. Bianco, A. Mingozzi, and S. Ricciardelli. The traveling salesman problem with cumulative costs. Networks: An International Journal, 23:81–91, 1993.
[BNKF98] W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Programming – An Introduction: On the Automatic Evolution of Computer Programs and its Applications. Morgan Kaufmann, San Francisco, CA, USA, January 1998.
[Boc58] F. Bock. An algorithm for solving traveling-salesman and related network optimization problems. 14th National Meeting of the ORSA, 1958.
[BR03] C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys, 35(3):268–308, 2003.
[Bra97] A. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30:1145–1159, 1997.
[BU95] C. E. Brodley and P. E. Utgoff. Multivariate decision trees. Machine Learning, 19(1):45–77, 1995.
[Car94] H. Cartwright. Getting the timing right – the use of genetic algorithms in scheduling. Proceedings of the Adaptive Computing and Information Processing Conference, pages 393–411, 1994.
[Cav75] D. Cavicchio. Adaptive Search Using Simulated Evolution. PhD thesis, University of Michigan, 1975.
[CE69] N. Christofides and S. Eilon. An algorithm for the vehicle dispatching problem. Operational Research Quarterly, 20:309–318, 1969.
[CGT99] R. Cheng, M. Gen, and Y. Tsujimura. A tutorial survey of job-shop scheduling problems using genetic algorithms. Part II: Hybrid genetic search strategies. Computers & Industrial Engineering, 37(1-2):51–55, 1999.
[Cha01] C. Chatfield, editor. Time Series and Forecasting. Chapman and Hall, 2001.
[CJ91a] R. J. Collins and D. R. Jefferson. AntFarm: Towards simulated evolution. In C. G. Langton, C. Taylor, J. Doyne Farmer, and S. Rasmussen, editors, Artificial Life II, pages 579–601. Addison-Wesley, Redwood City, CA, 1991.
[CJ91b] R. J. Collins and D. R. Jefferson. Representations for artificial organisms. In Jean-Arcady Meyer and Stewart W. Wilson, editors, Proceedings of the First International Conference on
Simulation of Adaptive Behavior: From Animals to Animats, pages 382–390. MIT Press, 1991.
[CL98] T. G. Crainic and G. Laporte. Fleet Management and Logistics. Kluwer, 1998.
[CMT81] N. Christofides, A. Mingozzi, and P. Toth. Exact algorithms for solving the vehicle routing problem based on spanning trees and shortest path relaxations. Mathematical Programming, 20(3):255–282, 1981.
[CO07] S. Christensen and F. Oppacher. Solving the artificial ant on the Santa Fe trail problem in 20,696 fitness evaluations. In D. Thierens et al., editors, GECCO '07: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, volume 2, pages 1574–1579, London, 7–11 July 2007. ACM Press.
[CP80] H. P. Crowder and M. Padberg. Solving large-scale symmetric travelling salesman problems to optimality. Management Science, 26:495–509, 1980.
[CP94] W. Sam Chung and R. A. Perez. The schema theorem considered insufficient. Proceedings of the Sixth IEEE International Conference on Tools with Artificial Intelligence, pages 748–751, 1994.
[CP97] E. Cantú-Paz. A survey of parallel genetic algorithms. Technical Report IlliGAL 97003, University of Illinois at Urbana-Champaign, 1997.
[CP01] E. Cantú-Paz. Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publishers, 2001.
[CP+03a] E. Cantú-Paz et al., editors. Genetic and Evolutionary Computation – GECCO 2003, Part I, volume 2723 of Lecture Notes in Computer Science, Chicago, IL, USA, 12–16 July 2003. Springer.
[CP+03b] E. Cantú-Paz et al., editors. Genetic and Evolutionary Computation – GECCO 2003, Part II, volume 2724 of Lecture Notes in Computer Science. Springer, 12–16 July 2003.
[CPG99] E. Cantú-Paz and D. E. Goldberg. On the scalability of parallel genetic algorithms. Evolutionary Computation, 7(4):429–449, 1999.
[Cra85] N. L. Cramer. A representation for the adaptive generation of simple sequential programs.
International Conference on Genetic Algorithms and Their Applications (ICGA85), pages 183–187, 1985.
[Cro58] G. Croes. A method for solving travelling-salesman problems. Operations Research, 6:791–812, 1958.
[CTE+06] P. Collet, M. Tomassini, M. Ebner, S. Gustafson, and A. Ekárt, editors. Proceedings of the 9th European Conference on Genetic Programming, volume 3905 of Lecture Notes in Computer Science, Budapest, Hungary, 10–12 April 2006. Springer.
[CW64] G. Clarke and J. Wright. Scheduling of vehicles from a central depot to a number of delivery points. Operations Research, 12:568–581, 1964.
[D+04a] K. Deb et al., editors. Genetic and Evolutionary Computation – GECCO 2004, Part I, volume 3102 of Lecture Notes in Computer Science, Seattle, WA, USA, 26–30 June 2004. Springer-Verlag.
[D+04b] K. Deb et al., editors. Genetic and Evolutionary Computation – GECCO 2004, Part II, volume 3103 of Lecture Notes in Computer Science, Seattle, WA, USA, 26–30 June 2004. Springer-Verlag.
[DAG01] W. Duch, R. Adamczak, and K. Grabczewski. A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks, 12:277–306, 2001.
[Dar98] C. Darwin. The Origin of Species. Wordsworth Classics of World Literature. Wordsworth Editions Limited, 1998.
[Dav85] L. Davis. Applying adaptive algorithms to epistatic domains. In Proceedings of the International Joint Conference on Artificial Intelligence, 1985.
[Dei04] M. Deistler. System identification and time series analysis: Past, present, and future. In Stochastic Theory and Control: Proceedings of a Workshop held in Lawrence, Kansas, Lecture Notes in Control and Information Sciences, pages 97–110. Springer Berlin / Heidelberg, 2004.
[DeJ75] K. A. DeJong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, University of Michigan, 1975.
[DG89] K. Deb and D. E. Goldberg. An investigation of niche and species formation in genetic function optimization. In Proceedings of the Third International Conference on Genetic Algorithms, pages 42–50. Morgan Kaufmann, 1989.
[DH02] J. E. Devaney and J. G. Hagedorn. The role of genetic programming in describing the microscopic structure of hydrating plaster. In E. Cantú-Paz et al., editors, Late Breaking Papers at the Genetic and Evolutionary Computation Conference (GECCO-2002), pages 91–98, New York, NY, July 2002. AAAI.
[DHS00] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley Interscience, 2nd edition, 2000.
[DLJD00] D. Dumitrescu, B. Lazzerini, L. C. Jain, and A. Dumitrescu. Evolutionary Computation. The CRC Press International Series on Computational Intelligence. CRC Press, 2000.
[dN06] L. M. de Menezes and N. Y. Nikolaev. Forecasting with genetically programmed polynomial neural networks. International Journal of Forecasting, 22(2):249–265, April–June 2006.
[Dom90] W. Domschke. Logistik: Rundreisen und Touren. Oldenbourg Verlag München Wien, 1990.
[DOMK+01] S. Dreiseitl, L. Ohno-Machado, H. Kittler, S. Vinterbo, H. Billhardt, and M. Binder. A comparison of machine learning methods for the diagnosis of pigmented skin lesions. Journal of Biomedical Informatics, 34:28–36, 2001.
[DR59] G. B. Dantzig and R. H. Ramser. The truck dispatching problem. Management Science, 6:80–91, 1959.
[dRLF+05] L. del Re, P. Langthaler, C. Furtmüller, S. Winkler, and M. Affenzeller. NOx virtual sensor based on structure identification and global optimization. In Proceedings of the SAE World Congress 2005, number 2005-01-0050, 2005.
[Dro98] S. Droste. Genetic programming with guaranteed quality. In J. R. Koza et al., editors, Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 54–59, University of Wisconsin, Madison, Wisconsin, USA, 22–25 July 1998. Morgan Kaufmann.
[E+07] M. Ebner et al., editors. Proceedings of the 10th European Conference on Genetic Programming, volume 4445 of Lecture Notes in Computer Science, Valencia, Spain, 11–13 April 2007. Springer.
[Eic07] C. F. Eick. Evolutionary Programming: Genetic Programming (http://www2.cs.uh.edu/~ceick/6367/eiben6.ppt). Department of Computer Science, University of Houston, Texas, 2007.
[EKK04] J. Eggermont, J. N. Kok, and W. A. Kosters. Detecting and pruning introns for faster decision tree evolution. In X. Yao et al., editors, Parallel Problem Solving from Nature – PPSN VIII, volume 3242 of LNCS, pages 1071–1080, Birmingham, UK, 18–22 September 2004. Springer-Verlag.
[EN00] A. Ekart and S. Z. Nemeth. A metric for genetic programs and fitness sharing. In R. Poli et al., editors, Genetic Programming, Proceedings of EuroGP'2000, volume 1802 of LNCS, pages 259–270, Edinburgh, 15–16 April 2000. Springer-Verlag.
[EN01] A. Ekart and S. Z. Nemeth. Selection based on the Pareto non-domination criterion for controlling code growth in genetic programming. Genetic Programming and Evolvable Machines, 2(1):61–73, March 2001.
[ES03] A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing. Springer, 2003.
[FBF+03] P. Flach, H. Blockeel, C. Ferri, J. Hernández-Orallo, and J. Struyf. Decision support for data mining: Introduction to ROC analysis and its applications. Data Mining and Decision Support: Integration and Collaboration, 2003.
[FE05] J. E. Fieldsend and R. M. Everson. Formulation and comparison of multi-class ROC surfaces. Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning, pages 41–48, 2005.
[FG97] D. B. Fogel and A. Ghozeil. Schema processing under proportional selection in the presence of random effects. IEEE Transactions on Evolutionary Computation, 1(4):290–293, 1997.
[FG98] D. B. Fogel and A. Ghozeil. The schema theorem and the misallocation of trials in the presence of stochastic effects. Proceedings of the 7th International Conference on Evolutionary Programming VI, 1447:313–321, 1998.
[FJM97] M. L. Fisher, K. Jörnsteen, and O. B. G. Madsen. Vehicle routing with time windows: Two optimization algorithms. Operations Research, 45(3):488–492, 1997.
[FM91] B. R. Fox and M. B. McMahon. Genetic operators for sequencing problems. In Gregory J. E. Rawlins, editor, Foundations of Genetic Algorithms, pages 284–300. Morgan Kaufmann Publishers, 1991.
[Fog93] D. B. Fogel. Applying evolutionary programming to selected travelling salesman problems. Cybernetics and Systems, 24:27–36, 1993.
[Fog94] D. B. Fogel. An introduction to simulated evolutionary optimization. IEEE Transactions on Neural Networks, 5(1):3–14, 1994.
[For81] R. Forsyth. BEAGLE – A Darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.
[FP93] C. Foisy and J. Potvin. Implementing an insertion heuristic on parallel hardware. Computers and Operations Research, 20(7):737–745, 1993.
[FP98] P. Funes and J. Pollack. Evolutionary body building: Adaptive physical designs for robots. Artificial Life, 4(4):337–357, Fall 1998.
[FPS06] G. Folino, C. Pizzuti, and G. Spezzano. Improving cooperative GP ensemble with clustering and pruning for pattern classification. In M. Keijzer et al., editors, GECCO 2006: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, volume 1, pages 791–798, Seattle, Washington, USA, 8–12 July 2006. ACM Press.
[FPSS96] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery: An overview. Advances in Knowledge Discovery and Data Mining, 1996.
[GAMRRP07] M. Garcia-Arnau, D. Manrique, J. Rios, and A. Rodriguez-Paton. Initialization method for grammar-guided genetic programming. Knowledge-Based Systems, 20(2):127–133, March 2007. AI 2006, The 26th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence.
[Gao03] Y. Gao. Population size and sampling complexity in genetic algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) 2003, 2003.
[Gas67] T. Gaskell. Bases for vehicle fleet scheduling. Operational Research Quarterly, 18:281–295, 1967.
[GAT06] A. L. Garcia-Almanza and E. P. K. Tsang. Simplifying decision trees learned by genetic programming. In Proceedings of the 2006 IEEE Congress on Evolutionary Computation, pages 7906–7912, Vancouver, 6–21 July 2006. IEEE Press.
[GB89] J. J. Grefenstette and J. Baker. How genetic algorithms work: A critical look at implicit parallelism. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann Publishers, 1989.
[GBD80] B. L. Golden, L. Bodin, and T. Doyle. Approximate traveling salesman algorithm. Operations Research, 28:694–711, 1980.
[GGRG85] J. J. Grefenstette, R. Gopal, B. Rosmaita, and D. Van Gucht. Genetic algorithms for the traveling salesperson problem. International Conference on Genetic Algorithms and Their Applications, pages 160–168, 1985.
[GL85] D. Goldberg and R. Lingle. Alleles, loci, and the traveling salesman problem. International Conference on Genetic Algorithms, 1985.
[GL97] F. Glover and F. Laguna. Tabu Search. Kluwer Academic Publishers, 1997.
[Glo86] F. Glover. Future paths for integer programming and links to artificial intelligence. Computers & Operations Research, 13:533–549, 1986.
[GM74] B. Gillett and L. Miller. A heuristic for the vehicle-dispatch problem. Operations Research, 22:340–349, 1974.
[GMW82] P. Gill, W. Murray, and M. Wright. Practical Optimization. Academic Press, 1982.
[Gol84] B. L. Golden. Introduction to and recent advances in vehicle routing methods. Transportation Planning Models, pages 383–418, 1984.
[Gol89] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley Longman, 1989.
[Gom63] R. E. Gomory. An algorithm for integer solutions to linear programs. In R. L. Graves and P. Wolfe, editors, Recent Advances in Mathematical Programming, pages 269–302. McGraw-Hill, New York, 1963.
[GR94] C. Gathercole and P. Ross. Dynamic training subset selection for supervised learning in genetic programming. In Y. Davidor, H.-P. Schwefel, and R. Männer, editors, Parallel Problem Solving from Nature III, volume 866 of LNCS, pages 312–321, Jerusalem, 9–14 October 1994. Springer-Verlag.
[Grö77] M. Grötschel. Polyedrische Charakterisierung kombinatorischer Optimierungsprobleme. PhD thesis, University of Bonn, 1977.
[Gru94] F. Gruau. Neural Network Synthesis using Cellular Encoding and the Genetic Algorithm. PhD thesis, Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de Lyon, France, 1994.
[GS90] M. Gorges-Schleuter. Genetic Algorithms and Population Structures — A Massively Parallel Algorithm. PhD thesis, University of Dortmund, 1990.
[Ham58] C. L. Hamblin. Computer languages. The Australian Journal of Science, 20:135–139, 1958.
[Ham62] C. L. Hamblin. Translation to and from Polish notation. Computer Journal, 5:210–213, 1962.
[Ham94] J. D. Hamilton. Time Series Analysis. Princeton University Press, 1994.
[HC89] D. L. Hartl and A. G. Clark. Principles of Population Genetics. Sinauer Associates Inc., 2nd edition, 1989.
[Hel00] K. Helsgaun. An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1):106–130, 2000.
[Hey88] J. B. Heywood. Internal Combustion Engine Fundamentals. McGraw-Hill, 1988.
[HGL93] A. Homaifar, S. Guan, and G. E. Liepins. A new approach on the traveling salesman problem by genetic algorithms. In Proceedings of the 5th International Conference on Genetic Algorithms, pages 460–466. Morgan Kaufmann Publishers Inc., 1993.
[HHM04] H. Tuan Hao, N. Xuan Hoai, and R. I. McKay. Does it matter where you start? A comparison of two initialisation strategies for grammar guided genetic programming. In R. I. McKay and S.-B. Cho, editors, Proceedings of The Second Asian-Pacific Workshop on Genetic Programming, Cairns, Australia, 6–7 December 2004.
[HM82] J. Hanley and B. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143:29–36, 1982.
[HOFLe04] J. Hernández-Orallo, C. Ferri, C. Lachiche, and P. Flach, editors. ROC Analysis in Artificial Intelligence, 1st International Workshop ROCAI-2004. 2004.
[Hol75] J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
[HRv07] K. Holladay, K. Robbins, and J. von Ronne. FIFTH: A stack based GP language for vector processing. In M. Ebner et al., editors, Proceedings of the 10th European Conference on Genetic Programming, volume 4445 of Lecture Notes in Computer
Science, pages 102–113, Valencia, Spain, 11–13 April 2007. Springer.
[HS95] D. P. Helmbold and R. E. Schapire. Predicting nearly as well as the best pruning of a decision tree. Proceedings of the Eighth Annual Conference on Computational Learning Theory, pages 61–68, 1995.
[HSC96] H. J. Hamilton, N. Shan, and N. Cercone. RIAC: A rule induction algorithm based on approximate classification. Technical Report CS 9606, Regina University, 1996.
[IIS98] T. Ito, H. Iba, and S. Sato. Non-destructive depth-dependent crossover for genetic programming. In W. Banzhaf et al., editors, Proceedings of the First European Workshop on Genetic Programming, volume 1391 of LNCS, pages 71–82, Paris, 14–15 April 1998. Springer-Verlag.
[Jac94] D. Jacquette. Philosophy of Mind. Prentice Hall, 1994.
[Jac99] C. Jacob. Lindenmayer systems and growth program evolution. In T. S. Hussain, editor, Advanced Grammar Techniques Within Genetic Programming and Evolutionary Computation, pages 76–79, Orlando, Florida, USA, 13 July 1999.
[JCC+92] D. Jefferson, R. Collins, C. Cooper, M. Dyer, M. Flowers, R. Korf, C. Taylor, and A. Wang. Evolution as a theme in artificial life: The Genesys/Tracker system. Artificial Life II, pages 417–434, 1992.
[Jes42] R. J. Jessen. Statistical investigation of a sample survey for obtaining farm facts. Research Bulletin 304, Iowa State College of Agriculture, 1942.
[JHC04] I. Jonyer, L. B. Holder, and D. J. Cook. Attribute-value selection based on minimum description length. In Proceedings of the International Conference on Artificial Intelligence, pages 1154–1159, 2004.
[JM05] X. Jiang and Y. Motai. Incremental on-line PCA for automatic motion learning of eigen behavior. Proceedings of the 1st International Workshop on Automatic Learning and Real-Time ALaRT '05, pages 153–164, 2005.
[K+06] M. Keijzer et al., editors. GECCO 2006: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, Seattle, Washington, USA, 8–12 July 2006. ACM Press.
[Kar77] R. M. Karp. Probabilistic analysis of partitioning algorithms for the traveling salesman problem in the plane. Mathematics of Operations Research, 2:209–224, 1977.
© 2009 by Taylor & Francis Group, LLC
[Kar79] R. M. Karp. A patching algorithm for the nonsymmetric traveling salesman problem. SIAM Journal on Computing, 8:561–573, 1979. [KBAK99] J. R. Koza, F. H. Bennett III, D. Andre, and M. A. Keane. The design of analog circuits by means of genetic programming. In P. Bentley, editor, Evolutionary Design by Computers, chapter 16, pages 365–385. Morgan Kaufmann, San Francisco, USA, 1999. [Kei96] M. Keijzer. Efficiently representing populations in genetic programming. In P. J. Angeline and K. E. Kinnear, Jr., editors, Advances in Genetic Programming 2, chapter 13, pages 259–278. MIT Press, Cambridge, MA, USA, 1996. [Kei02] M. Keijzer. Scientific Discovery using Genetic Programming. PhD thesis, Danish Technical University, Lyngby, Denmark, March 2002. [Ken73] M. G. Kendall. Time Series. Griffin, 1973. [KGV83] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983. [KIAK99] J. R. Koza, F. H. Bennett III, D. Andre, and M. A. Keane. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann Publishers, 1999. [Kin93] K. E. Kinnear, Jr. Generality and difficulty in genetic programming: Evolving a sort. In S. Forrest, editor, Proceedings of the 5th International Conference on Genetic Algorithms, ICGA-93, pages 287–294, University of Illinois at Urbana-Champaign, 17–21 July 1993. Morgan Kaufmann.
[KKS+03a] J. R. Koza, M. A. Keane, M. J. Streeter, W. Mydlowec, J. Yu, and G. Lanza. Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Kluwer Academic Publishers, 2003. [KKS+03b] J. R. Koza, M. A. Keane, M. J. Streeter, W. Mydlowec, J. Yu, and G. Lanza. Genetic Programming IV: Routine Human-Competitive Machine Learning. Kluwer Academic Publishers, 2003. [KM97] N. Kohl and O. B. G. Madsen. An optimization algorithm for the vehicle routing problem with time windows based on Lagrangean relaxation. Operations Research, 45(3):395–406, 1997. [KO90] M. G. Kendall and J. K. Ord. Time Series. Edward Arnold, 1990.
[KOL+04] M. Keijzer, U.-M. O’Reilly, S. M. Lucas, E. Costa, and T. Soule, editors. Genetic Programming 7th European Conference, EuroGP 2004, Proceedings, volume 3003 of LNCS, Coimbra, Portugal, 5–7 April 2004. Springer-Verlag. [Koz89] J. R. Koza. Hierarchical genetic algorithms operating on populations of computer programs. In N. S. Sridharan, editor, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence IJCAI-89, volume 1, pages 768–774. Morgan Kaufmann, 20–25 August 1989. [Koz92a] J. R. Koza. A genetic approach to the truck backer upper problem and the intertwined spiral problem. In Proceedings of IJCNN International Joint Conference on Neural Networks, volume IV, pages 310–318. IEEE Press, 1992. [Koz92b] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, 1992. [Koz94] J. R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. The MIT Press, 1994. [KP98] R. Kohavi and F. Provost. Glossary of terms. Machine Learning, Special Issue on Applications of Machine Learning and the Knowledge Discovery Process, 30:271–274, 1998. [KRKT87] A. Kolen, A. Rinnooy Kan, and H. Trienekens. Vehicle routing with time windows. Operations Research, 35(2):266–274, 1987. [KSBM01] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Computation, 13(3):637–649, 2001. [KTC+05] M. Keijzer, A. Tettamanzi, P. Collet, J. I. van Hemert, and M. Tomassini, editors. Proceedings of the 8th European Conference on Genetic Programming, volume 3447 of Lecture Notes in Computer Science, Lausanne, Switzerland, 30 March – 1 April 2005. Springer. [Kus98] I. Kuscu. Evolving a generalised behavior: Artificial ant problem revisited. In V. William Porto, N. Saravanan, D. Waagen, and A. E.
Eiben, editors, Seventh Annual Conference on Evolutionary Programming, volume 1447 of LNCS, pages 799–808, Mission Valley Marriott, San Diego, California, USA, 25–27 March 1998. Springer-Verlag. [Lan95] W. B. Langdon. Evolving data structures using genetic programming. In L. Eshelman, editor, Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA-95),
pages 295–302, Pittsburgh, PA, USA, 15–19 July 1995. Morgan Kaufmann. [Lan98] W. B. Langdon. Genetic Programming and Data Structures: Genetic Programming + Data Structures = Automatic Programming!, volume 1 of Genetic Programming. Kluwer, Boston, 24 April 1998. [Lan99] W. B. Langdon. Size fair tree crossovers. In Eric Postma and Marc Gyssen, editors, Proceedings of the Eleventh Belgium/Netherlands Conference on Artificial Intelligence (BNAIC’99), pages 255–256, Kasteel Vaeshartelt, Maastricht, Holland, 3–4 November 1999. [Lan00] W. B. Langdon. Size fair and homologous tree genetic programming crossovers. Genetic Programming and Evolvable Machines, 1(1/2):95–119, April 2000. [Lap92] G. Laporte. The vehicle routing problem: An overview of exact and approximate algorithms. European Journal of Operational Research, 59:345–358, 1992. [Lar99] J. Larsen. Parallelization of the Vehicle Routing Problem with Time Windows. PhD thesis, Department of Computer Science, University of Copenhagen, 1999. [LAWR05] P. Langthaler, D. Alberer, S. Winkler, and L. Del Re. Design eines virtuellen Sensors für Partikelmessung am Dieselmotor. In M. Horn, M. Hofbauer, and N. Dourdoumas, editors, Proceedings of the 14th Styrian Seminar on Control Engineering and Process Automation (14. Steirisches Seminar über Regelungstechnik und Prozessautomatisierung), pages 71–87, 2005. [LC01] T. Loveard and V. Ciesielski. Representing classification problems in genetic programming. In Proceedings of the Congress on Evolutionary Computation, volume 2, pages 1070–1077, COEX, World Trade Center, 159 Samseong-dong, Gangnam-gu, Seoul, Korea, 2001. IEEE Press. [LC05] D. P. X. Li and V. Ciesielski. Multi-objective techniques in genetic programming for evolving classifiers. Proceedings of the 2005 Congress on Evolutionary Computation (CEC ’05), pages 183–190, 2005. [Lev44] K. Levenberg. A method for the solution of certain nonlinear problems in least squares.
The Quarterly of Applied Mathematics, 2:164–168, 1944.
[Lev66] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966. [LGX97] Y. Leung, Y. Gao, and Z. B. Xu. Degree of population diversity – a perspective on premature convergence in genetic algorithms and its Markov chain analysis. IEEE Transactions on Neural Networks, 8(5):1165–1176, 1997. [LH06] P. Lichodzijewski and M. I. Heywood. Pareto-coevolutionary genetic programming for problem decomposition in multi-class classification. Proceedings of the Genetic and Evolutionary Computation Conference GECCO’07, pages 464–471, 2006. [Lin65] S. Lin. Computer solutions of the traveling salesman problem. Bell System Technical Journal, 44:2245–2269, 1965. [Lju99] L. Ljung. System Identification – Theory For the User, 2nd edition. PTR Prentice Hall, Upper Saddle River, N.J., 1999. [LK73] S. Lin and B. W. Kernighan. An effective heuristic algorithm for the traveling-salesman problem. Operations Research, 21:498–516, 1973. [LKM+99] P. Larranaga, C. M. H. Kuijpers, R. H. Murga, I. Inza, and D. Dizdarevic. Genetic algorithms for the travelling salesman problem: A review of representations and operators. Artificial Intelligence Review, 13:129–170, 1999. [LLRKS85] E. L. Lawler, J. K. Lenstra, A. Rinnooy Kan, and D. B. Shmoys. The Travelling Salesman Problem. Wiley, New York, 1985. [LN00] W. B. Langdon and J. P. Nordin. Seeding GP populations. In Riccardo Poli, Wolfgang Banzhaf, William B. Langdon, Julian F. Miller, Peter Nordin, and Terence C. Fogarty, editors, Genetic Programming, Proceedings of EuroGP’2000, volume 1802 of LNCS, pages 304–315, Edinburgh, 15–16 April 2000. Springer-Verlag. [LP97] W. B. Langdon and R. Poli. Fitness causes bloat. In P. K. Chawdhry, R. Roy, and R. K. Pant, editors, Soft Computing in Engineering Design and Manufacturing, pages 13–22. Springer-Verlag London, 23–27 June 1997. [LP98] W. B. Langdon and R. Poli. Why ants are hard. In J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D.
B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, and R. Riolo, editors, Genetic Programming 1998: Proceedings of the Third Annual
Conference, pages 193–201, University of Wisconsin, Madison, Wisconsin, USA, 22–25 July 1998. Morgan Kaufmann. [LP02] W. B. Langdon and R. Poli. Foundations of Genetic Programming. Springer Verlag, Berlin Heidelberg New York, 2002. [LS97] S. Luke and L. Spector. A comparison of crossover and mutation in genetic programming. In J. R. Koza et al., editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 240–248, Stanford University, CA, USA, 13–16 July 1997. Morgan Kaufmann. [LW95] J. Y. B. Lee and P. C. Wong. The effect of function noise on GP efficiency. In X. Yao, editor, Progress in Evolutionary Computation, volume 956 of Lecture Notes in Artificial Intelligence, pages 1–16. Springer-Verlag, Heidelberg, Germany, 1995. [Mah40] P. Mahalanobis. A sample survey of the acreage under jute in Bengal. Sankhyā, 4:511–530, 1940. [Man97] Y. Mansour. Pessimistic decision tree pruning based on tree size. Proceedings of the Fourteenth International Conference on Machine Learning, pages 195–201, 1997. [Mar63] D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal on Applied Mathematics, 11:431–441, 1963. [McC60] J. L. McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 3(4):184–195, 1960. [McK00] R. I. McKay. Fitness sharing in genetic programming. In D. Whitley et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000), pages 435–442, Las Vegas, Nevada, USA, 10–12 July 2000. Morgan Kaufmann. [Men27] K. Menger. Zur allgemeinen Kurventheorie. Fundamenta Mathematicae, 10:96–115, 1927.
[Müh89] H. Mühlenbein. Parallel genetic algorithms, population genetics and combinatorial optimization. Proceedings of the 3rd International Conference on Genetic Algorithms, pages 416–421, 1989. [MH99] N. F. McPhee and N. J. Hopper. Analysis of genetic diversity through population history. In W. Banzhaf et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference, volume 2, pages 1112–1120, Orlando, Florida, USA, 13–17 July 1999. Morgan Kaufmann. [MIB+00] K. Morik, M. Imhoff, P. Brockhausen, T. Joachims, and U. Gather. Knowledge discovery and knowledge validation in intensive care. Artificial Intelligence in Medicine, 19:225–249, 2000. [Mic92] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer, 1992. [Min89] J. Mingers. An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4:227–243, 1989. [Mit96] M. Mitchell. An Introduction to Genetic Algorithms. The MIT Press, 1996. [Mit00] T. M. Mitchell. Machine Learning. McGraw-Hill, New York, 2000. [MJK07] D. C. Montgomery, C. L. Jennings, and M. Kulahci. Introduction to Time Series Analysis and Forecasting. Wiley & Sons, 2007. [MK00] Y. Maeda and S. Kawaguchi. Redundant node pruning and adaptive search method for genetic programming. In D. Whitley et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000), page 535, Las Vegas, Nevada, USA, 10–12 July 2000. Morgan Kaufmann. [Mor91] F. Morrison. The Art of Modeling Dynamic Systems: Forecasting for Chaos, Randomness, and Determinism. John Wiley & Sons, Inc, 1991. [MP43] W. S. McCulloch and W. H. Pitts. A logical calculus of the ideas immanent in nervous activity. In Bulletin of Mathematical Biophysics, volume 5, pages 115–137, 1943. [NB95] P. Nordin and W. Banzhaf. Complexity compression and evolution. In L. Eshelman, editor, Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA-95), pages 310–317, Pittsburgh, PA, USA, 15–19 July 1995. Morgan Kaufmann. [Nel01] O. Nelles. Nonlinear System Identification. Springer Verlag, Berlin Heidelberg New York, 2001. [Nol97] D. Nolan. Quantitative parsimony. British Journal for the Philosophy of Science, 48(3):329–343, 1997.
[Nor97] P. Nordin. Evolutionary Program Induction of Binary Machine Code and its Applications. PhD thesis, Universität Dortmund, Fachbereich Informatik, 1997. [Nør00] M. Nørgaard. Neural network based system identification toolbox. Technical Report 00E891, Technical University of Denmark, 2000. [NV92] A. E. Nix and M. D. Vose. Modeling genetic algorithms with Markov chains. Annals of Mathematics and Artificial Intelligence, 5(1):79–88, 1992. [OO94] U.-M. O’Reilly and F. Oppacher. The troubling aspects of a building block hypothesis for genetic programming. In L. D. Whitley and M. D. Vose, editors, Foundations of Genetic Algorithms 3, pages 73–88, Estes Park, Colorado, USA, 31 July–2 August 1994. Morgan Kaufmann. Published 1995. [O’R95] U.-M. O’Reilly. An Analysis of Genetic Programming. PhD thesis, Carleton University, Ottawa-Carleton Institute for Computer Science, Ottawa, Ontario, Canada, 22 September 1995. [O’R97] U.-M. O’Reilly. Using a distance metric on genetic programs to understand genetic operators. In IEEE International Conference on Systems, Man, and Cybernetics, Computational Cybernetics and Simulation, volume 5, pages 4092–4097, Orlando, Florida, USA, 12–15 October 1997. [OSH87] I. M. Oliver, D. J. Smith, and J. R. C. Holland. A study of permutation crossover operators on the travelling salesman problem. In J. J. Grefenstette, editor, Genetic algorithms and their applications: Proceedings of the Second International Conference on Genetic Algorithms, pages 224–230, Hillsdale, NJ, 1987. Lawrence Erlbaum Assoc. [Osm93] I. H. Osman. Metastrategy simulated annealing and tabu search algorithms for the vehicle routing problem. Annals of Operations Research, 41(1–4):421–451, 1993. [Pan83] A. Pankratz. Forecasting With Univariate Box-Jenkins Models: Concepts and Cases. Wiley, 1983. [Pan91] A. Pankratz. Forecasting With Dynamic Regression Models. Wiley, 1991. [PB96] J.-Y. Potvin and S. Bengio.
The Vehicle Routing Problem with Time Windows – Part II: Genetic Search. INFORMS Journal on Computing, 8(2):165–172, 1996.
[Per94] T. Perkis. Stack-based genetic programming. In Proceedings of the 1994 IEEE World Congress on Computational Intelligence, volume 1, pages 148–153, Orlando, Florida, USA, 27–29 June 1994. IEEE Press. [PL97a] R. Poli and W. B. Langdon. An experimental analysis of schema creation, propagation and disruption in genetic programming. In T. Bäck, editor, Genetic Algorithms: Proceedings of the Seventh International Conference, pages 18–25, Michigan State University, East Lansing, MI, USA, 19–23 July 1997. Morgan Kaufmann. [PL97b] R. Poli and W. B. Langdon. Genetic programming with one-point crossover. In P. K. Chawdhry, R. Roy, and R. K. Pant, editors, Soft Computing in Engineering Design and Manufacturing, pages 180–189. Springer-Verlag London, 23–27 June 1997. [PL97c] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In J. R. Koza et al., editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 278–285, Stanford University, CA, USA, 13–16 July 1997. Morgan Kaufmann. [Pla99] J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schoelkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 185–208. MIT Press, 1999. [PM01a] R. Poli and N. F. McPhee. Exact GP schema theory for headless chicken crossover and subtree mutation. In Proceedings of the 2001 Congress on Evolutionary Computation CEC 2001, pages 1062–1069, COEX, World Trade Center, 159 Samseong-dong, Gangnam-gu, Seoul, Korea, 27–30 May 2001. IEEE Press. [PM01b] R. Poli and N. F. McPhee. Exact schema theorems for GP with one-point and standard crossover operating on linear structures and their application to the study of the evolution of size. In J. F. Miller, M. Tomassini, P. L. Lanzi, C. Ryan, A. G. B. Tettamanzi, and W. B.
Langdon, editors, Genetic Programming, Proceedings of EuroGP’2001, volume 2038 of LNCS, pages 126–142, Lake Como, Italy, 18–20 April 2001. Springer-Verlag. [PM01c] R. Poli and N. F. McPhee. Exact schema theory for GP and variable-length GAs with homologous crossover. In L. Spector et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 104–111, San Francisco, California, USA, 7–11 July 2001. Morgan Kaufmann. [PM03a] R. Poli and N. F. McPhee. General schema theory for genetic programming with subtree-swapping crossover: Part I. Evolutionary Computation, 11(1):53–66, March 2003. [PM03b] R. Poli and N. F. McPhee. General schema theory for genetic programming with subtree-swapping crossover: Part II. Evolutionary Computation, 11(2):169–206, June 2003. [PMR04] R. Poli, N. F. McPhee, and J. E. Rowe. Exact schema theory and Markov chain models for genetic programming and variable-length genetic algorithms with homologous crossover. Genetic Programming and Evolvable Machines, 5(1):31–70, March 2004. [Pol97] R. Poli. Evolution of graph-like programs with parallel distributed genetic programming. In T. Bäck, editor, Genetic Algorithms: Proceedings of the Seventh International Conference, pages 346–353, Michigan State University, East Lansing, MI, USA, 19–23 July 1997. Morgan Kaufmann. [Pol99a] R. Poli. New results in the schema theory for GP with one-point crossover which account for schema creation, survival and disruption. Technical Report CSRP9918, University of Birmingham, School of Computer Science, December 1999. [Pol99b] R. Poli. Parallel distributed genetic programming. In David Corne, Marco Dorigo, and Fred Glover, editors, New Ideas in Optimization, Advanced Topics in Computer Science, chapter 27, pages 403–431. McGraw-Hill, Maidenhead, Berkshire, England, 1999. [Pol00a] R. Poli. Exact schema theorem and effective fitness for GP with one-point crossover. In D. Whitley et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000), pages 469–476, Las Vegas, Nevada, USA, 10–12 July 2000. Morgan Kaufmann. [Pol00b] R. Poli. Hyperschema theory for GP with one-point crossover, building blocks, and some new results in GA theory. In R. Poli, W. Banzhaf, W. B.
Langdon, J. F. Miller, P. Nordin, and T. C. Fogarty, editors, Genetic Programming, Proceedings of EuroGP’2000, volume 1802 of LNCS, pages 163–180, Edinburgh, 15–16 April 2000. Springer-Verlag.
[Pol00c] R. Poli. A macroscopic exact schema theorem and a redefinition of effective fitness for GP with one-point crossover. Technical Report CSRP001, University of Birmingham, School of Computer Science, February 2000. [Pol01] R. Poli. Exact schema theory for genetic programming and variable-length genetic algorithms with one-point crossover. Genetic Programming and Evolvable Machines, 2(2):123–163, June 2001. [Pop92] K. Popper. The Logic of Scientific Discovery. Taylor & Francis, 1992. [PP01] F. Previdi and T. Parisini. Model-free fault detection: a spectral estimation approach based on coherency functions. International Journal of Control, 74:1107–1117, 2001. [PR93] J. Potvin and J. Rousseau. A parallel route building algorithm for the vehicle routing and scheduling problem with time windows. European Journal of Operational Research, 66:331–340, 1993. [Pri04] C. Prins. A simple and effective evolutionary algorithm for the vehicle routing problem. Computers & Operations Research, 31(12):1985–2002, 2004. [PRM01] R. Poli, J. E. Rowe, and N. F. McPhee. Markov models for GP and variable-length GAs with homologous crossover. Technical Report CSRP016, University of Birmingham, School of Computer Science, January 2001. [PS82] C. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, 1982. [PTMC02] F. B. Pereira, J. Tavares, P. Machado, and E. Costa. GVR: A New Genetic Representation for the Vehicle Routing Problem. In AICS ’02: Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science, pages 95–102, London, UK, 2002. Springer-Verlag. [PTT01] D. Peña, G. C. Tiao, and R. S. Tsay. A Course in Time Series Analysis. Wiley, 2001. [Que03] Christian Queinnec. LISP in Small Pieces. Cambridge University Press, 2003. [Raw91] G. J. E. Rawlins, editor. Foundations of Genetic Algorithms, volume 1. Morgan Kaufmann Publishers, 1991. [RB96] J. P. Rosca and D. H. Ballard.
Discovery of subroutines in genetic programming. In Peter J. Angeline and K. E. Kinnear,
Jr., editors, Advances in Genetic Programming 2, chapter 9, pages 177–202. MIT Press, Cambridge, MA, USA, 1996. [Rec73] I. Rechenberg. Evolutionsstrategie. Friedrich Frommann Verlag, 1973. [Ree95] C. Reeves. Modern Heuristic Techniques for Combinatorial Optimization. McGraw-Hill International Ltd., 1995. [Rei91] G. Reinelt. TSPLIB – A traveling salesman problem library. ORSA Journal on Computing, 3:376–384, 1991. [RF99] J. L. Rodríguez-Fernández. Ockham’s razor. Endeavour, 23:121–125, 1999.
[RN03] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2nd edition, 2003. [Ros95a] J. Rosca. Towards automatic discovery of building blocks in genetic programming. In E. V. Siegel and J. R. Koza, editors, Working Notes for the AAAI Symposium on Genetic Programming, pages 78–85, MIT, Cambridge, MA, USA, 10–12 November 1995. AAAI. [Ros95b] J. P. Rosca. Entropy-driven adaptive representation. In Justinian P. Rosca, editor, Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications, pages 23–32, Tahoe City, California, USA, 9 July 1995. [Ros97] J. P. Rosca. Analysis of complexity drift in genetic programming. In J. R. Koza et al., editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 286–294, Stanford University, CA, USA, 13–16 July 1997. Morgan Kaufmann. [RS94] N. J. Radcliffe and P. D. Surry. Fitness variance of formae and performance prediction. In L. D. Whitley and M. D. Vose, editors, Foundations of Genetic Algorithms, volume 3, pages 51–72. Morgan Kaufmann Publishers, 1994. [Rus95] R. A. Russell. Hybrid heuristics for the vehicle routing problem with time windows. Transportation Science, 29:156–166, 1995. [Sam59] A. L. Samuel. Some studies in machine learning using the game of checkers. In IBM Journal of Research and Development, volume 3, pages 211–229, 1959. [Sav85] M. W. P. Savelsbergh. Local search in routing problems with time windows. Annals of Operations Research, 4:285–305, 1985.
[Sch75] H.-P. Schwefel. Evolutionsstrategie und numerische Optimierung. PhD thesis, Technische Universität Berlin, 1975. [Sch94] H.-P. Schwefel. Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Birkhäuser Verlag, Basel, Switzerland, 1994. [SD88] M. Solomon and J. Desrosiers. Time window constrained routing and scheduling problems. Transportation Science, 22(1):1–13, 1988. [SF98] T. Soule and J. A. Foster. Removal bias: a new cause of code growth in tree based evolutionary programming. In 1998 IEEE International Conference on Evolutionary Computation, pages 781–786, Anchorage, Alaska, USA, 5–9 May 1998. IEEE Press. [SFD96] T. Soule, J. A. Foster, and J. Dickinson. Code growth in genetic programming. In J. R. Koza et al., editors, Genetic Programming 1996: Proceedings of the First Annual Conference, pages 215–223, Stanford University, CA, USA, 28–31 July 1996. MIT Press. [SFP93] R. E. Smith, S. Forrest, and A. S. Perelson. Population diversity in an immune systems model: Implications for genetic search. In Foundations of Genetic Algorithms, volume 2, pages 153–166. Morgan Kaufmann Publishers, 1993. [SG99] K. Sterelny and P. E. Griffiths. Sex and Death: An Introduction to Philosophy of Biology. University of Chicago Press, 1999. [SH98] P. W. H. Smith and K. Harries. Code growth, explicitly defined introns, and alternative selection schemes. Evolutionary Computation, 6(4):339–360, Winter 1998. [SHF94] E. Schöneburg, F. Heinzmann, and S. Feddersen. Genetische Algorithmen und Evolutionsstrategien. Addison-Wesley, 1994. [Sig86] I. K. Sigal. Computational implementation of a combined branch and bound algorithm for the travelling salesman problem. Computational Mathematics and Mathematical Physics, 26:14–19, 1986. [SJW92] W. Schiffmann, M. Joost, and R. Werner. Optimization of the backpropagation algorithm for training multilayer perceptrons. Technical Report 15, University of Koblenz, Institute of Physics, 1992. [SJW93] W. Schiffmann, M.
Joost, and R. Werner. Comparison of optimized backpropagation algorithms. Proceedings of the European Symposium on Artificial Neural Networks ESANN ’93, pages 97–104, 1993. [Smi80] S. F. Smith. A Learning System Based on Genetic Adaptive Algorithms. PhD thesis, University of Pittsburgh, 1980. [SMM+91] T. Starkweather, S. McDaniel, K. Mathias, D. Whitley, and C. Whitley. A comparison of genetic scheduling operators. Proceedings of the Fourth International Conference on Genetic Algorithms, pages 69–76, 1991. [SOG04] K. Sastry, U.-M. O’Reilly, and D. E. Goldberg. Population sizing for genetic programming based on decision making. In U.-M. O’Reilly et al., editors, Genetic Programming Theory and Practice II, chapter 4, pages 49–65. Springer, Ann Arbor, 13–15 May 2004. [Sol86] M. M. Solomon. On the worst-case performance of some heuristics for the vehicle routing and scheduling problem with time window constraints. Networks, 16:161–174, 1986. [Sol87] M. M. Solomon. Algorithms for the Vehicle Routing and Scheduling Problem with Time Window Constraints. Operations Research, 35:254–265, 1987. [SP94] M. Srinivas and L. Patnaik. Adaptive probabilities of crossover and mutation in genetic algorithms. In IEEE Trans. on Systems, Man, and Cybernetics, volume 24, pages 656–667, 1994. [SPWR02] C. R. Stephens, R. Poli, A. H. Wright, and J. E. Rowe. Exact results from a coarse grained formulation of the dynamics of variable-length genetic algorithms. In W. B. Langdon et al., editors, GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pages 578–585, New York, 9–13 July 2002. Morgan Kaufmann Publishers. [Sri99] A. Srinivasan. Note on the location of optimal classifiers in n-dimensional ROC space (technical report PRGTR299). Technical report, Oxford University Computing Laboratory, 1999. [SW97] C. R. Stephens and H. Waelbroeck. Effective degrees of freedom in genetic algorithms and the block hypothesis. Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA-97), pages 34–40, 1997. [SW99] C. R. Stephens and H. Waelbroeck.
Schemata evolution and building blocks. Evolutionary Computation, 7(2):109–124, 1999.
[SWM91] T. Starkweather, D. Whitley, and K. Mathias. Optimization using distributed genetic algorithms. Parallel Problem Solving from Nature, pages 176–185, 1991. [T+07] D. Thierens et al., editors. GECCO 2007: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, London, UK, 7–11 July 2007. ACM Press. [Tai93] E. D. Taillard. Benchmarks for basic scheduling problems. European Journal of Operational Research, 64:278–285, 1993. [TG97] I. Taha and J. Ghosh. Evaluation and ordering of rules extracted from feedforward networks. Proceedings of the IEEE International Conference on Neural Networks, pages 221–226, 1997. [TH02] M. Terrio and M. I. Heywood. Directing crossover for reduction of bloat in GP. In W. Kinsner, A. Seback, and K. Ferens, editors, IEEE CCECE 2003: IEEE Canadian Conference on Electrical and Computer Engineering, pages 1111–1115. IEEE Press, 12–15 May 2002. [Tha95] S. R. Thangiah. Vehicle Routing with Time Windows using Genetic Algorithms, chapter 11, pages 253–278. The Practical Handbook of Genetic Algorithms: New Frontiers. CRC Press, 1995. [THL94] J. Ting-Ho Lo. Synthetic approach to optimal filtering. IEEE Transactions on Neural Networks, 5:803–811, 1994. [TMPC03] J. Tavares, P. Machado, F. B. Pereira, and E. Costa. On the influence of GVR in vehicle routing. In SAC’03: Proceedings of the 2003 ACM Symposium on Applied Computing, pages 753–758. ACM, 2003. [Tom95] M. Tomassini. A survey of genetic algorithms. Annual Reviews of Computational Physics, 3:87–118, 1995. [TOS94] S. Thangiah, I. Osman, and T. Sun. Hybrid genetic algorithm simulated annealing and tabu search methods for vehicle routing problem with time windows. Technical report, Computer Science Department, Slippery Rock University, 1994. [TPS96] S. R. Thangiah, J.-Y. Potvin, and T. Sun. Heuristic approaches to vehicle routing with backhauls and time windows. International Journal on Computers and Operations Research, 23(11):1043–1057, 1996. [Vap98] V. Vapnik.
Statistical Learning Theory. Wiley, New York, 1998.
[vB95] A. v. Breedam. Vehicle routing: Bridging the gap between theory and practice. Belgian Journal of Operations Research, Statistics and Computer Science, 35(1):63–80, 1995. [vBS04] R. van Basshuysen and F. Schäfer. Internal Combustion Engine Handbook. SAE International, 2004. [VL91] M. D. Vose and G. E. Liepins. Punctuated equilibria in genetic search. Complex Systems, 5:31–44, 1991. [Voi31] B. F. Voigt. Der Handlungsreisende, wie er sein soll und was er zu thun hat, um Aufträge zu erhalten und eines glücklichen Erfolgs in seinen Geschäften gewiss zu sein. Von einem alten Commis Voyageur, 1831. [Vos99] M. D. Vose. The Simple Genetic Algorithm: Foundations and Theory. MIT Press, Cambridge, MA, 1999. [W18] W. M. Thorburn. The myth of Occam’s razor. Mind, 27:345–353, 1918. [WA04a] S. Wagner and M. Affenzeller. HeuristicLab grid – a flexible and extensible environment for parallel heuristic optimization. In Z. Bubnicki and A. Grzech, editors, Proceedings of the 15th International Conference on Systems Science, volume 1, pages 289–296. Oficyna Wydawnicza Politechniki Wroclawskiej, 2004. [WA04b] S. Wagner and M. Affenzeller. HeuristicLab grid – a flexible and extensible environment for parallel heuristic optimization. Journal of Systems Science, 30(4):103–110, 2004. [WA04c] S. Wagner and M. Affenzeller. The HeuristicLab optimization environment. Technical report, Institute of Formal Models and Verification, Johannes Kepler University, Linz, Austria, 2004. [WA05a] S. Wagner and M. Affenzeller. HeuristicLab: A generic and extensible optimization environment. In B. Ribeiro, R. F. Albrecht, A. Dobnikar, D. W. Pearson, and N. C. Steele, editors, Adaptive and Natural Computing Algorithms, Springer Computer Science, pages 538–541. Springer, 2005. [WA05b] S. Wagner and M. Affenzeller. SexualGA: Gender-specific selection for genetic algorithms. In N. Callaos, W. Lesso, and E.
Hansen, editors, Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI) 2005, volume 4, pages 76–81. International Institute of Informatics and Systemics, 2005.
[WAW04a] S. Winkler, M. Affenzeller, and S. Wagner. Identifying nonlinear model structures using genetic programming techniques. In R. Trappl, editor, Cybernetics and Systems 2004, volume 1, pages 689–694. Austrian Society for Cybernetic Studies, 2004.

[WAW04b] S. Winkler, M. Affenzeller, and S. Wagner. New methods for the identification of nonlinear model structures based upon genetic programming techniques. In Z. Bubnicki and A. Grzech, editors, Proceedings of the 15th International Conference on Systems Science, volume 1, pages 386–393. Oficyna Wydawnicza Politechniki Wroclawskiej, 2004.

[WAW05a] S. Winkler, M. Affenzeller, and S. Wagner. Genetic programming based model structure identification using online system data. In F. Barros, A. Bruzzone, C. Frydman, and N. Gambiasi, editors, Proceedings of Conceptual Modeling and Simulation Conference CMS 2005, pages 177–186. Frydman, LSIS, Université Paul Cézanne Aix Marseille III, 2005.

[WAW05b] S. Winkler, M. Affenzeller, and S. Wagner. New methods for the identification of nonlinear model structures based upon genetic programming techniques. Journal of Systems Science, 31(1):5–13, 2005.

[WAW06a] S. Winkler, M. Affenzeller, and S. Wagner. Advances in applying genetic programming to machine learning, focussing on classification problems. In Proceedings of the 9th International Workshop on Nature Inspired Distributed Computing NIDISC '06, part of the Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium IPDPS 2006. IEEE, 2006.

[WAW06b] S. Winkler, M. Affenzeller, and S. Wagner. Automatic data based patient classification using genetic programming. In R. Trappl, R. Brachman, R. A. Brooks, H. Kitano, D. Lenat, O. Stock, W. Wahlster, and M. Wooldridge, editors, Cybernetics and Systems 2006, volume 1, pages 251–256. Austrian Society for Cybernetic Studies, 2006.

[WAW06c] S. Winkler, M. Affenzeller, and S. Wagner. HeuristicModeler: A multi-purpose evolutionary machine learning algorithm and its applications in medical data analysis. In A. Bruzzone, A. Guasch, M. Piera, and J. Rozenblit, editors, Proceedings of the International Mediterranean Modelling Multiconference I3M 2006, pages 629–634. Piera, LogiSim, Barcelona, Spain, 2006.

[WAW06d] S. Winkler, M. Affenzeller, and S. Wagner. Sets of receiver operating characteristic curves and their use in the evaluation of multiclass classification. In Proceedings of the Genetic and Evolutionary Computation Conference GECCO 2006, volume 2, pages 1601–1602. Association for Computing Machinery (ACM), 2006.

[WAW06e] S. Winkler, M. Affenzeller, and S. Wagner. Using enhanced genetic programming techniques for evolving classifiers in the context of medical diagnosis – an empirical study. In Proceedings of the GECCO 2006 Workshop on Medical Applications of Genetic and Evolutionary Computation (MedGEC 2006). Association for Computing Machinery (ACM), 2006.

[WAW07] S. Winkler, M. Affenzeller, and S. Wagner. Advanced genetic programming based machine learning. Journal of Mathematical Modelling and Algorithms, 6(3):455–480, 2007.

[WAW08] S. Winkler, M. Affenzeller, and S. Wagner. Offspring selection and its effects on genetic propagation in genetic programming based system identification. In R. Trappl, editor, Cybernetics and Systems 2008, volume 2, pages 549–554. Austrian Society for Cybernetic Studies, 2008.

[WC99] P. A. Whigham and P. F. Crapper. Time series modelling using genetic programming: An application to rainfall-runoff models. In L. Spector et al., editors, Advances in Genetic Programming 3, chapter 5, pages 89–104. MIT Press, Cambridge, MA, USA, June 1999.
[WEA+06] S. Winkler, H. Efendic, M. Affenzeller, L. Del Re, and S. Wagner. Online modeling based on genetic programming. International Journal on Intelligent Systems Technologies and Applications, 2(2/3):255–270, 2006.

[Wei06] W. S. Wei. Time Series Analysis – Univariate and Multivariate Methods. Addison-Wesley, 2006.

[Wen95] O. Wendt. Tourenplanung durch Einsatz naturanaloger Verfahren. Deutscher Universitätsverlag, 1995.

[WER06] S. Winkler, H. Efendic, and L. Del Re. Quality pre-assessment in steel industry using data based estimators. In S. Cierpisz, K. Miskiewicz, and A. Heyduk, editors, Proceedings of the IFAC Workshop MMM'2006 on Automation in Mining, Mineral and Metal Industry. International Federation for Automatic Control, 2006.

[WF05] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2005.
[WH87] P. H. Winston and B. K. P. Horn. LISP. Addison-Wesley, 1987.

[Whi93] D. Whitley, editor. Foundations of Genetic Algorithms, volume 2. Morgan Kaufmann Publishers, 1993.

[Whi95] P. A. Whigham. A schema theorem for context-free grammars. In 1995 IEEE Conference on Evolutionary Computation, volume 1, pages 178–181, Perth, Australia, 29 November – 1 December 1995. IEEE Press.

[Whi96a] P. A. Whigham. Grammatical Bias for Evolutionary Learning. PhD thesis, School of Computer Science, University College, University of New South Wales, Australian Defence Force Academy, Canberra, Australia, 14 October 1996.

[Whi96b] P. A. Whigham. Search bias, language bias, and genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, editors, Genetic Programming 1996: Proceedings of the First Annual Conference, pages 230–237, Stanford University, CA, USA, 28–31 July 1996. MIT Press.

[WHMS03] J. Wen-Hua, D. Madigan, and S. L. Scott. On Bayesian learning of sparse classifiers. Technical Report 2003-08, Avaya Labs Research, 2003.

[Wig60] E. P. Wigner. The unreasonable effectiveness of mathematics in the natural sciences. In Communications on Pure and Applied Mathematics, volume XIII, pages 1–14. John Wiley & Sons, Inc, New York, 1960.

[Win04] S. Winkler. Identifying nonlinear model structures using genetic programming. Master's thesis, Johannes Kepler University, Linz, Austria, 2004.

[Win08] S. Winkler. Evolutionary System Identification – Modern Concepts and Practical Applications. PhD thesis, Institute for Formal Models and Verification, Johannes Kepler University Linz, 2008.

[WK90] S. M. Weiss and I. Kapouleas. An empirical comparison of pattern recognition, neural nets, and machine learning classification methods. In J. W. Shavlik and T. G. Dietterich, editors, Readings in Machine Learning, pages 177–183. Kaufmann, San Mateo, CA, 1990.

[WL96] A. S. Wu and R. K. Lindsay. A survey of intron research in genetics. In H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, editors, Parallel Problem Solving From Nature IV. Proceedings of the International Conference on Evolutionary Computation, volume 1141 of LNCS, pages 101–110, Berlin, Germany, 22–26 September 1996. Springer-Verlag.

[Wri43] S. Wright. Isolation by distance. Genetics, 28:114–138, 1943.

[WSF89] D. Whitley, T. Starkweather, and D. Fuquay. Scheduling problems and traveling salesman: The genetic edge recombination operator. Proceedings of the Third International Conference on Genetic Algorithms and Their Applications, pages 133–140, 1989.
[WWB+07] S. Wagner, S. Winkler, R. Braune, G. Kronberger, A. Beham, and M. Affenzeller. Benefits of plugin-based heuristic optimization software systems. 2007.

[YA94] Y. Yoshida and N. Adachi. A diploid genetic algorithm for preserving population diversity – pseudo-meiosis GA. In Lecture Notes in Computer Science, volume 866, pages 36–45. Springer, 1994.

[YN97] T. Yamada and R. Nakano. Genetic algorithms for job-shop scheduling problems, 1997.

[Zha97] B.-T. Zhang. A taxonomy of control schemes for genetic code growth. Position paper at the Workshop on Evolutionary Computation with Variable Size Representation at ICGA-97, 20 July 1997.

[Zha00] B.-T. Zhang. Bayesian methods for efficient genetic programming. Genetic Programming and Evolvable Machines, 1(3):217–242, July 2000.

[Zhu00] K. Q. Zhu. A new genetic algorithm for VRPTW. In Proceedings of the International Conference on Artificial Intelligence, 2000.

[ZM96] B.-T. Zhang and H. Mühlenbein. Adaptive fitness functions for dynamic growing/pruning of program trees. In P. J. Angeline and K. E. Kinnear, Jr., editors, Advances in Genetic Programming 2, chapter 12, pages 241–256. MIT Press, Cambridge, MA, USA, 1996.

[Zwe93] M. H. Zweig. Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39:561–577, 1993.