Thursday, March 1, 2018

AI I - genetic algorithm

As a professional programmer, it is a no-brainer to pay attention to current trends.  So, what's the hottest topic now? We have seen AlphaGo defeat the top human masters, Google change its strategy from 'mobile first' to 'AI first', and even Apple develop its own AI chip. Obviously, everyone is crazy about AI. As programmers interested in technology, of course we have to try our hand at it.

Caveat: if you are interested in learning AI, don't waste your time reading this blog; head to
http://cs231n.github.io/
and start learning! Nothing compares to a systematic learning experience!
Also, there are videos on YouTube you can watch:
Stanford
NTU
I am sure you will soon gain more knowledge than I do. (Well, I am not an expert anyway. XD)


Whoops, you are still reading?
Okay... as a starting point, let's learn a little bit about the genetic algorithm!

What? Does this have anything to do with AI?
Well, long ago there was an alternative way of finding the weights of a neural network besides the mainstream gradient-descent method: the genetic algorithm. It seems only a small group of people are still investing research effort in this domain...

If it is not trendy, then why study it? Well, for a hobby, does it matter whether the topic is trendy? No! If we can gain some insight from the study, that is all that matters.

You can even apply the evolution-inspired genetic algorithm in some domains, and it works! That's actually quite impressive!

OK, the genetic algorithm is very simple; just follow these steps (a minimal code sketch follows the list):

  1. Randomly generate a population, each individual represented in some format we call a chromosome. The format can be an array of bits, bytes, floats, etc.
  2. Put these creatures into our run-time environment to test them. When the tests end, each one gets a score representing its fitness (scores like how many candies it got, how far it ran, how long it survived).
  3. Spawn a new generation based on those scores. Individuals with higher scores have a higher chance of being selected to breed. After selecting each pair of entities, you either make them cross over to exchange their genes, or copy them into the new population directly. In the process, there is a chance that mutation happens, e.g. a bit 0 becomes a bit 1, or a float fluctuates a little.
  4. When you have enough entities, go back to step 2 for the next run. Repeat these steps until a solution is found.
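
To make these steps concrete, here is a minimal, self-contained sketch of that loop on a toy problem: evolving a 20-bit string toward all ones (the classic "one-max" problem). For brevity it uses simple tournament selection and the standard rand(), whereas the XOR example later in this post uses roulette-wheel selection and a Mersenne Twister; all the names and constants here are mine, just for illustration.

/*
 * Minimal GA sketch: evolve a 20-bit string toward all ones ("one-max").
 * fitness = number of 1 bits; tournament selection, one-point crossover,
 * per-bit mutation.
 */
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>
using namespace std;

const size_t NBITS = 20;
const size_t NPOP = 30;
const size_t MAXGEN = 200;
const double PCROSS = 0.8;  // crossover rate
const double PMUT = 0.02;   // per-bit mutation rate

int fitness(const vector<int>& c)
{
    int f = 0;
    for (size_t i = 0; i < c.size(); ++i) f += c[i];
    return f;
}

// tournament selection: pick two individuals at random, keep the fitter one
const vector<int>& select_parent(const vector< vector<int> >& pop)
{
    const vector<int>& a = pop[rand() % NPOP];
    const vector<int>& b = pop[rand() % NPOP];
    return fitness(a) > fitness(b) ? a : b;
}

int main()
{
    srand(time(NULL));
    // step 1: random initial population
    vector< vector<int> > pop(NPOP, vector<int>(NBITS));
    for (size_t i = 0; i < NPOP; ++i)
        for (size_t j = 0; j < NBITS; ++j)
            pop[i][j] = rand() % 2;

    for (size_t gen = 0; gen < MAXGEN; ++gen) {
        // step 2: evaluate fitness (here, just count the 1 bits)
        for (size_t i = 0; i < NPOP; ++i) {
            if (fitness(pop[i]) == (int)NBITS) {
                printf("solved at generation %u\n", (unsigned)gen);
                return 0;
            }
        }
        // step 3: breed a new generation
        vector< vector<int> > next;
        while (next.size() < NPOP) {
            vector<int> a = select_parent(pop);
            vector<int> b = select_parent(pop);
            if ((double)rand() / RAND_MAX < PCROSS) {
                // one-point crossover: swap the tails of the two chromosomes
                size_t cut = 1 + rand() % (NBITS - 1);
                for (size_t j = cut; j < NBITS; ++j) swap(a[j], b[j]);
            }
            for (size_t j = 0; j < NBITS; ++j) {
                // mutation: occasionally flip a bit
                if ((double)rand() / RAND_MAX < PMUT) a[j] ^= 1;
                if ((double)rand() / RAND_MAX < PMUT) b[j] ^= 1;
            }
            next.push_back(a);
            next.push_back(b);
        }
        // step 4: repeat with the new population
        pop.swap(next);
    }
    printf("no perfect solution within %u generations\n", (unsigned)MAXGEN);
    return 0;
}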


You have certainly noticed that many places in this process depend on chance, on probability. The genetic algorithm is, after all, a systematic way of exploring the space of possible solutions: selection keeps steering the search toward fitter candidates. If you instead blindly try random combinations, then even if you do eventually stumble on a solution, it will take a prohibitive amount of time.

Below is some example code for you to check out. It finds a neural network that solves the XOR operation.

And before that, whoops, sorry for bringing up neural networks out of nowhere. So let's see what a neural network is.
A neural network consists of neurons connected to each other in some topology. The computation starts from the inputs coming from the outside world, which we call the input layer. Each neuron connected to those inputs sums its weighted inputs and passes the sum through an activation function (e.g. sigmoid, ReLU) to produce its output. The next layer of neurons then computes its outputs from those outputs, and so on, until the final outputs are computed.
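
In code, a single neuron boils down to just a few lines. Here is a tiny sketch (the weight values are arbitrary ones I picked for illustration):

#include <cstdio>
#include <cmath>

double sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

// one neuron: weighted sum of the inputs, plus a bias, through an activation
double neuron(const double *inputs, const double *weights, int n, double bias)
{
    double v = bias;
    for (int i = 0; i < n; ++i) {
        v += inputs[i] * weights[i];
    }
    return sigmoid(v);
}

int main()
{
    double x[2] = {1.0, 0.0};
    double w[2] = {20.0, 20.0};
    printf("%f\n", neuron(x, w, 2, -10.0)); // ~1.0; this neuron acts like an OR gate
    return 0;
}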

Below is a video of a neural network solving the XOR problem:
(embedded video not included here)

An important ingredient in how a neural network solves a non-linear problem (like the XOR operation) is the 'bias' term. Without a bias, each neuron's decision boundary is forced to pass through the origin: for example, at input (0, 0) a bias-less neuron always outputs sigmoid(0) = 0.5, no matter what its weights are. The bias lets each neuron shift its threshold to where it is actually needed.
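
For intuition, here is a small sketch that verifies the classic hand-crafted XOR solution (the same weights are left commented out in main() of the full example below): one hidden neuron behaves like OR, the other like NAND, and the output neuron ANDs them together, which is exactly XOR.

#include <cstdio>
#include <cmath>

double sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

int main()
{
    // each weight triple is (w1, w2, bias)
    const double or_w[3]   = { 20.0,  20.0, -10.0}; // hidden neuron ~ OR
    const double nand_w[3] = {-20.0, -20.0,  30.0}; // hidden neuron ~ NAND
    const double and_w[3]  = { 20.0,  20.0, -30.0}; // output neuron ~ AND
    for (int a = 0; a <= 1; ++a) {
        for (int b = 0; b <= 1; ++b) {
            double h1 = sigmoid(or_w[0]*a   + or_w[1]*b   + or_w[2]);
            double h2 = sigmoid(nand_w[0]*a + nand_w[1]*b + nand_w[2]);
            double y  = sigmoid(and_w[0]*h1 + and_w[1]*h2 + and_w[2]);
            printf("%d XOR %d ~= %.6f\n", a, b, y); // prints ~0, ~1, ~1, ~0
        }
    }
    return 0;
}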


/*
 * This sample use GA to find a solution for XOR problem
 */

#include <vector>
#include <deque>
#include <algorithm>
#include <iostream>
#include <cmath>
#include <cassert>
#include <ctime>
#include <cstdlib> // for exit()
#include "MersenneTwister.h" // Richard J. Wagner's MTRand

using namespace std;

#define MAX_INIT_WEIGHT   30.0
#define MAX_MUTATE_WEIGHT 15.0
#define MAX_FITNESS       1000000.0
const double crossover_rate = 0.8;
const double mutate_rate = 0.6;
const size_t numPopulation = 100;
const size_t max_generation = 100;
const double epsilon = 0.00001;    

MTRand g_rng(time(NULL));

double sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

class Node // well, 'node' is shorter than 'neuron'...
{
public:
    Node (int numInputs);
    double run(vector<double> const&inputs);
    void getWeights(vector<double> &weights);
    size_t setWeights(const double *weights);
    
private:
    vector<double> weights_;
};

Node::Node(int numInputs)
{
    weights_.resize(numInputs+1);
}

double Node::run(const vector< double >& inputs)
{
    assert (inputs.size()+1 == weights_.size() && "number of inputs wrong");
    double v=0;
    for (size_t i=0; i < inputs.size(); ++i) {
        v += inputs[i] * weights_[i];
    }
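    // the extra weight at the end acts as the bias term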
    v += weights_.back();
    return sigmoid(v);
}

void Node::getWeights(vector< double >& weights)
{
    weights.insert(weights.end(), weights_.begin(), weights_.end());
}

size_t Node::setWeights(const double* weights)
{
    weights_.assign(weights, weights + weights_.size());
    return weights_.size();
}



class Layer
{
public:
    Layer(int numNodes, int numInputs);
    ~Layer();

    void run(vector<double> const&inputs, vector<double> &outputs);
    void getWeights(vector<double> &weights);
    size_t setWeights(const double *weights);
    
private:
    vector<Node *> nodes_;
    Layer(Layer const&rhs) {};
    Layer &operator=(Layer const&rhs) { return *this;}
};

Layer::Layer(int numNodes, int numInputs)
{
    nodes_.resize(numNodes);
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i] = new Node(numInputs);
    }
}

Layer::~Layer()
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        delete nodes_[i];
    }
    nodes_.clear();
}

void Layer::run(const vector< double >& inputs, vector< double >& outputs)
{
    outputs.resize(nodes_.size());
    for (size_t i=0; i < nodes_.size(); ++i) {
        outputs[i] = nodes_[i]->run(inputs);
    }
}

void Layer::getWeights(vector< double >& weights)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i]->getWeights(weights);
    }

}

size_t Layer::setWeights(const double* weights)
{
    size_t pos = 0;
    for (size_t i=0; i < nodes_.size(); ++i) {
        pos += nodes_[i]->setWeights(weights + pos);
    }
    return pos;
}



class NeuralNetwork
{
public:
    NeuralNetwork(int numInputs, vector<int> const& hiddenLayerTopology, int numOutputs);
    ~NeuralNetwork();
    void run (vector<double> const &inputs, vector<double> &outputs);
    void getWeights(vector<double> &weights);
    void setWeights(vector<double> const &weights);
    
private:
    vector<Layer *> layers_;
    NeuralNetwork(const NeuralNetwork &rhs) {}
    NeuralNetwork &operator=(const NeuralNetwork &rhs) { return *this; }
};

NeuralNetwork::NeuralNetwork(int numInputs, const vector< int >& hiddenLayerTopology, int numOutputs)
{
    layers_.resize(hiddenLayerTopology.size() + 1);
    layers_[0] = new Layer(hiddenLayerTopology[0], numInputs);
    for (size_t i=1; i < layers_.size()-1; ++i) {
        layers_[i] = new Layer(hiddenLayerTopology[i], hiddenLayerTopology[i-1]);
    }
    layers_.back() = new Layer(numOutputs, hiddenLayerTopology.back());
}

NeuralNetwork::~NeuralNetwork()
{
    for (size_t i=0; i < layers_.size(); ++i) {
        delete layers_[i];
    }
    layers_.clear();
}

void NeuralNetwork::run(const vector< double >& inputs, vector< double >& outputs)
{
    vector< double > linputs, loutputs;
    linputs = inputs;
    for (size_t i=0; i < layers_.size(); ++i) {
        layers_[i]->run(linputs, loutputs);
        linputs = loutputs;
    }
    assert (outputs.size() == linputs.size());
    outputs.swap(linputs);
}

void NeuralNetwork::getWeights(vector< double >& weights)
{
    weights.clear();
    for (size_t i=0; i < layers_.size(); ++i) {
        layers_[i]->getWeights(weights);
    }

}

void NeuralNetwork::setWeights(const vector< double >& weights)
{
    size_t pos = 0;
    for (size_t i=0; i < layers_.size(); ++i) {
        pos += layers_[i]->setWeights(&weights[pos]);
    }
}

struct Genome
{
    double fitness_;
    vector<double> genes_;
    
    Genome(size_t numGenes=0);
    void init(size_t numGenes);
    void copy_genes_from (Genome const&source, size_t from_index, size_t to_index, size_t replace_start_index, double mutate_rate);
    void report () const;
    
    bool operator<(const Genome &rhs) const
    {
        return fitness_ > rhs.fitness_;
    }
};

Genome::Genome(size_t numGenes)
{
    if (numGenes) {
        genes_.resize(numGenes);
    }
}


void Genome::init(size_t numGenes)
{
    genes_.resize(numGenes);
    for (size_t i=0; i < numGenes; ++i) {
        genes_[i] = (g_rng.rand() - 0.5) * 2 * MAX_INIT_WEIGHT;
    }
}

void Genome::copy_genes_from(const Genome& source, size_t from_index, size_t to_index, size_t replace_start_index, double mutate_rate)
{
    for (size_t i=from_index,cgene=0; i < to_index; ++i,++cgene) {
        genes_[replace_start_index+cgene] = source.genes_[i];
        if (g_rng.rand() < mutate_rate) {
            genes_[replace_start_index+cgene] += (g_rng.rand() - 0.5) * 2 * MAX_MUTATE_WEIGHT;
        }
    }
}

void Genome::report() const
{
    cout << "weights ";
    for (size_t i=0; i < genes_.size(); ++i) {
        cout << genes_[i] << " ";
    }
}

void spawn_new_generation (vector<Genome> &population)
{
    assert (!population.empty());
    
    double total_fitness = 0;

    const size_t numPopulation = population.size();
    const size_t numGenes = population[0].genes_.size();
    
    for (size_t i=0; i < numPopulation; ++i) {
        total_fitness += population[i].fitness_;
    }
    
    // normalize fitness and build a cumulative list, so that a random number
    // can be mapped back to a genome (roulette-wheel selection)
    deque<double> fitness_list(numPopulation+1);
    for (size_t p=0; p < numPopulation; ++p) {
        population[p].fitness_ /= total_fitness;
        fitness_list[p+1] = population[p].fitness_ + fitness_list[p];
    }
    
    // new population by crossover, mutation
    vector<Genome> new_population;
    new_population.reserve(numPopulation);
    while (new_population.size() < numPopulation) {
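        // pick two parents: upper_bound finds which cumulative-fitness slot each
        // random draw falls into, so fitter genomes are selected more often;
        // randExc() is used because it excludes 1.0, which would index past the end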
        int lidx = upper_bound(fitness_list.begin(), fitness_list.end(), g_rng.randExc()) - fitness_list.begin();
        --lidx;
        int ridx = upper_bound(fitness_list.begin(), fitness_list.end(), g_rng.randExc()) - fitness_list.begin();
        --ridx;
        Genome a(numGenes);
        Genome b(numGenes);
        if (g_rng.rand() < crossover_rate) {
            size_t switch_point = g_rng.randInt() % (numGenes-4); // switch point at begin or end doesn't make sense.
            switch_point += 4;
            a.copy_genes_from (population[lidx], 0, switch_point, 0, mutate_rate);
            a.copy_genes_from (population[ridx], switch_point, numGenes, switch_point, mutate_rate);
            b.copy_genes_from (population[ridx], 0, switch_point, 0, mutate_rate);
            b.copy_genes_from (population[lidx], switch_point, numGenes, switch_point, mutate_rate);
        } else {
            a.copy_genes_from (population[lidx], 0, numGenes, 0, mutate_rate);
            b.copy_genes_from (population[ridx], 0, numGenes, 0, mutate_rate);
        }
        new_population.push_back(a);
        new_population.push_back(b);
    }
    
    new_population.swap(population);
}

double calculate_fitness(vector<double> const&results, vector<double> const&expects)
{
    assert (results.size() == expects.size());
    double delta = 0;
    for (size_t i=0; i < results.size(); ++i) {
        delta += fabs(results[i] - expects[i]);
    }
    
    double fitness;
    if (delta <= epsilon) {
        fitness =  MAX_FITNESS;
    } else {
        fitness = 1.0 / (delta*delta);
        fitness *= fitness;
    }
    
    cout << "results ";
    for (size_t i=0; i < results.size(); ++i) {
        cout << results[i] << ", ";
    }
    cout << " fitness " << fitness;
    cout << endl;
    
    return fitness;
}

int main()
{
    // XOR test, two inputs, one output
    vector<int> topo; // hidden layers
    topo.push_back(2);
    NeuralNetwork network(2, topo, 1);
    
    vector<double> weights;
    network.getWeights(weights);
    const size_t numGenes = weights.size();
    /*
    vector<double> sol;
    sol.push_back(20);
    sol.push_back(20);
    sol.push_back(-10);

    sol.push_back(-20);
    sol.push_back(-20);
    sol.push_back(30);

    sol.push_back(20);
    sol.push_back(20);
    sol.push_back(-30);
    //*/
    vector<Genome> population(numPopulation);
    for (size_t i=0; i < population.size(); ++i) {
        population[i].init(numGenes);
        //population[i].genes_ = sol;
    }
    
    
    bool found_solution = false;
    for (int generation = 0; !found_solution && generation < max_generation; ++generation) {
        cout << "generation " << generation << endl;
        
        vector<double> inputs(2);
        vector<double> outputs(1);
        
        vector<double> results(4);
        vector<double> expects(4);
        expects[0] = 0;
        expects[1] = 0;
        expects[2] = 1;
        expects[3] = 1;
        
        for (size_t i=0; i < population.size(); ++i) {
            network.setWeights(population[i].genes_);
            inputs[0] = 0;
            inputs[1] = 0;
            network.run(inputs, outputs);
            results[0] = outputs[0];
            
            inputs[0] = 1;
            inputs[1] = 1;
            network.run(inputs, outputs);
            results[1] = outputs[0];
            
            inputs[0] = 0;
            inputs[1] = 1;
            network.run(inputs, outputs);
            results[2] = outputs[0];
            
            inputs[0] = 1;
            inputs[1] = 0;
            network.run(inputs, outputs);
            results[3] = outputs[0];
            
            population[i].fitness_ = calculate_fitness(results, expects);
            
            if (population[i].fitness_ == MAX_FITNESS) {
                found_solution = true;
                cout << "*** found solution ";
                population[i].report();
                cout << " ***\n";
                cout << "at generation " << generation << endl;
                ::exit(0);
            }
        }
        
        spawn_new_generation (population);
    }
    
    
    return 0;
}

With a population of 100 entities, you can usually find a solution within 3 to 5 generations.
Compared with trying to find a solution by assigning weights purely at random, it works like a miracle.
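(To try it yourself, any standard C++ compiler should do, e.g. g++ -O2 ga_xor.cpp; the only dependency is the MersenneTwister.h header, Richard J. Wagner's widely used MTRand implementation. The file name ga_xor.cpp is just my assumption.)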

And that's all for now, see you next time!

