Tuesday, October 9, 2018

Q-learning: solving the Knight and Princess problem from scratch with the Q-learning algorithm


The Problem:

The problem is about finding a path to the Princess while avoiding the enemies. In the figure below, "K" represents the Knight, "P" the Princess, and "E" an Enemy. A blank cell means the knight can move to that cell; e.g., from its starting cell the knight can move right and down only. The knight is killed if it moves to an enemy cell.


Solution:

*You can download the entire Python code here.

We will solve this using the Q-learning algorithm and implement it in Python, without using any other library such as gym.
To implement it from scratch, we need to understand both the Q-learning algorithm and the problem itself. It is recommended to go through the theory of the Q-learning reinforcement learning algorithm first.

Step-1: Represent the problem mathematically.
As we know, the Q-learning algorithm is all about states and actions. In our problem, let the current position of the knight be the "state". Let's number all possible positions as states (starting from state #0), as in the figure below.


So, the state of the knight is 0. The state of the Princess is 22. The enemy states are 6, 8, 12, 15, and 19. The objective is to move the knight from state 0 to state 22 while avoiding states 6, 8, 12, 15, and 19.
The possible actions from each state are UP (action 0), DOWN (action 1), LEFT (action 2), and RIGHT (action 3). For each move, the knight receives a reward.
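As a quick sanity check of the numbering, the mapping between grid coordinates and state numbers can be written out in a couple of lines. This is a minimal sketch; the helper names here are ours, not part of the post's code:

# Hypothetical helpers showing the row-major state numbering on the 5x5 grid
def to_state(row, col):
    return row * 5 + col

def to_cell(state):
    return divmod(state, 5)   # returns (row, col)

print(to_state(4, 2))   # 22, the Princess's state
print(to_cell(6))       # (1, 1), one of the enemy cells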


Step-2: Create the reward matrix


import numpy as np
R = np.full((25,4),-1)
 
For each legal move the knight is rewarded -1 (the default).
However, if the next state is an enemy state, the reward value is -100.

# reward for moving into enemy state 6 is -100
R[1][1] = -100.0
R[5][3] = -100.0
R[7][2] = -100.0
R[11][0] = -100.0
# reward for moving into enemy state 8 is -100
R[3][1] = -100.0
R[7][3] = -100.0
R[9][2] = -100.0
R[13][0] = -100.0
# reward for moving into enemy state 12 is -100
R[7][1] = -100.0
R[11][3] = -100.0
R[13][2] = -100.0
R[17][0] = -100.0
# reward for moving into enemy state 15 is -100
R[10][1] = -100.0
R[16][2] = -100.0
R[20][0] = -100.0
# reward for moving into enemy state 19 is -100
R[14][1] = -100.0
R[18][3] = -100.0
R[24][0] = -100.0
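The same entries can also be generated in a loop rather than by hand; a sketch equivalent to the assignments above (the enemy_states list and the loop are ours, not part of the post's code):

# Sketch: derive the -100 entries from the list of enemy states
enemy_states = [6, 8, 12, 15, 19]
for e in enemy_states:
    row, col = divmod(e, 5)
    if row > 0:
        R[e - 5][1] = -100.0   # DOWN from the cell above lands on the enemy
    if row < 4:
        R[e + 5][0] = -100.0   # UP from the cell below lands on the enemy
    if col > 0:
        R[e - 1][3] = -100.0   # RIGHT from the cell on the left lands on the enemy
    if col < 4:
        R[e + 1][2] = -100.0   # LEFT from the cell on the right lands on the enemy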

And if the next state is 22 (i.e. the state of the Princess), then the reward value is 100.

# reward for reaching the Princess at state 22 is 100
R[17][1] = 100.0
R[21][3] = 100.0
R[23][2] = 100.0


Illegal moves are marked with a reward value of -2. For example, the knight cannot move UP from state 0.

# -2 represents an illegal move
illegal = -2
# illegal movement from states in FIRST ROW
R[0][0]=illegal
R[1][0]= illegal
R[2][0]= illegal
R[3][0]= illegal
R[4][0]= illegal
# illegal movement from states in FIRST COLUMN
R[0][2]= illegal
R[5][2]= illegal
R[10][2]= illegal
R[15][2]= illegal
R[20][2]= illegal
# illegal movement from states in LAST ROW
R[20][1]= illegal
R[21][1]= illegal
R[22][1]= illegal
R[23][1]= illegal
R[24][1]= illegal
# illegal movement from states in LAST COLUMN
R[4][3]= illegal
R[9][3]= illegal
R[14][3]= illegal
R[19][3]= illegal
R[24][3]= illegal
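As with the enemy penalties, these entries can be generated in a loop; a sketch equivalent to the assignments above (the loop is ours, not part of the post's code):

# Sketch: mark the same illegal moves from the grid geometry
for s in range(25):
    row, col = divmod(s, 5)
    if row == 0:
        R[s][0] = illegal   # no UP from the first row
    if row == 4:
        R[s][1] = illegal   # no DOWN from the last row
    if col == 0:
        R[s][2] = illegal   # no LEFT from the first column
    if col == 4:
        R[s][3] = illegal   # no RIGHT from the last column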

Step-3: Create the Q matrix

noOfState = 25
noOfAction = 4

# Q matrix
Q = np.zeros((noOfState,noOfAction))
Set the learning rate (alpha) to 0.8 and the discount rate (gamma) to 0.2.

gamma = 0.2
alpha = 0.8
Step-4: Create the supporting functions
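The functions below also call two validators, verify_state and verify_action, which the post does not show; a minimal sketch of what they are assumed to do:

def verify_state(state):
    # a state is valid if it lies on the 5x5 grid
    return 0 <= state < 25

def verify_action(action):
    # an action is valid if it is one of UP/DOWN/LEFT/RIGHT (0..3)
    return 0 <= action < 4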
Function to return all possible actions from a given state:
def available_directions_to_go(state):
    curr_state_row = R[state,]
    act = np.where(curr_state_row != -2)
    return act[0]


Function to select and return one action at random from all possible actions:
def next_action_dir(all_psbl_directions):
    next_action_dir = int(np.random.choice(all_psbl_directions))
    return next_action_dir


Function to return the next state after applying a given action:
def next_state(current_state, direction):
    if direction == 0: # UP direction
        st= current_state-5
    if direction == 1: # DOWN direction
        st = current_state+5
    if direction == 2: # LEFT direction
        st = current_state-1
    if direction == 3: # RIGHT direction
        st = current_state+1
    if verify_state(st) == False:
        print("6. Something wrong: (current_state, direction) ", current_state, direction)
    return st


Function to update the Q matrix:

def updateQ(cur_state, action_dir, gamma):
    if verify_state(cur_state) == False or verify_action(action_dir) == False:
        print("3. something is Wrong")
    next_st = next_state(cur_state, action_dir)
    if verify_state(next_st) == False:
        print("4. something is Wrong: (next_st, cur_state, action_dir)", next_st, cur_state, action_dir)
        print(Q)
    # the best action from the next state is looked up in the Q matrix (not R)
    index_of_max_val = np.where(Q[next_st,] == np.max(Q[next_st,]))[0][0]
    if verify_action(index_of_max_val) == False:
        print("5. something is Wrong")
    # Q(s,a) <- Q(s,a) + alpha * (R(s,a) + gamma * max_a' Q(s',a') - Q(s,a))
    Q[cur_state, action_dir] = Q[cur_state, action_dir] + alpha*(R[cur_state, action_dir] + gamma*Q[next_st, index_of_max_val] - Q[cur_state, action_dir])
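Written out, the rule implemented above is the standard Q-learning update: Q(s, a) <- Q(s, a) + alpha * [ R(s, a) + gamma * max over a' of Q(s', a') - Q(s, a) ], where s' is the state reached by taking action a in state s, alpha is the learning rate, and gamma is the discount rate.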

Step-5: Training
Now it's time to train the Q matrix.


for i in range(50000):
    cur_state = np.random.randint(0, noOfState)
    if verify_state(cur_state) == False:
        print("1. something is Wrong")
    next_act_dir = next_action_dir(available_directions_to_go(cur_state))
    if verify_action(next_act_dir) == False:
        print("2. something is Wrong")
    updateQ(cur_state, next_act_dir, gamma)
    if i % 10000 == 0 and i > 0:
        print(i, "iterations finished.")


Step-6: Testing
After training, the resulting Q matrix can be used to find the path from state 0 to state 22.



print("After training below is the Q matrix: ")
for i in range(Q.shape[0]):
    print(i,"  ",Q[i,])
start_st = 0
goal_st = 22
path = [start_st]
while start_st != goal_st:
    tmp = Q[start_st,]
    row_max = np.max(tmp[tmp != 0])
    max_reward_act_dir = np.where(Q[start_st,]==row_max)[0][0]
    next_st =  next_state(start_st, max_reward_act_dir)
    path.append(next_st)
    start_st = next_st
print("Path from state ",start_st," to state ",goal_st,": ",path)


The procedure to find the path is as follows:
6.a.  Start from the knight's state; in our case, state 0.
6.b.  Find the maximum value in that row of the Q matrix, ignoring values of 0. A cell value of 0 represents an illegal move, so such moves are skipped.
6.c.  Find the corresponding action direction. For example, if the maximum value is at index 2, the knight should move in the LEFT direction.
6.d.  Go to the next state and make that state the current state.
6.e.  If the current state is not the goal state (the state of the Princess), repeat from Step 6.b.
6.f.  If the current state is the goal state, print the sequence of states the knight has visited.

Change the values of start_st and goal_st to change the positions of the Knight and Princess, respectively. Play around with the values of gamma and alpha when training the Q matrix.
You can download the entire Python code here.


That's all for now.

-Thanks


Wednesday, October 3, 2018

Understanding Q-learning algorithm with example using Python

Before diving into the Q-learning algorithm, let's first understand the motivation behind it: what kind of questions can be answered by Q-learning?

Problem: 
Suppose we have a house of 7 rooms, numbered from 0 to 6. The outside area of the house is numbered 7. Only through rooms 0, 2, and 6 can one go out of the house, i.e. reach 7. Suppose a robot is in room 1 and needs to go out. The Q-learning algorithm can be applied to find the path from room 1 to the outside of the house.

Q-learning approach:

1. First step is to convert the above room diagram into a graph of "states or nodes" and "actions or edges". For the above house outline, we can derive the following graph:


The number of nodes/states equals the number of rooms plus one for the outside area (8 in total). An edge represents an action: the movement from one room to another. The value on an edge is the reward the robot receives for that move. The robot gets a reward of 100 only when it moves to the outside of the house (i.e. state 7) from room 0, 2, or 6. If the robot moves to any state other than 7, the reward value is 0.

 2. The second step is to convert the graph into a matrix called the R matrix. For the above graph, the following is the R matrix:
The R matrix has dimension [# of states x # of actions]. Here an action means "move to state j", so the number of actions equals the number of states, and in our case the R matrix is 8x8.


The value in a cell represents the edge value between the corresponding states. A cell value can be:
        -1, if no edge is present
        0, if there is an edge from the state in the row to the state in the column and the reward value is 0
        100, if there is an edge from the state in the row to the state in the column and the reward value is 100
e.g. the value in the last column of the first row is 100, because the reward from state 0 to state 7 in the graph of the first step is 100.
The cell value in the 5th column of the second row is -1, because there is no edge from state 1 to state 4 in the graph of the first step.
The cell value in the 4th column of the 5th row is 0, because there is an edge from state 4 to state 3 and its reward value is 0 in the graph of the first step.

3. The third step is to create a Q matrix with dimension equal to that of the R matrix, initialized with zeros (i.e. all cell values are 0).



Decide the value of gamma, the discount factor. Let gamma = 0.8.

Now we have two stages: Training stage and Testing stage

4. The fourth step is the Training stage:
In this stage we will train the robot by iteratively updating the Q matrix.
The steps of the training stage are as follows:
4.1.   Randomly select any state, called current_state.
4.2.   Find all the possible next states, i.e. the states to which the robot can move.
4.3.   Select any random state from step 4.2, called next_state.
4.4.   Now apply the following equation to update the Q matrix:
        Find the maximum value in the (next_state)th row of the Q matrix; call it max_value.
        Q[ current_state , next_state ] = R[ current_state , next_state ] + gamma * max_value
4.5.   Go to Step 4.1 and repeat this process a certain number of times, say 10,000 (matching the code below).
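To see one update concretely: suppose the random pick is current_state = 2 and next_state = 7. Initially row 7 of Q is all zeros, so max_value = 0 and Q[2, 7] = R[2, 7] + 0.8 * 0 = 100. Later, an update from state 1 with next_state = 2 gives Q[1, 2] = R[1, 2] + 0.8 * 100 = 80, so the reward propagates backwards from the goal.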

Following is the Q matrix after 10,000 iterations (also called episodes):


This Q matrix will be now used by the robot to find the path from any room to out of the house.

5. The fifth step is the Testing stage:
In this step we will use the updated Q matrix obtained from the previous step.
Following are the steps for the final testing stage:
5.1.  Set cs = initial_state.
5.2.  Find the state s with the maximum value in the (cs)th row of the Q matrix.
5.3.  If s is the goal state (in this example, the goal state is 7), return the sequence of all visited states;
        else, set cs = s and repeat from Step 5.2.

Below is the Python code for the above example. The code will print the Q matrix after each 1000 iterations/episodes. The Python file (Q-learning_example.py) can be downloaded from here.
====================================

import numpy as np

nos = 8 # number of states
goal_state = 7 # the goal state
intermediate_Q_mat_at_step=1000

# R matrix
R = np.matrix([[-1,  0, -1, -1, -1, -1, -1, 100],
               [ 0, -1,  0, -1, -1, -1, -1,  -1],
               [-1,  0, -1,  0, -1, -1, -1, 100],
               [-1, -1,  0, -1,  0, -1, -1,  -1],
               [-1, -1, -1,  0, -1,  0, -1,  -1],
               [-1, -1, -1, -1,  0, -1,  0,  -1],
               [-1, -1, -1, -1, -1,  0, -1, 100],
               [ 0, -1,  0, -1, -1, -1,  0, 100]])

# Q matrix
Q = np.matrix(np.zeros([nos, nos]))

# Gamma (discount factor)
gamma = 0.8


def available_actions(state):
    current_state_row = R[state,]
    av_act = np.where(current_state_row >= 0)[1]
    return av_act


# A random action is selected from the available ones
def sample_next_action(available_actions_range):
    next_action = int(np.random.choice(available_actions_range, 1))
    return next_action


# update the Q-matrix for training phase
def update(current_state, action, gamma):
    
    max_index = np.where(Q[action,] == np.max(Q[action,]))[1]

    if max_index.shape[0] > 1:
        max_index = int(np.random.choice(max_index, size = 1))
    else:
        max_index = int(max_index)
        
    # Q learning formula
    Q[current_state, action] = R[current_state, action] + gamma *  Q[action, max_index]
    


############ Training stage ###################################


# Train for 10000 iterations.
for i in range(10000): 
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_act = available_actions(current_state)
    action = sample_next_action(available_act)
    update(current_state,action,gamma)
    # print the intermediate Q matrix
    if((i % intermediate_Q_mat_at_step) == 0):
        print("Q-matrix at step ", i)
        print(Q/np.max(Q)*100)
    
# Normalize the "trained" Q matrix
print("Trained Q matrix:")
print(Q/np.max(Q)*100)


###################### Testing Stage  ###################################


current_state = 4  # test starts from room 4; change this to start from any other room
steps = [current_state]

while current_state != goal_state:

    next_step_index = np.where(Q[current_state,] == np.max(Q[current_state,]))[1]
    
    if next_step_index.shape[0] > 1:
        next_step_index = int(np.random.choice(next_step_index, size = 1))
    else:
        next_step_index = int(next_step_index)
    
    steps.append(next_step_index)
    current_state = next_step_index

# Print selected sequence of steps
print("Selected path:")
print(steps)



Tuesday, October 2, 2018

Regression vs Classification in Machine Learning

REGRESSION :
  • The output variable takes continuous values
  • Regression predicts a value from a continuous set
  • Regression involves estimating or predicting a response
  • e.g. The price of a house depending on its 'size' and 'location' is some numerical value. This numerical value can be continuous.
  • Given  f : x -> y,  if y is a real number/continuous, then this is a regression problem.
  • Regression models make predictions that answer questions like the following:
    • What is the value of a house in Taipei, Taiwan?
    • What is the probability that a user clicks on this ad?
  • Different types of regression :
    • Linear regression
    • Logistic regression
    • Polynomial regression
    • Stepwise regression
    • Ridge regression
    • Lasso regression
    • ElasticNet regression
    • and many more ....

CLASSIFICATION :
  • The output variable takes class labels
  • Classification predicts "belonging" to a class
  • Classification is identifying group membership
  • e.g. The prediction of house price can be classified into 'very costly', 'costly', 'affordable', 'cheap', 'very cheap'. Here each class may correspond to some range of values.
  • Given  f : x -> y,  if y is a discrete/categorical variable, then this is a classification problem.
  • Classification models predict discrete values that answer questions like the following:
    • Is the given email message spam or not spam?
    • Is this an image of a dog, a cat, or a hamster?
  • Different classification methods:
    • Linear classifier: Logistic regression, Naive Bayes classifier
    • Support Vector Machines
    • Decision Trees
    • Boosted Trees
    • Random forest
    • Neural Networks
    • Nearest neighbor
    • and many more ....
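To make the contrast concrete, here is a minimal sketch using scikit-learn; the tiny data sets are made up for illustration:

from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (e.g. house price from size)
sizes = [[50], [80], [120], [200]]     # feature: size in square metres
prices = [100.0, 160.0, 240.0, 400.0]  # target: continuous values
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[100]]))            # a numeric value, around 200

# Classification: predict a discrete label (e.g. spam or not spam)
features = [[0, 1], [1, 0], [1, 1], [0, 0]]
labels = ["spam", "not spam", "spam", "not spam"]
clf = LogisticRegression().fit(features, labels)
print(clf.predict([[1, 1]]))           # one of the two class labels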




Tuesday, August 14, 2018

Difference between Classification and Clustering in Machine Learning



What is it?
  • Classification: Given a set of historical/old data along with their class names and a set of new data, classification is the process of assigning each new data point a class name learned from the old/historical data.
  • Clustering: Given a set of samples/data, clustering is the process of grouping the data based on their similarities and patterns.

Type of input data
  • Classification: a set of data with labels/class names.
  • Clustering: only the data set; no labeling is required.

Input
  • Classification: a set of samples or data, plus a set of classes.
  • Clustering: a set of samples.

Output
  • Classification: the class name of each new sample, based on the classes of the existing samples.
  • Clustering: a grouping of the samples based on the data patterns; each sample is assigned a group/label/class name, and the number of groups becomes known.

Number of classes
  • Classification: the number of classes is known.
  • Clustering: the number of classes is unknown (it can be known after the clustering process is over).

Learning type
  • Classification: supervised learning.
  • Clustering: unsupervised learning.

Algorithms
  • Classification: decision trees, Bayesian classifiers, SVM.
  • Clustering: K-means, expectation maximization.

Dependencies
  • Classification: depends on training data.
  • Clustering: no training data or prior knowledge is required.
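The same contrast in code, as a minimal sketch with scikit-learn; the tiny data set and the class names are made up for illustration:

from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [8, 8], [9, 8]]

# Classification: labels for the historical data are required
y = ["small", "small", "large", "large"]
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[2, 1]]))   # predicts one of the known class names

# Clustering: no labels; the groups are discovered from the data alone
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)              # group ids, e.g. [0 0 1 1] (numbering may vary)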






Friday, May 25, 2018

Classification using the Naive Bayes algorithm in Python

#Import Library of Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB
import numpy as np
import pandas as pd
 
#assigning predictor and target variables
dataF = pd.DataFrame()
dataF["x1"] = np.random.random_sample((5,)) + 1
dataF["x2"] = np.random.random_sample((5,)) + 2 

# target labels for the five training samples
dataL = np.array([3, 3, 4, 3, 4])
 
#Create a Gaussian Classifier
model = GaussianNB()

# Train the model using the training sets 
model.fit(dataF, dataL)

#Predict Output 
predicted= model.predict([[1,2],[3,4]])
print(predicted)

Output (may vary, since the training features are random), e.g.: [3 4]
 
-Thank you 

Wednesday, May 23, 2018

Basic Pandas functions for dataset manipulation in Python


Suppose we have two tables with the following schemas:
Data1.schema:
    name:
    Company:
    Address:
Data2.schema:
    name:
    gender:
    Age:
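To try the commands below directly, here is a minimal, hypothetical setup; the sample rows are made up for illustration:

import pandas as pd

data1 = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "Company": ["Acme", "Initech", "Acme"],
    "Address": ["India", "USA", "India"],
})
data2 = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "gender": ["F", "M", "M"],
    "Age": [28, 34, 41],
})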

Commands:


# delete a column

del data1["name"]

# Return all the rows of the 'name' column

data1.loc[:, "name"]

# Return all the rows from the 'name' column to the 'Company' column

data1.loc[:, "name":"Company"]

# Return all the rows from the 'Company' column to the 'Address' column

data1.loc[:, "Company":"Address"]

# Return the rows whose Address is "India"

data1.loc[lambda var: var.Address == "India", :]
or
data1.loc[data1["Address"] == "India", :]

# Return only the names whose Address is "India"

data1.loc[lambda var: var.Address == "India", "name"]

# Return only the first five names whose Address is "India"

data1.loc[lambda var: var.Address == "India", "name"].head(5)

# Return only the last five names whose Address is "India"

data1.loc[lambda var: var.Address == "India", "name"].tail(5)

# Return the list of unique Addresses

data1.Address.unique()

# Return the list of unique Companies

data1.Company.unique()

# Return the list of unique names

data1.name.unique()

# Return the first unique Address

data1.Address.unique()[0]

# Return the total number of unique Addresses

data1.Address.unique().size

# Return the name, Age and gender from data2 if the Address is "India"

t = data1.loc[data1["Address"] == "India", "name"].unique()
for x in range(0, t.size):
    print(data2.loc[data2["name"] == t[x], :])
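The last lookup can also be written as a single join, which is the more idiomatic pandas approach; a sketch using the hypothetical data above:

# Equivalent version using a join on the name column
merged = data1.merge(data2, on="name")
print(merged.loc[merged["Address"] == "India", ["name", "Age", "gender"]])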

Wednesday, May 2, 2018

Accessing remote GUI application using PuTTY

Extending the post here, today we will see how a remote GUI application can be accessed using an X server and PuTTY.

Remote Server configuration:
  •     Ubuntu 16.04
  •     OpenSSH-server
Client machine configuration:
  •     Windows 8 or later
  •     Xming software tool

Remote server setup:
Install openssh-server package by executing following command:
sudo apt-get install openssh-server
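To optionally confirm that the SSH server is running, you can check its status (a standard systemd command on Ubuntu 16.04):
sudo systemctl status ssh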

Client machine setup:
  • Download Xming for Windows (64-bit).
  • Install the Xming executable file. You don't have to change anything; leave everything at the default selection.
  • Download and install the latest version of PuTTY from here.

  • Run Xming from the start menu, or issue the following commands in a command prompt
cd "c:\Program Files (x86)\Xming"
Xming.exe -multiwindow 
  •  Run PuTTY
  • In the PuTTY configuration window, enter the IP address and port, and select the SSH radio button.

  • Under the Connection category, select SSH, then X11.
  • Check the "Enable X11 forwarding" option and click the Open button.


  • In the console, now enter "gnome-calculator" to verify the above steps.
    Instead of "gnome-calculator", you can enter a command to open any remote GUI application.
    I assume that you know how to open your application from the command line/terminal.


That's all for now.


-Thanks


Monday, April 9, 2018

What is Virtual Network Function (VNF) ?

It is easy to understand the concept of a VNF if you know what a virtual machine is. Conventionally, network functions are embedded in dedicated hardware. If, instead, we can make the network functions independent of the hardware and install them on any machine, we arrive at the concept of a VNF.
Virtualized network functions, or VNFs, are software realisations of the various network functions that can be deployed on a Network Functions Virtualization Infrastructure and are needed for the network to operate. In this way, a VNF handles a specific network function and runs on one or more virtual machines on top of the hardware networking infrastructure.

Individual VNFs can be considered building blocks; they can be connected or combined to provide all the capabilities required for a complete networking communication service.

Some examples of virtual network functions are:
     
  • Switching: BNG, CG-NAT, routers.
  • Tunnelling gateway elements: IPSec/SSL VPN gateways.
  • Traffic analysis: DPI, QoE measurement.
  • Signalling: SBCs, IMS.
  • Application-level optimisation: CDNs, load Balancers.
  • Home routers and set top boxes.
  • Mobile network nodes: HLR/HSS, MME, SGSN, GGSN/PDN-GW, RNC.
  • Network-wide functions: AAA servers, policy control, charging platforms.
  • Security functions: firewalls, intrusion detection systems, virus scanners, spam protection.
Later we will focus on Network Function Virtualization (NFV).

-Thanks

Thursday, March 29, 2018

Adding biography section into elsevier latex article


In the research community, LaTeX is mostly used to write the articles that are published in different journals. The biography section is the last section in many articles. IEEEtran and elsarticle are the most widely used LaTeX templates, offering multiple commands to present research content in different ways. For example, the IEEEbiography command is used to write the biography section in the IEEEtran template, but no similar command is available when writing an article with the elsarticle template.

In reality, you don't need any special command to write an author's biography if you are using the elsarticle template. You can compare the two papers below; the first is published by IEEE and the second by Elsevier.
IEEE paper: http://ieeexplore.ieee.org/document/8169111/?anchor=authors
Elsevier paper: https://www.sciencedirect.com/science/article/pii/S0164121216300887#b1

In elsarticle template, you can use following simple code to add biography section.


Code:

\pagebreak
% Biography Section
%\section*{ }
\noindent \textbf{Author1 name} author description goes here.
\subsection*{  } % This subsection (with no heading) is added to give more space between two biographies
\noindent \textbf{Author2 name}  author description goes here. 

Output:

I have observed that most Elsevier journals place the biography section on a new page; the \pagebreak command serves this purpose.

Usually the biography section has no title, so you can either omit the \section command entirely or use it with an empty title.

The entire snippet should be placed at the end of the document, just before the \end{document} command.


-That's all

Monday, March 19, 2018

Scheduling Ubuntu program with gnome-schedule

In Ubuntu you can schedule any task to run automatically or at a specific time.
For this you can use either the gnome-schedule GUI or the crontab command.

Using gnome-schedule

=========================
1.   First make sure your program is running properly.
2.   Open the terminal (using the Ctrl+Alt+T shortcut key).
3.   Enter the following command to install gnome-schedule
sudo apt-get install gnome-schedule
4.   Enter gnome-schedule command in the terminal window to open the desired GUI, as given below.

5.   Now click on "New" button.
6.   Select the "A task that launches recurrently" option to execute the program repeatedly at a specific time interval.


7. Enter the Description and Command in the corresponding fields.
8. Click the Advanced radio button to explicitly specify the execution time, then click the "Add" button.
    Examples:
                    Minute: 5, Hour: 15,  Day: *, Month: *, Weekday: *
                          This will execute the program every day, every month, at 15:05 (3:05 PM)

    Click here to see some examples.  


Using crontab command

==========================
1.   Open a terminal (using the Ctrl+Alt+T shortcut key).
2.   Enter crontab -e
3.   Add one line in following format at the end of the file.
          <minute> <hours> <day> <month> <weekday> <Command to execute>
4.   Now save the file.
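For example, to run a script every day at 15:05, the line would look like this (the script path here is hypothetical):
          5 15 * * * /home/user/backup.sh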

That's all.