Understanding Probability

Randomness is not a property of a phenomenon itself. It is simply the unpredictability of the occurrence of events around you, and it shows up in many everyday situations. For example, suppose you find a coin while walking down a street; you would naturally look for another coin around that spot, but there is no certainty, and no pattern, that tells you whether you will find one. Other examples are tossing a coin or a die, and fluctuating market prices for common goods.

In mathematics, we assign a numerical value to each of these random outcomes; that is, we use probability to quantify randomness. The probability of an event is estimated by its relative frequency in the experiment: the number of times the event occurs divided by the number of trials.
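The relative-frequency idea can be sketched with a simulated fair coin. This is a minimal illustration, not part of the original post: the function name `relative_frequency`, the fixed seed and the trial counts are all assumptions chosen for the example.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Estimate P(heads) as (number of heads) / (number of tosses)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    # Each toss is a trial; a draw below 0.5 is counted as heads.
    heads = sum(rng.random() < 0.5 for _ in range(num_trials))
    return heads / num_trials

# The estimate tends to settle near 0.5 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few trials the estimate can be far from 0.5; with many trials it usually gets close, which is why relative frequency is used as an estimate of probability.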

In probability, an event is an outcome, or a set of outcomes, of your experiment. For example, getting heads when flipping a coin is an event. The act of tossing the coin once is called a trial, and trials are independent when one toss does not affect another. A number of trials together make up an experiment, and the set of all possible outcomes of an experiment is called the sample space. So, we can say that an event is a subset of the sample space.

Another example: suppose you need to choose a point from the interval (10, 100). The selection E = (12, 34), meaning the chosen point falls between 12 and 34, is an event.

Disjoint and Independent Event

Disjoint events are events that cannot occur simultaneously: if one occurs, then the others cannot. That means the joint probability is zero, P(A and B) = 0. We can say that disjoint events are strongly dependent: once one event is known to have occurred, the other has zero chance of occurring. For example, when tossing a coin, the result can be either heads or tails but cannot be both.

On the other hand, independent events are those where the occurrence of one does not affect the occurrence of the other. For example, when tossing two coins, the result of one flip does not affect the result of the other.

       conditions for independence
          P(A and B) = P(A)×P(B)
          P(A, given that B occurs) = P(A)
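Both ideas can be checked by enumerating the sample space of two fair coin tosses. The sketch below is only an illustration; the helper `prob` and the event names are assumptions introduced here, and all four outcomes are assumed equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space of two coin tosses: HH, HT, TH, TT (assumed equally likely).
sample_space = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

def first_heads(o):
    return o[0] == "H"

def first_tails(o):
    return o[0] == "T"

def second_heads(o):
    return o[1] == "H"

# Independent events: the product rule holds.
p_a = prob(first_heads)
p_b = prob(second_heads)
p_a_and_b = prob(lambda o: first_heads(o) and second_heads(o))
independent = p_a_and_b == p_a * p_b

# Disjoint events: the joint probability is zero, so the product rule fails.
p_disjoint = prob(lambda o: first_heads(o) and first_tails(o))

print(p_a_and_b, independent)  # 1/4 True
print(p_disjoint)              # 0
```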

Joint and Marginal Probability

Joint probability measures the probability that two or more events occur simultaneously, and marginal probability is the probability of a single event, irrespective of the others.

Joint probability is expressed as P(A1, A2, ..., An), where A1, A2, ..., An are the events.

Let us take an example:

P(life expectancy=70, nationality=Nepal) = 0.5 means there is a 0.5 chance that a person picked from a population is Nepali and has a life expectancy of 70 years.
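A marginal probability can be obtained from a joint distribution by summing over the other variable. The sketch below uses a small hypothetical table: only the (70, Nepal) entry of 0.5 comes from the example above, and the other three entries, the "Other" category and the function names are assumptions made up so that the probabilities sum to 1.

```python
from fractions import Fraction

# Hypothetical joint distribution over (life expectancy, nationality).
# Only the (70, "Nepal") entry is from the text; the rest are invented.
joint = {
    (70, "Nepal"): Fraction(5, 10),
    (80, "Nepal"): Fraction(1, 10),
    (70, "Other"): Fraction(1, 10),
    (80, "Other"): Fraction(3, 10),
}

def marginal_nationality(nationality):
    """P(nationality): sum the joint over every life expectancy."""
    return sum(p for (_, nat), p in joint.items() if nat == nationality)

def marginal_life_expectancy(years):
    """P(life expectancy): sum the joint over every nationality."""
    return sum(p for (yrs, _), p in joint.items() if yrs == years)

print(marginal_nationality("Nepal"))  # 3/5
print(marginal_life_expectancy(70))   # 3/5
```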

Conditional Probability

Formally, conditional probability is the probability of an event A given that another event B has occurred.

Mathematically, it is denoted as P(A | B).

   P(A | B) = P(A and B) / P(B)   ; P(A and B) - joint probability of A and B.

Let us take an example :

Consider that a student has a 70% chance of being accepted to a university, and only 40% of accepted students get a student residence offer. Then the chance of a student being accepted and receiving a student residence offer is given by

P(Accepted and Student resi.) = P(Student resi.|Accepted)P(Accepted)
                               = (0.40)*(0.70) = 0.28 
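The same arithmetic can be written as a short check. The variable names below are assumptions chosen for this illustration; the numbers are the ones from the example.

```python
p_accepted = 0.70                  # P(Accepted)
p_residence_given_accepted = 0.40  # P(Student resi. | Accepted)

# Multiplication rule: P(A and B) = P(B | A) * P(A)
p_accepted_and_residence = p_residence_given_accepted * p_accepted

# Dividing the joint probability by P(Accepted) recovers the conditional.
recovered_conditional = p_accepted_and_residence / p_accepted

print(round(p_accepted_and_residence, 2))  # 0.28
print(round(recovered_conditional, 2))     # 0.4
```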

Bayes' Law

Let us suppose that we have prior knowledge of the probability of a symptom S occurring when a person has a disease D, P(S|D); the probability of having disease D, P(D); and the probability of having symptom S, P(S). If we have these three pieces of information, we can calculate the probability of disease D given that the person has symptom S, i.e. P(D|S).

By using joint probability,

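     P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)

     Which leads to

      P(A|B) =  P(B|A) * P(A) / P(B)

      where P(A) is the prior probability, P(B|A) is the likelihood and P(A|B) is the posterior probability.

     Swapping the roles of A and B, and expanding the denominator as P(A) = P(A|B)P(B) + P(A|B′)P(B′), gives

       P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B′)P(B′)]

       P(B′) is the probability of B not occurring.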


Suppose it has been observed empirically that the word “Congratulations” occurs in 1 out of 10 spam emails, but only in 1 out of 1000 non-spam emails. Suppose it has also been observed empirically that about 4 out of 10 emails are spam. Suppose we get a new email that contains “Congratulations”. Then, what is the probability that this email is spam?

Ans: Let C be the event that an email contains the word 'Congratulations' and S be the event that the email is spam.

Now, we need to calculate P(S | C).


 P(C|S) ~= 1/10
 P(C|S') ~= 1/1000
 P(S) ~= 4/10
 P(S') ~= 6/10

By Bayes’ Theorem: P(S|C) = P(C|S )P(S) / [P(C|S )P(S) + P(C|S')P(S')]

                      = (1/10) (4/10) / [ 1/10 * 4/10 + 1/1000 * 6/10]
                      = 0.985
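The calculation can be reproduced with a small helper. This is a sketch only: the function `posterior` and its parameter names are assumptions introduced for the illustration, and it simply applies Bayes' theorem with the denominator expanded over the event and its complement.

```python
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    """P(H | E) via Bayes' theorem with a two-way split of the evidence."""
    p_not_h = 1 - p_h
    # Total probability of the evidence E.
    p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h
    return p_e_given_h * p_h / p_e

# H = email is spam (S), E = email contains "Congratulations" (C).
p_spam_given_c = posterior(p_e_given_h=1 / 10, p_h=4 / 10, p_e_given_not_h=1 / 1000)
print(round(p_spam_given_c, 3))  # 0.985
```

Even though only 40% of emails are spam, the word is 100 times more likely in spam than in non-spam, which pushes the posterior close to 1.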
Sijan Bhandari on

Using perceptron model for classification : an illustrative approach

In this post, we are going to devise a measurement tool (a perceptron model) to classify whether a person is infected by a disease or not.

In binary terms, the output will be

            1   if infected 
            0   not infected

To build inputs for our neural network, we take readings from the patients and encode them as follows:

  body temperature = {
                          1   if body temperature > 99°F
                         -1   if body temperature <= 99°F

  heart rate = {
                      1   if heart rate is above the normal range of 60 to 100
                     -1   if heart rate is within the normal range of 60 to 100

   blood pressure = {
                          1   if blood pressure > 120/80
                         -1   if blood pressure <= 120/80

So, the input from each patient will be represented as a three-dimensional vector:

  input = (body temperature, heart rate, blood pressure)

So, a person can now be represented as:

(1, -1, 1)
i.e. (body temperature > 99°F, heart rate within 60 to 100, blood pressure > 120/80)

Let us create two inputs with their desired output values:

      x1 = (1, 1, 1), d1 = 1 (infected)
      x2 = (-1, -1, -1), d2 = 0 (not infected)

Let us take initial values for the weights and bias: weights w0 = (-1, 0.5, 0) and bias b0 = 0.

And, activation function:

         A(S)   = {
                    1 if S >=0
                    0 otherwise

Feed x1 = (1, 1, 1) into the network.


S = (-1, 0.5, 0) * (1, 1, 1)^T + 0
  = -1 + 0.5 + 0 + 0
  = -0.5

When passed through the activation function, A(-0.5) = 0 = y1. We passed an infected input vector, but our perceptron classified it as not infected. Let's calculate the error term:

             e = d1 - y1 = 1 - 0 = 1

Update weight as:

             w1 = w0 + e * x1 = (-1, 0.5, 0) + 1 * (1, 1, 1) = (0, 1.5, 1)

And, update bias as:

             b1 = b0 + e = 1

Now, we feed second input (-1, -1, -1) into our network.

weighted_sum :

S = w1 * x2^T + b1 
  = (0, 1.5, 1) * (-1, -1, -1)^T + 1
  = -1.5 - 1 + 1
  = -1.5

When passed through the activation function, A(-1.5) = 0 = y2. We passed a not infected input vector, and our perceptron successfully classified it as not infected.


Since our first input was misclassified earlier, we feed it into the network again with the updated weights.

weighted_sum :

S = w1 * x1^T + b1 
  = (0, 1.5, 1) * (1, 1, 1)^T + 1
  = 1.5 + 1 + 1
  = 3.5

When passed through the activation function, A(3.5) = 1 = y3. We passed an infected input vector, and our perceptron successfully classified it as infected.

Here, both input vectors are correctly classified, i.e. the algorithm has converged to a solution.
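The hand calculation above can be replayed in a few lines of Python. This is a minimal sketch rather than the post's own code: the function names and the cap of 10 passes are assumptions, and the initial bias is taken as b0 = 0, the value used in the first weighted-sum calculation.

```python
def activation(s):
    """Step activation A(S): 1 if S >= 0, otherwise 0."""
    return 1 if s >= 0 else 0

def predict(w, b, x):
    """Weighted sum of inputs plus bias, passed through the activation."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(s)

def update(w, b, x, d):
    """Apply w <- w + e * x and b <- b + e, where e = d - y."""
    e = d - predict(w, b, x)
    new_w = tuple(wi + e * xi for wi, xi in zip(w, x))
    return new_w, b + e

samples = [((1, 1, 1), 1), ((-1, -1, -1), 0)]  # (input, desired output)
w, b = (-1, 0.5, 0), 0                          # initial weights and bias

for _ in range(10):  # cap on passes (an assumption) to guarantee termination
    mistakes = 0
    for x, d in samples:
        if predict(w, b, x) != d:
            w, b = update(w, b, x, d)
            mistakes += 1
    if mistakes == 0:
        break  # every sample is classified correctly

print(w, b)  # (0, 1.5, 1) 1
```

The loop makes one correction, on x1 in the first pass, and the second pass finds no mistakes, matching the steps worked out by hand.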


What is a perceptron and how does it work?

A perceptron is simply an artificial neuron capable of solving linear classification problems. It is a single-layer feed-forward neural network.

In this formulation, a perceptron takes binary input values and signals a binary output for decision making. The output decision (either 0 or 1) is based on the value of the weighted sum of the inputs.

Mathematically, a perceptron can be defined as:

output O(n) =
                    {    0 if ∑ w_i x_i + $\theta$ <= 0
                         1 if ∑ w_i x_i + $\theta$ > 0

$\theta$ = threshold / bias
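As an illustration of this definition, the sketch below evaluates a perceptron on all four binary input pairs. The weights (1, 1) and $\theta$ = -1.5 are hypothetical values picked for this example so that the perceptron behaves like a logical AND; they do not come from the post.

```python
def perceptron_output(inputs, weights, theta):
    """Return 1 if sum(w_i * x_i) + theta > 0, otherwise 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + theta > 0 else 0

weights = (1.0, 1.0)  # hypothetical weights
theta = -1.5          # hypothetical threshold / bias

# Only the input (1, 1) pushes the weighted sum above zero.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(x, weights, theta))
```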

Perceptron Learning Algorithm

Perceptron learning is done by adjusting the weights and bias during the training process.

Initially, we will have a training set of input vectors

[x_1(n), x_2(n), ..., x_m(n)]

And a weight vector

[w_1(n), w_2(n), ..., w_m(n)]

And a bias b

For convenience, let w_0(n) = b(n) and x_0(n) = 1

i.e. input vector = [1, x_1(n), x_2(n), ..., x_m(n)]
and weight vector = [b(n), w_1(n), w_2(n), ..., w_m(n)]

y(n) = actual output in training
d(n) = desired output
η = the learning rate

Suppose M and N are two different classes, where output +1 corresponds to M and -1 corresponds to N.

learning steps:

1. Initialization

We will set initial value for weights : w(0) = 0

2. For a number of iterations (the number of iterations is your choice)
a. Activation

We will supply (input vector, desired output) = [x(n), d(n)] to perceptron.

b. Actual Response

For each input vector, we will calculate the actual output based on

      y(n) = sgn[w^T(n) x(n)]

where sgn(.) is the signum function, defined as:

    sgn(x) = {  +1 if x >= 0
                -1 if x < 0 }

c. Weight adjustment

We will adjust the weight vector as follows.

Calculate the error:

         e = d(n) - y(n)

Update the weights:

    w(n+1) = w(n) + η * e * x(n)

where the desired output is

      d(n) = { +1   if x(n) belongs to class M
               -1   if x(n) belongs to class N
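The learning steps can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not a reference one: the function names, the default learning rate and iteration count, and the four two-dimensional training points are all hypothetical.

```python
def sgn(v):
    """Signum function as defined above: +1 if v >= 0, else -1."""
    return 1 if v >= 0 else -1

def augment(x):
    """Prepend x_0 = 1 so that w_0 plays the role of the bias b."""
    return [1.0] + list(x)

def train_perceptron(samples, eta=1.0, iterations=10):
    """samples: list of (x, d) pairs with d = +1 (class M) or -1 (class N)."""
    m = len(samples[0][0])
    w = [0.0] * (m + 1)  # step 1: w(0) = 0, including the bias term
    for _ in range(iterations):  # step 2
        for x, d in samples:
            x_aug = augment(x)                                  # 2a. activation
            y = sgn(sum(wi * xi for wi, xi in zip(w, x_aug)))   # 2b. actual response
            e = d - y                                           # 2c. error
            w = [wi + eta * e * xi for wi, xi in zip(w, x_aug)]
    return w

def classify(w, x):
    return sgn(sum(wi * xi for wi, xi in zip(w, augment(x))))

# Hypothetical, linearly separable toy data.
samples = [((2, 1), 1), ((3, 2), 1), ((-1, -1), -1), ((-2, -1), -1)]
w = train_perceptron(samples)
print(w)                                   # [-2.0, 2.0, 2.0]
print([classify(w, x) for x, _ in samples])  # [1, 1, -1, -1]
```

When a sample is already classified correctly, e is 0 and the weights are left unchanged, so only misclassified samples move the decision boundary.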

What is Deep Learning and Neural Network?

Deep learning, in simple terms, is a learning mechanism for neural networks. Neural networks are computational models that mimic the human nervous system and are capable of learning. Like the interconnected neurons in the human brain, a neural network is made up of interconnected nodes. Each node receives signals as a set of inputs, performs calculations and signals an output based on some activation value.

Here is a list of some problems that deep learning can solve:

  1. Classification: object and speech recognition, classifying sentiment from text
  2. Clustering: fraud detection

Elements of Neural Networks

  1. Weights: Biological neurons have synaptic strengths that define the importance of particular inputs. In a similar fashion, inputs to NNs have associated relative weights. These weights ultimately define the connection intensity of the input to any neuron. The weights are adaptive in nature, since they are modified in the process of training.

  2. Summation: This is the first step in an NN, where each input is multiplied by its corresponding weight and the weighted sum is computed.

  3. Transfer: In the transfer process, we simply compare the summation output with a threshold value and decide the final neural output.

An example would be,

(x1) = 2
(x2) = 1

(w1) = 0.7
(w2) = 0.8

threshold = 2

Summation value :

 x1w1 + x2w2 = (2 x 0.7) + (1 x 0.8) = 2.2

Since the summation value is greater than the threshold, the neuron will fire.
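The summation and transfer steps for this example can be written as a short sketch. The function name `neuron_fires` is an assumption introduced for the illustration; the inputs, weights and threshold are the ones from the example.

```python
def neuron_fires(inputs, weights, threshold):
    """Return the weighted sum and whether it exceeds the threshold."""
    # Summation: multiply each input by its weight and add the results.
    summation = sum(x * w for x, w in zip(inputs, weights))
    # Transfer: compare the summation output with the threshold.
    return summation, summation > threshold

summation, fired = neuron_fires(inputs=[2, 1], weights=[0.7, 0.8], threshold=2)
print(round(summation, 1), fired)  # 2.2 True
```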
