Dynamic Fault Detection in Mobile Ad-hoc Networks

ABSTRACT

Mobile Ad-Hoc Networks (MANETs) are sets of mobile nodes that communicate wirelessly without a centralized supporting system. Faulty nodes affect the reliable transmission of messages across the network. In this thesis we deal with the fault identification problem in static-topology MANETs. A comparison-based approach is used, in which a set of tasks is given to the nodes and the outcomes are compared. Based on these comparisons the nodes are classified as either faulty or fault-free. Our new diagnosis model is based on the spanning tree concept: the testing of the nodes and the construction of the spanning tree take place simultaneously. As a result, the overhead of maintaining and repairing the spanning tree is completely avoided, which reduces the number of messages exchanged. We have also developed a simulator that can be applied to networks with a large number of nodes. We carried out simulations to find the total number of messages exchanged and the total diagnosis time. The results show that our model performs better than its predecessors. Correctness and complexity proofs are also provided, and they show that our model performs better from both a communication and a latency viewpoint.

INTRODUCTION

Wireless cellular systems have been popular since the early 1980s. These systems mainly operate with the help of a centralised supporting system, otherwise known as the access point. It is this access point that helps users stay connected to the network. In places where there is no fixed access point, however, this technology has its limitations. In rescue and emergency operations, installing a centralised supporting system is time-consuming. To overcome this problem we have mobile ad-hoc networks, which can be deployed quickly in places where a fixed infrastructure is not possible. MANETs are basically a collection of mobile nodes that communicate wirelessly.

MANETs

Mobile Ad-Hoc Networks (MANETs) are basically a collection of mobile nodes that communicate wirelessly without any centralised supporting system. Here the users or nodes are free to roam within the transmission range. MANETs are gaining popularity in rescue and emergency operations because they are self-organizing, autonomous and deployable anywhere. Each node in a MANET is equipped with a receiver and a transmitter.

FAULT DIAGNOSIS

As MANETs are mainly used in rescue and emergency operations, reliable communication between the mobiles is of utmost importance. Hence the design of dependable MANETs is attracting growing interest in the research community. The main problem in designing dependable MANETs is the distributed self-diagnosis problem: each fault-free mobile has to keep information about the state of all the nodes in its neighborhood or, in some applications, each node should be able to identify the state of all the nodes in the network [1]. Many elegant distributed diagnosis algorithms are available for wired networks, and most of them are based either on invalidation models such as the PMC model [2] or on comparison models such as the broadcast comparison model [3] and the generalized comparison model [4]. The comparison approach is the most popular diagnosis approach. Here the nodes are given a set of tasks, the tasks are executed and the outcomes are compared. The comparison outcomes of the generalized comparison model are as follows. The comparison outcome is 0 when the comparator and both compared mobiles are fault-free. If at least one of the compared mobiles is faulty and the comparator is fault-free, the comparison outcome is 1. Finally, the comparison result is unreliable if the comparator mobile is faulty [1]. The earliest work on fault diagnosis in MANETs using the comparison approach was proposed by Chessa and Santi in [5]. They used the shared nature of the communication channel to distribute the diagnosis, and presented a distributed diagnosis algorithm that allows the fault-free mobiles to know the fault status of all the mobiles in the network. The most recent work on the diagnosis problem is presented in [1], where an adaptive distributed self-diagnosis protocol (Adaptive-DSDP) is proposed for fixed-topology MANETs.
In fixed-topology MANETs it is assumed that the topology of the network does not change during the diagnosis session. Adaptive-DSDP uses a spanning tree containing all the fault-free nodes, which is maintained, repaired and used to transmit information about other nodes. In this report we propose a new diagnosis model based on the spanning tree concept in which the testing of the nodes and the construction of the spanning tree take place simultaneously. Here the test request message itself helps in the construction of the spanning tree. As a result the overhead of maintaining and repairing the spanning tree is completely avoided, improving both the time and the message complexity.
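
The three comparison outcomes described in this section can be illustrated with a short MATLAB sketch. It is only an illustration under stated assumptions: the variable names (faulty, comparator, pair) are hypothetical, and the unreliable result of a faulty comparator is modelled here as a random 0 or 1.

```matlab
% Minimal sketch of the outcome rules of the generalized comparison model.
% Assumption: faulty(k) is true when mobile k is faulty (hypothetical data).
faulty = [false false true false];   % mobile 3 is faulty

comparator = 1;                      % mobile that compares the two results
pair = [2 3];                        % the two mobiles whose results are compared

if faulty(comparator)
    % A faulty comparator gives an unreliable result (modelled as random here).
    outcome = randi([0 1]);
elseif any(faulty(pair))
    outcome = 1;                     % at least one compared mobile is faulty
else
    outcome = 0;                     % comparator and both compared mobiles are fault-free
end

fprintf('Comparison of mobiles %d and %d by mobile %d: %d\n', ...
        pair(1), pair(2), comparator, outcome);
```

With the data above the comparator is fault-free and mobile 3 is faulty, so the outcome is 1.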

MOTIVATION

We analysed the Adaptive-DSDP model and found that spanning tree maintenance imposes an overhead at all times, even when no diagnosis session is running. The spanning tree is also maintained with a particular node as its root, i.e. the initiator is fixed. So if the initiator node fails, or if any other node detects an altered behavior, the diagnosis session will not start. Further, spanning tree repair starts only after the testing and gathering phases, which increases both the diagnosis latency and the communication complexity. Spanning tree maintenance and repair consume a lot of time, so constructing the tree during the testing and gathering phases themselves is more efficient.

CONCLUSION

The earliest work on fault identification in mobile ad-hoc networks was carried out by Chessa and Santi [5]. Their model, known as Static-DSDP, uses a comparison-based approach and assumes that the network topology is fixed during the diagnosis session. Such a network is known as a fixed-topology network. Further work on fixed-topology MANETs was carried out by Elhadef et al. [6]. Their model, known as Dynamic-DSDP, uses a spanning tree to disseminate the local diagnostic messages collected during the testing phase. Here the spanning tree is constructed after the fault status of the nodes has been identified.
Adaptive-DSDP [1] also considers a fixed-topology environment and a spanning tree approach, and is an improvement over the Dynamic-DSDP model. In Adaptive-DSDP the spanning tree is initially configured with the MANET, and the protocol maintains and reconfigures the spanning tree while the hosts are moving or being diagnosed by their neighbors. In this thesis we have proposed a new model for the fixed-topology environment. It also uses a spanning tree approach. In this model the testing of the nodes, the gathering of information about neighbors and the building of the spanning tree take place simultaneously. As a result, the overhead of maintaining and repairing the spanning tree is completely avoided, which reduces the number of messages exchanged. We have also developed a simulator that can be applied to networks with a large number of nodes. We carried out simulations to find the total number of messages exchanged and the total diagnosis time. The results show that our model performs better than its predecessors. Correctness and complexity proofs are also provided. The derived message and time complexities show that our model performs better from both a communication and a latency viewpoint.
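
The idea of testing nodes and growing the spanning tree in the same pass can be sketched in a few lines of MATLAB. This is a simplified illustration and not the protocol from the thesis: the adjacency matrix adj, the ground-truth vector faulty and the initiator are hypothetical inputs, a test performed by a fault-free node is assumed to be always correct, and the traversal is a plain breadth-first search.

```matlab
% Sketch: diagnose nodes while growing a spanning tree from the initiator.
% Assumptions (not from the thesis): adj, faulty and initiator are
% hypothetical inputs, and a test by a fault-free node is always correct.
adj = [0 1 1 0 0;
       1 0 0 1 0;
       1 0 0 1 1;
       0 1 1 0 0;
       0 0 1 0 0];                   % symmetric adjacency matrix (1 = within range)
faulty = [false false false true false];
initiator = 1;                       % assumed to be fault-free

n = size(adj,1);
parent = zeros(1,n);                 % parent(k) = 0 means "no parent in the tree"
state = -ones(1,n);                  % -1 unknown, 0 fault-free, 1 faulty
state(initiator) = 0;
queue = initiator;                   % fault-free nodes that still have to test
messages = 0;                        % number of test requests sent

while ~isempty(queue)
    u = queue(1);
    queue(1) = [];
    for v = find(adj(u,:))
        if state(v) ~= -1
            continue                 % already diagnosed by another node
        end
        messages = messages + 1;     % test request from u to v
        if faulty(v)
            state(v) = 1;            % diagnosed faulty, left out of the tree
        else
            state(v) = 0;            % diagnosed fault-free
            parent(v) = u;           % the same test request creates the tree edge
            queue(end+1) = v;        % v will test its own neighbors later
        end
    end
end

disp(parent)
disp(state)
disp(messages)
```

For this five-node example node 4 is diagnosed as faulty, the tree edges are 1-2, 1-3 and 3-5, and four test requests are sent. No separate tree-construction or repair phase is needed because every fault-free test result doubles as a tree edge.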

FUTURE WORK

The model we have proposed is for fixed-topology MANETs. Moreover, this model identifies only permanent faults. In future it can be extended to dynamic-topology MANETs and to the identification of intermittent as well as dynamic faults. One approach to identifying intermittent faults is to repeat the proposed algorithm a fixed number of times.
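
A minimal MATLAB sketch of the repetition idea is given here. The classification rule is an assumption made for illustration, since the text does not specify one: a node that is diagnosed faulty in every round is treated as permanently faulty, and a node that is diagnosed faulty in only some rounds is treated as intermittent. The results matrix is hypothetical data, not simulator output.

```matlab
% Sketch: classify nodes from repeated diagnosis rounds.
% Assumption: results(r,k) = 1 when node k was diagnosed faulty in round r.
results = [0 1 0 1;
           0 1 1 1;
           0 1 0 1];                 % 3 rounds, 4 nodes (hypothetical data)

rounds = size(results,1);
hits = sum(results,1);               % number of rounds in which each node failed

permanent    = find(hits == rounds);             % faulty in every round
intermittent = find(hits > 0 & hits < rounds);   % faulty in some rounds only
faultfree    = find(hits == 0);                  % never diagnosed faulty

fprintf('Permanent: %s\n', mat2str(permanent));
fprintf('Intermittent: %s\n', mat2str(intermittent));
fprintf('Fault-free: %s\n', mat2str(faultfree));
```

With this rule an intermittent fault that happens to show up in every round is indistinguishable from a permanent one, so the number of rounds trades diagnosis time against the chance of misclassification.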

MATLAB SOURCE CODE

Instructions to run the code

1- Copy each of the code listings below into a separate M file, named as shown in its heading.
2- Place all the files in the same folder.

3- Download the files from the link below into the same folder as the source code. Note that faulty_nodes.m also reads and writes a workbook named 'static faults', so that file must be in the folder as well.

fault diagnosis timing

4- The listings below are not in any particular order. Copy them all before running the program.
5- Run the “FINAL.m” file.

Code 1 – Script M File – FINAL.m

clc
clear
close all
%sn=50;
sn=input('Please enter the number of mobiles: '); %total number of mobile nodes
%loop until a positive whole number is entered
while isempty(sn) || ~isnumeric(sn) || ~isscalar(sn) || sn<1 || sn~=floor(sn)
    sn=input('Please enter the number of mobiles: ');
end

% placing the sensors in the network
figure(1) 
[nodex,nodey,avg]=placing_sensors(sn); 
%avg
title('Ideal placement of the mobiles')
disp('Press enter to continue')
pause

%faulty nodes
figure(2)
[fnodex,fnodey,fni,l]=faulty_nodes(nodex,nodey,sn);
title('Mobile nodes with faulty points (red-permanent faults, black-transient faults, green-intermittent)')
%title('Flooding Process')
disp('Press enter to continue')
pause

%information transfer
[sender,receiver,s_coor,r_coor,opt]=getsensorinfo(sn,fni,nodex,nodey);
if opt==1 || opt==2 %case when the sender (opt==1) or the receiver (opt==2) is a faulty node
    figure(3)
    for i=1:sn
        plot(nodex(i),nodey(i),'b*') % plotting all the mobiles
        text(nodex(i)+0.05,nodey(i),num2str(i))
        hold on
    end
    if opt==1
        f_coor=s_coor; %the sender is the faulty node
    else
        f_coor=r_coor; %the receiver is the faulty node
    end
    text(f_coor(1)+0.5,f_coor(2),'\leftarrow Faulty Node','FontSize',12,'FontWeight','bold')
    
% elseif opt==2
%     figure(3)
%     for i=1:sn
%         plot(nodex(i),nodey(i),'b*') % plotting all the mobiles
%         text(nodex(i)+0.05,nodey(i),num2str(i))
%         hold on
%     end
%     %plot(d_coor(1),d_coor(2),'m*','LineWidth',3)
%     text(d_coor(1)+0.5,d_coor(2),'\leftarrow Faulty Node','FontSize',12,'FontWeight','bold')
    
elseif opt==0 %case when both sender and receiver are fault-free
    figure(3)
%    subplot(2,1,1)    
    for i=1:length(nodex) %placement of nodes
        plot(nodex(i),nodey(i),'*')
        text(nodex(i)+0.05,nodey(i),num2str(i)) %numbering
        hold on
    end
    plot(s_coor(1),s_coor(2),'c*')
    text(s_coor(1)+0.05,s_coor(2),'\leftarrow Sender') %marking the sender
    plot(r_coor(1),r_coor(2),'c*')
    text(r_coor(1)+0.05,r_coor(2),'\leftarrow Receiver') %marking the receiver
%    [total1,good1,bad1,diag]=witherror(s_coor,r_coor,nodex,nodey,fnodex,fnodey,l); %subplot 1 for the scenario with errors
    [total1]=witherror(s_coor,r_coor,nodex,nodey,fnodex,fnodey,l,avg,fni,sender,receiver);
    title('Scenario with error detection')

%    subplot(2,1,2)    
    figure(4)
    for i=1:length(nodex)
        plot(nodex(i),nodey(i),'*') %placement of nodes
        text(nodex(i)+0.05,nodey(i),num2str(i)) %numbering
        hold on
    end
    plot(s_coor(1),s_coor(2),'c*')
    text(s_coor(1)+0.05,s_coor(2),'\leftarrow Sender') %marking the sender
    plot(r_coor(1),r_coor(2),'c*')
    text(r_coor(1)+0.05,r_coor(2),'\leftarrow Receiver') %marking the receiver
    [total2]=withouterror(s_coor,r_coor,nodex,nodey,fnodex,fnodey,l,avg,fni,sender,receiver); %subplot 2 for the scenario without errors
    title('Scenario without errors')
% 
%     %graphs
%     %graphs(good1,bad1,total1,total2,sn,nodex,nodey,diag)
% total1
% total2
    graphs(total1,total2,sn,nodex,nodey,fnodex,fnodey,fni)
end

Code 2 – Function M File – graphs.m

function graphs(total1,total2,sn,nodex,nodey,fnodex,fnodey,fni)

[r3 c3]=size(total1);
[r4 c4]=size(total2);

% total1
% total2

[num,txt,raw]=xlsread('fault diagnosis timing');
[row col]=size(num);
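blank_mat=zeros(row,col); %zeros used to overwrite the results of the previous run
if ~isempty(blank_mat) %nothing to clear when the sheet holds no numeric data
    xlswrite('fault diagnosis timing',blank_mat,1,'A4')
end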

total1_ind=[];
total2_ind=[];

for i=1:r4 %finding the ids of the scenario with no errors
    for j=1:length(nodex)
        if total2(i,1)==nodex(j) && total2(i,2)==nodey(j)
            total2_ind=[total2_ind j];
        end        
    end
end      

for i=1:r3 %finding the ids of the scenario with errors
    for j=1:length(nodex)
        if total1(i,1)==nodex(j) && total1(i,2)==nodey(j)
            total1_ind=[total1_ind j];
        end        
    end
end      

t=[]; %diagnosable nodes
p=[]; %non diagnosable nodes
g=[]; %good nodes
len=length(fnodex);
for i=1:length(total1_ind)
    op=0; %if its 0, then its a good node. else bad node
    for j=1:length(fni)
        if total1_ind(i)==fni(j)
            if j>=ceil(len/4)+1 && j<=ceil(len/2)
                t=[t total1_ind(i)];
                op=1;                 
                break
            else
                p=[p total1_ind(i)];
                op=1;
                break
            end
        end
    end
    
    if op~=1
        g=[g total1_ind(i)];
    end
end
    
from=[];
to=[];
for i=1:length(total1_ind)    
    op=0;
    if i==1
        %first will always come in "from"
        from=[from total1_ind(i)];
        continue
    elseif i==length(total1_ind)        
        %last will always come in "to"
        to=[to total1_ind(i)];
        break
    end    
          
    for j=1:length(g) %cheking for good nodes
        if total1_ind(i)==g(j)
            to=[to total1_ind(i)];
            from=[from total1_ind(i)];
            op=1;
            break
        end
    end
    if op==1
        continue
    end
       
    for j=1:length(t) %checking for diagnosable nodes
        if total1_ind(i)==t(j)
            to=[to total1_ind(i)];
            from=[from total1_ind(i)];
            op=1;
            break
        end
    end
    if op==1
        continue
    end
    
    for j=1:length(p) %checking for permanent nodes
        if total1_ind(i)==p(j)
            to=[to total1_ind(i)];
            from=[from from(length(from))];
            op=1;
            break
        end
    end
    if op==1;
        continue
    end
end
    
ack_sent=[]; %time to send the acknowledgement
for i=1:min(length(from),length(to)) %"to" is empty when the path holds a single node
    dist=sqrt((nodex(from(i))-nodex(to(i)))^2+(nodey(from(i))-nodey(to(i)))^2); % distance between the nodes
    ack_sent=[ack_sent dist];
end

ack_rec=ack_sent; %time to receive the acknowledgement
l=length(fni);
permanent=fni([1:ceil(l/4) (ceil(l/2)+1):l]); %indexes without transient fault
diagnosable=fni(ceil(l/4)+1:ceil(l/2)); %indexes of transient fault
ind=1;
for i=to
    for j=permanent
        if i==j
            ack_rec(ind)=0;
        end
    end
    
    for k=diagnosable
        if i==k
            a=ack_rec(ind);
            b=ack_rec(ind)+4;
            time= a + (b-a).*rand(1,1);
            ack_rec(ind)=ack_rec(ind)+time;
        end
    end
    ind=ind+1;
end

xlswrite('fault diagnosis timing',from',1,'A4')
xlswrite('fault diagnosis timing',to',1,'B4')
xlswrite('fault diagnosis timing',ack_sent',1,'C4')
xlswrite('fault diagnosis timing',ack_rec',1,'D4')
xlswrite('fault diagnosis timing',total2_ind(1:end-1)',1,'F4') %"from" nodes in the scenario without errors
xlswrite('fault diagnosis timing',total2_ind(2:end)',1,'G4') %"to" nodes in the scenario without errors
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
oneunit=(1/sn)*100; % 1 unit of energy
Eg=(r3/sn)*100; % energy lost in good nodes
Eb=(r4/sn)*100; % energy lost in bad nodes

x1=[];
for i=1:length(from)
    x=100-oneunit*i;
    x1=[x1 x]; %nodewise energy loss in scenario with error nodes
end

x2=[];
for i=1:length(total2_ind(1:end-1))
    x=100-oneunit*i;
    x2=[x2 x]; %nodewise energy loss in scenario without error nodes
end

figure(5)
subplot(2,1,1)
bar(x1)
set(gca,'XTickLabel',{from})
ylabel('Energy')
xlabel('Nodes')
title('Scenario with error nodes')

subplot(2,1,2)
bar(x2)
set(gca,'XTickLabel',{total2_ind(1:end-1)})
xlabel('Nodes')
ylabel('Energy')
title('Scenario without error nodes')

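%%------------- Data loss Graph --------------
%The original loops shared one buffer "b", so the second result kept stale
%values from the first whenever x2 was shorter than x1
x1_1 = fliplr(x1); %energy values in reverse order (scenario with error nodes)
x2_1 = fliplr(x2); %energy values in reverse order (scenario without error nodes)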

figure(6)
subplot(2,1,1)
grid on
plot(x1_1,'g','linewidth',2)
hold on
plot(x1_1,'ok','linewidth',3)
set(gca,'XTickLabel',{from})
title('DATA LOSS in Scenario with error nodes')

subplot(2,1,2)
grid on
plot(x2_1,'c','linewidth',2)
hold on
plot(x2_1,'or','linewidth',3)
set(gca,'XTickLabel',{total2_ind(1:end-1)})
title('DATA LOSS in Scenario without error nodes')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% delay graph
figure(7)
subplot(2,1,1)
plot(1:r3)
grid on
axis([0 r3+2 0 r3+2])
set(gca,'XTickLabel',{from})
title('Delay in scenario with errors')
xlabel('Serial Number of Nodes in path')
ylabel('Delay (sec)')

subplot(2,1,2)
plot(1:r4)
grid on
axis([0 r4+2 0 r4+2])
set(gca,'XTickLabel',{total2_ind(1:end-1)})
title('Delay in scenario without errors')
xlabel('Serial Number of Nodes in path')
ylabel('Delay (sec)')


Code 3 – Function M File – faulty_nodes.m

function [fnodex,fnodey,fni,l]=faulty_nodes(nodex,nodey,sn)

p=40; %percentage of faulty nodes out of the total number of nodes
fn=floor((p*sn)/100); %number of total faulty nodes (fn-faulty nodes)

fni=randperm(sn,fn); %distinct indexes of faulty nodes in nodex and nodey (fni-faulty nodes index); randi could pick the same node twice

fnodex=[];
fnodey=[];
for i=fni
    fnodex=[fnodex nodex(i)]; % assigning the x coordinates to faulty nodes (assignment is done using the previously assigned x nodes)
    fnodey=[fnodey nodey(i)]; % assigning the y coordiantes to faulty nodes (assignment is done using the previously assigned y nodes)
end

l=length(fni); % total number of faulty nodes 'l'

pfx=[fnodex(1:ceil(l/4))]; %permanent faults
pfy=[fnodey(1:ceil(l/4))];
%ceil(fni/4)

tfx=[fnodex((ceil(l/4)+1):ceil(l/2))]; %transient faults
tfy=[fnodey((ceil(l/4)+1):ceil(l/2))];
% (ceil(fni/4)+1)
% ceil(fni/2)

ifx=[fnodex((ceil(l/2)+1):ceil((3*l)/4))]; %intermittent fault
ify=[fnodey((ceil(l/2)+1):ceil((3*l)/4))];
% (ceil(fni/2)+1)
% ceil((3*fni)/4)

dfx=[fnodex((ceil((3*l)/4)+1):l)]; %dynamic faults
dfy=[fnodey((ceil((3*l)/4)+1):l)];
% (ceil((3*fni)/4)+1)

for i=1:sn
    plot(nodex(i),nodey(i),'b*') % plotting all the mobiles
    text(nodex(i)+0.05,nodey(i),num2str(i))
    hold on
end

%flooding process starts %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[num,txt,raw]=xlsread('static faults');
[row col]=size(num);
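blank_mat=zeros(row,col); %zeros used to overwrite the results of the previous run
if ~isempty(blank_mat) %nothing to clear when the sheet holds no numeric data
    xlswrite('static faults',blank_mat,1,'A5')
end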

x=1; %initialize the index of master node
for i=1:sn
    if ~any(fni==i) %first node that does not appear anywhere in the faulty list
        x=i; % master node index which is fault free
        break
    end
end

pause(0.5)
plot(nodex(x),nodey(x),'b*','LineWidth',5)
text(nodex(x)+0.05,nodey(x),'\leftarrow Master Node','FontSize',15,'FontWeight','bold')
pause(0.5)
for k=1:3 %blink the flooding lines from the master node three times
    for i=1:sn
        plot([nodex(x) nodex(i)],[nodey(x) nodey(i)])
    end
    pause(0.5)
    for i=1:sn
        plot([nodex(x) nodex(i)],[nodey(x) nodey(i)],'w') %overdraw in white to erase the line
    end
    pause(0.5)
end

time=[];
for i=[1:(x-1) (x+1):sn]
    dist=sqrt((nodex(x)-nodex(i))^2+(nodey(x)-nodey(i))^2);
    time=[time; dist]; % calculating the distances of each node from master node
end

xlswrite('static faults',[1:(x-1) (x+1):sn]',1,'A5')
xlswrite('static faults',time,1,'B5')
xlswrite('static faults',time,1,'C5')

ind=1;
for i=[1:(x-1) (x+1):sn] %excluding the master node here
    for j=[fni(1:ceil(l/2))] %permanent and transient faults only; intermittent and dynamic faults are excluded because they are not yet discovered
        if i==j
            time(ind)=0;
            break
        end
    end
    
    for j=[fni((ceil(l/2)+1):ceil((3*l)/4))]
        if i==j
            a=time(ind);
            b=time(ind)+5; 
            X= a + (b-a).*rand(1,1);
            time(ind)=time(ind)+X;
            break
        end
    end    
    ind=ind+1;
end
xlswrite('static faults',time,1,'D5')
%flooding process ends %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

plot(pfx,pfy,'ro','LineWidth',3) %displaying the permanent faults
plot(tfx,tfy,'ko','LineWidth',3) %displaying the transient faults
plot(ifx,ify,'go','LineWidth',3) %displaying the intermittent faults
%plot(dfx,dfy,'mo','LineWidth',3) %displaying the dynamic faults
%legend('r* Permanent','k* Transient','g* Intermittent','m* Dynamic','Location','NorthEastOutside')

%replot the node markers that were erased by the white lines drawn during flooding
for i=1:sn
    plot(nodex(i),nodey(i),'b*') % plotting all the mobiles
    text(nodex(i)+0.05,nodey(i),num2str(i))
    hold on
end

end

Code 4 – Function M File – check_nn2.m

function [coor2]=check_nn2(nn,fnodex,fnodey,len,nodex,nodey,coor1)

coor2=[]; %stays empty until a fault-free candidate is found
i=1;
while i>=1
    [r1 c1]=size(nn);
    if i>r1 %every candidate was faulty: search again in a random direction
        x=randi(4);
        %Assumption: an infinite bound is passed as "r_coor" so that the
        %direction helpers return every node on that side of "coor1"
        switch x
            case 1
                [searchx,searchy]=nodes_left(coor1,nodex,nodey,[-Inf -Inf]);
            case 2
                [searchx,searchy]=nodes_up(nodex,nodey,coor1,[Inf Inf]);
            case 3
                [searchx,searchy]=nodes_down(nodex,nodey,coor1,[-Inf -Inf]);
            case 4
                [searchx,searchy]=nodes_right(coor1,nodex,nodey,[Inf Inf]);
        end
        [nn]=nearest_node(searchx,searchy,coor1);
        i=1;
        if isempty(nn) %no node in that direction, try another one
            continue
        end
    end

    %a candidate is rejected when its coordinates match a faulty node
    if any(fnodex==nn(i,1) & fnodey==nn(i,2))
        i=i+1;
    else
        coor2=[nn(i,1) nn(i,2)];
        break
    end
end

end

Code 5 – Function M File – nearest_node.m

%function [coor2_1,coor2_2,coor2_3,coor2_4,coor2_5]=nearest_node(nodesx,nodesy,coor1,savenodes)
function [nn]=nearest_node(nodesx,nodesy,coor1)
distance=[];
nn=[];
%[rs cs]=size(savenodes);
% for i=1:length(nodesx)
%     for j=1:rs
%         if nodesx(i)~=savenodes(j,1) && nodesy(i)~=savenodes(j,2)
%             remnodesx=[remnodesx nodesx(i)];
%             remnodesy=[remnodesy nodesy(i)];
%         end
%     end
% end

for i=1:length(nodesx)
    dist=sqrt( ((coor1(1)-nodesx(i))^2) + ((coor1(2)-nodesy(i))^2) );
    distance=[distance dist];
end

[b,ix]=sort(distance); 

for i=ix
    nn=[nn; nodesx(i) nodesy(i)];
end


% [r1 c1]=size(nn);
% [r2 c2]=size(used_nodes);
% for i=1:r1
%     for j=1:r2
%         if nn(i,1)==used_nodes(j,1) && nn(i,2)==used_nodes(j,2)
%             nn(i,:)=[];
%         end
%     end
% end

% if nargout==0
%     coor2_1=coor1;
%     coor2_2=coor1;
%     coor2_3=coor1;
%     coor2_4=coor1;
%     coor2_5=coor1;
% end

end

Code 6 – Function M File – blinking_line.m

function blinking_line(coor1,coor2)
%disp('BLINKING LINE')
for iiii=1:3   
        abc=plot([coor1(1) coor2(1)],[coor1(2) coor2(2)]);
        pause(0.3)
        delete(abc)
        pause(0.3)
        abc=plot([coor1(1) coor2(1)],[coor1(2) coor2(2)]);
        pause(0.3)
        delete(abc)
        pause(0.3)
%        abc=plot([coor1(1) coor2(1)],[coor1(2) coor2(2)]);
end
    
end

Code 7 – Function M File – bestmatch.m

function bestmatch(iii,coor1,coor2,fnodex,fnodey,l)
% coor1
% coor2
% pause
if iii>=1 && iii<=ceil(l/4)
    hold on
    blinking_line(coor1,coor2) %%%%%%%%%%% NOT WORKING %%%%%%%%%%%
    plot([coor1(1) fnodex(iii)],[coor1(2) fnodey(iii)],'r','LineWidth',2) %%%%%%%%%%% NOT WORKING %%%%%%%%%%%
    plot(fnodex(iii),fnodey(iii),'r*','LineWidth',2)
    text(fnodex(iii)+0.05,fnodey(iii),'\leftarrow Permanent Error')
elseif iii>=(ceil(l/2)+1) && iii<=ceil((3*l)/4)
    hold on
    blinking_line(coor1,coor2)  %%%%%%%%%%% NOT WORKING %%%%%%%%%%%
    plot([coor1(1) fnodex(iii)],[coor1(2) fnodey(iii)],'g','LineWidth',2) %%%%%%%%%%% NOT WORKING %%%%%%%%%%%
    plot(fnodex(iii),fnodey(iii),'g*','LineWidth',2)
    text(fnodex(iii)+0.05,fnodey(iii),'\leftarrow Intermittent Error')
elseif iii>=(ceil((3*l)/4)+1) && iii<=l
    hold on
    blinking_line(coor1,coor2)  %%%%%%%%%%% NOT WORKING %%%%%%%%%%%
    plot([coor1(1) fnodex(iii)],[coor1(2) fnodey(iii)],'m','LineWidth',2)  %%%%%%%%%%% NOT WORKING %%%%%%%%%%%
    plot(fnodex(iii),fnodey(iii),'m*','LineWidth',2)
    text(fnodex(iii)+0.05,fnodey(iii),'\leftarrow Dynamic Error')
end

end

Code 8 – Function M File – infotransfer1.m

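function nn=infotransfer1(coor1,r_coor,nodex,nodey,total)
%Unfinished stub: the direction branches are empty. See infotransfer.m for the working version.

nn=[]; %returned empty so that a call does not fail with an unassigned output

hori=coor1(1)-r_coor(1); %horizontal distance between "coor1" and "r_coor"
vert=coor1(2)-r_coor(2); %vertical distance between "coor1" and "r_coor"

if abs(hori)>abs(vert)
    %horizontal search not implemented
elseif abs(hori)<abs(vert)
    %vertical search not implemented
end

end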

Code 9 – Function M File – nodes_down.m

function [downnodesx,downnodesy]=nodes_down(nodesx,nodesy,coor1,r_coor)

downnodesx=[];
downnodesy=[];
%downnodes_index=[];

for i=1:length(nodesx)
    if nodesy(i)<coor1(2) && nodesy(i)>=r_coor(2)
        downnodesx=[downnodesx nodesx(i)];            
        downnodesy=[downnodesy nodesy(i)];
        
%        downnodes_index=[downnodes_index i]; % index of nodex and nodey for upnodesx and upnodesy
    end
end



end

Code 10 – Function M File – nodes_left.m

function [leftnodesx,leftnodesy]=nodes_left(coor1,nodex,nodey,r_coor)

leftnodesx=[];
leftnodesy=[];
%leftnodes_index=[];

for i=1:length(nodex)
    if nodex(i)<coor1(1) && nodex(i)>=r_coor(1)
        leftnodesx=[leftnodesx nodex(i)];            
        leftnodesy=[leftnodesy nodey(i)];
        
 %       leftnodes_index=[leftnodes_index i]; % index of nodex and nodey for leftnodex and leftnodey
    end
end

end

Code 11 – Function M File – nodes_right.m

function [rightnodesx,rightnodesy]=nodes_right(coor1,nodex,nodey,r_coor)

rightnodesx=[];
rightnodesy=[];
%rightnodes_index=[];

for i=1:length(nodex)
    if nodex(i)>coor1(1) && nodex(i)<=r_coor(1)
        rightnodesx=[rightnodesx nodex(i)];            
        rightnodesy=[rightnodesy nodey(i)];
        
%        rightnodes_index=[rightnodes_index i]; % index of nodex and nodey for leftnodex and leftnodey
    end
end

end

Code 12 – Function M File – nodes_up.m

function [upnodesx,upnodesy]=nodes_up(nodesx,nodesy,coor1,r_coor)

upnodesx=[];
upnodesy=[];
%upnodes_index=[];

for i=1:length(nodesx)
    if nodesy(i)>coor1(2) && nodesy(i)<=r_coor(2)
        upnodesx=[upnodesx nodesx(i)];            
        upnodesy=[upnodesy nodesy(i)];
        
 %       upnodes_index=[upnodes_index i]; % index of nodex and nodey for upnodesx and upnodesy
    end
end

end

Code 13 – Function M File – getsensorinfo.m

function [sender,receiver,s_coor,r_coor,opt]=getsensorinfo(sn,fni,nodex,nodey)

opt=0; %0: both nodes fault-free, 1: sender faulty, 2: receiver faulty
sender=input('Enter the sender node ID: '); %sender node ID
%loop until a whole number between 1 and sn is entered
while isempty(sender) || ~isnumeric(sender) || ~isscalar(sender) || sender<1 || sender>sn || sender~=floor(sender)
    sender=input('Enter the sender node ID: ');
end

if any(fni==sender)
    opt=1; %the sender is a faulty node
end

receiver=input('Enter the receiver node ID: '); %receiver node ID
%loop until a whole number between 1 and sn is entered
while isempty(receiver) || ~isnumeric(receiver) || ~isscalar(receiver) || receiver<1 || receiver>sn || receiver~=floor(receiver)
    receiver=input('Enter the receiver node ID: ');
end

if any(fni==receiver)
    opt=2; %the receiver is a faulty node (overrides opt==1 when both are faulty)
end

s_coor=[nodex(sender) nodey(sender)]; %sender coordinates
r_coor=[nodex(receiver) nodey(receiver)]; %receiver coordinates

end

Code 14 – Function M File – infotransfer.m

function nn=infotransfer(coor1,r_coor,nodex,nodey,total)

hori=coor1(1)-r_coor(1); %horizontal distance between "coor1" and "r_coor"
vert=coor1(2)-r_coor(2); %vertical distance between "coor1" and "r_coor"
searchnodesx=[]; %default search set, used when no branch below assigns one
searchnodesy=[];
if abs(hori)>abs(vert) %if horizontal distance is greater than vertical distance
    % search horizontally 
    if coor1(1)>r_coor(1) % search in left direction    
        [leftnodesx,leftnodesy]=nodes_left(coor1,nodex,nodey,r_coor); %set of nearest nodes in left direction till "r_coor"       
        
%         for i=1:length(leftnodesx)
%             if r_coor(1)<leftnodesx(i)
%                 searchnodesx1=[searchnodesx1; leftnodesx(i)];
%                 searchnodesy1=[searchnodesy1; leftnodesy(i)];
%             end
%         end
                
        if coor1(2)<r_coor(2) %search in upwards direction
            [upnodesx,upnodesy]=nodes_up(leftnodesx,leftnodesy,coor1,r_coor);
            searchnodesx=upnodesx;
            searchnodesy=upnodesy;
        elseif coor1(2)>r_coor(2) %search in downwards direction
            [downnodesx,downnodesy]=nodes_down(leftnodesx,leftnodesy,coor1,r_coor);
            searchnodesx=downnodesx;
            searchnodesy=downnodesy;
        end                 
    elseif coor1(1)<r_coor(1) % search in right direction
        [rightnodesx,rightnodesy]=nodes_right(coor1,nodex,nodey,r_coor);
        
%         for i=1:length(rightnodesx)
%             if r_coor(1)>rightnodesx(i)
%                 searchnodesx1=[searchnodesx1; rightnodesx(i)];
%                 searchnodesy1=[searchnodesy1; rightnodesy(i)];
%             end
%         end
        
        if coor1(2)<r_coor(2) %search in upwards direction
            [upnodesx,upnodesy]=nodes_up(rightnodesx,rightnodesy,coor1,r_coor);
            searchnodesx=upnodesx;
            searchnodesy=upnodesy;
        elseif coor1(2)>r_coor(2) %search in downwards direction
            [downnodesx,downnodesy]=nodes_down(rightnodesx,rightnodesy,coor1,r_coor);
            searchnodesx=downnodesx;
            searchnodesy=downnodesy;
        end                    
    end    
else %vertical distance is greater than or equal to the horizontal distance
    % search vertically
    if coor1(2)>r_coor(2) % search in down direction
        [downnodesx,downnodesy]=nodes_down(nodex,nodey,coor1,r_coor);        
        
%         for i=1:length(downnodesy)
%            if r_coor(2)<downnodesy(i)
%                 searchnodesx1=[searchnodesx1; downnodesx(i)];
%                 searchnodesy1=[searchnodesy1; downnodesy(i)];
%             end
%         end
        
        if coor1(1)<r_coor(1) %search in right direction
            [rightnodesx,rightnodesy]=nodes_right(coor1,downnodesx,downnodesy,r_coor);            
            searchnodesx=rightnodesx;
            searchnodesy=rightnodesy;
        elseif coor1(1)>r_coor(1) %search in left direction
            [leftnodesx,leftnodesy]=nodes_left(coor1,downnodesx,downnodesy,r_coor);            
            searchnodesx=leftnodesx;
            searchnodesy=leftnodesy;
        end        
    elseif coor1(2)<r_coor(2) % search in up direction
        [upnodesx,upnodesy]=nodes_up(nodex,nodey,coor1,r_coor);        
        
%         for i=1:length(upnodesy)
%            if r_coor(2)>upnodesy(i)
%                 searchnodesx1=[searchnodesx1; upnodesx(i)];
%                 searchnodesy1=[searchnodesy1; upnodesy(i)];
%             end
%         end

        if coor1(1)<r_coor(1) %search in right direction
            [rightnodesx,rightnodesy]=nodes_right(coor1,upnodesx,upnodesy,r_coor);            
            searchnodesx=rightnodesx;
            searchnodesy=rightnodesy;
        elseif coor1(1)>r_coor(1) %search in left direction
            [leftnodesx,leftnodesy]=nodes_left(coor1,upnodesx,upnodesy,r_coor);            
            searchnodesx=leftnodesx;
            searchnodesy=leftnodesy;
        end        
    end    
end

[nn]=nearest_node(searchnodesx,searchnodesy,coor1); %set of nearest nodes from "coor1" in the direction of "r_coor"
%nn2=nn;
 nn2=[];
 %pause
if ~isempty(total) %loop to delete the nodes that have already occurred 
    [r1 c1]=size(nn);
    [r2 c2]=size(total);
    for i=1:r1
        for j=1:r2            
            if nn(i,1)==total(j,1) && nn(i,2)==total(j,2)
                nn2=[nn2 i];
%                 i
%                 disp('caught')
%                 nn2(i,:)
%                 nn2(i,:)=[]; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            end
        end
    end    
end
nn(nn2,:)=[]; %deleting the rows from "nn" which have already occurred

% [r c]=size(nn);
% for i=1:r
%     plot(nn(i,1),nn(i,2),'y*')
% end
% pause(1)
% for i=1:r
%     plot(nn(i,1),nn(i,2),'b*')
% end
    
%nn=nn2;
end

Code 15 – Function M File – placing_sensors.m

function [nodex,nodey,avg]=placing_sensors(sn)

low=0; %lower bound to both the axis
high=10; %upper bound to both the axis
 
nodex=[];
nodey=[];
for i=1:sn
    nodex=[nodex (low+(high-low)*rand)];  %x coordinates of all the points
    nodey=[nodey (low+(high-low)*rand)];  %y coordinates of all the points
end
 
for i=1:sn 
    plot(nodex(i),nodey(i),'b*') %plotting the x and y coordinates
    text(nodex(i)+0.05,nodey(i),num2str(i)) %giving all the nodes their respective numbers
    hold on
end

av=0;
ind=0;
for i=1:length(nodex)
    for j=1:length(nodex)
        dist=sqrt((nodex(i)-nodex(j))^2+(nodey(i)-nodey(j))^2);
        av=av+dist;
        ind=ind+1;
    end
end

avg=av/ind;
end

Code 16 – Function M File – witherror.m

%function [total,good,bad,diag]=witherror(s_coor,r_coor,nodex,nodey,fnodex,fnodey,l)
function [total]=witherror(s_coor,r_coor,nodex,nodey,fnodex,fnodey,l,avg,fni,sender,receiver)

coor1=s_coor; %initialising "coor1" with the value of "sender coordinate" i.e. "s_coor"
%blinking will always occur from "coor1" to "coor2"
total=[];
total=[total; coor1];
% good=[];
% bad=[];
% diag=[];
while coor1(1)~=r_coor(1) || coor1(2)~=r_coor(2)   % loop will run until "coor1" is equal to "receiver coordinates" i.e. "r_coor" (both coordinates must match)
    nn=infotransfer(coor1,r_coor,nodex,nodey,total); %finding out the valid group of nodes to mark "coor2"
    %pause    
    [coor2,total,x]=check_nn(nn,fnodex,fnodey,l,nodex,nodey,coor1,total,r_coor); %finding suitable node "coor2" from "nn". also stores its coordinates
    if x==0 %the node is not diagnosable
        [coor2]=check_avg(coor2,coor1,avg,nodex,nodey,r_coor,fni,sender,receiver); 
    end    
    blinking_line(coor1,coor2)
    plot([coor1(1) coor2(1)],[coor1(2) coor2(2)],'k','LineWidth',2)
    if coor2(1)==r_coor(1) && coor2(2)==r_coor(2)
        total=[total; coor2];
%        good=[good; coor2];        
        break
    else
        coor1=coor2;                
        total=[total; coor2];
%        good=[good; coor1];
    end
end

end

Code 17 – Function M File – withouterror.m

function [total]=withouterror(s_coor,r_coor,nodex,nodey,fnodex,fnodey,l,avg,fni,sender,receiver)

coor1=s_coor;
total=[];
while coor1(1)~=r_coor(1) || coor1(2)~=r_coor(2) %loop runs until "coor1" reaches "r_coor" (both coordinates must match)
    nn=infotransfer(coor1,r_coor,nodex,nodey,total);
%    [coor2]=check_nn2(nn,fnodex,fnodey,l,nodex,nodey,coor1);
    coor2=[nn(1,1) nn(1,2)];
    [coor2]=check_avg2(coor2,coor1,avg,nodex,nodey,r_coor,fni,sender,receiver);
    blinking_line(coor1,coor2)
    plot([coor1(1) coor2(1)],[coor1(2) coor2(2)],'k','LineWidth',2)
    total=[total; coor1];    
    if coor2(1)==r_coor(1) && coor2(2)==r_coor(2)
        total=[total; r_coor];    
        break
    else
        coor1=coor2;                
    end
end

end

Code 18 – Function M File – check_nn.m

%function [coor2,bad,total,diag]=check_nn(nn,fnodex,fnodey,len,nodex,nodey,coor1,bad,total,r_coor,diag)
function [coor2,total,x]=check_nn(nn,fnodex,fnodey,len,nodex,nodey,coor1,total,r_coor,fni,avg,sender,receiver)
%[coor2]=check_avg2(coor2,coor1,avg,nodex,nodey,r_coor,fni,sender,receiver)
i=1;
x=0;
while i>=1
    [r1 c1]=size(nn);
    if i>r1 % loop will run when the given "nn" does not have any value. we have to find new set of "nn"
 %       disp('yes')
        [leftnodesx,leftnodesy]=nodes_left(coor1,nodex,nodey,r_coor); %nodes to the left of "coor1" till "r_coor"
        [upnodesx,upnodesy]=nodes_up(nodex,nodey,coor1,r_coor); %nodes to the up of "coor1" till "r_coor"
        [downnodesx,downnodesy]=nodes_down(nodex,nodey,coor1,r_coor); %nodes to the down of "coor1" till "r_coor"
        [rightnodesx,rightnodesy]=nodes_right(coor1,nodex,nodey,r_coor); %nodes to the right of "coor1" till "r_coor"
       
        [nn1]=nearest_node(upnodesx,upnodesy,coor1); %arranging the obtained sets in ascending order. it contains the nearest nodes 
        [nn2]=nearest_node(downnodesx,downnodesy,coor1);
        [nn3]=nearest_node(leftnodesx,leftnodesy,coor1);
        [nn4]=nearest_node(rightnodesx,rightnodesy,coor1);
        
        %a direction is usable only if its nearest node exists and is not faulty
        %(this also covers the case where there are no faulty nodes at all)
        if ~isempty(nn1) && ~any(nn1(1,1)==fnodex(:) & nn1(1,2)==fnodey(:))
            a=[nn1(1,1) nn1(1,2)]; %nearest node in up direction
        else
            a=[]; %"nn1" is empty or its nearest node is faulty
        end

        if ~isempty(nn2) && ~any(nn2(1,1)==fnodex(:) & nn2(1,2)==fnodey(:))
            b=[nn2(1,1) nn2(1,2)]; %nearest node in down direction
        else
            b=[]; %"nn2" is empty or its nearest node is faulty
        end

        if ~isempty(nn3) && ~any(nn3(1,1)==fnodex(:) & nn3(1,2)==fnodey(:))
            c=[nn3(1,1) nn3(1,2)]; %nearest node in left direction
        else
            c=[]; %"nn3" is empty or its nearest node is faulty
        end

        if ~isempty(nn4) && ~any(nn4(1,1)==fnodex(:) & nn4(1,2)==fnodey(:))
            d=[nn4(1,1) nn4(1,2)]; %nearest node in right direction
        else
            d=[]; %"nn4" is empty or its nearest node is faulty
        end
            
        if ~isempty(a)
            ad=sqrt((coor1(1)-a(1))^2+(coor1(2)-a(2))^2); %distance to the nearest up node
        else
            ad=50; %high value given so that it will not be chosen at the time of selection
        end
        
        if ~isempty(b)
            bd=sqrt((coor1(1)-b(1))^2+(coor1(2)-b(2))^2); %distance to the nearest down node
        else
            bd=50;%high value given so that it will not be chosen at the time of selection
        end
        
        if ~isempty(c)
            cd=sqrt((coor1(1)-c(1))^2+(coor1(2)-c(2))^2); %distance to the nearest left node
        else
            cd=50;%high value given so that it will not be chosen at the time of selection
        end
        
        if ~isempty(d)
            dd=sqrt((coor1(1)-d(1))^2+(coor1(2)-d(2))^2); %distance to the nearest right node
        else
            dd=50;%high value given so that it will not be chosen at the time of selection
        end
        
        mat=[ad bd cd dd];
        [mini,ind]=min(mat);  %finding the closest of all the 4 direction nodes
        switch ind
            case 1
                nn=nn1; %up is closest
            case 2
                nn=nn2; %down is closest
            case 3
                nn=nn3; %left is closest
            case 4
                nn=nn4; %right is closest
            otherwise
                disp('wrong')
        end
        
        i=1;
    end 
    
    for j=1:length(fnodex) %loop will run once for every faulty node
        if nn(i,1)==fnodex(j) && nn(i,2)==fnodey(j) %checking whether the nearest node is faulty
            coor_1=coor1;
            coor_2=[fnodex(j) fnodey(j)];
            %[coor2]=check_avg2(coor2,coor1,avg,nodex,nodey,r_coor,fni,sender,receiver);
            %pause
            blinking_line(coor_1,coor_2)  %this blinking is till the faulty node
%            disp('---------')            
            x=0;
            if j>=(ceil(len/4)+1) && j<=ceil(len/2)
                plot(fnodex(j),fnodey(j),'ko','LineWidth',2)
                text(fnodex(j)+0.05,fnodey(j),'\leftarrow Transient Error Diagnosed')
                coor2=[fnodex(j) fnodey(j)];
                %diag=[diag; coor2];
%                total=[total; coor2];
                x=1; %this means that the faulty node is diagnosable
                break
            else
                bestmatch(j,coor_1,coor_2,fnodex,fnodey,len) 
                %bad=[bad; fnodex(j) fnodey(j)]; %storing the coordinates
                total=[total; fnodex(j) fnodey(j)];
            end     
            
            if x==1 % if the node can be diagnosed, break. no need to search further
                break
            else % if the node cannot be diagnosed, it has been marked with its color
                i=i+1; % and move on to the next nearest node in "nn"
                coor2=[]; %no value of "coor2" obtained
                break
            end
        else
            coor2=nn(i,:); % "coor2" is the i^th nearest node from "nn", as the check above found that it is not this faulty node
        end
    end
    
    if ~isempty(coor2) %breaks the loop if we have a value of "coor2"
        break
    end
end

end

Code 19 – Function M File – check_avg.m

function [coor2]=check_avg(coor2,coor1,avg,nodex,nodey,r_coor,fni,sender,receiver)

dist=sqrt((coor1(1)-coor2(1))^2+(coor1(2)-coor2(2))^2);
range=ceil(avg/2);

if dist>range %"coor2" is out of the range of "coor1", so a reachable node is searched; otherwise "coor2" is kept
    range_ind=[]; %nodes in the range of coor1
    for i=[1:sender-1  sender+1:length(nodex)]
        dist=sqrt((coor1(1)-nodex(i))^2+(coor1(2)-nodey(i))^2);
        if dist<=range
            range_ind=[range_ind i]; %finding all the nodes in the range of coor1
        end
    end    
    %range_ind
    
    for i=1:length(fni)
        x=find(range_ind==fni(i));
        range_ind(x)=[]; %deleting the faulty indexes
    end
    %range_ind
    %pause

    %the node closest to r_coor within the range of coor1 will be connected
    if ~isempty(range_ind) %"coor2" is kept as it is when no fault free node is in range
        distances=[];
        for i=range_ind
            dist=sqrt((r_coor(1)-nodex(i))^2+(r_coor(2)-nodey(i))^2);
            distances=[distances; i dist];
        end    
        [~,x]=min(distances(:,2)); %first minimum only, so a tie cannot return several nodes
        coor2=[nodex(distances(x,1)) nodey(distances(x,1))];    
    end
end    

end

Code 20 – Function M File – check_avg2.m

function [coor2]=check_avg2(coor2,coor1,avg,nodex,nodey,r_coor,fni,sender,receiver)

dist=sqrt((coor1(1)-coor2(1))^2+(coor1(2)-coor2(2))^2);
range=ceil(avg/2);

if dist>range %"coor2" is out of the range of "coor1", so a reachable node is searched; otherwise "coor2" is kept
    range_ind=[]; %nodes in the range of coor1
    for i=[1:sender-1  sender+1:length(nodex)]
        dist=sqrt((coor1(1)-nodex(i))^2+(coor1(2)-nodey(i))^2);
        if dist<=range
            range_ind=[range_ind i]; %finding all the nodes in the range of coor1
        end
    end    
    %range_ind
    
    %the node closest to r_coor within the range of coor1 will be connected
    if ~isempty(range_ind) %"coor2" is kept as it is when no node is in range
        distances=[];
        for i=range_ind
            dist=sqrt((r_coor(1)-nodex(i))^2+(r_coor(2)-nodey(i))^2);
            distances=[distances; i dist];
        end    
        [~,x]=min(distances(:,2)); %first minimum only, so a tie cannot return several nodes
        coor2=[nodex(distances(x,1)) nodey(distances(x,1))];    
    end
end    

end

 

Document Categorization

ABSTRACT

Document categorization is the process of categorizing documents based on their content. It is a critical problem today because in most organizations it is still done manually, while hundreds of files and e-mail messages are saved in folders at work every day. For this reason companies need automatic classification or categorization of documents, which removes much of this burden. Document classification, also known as document categorization or text classification, is the process of assigning documents to categories. A particular document may fit into two or more different categories. Without categorization, the time needed to search for a document increases; with it, both the search time and the workload of the employees in an organization are reduced.

LITERATURE REVIEW

Document categorization is not a new idea; its development dates back to 1950. The main purpose of document categorization is to convert unstructured documents into a structured form, and researchers have developed various approaches to solve this problem.
Debnath Bhattacharya [1] discusses the history of document categorization and presents different approaches to the problem, in particular text data mining approaches. The paper describes algorithms proposed to convert unstructured documents into structured documents, such as the lexical chain method, the linearization method and neural network approaches. It explains how each method works and also explains its limitations.
Dina Goren-Bar and Tsvi Kuflik [2] discuss document categorization using LVQ (learning vector quantization) and SOM (self-organizing maps). The paper evaluates and compares the results of document classification with both methods. The main purpose of the work was to evaluate the possibility of automating the classification of subjectively categorized data sets. According to the paper, classification with self-organizing maps and learning vector quantization gives accurate results.
Hang Li and Kenji Yamanishi [3] discuss a framework for document classification using a finite mixture model, a new method for classifying documents into categories. Each category is defined as a finite mixture model based on soft clustering of words, and classification is then carried out by statistical hypothesis testing. The problem is to classify documents into a number of categories: each category already contains some documents, and the task is to determine to which category a new document should be assigned. One way to address this is hard clustering of words, but that method degrades the classification results. Soft clustering solves this problem: the finite mixture model classifies the documents based on the soft clustering of words.
Zhihang Chen [4] presents the categorization of documents using different neural network approaches. The paper argues that a single neural network is not efficient for categorizing a collection of documents, and therefore describes hierarchical neural networks for document categorization, which make document classification more efficient.
G.S Thakur [5] describes a new framework for document categorization/classification that aims at better efficiency and accuracy than other techniques, since the techniques previously applied to document classification did not give satisfactory results. In the binary model, each row represents a document and each column represents a term of the documents. This method converts unstructured documents into a structured document representation. Text classification is based on a supervised learning model, in which the dataset is divided into two parts: a training dataset and a test dataset.
Quire Zhang and jinghua [6] present medical document categorization using the naive Bayes approach. The paper explains that indexing is very important in document classification because it helps to classify the documents into predetermined classes. It also discusses the effects of different training sets on document classification and reports improved classification performance with this method.
Riel’s and boonyasopon [7] present a data mining approach to document classification. In this paper a knowledge mining approach is used: a text analysis tool mines knowledge by analyzing the documents, which can then be categorized according to different domains.
Manhood soltani, Mohammad taher [8] present the classification of textual documents using learning vector quantization. In this paper each class is represented by a vector called a codebook, which helps to identify the features of text documents that distinguish the categories. This method needs a smaller training set and is faster than other document categorization methods used for this purpose.
Dr.S.R.Suresh, T.Karthikeyan, D.B.Shanmugam, J.Dhilipan [9] present the use of reinforcement learning for document classification. The learning measures the utility of actions that provide benefits in the future. The model used for document classification is the Q-learning algorithm, which performs several sub-phases. The algorithm determines the Q-value of the documents of each category belonging to a particular domain, and during the learning process the text is mapped to the Q-value determined for each category. The authors find that reinforcement learning significantly improves the efficiency of text document classifiers and is a prominent technique. In this classification process the text documents of a specific domain are gone through completely, which improves the accuracy.
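
The binary model mentioned for [5] can be illustrated with a small sketch. It is only an illustration under assumptions: the three sample documents, the vocabulary and the variable names (docs, vocab, binary_model) are made up here and are not taken from the paper.

```matlab
% Hypothetical sample data, not taken from [5]
docs={'the computer runs the program', ...
      'science needs experiments', ...
      'the program helps science'};
vocab={'computer','program','science'};

% binary_model(d,t) is 1 when term t occurs in document d, otherwise 0
binary_model=zeros(numel(docs),numel(vocab));
for d=1:numel(docs)
    words=strsplit(docs{d},' '); % split the document into words
    for t=1:numel(vocab)
        binary_model(d,t)=any(strcmp(words,vocab{t}));
    end
end
disp(binary_model)
```

Each row of binary_model is one document and each column is one term, so the unstructured text becomes a fixed-size table that a supervised classifier could be trained on.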

SIGNIFICANCE

The proposed document categorization system is useful for organizations because every organization holds a large amount of data such as electronic documents and e-mail messages. If this data is categorized, the time needed to search for a particular document is reduced; if not, managing it takes a lot of effort. Today most organizations do this process manually, which puts a heavy burden on their employees, who save hundreds of files and e-mail messages in folders every day. For this reason companies need automatic classification or categorization of documents, which reduces both this burden and the search time for a particular document.

The study of document categorization/classification is gaining importance because the number of electronic documents from different types of resources keeps increasing. These resources include electronic documents, news articles and electronic mail, and they can be structured, semi-structured or unstructured. Extracting information from these resources is an important research area. Categorization assigns each document a category based on its content. It is needed because without it, searching for a particular document and extracting knowledge from it are very difficult. Document categorization software helps to reduce these difficulties.

OBJECTIVES

The main objective of document classification is to assign predefined categories to documents. Traditionally this is done manually by experts or employees of the organization, who first read each document carefully and then assign it one or more of the predefined categories. My objective is to reduce this burden by designing a system that fulfils this requirement. Such a document categorization system is applicable to a wide variety of applications like topic spotting, e-mail routing, language guessing and spam filtering. A further objective of document categorization is to reduce the detail and diversity of data, so that a large amount of data is grouped into documents of similar types. Document categorization/classification combines two processes, document categorization and document clustering. The main difference is that document categorization assigns documents to predefined categories and is a supervised approach, whereas clustering collects similar objects together and is an unsupervised approach.

MATLAB SOURCE CODE

Instructions to run the code

1- Copy each of the codes below into a separate M file.
2- Place all the files in the same folder.

3- Download the files from the links below into the same folder as the source code.

dictionary       new

4- Note that the codes are not listed in any particular order. Copy them all before running the program.
5- Run the “final_file.m” file.

Read me file

Please follow these instructions to run the program successfully.

# THE PROGRAM WILL RUN ONLY ON TEXT FILES. SO PLEASE CONVERT YOUR DOCUMENT TO .txt FORMAT

# REGARDING THE EXCEL FILE "dictionary.xls"

1. the excel file "dictionary.xls" should not be deleted in any case

2. if it is deleted, make a new file with the same name and the same extension

3. the first row of the excel file contains all the topics into which the document is to be sorted

4. leave row 2 empty

5. below each topic, in the same column, write down the supporting words for that topic

# REGARDING THE TEXT FILE "new.txt"


1. the text file "new.txt" should not be deleted in any case

2. if it is deleted, make a new file with the same name and the same extension

3. keep the text that is to be categorized in this file
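
The dictionary layout described above can be sketched without Excel. The cell array txt below is a hypothetical stand-in for the text cells of "dictionary.xls" (topics in row 1, row 2 empty, supporting words below), and sample_text stands in for the contents of "new.txt"; both are assumptions made for illustration. The counting is a simplified version of the strfind, sum and max steps used by check_file.m and category.m.

```matlab
% Hypothetical stand-in for the text cells of "dictionary.xls"
txt={'science'   'computer'; ...  % row 1: topics
     ''          ''; ...          % row 2: left empty
     'physics'   'software'; ...  % supporting words of each topic
     'chemistry' 'keyboard'};
% Hypothetical stand-in for the contents of "new.txt"
sample_text='the software and the keyboard belong to a computer, not to physics';

[row,col]=size(txt);
result=zeros(row,col);
for i=1:col
    for j=1:row
        if ~isempty(txt{j,i})
            % number of case-sensitive occurrences of the word in the text
            result(j,i)=numel(strfind(sample_text,txt{j,i}));
        end
    end
end

% the topic whose column collects the most matches is the category
[matches,index]=max(sum(result,1));
disp(txt{1,index})
```

Because the topic name in row 1 is searched for like any other word, a document that mentions the topic itself also counts towards that topic.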

Code 1 – Script M File – final_file.m

close all
clear all
clc

disp('%%%%%%%%%%%%% WELCOME TO THE DOCUMENT CATEGORIZATION SYSTEM %%%%%%%%%%%%%%')
disp('%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%')
% function to input the file to be tested
disp('Inputting the file...')
disp('Press enter to continue...')
pause
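fid=file_loc_inp;
if fid==-1 %"fopen" returns -1 when the file cannot be opened
    error('new.txt could not be opened. Place it in the same folder as the code.')
end
fclose(fid); %the file is reopened for every dictionary word below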
disp('File inputted')
disp('%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%')

% checking for a match from the dictionary
disp('Checking for a match from the dictionary')
disp('This might take time according to the inputted file')
disp('or the dictionary size')
disp('Press enter to continue...')
pause
[num,txt,raw]=xlsread('dictionary');
%txt;
[row col]=size(txt);
result=[];
dict_words={};
for i=1:col    
    mat=[];
    for j=1:row
        [r1 c1]=size(char(txt(j,i)));
        if c1==0 %empty dictionary cell, nothing to search for
            iteration_num=0;
        else
            fid=file_loc_inp; %reopen the file so that the search starts from the first line
            [iteration_num]=check_file(fid,txt(j,i)); %"check_file" closes "fid" when it is done
        end
        mat=[mat iteration_num];
        dict_words(j,i)={char(txt(j,i))};        
    end
    result(:,i)=mat';
end
result;
disp('%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%')
disp('Press enter for final results')
pause
disp('%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%')
[results,wordcnt,unq_wrdcnt] = wordcount;

disp('Total number of words: ')
disp(wordcnt)
disp('Number of unique words: ')
disp(unq_wrdcnt)

disp('FINAL OUTPUT COMPRISING OF THE FREQUENCY AND RELATIVE FREQUENCY OF THE ')
disp('WORDS FROM DICTIONARY')
final_output=table_func(result,dict_words,results,wordcnt);
disp(final_output)

disp('OVERALL WORDCOUNT AND FREQUENCY')
disp(results)

disp('%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%')
% to check the category
if ~any(result(:)) %no dictionary word was found in the document
    disp('The inputted document does not belong to any of the categories')
else
    [index]=category(result);
    disp('File check complete...')
    disp('The category of the document is: ')
    disp(txt{1,index})
end
disp('%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%')

Code 2 – Function M File -table_func.m

function final_output=table_func(result,dict_words,results,wordcnt)

[r c]=size(result);
mat=[];
mat_a={};
mat_b=[];
mat_c=[];
ind=1; % row number for mat_a, mat_b, mat_c
in=2;
for i=1:r
    for j=1:c
        if result(i,j)~=0
            output(in,1)={char(dict_words(i,j))};
            mat_a(ind,1)={char(dict_words(i,j))};
            
            output(in,2)={result(i,j)};
            mat_b(ind,1)=result(i,j);
            
            output(in,3)={((result(i,j))/wordcnt)*100};
            mat_c(ind,1)=((result(i,j))/wordcnt)*100;
            
            mat=[mat result(i,j)];
            in=in+1;
            ind=ind+1;
        end
    end
end

clear in
[B,IX]=sort(mat,'descend'); %order the matched words by decreasing frequency
final_output={'DICTIONARY WORD' 'FREQUENCY' 'RELATIVE FREQUENCY (%)'};
for s=1:numel(IX) %row s+1 of the table holds the s-th most frequent word; no rows are added when nothing matched
    final_output(s+1,1)={char(mat_a(IX(s),1))};
    final_output(s+1,2)={mat_b(IX(s),1)};
    final_output(s+1,3)={mat_c(IX(s),1)};
end    

end

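Code 3 – Function M File -wordcount.m (alternative version that prompts for the file; final_file.m needs the three-output version in Code 4)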

function results = wordcount( filenam, num)


% First import the words from the text file into a cell array

[FileName,PathName] = uigetfile('*.txt','Select any text file');
y= [PathName,FileName];
fid = fopen(y);
words = textscan(fid,'%s');


for i=1:numel(words{1,1})
    ind = find(isstrprop(words{1,1}{i,1}, 'alphanum') == 0);

    words{1,1}{i,1}(ind)=[];
    
end

% Remove entries in words that have zero characters
for i = 1:numel(words{1,1})
    if size(words{1,1}{i,1}, 2) == 0
        words{1,1}{i,1} = ' ';
    end
end

% Now count the number of times each word appears
unique_words = unique(words{1,1});

freq = zeros(numel(unique_words), 1);

for i = 1:numel(unique_words)
    if max(unique_words{i} ~= ' ')
        for j = 1:numel(words{1,1})
            if strcmp(words{1,1}(j), unique_words{i})
                freq(i) = freq(i) + 1;
            end
        end
    end
end


% Finally, print out the results

u_freq = unique(freq);

sorted_freq = sort(u_freq, 'descend');

results={ 'WORD' 'FREQ' 'REL. FREQ' };

for i = 1:min(numel(find(sorted_freq > 1)), 10)
    ind = find(freq == sorted_freq(i));
    results{i+1, 1} = strjoin(unique_words(ind)', ', '); % words tied at this frequency share one row

    results{i+1, 2} = sorted_freq(i);
    results{i+1, 3} = sprintf('%.4f%s', sorted_freq(i)/numel(words{1,1})*100, '%');
end

fprintf('The words that appeared more than once are displayed below\nTotal number of words in "%s" = %d\nTotal number of unique words = %d\n', FileName, numel(words{1,1}), numel(find(freq)))

if size(results,1)>1 % at least one word appeared more than once
    fprintf('Highest word frequency: %d\n',sorted_freq(1))
    fprintf('Most frequent word: "%s"\n',results{2, 1})
    sc=results{2, 1};
    new_catag={'science','computer','english'};
    if any(strcmp(sc,new_catag)) % the most frequent word names the category
        fprintf('This document comes under category: %s\n',sc)
    else
        fprintf('This document does not come under any of the defined categories\n')
    end
end
fclose(fid);

end

Code 4 – Function M File -wordcount.m

%function results = wordcount( filenam, num)
function [results,wordcnt,unq_wrdcnt] = wordcount

% First import the words from the text file into a cell array
%  [FileName,PathName] = uigetfile('*.txt','Select any text file'); %to promt user to select the file
%  y= [PathName,FileName];
% fid = fopen(y);
fid = fopen('new.txt','r');
words = textscan(fid,'%s');

for i=1:numel(words{1,1})
    ind = find(isstrprop(words{1,1}{i,1}, 'alphanum') == 0);

    words{1,1}{i,1}(ind)=[];
    
end

% Remove entries in words that have zero characters
for i = 1:numel(words{1,1})
    if size(words{1,1}{i,1}, 2) == 0
        words{1,1}{i,1} = ' ';
    end
end

% Now count the number of times each word appears
unique_words = unique(words{1,1});

freq = zeros(numel(unique_words), 1);

for i = 1:numel(unique_words)
    if max(unique_words{i} ~= ' ')
        for j = 1:numel(words{1,1})
            if strcmp(words{1,1}(j), unique_words{i})
                freq(i) = freq(i) + 1;
            end
        end
    end
end


% Finally, print out the results

u_freq = unique(freq);

sorted_freq = sort(u_freq, 'descend');

results={ 'WORD' 'FREQ' 'REL. FREQ' };

for i = 1:min(numel(find(sorted_freq > 1)), 10)
    ind = find(freq == sorted_freq(i));
    results{i+1, 1} = strjoin(unique_words(ind)', ', '); % words tied at this frequency share one row

    results{i+1, 2} = sorted_freq(i);
    results{i+1, 3} = sprintf('%.4f%s', sorted_freq(i)/numel(words{1,1})*100, '%');
end

% sprintf('The words that appeared more than once are displayed below\nTotal number of words in "%s" = %d\nTotal number of unique words = %d', 's2.txt', numel(words{1,1}), numel(find(freq)))
wordcnt= numel(words{1,1});  % -->total number of words
unq_wrdcnt=numel(find(freq));  % -->number of unique words
% 
% sprintf('display Frequency of Word  "%d"',sorted_freq(1))
% sprintf('display word of High Frequency char "%s"',results{2, 1})
% category detection from the most frequent word is not done here;
% "final_file.m" picks the category from the dictionary counts through "category.m"
fclose(fid);


end

Code 5 – Function M File – category.m

function [index]=category(result)

summation=sum(result,1); % total number of matches in every dictionary column, i.e. for every topic (dimension 1 is given so that a single-row "result" is not collapsed to one number)
[maximum,index]=max(summation);

end

Code 6 – Function M File -check_file.m

function [iteration_num]=check_file(fid,term)

tline = fgetl(fid);
line_num=0;
iteration_num=0;
% ischar(tline)
% ischar(tline)~=0
if ischar(tline)~=0
    while ischar(tline)
        line_string = sprintf('%s',tline);        
        u=strfind(line_string,(char(term)));
        line_num=line_num+1;
        iteration_num = iteration_num + length(u);
        tline = fgetl(fid); %go to next line    
    end
end

fclose(fid);

end

Code 7 – Function M File -file_check.m

function detect=file_check(fid)
% NOTE: "final_file.m" does not call this function; it uses "check_file" instead

[num,txt,raw]=xlsread('dictionary');
[row col]=size(txt);
detect=[];

for i=1:col    
    for j=1:row
        x=char(txt(j,i)); % dictionary word in row j, column i
        if isempty(x)
            continue % skip empty dictionary cells
        end
        frewind(fid); % search the file from the first line for every word
        tline = fgetl(fid);
        while ischar(tline)    
            u = strfind(tline, x); % positions of the dictionary word in this line
            detect=[detect u];
            tline = fgetl(fid); %go to next line
        end
    end
end

end

Code 8 – Function M File -file_loc_inp.m

function fid=file_loc_inp

% fid--> variable in which the identifier of the file to be tested is stored
% 'r' --> read operation
% new.txt --> name of the file. it should be present in the same
% directories with other program files. if not, then proper location should
% be provided (only text format allowed)

fid = fopen('new.txt','r');
end
