ann_macs

String Compare


Is anyone here good at string comparison? This program is supposed to count the words in two files and then work out how many words are the same in both files. The comparison part is what I still can't get working. Please, anyone who can help, as soon as possible.

Here's the code. The problem is at the strcmp.

#include<fstream.h>
#include<iostream.h>
#include<conio.h>
#include<string.h>

unsigned long int size=0, wcount=0, wordcount=0, wspace=0;
char c, e, d, f;
char kname[100];
char lname[100];

void main(){
    int cnt=0;
    ifstream infile1,infile2;
    cout<<"Enter File name : ";
    cin>>kname;
    infile1.open(kname,ios::in);
    cout<<"Enter File name : ";
    cin>>lname;
    infile2.open(lname,ios::in);
    while(!infile1 && !infile2)
    {
        cout<<"File does not exist!! TRY AGAIN";
        cout<<"\n\nPress Any key to Continue";
        getch();
        infile1.close();
        infile2.close();
        cout<<"Enter File name : ";
        cin>>kname;
        infile1.open(kname,ios::in);
        cout<<"Enter File name : ";
        cin>>lname;
        infile2.open(lname,ios::in);
        if (strcmp(kname,lname==0))//this strcmp doesn't work; it still gives a result of 0
            cnt++;
    }
    infile1.seekg(0);
    e=32;
    while(infile1)
    {
        infile1.get(c);
        if((c == 32)||(c=='\n')||(c==9))
        {
            //wspace++;
            if((e != 32)&&(e !='\n')&&(e != 9))
                wcount++;
        }
        else if(c != EOF)
            size++;
        else
            wcount++;
        e=c;
    }
    cout<<endl<<"Total No of Words in the File "<<kname<<" are: "<<wcount;
    cout<<endl;
    infile2.seekg(0);
    f=32;
    while(infile2)
    {
        infile2.get(d);
        if((d == 32)||(d=='\n')||(d==9))
        {
            //wspace++;
            if((f != 32)&&(f !='\n')&&(f != 9))
                wordcount++;
        }
        else if(d != EOF)
            size++;
        else
            wordcount++;
        f=d;
    }
    cout<<endl<<"Total No of Words in the File "<<lname<<" are: "<<wordcount;
    cout<<endl;
    cout<<"Total same words are: "<<cnt<<endl;
}


Before answering the question, I'd first like to criticize your lecturer for not reading Bjarne Stroustrup's 'The C++ Programming Language' for up-to-date, standard C++ coding. Also, it's 2007 now, so why are you still using the DOS version of Turbo C++?

As for your question, the first bug is indeed at

if (strcmp(kname,lname==0))
According to http://www.cplusplus.com/reference/clibrar...ing/strcmp.html:
int strcmp ( const char * str1, const char * str2 );
Compare two strings
Compares the C string str1 to the C string str2.
Return Value
Returns an integral value indicating the relationship between the strings: A zero value indicates that both strings are equal.
The code you wrote doesn't make sense. lname==0 is evaluated first, so strcmp receives the result of that comparison instead of your second string. strcmp simply compares two strings, and if they are equal it returns zero. So to test whether the two strings are equal, the correct code is
if (strcmp(kname,lname) == 0)
Anyway, I don't understand what you're actually trying to do with
if (strcmp(kname,lname) == 0)
cnt++;

The reason is that counting the words the two files have in common isn't that simple. Your code above only compares the two file names the user types in, not the contents of the files.

Pseudocode for this algorithm:

0. Declare 2 vectors of strings:

vector<string> file1;

vector<string> file2;

1. Copy (push) every word from file1 and file2 into the vectors.

2. Compare the first word of file1 with every word in file2.

3. If they are the same, cnt++.

4. Repeat step 2 with the second word, the third word, ........

See http://forum.putera.com/tanya/index.php?showtopic=26408 for vectors, etc.

Edited by zeph



Let me address this one first: "Why are you still using the DOS version of C++?" I'd guess it's because the software is free, but surely the university isn't so stingy that it won't pay for a better (more up-to-date) compiler. I think the second reason is more accurate:

Many lecturers in Malaysia (not all, but many) are the ignorant and arrogant type: they lack knowledge, yet they don't want to learn and they act as if they know everything. This is from my own experience. I used to work as a technician, so I often went to lecturers' homes to repair their computers. They would usually ask where I learned to repair computers and what degree I had. Then they'd talk about this and that (theory) related to computers, maybe to show off, but when they talk about something outside their own field, you can tell they're actually ignorant and just pretending to be clever.

One day I went to repair a computer in a computer lab at a university. While I was working, a class suddenly needed the room (I just carried on, since I was working at the back and not disturbing anyone). It happened to be a computer class on using Windows, and the lecturer was about to teach how to format a floppy disk. The lecturer said (roughly): "First we exit to DOS (this was Windows 3.1), then at the command line (C:\>) we type .........". I laughed, inwardly, but I couldn't hold it in and had to turn toward the back wall to hide my grin. How could someone that clueless be a lecturer? If you're already in Windows, why exit to DOS just to format a floppy, when formatting it in Windows is easier? Then maybe the lecturer noticed; I saw them go into the next room, and later a staff member came over and asked me to step out for a while.

Sorry, my language is a bit harsh sometimes; I'm not good at being diplomatic. Anyway, my conclusion is this: lecturers in Malaysia are too lazy to learn, don't update their knowledge, don't keep up with developments, and feel no responsibility to give students something more useful (if not the best). They still teach an outdated syllabus that is almost useless once the students graduate. They don't care, as long as they show up to teach and get paid at the end of the month. Please, change a little... Otherwise, how are we going to produce quality human capital?

Back to the original topic... To solve something complicated, we need to break the problem into smaller parts and solve each part. In programming, we usually do this by writing functions. Here I suggest that ann_macs write a function to get (or extract) a word from a string, for example:

LPCSTR GetWord(int Pos, LPCSTR pString)
{
......
// Write the code here to get the word at position Pos in pString.
// If Pos is greater than the number of words in pString, return NULL.
}

Solve this first (try to solve it yourself), then post your result here.

Edited by CFoo++


Actually, the only problem is counting the words that are the same in the two files. My program is supposed to detect the words that appear in both files. Could you help write a function, or anything, that can detect the same words in the files? My lecturer suggested string compare as the simplest way, but strcmp compares characters. How do I compare words? I don't know anymore, I'm completely stuck. The due date is this Thursday. Please help.

Edited by ann_macs


Like this....

#include <stdio.h>    // FILE, fopen, fscanf, fseek, printf
#include <string.h>   // strcmp
#include <iostream>   // std::cout, std::cin

int main()
{
    char kname[100];
    char lname[100];
    int cnt=0;
    FILE *infile1, *infile2;
    std::cout<<"Enter File name : ";
    std::cin>>kname;
    infile1 = fopen( kname, "r" ); // open first file for reading
    std::cout<<"Enter File name : ";
    std::cin>>lname;
    infile2 = fopen( lname, "r" ); // open second file for reading
    int wordcount = 0;

    char s1[100]; // buffer 1
    char s2[100]; // buffer 2

    if(infile1 && infile2)
    {
        printf("\nSame words:\n");

        // read file 1 one word at a time (%99s leaves room for the terminating '\0')
        while(fscanf( infile1, "%99s", s1 ) == 1)
        {
            wordcount++;
            fseek( infile2, 0L, SEEK_SET ); // rewind file 2 to the beginning
            while(fscanf( infile2, "%99s", s2 ) == 1)
            {
                if(strcmp(s1, s2) == 0) // strcmp returns 0 when both words are identical
                {
                    printf("%s\n", s1); // display same word for debugging purpose
                    cnt++;
                }
            }
        }

        std::cout<<std::endl<<"Total No of Words in the File "<<kname<<" are: "<<wordcount;
        std::cout<<std::endl;
        std::cout<<"Total same words are: "<<cnt<<std::endl;
    }
    else
        std::cout<<"File does not exist!! TRY AGAIN";

    if(infile1) fclose(infile1);
    if(infile2) fclose(infile2);

    std::cout<<"\n\nPress Enter to quit...";
    std::cin.ignore(); // discard the newline left after the file name
    std::cin.get();
    return 0;
}


I compiled it with Visual Studio, so you'll have to convert it back so it works with your compiler.... I don't really like doing it this way, though. I wanted you to try it yourself first so you could learn, but since the deadline is almost here, I have to give it to you. I hope you study it and really understand it.

Edited by CFoo++


I don't understand how to use vector, and I don't know how to use it in my program. Could you put the vector into my program for me? I'm already dizzy from trying to compare the words. The due date is this Thursday. Please... all that's left is comparing the words, and if possible showing how many words are the same. Please....


Hmm... I'm still learning too. I don't know what this vector thing is either.


Put simply, a vector in C++ is an array of any element type, such as strings, integers, etc. Its size and contents can change dynamically while the program is running.

And here's my 'pure' C++ example.

#include <cstdlib>    // system()
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
// data/variable init

    ifstream infile1;
    ifstream infile2;

    vector<string> vec1;
    vector<string> vec2;

    string str1;
    string str2;

    string kname;
    string lname;

// ask for the file names

    cout << "Enter File name : ";
    cin >> kname;
    infile1.open(kname.c_str());

    cout << "Enter File name : ";
    cin >> lname;
    infile2.open(lname.c_str());

    if (!infile1 || !infile2)
    {
        cout << "File does not exist!!" << endl;
        return 1;
    }

// load file content into the vectors /////////////////////////////

// process the first file
    while (infile1 >> str1)   // read the file word by word (split on whitespace)
    {
        vec1.push_back(str1);
    }
    // vec1.size() is now the number of words in the first file

// process the second file
    while (infile2 >> str2)
    {
        vec2.push_back(str2);
    }

// loop to compare the words: every word of file 1 against every word of file 2

    int cnt = 0;

    for (size_t i = 0; i < vec1.size(); i++)
    {
        for (size_t j = 0; j < vec2.size(); j++)
        {
            if (vec1[i] == vec2[j])
            {
                cout << vec1[i] << endl;    // for debugging purposes
                cnt++;
            }
        }
    }

    // "jumlah perkataan sama" = "number of matching words"
    cout << "jumlah perkataan sama :" << cnt << endl;

    system("pause");   // Windows-only: wait for a key press

    return 0;
}


The code I posted above (#5) already solves all of it. Didn't you read it?

Edited by CFoo++


Relax, CFoo++.. hehehe. Maybe they did read it, but I think their teacher only teaches "C++", so C-style file handling probably confuses them. Besides, they've only just started learning, so don't give them overly complex code yet. Use an easy divide-and-conquer approach.


Oops, sorry. That's me, always rushing everything. But never mind, carry on, mod....


I've studied the code CFoo++ gave, but it really confuses me. The one zeph posted I can use; I just need to edit it to count the words. Thank you both for helping me, and I hope my lecturer accepts it. One more thing I want to ask zeph. Sorry if this is a stupid question; I'm not good at C++, which is why I'm asking for help in this forum. What algorithm does the code zeph gave use?

Could you explain that algorithm?


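I can't be bothered to explain much about vector. You can try running the code below and modifying it a bit.

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    string text1 = "lalala";
    string text2 = "kakaka";
    string text3 = "bababa";

    vector<string> mystring;     // starts empty

    mystring.push_back(text1);   // size becomes 1
    mystring.push_back(text2);   // size becomes 2
    mystring.push_back(text3);   // size becomes 3
    cout << mystring[0] << endl << mystring[1] << endl << mystring[2] << endl;
    return 0;
}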
Back to the topic. As an example, say we have 2 text files that we want to compare for similarity.

file1.txt
saya ada buku novel dalam bahasa melayu
(roughly "I have a novel in the Malay language")

There are 7 words in file1.

file2.txt
kami ada buku cerita kartun melayu
(roughly "we have a Malay cartoon storybook")

There are 6 words in file2.

1. We have to compare 'saya' from file1 with every word in file2.

saya == kami, saya == ada, saya == buku, ....., saya == melayu.

That takes 6 comparisons, because file2 has 6 words.

So we need a loop to compare all of those words.

2. We have to compare 'ada' from file1 with every word in file2.

ada == kami, ada == ada, ada == buku, ....., ada == melayu.

3. .....

If you look at it yourself, it takes 2 loops to compare every word with every other word. Anyway, I'm not very good at explaining programming techniques/algorithms. Work it out yourself, OK? Do lots of practice exercises.


Congrats zeph, you really could be a teacher. Why didn't my CodeBox work, huh...?

Edited by CFoo++


zeph, could you help me? I don't understand my lecturer. I showed my lecturer the code you fixed and ran it in front of them. They told me to fix the total of identical words. The program really does count the identical words wrongly: it counts repeats, so if the word "the" appears 5 times, the program counts it 5 times too. My lecturer doesn't want that; they want the program to count it only once, for example:

a.txt: (saya suka makan)
b.txt: (Dia suka makan)

Total same words: 2

Same words:
suka
makan

But the program counts the identical words 4 times. My lecturer told me to fix it, but I don't know how. Could you help me? It's just a little bit more.

One more thing: I'm really fed up with this lecturer. My task is actually about "plagiarism detection in student's assignment". Before this I had to build a web-based system, but I couldn't get it to detect plagiarism; I honestly couldn't do it. My lecturer wanted to fail me, but I managed to plead for a 2nd chance to do it again, because I admit I'm seriously weak in C++, which is why I still can't do it. After I pleaded, my lecturer agreed to give me a 2nd chance, and also understood that I couldn't build a web-based detector, so they let me do it in DOS instead, on one condition. They said: "I can pass you as long as you can detect it, by whatever method." So yesterday I showed them the program zeph fixed, and they said: "Is this how you detect plagiarism in a file?"

To me, detecting plagiarism means detecting the identical words in the files. My lecturer didn't agree.

What frustrates me is that promise. They said any method was fine as long as it detects. I thought detecting the identical words would save my grade, but no. I've managed to detect the identical words, and they still aren't satisfied. They want me to fix the count of identical words, then they're unhappy that only txt files can be checked and tell me to handle other file formats, and then they want an algorithm. They promised that anything that detects would do, but once I've done what I can, they keep adding more. I really don't understand them.

I don't know what to do anymore. I really want to ask zeph to fix the count of identical words, but I don't know whether my lecturer will still be unsatisfied when I show it, because they said: "Is this how you detect plagiarism? How do you know whether there is plagiarism or not?"

I'm so frustrated: they promised one thing and then keep adding more. They promised that detecting was enough, but now it's as if they want me to build the whole system. I'm so stressed by this... and then yesterday they scolded me again for bringing up that promise.

Please help me, zeph. If possible, I want to teach that lecturer a lesson. Please.... If CFoo++ can help, you're very welcome to.

CFoo++, I'd like to ask for your help too. Please read what I'm upset about... please, CFoo++


Don't you want to help me? I'm really asking for help. Honestly, I've already given up on this task, but I want to graduate; I want to finish my studies with a degree. I hope you can help me reach that goal. I don't want to disappoint my parents, because they've spent a lot on me. I want to get this degree for them. Help me, zeph... CFoo++.....


I've tested it again, and there's nothing wrong with the algorithm (though maybe it isn't as perfect as your lecturer wants). I don't know how you modified it; maybe it's because of your ancient compiler. I use the C/C++ compiler from the Visual C++ 2005 package with the Code::Blocks IDE. Anyway, if you don't have Visual C++ 2005, Visual C++ Express Edition (free) uses the same compiler as the paid version.

#include <cstdlib>    // system()
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
// data/variable init

    ifstream infile1;
    ifstream infile2;

    vector<string> vec1;
    vector<string> vec2;

    string str1;
    string str2;

// file names are hard-coded for this test instead of asking the user
    infile1.open("lala.txt");
    infile2.open("baba.txt");

    if (!infile1 || !infile2)
    {
        cout << "File does not exist!!" << endl;
        return 1;
    }

// load file content into the vectors /////////////////////////////

// process the first file
    while (infile1 >> str1)   // read the file word by word (split on whitespace)
    {
        vec1.push_back(str1);
    }

// process the second file
    while (infile2 >> str2)
    {
        vec2.push_back(str2);
    }

    cout << "\n\n";
// loop to compare the words: every word of file 1 against every word of file 2

    int cnt = 0;

    for (size_t i = 0; i < vec1.size(); i++)
    {
        for (size_t j = 0; j < vec2.size(); j++)
        {
            if (vec1[i] == vec2[j])
            {
                cout << vec1[i] << endl;    // for debugging purposes
                cnt++;
            }
        }
    }

    // "jumlah perkataan sama" = "number of matching words"
    cout << "jumlah perkataan sama :" << cnt << endl;

    system("pause");   // Windows-only: wait for a key press

    return 0;
}
lala.txt:
saya suka makan

baba.txt:
Dia suka makan

Output:
suka
makan
jumlah perkataan sama :2


I've found the problem in the program. I studied it closely, and it turns out the program looks for identical words across all the sentences. I tried making each file contain 2 sentences, like this:

1.txt

saya suka makan nasi.saya juga makan nasi bersama ayam

2.txt

saya suka makan nasi bersama ikan.saya suka makan beramai-ramai

Identical words (as printed by the program):

saya
suka
suka
makan
makan
makan
makan
nasi
bersama

Total same words: 9

The problem is that, for example, once it has found "makan" in the first sentence, the program counts it again if the second sentence also contains "makan".

How do I fix this, zeph? I'm totally confused. I'm done for if I still can't do it.

Try writing 2 or 3 sentences yourself and compiling it. I'm using Microsoft Visual C++ 6.0.

Actually, the program is correct; it just miscounts the identical words. And then, how do I make the program detect plagiarism? My lecturer said this doesn't detect plagiarism, it only detects identical words. Isn't that the same thing? Or am I misunderstanding what plagiarism is? Plagiarism is taking someone else's ideas without permission, so that means taking words from the file, right?


This is roughly what my lecturer wants:

Software to detect plagiarism:

This program examines a collection of document files. It extracts the text portions of those documents and looks through them for matching words in phrases of a specified minimum length. When it finds two files that share enough words in those phrases, it generates HTML report files. These reports contain the document text with the matching phrases underlined.

What it can do: It can find documents that share large amounts of text. This result may indicate that one file is a copy or partial copy of the other, or that they are both copies or partial copies of a third document. The software can presently handle text, HTML, and some word processor files (notably Microsoft Word documents).

It doesn't have to go as far as creating HTML files; I just want to follow its detection approach. Is it roughly similar to the program you fixed?


CFoo++, HELP ME...................

Zeph, HELP ME......................


This is my last post in this thread. If the text file contains symbols such as (.), (,), (!), etc., please reprogram it to strip the symbols from the words before pushing them into the array. I'm not on your payroll to do all of this, so finish it yourself.

#include <algorithm>  // find
#include <cstdlib>    // system()
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Read the file word by word (split on whitespace) and keep only the first
// occurrence of each word, so a repeated word is counted once.
vector<string> loadUniqueWords(ifstream& infile)
{
    vector<string> words;
    string word;
    while (infile >> word)
    {
        if (find(words.begin(), words.end(), word) == words.end())
            words.push_back(word);   // not seen before: add it
    }
    return words;
}

// Print the words on one line, followed by a blank line.
void printWords(const vector<string>& words)
{
    for (size_t i = 0; i < words.size(); i++)
        cout << words[i] << " ";
    cout << endl << endl;
}

int main()
{
// ask for the file names

    string kname;
    string lname;

    cout << "Enter File name 1: ";
    cin >> kname;
    ifstream infile1(kname.c_str());

    cout << "Enter File name 2: ";
    cin >> lname;
    ifstream infile2(lname.c_str());

    if (!infile1 || !infile2)
    {
        cout << "File does not exist!!" << endl;
        return 1;
    }

// load the (unique) words of each file into a vector and show them

    vector<string> vec1 = loadUniqueWords(infile1);
    printWords(vec1);

    vector<string> vec2 = loadUniqueWords(infile2);
    printWords(vec2);

// loop to compare the words

    cout << "----------------------------------\n";
    int cnt = 0;

    for (size_t i = 0; i < vec1.size(); i++)
    {
        for (size_t j = 0; j < vec2.size(); j++)
        {
            if (vec1[i] == vec2[j])
            {
                cout << vec1[i] << endl;    // for debugging purposes
                cnt++;
            }
        }
    }

    // "jumlah perkataan sama" = "number of matching words"
    cout << "\njumlah perkataan sama :" << cnt << endl;

    system("pause");   // Windows-only: wait for a key press

    return 0;
}
Input (baba.txt):
saya suka makan nasi saya juga makan nasi bersama ayam aku bukan bodoh damn kamu ni apo la saya gi skoloh naik basikal
Input (lala.txt):
saya suka makan nasi bersama ikan kambing saya suka makan ramai suka fakap saya nasi sial la kamu ni kohkohokkohkohkoh aku gi skoloh naik keto buruk
Output:
Enter File name 1: lala.txt
Enter File name 2: baba.txt
saya suka makan nasi bersama ikan kambing ramai fakap sial la kamu ni kohkohokkohkohkoh aku gi skoloh naik keto buruk

saya suka makan nasi juga bersama ayam aku bukan bodoh damn kamu ni apo la gi skoloh naik basikal

----------------------------------
saya
suka
makan
nasi
bersama
la
kamu
ni
aku
gi
skoloh
naik

jumlah perkataan sama :12

Finally,

all newbie programmers should take this away from the thread: programming is not just learning a language's rules and syntax; it is problem solving using a programming language.

P.S.:

I have made this letter (code) longer than usual, because I lack the time to make it short. - Blaise Pascal


Re CFoo++'s question about why the CodeBox didn't work:

http://forum.putera.com/tanya/index.php?s=&a...amp;CODE=bbcode

P.S.: When are you going to reply to my PM?


If you can't finish this semester, there's always next semester. Don't graduate yet; you can spend a semester practising your programming skills before entering the workforce. Just getting a degree without being able to do the job is useless too, right?


Here's a function I found on the internet, but it's far too complicated. I just want to follow its comparison and the percentage, but I don't know how to implement that in the code you posted.

void CXdocument::Compare(int docs, int phraselength, int minstring, int wordthreshold, int firstnewdoc,
                         CString* szfolder, CListCtrl* preport, CStatic* pstatus, CProgressCtrl* pprogress,
                         bool bignore_case,
                         bool bignore_numbers,
                         bool bignore_punctuation,
                         bool bignore_outer_punctuation,
                         bool bskip_long_words,
                         bool bskip_nonwords,
                         unsigned int skip_length,
                         int tolerance,
                         int percentage,
                         wordmap *pmymap,
                         bool bbrief_report)
{
    FILE *fhtmlp;                                        // handle for html file (output)
    
    int doccount;                                        // running count of documents
    int    words,wordcount;                                // total number of words in document and running count
    int lcount;                                            // general use counter, for local use only
    char dstring[255];                                    // character buffer for document name strings
    char hrefL[1000],hrefR[1000];                            // href for the Left and Right html files
    char hrefB[1000];                                    // href from frame file for side-by-side viewing
    int docL,docR;                                        // document number of left document and right document
    int wordcountL,wordcountR;                            // word running count for left document and right document
    int firstL,firstR;                                    // first matching word in left document and right document
    int lastL,lastR;                                    // last matching word in left document and right document
    int firstLp,firstRp;                                // first perfectly matching word in left document and right document
    int lastLp,lastRp;                                    // last perfectly matching word in left document and right document
    int firstLx,firstRx;                                // first original perfectly matching word in left document and right document
    int lastLx,lastRx;                                    // last original perfectly matching word in left document and right document
    int matchedwordsL[10],matchedwordsR[10];            // total matched words in left document and right document
    int firstquality,lastquality;                        // extent of matching quality before starting word & after starting word
    int qcnt;                                            // quality counter
    int wordcountRsave;                                    // saved copy of wordcountR, for redundant words
    int docmax=1000;                                    // number of document entries allocated
    int docinc=1000;                                    // step size when allocating more document entries
    int wordmax=1000;                                    // number of word entries allocated
    int wordinc=1000;                                    // step size when allocating more word entries
    int comparecount;                                    // running count of documents 
    int perfectmatches;                                    // count of perfect matches within a phrase
    int start;                                            // pointer to first perfect match
    int anchor;                                            // number of current match anchor
    CString szerror;                                    // error messages
    CString szfilename;                                    // file names
    CString szmessage;                                    // status messages
    char *pstr;                                            // bare file name within a document path (for status messages)
    int SBScnt=0;                                        // count of side-by-side comparisons
    char SBScntstring[255];                                // string version of count

    extern bool g_abort;                                // abort signal when true

    wordinput indoc;                                    // wordinput class to handle inputting the document
    indoc.setminstringlen(minstring);                    // set minimum string length for document
    char word[256];                                        // container for the current word
    int delimitertype;                                    // type of word-ending delimiter

    int perfection=tolerance+1;                            // highest level of matching (perfect match)

    szfilename.Format("%s%s",*szfolder,"\\matches.txt");
    if((fmatch=fopen(szfilename, "w")) == NULL)            // open comparison report file
    {
        szerror.Format("%s%s","Cannot open ",szfilename);    // if failed, report
        AfxMessageBox(szerror);
        return;                                            // and return
    }
    szfilename.Format("%s%s",*szfolder,"\\matches.html");
    if((fmatchhtml=fopen(szfilename, "w")) == NULL)        // open comparison report file - html
    {
        szerror.Format("%s%s","Cannot open ",szfilename);    // if failed, report
        AfxMessageBox(szerror);
        return;                                            // and return
    }
    
    fprintf(fmatchhtml,
        "%s\n",
        "<html><title>File Comparison Report</title><body><H2>File Comparison Report</H2>");
    fprintf(fmatchhtml,
        "%s%i\n",
        "<H3>Produced by WCopyfind 2.5 with These Settings:</H3><br><blockquote>Shortest Phrase to Match: ",phraselength);
    fprintf(fmatchhtml,
        "%s%i\n",
        "<br>Fewest Matches to Report: ",wordthreshold);
    fprintf(fmatchhtml,
        "%s%i\n",
        "<br>Shortest String to Consider: ",minstring);
    if(bignore_punctuation)
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Ignore Punctuation: Yes");
    }
    else
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Ignore Punctuation: No");
    }
    if(bignore_outer_punctuation)
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Ignore Outer Punctuation: Yes");
    }
    else
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Ignore Outer Punctuation: No");
    }
    if(bignore_numbers)
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Ignore Numbers: Yes");
    }
    else
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Ignore Numbers: No");
    }
    if(bignore_case)
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Ignore Letter Case: Yes");
    }
    else
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Ignore Letter Case: No");
    }
    if(bskip_nonwords)
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Skip Non-Words: Yes");
    }
    else
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Skip Non-Words: No");
    }
    if(bskip_long_words)
    {
        fprintf(fmatchhtml,
            "%s%i%s\n",
            "<br>Skip Words Longer Than ",skip_length," Characters: Yes");
    }
    else
    {
        fprintf(fmatchhtml,
            "%s\n",
            "<br>Skip Long Words: No");
    }
    fprintf(fmatchhtml,
        "%s%i\n",
        "<br>Most Imperfections to Allow: ",tolerance);
    fprintf(fmatchhtml,
        "%s%i\n",
        "<br>Minimum % of Matching Words: ",percentage);
    fprintf(fmatchhtml,
        "%s\n",
        "</blockquote><br><br><table border='1'><tr><td>Total Match</td><td>Basic Match</td><td>View Both Files</td><td>File 1</td><td>File 2</td></tr>");

    clock_t startticks=clock();                            // get initial processor clock ticks

    pstatus->SetWindowText("Loading and Hash-Coding Documents");
    
    if( (pqwordhash =                                    // allocate array for hash-coded words
        new unsigned long[wordmax]) == NULL )
    {
        AfxMessageBox("Could not allocate enough memory for the word data array.");
        return;
    }
    
    for(doccount=0;doccount<docs;doccount++)            // loop for all document entries
    {
        if(g_abort){
            pstatus->SetWindowText("Comparison Aborted");
            return;
        }
        pprogress->SetPos(doccount*100/docs);
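        pstr=strrchr(this[doccount].docname,0x5C);        // find the last backslash in the path
        pstr=(pstr!=NULL) ? pstr+1 : this[doccount].docname;    // skip past it; use the whole name if there is no backslash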
        szmessage.Format("Loading: %s",pstr);
        pstatus->SetWindowText(szmessage);
        
        if(indoc.openwordinput(this[doccount].docname) != 0)    // open the next document for word input
        {
            CString szdocerr;
            szdocerr.Format("%s%s","Cannot open document ",this[doccount].docname);
            AfxMessageBox(szdocerr);
            return;
        }
        
        wordcount=0;                                    // set count of words in document to zero
        delimitertype=cwh;                                // pretend last delimiter was white space

        while( delimitertype != cef )                    // loop until an eof
        {
            indoc.getword(word,&delimitertype);            // get the next word
            
            if(bignore_punctuation) wordxpunct(word);    // if ignore punctuation is active, remove punctuation
            if(bignore_outer_punctuation) wordxouterpunct(word);    // if ignore outer punctuation is active, remove outer punctuation
            if(bignore_numbers) wordxnum(word);            // if ignore numbers is active, remove numbers
            if(bignore_case) wordxcase(word);            // if ignore case is active, remove case
            if(bskip_long_words && (strlen(word)>skip_length) ) continue;    // if skip too-long words is active, skip them
            if(bskip_nonwords && (!wordcheck(word)) ) continue;    // if skip nonwords is active, skip them

            if(pmymap != NULL) pmymap->map(word);        // if word mapping is active, map the word
            
            if (wordcount==wordmax)                        // if hash-coded word entries are full
            {
                if( (pxwordhash = new unsigned long[wordmax+wordinc]) == NULL )
                {                                        // allocate new, larger array of entries
                    AfxMessageBox("Could not allocate enough memory for the word data array.");
                    return;
                }

                for(lcount=0;lcount<wordmax;lcount++)
                {                                // loop for all hash-coded word entries
                    pxwordhash[lcount]=pqwordhash[lcount];
                }                                // copy hash-coded word entries to new array
                
                delete [] pqwordhash;            // delete old array

                pqwordhash=pxwordhash;            // set normal pointer to new, larger array
                pxwordhash=NULL;                // null out temporary pointer
                wordmax=wordmax+wordinc;        // set maximum to new, larger value
            }
                    
            pqwordhash[wordcount]=wordhash(word);// hash-code the word and save that hash
            wordcount++;                        // increment count of words
        }

        words=wordcount;                                // save number of words

        this[doccount].words=words;                        // save number of words in document entry

        if( (this[doccount].pwordhash =                    // allocate array for hash-coded words in doc entry
            new unsigned long[words]) == NULL )
        {
            AfxMessageBox("Could not allocate enough memory for the word data arrays.");
            return;
        }
        if( (this[doccount].pswordhash =                // allocate array for sorted hash-coded words
            new unsigned long[words]) == NULL )
        {
            AfxMessageBox("Could not allocate enough memory for the word data arrays.");
            return;
        }
        if( (this[doccount].pswordnum =                    // allocate array for sorted word numbers
            new int[words]) == NULL )
        {
            AfxMessageBox("Could not allocate enough memory for the word data arrays.");
            return;
        }
        
        for (lcount=0;lcount<words;lcount++)            // loop for all the words in the document
        {
            this[doccount].pwordhash[lcount]=            // copy over hash-coded words
                pqwordhash[lcount];
            this[doccount].pswordnum[lcount]=            // copy over word numbers
                lcount;
            this[doccount].pswordhash[lcount]=            // copy over hash-coded words
                pqwordhash[lcount];
        }

        heapsort(&this[doccount].pswordhash[-1],        // sort hash-coded words, carrying word numbers along
            &this[doccount].pswordnum[-1],words);        // (heapsort() appears to use 1-based indexing, hence the [-1] bases)

        if(phraselength > 1)                            // if phraselength > 1 word, skip over short words initially
        {
            for (lcount=0;lcount<words;lcount++)        // loop for all the words in the document
            {
                if( (this[doccount].pswordhash[lcount]    // if the word is longer than 3 letters, break
                    & 0xFFC00000) != 0 ) break;
            }
            this[doccount].firsthash = lcount;            // save the number of the first >3 letter word
        }
        else
        {
            this[doccount].firsthash = 0;                // if phraselength = 1 word, compare even the shortest words
        }

        indoc.closewordinput();                            // close this document
    }        
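    // Sketch (hypothetical helper, not part of WCopyfind): the loop above stores each
    // document's hashes twice, in reading order (pwordhash) and sorted (pswordhash),
    // with pswordnum recording which word each sorted hash came from. The struct below
    // shows that pairing, assuming a simple insertion sort in place of heapsort().
    // Insertion sort is stable, so equal hashes may end up in a different order than
    // heapsort() would leave them; each hash still stays paired with its word number.
    struct SortedHashSketch
    {
        // Sort hash[0..n-1] ascending and move pos[] in step, so pos[i] still
        // names the document word whose hash now sits at hash[i].
        static void sortWithPositions(unsigned long *hash, int *pos, int n)
        {
            for(int i=1;i<n;i++)
            {
                unsigned long h=hash[i];
                int p=pos[i];
                int j=i-1;
                while(j>=0 && hash[j]>h)                // shift larger hashes one place right
                {
                    hash[j+1]=hash[j];
                    pos[j+1]=pos[j];
                    j--;
                }
                hash[j+1]=h;                            // drop the saved pair into the gap
                pos[j+1]=p;
            }
        }
    };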

    if( (matchL = new int[wordmax]) == NULL )            // allocate array for left match markers
    {
        AfxMessageBox("Could not allocate enough memory for the word matching arrays.");
        return;
    }

    if( (matchR = new int[wordmax]) == NULL )            // allocate array for right match markers
    {
        AfxMessageBox("Could not allocate enough memory for the word matching arrays.");
        return;
    }

    if( (matchLa = new int[wordmax]) == NULL )            // allocate array for left match anchors
    {
        AfxMessageBox("Could not allocate enough memory for the word matching arrays.");
        return;
    }

    if( (matchRa = new int[wordmax]) == NULL )            // allocate array for right match anchors
    {
        AfxMessageBox("Could not allocate enough memory for the word matching arrays.");
        return;
    }

    if( (matchLt = new int[wordmax]) == NULL )            // allocate array for left match markers - temporary
    {
        AfxMessageBox("Could not allocate enough memory for the word matching arrays.");
        return;
    }

    if( (matchRt = new int[wordmax]) == NULL )            // allocate array for right match markers - temporary
    {
        AfxMessageBox("Could not allocate enough memory for the word matching arrays.");
        return;
    }

    if(firstnewdoc<2)                                    // if all documents are considered new,
    {
        firstnewdoc=2;                                    // start comparisons with the second document
    }
    
    long comparetotal=((docs*docs-docs)/2)-(((firstnewdoc-1)*(firstnewdoc-1)-(firstnewdoc-1))/2);
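    // Sketch (hypothetical helper, not part of WCopyfind): comparetotal counts the
    // document pairs the loops below will visit. With n=docs and f=firstnewdoc, all
    // pairs among n documents number n(n-1)/2, and the pairs among the f-1 older
    // documents, (f-1)(f-2)/2, are skipped. For n=5, f=4 that is 10-3 = 7, matching
    // the loops below: docL=3 meets 3 right documents and docL=4 meets 4.
    struct CompareCountSketch
    {
        static long byFormula(long ndocs, long newdoc)        // same arithmetic as comparetotal
        {
            return (ndocs*ndocs-ndocs)/2-((newdoc-1)*(newdoc-1)-(newdoc-1))/2;
        }
        static long byLoop(long ndocs, long newdoc)           // count pairs the way the loops visit them
        {
            long pairs=0;
            for(long left=newdoc-1;left<ndocs;left++)
            {
                pairs+=left;                                   // the right document runs over 0..left-1
            }
            return pairs;
        }
    };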
    int comparestep;                                    // comparisons between progress-bar updates

    if(comparetotal<100)
    {
        comparestep=1;
    }
    else if(comparetotal<1000)
    {
        comparestep=10;
    }
    else if(comparetotal<10000)
    {
        comparestep=100;
    }
    else
    {
        comparestep=1000;
    }
    
    pstatus->SetWindowText("Comparing Documents");
    pprogress->SetPos(0);
    comparecount=0;

    for(docL=firstnewdoc-1;docL<docs;docL++)            // for all possible left documents
    {
        for(docR=0;docR<docL;docR++)                    // for all possible right documents
        {
            if(g_abort){
                pstatus->SetWindowText("Comparison Aborted");
                return;
            }
            for(wordcountL=0;                            // loop for all left words
                wordcountL<this[docL].words;wordcountL++)
            {
                matchL[wordcountL]=0;                    // zero the left match markers
                matchLa[wordcountL]=0;                    // zero the left match anchors
            }
            for(wordcountR=0;                            // loop for all right words
                wordcountR<this[docR].words;wordcountR++)
            {
                matchR[wordcountR]=0;                    // zero the right match markers
                matchRa[wordcountR]=0;                    // zero the right match anchors
            }

            wordcountL=this[docL].firsthash;            // start left at first >3 letter word
            wordcountR=this[docR].firsthash;            // start right at first >3 letter word
            wordcountRsave=wordcountR;                    // remember the first right candidate for the current left word
            for(qcnt=0;qcnt<=perfection;qcnt++)
            {
                matchedwordsL[qcnt]=0;                    // zero counts of left matched words
                matchedwordsR[qcnt]=0;                    // zero counts of right matched words
            }
            anchor=0;                                    // start with no html anchors assigned
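            // Sketch (hypothetical helper, not part of WCopyfind): the loop below walks the
            // two sorted hash lists together, advancing whichever side holds the smaller
            // hash, so equal words meet without comparing every pair. Reduced to its core,
            // that walk counts the words two documents share, each occurrence used once,
            // which is the "how many same words" count asked about in this thread. The real
            // loop does more: it skips words already marked in matchL/matchR, rewinds the
            // right side for repeated left words, and grows each hit into a phrase.
            struct CommonWordSketch
            {
                static int countCommon(const unsigned long *a, int na,
                                       const unsigned long *b, int nb)    // a and b sorted ascending
                {
                    int i=0, j=0, common=0;
                    while(i<na && j<nb)
                    {
                        if(a[i]<b[j]) i++;                // left hash smaller: advance left
                        else if(a[i]>b[j]) j++;           // right hash smaller: advance right
                        else { common++; i++; j++; }      // equal: count one shared word
                    }
                    return common;
                }
            };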
                        
            while ( (wordcountL < this[docL].words)        // loop while there are still words to check
                 && (wordcountR < this[docR].words) )
            {
                // if the next word in the left sorted hash-coded list has been matched

                if( matchL[this[docL].pswordnum[wordcountL]] != 0 )
                {
                    wordcountL++;                        // advance to next left word
                    continue;
                }

                // if the next word in the right sorted hash-coded list has been matched

                if( matchR[this[docR].pswordnum[wordcountR]] != 0 )
                {
                    wordcountR++;                        // skip to next right sorted hash-coded word
                    continue;
                }

                // check for left word less than right word

                if( this[docL].pswordhash[wordcountL] < this[docR].pswordhash[wordcountR] )
                {
                    wordcountL++;                        // advance to next left word
                    if ( wordcountL >= this[docL].words) break;
                    if ( this[docL].pswordhash[wordcountL] == this[docL].pswordhash[wordcountL-1] )
                    {
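                        wordcountR=wordcountRsave;        // repeated left word: rewind right to the first candidate for it
                    }
                    else
                    {
                        wordcountRsave=wordcountR;        // new left word: remember where its right candidates start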
                    }
                    continue;                            // and resume looping
                }

                // check for right word less than left word

                if( this[docL].pswordhash[wordcountL] > this[docR].pswordhash[wordcountR] )
                {
                    wordcountR++;                        // advance to next right word
                    wordcountRsave=wordcountR;            // remember this as the first right candidate for the current left word
                    continue;                            // and resume looping
                }

                // we have a match, so look up and down the hash-coded (not sorted) lists for matches

                matchLt[this[docL].pswordnum[wordcountL]]=perfection;    // markup word in temporary list at perfection quality
                matchRt[this[docR].pswordnum[wordcountR]]=perfection;    // markup word in temporary list at perfection quality

                firstL=this[docL].pswordnum[wordcountL]-1;    // start left just before current word
                lastL=this[docL].pswordnum[wordcountL]+1;    // end left just after current word
                firstR=this[docR].pswordnum[wordcountR]-1;    // start right just before current word
                lastR=this[docR].pswordnum[wordcountR]+1;    // end right just after current word

                while( (firstL >= 0) && (firstR >= 0) )        // while neither document's start has been reached
                {

                    // Note: when we leave this loop, firstL and firstR will always point one word before the first match
                    
                    // make sure that left and right words haven't been used in a match before and
                    // that the two words actually match. If so, move up another word and repeat the test.

                    if( matchL[firstL] != 0 ) break;
                    if( matchR[firstR] != 0 ) break;
                    if( this[docL].pwordhash[firstL] == this[docR].pwordhash[firstR] )
                    {
                        matchLt[firstL]=perfection;            // markup word in temporary list
                        matchRt[firstR]=perfection;            // markup word in temporary list
                        firstL--;                            // move up on left
                        firstR--;                            // move up on right
                        continue;
                    }
                    break;
                }

                while( (lastL < this[docL].words) && (lastR < this[docR].words) ) // while neither document's end has been reached
                {

                    // Note: when we leave this loop, lastL and lastR will always point one word after last match
                    
                    // make sure that left and right words haven't been used in a match before and
                    // that the two words actually match. If so, move down another word and repeat the test.

                    if( matchL[lastL] != 0 ) break;
                    if( matchR[lastR] != 0 ) break;
                    if( this[docL].pwordhash[lastL] == this[docR].pwordhash[lastR] )
                    {
                        matchLt[lastL]=perfection;        // markup word in temporary list
                        matchRt[lastR]=perfection;        // markup word in temporary list
                        lastL++;                        // move down on left
                        lastR++;                        // move down on right
                        continue;
                    }
                    break;
                }
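                // Sketch (hypothetical helper, not part of WCopyfind): the two loops above grow
                // a single matching word into the longest run of identical words around it,
                // stepping backward and then forward in both documents in lockstep. This
                // version leaves out the matchL/matchR checks that stop at words already used
                // by an earlier phrase. It returns the run length and its bounds in a[].
                struct ExtendSketch
                {
                    static int extend(const unsigned long *a, int na, const unsigned long *b, int nb,
                                      int i, int j, int *first, int *last)    // requires a[i]==b[j]
                    {
                        int fl=i-1, fr=j-1;
                        while(fl>=0 && fr>=0 && a[fl]==b[fr]) { fl--; fr--; }     // grow backward
                        int ll=i+1, lr=j+1;
                        while(ll<na && lr<nb && a[ll]==b[lr]) { ll++; lr++; }     // grow forward
                        *first=fl+1;                  // first word of the run in a[]
                        *last=ll-1;                   // last word of the run in a[]
                        return ll-fl-1;
                    }
                };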

                firstLp=firstL+1;                        // point to first perfect match left
                firstRp=firstR+1;                        // point to first perfect match right
                lastLp=lastL-1;                            // point to last perfect match left
                lastRp=lastR-1;                            // point to last perfect match right
                perfectmatches=lastLp-firstLp+1;        // save number of perfect matches

                if(tolerance!=0)                        // are we accepting imperfect matches?
                {
                    firstquality=perfection;            // start marking quality at perfect matching
                    lastquality=perfection;                // start marking quality at perfect matching

                    firstLx=firstLp;                    // save pointer to first perfect match left
                    firstRx=firstRp;                    // save pointer to first perfect match right
                    lastLx=lastLp;                        // save pointer to last perfect match left
                    lastRx=lastRp;                        // save pointer to last perfect match right

                    while( (firstL >= 0) && (firstR >= 0) )        // while neither document's start has been reached
                    {

                        // Note: when we leave this loop, firstL and firstR will always point one word before the first match
                        
                        // make sure that left and right words haven't been used in a match before and
                        // that the two words actually match. If so, move up another word and repeat the test.

                        if( matchL[firstL] != 0 ) break;
                        if( matchR[firstR] != 0 ) break;
                        if( this[docL].pwordhash[firstL] == this[docR].pwordhash[firstR] )
                        {
                            perfectmatches++;                    // increment perfect match count;
                            firstquality=perfection;            // we're at perfect matching
                            matchLt[firstL]=firstquality;        // markup word in temporary list
                            matchRt[firstR]=firstquality;        // markup word in temporary list
                            firstLp=firstL;                        // save pointer to first left perfect match
                            firstRp=firstR;                        // save pointer to first right perfect match
                            firstL--;                            // move up on left
                            firstR--;                            // move up on right
                            continue;
                        }

                        firstquality--;                            // one more consecutive imperfect word before the phrase

                        if( firstquality == 0 ) break;            // check for maximum imperfections reached
                        
                        if( (firstL-1) >= 0 )                    // check one word earlier on left (if it exists)
                        {
                            if( matchL[firstL-1] != 0 ) break;    // make sure we haven't already matched this word
                            
                            if( this[docL].pwordhash[firstL-1] == this[docR].pwordhash[firstR] )
                            {
                                if( percent(firstL-1,firstR,lastLx,lastRx,perfectmatches+1) < percentage ) break;    // are we getting too imperfect?
                                matchLt[firstL]=firstquality;    // markup non-matching word in left temporary list
                                firstL--;                        // move up on left
                                perfectmatches++;                // increment perfect match count;
                                firstquality=perfection;        // we're once again at perfect matching
                                matchLt[firstL]=firstquality;    // markup word in left temporary list
                                matchRt[firstR]=firstquality;    // markup word in right temporary list
                                firstLp=firstL;                        // save pointer to first left perfect match
                                firstRp=firstR;                        // save pointer to first right perfect match
                                firstL--;                        // move up on left
                                firstR--;                        // move up on right
                                continue;
                            }
                        }

                        if( (firstR-1) >= 0 )                    // check one word earlier on right (if it exists)
                        {
                            if( matchR[firstR-1] != 0 ) break;    // make sure we haven't already matched this word

                            if( this[docL].pwordhash[firstL] == this[docR].pwordhash[firstR-1] )
                            {
                                if( percent(firstL,firstR-1,lastLx,lastRx,perfectmatches+1) < percentage ) break;    // are we getting too imperfect?
                                matchRt[firstR]=firstquality;    // markup non-matching word in right temporary list
                                firstR--;                        // move up on right
                                perfectmatches++;                // increment perfect match count;
                                firstquality=perfection;        // we're once again at perfect matching
                                matchLt[firstL]=firstquality;    // markup word in left temporary list
                                matchRt[firstR]=firstquality;    // markup word in right temporary list
                                firstLp=firstL;                        // save pointer to first left perfect match
                                firstRp=firstR;                        // save pointer to first right perfect match
                                firstL--;                        // move up on left
                                firstR--;                        // move up on right
                                continue;
                            }
                        }

                        if( percent(firstL-1,firstR-1,lastLx,lastRx,perfectmatches) < percentage ) break;    // are we getting too imperfect?
                        matchLt[firstL]=firstquality;    // markup word in left temporary list
                        matchRt[firstR]=firstquality;    // markup word in right temporary list
                        firstL--;                        // move up on left
                        firstR--;                        // move up on right
                    }
        
                    while( (lastL < this[docL].words) && (lastR < this[docR].words) ) // while neither document's end has been reached
                    {

                        // Note: when we leave this loop, lastL and lastR will always point one word after last match
                        
                        // make sure that left and right words haven't been used in a match before and
                        // that the two words actually match. If so, move down another word and repeat the test.

                        if( matchL[lastL] != 0 ) break;
                        if( matchR[lastR] != 0 ) break;
                        if( this[docL].pwordhash[lastL] == this[docR].pwordhash[lastR] )
                        {
                            perfectmatches++;                // increment perfect match count;
                            lastquality=perfection;            // we're at perfect matching
                            matchLt[lastL]=lastquality;        // markup word in temporary list
                            matchRt[lastR]=lastquality;        // markup word in temporary list
                            lastLp=lastL;                    // save pointer to last left perfect match
                            lastRp=lastR;                    // save pointer to last right perfect match
                            lastL++;                        // move down on left
                            lastR++;                        // move down on right
                            continue;
                        }

                        lastquality--;                            // one more consecutive imperfect word after the phrase
                        
                        if( lastquality == 0 ) break;        // check for maximum imperfections reached
                            
                        if( (lastL+1) < this[docL].words )        // check one word later on left (if it exists)
                        {
                            if( matchL[lastL+1] != 0 ) break;    // make sure we haven't already matched this word
                            
                            if( this[docL].pwordhash[lastL+1] == this[docR].pwordhash[lastR] )
                            {
                                if( percent(firstLx,firstRx,lastL+1,lastR,perfectmatches+1) < percentage ) break;    // are we getting too imperfect?
                                matchLt[lastL]=lastquality;    // markup non-matching word in left temporary list
                                lastL++;                    // move down on left
                                perfectmatches++;            // increment perfect match count;
                                lastquality=perfection;        // we're once again at perfect matching
                                matchLt[lastL]=lastquality;    // markup word in left temporary list
                                matchRt[lastR]=lastquality;    // markup word in right temporary list
                                lastLp=lastL;                // save pointer to last left perfect match
                                lastRp=lastR;                // save pointer to last right perfect match
                                lastL++;                    // move down on left
                                lastR++;                    // move down on right
                                continue;
                            }
                        }

                        if( (lastR+1) < this[docR].words )    // check one word later on right (if it exists)
                        {
                            if( matchR[lastR+1] != 0 ) break;    // make sure we haven't already matched this word

                            if( this[docL].pwordhash[lastL] == this[docR].pwordhash[lastR+1] )
                            {
                                if( percent(firstLx,firstRx,lastL,lastR+1,perfectmatches+1) < percentage ) break;    // are we getting too imperfect?
                                matchRt[lastR]=lastquality;    // markup non-matching word in right temporary list
                                lastR++;                    // move down on right
                                perfectmatches++;            // increment perfect match count;
                                lastquality=perfection;        // we're once again at perfect matching
                                matchLt[lastL]=lastquality;    // markup word in left temporary list
                                matchRt[lastR]=lastquality;    // markup word in right temporary list
                                lastLp=lastL;                // save pointer to last left perfect match
                                lastRp=lastR;                // save pointer to last right perfect match
                                lastL++;                    // move down on left
                                lastR++;                    // move down on right
                                continue;
                            }
                        }

                        if( percent(firstLx,firstRx,lastL+1,lastR+1,perfectmatches) < percentage ) break;    // are we getting too imperfect?
                        matchLt[lastL]=lastquality;            // markup word in left temporary list
                        matchRt[lastR]=lastquality;            // markup word in right temporary list
                        lastL++;                            // move down on left
                        lastR++;                            // move down on right
                    }
                }
                if( perfectmatches >= phraselength )    // check that phrase has enough perfect matches in it to mark
                {
                    anchor++;                                    // increment anchor count
                    for(lcount=firstLp;lcount<=lastLp;lcount++)    // loop for all left matched words
                    {
                        matchL[lcount]=matchLt[lcount];            // copy over left matching markup
                        matchedwordsL[matchLt[lcount]]++;        // increment count of matching words
                        matchLa[lcount]=anchor;                    // identify the anchor for this phrase
                    }
                    for(lcount=firstRp;lcount<=lastRp;lcount++)    // loop for all right matched words
                    {
                        matchR[lcount]=matchRt[lcount];            // copy over right matching markup
                        matchedwordsR[matchRt[lcount]]++;        // increment count of matching words
                        matchRa[lcount]=anchor;                    // identify the anchor for this phrase
                    }
                }
                wordcountR++;                                    // skip to next right sorted hash-coded word
            }

            comparecount++;                                        // increment count of comparisons

            if( (comparecount%comparestep) == 0 )                // every comparestep comparisons,
            {                                                    // update status text and progress bar
                CString szcomp;
                szcomp.Format("%s%d%s","Comparing Documents, ",comparecount," Completed");
                pstatus->SetWindowText(szcomp);
                pprogress->SetPos(int((100.0*double(comparecount))/double(comparetotal)));
            }
            
            if(matchedwordsL[perfection]>=wordthreshold)        // if there are enough matches to report,
            {
                if(tolerance==0)
                {
                    perfectmatches=matchedwordsL[perfection];
                }
                else
                {
                    start=0;                                    // first word of the current run of perfect matches
                    perfectmatches=0;
                    for(lcount=0;lcount<this[docL].words;lcount++)
                    {
                        if(matchL[lcount] != perfection)        // run [start, lcount-1] ends here
                        {
                            if(lcount-start>=phraselength)      // count only runs of at least phraselength words
                            {
                                perfectmatches=perfectmatches+lcount-start;
                            }
                            start=lcount+1;
                        }
                    }
                    if(lcount-start>=phraselength)              // trailing run [start, words-1]
                    {
                        perfectmatches=perfectmatches+lcount-start;
                    }
                }
                
                // report number of matching words in the two documents (assume left and right totals are equal)

                fprintf(fmatch,"%d\t%d\t%s\t%s\n",matchedwordsL[perfection],perfectmatches,this[docL].docname,this[docR].docname);
                
                CString szdocL,szdocR;

                char* clbackslash;
                clbackslash = strrchr(this[docL].docname,'\\');
                if(clbackslash == NULL)
                {
                    szdocL=this[docL].docname;
                }
                else
                {
                    clbackslash++;
                    szdocL=clbackslash;
                }

                clbackslash = strrchr(this[docR].docname,'\\');
                if(clbackslash == NULL)
                {
                    szdocR=this[docR].docname;
                }
                else
                {
                    clbackslash++;
                    szdocR=clbackslash;
                }

                int nItem;
                nItem=preport->GetItemCount();
                CString szreport,sztm,szbm;
                sztm.Format("%d  [%d%%,%d%%]",matchedwordsL[perfection],100*matchedwordsL[perfection]/this[docL].words,100*matchedwordsL[perfection]/this[docR].words);
                preport->InsertItem(nItem,sztm);
                szbm.Format("%d  [%d%%,%d%%]",perfectmatches,100*perfectmatches/this[docL].words,100*perfectmatches/this[docR].words);
                preport->SetItemText(nItem,1,szbm);
                preport->SetItemText(nItem,2,szdocL);
                preport->SetItemText(nItem,3,szdocR);
                preport->EnsureVisible(nItem,FALSE);
                preport->Update(nItem);

                strcpy(hrefL,szdocL);                            // assemble file name for left html file
                strncat(hrefL,".",255);
                strncat(hrefL,szdocR,255);
                strncat(hrefL,".html",255);

                strcpy(hrefR,szdocR);                            // assemble file name for right html file
                strncat(hrefR,".",255);
                strncat(hrefR,szdocL,255);
                strncat(hrefR,".html",255);

                SBScnt++;
                _itoa(SBScnt,SBScntstring,10);
                strcpy(hrefB,"SBS.");                            // assemble file name for side-by-side frame
                strncat(hrefB,SBScntstring,255);
                strncat(hrefB,".html",255);

                fprintf(fmatchhtml,                             // CStrings cast explicitly for the %s varargs
                    "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
                    "<tr><td>",(LPCTSTR)sztm,"</td><td>",(LPCTSTR)szbm,"</td><td><a href='",hrefB,
                    "' target='_blank'>Side-by-Side</a></td><td><a href='",hrefL,
                    "'>",(LPCTSTR)szdocL,"</a></td><td><a href='",hrefR,
                    "'>",(LPCTSTR)szdocR,"</a></td></tr>\n");

                strcpy(dstring,*szfolder);                        // assemble full html file name for left file
                strncat(dstring,"\\",255);
                strncat(dstring,hrefL,255);

                if((fhtmlp=fopen(dstring,"w")) == NULL)                // open html file
                {
                    szerror.Format("%s%s","Cannot open file ",dstring);
                    AfxMessageBox(szerror);
                    return;
                }

                // create header material for html file

                fprintf(fhtmlp,"%s%s%s%s%s%d%s","<html><title>Comparison of ",(LPCTSTR)szdocL,
                    " with ",(LPCTSTR)szdocR,
                    " (Matched Words = ",matchedwordsL[perfection],")</title><body><base target='right'>\n");

                if(indoc.openwordinput(this[docL].docname) != 0)    // open left document for word input
                {
                    szerror.Format("%s%d%s%s","Cannot open document #",docL," = ",this[docL].docname);
                    AfxMessageBox(szerror);
                    indoc.closewordinput();
                    return;
                }
                
                // generate text body of html file, with matching words underlined

                docprint(indoc,fhtmlp,matchL,matchLa,this[docL].words,minstring,
                    bignore_case,
                    bignore_numbers,
                    bignore_punctuation,
                    bignore_outer_punctuation,
                    bskip_long_words,
                    bskip_nonwords,
                    skip_length,
                    tolerance,
                    bbrief_report,
                    hrefR
                    );


                indoc.closewordinput();                                // close document

                fprintf(fhtmlp,"%s","\n</body></html>\n");            // complete html file
                
                fclose(fhtmlp);                                        // close html file

                strcpy(dstring,*szfolder);                            // assemble html file name for right file
                strncat(dstring,"\\",255);
                strncat(dstring,hrefR,255);

                if((fhtmlp=fopen(dstring,"w")) == NULL)                // open html file
                {
                    printf("Cannot open file %s\n",dstring);        // if failed, report
                    return;
                }

                // create header material for html file

                fprintf(fhtmlp,"%s%s%s%s%s%d%s","<html><title>Comparison of ",(LPCTSTR)szdocR,
                    " with ",(LPCTSTR)szdocL,
                    " (Matched Words = ",matchedwordsR[perfection],")</title><body><base target='left'>\n");

                if(indoc.openwordinput(this[docR].docname) !=0)        // open right document
                {
                    szerror.Format("%s%d%s%s","Cannot open document #",docR," = ",this[docR].docname);
                    AfxMessageBox(szerror);
                    indoc.closewordinput();                            // close document
                    return;
                }
                
                // generate text body of html file, with matching words underlined

                docprint(indoc,fhtmlp,matchR,matchRa,this[docR].words,minstring,
                    bignore_case,
                    bignore_numbers,
                    bignore_punctuation,
                    bignore_outer_punctuation,
                    bskip_long_words,
                    bskip_nonwords,
                    skip_length,
                    tolerance,
                    bbrief_report,
                    hrefL
                    );

                indoc.closewordinput();

                fprintf(fhtmlp,"%s","\n</body></html>\n");        // complete html file
                
                fclose(fhtmlp);                                    // close html file

                strcpy(dstring,*szfolder);                            // assemble html file name for side-by-side file
                strncat(dstring,"\\",255);
                strncat(dstring,hrefB,255);

                if((fhtmlp=fopen(dstring,"w")) == NULL)                // open html file
                {
                    printf("Cannot open file %s\n",dstring);        // if failed, report
                    return;
                }

                fprintf(fhtmlp,"%s%s%s%s%s%d%s%s%s%s%s%s","<html><title>Comparison of ",(LPCTSTR)szdocR,
                    " with ",(LPCTSTR)szdocL,
                    " (Matched Words = ",matchedwordsR[perfection],")</title>\n",
                    "<frameset cols='*,*' frameborder='YES' border='1' framespacing='0'><frame src='",
                    hrefL,
                    "' name='left'>\n<frame src='",
                    hrefR,
                    "' name='right'>\n</frameset><body></body></html>");
                fclose(fhtmlp);                                    // close html file

            }
        }
    }
    
    fprintf(fmatchhtml,
        "%s",
        "</table></body></html>\n");

    clock_t totalticks;
    totalticks=clock()-startticks;                     // CPU ticks spent since startticks
    double seconds;                                    // named to avoid shadowing time()
    seconds=double(totalticks)/CLOCKS_PER_SEC;
    CString sztime;
    sztime.Format("Done. Total CPU Time: %.3f seconds",seconds);
    pstatus->SetWindowText(sztime);

    return;
}
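For readers following the tolerance branch above: when tolerance is non-zero, perfectmatches counts only the perfectly matched words that sit in unbroken runs of at least phraselength words. Below is a minimal standalone sketch of just that counting step, assuming the per-word marks are held in a std::vector<int> instead of the matchL array. The function name countPerfectInRuns is hypothetical and is not part of the posted program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum the lengths of all runs of consecutive words whose mark equals
// `perfection`, counting only runs that are at least `phraselength` long.
// `match` plays the role of matchL in the posted code (an assumption).
int countPerfectInRuns(const std::vector<int>& match, int perfection, int phraselength)
{
    assert(phraselength > 0);
    int total = 0;
    std::size_t start = 0;                          // first index of the current run
    for (std::size_t i = 0; i < match.size(); ++i) {
        if (match[i] != perfection) {               // run [start, i-1] ends here
            std::size_t len = i - start;
            if (len >= static_cast<std::size_t>(phraselength))
                total += static_cast<int>(len);
            start = i + 1;                          // next run can start after i
        }
    }
    std::size_t len = match.size() - start;         // trailing run, possibly empty
    if (len >= static_cast<std::size_t>(phraselength))
        total += static_cast<int>(len);
    return total;
}
```

For example, with marks {2,2,2,0,2,2,0,2,2,2,2}, perfection = 2 and phraselength = 3, the runs have lengths 3, 2 and 4. The run of 2 is too short, so the result is 3 + 4 = 7.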

Please help me, zeph...
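The report code builds file names with strcpy/strncat into fixed char buffers. strncat's third argument limits how many characters are appended from the source. It does not limit how much room is left in the destination, so the repeated 255 does not protect hrefL, hrefR, hrefB or dstring from overflow. Here is a hedged sketch of the same naming scheme using std::string. It assumes paths use a backslash separator, as the strrchr calls in the posted code do. baseName, pairHtmlName, sideBySideName and percentOf are hypothetical helper names, not functions from the original program.

```cpp
#include <cassert>
#include <string>

// Strip everything up to and including the last backslash, like the
// strrchr(docname,'\\') blocks; a name without a backslash is returned as-is.
std::string baseName(const std::string& path)
{
    std::string::size_type pos = path.find_last_of('\\');
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// "<first>.<second>.html", the scheme used for hrefL and hrefR.
std::string pairHtmlName(const std::string& first, const std::string& second)
{
    return first + "." + second + ".html";
}

// "SBS.<n>.html", the scheme used for hrefB.
std::string sideBySideName(int n)
{
    return "SBS." + std::to_string(n) + ".html";
}

// Integer percentage as in the [%d%%,%d%%] report columns (truncates).
int percentOf(int part, int total)
{
    assert(total > 0);                 // caller must not pass an empty document
    return 100 * part / total;
}
```

For a left document C:\docs\a.txt and a right document b.txt, baseName gives a.txt and pairHtmlName gives a.txt.b.txt.html, which is the name hrefL would hold for that pair.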
