=========================preview======================
(COMP336)final96F.pdf
Back to COMP336 Login to download
======================================================
COMP336/533InformationRetrieval
FINALEXAMINATION
December19,1996
Timeallowed:2hours
1.[5]Inthevector-spacemodel,isitpossibleforuserstosearchfordocumentswhichDON'T containacertainkeyword.Justifyyouranswer.
Anegativeweightcanbeassignedtothewordthattheuserdoesn'twant.
2.[10]IntheexperimentthatIperformedforIDI,Ihaveshowedthatthenormalizationfactor
p
numberoftermsindocumentisbetterthanthefullnormalizationfactorasde.nedinthe vector-spacemodel.
(i)Explainwhyitisexpensivetocomputethefullnormalizationfactorfromaninverted.le.
Ifonlytheinverted.leisavailable(i.e.,documentvector.leisnotkept),itisnextto impossibletoknowwhatkeywordsdoesadocumentcontaininordertocalculatethevector length.
(ii)ExplainwhyitisNOTalwaysagoodideatototallydiscardthedocumentlengthwhen documentsareranked.
Withfullnormalization,twodocumentscontainingxxyyzzwillhavethesamescore asxyzwithrespecttoaquery,say,xy.Inreality,peoplewillfavordocuments whichmentionthequerykeywordsmoreoftenthanotherdocumentsdo.
Notethat,theanswer\peoplemayfavorlongdocuments"isnotaperfectanswer.Itisnot convincingthatlongdocumentscontainingmanyirrelevantkeywordsarestillmorerelevant thantheshortones.IntheexampleIgive,bothdocumentscontainthesamesetofkeywords, itisjustthatthetermfrequenciesaredi.erent.Thus,itisarguablethattheonecontaining moreoccurrencesofthequerykeywordsismorerelevant.
3.[10]Whatisthede.nitionoffalloutrate.Explainwhyweneedfalloutrate,inadditionto precisionandrecall,toevaluatetheretrievale.ectivenessofaretrievalsystem. numberofnonrelevantitemsretrieved
Fallout.
totalnumberofnonrelevantitemsinthecollection Precisionandrecalldon'ttakeintoaccountofthenumberofnonrelevantdocumentsinthe collection.
4.[5]IntheKMPpatternmatchingalgorithm,thespeedofpatternmatchingdependson thecharactersandtheirarrangementsinthepattern.Givethemostimportantfactorthat in.uencethespeedofthealgorithm.
Repetitionofcharactersinthepattern.
5.[10]Usingthe(improved)KMPmethod,.llinthefollowingshiftarray,#istheend-of-string character.
1
patternno.of
charshifts
a
1
a
2
a
3
a
4
a
5
b
1
#
6 Whatarethevaluesforthenext[]array. 0000051 6.[10]Isitpossibleforaprecision/recallgraphtolooklike(a)or(b).Explainwhy.
precision
1
1
0 1 recall 0 1 recall
(a) (b)
(a)possible or impossible(circle your answer). Justi.cation:
It is impossible for the curve to touch the (1,1) p oi n t. At (1,1), it means both the recall
andprecisionisone,meaningthatallofthedocumentsretrievedarerelevant.However,this contradictswiththefactthatprecisionislessthanzerowhenrecallislow,whichmeansthat someirrelevantdocumentsarealreadyretrieved.
(b)possibleorimpossible(circleyouranswer).Justi.cation:
Itispossible.Thecurvemeansthattherearerelativelyfewrelevantdocumentsatthe beginningandendoftheretrieveddocumentsbutsomerelevantdocumentsareconcentrated inthemiddleoftheretrieveddocuments.
7.[10]ImaginethatyouaretheChiefSystemAnalystforaninformationprovider.Yourboss asksyoutoconsider