RIM Laboratoire 3 Lab 1 Instructions


A. Objectives
B. Organization
C. Import the project
D. Understanding the Lucene API
E. Using Luke
F. Indexing and Searching the CACM collection

Méthodes d’accès aux données 2018 - 2019





Indexing and Search with Apache Lucene

Lab Nº 1

A. Objectives

  Lucene 

 Lucene  

 !"

 # $   

 

              !   !  



B. Organization

% &

Report'("

()(!



Deadline'*Moodle

C. Import the project

+",-

+.(""

D. Understanding the Lucene API

/.01+02 ,34

Méthodes d’accès aux données 2018 - 2019

             ‐Lucene (  "

'  5,6                3  

-5&6()  ‐

"(

"   Lucene +    '

'33334747,33"0 

 lucene-6.6.1/docsSearchFiles

) ch.heigvd.iict.dmg.demo "

-  +    "        )      ")  "  

"('

, +stopword8. "

"

& +8. "

"

9 8/"

"8

: +"stopword

82(

E. Using Luke

Luke;"!""

Lucene ! !(

 ;Luke  

 indexDemo

"-

'<"'333+=3)3

544>6?#00@

"

F. Indexing and Searching the CACM collection

A"  Lucene 

%          

 . 

"!' publication id!

authors 56! title  summary 56

?$@"

"

<) !"(!

"

/.01+02 &34

Méthodes d’accès aux données 2018 - 2019

Indexing

BLucene  

, ;StandardAnalyzer

2. -

 "(author!

titlesummary 

9 <                      

 Lucene  

 

: =publication id "

(

5. Lucene))Lucene

'

'33334747,33333

3  " id! title!

summaryauthor

4 A

  8/ ) Lucene  

5) FieldType6; Luke )

 

 ')TODO student

)""(

Using different Analyzers

Lucene%"

"% 

,  "%'



WhitespaceAnalyzer



EnglishAnalyzer

ShingleAnalyzerWrapper5%&6

ShingleAnalyzerWrapper5%96

StopAnalyzer""

       common_words.txt     

;"

& #) Luke 

"'

   

  

 ,>( 

 % )

 ( 

/.01+02 934

Méthodes d’accès aux données 2018 - 2019

Reading Index

Luke" 

< Lucene

!HighFreqTerms"

""('

, A "8 /"

38

& #,>"(



Searching

B       EnglishAnalyzer  2      

            (    "      

       "     ; QueryParser 

%("Lucene

   !    ( compiler program        

"'

*'

9,CD'2EBB5,&::>:&D6

,:FD'B(B0#5,,FF4F4F6

&4F&'B22#25,,&>&9>46

,,C9';+2+5,>D4D:4F6

,:4F'1"+5>DDF&9:&F6

,DCC'5>DDF&9:&F6

,4:G'AEB0;AEBB125>DD>G4D,6

,&9G'2+25>D&:F&F&6

&D::'*2E25>D&:F&F&6

&D&9'/0#+"5>D&>>,,D6

.' publication id + ":" + title + "(" + Lucene score + ")"

A              "  (    

summary'

, IBI

& IIIBI

9     IBI ! 

III+I

: "II

F  II IBI

5 F6

( ( QueryParser!

,>



/.01+02 :34

Méthodes d’accès aux données 2018 - 2019

Lucene( '

'33334747,3(3333

(33)0J)7

Tuning the Lucene Score

   %



Lucene,       4!           

Okapi BM251. <&Lucene 

0+(*LuceneK

'

'33334747,333333

3+*

Lucene"'

"L(L(!LL!LL!'

'("5'L

$\sqrt{\text{freq}}$

:(""

5'L

$\log\left(\frac{\text{numDocs}}{\text{docFreq}+1}\right)$

'!(("?L?

5!6

'(

+'L

$\frac{\text{overlap}}{\text{maxOverlap}}$

(:(%$

(K"!

K"

'L '

oL0LM56L



oL0" 

")!



* 

1 M&F''33")3")3E)7M&F

2 See https://lucene.apache.org/core/6_6_1/core/org/apache/lucene/index/IndexWriterConfig.html#setSimilarity-org.apache.lucene.search.similarities.Similarity- and https://lucene.apache.org/core/6_6_1/core/org/apache/lucene/search/IndexSearcher.html#setSimilarity-org.apache.lucene.search.similarities.Similarity-

/.01+02 F34

Méthodes d’accès aux données 2018 - 2019

"0+"'

, 2'



org.apache.lucene.search.similarities.ClassicSimilarity

& E'

LL5L(6

LL5L(!LL+6

LL5L!LL E6

Note that search time is too late to modify this norm part of scoring. You need to re-index the documents using your specialized similarity class that implements computeNorm().

9 ;    "            

'

'

$1 + \log(\text{freq})$

'5

$\left(\frac{\text{numDocs}}{\text{docFreq}+1}\right)$

',

'

$\sqrt{\frac{\text{overlap}}{\text{maxOverlap}}}$

: * *

*5*6 IndexWriterIndexSearcher ;

EnglishAnalyzer

F 2("ClassicSimilarity

"  "

ClassicSimilarity   "  )3      ,>  

+"
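The alternative functions from step 3 can be tried out in isolation before wiring them into a Lucene similarity. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and because only the ratio numDocs / (docFreq + 1) is legible for the second function, wrapping it as `1 + log(...)` is a guess modelled on the default formula rather than something this handout confirms.

```java
// Hypothetical stand-alone sketch; it does not use or extend any Lucene class.
final class CustomFactorsSketch {

    // Term-frequency factor: 1 + log(freq). Only meaningful for freq > 0.
    static double tf(double freq) {
        return 1.0 + Math.log(freq);
    }

    // Inverse document frequency over numDocs / (docFreq + 1).
    // ASSUMPTION: the "1 + log(...)" wrapper is not legible in the handout.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Coordination factor: square root of the matched fraction of query terms.
    static double coord(int overlap, int maxOverlap) {
        return Math.sqrt((double) overlap / maxOverlap);
    }

    public static void main(String[] args) {
        System.out.println("tf(1)       = " + tf(1));
        System.out.println("idf(9, 10)  = " + idf(9, 10));
        System.out.println("coord(1, 4) = " + coord(1, 4));
    }
}
```

In the lab project, functions like these would presumably live in a custom similarity class that is set on both the IndexWriterConfig and the IndexSearcher, following the two setSimilarity methods cited in footnote 2.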



/.01+02 434

RIM Laboratoire 3 Lab 1 Instructions

Lab%201%20instructions

Lab%201%20instructions

Lab%201%20instructions

Navigation menu

Versions of this User Manual:

Views

Navigation