Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
640 views
in Technique[技术] by (71.8m points)

elasticsearch data increase & duplicate at each restart

I'm using elasticsearch with angularjs and oracle on windows 7. it's working more & more finer ( thanks to stackoverflower help ). I have a problem with elasticsearch: the number of elements in my document is increasing and i don't know why/how. My oracle table indexed by elasticsearch contain 12010 elements, now i got 84070 elements in elastic document (frequently checked by curl _count): so it duplicate the data 7 times now. I re-indexed the table few days ago but i remove elasticsearch "data" folder before.

data seems to increase each time i restart windows.

Thanks for help.

This is how i install and index my data :


I do this only the first time :


creating index

curl -XPOST 'localhost:9200/donnees'

mapping :

curl -XPUT 'localhost:9200/donnees/specimens/_mapping' -d '{
"specimens" : {
    "_all" : {"enabled" : true},
    "_index" : {"enabled" : true},
    "_id" : {"index": "not_analyzed", "store" : false},
    "properties" : {
        "O_OCCURRENCEID"                                : {"type" : "string",   "store" : "no","index": "not_analyzed"  } ,
            .... 
        "I_INSTITUTIONCODE"                             : {"type" : "string",   "store" : "yes","index": "analyzed" } 
    }
}}'

query oracle and index data :

curl -XPUT 'localhost:9200/_river/donnees_s/_meta' -d '{
 "type" : "jdbc",
 "jdbc" : {
    "index" : "donnees",
    "type" : "specimens",
    "url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
     "user" : "user",
     "password" : "password",
     "sql" : "select * from all_specimens_data"
   }
}'

( is this correct ?? it doesn't work if i replace "curl -XPUT 'localhost:9200/_river/donnees_s/_meta'" by "curl -XPUT 'localhost:9200/donnees/specimens/_meta' which i use to query )

test :

curl -XGET 'http://localhost:9200/donnees/specimens/_count?q=*'
    => 12010
curl -XGET 'http://localhost:9200/donnees/specimens/_search?q=P00009359'
    => return data ok
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Resolved thanks to Konstantin V. Salikhov.

Each time elasticsearch service start it query the database with the sql provided to the _river and get the data ( see me previous "query oracle and index data : "). If the data don't have an "_id" column _river can't determine which records it have already loaded and the data is duplicated each time. To avoid duplicate i edit my "all_specimens_data" table in database ( who is in fact a view to avoid modification o database) and rename "O_OCCURRENCEID" to "_id", "O_OCCURRENCEID" is my primary key UUID.

hope this help other


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...