Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


config - Solr delta-import not working

Full import and deletedPkQuery work. I've traced the database server, and both the deltaQuery and the deletedPkQuery are executed.

I've run these queries manually many times and they do return row(s), but the delta-import does not fetch any rows. The last thing I tried was to output FILE_ID as id in all the queries. It still doesn't work.

<dataConfig>

<dataSource name="db" type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost:1433;databaseName=norway_operations;responseBuffering=adaptive;selectMethod=cursor" user="noropuser" password="noropuser" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT"/>
<dataSource name="bin" type="BinFileDataSource"  basePath="D:OPG_FILESTORE"/>

<document>

    <entity name="file" dataSource="db" pk="id" query="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS"
            deltaQuery="select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > '${dataimporter.last_index_time}'" 
            deltaImportQuery="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS where FILE_ID = '${dih.delta.id}'" 
            deletedPkQuery="delete from PK_DELETE_HISTORY output DELETED.PK AS id where PK_NAME = 'FILE_ID'" >

        <field column="id" name="id" />
        <field column="CATEGORY_ID" name="categoryId" />
        <field column="CATEGORY_NAME" name="category" />
        <field column="FILENAME" name="filename" />
        <field column="FILE_MIME_TYPE" name="content_type" />
        <field column="last_modified" name="last_modified" />

        <entity name="tika" processor="TikaEntityProcessor" url="${file.PATH}" parser="org.apache.tika.parser.AutoDetectParser" format="text" dataSource="bin" onError="continue">                
            <field column="text" name="content" />
            <field column="title" name="title"/>
            <field column="subject" name="subject"/>
            <field column="description" name="description"/>
            <field column="comments" name="comments"/>
            <field column="author" name="author"/>
            <field column="keywords" name="keywords"/>
            <field column="url"  name="url"/>
            <field column="content_type" name="content_type" />                
            <field column="links"  name="links" />                
        </entity>            
    </entity>        
</document>

</dataConfig>

Trace

declare @p1 int
set @p1=180150003
declare @p5 int
set @p5=-1
exec sp_cursoropen @p1 output,N'select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > ''2014-02-06 15:02:40''',16,8193,@p5 output
select @p1, @p5

When I run this manually, it returns 1 row.

Response:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">31</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
      <int name="rows">0</int>
      <int name="start">0</int>
    </lst>
  </lst>
  <str name="command">delta-import</str>
  <str name="mode">debug</str>
  <arr name="documents" />
  <lst name="verbose-output" />
  <str name="status">idle</str>
  <str name="importResponse" />
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">2</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Delta Dump started">2014-02-06 15:32:20</str>
    <str name="Identifying Delta">2014-02-06 15:32:20</str>
    <str name="Deltas Obtained">2014-02-06 15:32:20</str>
    <str name="Building documents">2014-02-06 15:32:20</str>
    <str name="Total Changed Documents">0</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken">0:0:0.16</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

1 Reply


Things that might be worth checking:

1. Timestamp saved in the dataimport.properties config file

This has happened to me before.

Running delta-import (successfully) updates the timestamp behind ${dataimporter.last_index_time} in the conf/dataimport.properties file. The next time, your query runs against the new timestamp, which may return zero rows unless the database has been updated since then.

2. dataimporter.delta.id and dataimporter.last_index_time

dataimporter.delta.id should be dih.delta.id.

last_index_time remains in the dataimporter namespace. dataimporter.last_index_time works at least in Solr 4.2.0. dih.last_index_time might work too, as it is mentioned in the Solr wiki, but I haven't tested it.

3. The timestamp needs to be converted to the proper datetime data type, depending on the database.

In case of SQL server:

LAST_MODIFIED_DATETIME > convert(datetime,'${dataimporter.last_index_time}')
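Applied to the entity from the question, this might look like the sketch below. It assumes the column is LAST_MODIFIED (as in the question's config, rather than LAST_MODIFIED_DATETIME) and that it is a SQL Server datetime column. The field mappings and the nested Tika entity are left out, so treat it as an illustration of the query attributes, not a drop-in config.

```xml
<!-- Sketch only: <field> mappings and the nested "tika" entity are omitted. -->
<!-- Assumption: LAST_MODIFIED is a datetime column in SQL Server. -->
<entity name="file" dataSource="db" pk="id"
        query="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS"
        deltaQuery="select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > convert(datetime, '${dataimporter.last_index_time}')"
        deltaImportQuery="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS where FILE_ID = '${dih.delta.id}'"
        deletedPkQuery="delete from PK_DELETE_HISTORY output DELETED.PK AS id where PK_NAME = 'FILE_ID'">
</entity>
```

Only the deltaQuery changes compared with the original config; the other attributes are copied as they were. The delta id stays in the dih namespace and last_index_time in the dataimporter namespace, as in point 2.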
