Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
582 views
in Technique[技术] by (71.8m points)

hadoop - HDFS file watcher

Can I have file watcher on HDFS?

Scenario: The files are landing on HDFS continuously.I want to start a Spark Job once the number of files reached a threshold(it can be number of files or size of the files).

Is it possible to implement file watcher on HDFS to achieve this . If yes, then can anyone suggest the way to do it?What are the different options available? Can the Zookeeper or the Oozie do it?

Any help will be appreciated.Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Hadoop 2.6 introduced DFSInotifyEventInputStream that you can use for this. You can get an instance of it from HdfsAdmin and then just call .take() or .poll() to get all the events. Event types include delete, append and create which should cover what you're looking for.

Here's a basic example. Make sure you run it as the hdfs user as the admin interface requires HDFS root.

public static void main( String[] args ) throws IOException, InterruptedException, MissingEventsException
{
    HdfsAdmin admin = new HdfsAdmin( URI.create( args[0] ), new Configuration() );
    DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
    while( true ) {
        EventBatch events = eventStream.take();
        for( Event event : events.getEvents() ) {
            System.out.println( "event type = " + event.getEventType() );
            switch( event.getEventType() ) {
                case CREATE:
                    CreateEvent createEvent = (CreateEvent) event;
                    System.out.println( "  path = " + createEvent.getPath() );
                    break;
                default:
                    break;
            }
        }
    }
}

Here's a blog post that covers it in more detail:

http://johnjianfang.blogspot.com/2015/03/hdfs-6634-inotify-in-hdfs.html?m=1


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...