Trumpet is a new building block in the Hadoop ecosystem that brings brand new capabilities to HDFS. Because it relies on Kafka to distribute the events to the clients, it is decoupled from the NameNode and does not impact its operations, and it scales as well as your Kafka cluster does.

The steps below let you quickly and simply write your own client applications:

Dependency

Add the dependency on the Trumpet client to your project:

        <dependency>
            <groupId>com.verisign.vscc.trumpet</groupId>
            <artifactId>trumpet-client</artifactId>
            <version>${trumpet.version}</version>
        </dependency>

Hint: the versions carefully follow the tag naming convention. Look at the available tags in the project to find the version to use.

Use it

Now it's as easy as iterating over a Java Iterable:

    String kafkaTopic = ...
    String zkConnect = ...
    // Build and start a Curator client from the ZooKeeper connection string
    CuratorFramework curatorFramework = CuratorFrameworkFactory.newClient(zkConnect, new ExponentialBackoffRetry(1000, 3));
    curatorFramework.start();
    for (Map<String, Object> event : new TrumpetEventStreamer(curatorFramework, kafkaTopic)) {
        // ... do something with your event!
    }

Delivery guarantee

As Trumpet relies on Kafka, the client side inherits Kafka's delivery guarantees.

Internally, TrumpetEventStreamer uses a simple consumer and does not store any offset anywhere; the checkpointing process is offloaded to the application writer. A Kafka consumer group can be used instead, configured as required to fit your application's needs.
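
For example, a minimal application-side checkpointing sketch could look like the following (process() and the place where you persist the transaction id are placeholders for your own logic):

    long lastProcessedTxId = -1L;
    for (Map<String, Object> event : new TrumpetEventStreamer(curatorFramework, kafkaTopic)) {
        process(event);  // your own processing logic (hypothetical helper)
        lastProcessedTxId = ((Number) event.get("txId")).longValue();
        // Persist lastProcessedTxId wherever your application keeps state
        // (file, ZooKeeper, database, ...) so you know where processing
        // stopped in case of a restart.
    }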

Test it

A test environment is available in the trumpet-client:tests jar. To import it into your Maven project, use the following dependency:

        <dependency>
            <groupId>com.verisign.vscc.trumpet</groupId>
            <artifactId>trumpet-client</artifactId>
            <version>${trumpet.version}</version>
            <classifier>tests</classifier>
            <scope>test</scope>
        </dependency>

You can then write test classes extending com.verisign.vscc.hdfs.trumpet.server.IntegrationTest and you'll have a complete running environment, including ZooKeeper, Kafka and mini-hdfs.
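
For instance, a minimal JUnit-style skeleton could look like the sketch below; how to interact with the embedded ZooKeeper, Kafka and mini-hdfs from the test depends on what IntegrationTest exposes and is not shown here:

    import org.junit.Test;

    public class MyTrumpetIT extends com.verisign.vscc.hdfs.trumpet.server.IntegrationTest {

        @Test
        public void publishesEventsForNewFiles() throws Exception {
            // Create files in the embedded mini-hdfs and assert that the
            // corresponding CREATE/CLOSE events can be read back from the
            // embedded Kafka.
        }
    }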

Events Format

The events published into Kafka are JSON dictionaries. The rationale behind that is discussed here.

The events taken into account are essentially all events that modify the NameNode's in-memory INodeDirectory structure. The exhaustive list is given below; they are grouped into 6 event types: CREATE, CLOSE, APPEND, RENAME, METADATA and UNLINK:

  • Add -- either CREATE or APPEND
  • Close -- CLOSE
  • Set Replication -- METADATA
  • ConcatDelete -- creates a sequence of APPEND, UNLINK and CLOSE events, all with the same txId (see notes)
  • RenameOld -- RENAME
  • Rename -- RENAME
  • Delete -- UNLINK
  • Mkdir -- CREATE
  • SetPermissions -- METADATA
  • SetOwner -- METADATA
  • Times -- METADATA
  • Symlink -- CREATE
  • RemoveXAttr -- METADATA
  • SetXAttr -- METADATA
  • SetAcl -- METADATA

Find more details about the operation decoding and translation in the classes org.apache.hadoop.hdfs.server.namenode.InotifyFSEditLogOpTranslator and org.apache.hadoop.hdfs.inotify.Event.

The attributes available in the JSON dict for each event type are described below.

  • APPEND
    Attribute name   Mandatory   Description
    txId             Yes         transaction Id
    eventType        APPEND      type of event.
    path

  • CLOSE
    Attribute name   Mandatory   Description
    txId             Yes         transaction Id
    eventType        CLOSE       type of event.
    path
    fileSize
    timestamp

  • CREATE
    Attribute name   Mandatory   Description
    txId             Yes         transaction Id
    eventType        CREATE      type of event.
    iNodeType                    Value can be FILE, DIRECTORY or SYMLINK. See org.apache.hadoop.hdfs.inotify.Event.INodeType for more details.
    path
    ctime
    replication
    ownerName
    groupName
    perms
    symlinkTarget
    overwrite

  • METADATA
    Attribute name   Mandatory   Description
    txId             Yes         transaction Id
    eventType        METADATA    type of event.
    path
    metadataType
    mtime
    atime
    replication
    ownerName
    groupName
    perms
    acls
    xAttrs
    xAttrsRemoved

  • RENAME
    Attribute name   Mandatory   Description
    txId             Yes         transaction Id
    eventType        RENAME      type of event.
    srcPath
    dstPath
    timestamp

  • UNLINK
    Attribute name   Mandatory   Description
    txId             Yes         transaction Id
    eventType        UNLINK      type of event.
    path
    timestamp
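
As a rough illustration, a client could dispatch on the event type as sketched below. The field names come from the tables above; the concrete Java types of the values are an assumption here and depend on how the JSON is deserialized:

    Map<String, Object> event = ...;  // one event from the TrumpetEventStreamer
    long txId = ((Number) event.get("txId")).longValue();
    String eventType = (String) event.get("eventType");

    switch (eventType) {
        case "CREATE":
        case "APPEND":
        case "CLOSE":
            System.out.println(txId + " " + eventType + " " + event.get("path"));
            break;
        case "RENAME":
            System.out.println(txId + " RENAME " + event.get("srcPath") + " -> " + event.get("dstPath"));
            break;
        default:
            // METADATA, UNLINK, ...
            break;
    }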

Example Application

com.verisign.vscc.hdfs.trumpet.client.example.TestApp is a sample application that listens to Trumpet and filters for _SUCCESS files, which indicate the successful completion of a MapReduce job.

From there, it would be really easy to start a distcp job to, for instance, replicate the newly created data to a different, remote cluster.
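
For illustration, the core of such a filter could look roughly like the following sketch (not the actual TestApp code; handleJobSuccess() is a hypothetical helper):

    for (Map<String, Object> event : new TrumpetEventStreamer(curatorFramework, kafkaTopic)) {
        String path = (String) event.get("path");
        if ("CLOSE".equals(event.get("eventType")) && path != null && path.endsWith("/_SUCCESS")) {
            // The _SUCCESS marker has just been written: the job producing that
            // directory is complete, e.g. trigger a distcp of the parent directory.
            handleJobSuccess(path.substring(0, path.lastIndexOf('/')));
        }
    }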