Trumpet is a new building block in the Hadoop ecosystem that brings brand new capabilities to HDFS. Because it relies on Kafka to distribute events to the clients, it is decoupled from the NameNode and does not impact its operations. It also scales as well as your Kafka cluster does.
Follow the steps below to simply and quickly write your own client application:
Dependency
Add the Trumpet client dependency to your project:
```xml
<dependency>
    <groupId>com.verisign.vscc.trumpet</groupId>
    <artifactId>trumpet-client</artifactId>
    <version>${trumpet.version}</version>
</dependency>
```
Hint: the versions carefully follow the tag naming convention. Look at the available tags in the project to find the version to use.
Use it
Now it's as easy as using a Java Iterable.
```java
String kafkaTopic = ...
CuratorFramework curatorFramework = ...

for (Map<String, Object> event : new TrumpetEventStreamer(curatorFramework, kafkaTopic)) {
    // ... do something with your event!
}
```
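TrumpetEventStreamer expects a started CuratorFramework connected to the ZooKeeper quorum used by Trumpet. Here is a minimal sketch of the wiring, assuming the standard Apache Curator API; the connect string and topic name below are placeholders:

```java
import java.util.Map;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
// plus the TrumpetEventStreamer import from the trumpet-client artifact

public class TrumpetTail {
    public static void main(String[] args) throws Exception {
        String zkConnect = "zk1:2181,zk2:2181,zk3:2181"; // placeholder quorum
        String kafkaTopic = "trumpet";                   // placeholder topic name

        // Build and start the Curator client before handing it to the streamer.
        CuratorFramework curatorFramework = CuratorFrameworkFactory.newClient(
                zkConnect, new ExponentialBackoffRetry(1000, 3));
        curatorFramework.start();

        try {
            for (Map<String, Object> event : new TrumpetEventStreamer(curatorFramework, kafkaTopic)) {
                System.out.println(event);
            }
        } finally {
            curatorFramework.close();
        }
    }
}
```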
Delivery guarantee
As Trumpet relies on Kafka, the client side inherits Kafka's delivery guarantees. TrumpetEventStreamer internally uses a simple consumer and does not store any offsets anywhere; the checkpointing process is offloaded to the application writer. A Kafka consumer group can be used instead, configured as required to fit your application's needs.
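If you need at-least-once processing across restarts with the simple consumer approach, your application has to persist its own checkpoint. A minimal sketch, reusing the streamer from above and assuming hypothetical loadCheckpoint/saveCheckpoint helpers backed by your own storage:

```java
import java.util.Map;

// Illustrative checkpointing loop: skip events at or below the last
// persisted txId, process the rest, and persist the new high-water mark.
long lastProcessedTxId = loadCheckpoint(); // hypothetical helper: read persisted txId

for (Map<String, Object> event : new TrumpetEventStreamer(curatorFramework, kafkaTopic)) {
    long txId = ((Number) event.get("txId")).longValue();
    if (txId <= lastProcessedTxId) {
        continue; // already processed before the restart
    }
    process(event);       // hypothetical application logic
    saveCheckpoint(txId); // hypothetical helper: persist atomically
    lastProcessedTxId = txId;
}
```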
Test it
A test environment is available in the trumpet-client:tests jar. To import it into your Maven project, use the following dependency:
```xml
<dependency>
    <groupId>com.verisign.vscc.trumpet</groupId>
    <artifactId>trumpet-client</artifactId>
    <version>${trumpet.version}</version>
    <classifier>tests</classifier>
    <scope>test</scope>
</dependency>
```
You can then write test classes extending com.verisign.vscc.hdfs.trumpet.server.IntegrationTest and you'll have a complete running environment, including ZooKeeper, Kafka and mini HDFS.
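For example, a skeleton test could look like the following, assuming JUnit 4; the test body is only an illustrative outline, as the exact helpers exposed by IntegrationTest depend on the trumpet-client version:

```java
import com.verisign.vscc.hdfs.trumpet.server.IntegrationTest;
import org.junit.Test;

public class MyTrumpetIT extends IntegrationTest {

    @Test
    public void firesEventOnFileCreation() throws Exception {
        // The superclass spins up ZooKeeper, Kafka and a mini HDFS cluster.
        // Illustrative outline: write a file into the mini HDFS, then consume
        // the Trumpet topic and assert that a CREATE event for that path
        // appears. The exact helpers available depend on IntegrationTest.
    }
}
```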
Events Format
The events published into Kafka are JSON dictionaries. The rationale behind that choice is discussed here.
Events taken into account are basically all events modifying the NameNode INodeDirectory in-memory structure. The exhaustive list is given below, and they are regrouped into 6 event types: CREATE, CLOSE, APPEND, RENAME, METADATA, UNLINK:
- Add -- either CREATE or APPEND
- Close -- CLOSE
- Set Replication -- METADATA
- ConcatDelete -- creates a sequence of APPEND, UNLINK and CLOSE, but all with the same txId (see notes)
- RenameOld -- RENAME
- Rename -- RENAME
- Delete -- UNLINK
- Mkdir -- CREATE
- SetPermissions -- METADATA
- SetOwner -- METADATA
- Times -- METADATA
- Symlink -- CREATE
- RemoveXAttr -- METADATA
- SetXAttr -- METADATA
- SetAcl -- METADATA
Find more details about the operation decoding and translation in the classes org.apache.hadoop.hdfs.server.namenode.InotifyFSEditLogOpTranslator and org.apache.hadoop.hdfs.inotify.Event.
The available attributes in the JSON dict per event type are described below.
APPEND
Attribute name | Mandatory | Description |
---|---|---|
txId | Yes | transaction Id |
eventType | APPEND | type of event. |
path | | |
CLOSE
Attribute name | Mandatory | Description |
---|---|---|
txId | Yes | transaction Id |
eventType | CLOSE | type of event. |
path | | |
fileSize | | |
timestamp | | |
CREATE
Attribute name | Mandatory | Description |
---|---|---|
txId | Yes | transaction Id |
eventType | CREATE | type of event. |
iNodeType | | Value can be FILE, DIRECTORY or SYMLINK. See org.apache.hadoop.hdfs.inotify.Event.INodeType for more details. |
path | | |
ctime | | |
replication | | |
ownerName | | |
groupName | | |
perms | | |
symlinkTarget | | |
overwrite | | |
METADATA
Attribute name | Mandatory | Description |
---|---|---|
txId | Yes | transaction Id |
eventType | METADATA | type of event. |
path | | |
metadataType | | |
mtime | | |
atime | | |
replication | | |
ownerName | | |
groupName | | |
perms | | |
acls | | |
xAttrs | | |
xAttrsRemoved | | |
RENAME
Attribute name | Mandatory | Description |
---|---|---|
txId | Yes | transaction Id |
eventType | RENAME | type of event. |
srcPath | | |
dstPath | | |
timestamp | | |
UNLINK
Attribute name | Mandatory | Description |
---|---|---|
txId | Yes | transaction Id |
eventType | UNLINK | type of event. |
path | | |
timestamp | | |
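To illustrate how these attributes are typically consumed, here is a sketch of a handler method that dispatches on the eventType field; the println bodies are placeholders for real application logic:

```java
import java.util.Map;

// Illustrative handler: dispatch on the eventType attribute of a decoded event.
static void handle(Map<String, Object> event) {
    long txId = ((Number) event.get("txId")).longValue();
    String eventType = (String) event.get("eventType");

    switch (eventType) {
        case "CREATE":
        case "APPEND":
        case "CLOSE":
        case "UNLINK":
            // These event types carry a single path attribute.
            System.out.println(txId + " " + eventType + " " + event.get("path"));
            break;
        case "RENAME":
            // RENAME carries srcPath and dstPath instead of path.
            System.out.println(txId + " RENAME " + event.get("srcPath")
                    + " -> " + event.get("dstPath"));
            break;
        case "METADATA":
            // metadataType tells which kind of metadata changed.
            System.out.println(txId + " METADATA " + event.get("path")
                    + " (" + event.get("metadataType") + ")");
            break;
    }
}
```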
Example Application
com.verisign.vscc.hdfs.trumpet.client.example.TestApp is a sample application that listens to Trumpet and filters for _SUCCESS files, which indicate the successful completion of a MapReduce application. From there, it would be really easy to start a distcp job to, for instance, replicate the newly created data into a different remote cluster.
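For illustration, a minimal sketch of that filtering idea (not the actual TestApp code), assuming a CLOSE event signals that the _SUCCESS marker is fully written:

```java
import java.util.Map;

// Illustrative filter: react when a _SUCCESS marker file is closed,
// i.e. when a MapReduce job has completed successfully.
for (Map<String, Object> event : new TrumpetEventStreamer(curatorFramework, kafkaTopic)) {
    String path = (String) event.get("path");
    if ("CLOSE".equals(event.get("eventType")) && path != null && path.endsWith("/_SUCCESS")) {
        // The job output directory is the parent of the _SUCCESS file;
        // this is where a distcp of the newly created data could be kicked off.
        String outputDir = path.substring(0, path.lastIndexOf('/'));
        System.out.println("Job finished, output ready at " + outputDir);
    }
}
```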