Event Hubs are part of the Azure Service Bus service and are designed for extremely high-speed ingestion of data, such as telemetry from an IoT environment. At full scale they can process 1 million events per second, and they are the basis for Azure’s Application Insights service. Event Hubs implement a stream model that scales out with extremely low latency. They act as the “front door” of an event processing system and provide little logic beyond what is required to manage that scale. Event Hubs are receive-only endpoints for ingesting data; they provide no mechanism for sending data back to publishers.

[Image: eh1]

Event Hubs are defined within an Azure Service Bus namespace (e.g. scusabus.servicebus.windows.net), and each Event Hub has a name that is unique within that namespace. Events are placed into separate ordered lists called partitions to support scale. You set the number of partitions for your Event Hub at creation time, based on your maximum expected scalability needs, to a value between 2 and 32 (the default is 4). You also specify the message retention policy in days.
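
For illustration, an Event Hub with these settings can be created programmatically with the Microsoft.ServiceBus SDK. This is only a sketch: the namespace connection string, hub name, and the specific partition and retention values below are placeholders, not recommendations.

using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

// Namespace-level connection string with Manage rights (placeholder key).
string connectionString =
    "Endpoint=sb://scusabus.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>";

var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

// Describe the hub: 8 partitions and 3-day retention are illustrative values.
var description = new EventHubDescription("devicetelemetry")
{
    PartitionCount = 8,
    MessageRetentionInDays = 3
};

// Creates the Event Hub only if it does not already exist.
namespaceManager.CreateEventHubIfNotExists(description);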

[Image: eh2]

[Image: eh3]

Note: Event Hubs are intended as an ingestion point for messages, not for storage. As a result, the maximum number of days messages will be retained is 7.

Partitions

[Image: eh4]

To manage the scale of messages being sent into the Event Hub, the system splits them into separate ordered message streams called partitions. Each partition holds messages, along with their metadata, in an ordered list. A message can be sent with a partition key that identifies which partition it targets; if no partition key is supplied, messages are distributed across partitions round-robin.

While a publisher can send a message to a specific partition, that’s generally not a good idea because it sidesteps the Event Hub scaling logic.
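
Instead, a publisher normally supplies a partition key and lets the Event Hub map it to a partition. A rough sketch using the Microsoft.ServiceBus.Messaging client (connection string, policy name, and hub name are placeholders):

using System.Text;
using Microsoft.ServiceBus.Messaging;

var client = EventHubClient.CreateFromConnectionString(
    "Endpoint=sb://scusabus.servicebus.windows.net/;SharedAccessKeyName=DevicePublisher;SharedAccessKey=<key>",
    "devicetelemetry");

var eventData = new EventData(Encoding.UTF8.GetBytes("{\"temp\": 72.5}"))
{
    // Events that share a partition key land in the same partition, in order.
    PartitionKey = "device-42"
};

// Omitting PartitionKey lets the Event Hub distribute events round-robin.
client.Send(eventData);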

Shared Access Policies


[Image: eh5]

Access to the Event Hub is managed by creating shared access policies. Each policy has a name and a pair of shared access keys that the client presents when calling the Event Hub, and each policy grants some combination of Send, Listen, and Manage permissions. You can regenerate the shared access keys for any policy at any time.

[Image: eh6]

It is a best practice to use separate policies for publishers and consumers, as shown above.
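
As an illustration of that separation (policy names and keys below are hypothetical), the publisher and the consumer each get a connection string that references only the policy they need:

using Microsoft.ServiceBus.Messaging;

// One policy with Send rights for devices, another with Listen rights for the back end.
string sendConnection =
    "Endpoint=sb://scusabus.servicebus.windows.net/;SharedAccessKeyName=DevicePublisher;SharedAccessKey=<send-key>";
string listenConnection =
    "Endpoint=sb://scusabus.servicebus.windows.net/;SharedAccessKeyName=BackendConsumer;SharedAccessKey=<listen-key>";

// A client built from the publisher policy can send but not listen, and vice versa.
var publisherClient = EventHubClient.CreateFromConnectionString(sendConnection, "devicetelemetry");
var consumerClient  = EventHubClient.CreateFromConnectionString(listenConnection, "devicetelemetry");

If either key is compromised, it can be regenerated without affecting the other role.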

Publishers

A publisher is any application that sends messages into an Event Hub; this can be anything from IoT devices to other services. Publishers can send a single event or a batch of events, with a total size limit of 256 KB per send.
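
A rough sketch of a batched send (reusing the hypothetical publisher connection string from above); the serialized batch as a whole must stay under the 256 KB limit:

using System.Collections.Generic;
using System.Text;
using Microsoft.ServiceBus.Messaging;

var client = EventHubClient.CreateFromConnectionString(
    "Endpoint=sb://scusabus.servicebus.windows.net/;SharedAccessKeyName=DevicePublisher;SharedAccessKey=<key>",
    "devicetelemetry");

var batch = new List<EventData>();
for (int i = 0; i < 100; i++)
{
    batch.Add(new EventData(Encoding.UTF8.GetBytes("{\"reading\": " + i + "}"))
    {
        PartitionKey = "device-42"
    });
}

// SendBatch submits the whole group in one call; bodies plus headers
// must fit within the 256 KB limit or the call will fail.
client.SendBatch(batch);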

Protocols

Event Hubs support two protocols for submitting events:

  • AMQP – Advanced Message Queuing Protocol is an open standard used by many different messaging systems, including RabbitMQ, and widely adopted by financial institutions. It is optimized for high-speed, low-latency communication, which also allows Event Hub data to be consumed by downstream systems such as Apache Storm. AMQP is designed for long-lived connections.
  • HTTPS – Good old fashioned secured HTTP, used for short-lived, fire-and-forget connections. A sketch of an HTTPS send follows this list.
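
As a rough sketch of the HTTPS path (the endpoint format and policy name are illustrative; the SAS token is built by hand using the standard Service Bus signature scheme), a short-lived publisher can POST an event directly to the hub's REST endpoint:

using System;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;

class HttpsSendSketch
{
    // Standard Service Bus SAS token: sign "<url-encoded resource>\n<expiry>" with the policy key.
    static string CreateSasToken(string resourceUri, string keyName, string key, TimeSpan ttl)
    {
        string expiry = Convert.ToString(
            (long)(DateTime.UtcNow - new DateTime(1970, 1, 1)).TotalSeconds + (long)ttl.TotalSeconds);
        string stringToSign = WebUtility.UrlEncode(resourceUri) + "\n" + expiry;
        using (var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(key)))
        {
            string signature = Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
            return string.Format("SharedAccessSignature sr={0}&sig={1}&se={2}&skn={3}",
                WebUtility.UrlEncode(resourceUri), WebUtility.UrlEncode(signature), expiry, keyName);
        }
    }

    static void Main()
    {
        string hubUri = "https://scusabus.servicebus.windows.net/devicetelemetry";

        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.TryAddWithoutValidation("Authorization",
                CreateSasToken(hubUri, "DevicePublisher", "<send-key>", TimeSpan.FromMinutes(10)));

            // Fire-and-forget publish: POST the event body to the hub's /messages endpoint.
            // The hub does not inspect the body, so the content type is up to the publisher.
            var response = http.PostAsync(hubUri + "/messages",
                new StringContent("{\"temp\": 72.5}", Encoding.UTF8, "application/json")).Result;
            response.EnsureSuccessStatusCode();
        }
    }
}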

The format of the data is entirely up to the publisher; the Event Hub does not inspect or transform it in any way. However, the EventData record stored in an Event Hub partition has the following structure (a short sketch of working with these fields follows the list).

EventData

  • Offset
  • Sequence Number
  • Body
  • User Properties
  • System Properties
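
As a rough illustration (the property names are arbitrary), a publisher attaches user properties alongside the opaque body, and a consumer reads the system-assigned fields once an event has been received:

using System;
using System.Text;
using Microsoft.ServiceBus.Messaging;

// Publisher side: the body is opaque bytes; user properties travel with it.
var evt = new EventData(Encoding.UTF8.GetBytes("{\"temp\": 72.5}"));
evt.Properties["DeviceId"] = "device-42";        // user property
evt.Properties["FirmwareVersion"] = "1.0.3";     // user property

// Consumer side (evt here stands in for an instance handed to a receiver;
// these fields are assigned by the Event Hub, not the publisher).
string offset     = evt.Offset;            // position within the partition
long sequence     = evt.SequenceNumber;    // order within the partition
DateTime enqueued = evt.EnqueuedTimeUtc;   // when the hub accepted the event
byte[] body       = evt.GetBytes();        // the raw body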

Consumers

Any process can consume messages from the Event Hub. Consumption of events is done only via the AMQP protocol, which allows the client to receive messages as they arrive without polling. All event consumers access event data through a consumer group, which provides an independent view (offset position) of the event stream. Multiple consumer groups can be created on the same Event Hub, making a consumer group roughly equivalent to a subscription; however, multiple applications or multiple instances can share the same consumer group for further scalability.

Consumer Groups

[Image: eh7]

Consumer groups maintain a view of the state of an entire Event Hub across all of its partitions, and all event data is accessed through them. Every Event Hub has a default consumer group named $Default, and you can create up to 20 consumer groups. In the direct consumer model, a process elects to receive messages from a specific partition and is responsible for maintaining the current offset itself. Only 5 consumers (within a consumer group) can be connected to a single partition at a time, and connections are not disposed of immediately, so care must be taken when using this model.
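
A minimal sketch of a direct consumer (connection string, hub name, and the saved offset are placeholders), where the process picks a partition and tracks its own offset:

using System;
using Microsoft.ServiceBus.Messaging;

var client = EventHubClient.CreateFromConnectionString(
    "Endpoint=sb://scusabus.servicebus.windows.net/;SharedAccessKeyName=BackendConsumer;SharedAccessKey=<listen-key>",
    "devicetelemetry");

// Read partition "0" of the $Default consumer group, starting from an
// offset this process persisted somewhere on a previous run.
string lastOffset = "<previously saved offset>";
EventHubConsumerGroup consumerGroup = client.GetDefaultConsumerGroup();
EventHubReceiver receiver = consumerGroup.CreateReceiver("0", lastOffset);

EventData evt = receiver.Receive(TimeSpan.FromSeconds(30));
if (evt != null)
{
    // Process the event, then store evt.Offset ourselves for the next run.
    lastOffset = evt.Offset;
}

receiver.Close();
client.Close();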

Checkpointing

Checkpoints are similar to client-side cursors: they let a consumer store a partition/offset position so that processing can resume from that point after a failover.

EventProcessorHost

To make event consumption easier, Microsoft supplies a type-safe, multi-threaded class called EventProcessorHost. This class manages the execution of multiple readers spread across all partitions of an Event Hub. A developer implements the IEventProcessor interface, and EventProcessorHost then creates an instance of that implementation for every partition and starts delivering event data to it. IEventProcessor is a simple interface with three methods.


public interface IEventProcessor
{
    Task CloseAsync(PartitionContext context, CloseReason reason);
    Task OpenAsync(PartitionContext context);
    Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages);
}
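
A minimal sketch of an implementation and its registration (hub name, connection strings, and the storage account are placeholders; EventProcessorHost uses the Azure Storage account to hold partition leases and the checkpoints described above):

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

public class TelemetryProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context)
    {
        Console.WriteLine("Opened partition " + context.Lease.PartitionId);
        return Task.FromResult<object>(null);
    }

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (EventData evt in messages)
        {
            Console.WriteLine("Partition {0}, offset {1}: {2}",
                context.Lease.PartitionId, evt.Offset, Encoding.UTF8.GetString(evt.GetBytes()));
        }

        // Checkpoint so a restarted or failed-over host resumes from this point.
        await context.CheckpointAsync();
    }

    public Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        Console.WriteLine("Closed partition " + context.Lease.PartitionId + ": " + reason);
        return Task.FromResult<object>(null);
    }
}

// Registration: the host creates a TelemetryProcessor for each partition it owns.
var host = new EventProcessorHost(
    Environment.MachineName,                  // host name; must differ per instance
    "devicetelemetry",                        // Event Hub path
    EventHubConsumerGroup.DefaultGroupName,   // "$Default"
    "<event hub listen connection string>",
    "<azure storage connection string>");     // leases and checkpoints live here

await host.RegisterEventProcessorAsync<TelemetryProcessor>();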

EventProcessorHost also works across multiple processes, allowing event processors to be scaled out. As new event processors come online, the partition leases are redistributed across the available processors to balance the load, as shown below.

[Image: eh8]

Conclusion

Event Hubs are a key component of large-scale Azure implementations. While not a complete solution in themselves, they are an incredibly useful service if you are building a cloud-based system for data collection, telemetry, or IoT.