Idempotency for Windows Azure Message Queues

Idempotency is the mathematical term used to describe a system that produces the same result when a formula or procedure is applied numerous times against the same target. In software systems, this translates to an ability to perform an operation more than once with the knowledge that the resulting state of the system will be consistent. Idempotency does not dictate the mechanism by which this consistency is to be achieved, only that it must be achieved.

Queues are useful in Windows Azure for delivering work requests to worker roles. They are the primary architectural means by which web roles signal worker roles to begin performing work asynchronously. When a worker role accepts a message from a queue, the queue hides that message from other workers for a visibility timeout (30 seconds by default) to reduce the probability that the message will be operated on by multiple workers simultaneously. This approach greatly reduces the probability that redundant work will be performed by the system, but it does not prevent it!

If a message takes longer to process than the queue keeps it hidden, then the message is made visible again for other workers to pick up and process. It is therefore possible for more than one worker to be working on the same message at the same time: the original recipient of the message, plus the new worker who picks it up when it becomes visible in the queue again. In addition, the typical pattern for failed or corrupted message receipt in a fault-tolerant system is to retry message delivery. This can also lead to redundant work being performed.
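To make the redelivery scenario concrete, here is a minimal in-memory sketch of a queue that hides a message for a fixed period after it is received. The `ToyQueue` class and its method names are hypothetical and are not the Windows Azure queue API; the clock is passed in explicitly so the timing is easy to follow.

```python
class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the Azure API)."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time at which it is visible again]

    def put(self, msg_id, body):
        self._messages[msg_id] = [body, 0.0]

    def get(self, now):
        """Return (id, body) of a visible message and hide it, or None if none is visible."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """A worker calls this only after it has finished processing the message."""
        self._messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30.0)
queue.put("m1", "process order 12345")

first = queue.get(now=0.0)    # worker A receives the message
hidden = queue.get(now=10.0)  # worker B finds nothing: the message is hidden
second = queue.get(now=31.0)  # worker A is still busy, so worker B now gets the same message
```

Because worker A has not yet deleted the message when the timeout expires, both workers end up holding the same work item, which is exactly the situation the rest of this post designs for.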

The fact that multiple workers may work on the same message makes it essential for us to design our software for use in the cloud with idempotence in mind. An argument that idempotence will only matter once in hundreds of thousands of transactions is still very problematic if your system may be processing millions of transactions, or if the integrity of your data is mission critical.

There seem to be a lot of blog posts and forum entries on the importance of writing idempotent services, but I was able to find very little constructive advice on how developers should go about achieving idempotency. That is the purpose of this blog post.

One technique that I saw suggested on several blogs and forums while researching this topic was to avoid the problem altogether. Many people suggested creating a table of message IDs and forcing the workers to verify the state of a message by consulting the table before processing it. Even the author of one book on SOA put this idea forward. To my way of thinking, avoiding duplicate processing does not make your software idempotent; such schemes are a pattern for sidestepping the problem rather than designing for it. This isn’t necessarily a bad way to go for some software systems, but be aware that the pattern has its own flaws: an error could keep the table from being updated, and there is a time window in which the table might hold inaccurate state information, allowing two workers to execute simultaneously anyway. The old two-phase commit solution starts to rear its ugly head. Since such schemes can have problems, a better question to ask yourself is this: what will the state of your data be after a message has been received and executed multiple times? Will your data be consistent or inconsistent?

For a system to be truly idempotent we must be capable of processing the same message twice and after processing that message we must still be in a consistent state.

Let’s say that we want to update a customer’s address. Our service receives a message from some application with the new street address of our customer. We process the message and the address is changed in our database. If we receive this message again the work will be performed twice. No matter how inefficient or unsavory this may be, the resulting state of the customer’s address will be identical. In other words, our overly-simplified address change operation would be considered idempotent. If two messages for the same customer arrive carrying two separate addresses, the first one would succeed and so would the second one. Again, we would still be idempotent in the sense that our data was consistent; however, we have set ourselves up for a “last-in-wins” model. This is not necessarily a bad thing but we should be aware of it in our design.
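The address example can be sketched in a few lines. This is only an illustration under assumptions: the `Customer` table, its columns, and the message shape are hypothetical, and SQLite stands in for whatever database the service actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Street TEXT)")
conn.execute("INSERT INTO Customer VALUES (1, '1 Old Road')")
conn.commit()


def apply_address_change(message):
    """Overwrite the stored address with the one carried by the message."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE Customer SET Street = ? WHERE CustomerId = ?",
            (message["street"], message["customer_id"]),
        )


def current_street(customer_id):
    row = conn.execute(
        "SELECT Street FROM Customer WHERE CustomerId = ?", (customer_id,)
    ).fetchone()
    return row[0]


msg_a = {"customer_id": 1, "street": "22 Elm Street"}
msg_b = {"customer_id": 1, "street": "9 Oak Avenue"}

apply_address_change(msg_a)
apply_address_change(msg_a)          # replaying the same message changes nothing
after_replay = current_street(1)     # '22 Elm Street'

apply_address_change(msg_b)
after_second = current_street(1)     # '9 Oak Avenue'

apply_address_change(msg_a)          # a late redelivery of the older message
after_late_replay = current_street(1)  # '22 Elm Street': last in wins
```

The last call shows why the last-in-wins model deserves attention in the design: a late duplicate of an older message silently overwrites the newer address.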

Many businesses extend credit to their customers. No reasonable business would extend such credit without placing limits on it. Instead of the customer address example, let’s imagine that our messages are for new orders from our customers. If such a message were to be processed twice without any concern for idempotency, our customer might receive twice as much product as they ordered, and they may find themselves prematurely exceeding their credit limit on subsequent orders. This would clearly not be idempotent. So how do we get to where we want to go?

If the message contains the invoice number, then we might construct our business and database operations to perform the add operation in such a manner as to ensure that the data is never inserted into the table twice. We could perform the insertion into the invoice table as part of a transaction where the invoice number was not already present in the table. This would result in the insertion of one row into the table for the first receipt, but zero rows into the table on subsequent attempts. In other words our add operation would leave the data in a consistent state no matter how many times we replayed the message.
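A minimal sketch of such a guarded insert follows, assuming a hypothetical `Invoice` table and message shape, with SQLite standing in for the real database. The primary key on the invoice number is an extra backstop that I have added for the sketch; the `WHERE NOT EXISTS` clause is what keeps a replay from inserting a second row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice ("
    " InvoiceNumber INTEGER PRIMARY KEY,"
    " CustomerId INTEGER NOT NULL,"
    " Amount REAL NOT NULL)"
)
conn.commit()


def add_invoice(message):
    """Insert the invoice only if its number is not already present.

    Returns the number of rows inserted: 1 on first receipt, 0 on a replay.
    """
    with conn:  # the check and the insert run as one statement in one transaction
        cursor = conn.execute(
            "INSERT INTO Invoice (InvoiceNumber, CustomerId, Amount) "
            "SELECT :invoice_number, :customer_id, :amount "
            "WHERE NOT EXISTS ("
            " SELECT 1 FROM Invoice WHERE InvoiceNumber = :invoice_number)",
            message,
        )
    return cursor.rowcount


message = {"invoice_number": 12345, "customer_id": 1, "amount": 250.0}

first = add_invoice(message)   # 1 row inserted
replay = add_invoice(message)  # 0 rows inserted

row_count, total = conn.execute(
    "SELECT COUNT(*), SUM(Amount) FROM Invoice"
).fetchone()                   # (1, 250.0)
```

However many times the message is replayed, the customer is invoiced once and their balance reflects a single order.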

If the message is to update an existing invoice, then things get a little more sophisticated, but they are still very manageable. The incoming message carries the timestamp that the data had at the time it was issued to the sender. By comparing that value for equality with the row’s timestamp column, we can perform the update only where the two match. The timestamp from the message is passed as a parameter to the WHERE clause:

UPDATE Invoice SET Amount = @Amount WHERE InvoiceNumber = 12345 AND tstamp = @tstamp

If the data has not been updated by another worker since it was issued, then the update operation will modify the row with the matching invoice number. If the message is a duplicate, however, the tstamp column will already hold a new value, which results in zero rows being updated (as no rows will satisfy the timestamp equality constraint). We can now process any number of duplicate updates while remaining idempotent. Of course, this technique would be a better approach for the simpler address change example that I provided above.
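The same check can be sketched outside SQL Server. One assumption to call out: a SQL Server timestamp (rowversion) column changes automatically whenever the row is modified, whereas SQLite has no such type, so this sketch emulates it with a hypothetical integer `Version` column that the UPDATE itself increments. The table and message shape are likewise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice ("
    " InvoiceNumber INTEGER PRIMARY KEY,"
    " Amount REAL NOT NULL,"
    " Version INTEGER NOT NULL)"  # stands in for the tstamp column
)
conn.execute("INSERT INTO Invoice VALUES (12345, 100.0, 1)")
conn.commit()


def update_amount(message):
    """Apply the update only if the row still has the version the sender saw.

    Returns the number of rows updated: 1 if applied, 0 if the row has changed.
    """
    with conn:
        cursor = conn.execute(
            "UPDATE Invoice SET Amount = :amount, Version = Version + 1 "
            "WHERE InvoiceNumber = :invoice_number AND Version = :version",
            message,
        )
    return cursor.rowcount


# The sender read the invoice at version 1 and asks for a new amount.
message = {"invoice_number": 12345, "amount": 150.0, "version": 1}

first = update_amount(message)   # 1: the versions match, so the row is updated
replay = update_amount(message)  # 0: the row is now at version 2, nothing matches

amount, version = conn.execute(
    "SELECT Amount, Version FROM Invoice WHERE InvoiceNumber = 12345"
).fetchone()                     # (150.0, 2)
```

A caller can inspect the returned row count to tell an applied update from a rejected duplicate, and decide whether a zero should be logged or simply ignored.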

Clearly there is much more that can be said in this space, but that is all I have time for in this blog post. Look for additional advice and commentary in future posts.