SQS vs. Kinesis in Batching IoT Data
Lately, we’ve been working on several IoT projects. Our bread and butter is on the cloud side, so ingesting and managing IoT data in AWS is something we do have opinions about.
AWS offers several services to ingest IoT data. One of the most popular options is AWS IoT Core. This service provides a secure and scalable platform for connecting IoT devices and ingesting data from them. AWS IoT Core supports a variety of protocols, including MQTT, HTTP, and WebSockets, which makes it easy to connect to a wide range of devices.
When ingesting IoT data into AWS using IoT Core, there are several architectural options available. Serverless options provide an easy-to-manage, easy-to-scale setup, but as data volumes grow, it also becomes important to design for cost.
Using rules and Lambdas is the most straightforward solution for processing the data. IoT Core receives messages from the IoT devices and uses the rule engine to send them directly to an AWS Lambda for processing, e.g. data validation. Simple and effective, but as the invocations are not batched, it quickly becomes expensive when the number of messages increases.
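To make the unbatched pattern concrete, here is a minimal sketch of such a Lambda handler. It assumes the IoT Core rule forwards the raw device payload (e.g. SELECT * FROM 'devices/+/telemetry'); the field names `device_id` and `temperature` are hypothetical and depend on your device schema.

```python
import json

def handler(event, context):
    """Invoked once per IoT message by an IoT Core rule -- no batching,
    so one message equals one Lambda invocation (and one charge)."""
    # Hypothetical payload fields -- adjust to your device schema.
    device_id = event.get("device_id")
    temperature = event.get("temperature")

    # A simple validation step, as mentioned above.
    if device_id is None or temperature is None:
        raise ValueError(f"invalid message: {json.dumps(event)}")

    print(f"device={device_id} temperature={temperature}")
    return {"status": "ok"}
```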
If there is no need for extremely fast processing, batching can help. Amazon offers multiple services to create batches and enable each Lambda invocation to process multiple messages at once. Two management-free options are Amazon SQS (and SQS FIFO) and Kinesis Data Streams.
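With SQS as the event source, a single invocation receives up to a configurable batch of messages. A sketch of a batch handler, assuming the event source mapping has ReportBatchItemFailures enabled so only failed messages are redelivered:

```python
import json

def handler(event, context):
    """Process a batch of SQS messages in one Lambda invocation.

    With partial batch responses (ReportBatchItemFailures) enabled,
    returning the IDs of failed messages makes SQS redeliver only
    those instead of the whole batch."""
    failures = []
    for record in event["Records"]:
        try:
            body = json.loads(record["body"])
            # ... validate / transform the message here ...
            print(f"processed message {record['messageId']}: {body}")
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```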
We know it always depends. Therefore, we first lay down some design principles that guide our thinking:
a) remember that high scalability equals a highly parallelized system with fewer guarantees with regard to ordering and deduplication
b) in a highly parallelized system, choosing an appropriate partitioning strategy is key
c) keep in mind that throttling can (will) occur and ensure that all parts of the pipeline can scale with your traffic
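Principle b) is worth illustrating. In Kinesis, the partition key decides which shard a record lands on; using the device ID as the key preserves per-device order while spreading devices across shards. A rough sketch of how the mapping works (Kinesis hashes the key with MD5 onto shard hash-key ranges; the shard count here is an assumption):

```python
import hashlib

NUM_SHARDS = 4  # assumed provisioned shard count

def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Approximate Kinesis partitioning: MD5-hash the key and map it
    onto one of num_shards equal hash-key ranges."""
    digest = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return digest * num_shards // 2**128

# All records from one device hash to the same shard, so per-device
# order holds; a hot device, however, can overload its single shard.
```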
Simple is better. If there is no real reason to go with a more complex option, choose the simplest one. It is important to understand the true requirements of your use case - otherwise you'll end up paying for bells and whistles you don't actually need.
Both services, SQS and Kinesis, have their place. SQS is simple and scales nearly "infinitely". Messages are not ordered except when using SQS FIFO. Kinesis enables longer data retention and maintains order, but this will cost you in development time and usage fees.
As for a more detailed feature comparison of the services, the cloudonaut.io folks have made great comparison tables:
https://cloudonaut.io/versus/messaging/sqs-standard-vs-kinesis-data-streams/
https://cloudonaut.io/versus/messaging/sqs-standard-vs-sqs-fifo/
Our experience suggests that with Kinesis you'll encounter more pain on the development side. The On-Demand version of Kinesis will be a lot more expensive than SQS, so you'd probably want to use the provisioned version. It is true that provisioned Kinesis with a short data retention period might cost less in usage fees than SQS, but you will have to be prepared to invest time playing with the shards and getting frustrated with the insufficient documentation.
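"Playing with the shards" starts with sizing them. A provisioned shard ingests up to 1,000 records/s or 1 MB/s, whichever limit is hit first, so a back-of-the-envelope estimate looks like this (the traffic numbers in the example are made up):

```python
import math

def shards_needed(msgs_per_sec: float, avg_msg_kb: float) -> int:
    """Rough shard count for provisioned Kinesis: each shard takes
    up to 1,000 records/s or 1 MB/s of writes, whichever binds first."""
    by_records = msgs_per_sec / 1000
    by_bytes = msgs_per_sec * avg_msg_kb / 1024  # MB/s
    return max(1, math.ceil(max(by_records, by_bytes)))

# e.g. 5,000 msg/s at 2 KB each: the byte limit binds first.
print(shards_needed(5000, 2))  # -> 10
```

Note that this is only the steady-state floor - you still need headroom for traffic spikes, and resharding is an operation you have to manage yourself.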
Our conclusion is that SQS is simpler and, thus, our baseline option. You should only look at Kinesis if you have specific requirements that justify the amount of hassle and (likely) greater total cost.
As we’ve already made SQS our baseline option, it makes sense to focus on the things that would drive you towards using Kinesis. Based on our experience, those are:
You need a longer data retention period (SQS up to 14 days, Kinesis up to 365 days)
You need to ensure that the order of messages is correct and SQS FIFO's throughput is not enough (3,000 msg/s with batching)
You need a mechanism to halt processing for a given partition in error scenarios, as this is something that SQS cannot provide
You want to optimize for cost, have the resources to manage and scale Kinesis, and understand the (real) scaling demands well.
Happy batching!