JavaScript Compatible Snowflake ID
In my research into Snowflake ID, I encountered a surprising limitation of JavaScript's number type. Generating Snowflake IDs in my Go code worked fine, but decoding those IDs in the JS console did not. After some investigation, I discovered that a JS number can only represent integers exactly up to 53 bits, and also that the bitwise shift operators only work on 32-bit ints!
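Both limits are easy to reproduce outside the browser. JS numbers are IEEE 754 double-precision floats, the same format as Go's float64, and JS bitwise operators convert their operands to 32-bit signed integers before shifting. The Go sketch below models that behavior. jsShiftLeft is a hypothetical helper that mimics the JS << operator for integer inputs only.

```go
package main

import "fmt"

// maxSafeInteger mirrors JavaScript's Number.MAX_SAFE_INTEGER (2^53 - 1).
const maxSafeInteger = 1<<53 - 1

// roundTripsAsJSNumber reports whether n survives conversion to a float64
// (the representation behind every JS number) and back without change.
func roundTripsAsJSNumber(n int64) bool {
	return int64(float64(n)) == n
}

// jsShiftLeft is a hypothetical model of JS `a << b` for integer inputs:
// the left operand is truncated to a signed 32-bit int and the shift
// count is masked to its low 5 bits.
func jsShiftLeft(a, b int64) int32 {
	return int32(a) << (uint32(b) & 31)
}

func main() {
	fmt.Println(roundTripsAsJSNumber(maxSafeInteger))     // true
	fmt.Println(roundTripsAsJSNumber(maxSafeInteger + 2)) // false: 2^53 + 1 rounds to 2^53
	fmt.Println(jsShiftLeft(1, 31))                       // -2147483648, like 1 << 31 in JS
	fmt.Println(jsShiftLeft(1, 32))                       // 1, like 1 << 32 in JS
}
```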
Background
In 2010, Twitter was planning a migration from MySQL to CassandraDB to accommodate its rapid growth. Given the lack of auto-incrementing ids in CassandraDB and the volume of tweets, Twitter engineers needed a new id generation scheme. The requirements were that ids be roughly sortable, be generated at tens of thousands of ids per second, and fit in 64-bit signed ints [twitter.com].
The result was the Snowflake ID scheme, which generates 63-bit ids (fitting in signed 64-bit ints) composed of a timestamp, a machine number, and a sequence number. Snowflake ID remains in use at Twitter, and variations are used by other large companies like Instagram, Discord, and Sony.
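For reference, here's a minimal Go sketch of the standard 63-bit layout: a 41-bit millisecond timestamp, a 10-bit machine id, and a 12-bit sequence, packed from the most significant bits down. It assumes the timestamp is measured in milliseconds from a custom epoch chosen by the operator; the function names and example values are hypothetical, not taken from Twitter's implementation.

```go
package main

import "fmt"

const (
	timestampBits = 41
	machineBits   = 10
	sequenceBits  = 12

	machineShift   = sequenceBits               // machine id sits just above the sequence
	timestampShift = sequenceBits + machineBits // timestamp occupies the top 41 bits

	maxMachine  = 1<<machineBits - 1  // 1023
	maxSequence = 1<<sequenceBits - 1 // 4095
)

// compose63 packs the three fields into a 63-bit id; the sign bit stays 0.
// Callers must keep machine <= maxMachine and seq <= maxSequence.
func compose63(ms, machine, seq int64) int64 {
	return ms<<timestampShift | machine<<machineShift | seq
}

// decompose63 reverses compose63.
func decompose63(id int64) (ms, machine, seq int64) {
	return id >> timestampShift, (id >> machineShift) & maxMachine, id & maxSequence
}

func main() {
	// Hypothetical values: 1,000,000 ms after the custom epoch, machine 42, sequence 7.
	id := compose63(1_000_000, 42, 7)
	ms, machine, seq := decompose63(id)
	fmt.Println(id, ms, machine, seq)
}
```

Because the timestamp occupies the highest bits, comparing two ids as integers compares their timestamps first, which is what makes the ids roughly sortable.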
The scheme works as expected in most modern backend runtimes, but engineers later learned that JavaScript numbers can only represent integers exactly up to 53 bits, so JavaScript clients could not be expected to parse the ids correctly:
"Before launch it came to our attention that some programming languages such as JavaScript cannot support numbers with >53bits. This can be easily examined by running a command similar to: (90071992547409921).toString() in your browsers console or by running the following JSON snippet through your JSON parser." [Snowflake ID discussion]
In Twitter's APIs, a string "id_str" property was added alongside the integer "id" field for compatibility with JavaScript (Twitter now recommends using the string representation).
It's clear that with a full 64-bit id, handling ids in JavaScript requires extra care.
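The id/id_str split is easy to reproduce. Go's encoding/json, like JavaScript's JSON.parse, decodes a JSON number into a float64 when the target is a generic interface value, so a large numeric id gets rounded while its string twin parses exactly. The payload and id value below are made up for illustration, and decodeIDs is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// A hypothetical tweet-like payload carrying both the numeric and string id.
const payload = `{"id": 1234567890123456789, "id_str": "1234567890123456789"}`

// decodeIDs returns "id" as a generic decode sees it (a float64, just like
// JSON.parse) and "id_str" parsed exactly into an int64.
func decodeIDs(payload string) (float64, int64, error) {
	var generic map[string]any
	if err := json.Unmarshal([]byte(payload), &generic); err != nil {
		return 0, 0, err
	}
	f, ok := generic["id"].(float64)
	if !ok {
		return 0, 0, fmt.Errorf("id is not a JSON number")
	}
	s, ok := generic["id_str"].(string)
	if !ok {
		return 0, 0, fmt.Errorf("id_str is not a JSON string")
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, 0, err
	}
	return f, n, nil
}

func main() {
	f, exact, err := decodeIDs(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(exact)             // 1234567890123456789
	fmt.Println(int64(f) == exact) // false: the float64 copy was rounded
}
```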
What if we weren't so limited by Twitter's hyperscale requirements?
Reevaluating ID Requirements
Not every use case has the same requirements described above, and other companies have used adaptations of Snowflake. In my case, my requirements were as follows:
Backend
The application I'm building can get by with fewer id generation servers and much lower id throughput, so its requirements are a fraction of Twitter's.
In preparation for a multi-stage commit process that lets us commit items to PostgreSQL in bulk, auto-increment ids were to be phased out. One alternative considered was UUIDv4. UUIDs are simple enough to generate, but they do not sort naturally and index less efficiently in PostgreSQL. The best alternative to auto-increment ids was Snowflake ID.
Snowflake ids are roughly sortable, which is enough for our use case, and they index nicely in the database. We require rows to be sorted; with a random UUID primary key, ordering would fall back to a timestamp column, and rows created within the same microsecond (which happens with bulk inserts) would not sort consistently. The sequence counter in Snowflake ID resolves that timestamp ambiguity.
Because ids are generated before insertion, related entities can also reference an entity's predetermined id before that entity is persisted.
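As a sketch of the pre-assignment pattern: every row in a batch gets its id in application code, so child rows can carry their parent's id in the same bulk insert. The Order and LineItem types are hypothetical, and idSource is a stand-in counter where a real service would call its Snowflake generator.

```go
package main

import "fmt"

// idSource is a hypothetical stand-in for a Snowflake generator:
// all this example needs is unique, increasing ids.
type idSource struct{ next int64 }

func (s *idSource) NextID() int64 {
	s.next++
	return s.next
}

type LineItem struct {
	ID      int64
	OrderID int64 // foreign key, known before the order is persisted
	SKU     string
}

type Order struct {
	ID    int64
	Items []LineItem
}

// buildOrder assigns ids to the order and all of its line items up front,
// so the whole graph can be written in a single bulk insert afterwards.
func buildOrder(ids *idSource, skus []string) Order {
	order := Order{ID: ids.NextID()}
	for _, sku := range skus {
		order.Items = append(order.Items, LineItem{
			ID:      ids.NextID(),
			OrderID: order.ID,
			SKU:     sku,
		})
	}
	return order
}

func main() {
	ids := &idSource{next: 1000}
	order := buildOrder(ids, []string{"a", "b"})
	fmt.Println(order.ID, order.Items[0].OrderID, order.Items[1].ID) // 1001 1001 1003
}
```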
JavaScript Clients
A full 63-bit Snowflake ID cannot be represented exactly as a JavaScript number. Twitter's solution was not acceptable for our use case, since we wanted an id format that didn't add complexity to id handling on the front end. Using 64-bit ints requires BigInt or string hacks to sort ids predictably, and JavaScript clients would need custom JSON serialization and deserialization to handle and sort string ints. Some Twitter API consumers have been caught off guard by munged ids, and warnings about the problem are common:
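The string hacks exist because decimal strings sort lexicographically, not numerically: "9" sorts after "10". A minimal Go sketch of a numeric-aware comparison, assuming ids are non-negative decimal strings with no leading zeros (compareIDStrings is a hypothetical helper):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// compareIDStrings orders non-negative decimal id strings numerically,
// assuming no leading zeros: a shorter string is always the smaller number,
// and equal-length strings compare the same way as their numeric values.
func compareIDStrings(a, b string) int {
	if len(a) != len(b) {
		if len(a) < len(b) {
			return -1
		}
		return 1
	}
	return strings.Compare(a, b)
}

func main() {
	ids := []string{"1234567890123456789", "999999999999999999", "10"}

	lexical := append([]string(nil), ids...)
	sort.Strings(lexical)
	fmt.Println(lexical) // [10 1234567890123456789 999999999999999999]

	numeric := append([]string(nil), ids...)
	sort.Slice(numeric, func(i, j int) bool {
		return compareIDStrings(numeric[i], numeric[j]) < 0
	})
	fmt.Println(numeric) // [10 999999999999999999 1234567890123456789]
}
```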
- id and id_str are different:
- ..."JS has a bad habit of wrecking twitter IDs."
- ..." are you definitely using the string version of the media_id"...
- another case of munged ids
- Cannot reply to a tweet
- Error in the documentation about since_id and max_id
- Mismatch id and id_str
- 18 digit Twitter user IDs behaving differently
Many more instances can be found by searching "javascript id" in Twitter's Developer forum. The grpc-web team also encountered this problem (https://github.com/grpc/grpc-web/issues/1229).
I didn't want this to be a possible source of bugs in the future, so I decided to adapt Snowflake ID to fit 53 bits.
Let's see what Snowflake ID gives us:
Modifying Snowflake ID
Snowflake ID Breakdown
- Machine ids within 10 bits afford us up to 1023 (2^10 - 1) id generating servers.
- The sequence counter within 12 bits allows up to 4095 (2^12 - 1) ids per ms per server.
To give a rough idea of maximum throughput with N Snowflake ID servers:
N servers | ID per ms |
---|---|
1 | 4,095 |
3 | 12,285 |
100 | 409,500 |
127 | 520,065 |
1023 | 4,189,185 |
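The table is simple multiplication. This Go sketch reproduces it, using 2^12 - 1 ids per server per ms as the table does; idsPerMs is a hypothetical helper name.

```go
package main

import "fmt"

// idsPerMs mirrors the table's arithmetic: N servers, each issuing
// 2^sequenceBits - 1 ids per millisecond.
func idsPerMs(servers, sequenceBits int) int {
	return servers * (1<<sequenceBits - 1)
}

func main() {
	for _, n := range []int{1, 3, 100, 127, 1023} {
		fmt.Printf("%d | %d |\n", n, idsPerMs(n, 12))
	}
}
```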
I don't need anywhere near this many IDs per millisecond, so I can reduce the number of bits used by the machine id and the sequence. Keeping the 41-bit timestamp leaves 12 of the 53 bits. What can we fit in them?
53 bit Snowflake ID
The timestamp can't change, so let's keep it at 41 bits, for 69 years of timestamps at ms resolution.
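The 69-year figure follows from 2^41 milliseconds. A quick Go check, assuming a 365.25-day year:

```go
package main

import "fmt"

// timestampYears returns how long a timestamp field of the given width
// lasts at millisecond resolution, using a 365.25-day year.
func timestampYears(bits uint) float64 {
	const msPerYear = 1000 * 60 * 60 * 24 * 365.25
	return float64(uint64(1)<<bits) / msPerYear
}

func main() {
	fmt.Printf("%.1f years\n", timestampYears(41)) // 69.7 years
}
```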
- Machine ids will be limited to 7 bits, for a total of 127 (2^7 - 1) id servers.
- The sequence will be limited to 5 bits, for up to 31 (2^5 - 1) ids per ms per server.
Together with the 41-bit timestamp, that adds up to 41 + 7 + 5 = 53 bits.
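A minimal Go sketch of the 41/7/5-bit layout, showing that the largest possible id is exactly 2^53 - 1 (JavaScript's Number.MAX_SAFE_INTEGER) and therefore survives conversion to a float64. The names are hypothetical, and the timestamp is assumed to be milliseconds since a custom epoch.

```go
package main

import "fmt"

const (
	timestampBits = 41
	machineBits   = 7
	sequenceBits  = 5

	machineShift   = sequenceBits               // machine id sits just above the sequence
	timestampShift = sequenceBits + machineBits // 12: the timestamp fills bits 12 through 52

	maxTimestamp = 1<<timestampBits - 1
	maxMachine   = 1<<machineBits - 1  // 127
	maxSequence  = 1<<sequenceBits - 1 // 31

	// maxSafeInteger mirrors JavaScript's Number.MAX_SAFE_INTEGER (2^53 - 1).
	maxSafeInteger = 1<<53 - 1
)

// compose packs a 53-bit id, rejecting fields that don't fit their bit widths.
func compose(ms, machine, seq int64) (int64, error) {
	if ms < 0 || ms > maxTimestamp || machine < 0 || machine > maxMachine || seq < 0 || seq > maxSequence {
		return 0, fmt.Errorf("field out of range: ms=%d machine=%d seq=%d", ms, machine, seq)
	}
	return ms<<timestampShift | machine<<machineShift | seq, nil
}

// decompose splits a 53-bit id back into its fields.
func decompose(id int64) (ms, machine, seq int64) {
	return id >> timestampShift, (id >> machineShift) & maxMachine, id & maxSequence
}

func main() {
	largest, err := compose(maxTimestamp, maxMachine, maxSequence)
	if err != nil {
		panic(err)
	}
	fmt.Println(largest == maxSafeInteger)          // true
	fmt.Println(int64(float64(largest)) == largest) // true: safe as a JS number
}
```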
Here's an idea of id throughput up to the maximum of 127 id servers.
N servers | ID per ms |
---|---|
1 | 31 |
3 | 93 |
100 | 3,100 |
127 | 3,937 |
With this new scheme, we can generate 31 ids per millisecond per server (31,000 ids/sec). In a best-case scenario with 127 id servers running, we can produce 3,937 ids per ms.
Considerations
All implementations of Snowflake ID I've reviewed use busy-waiting to push sequence overflows into the next ms. Sequence overflow, unlikely in the original Snowflake ID, becomes a real possibility with the 53-bit adaptation. Alternatives to busy-waiting, such as sleeping for the remaining microseconds, may be explored in a sequel to this post.
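Here's a minimal Go sketch of a 53-bit generator that busy-waits on sequence overflow, as described above. It assumes a hypothetical custom epoch of 2024-01-01 UTC and, as an extra assumption not covered in this post, simply keeps using the last millisecond if the wall clock moves backwards. Because the sequence restarts at 0 each millisecond, this sketch can actually hand out 32 ids (sequence 0 through 31) per millisecond before it spins.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	machineBits    = 7
	sequenceBits   = 5
	machineShift   = sequenceBits
	timestampShift = sequenceBits + machineBits
	maxMachine     = 1<<machineBits - 1
	sequenceMask   = 1<<sequenceBits - 1
)

// customEpochMs is a hypothetical custom epoch (2024-01-01 UTC); pick your own.
var customEpochMs = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC).UnixMilli()

func nowMs() int64 { return time.Now().UnixMilli() - customEpochMs }

type Generator struct {
	mu      sync.Mutex
	machine int64
	lastMs  int64
	seq     int64
}

func NewGenerator(machine int64) (*Generator, error) {
	if machine < 0 || machine > maxMachine {
		return nil, fmt.Errorf("machine id %d out of range 0..%d", machine, maxMachine)
	}
	return &Generator{machine: machine, lastMs: -1}, nil
}

// Next returns a new 53-bit id, busy-waiting into the next millisecond
// when the 5-bit sequence overflows.
func (g *Generator) Next() int64 {
	g.mu.Lock()
	defer g.mu.Unlock()

	now := nowMs()
	if now < g.lastMs {
		// Clock moved backwards (assumption): keep issuing ids on the last millisecond.
		now = g.lastMs
	}
	if now == g.lastMs {
		g.seq = (g.seq + 1) & sequenceMask
		if g.seq == 0 {
			// Sequence exhausted for this millisecond: spin until the clock advances.
			for now <= g.lastMs {
				now = nowMs()
			}
		}
	} else {
		g.seq = 0
	}
	g.lastMs = now
	return now<<timestampShift | g.machine<<machineShift | g.seq
}

func main() {
	g, err := NewGenerator(3) // hypothetical machine id
	if err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		fmt.Println(g.Next())
	}
}
```

The mutex serializes callers, so while one goroutine spins, every other caller on that server waits too, which is why sustained overflow turns into CPU burn.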
Under prolonged load, busy-waiting may cause instability due to excessive CPU usage. To avoid overloading any one id server, aim for a uniform distribution of requests across your id servers, and scale out id servers based on projected load and potential surges in id requests.
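Sizing the fleet is a ceiling division of the projected peak by the per-server rate. A minimal Go sketch using the 31 ids/ms and 127-server figures from the 53-bit tables; serversNeeded is hypothetical, and any surge margin is assumed to be folded into the peak you pass in.

```go
package main

import "fmt"

const (
	idsPerServerPerMs = 31  // per-server figure from the 53-bit table
	maxIDServers      = 127 // server limit used in the 53-bit scheme
)

// serversNeeded returns how many id servers a projected peak (ids per ms)
// requires, or an error if the 53-bit scheme cannot absorb it.
func serversNeeded(peakIDsPerMs int) (int, error) {
	n := (peakIDsPerMs + idsPerServerPerMs - 1) / idsPerServerPerMs // ceiling division
	if n < 1 {
		n = 1
	}
	if n > maxIDServers {
		return 0, fmt.Errorf("peak of %d ids/ms exceeds %d servers x %d ids/ms",
			peakIDsPerMs, maxIDServers, idsPerServerPerMs)
	}
	return n, nil
}

func main() {
	n, err := serversNeeded(500) // hypothetical projected peak
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 17
}
```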
Conclusion
For applications that need an id scheme similar to Snowflake, it's worth re-evaluating requirements to see whether a JS-number-compatible scheme is an option. 53-bit Snowflake IDs add so little complexity that I would consider them for any roughly sortable data created at non-trivial scale. JavaScript's quirks are a common source of frustration for me, but this time they forced me to consider what's most important, and reminded me that sometimes it's easier to just give in to the Web.