Sami Fayoumi

JavaScript Compatible Snowflake ID

2022-10-25

In my research into Snowflake ID, I encountered a surprising limitation of JavaScript's number type. All was well when generating Snowflake IDs in my Go code, but there was a problem decoding the IDs in the JS console. After some investigation, I discovered not only that the JS number type can represent integers exactly only up to 53 bits, but also that the bit shift operators only work on 32-bit ints!

Background

In 2010, Twitter was planning a migration from MySQL to CassandraDB to accommodate their rapid growth. Given the lack of auto-incrementing ids in CassandraDB and the volume of tweets, Twitter engineers needed a new id generation scheme. The requirements were that the ids be roughly sortable, be generatable at tens of thousands of ids per second, and fit in 64-bit signed ints [twitter.com].

The result was the Snowflake ID scheme, which generates 63-bit ids (leaving the sign bit unused so they fit in signed 64-bit ints) composed of a timestamp, a machine id, and a sequence number. Snowflake ID remains in use at Twitter, and variations are used by other large companies like Instagram, Discord, and Sony.

63 bits total: Timestamp (41 bits) | Machine ID (10 bits) | Sequence (12 bits)

The system works as expected in most modern backend runtimes. Twitter's engineers later learned that JavaScript supports exact integers of only up to 53 bits and could not be expected to parse the ids correctly:

"Before launch it came to our attention that some programming languages such as JavaScript cannot support numbers with >53bits. This can be easily examined by running a command similar to: (90071992547409921).toString() in your browsers console or by running the following JSON snippet through your JSON parser." [Snowflake ID discussion]

In Twitter's APIs, a string "id_str" property was added alongside the int "id" field for compatibility with JavaScript (Twitter now recommends using the string representation). It's clear that with a full 64-bit id, handling ids in JavaScript requires extra care. What if we weren't so limited by Twitter's hyperscale requirements?
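The precision loss is easy to reproduce without a browser. Since JSON numbers land in a JavaScript client as IEEE-754 doubles, round-tripping an id through float64 in Go simulates what that client would see; the first value below is the one from the quoted discussion.

```go
package main

import "fmt"

func main() {
	// The value from the quoted Snowflake ID discussion, above 2^53.
	id := int64(90071992547409921)

	// JavaScript stores all numbers as IEEE-754 doubles; converting to
	// float64 and back simulates a JS client parsing the id from JSON.
	fmt.Println(int64(float64(id)) == id) // false: precision lost above 2^53

	// The largest exactly representable integer, 2^53 - 1, survives intact.
	small := int64(1<<53 - 1)
	fmt.Println(int64(float64(small)) == small) // true
}
```

This is exactly why a 53-bit id scheme sidesteps the problem: every value it can produce survives the trip through a double unchanged.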

Reevaluating ID Requirements

Not every use case has the same requirements described above, and other companies have used adaptations of Snowflake. In my case, my requirements were as follows:

Backend

The application I'm building can get by with far fewer id generation servers and greatly reduced id throughput; our requirements are a fraction of Twitter's.

In preparation for a multi-stage commit process, auto-increment ids were to be phased out. The new multi-stage commit process would allow us to commit items to PostgreSQL in bulk. One alternative considered was UUID v4: generation is simple enough, but the ids do not sort naturally and index less efficiently in PostgreSQL. The best alternative to auto-increment ids was Snowflake ID.

The ids are roughly sortable, which is enough for our use case, and they index nicely in the database. We require rows to be sorted, and with a UUID primary key, any rows created within the same microsecond timestamp would not sort consistently (as happens with bulk inserts). The sequence counter in Snowflake ID takes care of any timestamp ambiguity.

Generating ids outside the database also allows multiple related entities to reference a predetermined id before the parent row is fully persisted.

JavaScript Clients

A full 63-bit Snowflake ID cannot be represented as a JavaScript number. Twitter's solution was not acceptable in our case because we wanted an id format that didn't introduce complexity to id handling on the front end. Using 64-bit ints requires BigInt or string hacks to sort ids predictably, and JavaScript clients would need custom JSON serialization and deserialization to handle and sort string ids. Some Twitter API consumers have been caught off guard by the munged ids, and warnings about the problem are common:

Many more instances can be found by searching "javascript id" in Twitter's Developer forum. This problem was also encountered by the grpc-web team [github.com/grpc/grpc-web/issues/1229].

I didn't want this to be a possible source of bugs in the future, so I decided to adapt Snowflake ID to fit 53 bits.

Let's see what Snowflake ID gives us:

Modifying Snowflake ID

Snowflake ID Breakdown

Machine ids at 10 bits afford us 1023 (2^10 - 1) id-generating servers. The sequence counter at 12 bits allows up to 4095 (2^12 - 1) ids per ms per server. To give a rough idea of throughput with N Snowflake ID servers:

N servers    IDs per ms
1            4,095
3            12,285
100          409,500
127          520,065
1023         4,189,185
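These figures are just the 12-bit sequence maximum multiplied out per server; a quick sanity check:

```go
package main

import "fmt"

func main() {
	// Each server can mint at most 2^12 - 1 = 4095 ids per millisecond,
	// so fleet throughput is simply servers × 4095.
	const idsPerMsPerServer = 1<<12 - 1
	for _, n := range []int{1, 3, 100, 127, 1023} {
		fmt.Printf("%d servers: %d ids/ms\n", n, n*idsPerMsPerServer)
	}
}
```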

I don't need anywhere near this many ids per millisecond, so I can reduce the number of bits used by the machine id and sequence. After the 41-bit timestamp, what can we fit in the 12 bits that remain under JavaScript's 53-bit limit?

53 bit Snowflake ID

The timestamp can't shrink, so let's keep it at 41 bits, good for 69 years of timestamps at ms resolution. Machine ids will be limited to 7 bits, for a total of 127 (2^7 - 1) id servers. The sequence will be limited to 5 bits, for up to 31 (2^5 - 1) ids per ms per server.

53 bits total: Timestamp (41 bits) | Machine ID (7 bits) | Sequence (5 bits)
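A minimal sketch of how the three fields might be packed and unpacked with this 41/7/5 split; the function and constant names are my own, not from any particular library:

```go
package main

import "fmt"

// Field widths for the 53-bit layout described above.
const (
	machineBits = 7
	seqBits     = 5
	maxMachine  = 1<<machineBits - 1 // 127
	maxSeq      = 1<<seqBits - 1     // 31
)

// pack composes a 53-bit id from a 41-bit millisecond timestamp,
// a 7-bit machine id, and a 5-bit sequence counter.
func pack(ts, machine, seq int64) int64 {
	return ts<<(machineBits+seqBits) | machine<<seqBits | seq
}

// unpack reverses pack, recovering the three fields from an id.
func unpack(id int64) (ts, machine, seq int64) {
	seq = id & maxSeq
	machine = (id >> seqBits) & maxMachine
	ts = id >> (machineBits + seqBits)
	return
}

func main() {
	id := pack(1666656000000, 42, 7)
	ts, machine, seq := unpack(id)
	fmt.Println(id, ts, machine, seq)
}
```

Because the timestamp occupies the high bits, ids sort by creation time, and the whole value stays below 2^53, so it round-trips through a JavaScript number without loss.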

Here's an idea of id throughput up to the maximum of 127 id servers.

N servers    IDs per ms
1            31
3            93
100          3,100
127          3,937

With this new scheme, we can generate 31 ids per millisecond per server (31,000 ids/sec). In a best-case scenario with all 127 id servers running, we can produce 3,937 ids per ms.

Considerations

All implementations of Snowflake ID I've reviewed use busy-waiting to push sequence overflows into the next ms. Sequence overflow, an unlikely scenario in standard Snowflake ID, becomes a real possibility with the 53-bit adaptation. Alternatives to busy-waiting, such as sleeping for the remainder of the millisecond, may be explored in a sequel to this post.
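As a rough sketch of the busy-wait strategy (single-goroutine, with names of my own choosing, and ignoring clock regression and epoch offsets that a production generator would handle):

```go
package main

import (
	"fmt"
	"time"
)

const maxSeq = 1<<5 - 1 // 31, the 5-bit sequence maximum

var (
	lastMs int64 // millisecond of the most recent id
	seq    int64 // sequence counter within lastMs
)

// nextID returns the next 53-bit id for the given machine. When the 5-bit
// sequence overflows within one millisecond, it busy-waits until the clock
// advances, pushing the overflow into the next ms.
func nextID(machine int64) int64 {
	now := time.Now().UnixMilli()
	if now == lastMs {
		seq++
		if seq > maxSeq {
			// Sequence exhausted for this millisecond: spin until
			// the wall clock moves forward, then reset the counter.
			for now <= lastMs {
				now = time.Now().UnixMilli()
			}
			seq = 0
		}
	} else {
		seq = 0
	}
	lastMs = now
	return now<<12 | machine<<5 | seq
}

func main() {
	fmt.Println(nextID(1))
}
```

The spin loop is what burns CPU under sustained overflow, which motivates the capacity planning below.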

Under prolonged load, busy-waiting may cause instability from excessive CPU usage. To prevent overloading any one id server, aim for uniform distribution of requests across your id servers. Scale out id servers based on projected load and potential surges in id requests.

Conclusion

For applications in need of an id scheme similar to Snowflake, it's worth re-evaluating requirements to see if a JS-number-compatible scheme is an option. 53-bit Snowflake IDs come with so little added complexity that I would consider them for any roughly sortable data created at non-trivial scale. JavaScript's quirks are a common source of frustration for me, but this time they forced me to consider what's most important, and reminded me that sometimes it's easier to just give in to the Web.

Links and References