Distinct ID Limits
In order to maintain fast queries and catch implementation mistakes, we set a limit on the number of events sent to a particular distinct_id
in a given time window. In this document, we explain how this limit works and what to do when you hit it.
What is a hot shard?
There are cases when an incorrect implementation results in a disproportionately high number of events sent to Mixpanel for the same distinct_id
. This leads to an imbalance when storing events across distinct_ids, where one distinct_id's events grows larger than the rest, impacting storage and query systems which in-turn results in high query latencies for the end user.
Since we distribute events across shards, this imbalance is called a hot shard.
How does hot shard detection work?
The hot shard detection step runs in the ingestion pipeline. A counter of events is maintained for each distinct_id
and event_date
combination. The counter is best-effort as a result of the underlying systems used to maintain such a large keyspace.
Once a pre-defined threshold is crossed(currently set to 100K events), the distinct_id
is marked as contributing to a hot shard and all subsequent events for this distinct_id
and event_date
are updated to even the load across shards. Historical events prior to the hotshard detection for the same distinct_id
are not updated.
What happens when we detect a hot shard?
Once a given entry crosses the threshold, all subsequent matching events (same distinct_id
and event_date
) will have the following transformations applied to them:
event
will be changed to$hotshard_events
. The original event name will be preserved under a property calledmp_original_event_name
(display name isHotshard Original Event Name
). Changing the name removes the bad events from being selected for analysis yet remain accessible for debugging.distinct_id
is changed to""
1. The original value will be preserved under a property calledmp_original_distinct_id
(display name isHotshard Original Distinct ID
). Removing the distinct_id allows Mixpanel backend to distribute these events evenly across shards ensuring that performance is not adversely affected while keeping the data accessible for debugging.
Original Event -
{
"event": "Signed up",
"properties": {
"time": 1618716477000,
"distinct_id": "hotshard@mixpanel.com",
"$insert_id": "36a92782-cd7d-41a0-93af-8c23ec6c4191",
}
}
Updated Event -
{
"event": "$hotshard_events",
"properties": {
"time": 1618716477000,
"distinct_id": "",
"mp_original_event_name": "Signed up",
"mp_original_distinct_id": "hotshard@mixpanel.com",
"$insert_id": "36a92782-cd7d-41a0-93af-8c23ec6c4191",
}
}
These events can be queried from the dashboard just like any other events. A weekly report is sent to project owners if a new hot shard was detected and remediated in the past 7 days.
Recovering from a hot shard
Recovery is a multi-step process -
- Break down
$hotshard_events
bymp_original_distinct_id
andmp_original_event_name
to spot what’s causing the issue. - Use the above to locate the code that is the root cause and update it to stop the ongoing issue.
- This should fix it going forward. To fix historical data, the recommendation is to export, transform, and re-import the data.
Footnotes
-
Due to a side-effect on how events are serialized, some remediated entries were initially saved with a numeric distinct_id (instead of ""). This value can safely be ignored. ↩