This past week we discovered an issue on HSReplay.net that resulted in some cards showing misleading statistics due to our systems incorrectly assuming replaced cards to be part of a player's initial deck. We fixed the underlying issue and filtered out the affected statistics.
On May 31st we became aware of an issue in our Mulligan Guides containing the card Arch-Villain Rafaam on HSReplay.net. We immediately added an annotation in our Mulligan Guides directly next to the affected card to inform our users we were looking into a potential discrepancy. We started an investigation and could confirm an issue pattern that directly affected Rafaam and three other cards. We deployed a fix to our replay processing environment on June 2nd to correct the underlying issue and made a change to HSReplay.net on June 3rd to ensure we were only showing data from after the processing fix was deployed.
Read on for a more comprehensive analysis.
What was affected?
This issue affected our global and personal Mulligan Guides on HSReplay.net for the rows belonging to this card:
- Arch-Villain Rafaam
The following cards were also affected, but none was played in a deck eligible for a Mulligan Guide at the time of writing:
- Prince Liam
- Lillian Voss - Wild
- Explore Un'Goro - Wild
- Renounce Darkness - Wild
- Elise Starseeker (via Golden Monkey) - Wild
The issue manifested itself primarily through incorrect values in any of the columns belonging to the affected cards. It was caused by us incorrectly inferring transformed cards as part of a player's initial deck, which is a player's deck before the start of the game.
For instance, if a user played Arch-Villain Rafaam and drew a legendary card we incorrectly inferred that legendary card to have been part of the user's deck in their collection. At the end of the game we would then treat the deck as a different deck, thus effectively excluding the game's data from the Mulligan Guide. This could lead to skewed data by exclusion, especially if a card was played in a certain type of situation like Arch-Villain Rafaam in a losing game.
To a much lesser extent it was also possible for these cards to skew the rest of the Mulligan Guide they appeared in, as games in a certain type of situation would occasionally not be included. We compared data from Arch-Villain Rafaam's top decks from the day before the fix to the day after the fix and found that he was significantly overvalued in our Mulligan Guides.
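The skew by exclusion described above can be illustrated with a small sketch. The numbers are made up purely for illustration and are not taken from our data. The sketch also assumes, as a simplification, that every game in which Rafaam was played got attributed to a different deck and therefore dropped out of the guide.

```python
def winrate(games):
    """Fraction of games won."""
    return sum(1 for g in games if g["won"]) / len(games)


# Hypothetical sample: Rafaam is mostly played in games that are already going badly.
games = (
    [{"rafaam_played": True, "won": True}] * 10
    + [{"rafaam_played": True, "won": False}] * 30
    + [{"rafaam_played": False, "won": True}] * 40
    + [{"rafaam_played": False, "won": False}] * 20
)

# What the guide should report: all 100 games count.
true_winrate = winrate(games)  # 0.5

# What the guide reported: games where Rafaam was played are missing.
observed_winrate = winrate([g for g in games if not g["rafaam_played"]])  # about 0.667
```

Because the excluded games are mostly losses, the remaining sample makes the card look better than it is.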
Finally, the overall statistics for cards that were randomly created by the above cards could have been slightly skewed on the Cards page. As the random pool of transformed cards is very large, it is very unlikely for any single card to be randomly created a significant number of times, and we were unable to detect any skew in even the least played legendary cards and spells.
Where did this issue occur?
As part of assembling the statistics on our site we perform the processing of game replays. A key step of processing replays is extracting both players' initial deck lists, which are their decks just before the game starts, so after Whizbang the Wonderful and Zayle, Shadow Cloak but before Start of Game keywords. We use this together with data from the game itself to power our Mulligan Guides on HSReplay.net.
When extracting a list of played cards from the game it is important to differentiate cards that were present at the beginning of the game from cards that were created later. Due to the internals of Hearthstone we luckily know exactly when a certain card was added to the deck and when it is drawn or shuffled back into the deck. This allows us to do things like reconstructing which cards an opponent held at a certain point in the game or which cards were replaced during Mulligan, which are invaluable features for players reviewing their games through a replay.
Cards that were created during a game have always been excluded from initial decks by looking at when they were created.
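As a minimal sketch of this rule, assume each card entity in a replay records the turn on which it was created. The `Entity` type and its `created_turn` field are hypothetical names chosen for illustration, not the actual HSReplay.net data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    card_id: str
    created_turn: int  # hypothetical field: 0 means present before the game started


def infer_initial_deck(entities: List[Entity]) -> List[str]:
    """Keep only cards that already existed at the start of the game."""
    return sorted(e.card_id for e in entities if e.created_turn == 0)


entities = [
    Entity("Omega Devastator", created_turn=0),
    Entity("Baleful Banker", created_turn=0),
    # A copy shuffled into the deck mid-game is a new entity with a later creation turn.
    Entity("Omega Devastator", created_turn=5),
]
initial_deck = infer_initial_deck(entities)
```

Only one Omega Devastator ends up in the inferred initial deck, because the second entity was created after the game began.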
How could this issue occur?
In the past there were very few effects that truly replaced cards in decks or in hand. Cards like Azalina Soulthief work by actually moving your current hand out of the game and granting you new cards. This is useful as we can clearly detect these cards and never assume Crowd Roaster was in a Bomb Warrior's initial deck just because they played Azalina against a Dragon Warrior. On the other hand, if you've used Baleful Banker to shuffle a copy of your Omega Devastator into your deck and you draw one we know exactly whether you drew the copy or a second one from your initial deck.
There is one internal effect that breaks this: When your Unidentified Contract becomes identified, your Swift Messenger swaps its Attack and Health in your hand or Chameleos transforms, they actually become different cards. These transformations are completely hidden from the opponent. In most of these cases it is trivial to infer the original: The "identified" items will always have been an unidentified item in your deck list, the Worgens will always have been their base version, and for cards like Shifter Zerus or Chameleos we actually see the card morph when it is played. We've had logic in place for a long time to handle these well-known transformations.
Unfortunately, this does not apply to hidden transformations that are completely random. We have no way of telling which card a random legendary from Arch-Villain Rafaam was before it got transformed. In a sense, these transformations are destructive as we lose the ability to infer the original cards as they appeared in the initial deck.
The Root Cause
Our existing processing infrastructure tries really hard to infer the initial card for the cases above and not forget about them, even if the card is transformed again. This is important for cases like a Lucrative Contract transforming into Power Word: Shield through Lillian Voss. This caused the following behavior:
If we saw a player draw Millhouse Manastorm after playing Arch-Villain Rafaam and we knew the card was in the player's deck since the start of the game, we assumed they entered the game with Millhouse in their deck, even if it was transformed from another card by Arch-Villain Rafaam. That is obviously not correct and our systems failed to attach the data from these games to the correct deck.
How was this issue fixed?
We updated our replay processing infrastructure to ensure that cards from the initial deck do not have the "Created by [a card]" note set. For example, when you play a legendary transformed by Arch-Villain Rafaam the note "Created by Arch-Villain Rafaam" is shown even though the card has actually been in your deck from the start of the game. If a card has this note set and we have never seen it before during the game we exclude it from the initial deck.
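The rule can be sketched roughly as follows. This is an illustration under stated assumptions rather than our actual processing code: the `DeckCard` type and its fields are hypothetical, and `seen_before` stands in for "we have seen this card earlier during the game".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeckCard:
    card_id: str
    in_deck_since_start: bool         # the entity existed when the game began
    created_by: Optional[str] = None  # the "Created by [a card]" note, if any
    seen_before: bool = False         # the card was revealed earlier in the game


def in_initial_deck_old(card: DeckCard) -> bool:
    # Behavior before the fix: the creation time alone decided.
    return card.in_deck_since_start


def in_initial_deck_fixed(card: DeckCard) -> bool:
    if not card.in_deck_since_start:
        return False
    # A creator note on a card we have never seen before means it was
    # transformed out of sight, so its original identity is unknown.
    if card.created_by is not None and not card.seen_before:
        return False
    return True


millhouse = DeckCard("Millhouse Manastorm", True, created_by="Arch-Villain Rafaam")
devastator = DeckCard("Omega Devastator", True)
```

With these definitions the old check accepts the transformed Millhouse Manastorm as part of the initial deck, while the fixed check rejects it and still accepts an ordinary card such as Omega Devastator.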
As this fix would take 30 days to fully correct the affected rows in our Mulligan Guides we have updated them to filter out data from before the fix was applied. This will result in some sparse data for the affected cards in the next few days, but at our current game volume the data should become representative again very soon.
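A minimal sketch of such a filter, assuming the Mulligan Guides aggregate over a rolling 30-day window. The function names are hypothetical, and the timestamp is a placeholder: this post only states that the processing fix was deployed on June 2nd, so the year and time of day below are assumptions.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=30)
# Assumed timestamp: only "June 2nd" is known; year and time are placeholders.
FIX_DEPLOYED_AT = datetime(2019, 6, 2, tzinfo=timezone.utc)


def window_start(now: datetime) -> datetime:
    """Start of the data window: 30 days back, but never before the fix."""
    return max(now - WINDOW, FIX_DEPLOYED_AT)


def include_game(played_at: datetime, now: datetime) -> bool:
    """Whether a game played at `played_at` counts towards the guide at `now`."""
    return window_start(now) <= played_at <= now
```

In the first 30 days after the fix the window is shorter than usual, which is why the data is sparse at first. After that the cutoff no longer has any effect.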
Thanks to everyone who sent us reports about this. If you believe you've found an issue on HSReplay.net please email us at email@example.com.
We appreciate your continued support for HSReplay.net and we'll keep working to provide the largest and most accurate set of Hearthstone data to our users.
Why don't you just always use the deck from the deck tracker?
Nearly all replays are uploaded through one of our deck trackers, which generally have a full deck list available at least for the friendly player. It might seem like a good idea to use the deck lists as reported by the deck tracker.
However, this is not sufficient. There are many cases where the decks shown by the deck tracker are not correct. Whether it is users who haven't set any deck, are overriding or forgot to unset their deck, are reconnecting or spectating a game with a wrong deck, or are using an old version of their tracker: Any of these may cause data quality issues that result in incomplete or outright wrong deck lists. Therefore we ensure the correctness of a submitted deck by cross-referencing it with the cards that were played in the game itself. If a submitted deck looks good we enhance the initial deck by adding all the cards we missed.
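One way to picture this cross-referencing is the sketch below. It assumes decks are plain lists of card names and that `observed` holds the cards we saw coming from the initial deck during the game. The function name is hypothetical, and discarding a contradicting deck list entirely is an assumption made for illustration.

```python
from collections import Counter
from typing import List


def reconcile_deck(submitted: List[str], observed: List[str]) -> List[str]:
    submitted_counts = Counter(submitted)
    observed_counts = Counter(observed)
    # Cards seen in the game that the submitted list cannot account for.
    unexplained = observed_counts - submitted_counts
    if unexplained:
        # The submitted deck contradicts the game: keep only what was observed.
        return sorted(observed_counts.elements())
    # Consistent: take the union so cards never seen in the game are filled in.
    return sorted((observed_counts | submitted_counts).elements())


good = reconcile_deck(["Eviscerate", "Eviscerate", "Sap"], ["Eviscerate"])
bad = reconcile_deck(["Eviscerate", "Sap"], ["Fireball"])
```

Here `good` contains the full submitted list because it agrees with the game, while `bad` falls back to the single observed card.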
We also require as much detail as we can about the initial deck of the opposing player, which deck trackers will never have. This is critical for the correct detection of an opposing player's archetype.
Why don't you just always use the "Created by [a card]" notice to detect cards that weren't present in the initial deck?
As mentioned above, created cards will usually show a "Created by [a card]" message below themselves when they are played. In the past this message has been notoriously unreliable, missing in cases where a card was obviously created. This alone also doesn't handle transforming cards like Shifter Zerus and Chameleos. It appears that, at least in the past, the correct behavior of this message was not enforced by Hearthstone itself but rather set or overridden manually by the developers. It is therefore not safe to rely on this message alone and we are now using it only as an additional signal.
Why is Archivist Elysiana not affected?
While Archivist Elysiana also replaces your deck, it does so differently: Elysiana first removes all cards from your deck, then for each of the choices creates the three choice cards outside of the game, and after you pick a card it moves the chosen card into your deck together with a second copy of it. As the new cards making up your deck are created during the game and are not present at its start, they are never assumed to have been part of the initial deck.
This is similar to Azalina Soulthief which removes your hand and creates an entirely new hand based on the opponent's cards.
What about the golden adventure cards?
Due to the nature of this bug, it affected all cards where a random effect transforms cards out of sight, usually in the deck. This means the following adventure-only cards also suffered from the same issue:
- Golden Kobold (a Dungeon Run Treasure)
- Golden Candle (a Dalaran Heist Treasure)
We never show statistics for these cards on HSReplay.net as they are non-collectible.