A short tale about troubleshooting improbable bug – part deux

Last month I hoped that we have valid workaround for the bug described in A short tale about troubleshooting improbable bug. Unfortunately, it appeared that it is not working for all cases. As the fix for the bug was still missing, I tried to find something more reliable.

Knowing the contents of the stack trace reported during the crash and looking at the query plan I decided to go through all MySQL variables which may affect the behaviour when sorting and using temporary tables.

After short-listing a few parameters I have started to play and finally found one helping to avoid the crash in all known cases:

  • Original queries recorded during previous crashes
  • “Workaround-ed” queries from the newest crash
  • Artificial test case queries

The workaround was to set below before running the query:

SET SESSION sql_buffer_result = ON;

Interestingly, it was doing the opposite I thought I should be doing – I was at the beginning focusing on avoiding the usage of temporary tables, as the 1st meaningful line from the stack trace was:

/path/to/mysql/mysqld(temptable::Handler::position(unsigned char const*)+0x30) [0x25dd620]

But what this setting does is in fact forcing results from the query to be put into temporary tables: https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_sql_buffer_result.

If you remember from the previous post, it was the 2nd query which was leading to the crash. I have observed that the crash happens only when the 1st query is not using temporary tables and the 2nd is. Setting this parameter to ON is forcing the usage of temporary tables for both of them…which helps to avoid the crash! It makes below operations looking the same in query plan:

-> Table scan on <temporary>  (cost=2.50..2.50 rows=0) (actual time=0.000..0.001 rows=2 loops=1)
    -> Temporary table  (cost=2.50..2.50 rows=0) (actual time=0.058..0.058 rows=2 loops=1)

Now, there was a question on how to implement the workaround with minimal impact. The 1st, simplest idea was to do something like:

$dbh->query("SET SESSION sql_buffer_result = ON;");
$dbh->prepare...
$dbh->query("SET SESSION sql_buffer_result = OFF;");

But it would require running 3 commands, instead of 1. Moreover, it would reduce multiplexing ratio in ProxySQL, since session variable was touched. But there was even bigger problem with this approach, trying to do so with ProxySQL resulted in:

[WARNING] Unable to parse unknown SET query from client 10.x.x.x:42148. Setting lock_hostgroup. Please report a bug for future enhancements:SET SESSION sql_buffer_result = ON;

Fortunately, I ended up with working and much better solution, avoiding all pitfalls mentioned above:

SELECT /*+ SET_VAR(sql_buffer_result = ON) */ m.MessageID...

We have already confirmed that it solves (or rather works around) our issue fully, so we can now patiently wait for new MySQL version with the fix.

Leave a comment