Wednesday, September 2, 2009

How GDB helped me fix a Drizzle Bug


The other day I found a nice surprise on my inbox. Jay Pipes asked me if I'd like to try fixing a small bug on Drizzle. It looked pretty simple, and the bug report included a big part of the fix. I accepted without a doubt.

I decided to first change trans_prealloc_size from uint32_t to uint64_t. That was done on drizzled/session.h.
Then, I went to drizzle/set_var.cc and changed sys_trans_prealloc_size from sys_var_session_uint32_t to sys_var_session_uint64_t (and removed the two extra parameters).

At first, that looked like it was everything I needed to do. I compiled drizzle, executed the queries that were included on the bug report, and that almost worked!

set session transaction_prealloc_size=1024*1024*1024*4;
set session transaction_prealloc_size=1024*1024*1024*5;
and
set session transaction_prealloc_size=1024*1024*1024*6; were all being truncated to "4294966272"

The pending problem was that I no longer received a warning telling me that those values had been truncated.

I looked again at the code, used gdb to step through the code, but I just couldn't tell what the problem was.
I was then forced to do what I do when I'm stuck on a computer problem, I had to take a break :).

... And that was a great idea, because while away from the computer, I thought of looking at sys_var_session_uint64_t::update, and compare it to sys_var_session_uint32_t::update.

It turned out that sys_var_session_uint64_t::update was missing a call to throw_bounds_warning(), no wonder I wasn't getting the truncation warnings.

It was time to enable the test suite that targeted this bug, run this particular test, commit, push .... and I suddenly thought that I should also run the regular tests, you know, make test. And I was glad I did. There were about 7 test cases that failed.

The firs thought was that these changes I just made uncovered some mysterious bugs on Drizzle. But after looking closely at the test cases, I noticed that there was something else going on.

If I executed this query: set group_concat_max_len = 4;, the session variable group_concat_max_len ended up with a value of 4294967295.

It was time for some serious GDB'ing. There are a few articles here, here and here that explain how to use gdb and Drizzle, but I start it a little different.

Starting GDB and Drizzle

On the terminal, I run:
[wizard]$ gdb --args /Applications/drizzle/sbin/drizzled --port=3306 --basedir=/Applications/drizzle --datadir=/Applications/drizzle/var

After a few seconds, I am at the (gdb) prompt. This is the time to set breakpoints, which would help me find this little issue.
I tried doing this:
(gdb)break drizzled/set_var.cc:744
but gdb told me: "No source file named drizzled/set_var.cc", I had this same problem trying to use gdb on the MySQL Proxy, and after trying different things, I found the correct way to specify the file name:
(gdb)break set_var.cc:744

*Note: when debugging plugins, you specify the filenames like this:
(gdb)break plugin/logging_gearman/logging_gearman.cc:150

(include the plugin folder)

That one breakpoint was all I needed, I then started drizzle by typing:
(gdb)run
After pressing enter, you see many lines, but the last few are (similar to)

. done
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins.
090827 23:45:16 InnoDB: highest supported file format is Barracuda.
090827 23:45:16 InnoDB Plugin 1.0.3 started; log sequence number 46439
Listening on :::3306
Listening on 0.0.0.0:3306
/Applications/mysql/enterprise/drizzle/sbin/drizzled: ready for connections.
Version: '2009.08.1124' Source distribution (bug-fixes)

Drizzle was ready for my tests. I went to a new terminal and executed

set group_concat_max_len = 4;
The drizzle prompt hanged there (waiting on the breakpoint I set earlier)

This is the code near the breakpoint

bool sys_var_session_uint64_t::update(Session *session, set_var *var)
{
uint64_t tmp= var->save_result.uint64_t_value;

if (tmp > max_system_variables.*offset) // <-- this was line 744
  throw_bounds_warning(session, true, true, getName(), (int64_t) tmp);
  tmp= max_system_variables.*offset;
if (option_limits)
  tmp= fix_unsigned(session, tmp, option_limits);
if (var->type == OPT_GLOBAL)
{
/* Lock is needed to make things safe on 32 bit systems */
pthread_mutex_lock(&LOCK_global_system_variables);
global_system_variables.*offset= (uint64_t) tmp;
pthread_mutex_unlock(&LOCK_global_system_variables);
}
else
session->variables.*offset= (uint64_t) tmp;
return 0;
}


Back on the GDB terminal, I saw:

Breakpoint 1, sys_var_session_uint64_t::update (this=0x4dec60, session=0x1044e00, var=0x10470c8) at set_var.cc:744
744 if (tmp > max_system_variables.*offset)

It was time to look at the value of tmp, so I typed
(gdb) print tmp
$4 = 4

So far, so good, I wanted to see 4 as the result, let's step through the code, and see when tmp gets a much higher value.
(gdb) step
746 tmp= max_system_variables.*offset;

Check the value of tmp


(gdb) print tmp
$5 = 4
(gdb) step
748 if (option_limits)
(gdb) print tmp
$6 = 4294967295

There it was, whatever happened between line 746 and 748 is causing the problem. By now you may have already spotted the issue, but I just didn't see it. I stared at that piece of code for about a minute, and then I realized what the problem was.
The if statement on line 744 was a single line statement, before I added the call to throw_bounds_warning(). Because I am so used to having brackets, even for single line if statement, I forgot to add a pair of {}.

*Note: For completeness, once you are done stepping through the code, you can type
(gdb)cont
and the program will continue to run as normal. On my case, I could then run on the drizzle terminal:


drizzle> show variables like "group_concat_max_len";
+----------------------+------------+
| Variable_name | Value |
+----------------------+------------+
| group_concat_max_len | 4294967295 |
+----------------------+------------+
1 row in set (0.01 sec)


Conclusion.
If you don't have another pair of eyes to look over your code, you could ask gdb for some help :)

1 comment:

  1. This comment has been removed by a blog administrator.

    ReplyDelete

Vote on Planet MySQL