Friday, August 28, 2009

Drizzle and the Gearman logging plug-in

Disclaimer:
This blog post is about things I did on my own free time, not endorsed by my employer.

I have been meaning to look at Gearman for a long time, but I just couldn't find any project where I could use it.

Well, that was true until last week a couple of weeks ago, when I started to put together Drizzle, the Gearman logging plug-in, Perl and the Enterprise Monitor.

As I was finishing writing the agent.pl script, I thought that it would be a good idea to split the script in at least two components: one that would just collect the queries, and another component that would do the processing of the log entries (replacing the literals for "?", grouping queries by query text, etc).

It was right there when I realized that this was my chance to look at Gearman! The first thought was to still use the regular query logging plug-in.
But there is already a Gearman logging plugin, and I was curious about how that worked.

A quick Google search returned very little information, but I did find the doxygen docs, and reading the code was fairly straight forward.

By reading the code, I found out that the plug-in registers the function drizzlelog with the Gearman Job server, and that it passes the same string that the query logging plug-in sends to the log file.

Next step was to find a hello world Perl + Gearman example. And I found a sample for the client and the worker. That almost worked out of the box, but I got this error:

Can't call method "syswrite" on an undefined value at /Library/Perl/5.8.8/Gearman/Taskset.pm line 201.

A little google search and I found an example where the port was appended to the host. I then added the port 4730 to worker.pl and client.pl and it all worked as expected.

Once I got the simple example working, I added most of the agent.pl code to the worker.pl script, made a few small changes, and added comments. I was done!

The Gearman logging plugin sends query logs to the job server, and the job server asks the worker to do the actual job.
In the end, the service manager ends up with all the information related to the queries that go to the Drizzle server.

Layout.
For this initial version, one worker cannot handle jobs for more than one drizzle server, this is not a Gearman limitation. When I wrote this script, there was no way to tell the worker, which Drizzle server was sending the log entry.

And that was an excellent excuse to add a few more fields to the Gearman logging plugin. (That patch was already approved and will soon be part of the main Drizzle source.)



worker-1 handles requests for drizzled-1 and worker-2 handles jobs for drizzled-2. I am already looking into ways to change this.

Where is the code?
As usual, I posted the worker.pl script on the MySQL Forge.

How do I start the worker?
Like this:

$ DEBUG=1 perl worker.pl --serveruuid="22222222-5555-5555-5555-222222222211"\
--serverhostuuid="ssh:{11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:21}" \
--serverdisplayname="Main-Drizzle-web2"



How do I start the client?
In this case, the Gearman client is the drizzle plug-in, so, all you need to do is add these lines to your drizzle.cnf

$ cat /etc/drizzle/drizzled.cnf
[drizzled]
logging_gearman_host = 127.0.0.1
logging_gearman_enable = true


Restart the Drizzle server and you are ready to go (well, you also need the MySQL Enterprise Monitor)

Final Note.
I was amazed at how easy it was to have it all working, I will keep looking for other projects where I could use Gearman.

Thursday, August 27, 2009

Drizzle query monitoring

Disclaimer:
This blog post is about things I did on my own free time, not endorsed by my employer.

A little over a month ago, Ronald posted a blog about the different query logging plug-ins that are available for Drizzle. This was pretty exciting news, especially when I saw the details that were included in the logs.

Meanwhile, a few weeks ago, I started looking at the REST API that comes with the MySQL Enterprise Monitor.

The result is that we can now see most of the information returned by the plug-in, on the Dashboard.




How?
A colleague at work, wrote a little Perl script that interacts with the REST API, and I took his work as the foundation for my agent.pl script.

The next problem was to find a way to call this script as soon as there was a new entry on the log. After a little Google search, I went ahead and decided to ask my friend Adriano Di Paulo (who among other things, introduced me to MySQL).
A few minutes later, he showed me a working example of the Tail Perl module.
That was exactly what I needed, as soon as there is a new entry, I call the function assemble_queries() and I pass the new log entry as the parameter.


sub tail_log {
my $file=File::Tail->new(name=>$log_file, maxinterval=>1, reset_tail=>0);
while( defined (my $line=$file->read ) ) {
print "\n" . $line . "\n" if $DEBUG > 3;
assemble_queries( $line );
}
}



The assemble_queries() function is mostly based on what MC blogged about some time ago. On his blog post, he shows how to collect query related data using Dtrace and Perl.

Then, every n number of queries, I use send_json_data() to send the query information to the Dashboard, delete the sent data and it is ready to process more queries.

Now that I'm writing this, I realized that if send_json_data() fails, the information related to the queries are lost :|. (Note to self, fix it).

There are other functions in there, but they are mostly for housekeeping.

How do I use it?
Very simple, get the agent.pl script from the MySQL Forge website, edit the credentials, hosts, and ports to fit your needs (Future versions would include some kind of config file).

And then you call the script like this:

$ DEBUG=1 perl agent.pl --serveruuid="22222222-5555-5555-5555-222222222211" \
--serverhostuuid="ssh:{11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:21}"\
--serverdisplayname="Main-Drizzle-web2" \
--log-file=/path/to/log/file

As soon as the scripts starts, it will add the drizzle server to the service manager, and once you start sending queries to drizzle, those queries will end up on the UI.

Next?
Next is already done :). I modified the agent.pl script to use the gearman logging plugin. I'll write a blog about it very soon.

Thanks for reading and enjoy!


Monday, August 24, 2009

More Drizzle plug-ins

Last weekend, I finally got some time to look around Drizzle. I had already compiled it on my laptop, but hadn't really looked at the code.
Then, I thought that looking over some of the blueprints on Launchpad, would be a good way to get familiar with the code base.
After a quick search, I found move function/time/ functions into plugin(s)

This blueprint is basically to create UDF plug-ins for the different time related functions.
There was no priority assigned and it was on the low hanging fruit milestone. Which was perfect for someone who doesn't really know how much time he could spend, and wants to get to know the code.

The first step was to read a bit about the process to contribute to the Drizzle project, I went to the wiki here and read about the coding standards.

I then, went ahead and saw how difficult easy the code looked like. And proceeded to email the list, asking for feedback and also to tell others what I was up to. This is important, to avoid duplicating the work of others.

Code?
This is where the fun began. I had a fresh branch, and it was time to pick the first function to make into an UDF plugin.
By luck (and you will know why luck), I picked to move unix_timestamp() first.

The Process
There are already some great plugins on the Drizzle branch. I went ahead and duplicated the md5 plugin (in plugin/md5). Renamed the folder unix_timestamp, also renamed the md5.cc to unix_timestamp.cc and edited the plugin.ini file that was on the same folder.

The md5 plugin folder also has a plugin.ac file, but it turned out I didn't need this file, so I just removed it.

It was then time to do the actual code moving. To start, I opened drizzled/function/time/unix_timestamp.cc and drizzled/function/time/unix_timestamp.h
It was pretty much copy and paste from those two files into plugin/unix_timestamp/unix_timestamp.cc

And the rest was to replace md5 for unix_timestamp :)

Notes:
When I first started, I had both, the built-in unix_timestamp() and the plugin version. To make sure the plugin was returning the correct values, I just temporary named the plugin function unix_timestamp2(). And you can do that by just changing code in two lines:

Error messages
Whenever there is an error with your function, the error message will call the plugin function func_name(), the string you return there, will be shown on error messages. One way to force this error is by including either too many, or too few parameters.

const char *func_name() const
{
return "unix_timestamp2";
}

To tell Drizzle the name of your plugin function, you use this line:

Create_function unix_timestampudf(string("unix_timestamp2"));

Most (all?) plugins files will start with lib + <name of the plugin> + _plugin.la. You specify this name using this line:

drizzle_declare_plugin(unix_timestamp)
The rest should be pretty easy to figure out.

Tip
Which I wish I knew before. Something that took me way too long to find out, when you add a new plugin folder, you need to run ./config/autorun.sh and ./configure ... && make && make install. This would make sure your new plugin gets compiled., if you skip autorun.sh, your new plugin will not be compiled.

Final steps
Once I compiled the new plugin, and verified that it all worked well. It was time to delete the built-in function.
1) Went to drizzled/Makefile.am and removed function/time/unix_timestamp.h from there.
2) Removed the files drizzled/function/time/unix_timestamp.cc and drizzled/function/time/unix_timestamp.h
3) Edited drizzled/item/create.cc and removed #include and some other references to the unix_timestamp function.
4) drizzled/sql_parse.cc also had to be edited, to remove #include .
5) Added the new plugin/unix_timestamp/ folder and files to the bzr branch.
6) Run tests (and here I found a new problem)

I'm still working on a fix for it. I'm going with using one error message, for built-in functions, as well as plugins. I hope to be pushing those changes soon.

Oh, why was I lucky to pick the unix_timestamp() function as the first one to tackle, well, I have been working on timestamp_diff for many hours, and it just does not want to work. It somehow does not see the first parameter. I'm pretty sure I'll be asking the Drizzle-discuss for help :)

The end.


select * from information_schema.plugins where plugin_name like '%time%';
+-------------------------+----------------+---------------+---------------+--------------------------------+----------------+
| PLUGIN_NAME | PLUGIN_VERSION | PLUGIN_STATUS | PLUGIN_AUTHOR | PLUGIN_DESCRIPTION | PLUGIN_LICENSE |
+-------------------------+----------------+---------------+---------------+--------------------------------+----------------+
| unix_timestamp_function | 1.0 | ACTIVE | Diego Medina | UDF for getting unix_timestamp | GPL |
+-------------------------+----------------+---------------+---------------+--------------------------------+----------------+
1 row in set (0 sec)

drizzle>
Well, not really the end, I still have plenty of functions to move into plugins.

Thanks!

Monday, August 10, 2009

Vote for me! ... widget for your blog.

Most likely you have seen Giuseppe's post showing the latest feature of Planet MySQL. Voting from RSS readers, was one feature I was really hoping for, since the day voting was announced. As I read most blogs using Google Reader.

Now, I don't remember if it was Dups who asked me, or if I asked him, but all I remember is that I ended up writing a little JavaScript widget, that you can add to your blog. This widget allows readers to vote for your blog on Planet MySQL, all from within your blog.

Why would you want to add this JavaScript to your blog?
Because you want to make it very easy for your readers to vote if they like or dislike what they just read.

Requirements/Limitations.
Yes, there are a few (small?) things that have to be in place for this widget to work.
* Your readers will have to have an account on the MySQL.com website.
* But most important, your blog post has to be already on the Planet MySQL database.
* If you are using Feedburner, and the url on your feeds is not the same as your post's url, this does not work (which is my case :( ). But I'll look for a workaround.

Code.

All you need to do is add these lines of code to your template:


<script language="JavaScript"><!--
var planet = "http://planet.mysql.com/entry/vote/?apivote=1&";
var lk=encodeURIComponent(document.location);
var thumb_up="<img src=\"http://planet.mysql.com/images/thumbs_up_blue.jpg\" border=\"0\" />";
var thumb_down="<img src=\"http://planet.mysql.com/images/thumbs_down_blue.jpg\" border=\"0\" />";
document.write('Vote on the <br /><a href=\"http://planet.mysql.com/\" >Planet MySQL</a><br />');
document.write('<a title=\"Vote me Up on the Planet MySQL\" href=\"' + planet + 'vote=1&url=' + lk + '\">' + thumb_up + '</a>');
document.write('&nbsp;');
document.write('<a title=\"Vote me Down on the Planet MySQL\" href=\"' + planet + 'vote=-1&url=' + lk + '\">' + thumb_down + '</a>');
// --></script>


How do I add it to Blogger?

1- On the left side of this blog, you will see a "Add Voting to your blog" button, click on it.


2- On the "Add page element" section, select the blog you would like to add this widget to.
3- Click "Add widget"


4- You will now see a widget under the name "Vote on Planet MySQL", you can go ahead and leave it there, or move it around.
This widget will appear on every single post you have.
5- Click on save, and you are done!



How do I add it to XYZ?

I'll talk to the Community team, and I'll ask them to have either a wiki page on the forge, or some place else the steps to add a widget like this to other blog platforms.

Enjoy!

Vote on Planet MySQL