Difference between revisions of "Driver-side DB Query Profiling"

From alfrescowiki

Jump to: navigation, search
m (updating patch links for the patches that now apply against 2.2.6 (latest release as of Dec 20th, 2013).)
m (updating patch links for the patches that now apply against 2.2.8 (latest release as of June 11th, 2014).)
Line 18: Line 18:
  
  
Pending the fix to upload .patch files on this wiki directly, the patch is hosted on github gist and available here : [https://gist.github.com/romainguinot/8054709/raw/139c270738afefff94820aa0d8a0fe9aef07b05a/add-p6spy-parser-2.2.6.patch add-p6spy-parser-2.2.6.patch] .  
+
The patch is hosted on github gist and available here : [https://gist.github.com/romainguinot/edf6a92e97837712800d/raw/b8196b77e46f9e8d316dd51b69d0d13ac54678c1/add-p6spy-parser-2.2.8.patch add-p6spy-parser-2.2.8.patch] .  
  
  
Therefore to be able to parse and digest a p6spy query log using pt-query-digest after having [http://www.percona.com/downloads/percona-toolkit/2.2.6/ downloaded] and uncompressed the toolkit in e.g. ''/opt/percona/toolkit/2.2.6' , you'll need to do e.g. :  
+
Therefore to be able to parse and digest a p6spy query log using pt-query-digest after having [http://www.percona.com/downloads/percona-toolkit/2.2.8/ downloaded] and uncompressed the toolkit in e.g. ''/opt/percona/toolkit/2.2.8' , you'll need to do e.g. :  
  
 
<pre>
 
<pre>
cd /opt/percona/toolkit/2.2.6
+
cd /opt/percona/toolkit/2.2.8
wget https://gist.github.com/romainguinot/8054709/raw/139c270738afefff94820aa0d8a0fe9aef07b05a/add-p6spy-parser-2.2.6.patch
+
wget https://gist.github.com/romainguinot/8054709/raw/139c270738afefff94820aa0d8a0fe9aef07b05a/add-p6spy-parser-2.2.8.patch
patch -p0 < add-p6spy-parser-2.2.6.patch
+
patch -p0 < add-p6spy-parser-2.2.8.patch
 
</pre>
 
</pre>
  
The SHA1 digest for this patch is : ed7bb7f4a076386bb657818946ac9eb2b0d93d1d.
+
The SHA1 digest for this patch is : 4636bfc9c85c8ed4d764c196e37073f2ff5fcf2a.
  
 
Use the SHA1 digest (e.g. with ''sha1sum'' or ''openssl sha1'') to verify the integrity the patch before applying it.
 
Use the SHA1 digest (e.g. with ''sha1sum'' or ''openssl sha1'') to verify the integrity the patch before applying it.
  
As of this writing/update (Dec 20th, 2013), the patch applies against the latest published version of Percona Toolkit : 2.2.6.
+
As of this writing/update, the patch applies against the latest published version of Percona Toolkit : 2.2.8.
  
  
Note : if you want to work against the toolkit full source (with the perl unit tests), the patch to apply against a source revision [https://launchpad.net/percona-toolkit on launchpad] ([http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/2.2/revision/593 2.2.6 release revision]) is also available here : [https://gist.github.com/romainguinot/8054765/raw/6f21606a1076ac3d8a79c5ca5d80702c3f0ff5e0/add-p6spy-parser-source-2.2.6.patch add-p6spy-parser-source-2.2.6.patch].
+
Note : if you want to work against the toolkit full source (with the perl unit tests), the patch to apply against a source revision [https://launchpad.net/percona-toolkit on launchpad] ([http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/2.2/revision/605 2.2.8 release revision]) is also available here : [https://gist.github.com/romainguinot/431e1bb820ba821d1f64/raw/d74bd7c33615dc46f91be4d6fa0bfaab26bc2eef/add-p6spy-parser-source-2.2.8.patch add-p6spy-parser-source-2.2.8.patch].
  
(SHA1SUM : 4963eb26f52a71ab3f73121479202f68e346a2d4). The latter is however not required to simply use the patched tool.
+
(SHA1SUM : 548eecfa0a84b9a52cdc719075684829aca48f3b). The latter is however not required to simply use the patched tool.
  
  

Revision as of 16:53, 11 June 2014

Overview

Maatkit (http://www.maatkit.org/) is a great command-line toolsuite, initially for MySQL.

Note that as of Jan 10,2012, Maatkit has now been renamed Percona Toolkit (http://www.percona.com/software/percona-toolkit/)


One of the tools of the toolsuite, pt-query-digest, allows to process and profile queries that run through the server, either by processing logs (general log, slow query log, etc...), or by processing wire protocol gathered by tcpdump.

A nice writeup of this tool is available here : http://www.mysqlperformanceblog.com/2009/07/01/gathering-queries-from-a-server-with-maatkit-and-tcpdump/ and is a recommmended read before proceeding.

P6Spy (sourceforge.net/projects/p6spy) is an open source driver that works as a "proxy" driver, which allows to log every query that passes through it, while delegating query execution to "real" drivers, the ones that would actually be used to communicate with the DB. Even though it is quite old, it does the job.

Some time ago, i wrote a pt-query-digest parser for p6spy (http://code.google.com/p/maatkit/issues/detail?id=1269). It hasn't yet been merged in HEAD, but this is currently item #3 on the project roadmap (as of July '11 : http://code.google.com/p/maatkit/wiki/ProjectRoadmap ). Feel free to vote for it if you think this is useful.


The patch is hosted on github gist and available here : add-p6spy-parser-2.2.8.patch .


Therefore to be able to parse and digest a p6spy query log using pt-query-digest after having downloaded and uncompressed the toolkit in e.g. /opt/percona/toolkit/2.2.8' , you'll need to do e.g. :

cd /opt/percona/toolkit/2.2.8
wget https://gist.github.com/romainguinot/8054709/raw/139c270738afefff94820aa0d8a0fe9aef07b05a/add-p6spy-parser-2.2.8.patch
patch -p0 < add-p6spy-parser-2.2.8.patch

The SHA1 digest for this patch is : 4636bfc9c85c8ed4d764c196e37073f2ff5fcf2a.

Use the SHA1 digest (e.g. with sha1sum or openssl sha1) to verify the integrity the patch before applying it.

As of this writing/update, the patch applies against the latest published version of Percona Toolkit : 2.2.8.


Note : if you want to work against the toolkit full source (with the perl unit tests), the patch to apply against a source revision on launchpad (2.2.8 release revision) is also available here : add-p6spy-parser-source-2.2.8.patch.

(SHA1SUM : 548eecfa0a84b9a52cdc719075684829aca48f3b). The latter is however not required to simply use the patched tool.


Benefits

  • allows to determine query performance, from the app server "point of view", for each type of query, with great statistical detail (avg response time per query type , std dev, apdex, variance to mean, ... see examples below) without having direct access to the actual DB or involving a DBA (helpful when on short engagements).
  • allows (mostly) to use pt-query-digest to parse response times of any database accessed through jdbc, without knowledge of that database's particular protocol. It has maximum compatibility with MySQL, but works with other databases too since the queries passing through the driver should have mostly consistent syntax, and not much DB-specific syntax. Sufficiently enough at least to get a precise idea of the query type performance for a particular test run.
  • If using a DB whose wire protocol is already supported by Maatkit (such as MySQL and PostgreSQL), it is then easier to determine for example the network overhead, as the DB side profiling can be easily compared with the driver-side profiling using p6spy.

Setup

To gather a query log, 3 steps are required :

  • Download a copy of p6spy.jar and place it in the classpath. I usually use tomcat/shared/lib. I have only tested the jar p6spy 1.3 (latest) , downloadable here : http://sourceforge.net/projects/p6spy/files/p6spy/1.3/p6spy.jar/download (sha1 : 984baf54184a788a569765361a5df48ce0a63c2a p6spy.jar)
  • Create a spy.properties and place it on the classpath. I usually use tomcat/shared/classes/spy.properties. This file will contain the driver configuration. A simple working example is included below :
module.log=com.p6spy.engine.logging.P6LogFactory
realdriver=com.mysql.jdbc.Driver
autoflush=true
excludecategories=
appender=com.p6spy.engine.logging.appender.FileLogger
logfile=/tmp/spy.log
deregisterdrivers=true

You only need to change the "realdriver" property if you are not using MySQL (replace by your actual driver).

  • In alfresco-global.properties, Change the DB driver to use the p6spy driver, which in the case above, will delegate to the MySQL driver, e.g :
#db.driver=org.gjt.mm.mysql.Driver 
db.driver=com.p6spy.engine.spy.P6SpyDriver

Upon startup, you should see a /tmp/spy.log being created, filling up with queries and associated raw response time among other things. This is the log that will be parsed (as in the examples below) with pt-query-digest

Examples

As is the case for the other parsers, there are 2 output "modes" that can be used :

  • profile (summary)
  • detail

profile

This is a summary output for a limited number of query types (sorted by impact). Example :

$ pt-query-digest --type=p6spy --report-format=profile /tmp/spy.log

Output :

# Profile 
# Rank Query ID           Response time Calls R/Call V/M   Item 
# ==== ================== ============= ===== ====== ===== ========== 
#    1 0x631DB93E5642CADF  0.7780 21.8%  2575 0.0003  0.00 INSERT alf_node_aspects 
#    2 0x74EBEB13627F93B7  0.6970 19.6%  2570 0.0003  0.00 INSERT alf_node_properties 
#    3 0x7F11B32B1B5A760F  0.5150 14.4%  1294 0.0004  0.00 INSERT alf_node 
#    4 0x893211C17A511841  0.5050 14.2%  1279 0.0004  0.00 INSERT alf_content_url 
#    5 0x1D8D7FD7E0184ED6  0.3480  9.8%  1293 0.0003  0.00 INSERT alf_child_assoc 
#    6 0xF75307A715416C4B  0.3390  9.5%  1279 0.0003  0.00 INSERT alf_content_data 
#    7 0xA5B6696AC7DF25C4  0.0930  2.6%   500 0.0002  0.00 SELECT alf_applied_patch 
#    8 0x3BCD98DE024A0893  0.0610  1.7%    66 0.0009  0.00 SELECT alf_applied_patch 
#    9 0x3AFCFEBEE97A8217  0.0400  1.1%    71 0.0006  0.00 SELECT alf_node alf_store alf_transaction 
#   10 0x2720FC146858DD93  0.0360  1.0%    40 0.0009  0.00 SELECT alf_child_assoc alf_node alf_store alf_node alf_store
# MISC 0xMISC              0.1530  4.3%   454 0.0003   0.0 <51 ITEMS> 

By default the top 10 queries in terms of impact (# of times executed * avg response time) are shown, but this may be changed with the "--limit" parameter. For each aggregated query type (queries that differ only by the values), the column fields are as follows :

  • Query ID : an assigned query ID. this can be used to store these results in a database, and do some trending over time, or more generally to compare a specific query fingerprint performance over time
  • Response time : the overall response time for this query, in terms of elasped seconds and percentage of the total query time for all queries
  • Calls : the number of times this particular query was parsed out of the log
  • R/Call: response time per call, on average
  • V/M : the variance-to-mean ratio, also called index of dispersion. (See Wikipedia's index of dispersion definition).
  • Item : a summarized description of the query, also called query fingerprint, which is the abstracted form of a query, which makes it possible to group similar queries together. Abstracting a query removes literal values, normalizes whitespace, and so on. See the output documentation for more details.
  • the Apdex index (Apdx column) is no longer available in the 2.2 release. Available in 2.1.x and prior, this is a satistical evaluation of user "satisfaction" for this avg query response time. (See Wikipedia's apdex definition).


detail

This a detailed output for each query type. This is the default output if none is specified.

$ pt-query-digest --type=p6spy /tmp/spy.log

Below is shown the detailed output for the query #1 :

# Query 1: 0 QPS, 0x concurrency, ID 0x631DB93E5642CADF at byte 13096045 _ 
# Scores: Apdex = 1.00 [1.0], V/M = 0.00 
# Query_time sparkline: |   ^_   | 
# Time range: unknown 
# Attribute    pct   total     min     max     avg     95%  stddev  median 
# ============ === ======= ======= ======= ======= ======= ======= ======= 
# Count         22    2575 
# Exec time     21   778ms       0    28ms   302us   972us     1ms       0 
# Query_time distribution 
#   1us 
#  10us 
# 100us 
#   1ms  ################################################################ 
#  10ms  # 
# 100ms 
#    1s 
#  10s+ 
# Tables 
#    SHOW TABLE STATUS LIKE 'alf_node_aspects'\G 
#    SHOW CREATE TABLE `alf_node_aspects`\G 
#    insert into alf_node_aspects             (node_id, qname_id)         values             (162299, 31)\G 

In this mode, only the top 5 queries are detailed by default. The other 4 are not printed here. More statistical details are available for this query in this mode (min,max, 95%, stddev, median), including a sample query of that type for reference. Can be used to be copy/pasted to get the query plan for example.


Even though more information is available when using the wire protocol (for example concurrency, rows examined, query plan, query plan warnings, ...), this data allows to have a good overview of this particular query's performance. See percona documentation for more details about the histogram. Note : Sparklines are no longer available as of the 2.2 release.


Note that it is not necessary to use pt-query-digest on the actual server where the log has been collected. The log can be transferred to another machine for processing, for example if the machine doesn't have Perl installed. pt-query-digest is written in Perl, and has fairly minimal dependencies, only some core packages that ought to be installed in any reasonably new version of Perl.


Percona documentation for the tool itself is available here.

Personal tools
© 2014 Alfresco Software, Inc. All Rights Reserved. Legal | Privacy | Accessibility