Archive

Posts Tagged ‘mysql’

Gene Ontology MySQL Quickstart

November 17th, 2006

Quickest way to set up a local GO DB with MySQL:

  1. Grab latest go-YYYYMM-seqdblite-tables.gz from here.
  2. tar -zxvf go-YYYYMM-seqdblite-tables.gz
  3. cd go-YYYYMM-seqdblite-tables
  4. echo “create database mygo” | mysql -uroot -p
  5. cat *.sql | mysql -uroot -p mygo
  6. mysqlimport -L -uroot -p mygo *.txt
  7. This is listed right on that same download page, but somehow I always forget that.

Aidan Findlater Impersonal ,

MySQL installation on Mac OS X makes me happy

July 26th, 2006

Mad props to MySQL for providing their software in PKG form for Mac OS X (now all they need is an appcast). However, they note in the readme:

If you are upgrading an existing installation, note that installing a new MySQL PKG does not remove the directory of an older installation. Unfortunately, the Mac OS X Installer does not yet offer the functionality required to properly upgrade previously installed packages.

I guess that depends on what’s required to upgrade MySQL installations, but I’m not sure why they’re so special. Mac PKGs provide support for pre- and post-upgrade scripts. I’m not sure what it could need to do that isn’t supported. At the very least, a post-upgrade script could do a little backing up and moving of important files so that the old directory can be deleted.

Regardless, I’m glad to see an open source company like them releasing software for the Mac in a format that makes sense. With open source projects it’s mostly the the low karma command line (sometimes Fink) or a negative karma dirty, dirty installer like Zero G.

Aidan Findlater Impersonal ,

Database indexes are very important

June 7th, 2006

The SQL query that I’m running 650 times is as follows:

SELECT h.protein_acc, h.taxon_id FROM homologene h, all_proteins a, synonyms s WHERE
s.synonym = '$_' AND a.alt = s.protein_acc AND h.protein_acc = a.protein_acc
UNION SELECT h2.protein_acc, h2.taxon_id FROM homologene h1, homologene h2, all_proteins a
WHERE a.alt = '$_' AND h1.protein_acc = a.protein_acc AND h1.hid = h2.hid;

I estimated that the script would take 10 hours to run (this SQL query isn’t the only thing it’s doing, of course). That’s far too long. Using the handy EXPLAIN function showed me where I was going wrong:

+-------+--------+-----------------+---------+---------+----------------------+--------+-------------+
| table | type   | possible_keys   | key     | key_len | ref                  | rows   | Extra       |
+-------+--------+-----------------+---------+---------+----------------------+--------+-------------+
| s     | ref    | PRIMARY,synonym | synonym |      27 | const                |      1 | Using where |
| a     | ref    | PRIMARY,alt     | alt     |      28 | s.protein_acc        |      3 | Using where |
| h     | ALL    | NULL            | NULL    |    NULL | NULL                 | 174762 | Using where |
| a     | ref    | PRIMARY,alt     | alt     |      28 | const                |      1 | Using where |
| h2    | ALL    | PRIMARY         | NULL    |    NULL | NULL                 | 174762 |             |
| h1    | eq_ref | PRIMARY         | PRIMARY |      31 | h2.hid,a.protein_acc |      1 | Using index |
+-------+--------+-----------------+---------+---------+----------------------+--------+-------------+

See those two rows that say “ALL”? That’s right, it’s doing a linear search through 174,762 records to find the record of interest. Twice.

What to do? Create a decent index. After running “ALTER TABLE homologene ADD INDEX (protein_acc);“, we now have:

+-------+------+---------------------+-------------+---------+---------------+------+-------------+
| table | type | possible_keys       | key         | key_len | ref           | rows | Extra       |
+-------+------+---------------------+-------------+---------+---------------+------+-------------+
| s     | ref  | PRIMARY,synonym     | synonym     |      27 | const         |    1 | Using where |
| a     | ref  | PRIMARY,alt         | alt         |      28 | s.protein_acc |    3 | Using where |
| h     | ref  | protein_acc         | protein_acc |      27 | a.protein_acc |    1 |             |
| a     | ref  | PRIMARY,alt         | alt         |      28 | const         |    1 | Using where |
| h1    | ref  | PRIMARY,protein_acc | protein_acc |      27 | a.protein_acc |    1 |             |
| h2    | ref  | PRIMARY             | PRIMARY     |       4 | h1.hid        | 1747 |             |
+-------+------+---------------------+-------------+---------+---------------+------+-------------+

The script went from taking 10 hours to about 15 minutes. That makes me very happy.

Edit: 15 minutes may have been too optimistic. If only there was some way of speeding up GenBank, then I’d be in business!

Aidan Findlater Impersonal

Postfix + MySQL Virtual User Script

March 17th, 2005

Here’s a script I wrote for myself and my friend to help add virtual mail users in a Postfix w/ MySQL setup:
Read more…

Aidan Findlater Impersonal , ,