These tips are ones that I wish I'd read when I started getting into databases, both Oracle and MySQL. I use MySQL a lot, and I see MySQL in a lot of production environments- and the O in osix.net stands for open, so I assume most of my readers are going to be using it. The tips aren't really specific to MySQL, but are definitely applicable.These tips are also ones I wish developers of live databases I've had to use had read. Reading this won't make you a database superman. It may not even make you a green arrow. It will, however, alert you to a few useful tools that will save you a whole lot of trouble.
1. Securing Permissions
Security is vital for any public-facing database. There's no way to completely secure a box short of turning it off, but you can make it harder to crack. There are other guides on osix.net that explain how and why to scrub data, but there are other things you can do. One of the simplest and most effective is user permissions. Let's say you have a database that you manage on the back end that only shows information to the users, and it doesn't let them edit it. Following the principle of least privilege commonly used in *nix, we ought to grant only select access to the user. They've no reason to modify data; therefore, there is no reason to have access to INSERT, UPDATE, DELETE, DROP, or CREATE TABLE or CREATE DATABASE. Depending on what GUI tools you have available, you may be able to select permissions for each user per database or table. If you don't have a GUI, you may have MySQL's admin package, which as of May 2008 doesn't include by-table permissions. If this is the case, you'll need to use the CLI for granulation this fine.
The GRANT command is what you need. It's fully explained in the MySQL documentation, but I'll explain it for our purposes here. Basically SELECT privileges are the same as "read" permissions in a file system. INSERT/UPDATE/DELETE are like "write" permissions.
GRANT USAGE ON *.* TO user
GRANT USAGE is the same as denying all other privileges. Now, you know they'll only have access to the "reports" table of the "business" database, and they'll only be able to SELECT from it;
GRANT SELECT ON business.reports TO user
Here, the user can view reports, but even if an attacker manages to get some SQL injected, say by entering Quote: '; DROP DATABASE -- , their user account, or the one used by the script on your web server, doesn't have permission to DROP anything.
Let's take a more complicated example. Say you upgrade your system, and now users can enter and update their own reports, which are linked to some reference tables with a foreign key. The users can already SELECT from reports, but you want the web form to show a list of regions from a reference table so they can easily select it from a list instead of typing and getting errors entering data; the glory of reference tables and foreign keys will appear a bit later. How do we do this? Well, first, let's give them SELECT permissions, and then write access to reports.
GRANT SELECT ON reference.regions TO user
GRANT INSERT, UPDATE ON business.reports TO user
Notice anything? I didn't give them permission to DELETE reports. The specification "users can enter and update their own reports," didn't require me to give them DELETE permissions. Now if I was contracting and the boss meant "enter, update, and delete" this would cause some shouting, so don't assume business users mean to exclude permissions they don't mention. In fact, it's best practice to clarify every list they give you, but that's not related to the technical question at hand. As it stands, our users can now create new records, and edit existing records. If you're vulnerable to SQL injection, this can cause problems but only problems with setting all records to null and adding extra ones to fill your database! Even now, the structure of your data and the existence of your tables and databases are secure.
Like any security measure, user permissions should be only one of multiple layers. It doesn't solve every problem, and it only mitigates vulnerabilities elsewhere. However, it is easy to implement and understand and can be a very effective measure against more serious attacks. Far better than user permissions is something called stored procedures; this involves writing functions that get the data for the users, and you GRANT users permissions to execute these functions instead of querying the database directly. However, if you don't have user permissions locked down yet, this should be your first step.
2. Data Integrity with Foreign Keys, Default Values, and Reference Tables
Foreign keys, like user permissions, aren't unique to MySQL, but they are very useful and underused. A primary key is used to ensure that each row in a table is unique. Good examples are user ID numbers: each user has one and only one and each id has one and only one user. Primary keys can be combinations of multiple columns; for instance, a time and location for a series of stops along a delivery route. Foreign keys link to primary keys. This is very useful for data integrity.
Say you have 20 users entering data, every time they enter data they need to include someone's first name, last name, and state. It's pointless to check the first name and last name, but the name of every state is known. Your users will probably misspell names, and if given the opportunity, you can expect that they'll misspell "Mississippi" at some point. The best solution, and one that even the users prefer, is to give them a list to select from. In this case, you can kill two birds with one stone. First, you can build a reference table that the user data refers to. Next, you can build the list for the users from that same table. If Canada joins as a state or Texas leaves, you can update the one reference table and both the data and the user list will be updated! First, create your reference table.
CREATE TABLE reference.regions(
id varchar(10) not null,
name varchar(30) not null,
primary key (id)
) engine=InnoDB;
Specifying the engine as InnoDB is important because in MySQL, the MyISAM engine doesn't get along with foreign keys at all. Now that the regions table is created, we can make a user table that references it.
CREATE TABLE business.customers(
id integer auto-increment not null,
firstname varchar(30) not null,
lastname varchar(30) not null,
region varchar(10) not null,
primary key (id),
key customers_fk_region (region),
constraint customers_fk_region foreign key (region) references reference.regions (id) on delete restrict on update cascade
) engine=InnoDB;
There are several things to note that will save you a lot of time looking up error codes. First, the business.customers region column has the same definition as the reference.regions id column; a varchar(11) or integer wouldn't work. Second, the region column in the customers table is keyed by the "key customers_fk_region (region)" line. Every foreign key column needs to be keyed. It doesn't need to be unique; in fact, index is an ideal type for this key. MySQL will handle much of this for you when you add a foreign key to an existing table.
Finally, the actions on delete and update are important. "Cascade" means that any change you make to the reference.regions table will fall down onto the customers table. When applied to the update event, cascade means that a change of the reference table's primary key (say from 'MISS' to 'US-MS') will update any row in the customers table that references that row. If you set "ON DELETE CASCADE" and delete a state from the reference table, it will also delete any customer table entry that references it. The other popular option is RESTRICT; it won't allow changes to a row in the regions table if that row is referenced. Say you have fifty customers in Texas; before deleting Texas from your reference.regions table, you will need to delete those customers that reference the Texas row.
The important thing to remember about foreign keys is that they restrict what data is allowed into a field, and they do so in a way that is fairly easy to maintain. If you company expands into a new state, you just add a new row in the reference table. It's usually easier to have a numeric key for primary keys, such as when you're tracking sales orders. You can do the same with regions, but regions usually have international standard codes, which might be useful to you in the future.
3. Query Logic and Efficiency.
I'm going to start with some pseudo code I found in the wild:
while !done
.iterate the counter
..query entire table
.for each row in the query
..check it against the comparison array at counter(iteration)
..if it's the right one
...print it.
.if the iteration is the same as the size of the comparison array,
..set done = true
Once I finished clawing out my eyeballs, I rewrote it:
query, order by ordering column.
for each row in the results,
.print it.
Now, it was both PHP and MySQL that I discovered this in, but the load time of the page was 90 seconds when I found it and 0.3 seconds when I left because I was not querying an n-line table n times. The key to not bogging down your MySQL server is to query efficiently. If you're looping through something once, do so by looping through the result set and not by looping and retrieving a full result set every time. If you're going through the results multiple times, check your logic to make sure it's necessary, and then try putting the results into an array. You can loop through the array rather than hammering the database with queries. The mileage of this varies depending on the size of your result set, the number of times you need to reference it, etc. Don't put 100 BLOBs (binary large objects) into an array, but if you're querying 100 BLOBs, you might be doing something wrong. You could try building a reference array of ids and names for the BLOBs and pull individual BLOBs inside your loop.
The best way to reduce resource use, however, is to do as much elimination as possible in the query. For example, it's possible to do something like this:
SQL: SELECT * FROM users
PHP/ASP/etc.: for each row retrieved,
.if the last name is 'johnson'
..get the email
It's easier if do this, especially if the lastname column is indexed for quicker searching:
SQL: SELECT email FROM users WHERE lastname='johnson'
PHP/ASP/etc.: get the email(s) returned by the query.
This way, we sharply reduce the rows retrieved from all to just a few. We iterate through fewer things, putting less load on both our database and our web server. The less data you ask for from your database server and the more specifically you ask for it, the easier time you'll have and the less work your web server or application will have to do with the data it retrieves. Remember, a database's purpose is not just to store information but to retrieve data conveniently and quickly.
Lousy queries are not the sole domain of the front end, however; SQL channels are loaded with newbies asking why their queries run slowly. The zeroth thing you should do is read MySQL's documentation of the explain command, which debugs and times queries entered. The first thing to check is keyed columns. Joins and ordering should be done on columns that the database has an indexed record of. The easiest way to ensure snappy queries is to order by primary keys, to index columns frequently used for ordering or querying, and to join on foreign key columns.
Finally, the best way to improve your MySQL skills is to practice and to learn: Read the manual. Ask questions on MySQL channels with the purpose of learning and improving not just to fix your current problems. Read documentation when time permits. Track down discussions of technique in forums or IRC channels. Read the MySQL forums and maybe even buy some books on the subject. |