One of my pet hates is people going to a computer forum and asking something like, "I have a query that takes 50 seconds, how can I reduce it to 10" without a copy of their code and with all the telltale signs that they are nowhere near qualified or experienced enough and who expect people to simply do their work for them. Anyway, I want to partially mitigate this by giving you some beginners advice on improving the performance of queries. After reading these, you should write good queries in the first place, it is definitely easy than rewriting something and hoping you haven't changed anything.
Structural Improvements e.g. always do this
- Avoid carrying out functions like LEFT, LEN etc in queries. These probably betray a poor design (like joining on varchar columns) but might be sometimes unavoidable
- Use WHERE and ON correctly. People sometimes get these confused. Your ON clause should ensure that the rows that get joined are logically matched and then a WHERE clause is to optionally reduce the result set. If you have an ON which is not complete enough, you will multiply the join significantly and then have to reduce it after joining using your WHERE clause.
- Do NOT use CURSORS, in most/all? cases you can use JOIN to avoid them.
- Do NOT use selects in a where clause. It means every row that is matched needs to call another select (300,000 rows = 300,000 selects)
- Test your query on large datasets before deciding it performs acceptably.
- Only select columns you need, not *
- Design your DB tables to link using ID numbers and not text/varchar columns
Design Improvements e.g. you might need to do this
- Use the query analyser to suggest index creation. Understand that indexes usually result in faster selects but slower insert/update/delete so depending on how often a table is read compared to how often it is updated, this may or may not be acceptable. The alternative is an indexed cached table (updated every X minutes)
- Select required data from tables to join into temp tables or table variables, especially if the dataset is large, the table is in another database or you need to join on it multiple times. You might also want to create indexes on these tables. Consider joining a table with 500,000 rows and 80 columns that takes up 180Mb. Creating a temp table with 8 of these columns and perhaps using a where clause to reduce it to 100,000 rows will obviously perform much faster.
- Use cached data if acceptable rather than joining raw tables. If the data you need does not need to be up-to-the-second correct, you might write a proc that updates a cached table every 15 minutes locally and then use this in your join