Ad Hoc Query Performance Best Practices

By: Enterprise Singularity Team

Posted on: October 12, 2017

Optimizing performance in Ad Hoc Query can be challenging. The optimizer does well with simple queries, but struggles with larger datasets pulled from multiple tables. This is because the optimizer does not have time to examine every option and must use a simplified approach. Essentially, it looks for a few good first steps, then a few good second steps, and keeps working until it has a good plan or it runs out of time. With complex queries, it might run out of time before it can find the best approach.

Given that we cannot always avoid complex queries, how do we ensure good performance?

Queries perform best when the following things are true:

  1. Selective filters are used
  2. Selective filter columns are indexed
  3. Indexes can be used
  4. The result set is small

Selective Filters

What is a “selective filter”? A selective filter is one that eliminates most of the data in a table. A filter on record ID is very selective: it matches either zero or one row. A filter for records from the last 3 years is not selective; it may eliminate only half the data in a table. Highly selective filters make great first steps. It is much faster to join to one row than to one million.

As an example, checking to see what financial transactions have changed this month will allow SQL Server to eliminate most rows in the table immediately. On a database with 1 million revenue records, the following query returns in 13 milliseconds and performs 3 reads.

[Screenshot: Ad hoc query with the filter "date changed is this month"]
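The generated SQL is roughly equivalent to the hand-written sketch below. The table and column names are hypothetical stand-ins, not the actual Ad Hoc Query output; the point is that a narrow date range on an indexed column lets the optimizer seek to just the matching rows:

    -- Hypothetical equivalent of the filter above; Ad Hoc Query builds
    -- its own SQL. Assumes an index on REVENUE.DATECHANGED.
    SELECT R.ID, R.AMOUNT
    FROM dbo.REVENUE AS R
    WHERE R.DATECHANGED >= '2017-10-01'
      AND R.DATECHANGED <  '2017-11-01'; -- narrow range: few rows survive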

Caution: “Or” statements can make selective filters useless. For instance, a query looking for any activity on a revenue record (payment, write-off, etc.) within a month would not be selective. SQL Server must check every referenced table to see whether the record matches in any of them before it can eliminate the row. The query below takes 1200 milliseconds to run and performs 49,000 reads, despite there being 0 rows in the ADJUSTMENT table!

[Screenshot: Ad hoc query with filters "date changed is this month or adjustment date changed is this month"]
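In hand-written SQL, the problem looks roughly like this (hypothetical names again). Because the “or” spans columns from two tables, neither filter can eliminate rows until both tables have been joined:

    -- Hypothetical sketch: the OR prevents either index from filtering
    -- on its own, so every REVENUE row must be joined and checked.
    SELECT R.ID, R.AMOUNT
    FROM dbo.REVENUE AS R
    LEFT JOIN dbo.ADJUSTMENT AS A ON A.REVENUEID = R.ID
    WHERE (R.DATECHANGED >= '2017-10-01' AND R.DATECHANGED < '2017-11-01')
       OR (A.DATECHANGED >= '2017-10-01' AND A.DATECHANGED < '2017-11-01');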

What are your options if you need to get data from multiple tables like above? Here are a few:

  1. Capture information from multiple sources in one place. This could be a selection, an attribute, a customization, a smart field, etc. Examples include:
    1. If you have several queries that look for constituents meeting the same criteria (which uses an “or”), see if creating a constituent static selection will improve the performance of those queries.
    2. Capture information for varied criteria in a solicit code or attribute. Use a business process to update the solicit code/attribute. Read from those items in a query rather than the original criteria.
  2. Use Merge Selections. Merge Selections is a tool that lets you combine two selections. It combines data with a “union” rather than an “or”, which is more efficient when each source selection is small and fast (see the sketch after this list). Merge Selections has the following limitations.
    1. Merge selections are limited to 2 selections at once. If you want to create a merged selection with, for example, three selections, you must merge the first two. Then, merge the merged pair with the third.
    2. Just like normal selections, merged selections only filter records at the root node level.
  3. If the result sets from each filter group do not overlap, run them separately.
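Conceptually, a merged selection replaces the “or” above with a union of two independently selective queries. In the hypothetical sketch below, each branch can use its own index and return a small result before the two are combined:

    -- Hypothetical sketch of the union approach: each branch is
    -- selective on its own, and UNION removes duplicates at the end.
    SELECT R.ID
    FROM dbo.REVENUE AS R
    WHERE R.DATECHANGED >= '2017-10-01' AND R.DATECHANGED < '2017-11-01'
    UNION
    SELECT A.REVENUEID
    FROM dbo.ADJUSTMENT AS A
    WHERE A.DATECHANGED >= '2017-10-01' AND A.DATECHANGED < '2017-11-01';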

Indexing Columns

Indexes have two major performance advantages (illustrated in the sketch after this list):

  1. Indexes allow seeks. When selecting small numbers of records, seeks are much faster than scans. A seek is like looking a name up in the phone book by last name. A scan is starting from the beginning of the phone book and reading each entry until you find the right name.
  2. Nonclustered indexes retrieve data more efficiently. Nonclustered indexes only contain chosen columns from the table. This means that reading from a nonclustered index is usually more efficient than reading directly from the table (i.e. the clustered index).
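As a sketch (the index, table, and column names are illustrative), a narrow nonclustered index gives the lookup below both advantages: the filter can seek, and the included column means the table itself is never touched:

    -- Hypothetical: a narrow nonclustered index on LOOKUPID that also
    -- carries AMOUNT, so the query below never reads the clustered index.
    CREATE NONCLUSTERED INDEX IX_REVENUE_LOOKUPID
        ON dbo.REVENUE (LOOKUPID)
        INCLUDE (AMOUNT);

    SELECT AMOUNT
    FROM dbo.REVENUE
    WHERE LOOKUPID = 'rev-58921'; -- index seek: matches 0 or 1 rows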

It is important to remember that indexes have costs. They take up space, slow down add/update/delete operations, increase maintenance time, and force the optimizer to consider more options. Before you add a new index, ask yourself the following questions.

    1. Can you filter on a column that already has an index? For instance, if you are looking for a specific record, try using record ID instead of name.
    2. Is there an index you can modify? If you can add a column to an existing index to meet your needs, that is usually better than creating a new one.
    3. Is the performance benefit worth the overhead? Run tests to ensure your index boosts query performance. For large, heavily-used tables, your queries must benefit significantly to make it worthwhile.
    4. Would a filtered index be better? Read more here, and see the sketch after this list.
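As a hypothetical example of the last point, a filtered index covers only the rows a query actually touches, which keeps it small and cheap to maintain:

    -- Hypothetical filtered index: includes only open transactions,
    -- so it stays small even on a large table.
    CREATE NONCLUSTERED INDEX IX_REVENUE_OPEN_DATECHANGED
        ON dbo.REVENUE (DATECHANGED)
        WHERE STATUSCODE = 0;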

For information on how to add indexes to out-of-the-box tables, see this article.

Is the Index Usable?

Just because a column is indexed does not mean the optimizer can use the index for a query. Filter clauses that allow index usage are called “sargable” (from “Search ARGument-able”). For instance, exact matches and “begins with” statements allow indexes to be used. The following will prevent indexes from being used (examples follow the list):

  1. Clauses that perform a function on the indexed column
  2. Clauses that use “contains” instead of “begins with”
  3. Clauses that modify an indexed column before comparing it
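For example (with hypothetical column names), each pair below asks the same question, but only the second form is sargable:

    -- 1. A function on the indexed column forces a scan:
    SELECT ID FROM dbo.REVENUE WHERE YEAR(DATECHANGED) = 2017;
    -- Sargable rewrite: compare the bare column against a range.
    SELECT ID FROM dbo.REVENUE
    WHERE DATECHANGED >= '2017-01-01' AND DATECHANGED < '2018-01-01';

    -- 2. "Contains" (leading wildcard) forces a scan:
    SELECT ID FROM dbo.CONSTITUENT WHERE KEYNAME LIKE '%smith%';
    -- "Begins with" (trailing wildcard only) can seek:
    SELECT ID FROM dbo.CONSTITUENT WHERE KEYNAME LIKE 'smith%';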

Additionally, not all index usage is created equal. Exact matches give better estimates than “begins with” and other less-precise filters. Better estimates lead to better plans. Use exact matches whenever possible.

Finally, Ad Hoc Query uses views. Use sp_helptext with the “Results to Text” option enabled to dig into query views. A single column returned from a view may be a combination of columns from the source tables, which will prevent an index from being used.
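For example, the call below prints a view’s definition so you can check whether a returned column is computed from several source columns. The view name is a placeholder; substitute the query view you are investigating:

    -- View name is hypothetical; use the view behind your query.
    EXEC sp_helptext 'dbo.V_QUERY_REVENUE';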

Small Result Sets

In general, data pulled from the OLTP system should be small. If you are pulling a year or more’s worth of data, consider using the Data Warehouse instead. The Data Warehouse calculates complex information in advance and optimizes data for reading. This preprocessing can result in significant performance improvements. Pulling from the Data Warehouse also means less blocking in the OLTP system.

Other Tips and Tricks

  1. Test queries in a development environment first. Testing in a development environment will allow for optimization and prevent surprises in the production environment.
  2. Learn SQL Server Profiler or Extended Events. Checking the runtime of your query is an insufficient way to measure performance. Many small details can alter your runtime (for instance, running on a warm cache versus a cold one). The best way to measure the costs of a query is to use a tracing tool. Running repeated tests with a trace running will give you important details like CPU time, total reads, and total writes (see the sketch after this list).
  3. Get familiar with query execution plans. The best way to understand how a query is retrieving data is to examine the execution plan. Once you measure the costs of the query, use the execution plan to identify ways to reduce those costs. Check out this free book to learn more.
  4. When optimizing, beware of data and plan caching. It is always slower to pull data from disk than from memory. If you run your “baseline” query first and your optimized query second without clearing the cache in between, you will get misleading results. Additionally, if you change something but don’t clear your plan cache, the optimizer may reuse the old plan. Remember to clear your caches only in test environments; clearing them in production environments will lead to system-wide slowness.
  5. Get a peer review. Have another engineer look over your code. They can help you spot things you’ve overlooked and prevent headaches down the road. Furthermore, it’s a great way to sharpen your skills.
  6. Check out the Developer’s Conference presentation on Designing and Troubleshooting for Performance.
  7. Collaborate with organizations that do similar work. If you’re facing challenges with performance, it’s likely that other organizations have faced them as well. They may be able to provide you with useful ideas.
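As a minimal sketch of the measurement workflow from items 2 and 4 (test environments only; the session name is arbitrary):

    -- TEST ENVIRONMENTS ONLY: clear caches so repeated runs are comparable.
    CHECKPOINT;             -- flush dirty pages before dropping the cache
    DBCC DROPCLEANBUFFERS;  -- cold buffer cache: data comes from disk
    DBCC FREEPROCCACHE;     -- discard cached plans so queries recompile

    -- A small Extended Events session recording CPU time, reads, and
    -- writes for each completed statement, with the statement text.
    CREATE EVENT SESSION AdHocQueryTrace ON SERVER
    ADD EVENT sqlserver.sql_statement_completed
        (ACTION (sqlserver.sql_text))
    ADD TARGET package0.event_file (SET filename = N'AdHocQueryTrace');

    ALTER EVENT SESSION AdHocQueryTrace ON SERVER STATE = START;
    -- ... run the query under test several times ...
    ALTER EVENT SESSION AdHocQueryTrace ON SERVER STATE = STOP;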

Further Reading

Nick DeWitt gave an excellent presentation on the SQL Server optimizer during PASS Summit 2010. If you are interested in learning more about how it works, you can view the slides here. The optimizer has undergone changes since the presentation, but it is still a great overview.
