How to avoid writing SQL for Ado.Net DataAdapter

Introduction

One of most repetitive, boring and thus error prone tasks when accessing databases from C# (or other language) is writing SQL commands. And when there is explicit SQL, no matter whether it's in stored procedures or inside your code, it has to be maintained. For instance if column is added to database, that column must be added to all 4 commands (INSERT, SELECT, UPDATE, DELETE - aka CRUD), and there are often more than one SELECTs with different filters. There are certainly times when you must write some SQL, but it would be better if machine could create as much as possible.
This article is about writing some code that can do that job. It was made for SQL Server and Ado.Net SqlClient provider.
Update 15/1/2005: I've changed it to support OleDb too. Wherever in text SqlCommandBuilder is referenced it also applies to OleDbCommandBuilder. You shouldn't have too much problems to customize it to other databases and data providers.

What's already included in Ado.Net

Ado.Net comes with one solution for this problem, SqlCommandBuilder class. You still have to provide SELECT command, but INSERT, UPDATE and DELETE are generated for you. Behind scenes CommandBuilder goes to database to fetch required metadata. CommandBuilder has lot of issues, discussed in William Vaughn's article "Weaning Developers from the CommandBuilder". In my opinion greatest problem is if for some reason code it generates doesn't fit your needs there's no way to customize or fix it - you have to abandon it completely. For example if you want to find out value of autoincrement column in SQL Server after insert, you're out of luck. Another serious problem is that it requires roundtrip to database to fetch metadata, so even Microsoft doesn't recommend using it in production code.
Another possible solution is to use Data Adapter Configuration Wizard (DACW), as suggested in Vaughn's article. It works similar to SqlCommandBuilder, but in design time. So you can see generated code and tweak it to your needs. Problem is you have to make same tweaks whenever you generate SQL (e.g. to add retrieval of autogenerated column to INSERT commands). Once you tweaked it you are in manual maintenance mode - computer can't add new columns to these commands for you.

A word or two about Strongly Typed DataSets

When working with plain, untyped DataSets, there are lot of casts and finding columns by name. Both of these activities can't be checked at compile time. If you like when compiler finds as many bugs in your code as possible then you must be using Strongly Typed DataSet (typed DS). It's one of my favorite features of .Net. If something changes in db, you just update typed DS, compile and let compiler find where your code doesn't work with new version of db. Also when you use typed DS you have Intellisense (which is hard to live without). Although it affects performance, benefits usually compensate that difference. If your application can afford to use typed datasets you'll see here how they can reduce amount of SQL that should be written manually.
Typed DS is created from db in similar way CommandBuilder works - database is queried for metadata. Main difference is it's happening on design time instead of run time. This essentially means on run time there already exists enough metadata inside typed DS to construct commands - you can found out names of tables, list of columns, their types, which columns are primary keys, etc. If SqlCommandBuilder knew about them it wouldn't have to make roundtrip to database. Also SqlCommandBuilder have no way of knowing which columns you really need: if some typed DS represents only subset of some database table, only columns represented in that DS should be included in commands.

CustomCommandBuilder

Code presented here will work similar job as SqlCommandBuilder, but under our control, therefore we can make custom changes. For SELECT command we can add filtering or ordering, and still have auto generation of column list. For INSERT we can add support for auto increment fields. For UPDATE command SqlCommandBuilder is using "optimistic locking" support, i.e. only rows that are not changed since we fetched them are written to db. We can choose other kind of concurrency control, e.g. using timestamps or no concurrency control at all.
All relevant classes are in Cogin.Data and Cogin.Data.DataAccess namespace. Class CustomCommandBuilder is one that generates SQL. It's used by SqlDataAccess class, which contains low level Ado.Net code. These 2 classes are not application specific and can be reused in different projects. There are similar 2 classes for OleDb: OleDbCustomCommandBuilder and OleDbDataAccess. DataAccess classes are not used directly, but through IDataAccess interface, so that application specific classes don't have to be changed when we switch from SqlClient to OleDb data provider. They just have to use DataAccessFactory to get appropriate version, which checks config file to see which data provider will be used.
Example of data access classes are CustomersDataAccess and OrdersViewDataAccess. These classes are handling all CRUD operations for sample application and yet there is no SQL in any of them! That was our goal. Methods you have to write are as simple as:
public void FillOrdersForCustomer( string customerId )
{
dataAccess.SelectWithFilter( dataSet.Orders, "CustomerID=@customerId",
new GenericSqlParameter( "@customerId", customerId ) );
}

public void updateCustomers()
{
dataAccess.FullUpdate( dataSet.Customers );
}
Sample Win Forms application is included. It uses Northwind database which you should already have on Sql Server. It displays customers and their orders (filtered and loaded only when needed) and allows updates, inserts and deletes. Unfortunately Northwind doesn't have cascade updates/deletes so you can successfully play only with rows you created. Since that wasn't point of this article I didn't bother to write code to cascade changes to all affected tables, e.g. to Order Details.
CustomCommandBuilder can be used even if you want to use stored procedures instead of plain SQL. It can create calls to stored procedures, and generic procedures as well. Creating these autogenerated stored procs in db can be done manually or automatically on application start.

Performance considerations

When working with db usually the bottleneck is database itself. However I was still interested to find out performance hit CustomCommandBuilder introduces. In order to minimize db overhead test database was on same machine as test program (no network latency). Fastest way is to use stored procedures, so alternatives are compared to that approach. All test cases use typed DS. I didn't try to measure typed DS versus DataReader or some other approach. I tried to answer following questions with this test:
  • Is it slow if SQL commands are generated every time? It's what CustomCommandBuilder is doing, and if command generation is too slow some caching mechanism should be used.
  • Is it slow to recreate data access and typed DS classes for each request?
  • Does it matter if filters are made using SqlParameters compared to plain text?
Last question was important because you can make filters by concatenating strings, e.g.
SelectWithFilter( dataSet.Orders, "CustomerId='" + customerId + "'" ); 
This is slow not because of string concatenation, but because Sql Server receives always different commands and must compile them (parse and determine execution order) every time. Better way to make parameterized filters is:
 SelectWithFilter( dataSet.Orders, "CustomerId=@customerId",
new GenericSqlParameter ( "@customerId", customerId ) );
Sql Server can cache this kind of command, because parameters are transferred separately from command string. As you'll see from benchmark results this is almost as fast as using stored procedures, but much more flexible. Using SqlParameter is not only faster, but safer too because you don't have to worry if parameter contains some illegal character, like ' sign.
I created a small benchmark that also uses Northwind database, to fetch 1000 times random row from Orders table. Stored procedure CustOrdersOrders (which already exists in Northwind) is compared to selecting using generated SQL command, generated again for each request. Variants with and without reusing data access and typed DS classes are measured. Test configuration: Athlon XP 1800+, 512MB RAM, 7200rpm HDD, Win XP SP1, SQL Server 2000. Database is running on same machine. All tests are performed 3 times. Results:

This example is very simplified and limited, but it gives some hints. You should perform tests on your setup with typical real life load, to get relevant data for your case.
As you can see differences are rather small, and worst case is only 25% slower than best case. I consider 25% to be small difference because in real world database is on separate server, queries are much more disk-intensive (during this benchmark disk activity was practically nonexistent because very soon entire Orders table was cached in system memory), so differences should be much smaller. You should notice that "plain text filters" have largest deviation. That's because they are very dependent on "randomness" of required CustomerID, i.e. whenever same filter condition is repeated previously compiled sql command is used. This also means that this kind of filtering is trashing Sql Server's compiled commands cache.
If this performance difference is significant in your case, or you can not afford additional memory overhead of typed DS, you can go DataReader/Stored proc way, but that requires much more work and is much harder to maintain. Best advice is to measure before optimizing anything.
If performance is an issue I would rather focus on way typed data sets are usually used - as set of several data tables which directly map to database tables, i.e. without using joins in queries. Each DataTable object is filled/updated with separate SQL command, and each one means roundtrip to server. This can be improved by packaging several commands together - multiple resultsets will be received with single roundtrip.

Conclusion

CustomCommandBuilder frees you of writing generic SQL code, and yet allows you to choose how it will be generated. Performance hit is usually minimal in real life. Only requirement is that you're using typed data sets, which I would recommend anyway if you can trade some performance hit for increased productivity. If you're using Sql Server everything is already provided, but contrary to Ado.Net's SqlCommandBuilder is you can change it for other servers.

0 comments:

Post a Comment