Why SQL Sucks
About two weeks ago at work we discussed different strategies on testing databases. Although there are ways to do it, they are all more or less painful. And finally we all parted in agreeing: (SQL) databases suck, SQL sucks and testing databases and SQL sucks as well. We are only testing this crap because the only thing worse then database tests are untested databases. But complaining doesn’t help. I have heard you are supposed to ask 5 times “why?” when faced with a problem. Maybe it was 7 times? Don’t know lets start with asking it once:
Why does SQL suck?
Hard to parse for computers: I don’t know much about parsing, but it seems to be hard for a computer to identify the bits and pieces of a SQL statement, when it is not really SQL but just almost SQL. This results in really crappy IDE support, because the editor doesn’t know what you are trying to write, before you finished it.
Hard to parse for humans: One of the ideas behind the design of SQL was to make it look almost like a sentence. While this works for trivial statements, it completely falls a part when you have some serious statement at hand. For an extremely bad example check out the turing engine implemented in Oracle SQL. Side Note: aren’t some fluent APIs trying to do just that.
Not consistent:Why does the having clause exist? Why nothing similar for group by? Who came up with the syntax of analytic functions. Why can I reference an expression defined the column list in the where clause, but not in the group by clause?
(Almost) no modularization: You need this real complex where condition in another select? Copy it. You need to join the same bunch of table, just with a tiny change in one condition? Copy it. You want to use the same group by clause in a different select? Copy it. There are few little things you can do to break your SQL code, but it is nothing compared to the power of a modern language like … e.g. … Fortran 66.
No decent development environment appart from the target environment When you write a java application (and you do it the way it is supposed to be done), most of it can get compiled, and executed in a simple JVM. No web server needed, nor application server. Compare that to SQL. The only thing that can properly compile and execute a SQL statement is a database. And not just any database, but a database with all the accessed tables created and filled with data. This does not help when writing SQL code.
Ok, we have some reasons for SQL sucking so badly. What’s next. How about asking “Why?” again?
Why are all these things true about SQL?
That answer is easy: I have no idea. So let’s step right ahead and ask again “Why”
Why does a curious developer with over 10 years of experience in database development know some reasons why SQL sucks, but can’t answer the simple question, why these reasons apply?
I think it is because nobody really discusses SQL (at least not loud enough for me to hear). I know there is the No-SQL-Movement, but I happen to think that a relational database is a pretty decent solution to many things. There are also things like the Hibernate Criteria API and HQL. But those do not operate on database schemas, but on classes and objects. While this is fine when you wan’t to work with classes and objects, it sucks when you just want to process large tables. When processiong 100K database records pulling each out of the database, instanciate an object, manipulating it and writing it back is not an option.
Yet a way to write queries in a more modular, easy to parse and easy to test way would be highly appreciated.
I see a lot of complaints about SQL and how it’s so hard to write, etc., but I honestly don’t see the problem. If you understand ANSI SQL and understand the variations your database(s) support, there should be little trouble. If you want more validation of your sql statements, make them stored procedures.
I personally find that I generate many on-the-fly sql calls, based on passed-in parameters. Create methods that add to your sql statement, which would allow for the re-use you mention.
My answers to your points…
Hard to parse.
Hardly. Intellisense does a very good job of it. It’s a simple language and very easy to parse with simple code.
Hard to parse for humans.
I find it quite easy. If you write bad statements then you get bad readability but isn’t that true of all languages?
Not consistent.
Your argument about the HAVING clause shows lack of knowledge. HAVING and WHERE are two separate and completely different statements. One is for grouping, one for row filtering. They can be used together for different purposes.
There is a variation between systems. Some systems allow you to use expressions wherever you like, some don’t. There is an ANSI SQL but people tend to go outside of this. It’s like HTML in that respect. Not ideal but it’s up to yuo as the developer to use the right stuff if you want it to be fully portable.
(Almost) no modularization
What about views and functions? They do what you need. Depending upon your SQL implementation you can index them too.
No decent development environment appart from the target environment
Depends upon your system. SQL Server has a great environment provided with the software and you can use Visual Studio too. Other systems may not be so good but there are open source tools out there.
Why are all these things true about SQL?
They aren’t.
It appears to me you have got the title wrong. It should read : Why I suck at SQL.
Or “Why people with decades old ideas flame me”.
Readability:
sql is very human readable
sql is somewhat intelisense, most problematic is that one can’t use columns in select until from clause is defined. (see ms-linq dialect that ensures that from clause is first)
–
Consistency:
”On” clause is used to define which records to match during join
”Where” clause is used to define which records to process in aggregates
”Having” clause is used to define which records to return in the resultset
Very consistent! Three ways to filter data, but each during specific purpose :
on : filter on join
where: filter after join before aggregate
having: filter after aggregate.
Even their order says clearly which ones are processed when: On in joins, then where, then you group (aka: define aggregate) then you filter on aggregate results (having clause).
–
Modularization
What about views? inherited tables? materialized views? Scalar functions? Table functions? Custom aggregate functions? Triggers? Relationships (PK/FK)? Computed columns?
–
Target environment issue
You are complaining that sql can only be tested in same database with same schema as your system expects?
It is like complaining that your JVM class can only be tested in java (not c++), and only if it contains other classes that it inherits from.
Cheers
Adam Zochowski
Wow, so many comments, what an honor for my little blog. Thanx everybody for contributing
@ C.W.Zachary
It’s not so much that I consider SQL difficult to write. Statements up to a couple hundred lines are fairly straight forward to write I think. But they are a nightmare to maintain, read, reuse and test. Part of the problem is the persistent nature of databases, but part of it is the due to the nature of SQL
@Jim Danby
I might be wrong on the parsing part, maybe its just the tool vendors that took awfully long to come up with something usefull. I agree though that the capabilities of IntelliJ IDEA are pretty impressive when it comes to intellisense on SQL. Interesting though that this comes form a ‘mainly Java IDE’
Having and where do exactly the same thing, just at different time in the processing. You can ‘proof’ that by replacing a having by a where clause on a inner select, or replacing a where clause by a having clause after grouping on all columns (assuming the rows are unique, which can be achieved by a row counting column)
No, views do only very little of what I need. For an example consider the sql statement in this article: /2008/08/20/sql-tricks-3-intervall-join (don’t worry about the german noise around it and sorry for the bad formatting). As you can see it contains exactly the same subsqlect just on different tables. There are no easy ways to reduce this code duplication. The same applies for joins, group by and combinations of columns.
@Adam Zochowski Your comparison is only a little off. I am fine with needing the execution environment (rdbms vs jvm). But yes I hate it that there is no concept comparable to interfaces, which allows to exchange the tables a sql statement operates on or (the currently not existing) modular pieces of a sql statement. It would allow for true unit testing of sql statments (among others)
Of course if you rely heavily on inheritance in an class based OO language you might easily end up in the same kind of mess one is stuck in with SQL.
Hard to parse for computers : Don’t care i sure it’s possible.
Hard to parse for humans : Yes because SQL is very expressive and concise. So a few line can do many complex things.
Not consistent: Do not agree, it’s logical and consistent if you think at what it really does.
(Almost) no modularization: Don’t use plain SQL. You should generate it. It’s fairly esay, for example in PL-SQL. That what I do all the time. But you are true, we need more modularity.
No decent development environment appart from the target environment : True I know what you can have when you develop java, and at least, all SQL environment i know sucks. (didn’t try Visual studio…)
Why does a curious developer with over 10 years of experience in database development know some reasons why SQL sucks, but can’t answer the simple question, why these reasons apply?
Pleople don’t know, don’t like SQL. SQL is old fashioned. But they forget, when you do do some serious data manipulation, not only on 100K line, but on 10 000K line (we have 30 000K to 300 000K line in our table and have to compute sum, aggregate etc on it).
In SQL it is just a few minutes hours. And a few line of code. Do it with hibernate and java : hundred, thousand line of code, chance you have Out of Memory Exceptions and it will be far far slower.
@Nicolas
Well PL-SQL isn’t that much better then SQL when it comes to the design of the language. You do get modularization but you loose the set base declaritive way of thinking and working which is a strength of SQL
And I agree with your last two paragraphs. While hibernate is great for many things, it sucks extremely at large volume processing. And SQL shows it powers. Yet this doesn’t make it a nice language.
I am shocked… SHOCKED… at how many people have commented that SQL is easy to understand. SQL has been the bane of my multi-year developer career. I mean how can anyone understand this little beauty:
SELECT
COUNT(*) AS Pipeline_Count
, SUM(CASE WHEN OP.Rating_BidderRate > 3 THEN 1 ELSE 0 END) AS PipelineProbableWinCount
, SUM(OP.EstimateAward_Value) AS Pipeline_TotalValue
, SUM(CASE WHEN OP.Rating_BidderRate > 3 THEN OP.EstimateAward_Value ELSE 0 END) AS Pipeline_ProbableWinTotalValue
, SUM(CASE WHEN O.Recompete = 1 AND OP.Rating_BidderRate >= 7 THEN OP.EstimateAward_Value ELSE 0 END) AS Pipeline_RecompeteIncumbentTotalValue
, SUM(CASE WHEN O.Recompete = 0 AND OP.Rating_BidderRate > 3 THEN OP.EstimateAward_Value ELSE 0 END) AS Pipeline_NewTotalValue
, SUM(CASE WHEN O.Recompete = 1 AND OP.Rating_BidderRate BETWEEN 4 AND 6 THEN OP.EstimateAward_Value ELSE 0 END) AS Pipeline_RecompeteNonIncumbentTotalValue
, C.Id AS CompanyId
FROM
dbo.OpportunityPipeline OP WITH (NOLOCK)
[...] Why SQL Sucks [...]
Ich finde die Einwände etwas naiv.
SQL ist eine relationale Sprache und ist im Umgang mit Mengen imperativen Sprachen immer überlegen. Sie ist mit logischen Sprachen verwandt wie RegEx oder Prolog, aber mit imperativen wie Fortran. SQL ist regulären Mengen und regulären Ausdrücken daher verwandter als Imperativen und Prozeduren, gleich ob SQL prozedurale Programmierung ähnlich RegEx auf eine Art unterstützt.
Ich empfehle dringend ein Studium Datenstrukturen und Mathematik (Mengenlehre, Relationen, Abbildungen). Dann leuchten einem die Vorteile ein. Es hat Gründe, warum man AIs, Graphen oder Netzwerke mit Hilfe solcher Sprachen programmiert.
Hi notnot,
ich habe keinerlei Interesse SQL durch etwas imperatives zu ersetzen. In der Tat ist die deklarative Natur von SQL eine Stärke, ebenso wie die mathematischen Fundamente.
Das ist aber in keiner Weise ein Gegenargument gegen meine Argumente. Mein Beispiel von Fortran als moderne Sprache war in dieser Hinsicht vielleicht etwas ungeschickt. Es ging mir dabei nur um ein Beispiel für eine Sprache, die zumindest grundlegende Mechanismen der Modularisierung unterstützt.
So, what’s the alternative?