Saturday, July 7, 2012

SubQuery Example in SQL – Correlated vs Noncorrelated

SubQuery in SQL is a query inside another query. Some time to get a particular information from database you may need to fire two separate sql queries, subQuery is a way to combine or join them in single query. SQL query which is on inner part of main query is called inner query while outer part of main query is called outer query. for example in below sql query

SELECT name FROM City WHERE pincode IN (SELECT pincode FROM pin WHERE zone='west')

section not highlighted is OUTER query while section highlighted with grey is INNER query. In this SQL tutorial we will see both Correlated and non correlated sub-query and there examples, some differences between correlated and noncorrelated subqueries and finally subquery vs join which is classic debatable topic in SQL. By the way this SQL tutorial is next in series of SQL and database articles in Javarevisited like truncate vs delete and 10 examples of  SELECT queries. If you are new here then you may find those examples interesting.

SubQuery Rules in SQL
Like any other concept in SQL, subquery also has some rules and you can only embed one query inside another by following rules :
1. subquery can be used in insert statement.
2. subquery can be used in select statement as column.
3. subquery should always return either a scaler value if used with where clause or value from a column if used with IN or NOT IN clause.

Before going to understand non-correlated  and correlated subquery, let’s see the table and data which we are going to use in this example. Until you have an understanding of how table look like and what kind of data it stores its little difficult to understand queries. In this subquery example we will use two table Stock and Market. Stock holds different stocks and Market holds all stock exchanges in the world.

mysql> select * from stock;
+---------+-------------------------+--------------------+
| RIC     | COMPANY                 | LISTED_ON_EXCHANGE |
+---------+-------------------------+--------------------+
| 6758.T  | Sony                    | T                  |
| GOOG.O  | Google Inc              | O                  |
| GS.N    | Goldman Sachs Group Inc | N                  |
| INDIGO  | INDIGO Airlines         | NULL               |
| INFY.BO | InfoSys                 | BO                 |
| VOD.L   | Vodafone Group PLC      | L                  |
+---------+-------------------------+--------------------+
6 rows in set (0.00 sec)

mysql> select  from Market;
+------+-------------------------+---------------+
| RIC  | NAME                    | COUNTRY       |
+------+-------------------------+---------------+
| T    | Tokyo Stock Exchange    | Japan         |
| O    | NASDAQ                  | United States |
| N    | New York Stock Exchange | United States |
| BO   | Bombay Stock Exchange   | India         |
+------+-------------------------+---------------+
4 rows in set (0.00 sec)


Noncorrelated subquery in SQL

There are two kind of subquery in SQL one is called non-correlated and other is called correlated subquery. In non correlated subquery, inner query doesn't depend on outer query and can run as stand alone query.Subquery used along-with IN or NOT IN sql clause is good examples of Noncorrelated subquery in SQL. Let's a noncorrelated subquery example to understand it better.

NonCorrelated Subquery Example:
Difference between correlated and noncorrelated suqueryLet’s see the query  “Find all stocks from Japan”, If we analyze this query we know that stock names are stored in Stock table while Country name is stored in Market table, so we need to fire two query first to get RIC for Japanese market and than all stocks which is listed on that Market. we can combine these two queries into one sql query by using subquery as shown in below example:

mysql> SELECT COMPANY FROM Stock WHERE LISTED_ON_EXCHANGE = (SELECT RIC FROM Market WHERE COUNTRY='Japan');
+---------+
| COMPANY |
+---------+
| Sony    |
+---------+
1 row IN SET (0.02 sec)

Here part which is inside bracket is called inner query or subquery. As you see in this example of subquery, inner query can run alone and its not depended on outer query and that's why its called NonCorrelated query.

NonCorrelated Subquery Example with IN Clause SQL
NonCorrelated subquery are used along-with IN and NOT IN clause. here is an example of subquery with IN clause in SQL.
SQL query: Find all stocks from United States and India

mysql> SELECT COMPANY FROM Stock WHERE LISTED_ON_EXCHANGE IN (SELECT RIC FROM Market WHERE COUNTRY='United States' OR COUNTRY= 'INDIA');
+-------------------------+
| COMPANY                 |
+-------------------------+
| Google Inc              |
| Goldman Sachs GROUP Inc |
| InfoSys                 |
+-------------------------+

When Subquery is used along-with IN or NOT IN Clause it returns result from one column instead of Scaler value.

Correlated SubQuery in SQL

Correlated subqueries are the one in which inner query or subquery reference outer query. Outer query needs to be executed before inner query. One of the most common example of correlated subquery is using keywords exits and not exits. An important point to note is that correlated subqueries are slower queries and one should avoid it as much as possible.

Example of Correlated Subquery in SQL
Here is an example of Correlated subquery “Return all markets which has at least one stock listed on it.”

mysql> SELECT m.NAME FROM Market m WHERE m.RIC = (SELECT s.LISTED_ON_EXCHANGE FROM Stock s WHERE s.LISTED_ON_EXCHANGE=m.RIC);

+-------------------------+
| NAME                    |
+-------------------------+
| Tokyo Stock Exchange    |
| NASDAQ                  |
| New York Stock Exchange |
| Bombay Stock Exchange   |
+-------------------------+
4 rows IN SET (0.00 sec)

Here inner query will execute for every Market as RIC will be changed for every market.

Difference between Correlated and NonCorrelated Subquery

Now we have seen correlated and noncorrelated subqueries and there example its much easier to understand difference between correlated vs noncorrelated queries. By the way this is also one of the popular sql interview question and its good to know few differences:

1.In case of correlated subquery inner query depends on outer query while in case of noncorrelated query inner query or subquery doesn't depends on outer query and run by its own.
2.In case of correlated subquery, outer query executed before inner query or subquery while in case of NonCorrelated subquery inner query executes before outer query.
3.Correlated Sub-queries are slower than non correlated subquery and should be avoided in favor of sql joins.
4.Common example of correlated subquery is using exits and not exists keyword while non correlated query mostly use IN or NOT IN keywords.

SubQuery vs Join in SQL

Any information which you retrieve from database using subquery can be retrieved by using different types os joins also. Since SQL is flexible and it provides different way of doing same thing. Some people find SQL Joins confusing and subquery specially noncorrelated more intuitive but in terms of performance SQL Joins are more efficient than subqueries.

Important points about SubQuery in DBMS
1.Almost whatever you want to do with subquery can also be done using join, it just matter of choice
subquery seems more intuitive to many user.
2.Subquery normally return an scaler value as result or result from one column if used along with
IN Clause.
3.You can use subqueries in four places: subquery as a column in select clause,
4.In case of correlated subquery outer query gets processed before inner query.


Other SQL and Database related articles from Javarevisited :

7 comments :

Naresh Jagatap said...

hi paul, here in the "Example of Correlated Subquery in SQL" , you have not used exists or not exists keyword. is it a correlated query?

Anonymous said...

Sorry but it's not correct. EXISTS clauses on statements are faster than the noncorrelates that you are refering to.

Anonymous said...

It is possible to use subqueries that return multiple values per row. Example of a valid query:
SELECT name FROM City WHERE (pincode,timezone) IN (SELECT pincode, timezone FROM pin WHERE zone='west');

Kiran Kumar said...

Use "=" , when subquery return single value on (single column)
Use "IN or NOT IN" clause, when subquery return multiple value on (single column)
Use "EXISTS or NOT" EXISTS clause, when subquery return multiple value on (multiple column)

Kavitha said...

"One of the most common example of correlated subquery is using keywords exits and not exits" In these lines it should be EXISTS and NOT EXISTS, you miss one of the "S" there.

Servanthi said...

One of the best example of correlated sub query is to to write sql query find the second maximum number in a table e.g. finding second largest salary in employee table. Look at below subquery, its a correlated because inner query is evaluated every time a new row is processed by outer query.

SELECT salary FROM EMP e1 WHERE 2 = (SELECT COUNT(DISTINCT salary) FROM EMP e2 WHERE e2.salary >= e1.salary )

You can use this query to find second, third, fourth or nth highest salary, just change the number after WHERE clause, e.g. for third highest salary put WHERE 3 = (result of correlated subquery).

How does this query will work?

When first row will be processed by outer query, inner query will find how many rows exists where salary is greater than current salary, if it was maxium then only correlated sub query will return 1, if the current salry is second largest then inner query will return 2 and if the current rows's salary is third largest then inner query will return 3.

Why this is an example of correlated query?
because inner query references outer query for comparing salary, here is the inner query portion :

SELECT COUNT(DISTINCT salary) FROM EMP e2 WHERE e2.salary >= e1.salary

you can see that it is using "e1.salary" which is from outer query.

Akshay said...

This is a good interview question to judge knowledge of advanced SQL developer. Regarding, what is the difference between correlated and uncorrelated subquery, here are my 2 cents :

1) You can execute an uncorrelated subquery independently but you cannot execute correlated subquery independently. SQL engine will complain about missing table reference for example in following non-correlated subquery, you can execute inner quey independently :

select * from Employee where salary = (select MAX(salary) from Employee)

here select * from Employee where salary is outer query and (select MAX(salary) from Employee) is inner query, which can be executed independtly.

On the other hand, if you try to exeucte inner query in case of following correlated subquery, you will get missing table reference error

SELECT MAX(Number) FROM Numbers n1 WHERE 2 = (SELECT CONT(DISTINCT Number) from Numbers WHERE n2.number>=n1.number).

here inner query is (SELECT CONT(DISTINCT Number) from Numbers WHERE n2.number>=n1.number), which is correlated because it is reffering to table alias n1, which is part of outer query. if you try to run this query independently, you will get compile time error.

2) Another significant differnece between these two is that uncorrelated subquery execute only once irrespective of how many rows in outer table, but correlated query execute for each reocrd processed by outer query.

3) Due to above reason, correlated query is not very efficient compared to normal subqueries.

4) You can use correlated queries with EXISTS and NOT EXISTS clause to search based upon value provided by outer query.

Since, correlated sub query is not very efficient, you should use it only when you need single result. Using it on large table with so many rows will make it run slower.

Post a Comment