2008 Summer School: Advanced SQL


This talk is intended to introduce advanced concepts in Structured Query Language (SQL). Examples of each concept are presented.

We will use the SQL Search front end to the SDSS database available here. Throughout, remember that the Schema Browser is your friend. Refer to it often for column names and function definitions.


SELECT: describes what you want.
FROM: describes where to look.
WHERE: describes how to constrain.

The simplest query is just a select and an expression.

SELECT sqrt(100.0)
SELECT getDate()
The next simplest is a select with a from on a single table.
FROM chunk
Next in complexity is a query with constraints.
SELECT run, rerun, camcol, field
FROM field
WHERE camcol between 4 and 6

SQL Dialects

Each SQL database vendor is responsible for implementing the SQL grammar for their particular product. There are standards for SQL and standard extensions (like Transact-SQL), but frequently these standards are not followed to the letter.
A good comilation of the differences between different dialects is located at: http://troels.arvin.dk/db/rdbms/


Often it is nice to return just a few rows to make sure the query is working, or maybe you just want the "most something" values (i.e. brightest, most distant, nearest, bluest, etc.).

SELECT TOP 10 ra, dec 
FROM PhotoPrimary

Aggregate Functions

Aggregate functions are useful for doing simple math on return columns. A common usage is to find the number of records that match a constraint via the COUNT function.

SELECT AVG(mjd_r), COUNT(DISTINCT field), MAX(run), MIN(run), SUM(nObjects)
FROM field


In cases where it appears that an intermediate table is necessary to get what you want, a sub-select may do what you want in a single query. Sub-selects act just like a table in the FROM clause.

SELECT a.ra, a.dec, b.bestobjid
FROM galaxy a, (SELECT TOP 50 ra, dec, bestobjid FROM SpecObjAll where bestobjid > 0) b 
WHERE a.objid = b.bestobjid

A real world usecase is getting multiple lines out of the SDSS lines tables.

SELECT a.ew, b.ew, c.ew, a.specobjid, b.specobjid, c.specobjid, a.name, b.name, c.name 
FROM (select ew, name, specobjid from speclineindex where name = "Lick_CN1") a, (select ew, name, specobjid from speclineindex where name = "Lick_CN2") b, (select ew, name, specobjid from speclineindex where name = "Lick_Hb") c 
WHERE a.specobjid = b.specobjid and b.specobjid = c.specobjid and (a.ew > -9999 and b.ew > -9999 and c.ew > -9999)

User Defined Functions

We won't go into how to create functions, but many dataproviders will define useful functions for you to use. In general, functions operate just like tables. The following example uses the SDSS function for doing a cone search on ra, dec, and r in arcmin. The return is a list of object IDs.

SELECT a.modelMag_g, a.modelMag_r, a.modelMag_i, a.modelMag_g - a.modelMag_r as gr, a.modelMag_r - a.modelMag_i as ri
FROM PhotoObjAll a, dbo.fGetNearbyObjEq(210.0, 35.0, 10) b
WHERE a.objID = b.objID AND a.type=3

Group By Clause

On occasion, one may only care about records in an grouped sense. For eample, the following query returns the average redshift of all objects in the database grouped by their spectral classification name with the constraint that the redshift is less than unity.

SELECT c.name, AVG(s.z) as AVG_Z
FROM specObj s, specClass c 
WHERE s.z < 1 AND s.specClass = c.value
GROUP BY c.name

Having Clause

Further constraints can be placed on the groups created in the GROUP BY clause with the HAVING clause. In the above example, average redshifts were returned for all spectral classes. In some the following example, only spectral classes with star in the name will be returned.

SELECT c.name, AVG(s.z) as AVG_Z
FROM specObj s, specClass c 
WHERE s.z < 1 AND s.specClass = c.value
GROUP BY c.name

Dealing With Flags

Flags may contain lots of information about objects in a relatively compact form. Bitwise operators are the way to deal with these flags.

SELECT TOP 10 ra, dec
FROM photoObjAll 
WHERE flags & dbo.fPhotoFlags('SATURATED') > 0 

The SATURATED flag in binary is 1000000000000000000 which is 262144 in decimal. The & operator computes the bitwise AND of the two arguments, thus if the SATURATED flag is not set, the result will be zero. The | operator is the bitwise OR.

If the database you are using does not have the nice builtin functions for flags that the SDSS DB has, you can just do the bitwise operations on the decimal representation of the binary flag.

SELECT TOP 10 ra, dec
FROM photoObjAll 
WHERE flags & 262144 > 0 


The NVO Summer School is made possible through the support of the National Science Foundation.