SQL

What is SQL? Learn Trends in Data Analytics & Warehousing

collation and SQL

What is SQL?

What is SQL? SQL stands for Structured Query Language. Application Developers, reporting, and BI Professionals use SQL to insert, retrieve, update, and delete data from databases. IBM initially developed SQL in their Santa Teresa labs in San Jose, CA.

What is SQL and How Does it Work?

SQL (pronounced “see-qu-el”) is a procedural language that often has non-standard procedural extensions. SQL standards bodies include ANSI (American National Standards Institute), and it is the standard language for relational database management systems. SQL is primarily used to manage databases that conform to the relational model and are managed by an RDBMS (Relational Database Management System). The SQL language uses an interpreter to parse the statement. The results from the parse step, which identifies syntactical errors, are passed to the database optimizer. The optimizer creates a query plan, which is a set of instructions that specify how the database engine will optimally execute the statement. The query plan states what keys will be used to get intermediate result sets and how they will be merged and sorted to arrive at the result set, which is returned to the calling application.

What is SQL Used For?

SQL allows applications to interact with a database. An application user typically accesses a collection of database objects that include tables, views, and indexes. A table object in a relational database consists of logical rows and columns. Some RDBMS systems store tables as rows, known as a row store, while others may store columns as a column store. In either case, the application operates on a row basis irrespective of the storage format.

The SELECT CUSTOMER-NAME FROM CUSTOMERS WHERE CUSTOMER-ID =10; statement will fetch the customers name whose CUSTOMER-ID is 10 from the CUSTOMERS table. If the table contains a STATE column, the following SQL statement will retrieve just the customers in Nevada:

SELECT * FROM CUSTOMERS WHERE STATE = 'NV';

Multiple tables can be related to each other by sharing common columns. These columns can be indexed to allow for faster access or simply fully scanned by the RDBMS if they are not too large.

Tables can be logically joined at execution time, and the nature of the join can be expressed in SQL.

The following are the types of joins where left and right are the order of the tables named in the SELECT statement:

INNER JOIN: This join returns those records which have matching values in both the tables.

FULL JOIN: This join returns all those records which either have a match in the left or the right table.

LEFT JOIN: This type of join returns records from the left table, along with those records which satisfy the condition from the right table.

RIGHT JOIN: This kind of join returns records from the right table, along with those records which satisfy the condition from the left table.

Tables can be related through primary and foreign key column relationships. For example, a customer table might have a primary key of Customer-ID. An Orders table containing a Customer-ID column will be known as a foreign key in that table.

Primary keys are usually indexed and contain unique values.

Applications use SQL to retrieve and insert data from databases, including distributed databases.

Below are some examples of SQL statements:

  • SELECT – gets data from a table or view
  • CREATE TABLE – is used to create a table of data
  • INSERT – is used to insert new rows into a table
  • UPDATE – is used to make changes to data contained in a database table
  • DELETE – is used to remove rows of data
  • GRANT – is used to give users the permission to view or modify database objects such as tables
  • REVOKE – is used to remove permissions from users or groups of users
  • ALTER – allows the user to add or remove columns from an object to update permissions
  • DROP TABLE – deletes a table
  • CREATE INDEX – will create an INDEX to allow for efficient access to data using a key-value
  • TRUNCATE TABLE – allows the user to empty the contents of a table

The SELECT statement can contain aggregation functions, such as:

  • COUNT() – returns the total number of rows that satisfy the criteria in the statement
  • MIN() – returns the smallest value in the result set
  • MAX() – returns the largest value in the result set
  • AVG() – returns the average of the values in the result set
  • GROUP BY – will group the results by a given column value
  • ORDER BY – will sort the results by the specified column

 

Comments in SQL help with understanding and maintaining applications. Below are single and multi-line examples of comments:

-- This is my comment

/* This is 

multiple line 

comment */

It helps to visualize a database schema using a Venn Diagram, especially when using set operators such as UNION, INTERSECT and EXCEPT in SELECT statements.

UNION – This operator is used to combine the result-set of two or more SELECT statements.

Example:

SELECT Columns FROM Table1

UNION

SELECT Columns FROM Table2;

INTERSECT – This clause is used to combine two SELECT statements and return the intersection of the data sets of both the SELECT statements.

Example:

SELECT Column1 , Column2 ....

FROM TableName

WHERE Condition

INTERSECT

SELECT Column1, Column2 ....

FROM TableName

WHERE Condition

EXCEPT – This operator returns those rows that are returned by the first SELECT operation and are not returned by the second SELECT operation.

Example:

SELECT ColumnName

FROM TableName

EXCEPT

SELECT ColumnName

FROM TableName;

What is a Database in SQL?

At a high level, a relational database consists of storage (typically files) and a set of processes that connect to application sessions that send SQL requests to the RDBMS server, and the server sends back records or messages. For example, the SQL statement SELECT * FROM EMP; would return all the rows in the EMP table. Building a database schema starts typically with a design process, in which all the entities involved in the app are drawn up and connected in a way that illustrates their relationships.

You may be building an HR app that tracks employees and departments, so you draw a box representing the department’s object and a second box representing the employees. Next, you would draw a line that connects them with a single line at the department end and 3 lines (as in a crows-foot) at the employees’ box end of the connection. You could name the connection “works in”, indicating that many employees work in a department. A process called normalization optimizes the design.

Now you know what tables you want, you can begin to decide what attributes you want to track about the entities and what indexes need to be created. The department table may contain a department number which can be its primary key, and the employee table might have an employee number as its primary index key. A table can have multiple indexes, so it also makes sense to index the employees by name, as it will be a common way for employee information to be accessed.

The next stage is to create a database schema. Each table object can be created using a CREATE TABLE SQL statement that specifies the table name, column names, data types and lengths. Databases are usually populated by loading data from flat files or with an ETL (Extract, Transform, Load) tool that generates the appropriate INSERT statements.

Applications let users change data using UPDATE statements and remove rows using DELETE statements. The RDBMS instance handles updating the index and ensuring relationships between tables such as referential constraints, are enforced to maintain database integrity.

A Database Administrator (DBA) would use statements such as DROP TABLE, TRUNCATE TABLE, CREATE INDEX and CREATE TABLE in a subset of SQL known as Data Manipulation Language (DML) to alter the structure of the database schema. Fortunately, most modern RDBMS systems, whether on-premise or the Actian cloud SaaS offering, come with an excellent user interface (UI) to simplify the writing of  SQL/DML queries.

Once the database is populated with data, business applications can make use of it.

What is SQL and Examples?

We have discussed how databases get created and used. Now let’s see some example SQL statements.

Not all databases use a CREATE DATABASE command. Ingres, for example, has a command-line method called as below:

createdb -e mydb

Oracle example:

CREATE DATABASE mynewdb

USER SYS IDENTIFIED BY sys_password

USER SYSTEM IDENTIFIED BY system_password

EXTENT MANAGEMENT LOCAL

DEFAULT TEMPORARY TABLESPACE temp

UNDO TABLESPACE undotbs1

DEFAULT TABLESPACE users;

What is SQL Server?

SQL Server has its origins in Sybase. Microsoft licensed Sybase when it went to the PC server market with NT to compete with products such as Novell Netware and SCO Unix that ran on Intel x86-based hardware.

SQL Server has a graphical UI to make it easy to set up and maintain. The on-premise version of SQL server only supports clusters for failover, not for scalability.

Below is an example of how a SQL Server database is created:

USE master;

GO

-- Get the SQL Server data path.

DECLARE @data_path nvarchar(256);

SET @data_path = (SELECT SUBSTRING(physical_name, 1, CHARINDEX(N'master.mdf', LOWER(physical_name)) - 1)

      FROM master.sys.master_files

      WHERE database_id = 1 AND file_id = 1);

 -- Execute the CREATE DATABASE statement.

EXECUTE ('CREATE DATABASE FileStreamDB

ON PRIMARY

    (NAME = FileStreamDB_data

    ,FILENAME = ''' + @data_path + 'FileStreamDB_data.mdf''

    ,SIZE = 10MB,MAXSIZE = 50MB,FILEGROWTH = 15%),

FILEGROUP FileStreamPhotos CONTAINS FILESTREAM DEFAULT

    (NAME = FSPhotos

    ,FILENAME = ''C:MyFSfolderPhotos''

, MAXSIZE = 5000 MB),

    (NAME = FSPhotos2

      , FILENAME = ''D:MyFSfolderPhotos''

      , MAXSIZE = 10000 MB

     ),

FILEGROUP FileStreamResumes CONTAINS FILESTREAM

    (NAME = FileStreamResumes

    ,FILENAME = ''C:MyFSfolderResumes'')

LOG ON

    (NAME = FileStream_log

    ,FILENAME = ''' + @data_path + 'FileStreamDB_log.ldf''

    ,SIZE = 5MB    ,MAXSIZE = 25MB    ,FILEGROWTH = 5MB)');

GO

What is SQL in DBMS?

As we saw above, SQL is used to access records in an RDBMS by applications and manage the database. The ALTER command can be used to modify objects after they have been created. You can use the ALTER command to add columns to a table or change its data type (if it is unpopulated).

An EXPLAIN PLAN provides a glimpse into what an RDBMS does with a SQL request. It is used by developers and DBAs to tune SQL statements. A skilled performance tuning professional will be looking for the correct indexes being used. The information an execution plan provides includes whether a Clustered Index Scan is performed, when Sort Operators are used, the direction of Data Flow from Sort Operators and when Select Operators are used.

Early databases that used a heuristic optimizer could be guided by the way a SQL statement was written to optimize the execution path. Modern databases use a cost-based approach to tune statements. Databases maintain metadata about the data stored to decide how selective it is and its volume to see how to optimize the flow of data. For example, if merging intermediate result sets, it makes sense to move the smaller result to the bigger data, which is important in a distributed or clustered database architecture.