Functionality
The functionality of the current SQL language is listed here,
and some ideas how NewSQL should look like. However, nothing is fixed at
this time.
Case sensitivity of the language
SQL is not case sensitive
- Idea: Most programming languages are case sensitive, and NewSQL
should be the same. As it is easier to type lowercase then all uppercase,
so the plan is to have the keywords in lower case.
Identifiers (Names)
SQL allows quoted names.
- Idea: Do not support quoted names. There seems to be no good
reason why a programming language should allow spaces in identifiers.
It is possible to support a user friendly (graphical-) interface to
a database, by assigning display names and icons to the different objects.
The priority is the developer, and most developers do not like spaces
in identifiers.
Standard data types
SQL supports a variety of data types, and most vendors have a
different set of types.
- Support the common data types (integer, string, date, decimal,
binary)
- Size restrictions in data types are superfluous:
there is no limit how long a Java String can be, and most people do not
complain about that.
Advanced data types
This are data types that are not supported by all databases, but are usefull
in many cases. In SQL, this feature is largely vendor specific.
- Case insensitive String data type support is required for many real-world
use cases. That doesn't mean the data is stored in uppercase or lowercase
- but the comparison is done insensitive (ignoring the case). Just case insensitive
comparison operators are not enough: it must be possible to define a case
insensitive index. This data type is maybe already a user defined data type,
it is not strictly a standard data type.
- Arrays of standard data types; or arrays of advances / user defined
data types
- To store objects alone, there is no object data type required, because
is it possible to use binary. There have been many efforts to build object
databases, but still most new projects are using relational databases.
However to build a index on this objects, or to execute some operations
on them. The question is what programming language should be supported.
One idea is Java.
Large Objects
Large objects are objects that may not fit into memory. SQL 99 supports
CLOB (Character Large OBject) and BLOB (Binary Large OBject).
- There is no need for a special data type. If memory is a problem,
the application (or database API) is responsible for loading / not loading
the value into memory. But that means it must be possible to retrieve the
size of a value before reading the value itself.
Autoincrement
There is no standard for autoincrement, but many vendors support it
in some way.
- This is a popular feature, support it on the language level if that
makes sense.
Data Definition
CREATE TABLE, DROP TABLE. There is no standard for CREATE INDEX and
DROP INDEX, however most database vendors support it. ALTER TABLE ADD
COLUMN, and so on.
- Support renaming tables.
- If data definition can be done by inserting into the system tables,
then moving table defintion around would be much simpler.
Check constraints
- This seems to be similar to triggers. Just support triggers instead.
Referential integrity
Most database administrators are converened about referential integrity.
Sometimes this feature makes live much easier (when bugs are found early
using this mechanism), sometimes harder (when data has to be moved around).
- Support it as a special, simple to define sort of triggers
Default values
- Seems to be similar to triggers. Just support triggers.
- Don't distinguish between NULL and DEFAULT values when inserting
a new row.
Nullable / non-nullable columns
In SQL, columns can be defined NOT NULL.
- Use rules similar to Java, where some data types are non-nullable
(for example int, long, double) and some are nullable (Integer, String).
This could also help to simplify the rules for NULL handling.
Data Manipulation
INSERT, UPDATE, DELETE
- Support UPSERT (UPDATE if the row exists, INSERT if not). This
functionality is similar to a Hashtable.put in Java. This is used when
merging data.
Data Query Language
SELECT
Subqueries
This is a very popular feature.
- Support it, and if possible not only inside SELECT statments,
but also in UPDATES and DELETES. Wherever possible.
Expressions and Formulas
The ANSI standard does define a few operations, but there are big differences
in what vendors support. In Oracle, the comparison operators are not allowed
everywhere, for example.
- Support what Java supports; maybe except some things like ? : because
the ? character would collide with the ? in JDBC, for PreparedStatements.
- LIKE in SQL is very popular. If possible continue to support this feature.
However, the escape charater can be fixed to \, as the % and _ operators
are also fixed.
String constants
String literals are enclosed with single quoted. There is no standard
to escape special characters (newline, tab). Two single quotes are used
to write a single quote inside a string constant.
- Use Java rules for Strings
- Allow single quotes as well as double quotes. This simplifies to use
NewSQL statments in Java (using JDBC).
User defined functions
- Just support calling a procedural language
Built-in Functions
There are some built-in functions defined, however every database vendor
supports its own set.
- Define a common set of functions, to avoid having to add external
user defined functions for simple cases.
Access rights
- Support a roles based access right system. However, login
should be possible with user name and password only; if a user is part of
multiple roles, then he should not have to specify with what role he logs
wants to log in.
- Support direct manipulation of the system tables, because like
this there is no need for a special syntax.
Syntax rules
SQL has many hundred syntax rules.
- We want to have as little syntax rules as possible. Still our
language should be feature-rich, otherwise it has to be extended to be
usefull.
- Avoid different levels of compatibility as SQL 92 Entry Level,
Intermediate Level, Full Level, because
this leads to incompabilities. Extensions should be made using ways
that do not change the grammar, for example using libraries. Database
vendors should compete on stability, performance, ease of use, functionality,
but not grammar.
Namespaces
SQL does not support namespaces, however JDBC defines 'catalogs' and
'schemas'.
- The namespace of files in a filesystem is hierarchical, the
namespace of Java classes is also. It looks like we should do the same.
However, we want to be able to convert our language to SQL, so we have
to be carefull.
NULL Handling
NULL handling in SQL is 'problematic': NULL is not equal to NULL. NULL
values in indices is a separete story.
- Support NULL like in Java. This change will not be easy (for
the converters), define some compiler warnings if the user wrote something
that may not lead to the expected result.
Joins
SQL supports inner joins, outer joins, and full outer joins.
- Don't support inner joins with a list tables (SELECT... FROM
TEST T1, TEST T2). Use the JOIN syntax instead (...FROM TEST T1 JOIN TEST
T2...)
- Don't support FULL OUTER JOIN
Union
UNION ALL and UNION DISTINCT are SQL standard, where UNION DISTINCT
is the default
- Support UNION ALL only. DISTINCT can be supported separately
Advanced transformations
Some databases support UNION EXCEPT, CROSS JOINS and other seldom used
functionality. All this is not standardized.
- Support user-defined transformations. Such transformations will
be programmed using a prodecural language if the database does not natively
support it.
Distinct
DISTINCT is supported in combination with SELECT and UNION
- Make DISTINCT its own transformation.
Mananging databases
SQL does not define a standard way to manage databases.
- Provide standard ways how to manipulate (create, backup, restore,
copy, delete) entire databases or parts of it. Many databases do not
allow such operations using SQL statements, but that should not be
a reason not to support it.
Logging
Some databases define a way how the transaction log (or other logs) can be
read. This is however complety vendor specific.
- While it is not possible to dictate the format how the log has to be
written to disk, it is possible to define a way to read the content of the
transaction log.
Procedural language
SQL does not define procedural elements, however the big vendors
support their own extensions.
- Define a standard to call a procedural language (such as Java).
- Just support Java.
- No not add elements of a procedural language to NewSQL, as
at some point calling an external function from the database will be required
anyway.
Stored procedures
This is not a standard, but many vendors support it.
- This is not required if user defined functions are supported.
Triggers
Triggers are called when a row is inserted, updated or deleted. Triggers
are largely vendor specific.
- Define a standard for triggers.
- Just support calling a procedural language
Transactional functionality
COMMIT and ROLLBACK are standard. Some vendors support more functionality,
such as nested transactions. There are also many transaction isolation levels
possible. Some databases use locking, some versioning, for transaction isolation.
- Support switching the autocommit mode on/off.
- Support some kind of two-phase commit, but keep it as simple
as possible.
Connection management
There is no SQL standard to connecting to a different database / connecting
as a different user
Metadata
There is no SQL standard, however the JDBC API already supports DatabaseMetaData
- Support a common set of system tables (views) to read the
meta data in a database independant way from within the database.
- If the meta data can be modified from the user, manipulating
/ moving data would get simpler.
Error messages / codes
Error codes are largely vendor specific. Some applications have to check
for error codes (for example, to re-try after a deadlock).
- Define standard codes for all common errors wherever possible..