A View Inside My Head

Jason's Random Thoughts of Interest

NAVIGATION - SEARCH

SQL Server 2008: Spatial Data, Part 2

In the first part of this series, I provided a somewhat verbose description of the two different types of Spatial data supported by SQL Server 2008: Geometry (flat Earth model) and Geography (ellipsoidal model).  Now, let's dive in a bit deeper, and take a look at some of the specifics.

Standards

From their website, the Open Geospatial Consortium, Inc. (OGC) is described as "a non-profit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location based services."  Unlike me, these folks are the real experts in the field who understand both the problems and the solutions of working with spatially-aware systems.  By defining standards, they help to ensure that geospatial data can be shared between different vendors and platforms.

SQL Server 2008's Spatial data types are based on the OGC's "Simple Feature Access" standards, which are described in more detail than you would likely ever care to know by two documents in particular:

Note that like any standard, there is no guarantee that everything declared will be implemented - it just tries to guarantee that if something is implemented, that it is done in a way that is compatible with other implementations.  Likewise, there's nothing to stop a vendor from adding their own extensions to an implementation that provide functionality above and beyond what is defined in the standard.

Specific Types of Spatial Data

The Geometry and Geography data types in SQL Server 2008 are actually abstract by themselves.  When you instantiate an object of this type, it becomes one of a number of possible concrete types (with each having its own set of characteristics).  The following diagram (from the SQL Server BOL) displays the inheritance tree of the various concrete types (in blue) and their abstract ancestors (in yellow):

 Spatial Types - Click to Enlarge 

As far as SQL Server is concerned, though, we must define a variable or a column in a table as a Geometry (or Geography) type.  There is no means to be more specific at the time of declaration, and it is the job of the Geometry (or Geography) type itself to be aware of the deriving types like Point and LineString. 

This also means that a column in a table can have a mixture of different geospatial information: one row could contain a single point indicating where you are right now, another row could contain the surface of the state of Ohio, and a third row could contain a line that represents the route between your home and workplace.  We recognize these three things as unique types of geospatial information, but SQL Server thinks of them as the same things. 

Note: If you are not familiar with the concepts of inheritance and polymorphism from Object Oriented Programming, then consider this to be the same as how humans are able to generalize a dog and a cat both as being an animal.

POINT: A Point is an exact location, and is defined in terms of an X and Y pair of coordinates, as well as optionally by a Z (elevation) and M (measure) coordinate.  It does not have a length or any area associated with it.  Points are used as the fundamental building blocks of more complex spatial types.  Note: Z and M are recognized and maintained by SQL Server 2008 if supplied, but are not used in any calculations).

LINESTRING: A line segment is the shortest path between two points.  A LineString, then, is defined as the path between a sequence of points (i.e., a series of connected line segments).  It is considered simple if it does not cross over itself, and is considered a ring if the starting point is the same as the ending point.  A LineString is always considered to be a one dimensional object; it has length, but does not have area (even if it is a ring).

POLYGON: A Polygon is a closed two-dimensional shape defined by a ring.  It has both length and area.  A Polygon may also have holes in its interior (a hole is defined by another Polygon).  Area within a hole is considered to be exterior to the Polygon itself.

COLLECTIONS: In addition to the single instance types (Point, LineString, and Polygon), there is also a type that can hold a collection of instances.  This is similar to a list or an array in most programming languages.  The most generic type of collection is the GeomCollection, whose members can be of any type.  Deriving from GeomCollection are MultiPolygon, MultiLineString, and MultiPoint.  As their names suggest, the members of these collection types must be of the single instance type (i.e., MultiPoint can only contain Points, etc).

Representing Spatial Data

The OGC defines different ways to represent geospatial information as bytes of data that can then be interpreted by the Geometry or Geography types as being Points, Linestrings, etc.  SQL Server 2008 supports three such formats: Well-Known Text (WKT), Well-Known Binary (WKB), and Geography Markup Language (GML). 

For this series of articles, I will primarily be using WKT in my examples since is both concise and [relatively] human-readable.

The syntax of WKT is not difficult to understand, and can usually be inferred simply by looking at some examples:

POINT(10 10)
POINT(10 10 10 1) // X Y Z M shown here LINESTRING(0 0, 10 10) POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))


People with backgrounds in mathematics may be compelled to put commas between each coordinate, but in WKT, a space is used to separate the coordinates of a point, and a comma is used to separate points within a more complex type.  Parentheses are used to group points into a single shape. 

As a more complex example, here is how you would represent the above Polygon with a triangle-shaped hole in it:

POLYGON((0 0, 0 10, 10 10, 10 0, 0 0), (1 1, 4 5, 4 1, 1 1))


The first group of points defines the Polygon's ring, while the second set defines the hole within the Polygon's interior.

Though, as the old saying goes, a picture is worth a thousand words.  Therefore, it is often easier to plot out geospatial information for human consumption, so our Polygon would look something like this:

POLYGON((0 0, 0 10, 10 10, 10 0, 0 0), (1 1, 4 5, 4 1, 1 1))

(Note: In this particular visualization, the point (0 0) is considered to be the upper-left corner, and the units increase down and to the right)

SQL Server 2008: Spatial Data, Part 1

(next part) SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

SQL Server 2008: Spatial Data, Part 6

SQL Server 2008: Spatial Data, Part 7

SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com

SQL Server 2008: Spatial Data, Part 1

Beyond Relational

The introduction of SQLCLR in SQL Server 2005 allowed for very rich User Defined Types to be utilized.  This meant that a developer could create a single object that contained multiple data points (properties) and could also perform calculations internally (methods), yet store that instance in a single field of a single row in a database table.  Suddenly, any type of data could be stored and queried in the database, instead of just strings and numbers.

Fast forward to the present time, as Microsoft is celebrating the launch of SQL Server 2008 (notice that in the brilliance that is known as Marketing Speak, the term "Launch" does not mean "Release", so the product is still under development).  We're seeing the realization of the SQLCLR UDT concept being branded as being "Beyond Relational".  In this upcoming release, developers will have access to two new spatial data types that are included with the product: Geometry and Geography.

Geometry

fixpic1Think back to High School.  No, I'm not talking about those painful memories of being on the receiving end of wedgies, or the awkwardness and loss of words that came over you when you were around members of the opposite gender.  I'm referring to Mr. Bollenbacher's Geometry class, where you learned about all sorts of useless things that you would never again need to use...

High School Geometry class, for the most part, involved working in a single plane.  A good part of the curriculum taught us mathematical proofs, and constructions using a compass and straight-edge.  But, we also pulled out the graph paper at times, and plotted points using a Cartesian Coordinate System, then connected two points points to form a line, and then connected  lines to create closed shapes, or polygons. 

The Geometry spatial data type in SQL Server 2008 is used to represent information in a uniform 2-dimensional plane, much like the graph paper analogy from High School.  The units are completely user-defined, and could be inches, miles, pixels, or even picas.  The point is that SQL Server doesn't care; it can perform calculations just the same. 

You would use the Geometry data type in conjunction with some existing planar projection of the real world that becomes the underlying reference.  That is, you would usually start with an existing diagram or a map, and define your spatial data in terms of that base map.

For example, perhaps your user-defined coordinate space is being used to represent a warehouse facility.  Within that coordinate space, you can use the Geometry data type to define polygons that represent storage bays within the warehouse.  Data in your tables can track where inventory is placed using these polygons.  Then, it becomes possible to do things with the data, like determine where the closest item of a certain type is to a forklift driver's present location.

The most important thing to keep in mind about the Geometry data type is that it works with uniform units of data.  The number of units across the top of the plane is the same as the number of units across the bottom, and the same applies to the left and right sides.

Geography

Perhaps you have not yet heard, but the Earth is actually not flat.  It's more like a ball.  Take a minute to absorb that information... I'll wait.

Since the Earth is not flat, it is challenging to represent real-world locations using 2-dimensional mathematics.  Sure, we're used to looking at flat maps of the world, whether it's that road map folded up in your car, or the rolled-up atlas that hung over the chalk board in school, or even a Google map displayed on your computer screen.  But, those are merely projections of the real world onto a plane. 

There are multiple ways to create such projections using mathematical transformations, with each type having benefits and disadvantages over other types of projections.  For instance, one type of projection might attempt to minimize the distortion for a particular area of the globe, while other areas towards the outer edges of the projection become very distorted.  Or, a projection might make it easier to represent points on the globe in a uniform fashion, but areas near the North and South poles (like Greenland and Antarctica) appear much larger in area than they actually are.

B_S_A_M ApianusII EquidistantConic

For this reason, representing geospatial information using the Geometry data type is not practical, especially if you wish to share information with other people who might be using a different projection.  Fortunately, we are able to use a Geographic Coordinate System to accurately represent locations on Earth, or any ellipsoidal body for that matter.

The SQL Server Geography data type uses Latitude and Longitude angles to identify points on the Earth.  Latitude measures how far North (or South) of the Equator a point is, while Longitude measures how far East (or West) of a Prime Meridian a point is.  Note that this coordinate system can be used to identify points on any sized "ball", be it a golf ball, the Earth, or even the Sun.

Spatial Reference System

Latitude and Longitude are thought of as being absolute values that are universally accepted, but this is not the case.  For instance, the Prime Meridians, or zero degrees Longitude, is quite arbitrary in nature.  It is commonly referred to as being a North-South line that passes through Greenwich, London, England, but in truth, there are four different Prime Meridian that have been used historically that actually pass through Greenwich.  Even the current Prime Meridian that is used by the GPS system (and most Internet mapping sites) is actually 102.5 meters to the East of the "official" line that marks the otherwise universally accepted Prime Meridian. 

Once you know the exact starting points of the coordinate system that a set of data uses, we're left with another dilemma when we try to calculate the distance between two points.  Because of the action of the Earth spinning on its axis, it is not actually spherical in shape, but rather ellipsoidal (it's fatter in the middle than at the poles).  That complicates the ability to accurately calculate distance and area in a uniform manner, but doesn't make it impossible.

To measure distance between two points on the globe, we need to know the radius of the Earth (i.e., how far from the center of the Earth that each point is).  The problem is that because of the land masses, the Earth doesn't have just one uniform radius that can be applied globally.  If you measure the distance between two points that are one degree apart at sea level, you will get a different result than two points that are one degree apart in the mountains, which might be 5000 feet above sea level.

So, when a location is surveyed and points are identified in terms of global coordinates, these are based on a model of the earth that is quite accurate for that localized area, but maybe not as accurate elsewhere.  The parameters that define the particular model of the Earth is known as the Spatial Reference System.  It is important to know which Spatial Reference System is used for a set of data, because you cannot always mix data from one set with another and get accurate results.

With the pervasiveness of GPS data, I would argue that the most commonly used Spatial Reference System today is WGS84

From Here

This first article was to establish a baseline of knowledge about some of the spatial concepts.  Going forward from here, I will describe the standards that have been implemented in the creation of the Geometry and Geography data types, as well as how to actually use the data types programmatically.

(next part) SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

SQL Server 2008: Spatial Data, Part 6

SQL Server 2008: Spatial Data, Part 7

SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com