A View Inside My Head

Jason's Random Thoughts of Interest

NAVIGATION - SEARCH

SQL Server 2008: Spatial Data, Part 8

 

In this, the eighth part in a series on the new Spatial Data types in SQL Server 2008, I'll step away from the database and do a little spatial coding using .NET.

Redistributable .NET Library

Up to this point in the series, I have demonstrated a lot of interesting (?) things that you can do with the new Spatial data types (Geometry and Geography) in SQL Server 2008.  You might be thinking, "That's swell, and all, but I wish I could do some of that stuff without needing to be tethered to a database."  Well, you know what?  You can!

I mentioned in a previous post that the Spatial data types were implemented as SQLCLR User-Defined Types.  I've since been corrected by Isaac Kunen, who stated that they are more accurately described as System-Defined Types, with the difference being that these are automatically installed and available for use as part of SQL Server 2008, regardless of whether the ENABLE CLR bit has been activated.  Semantics aside, these types are merely classes within a .NET assembly, and Microsoft is making this freely available as part of a Feature Pack for SQL Server (which will be redistributable as part of your stand-alone application, according to Isaac):

SQL Server 2008 RC0 Feature Pack Download Page

(Look for "Microsoft SQL Server System CLR Types," which includes the two Spatial types plus the HierarchyID type.  This link is for RC0, and may not be applicable to future versions as the product is finalized and released.)

Builder API

A new feature that was included with the first Release Candidate (RC0) is the Builder API.  This is a collection of interfaces and classes that helps you to construct spatial types by specifying one point at a time until all points have been added.

The Builder API is not only useful for creating new instances of spatial data, but also for consuming existing instances one point at a time (maybe to convert an instance into another format).  Documentation is light at the moment, so I'm still trying to grok exactly how to best utilize it.

For my first experiment with the API, I obtained some Zip Code Boundary data in ASCII format from the U.S. Census Bureau:

http://www.census.gov/geo/www/cob/z52000.html#ascii

My goal was to parse the data, and then create a new SqlGeography instance for each zip code.  (Note: SqlGeography is the .NET class name that T-SQL refers to simply as Geography).  The SqlGeographyBuilder class proved to be perfect for accomplishing this task.

At its core, the SqlGeographyBuilder implements the IGeographySink interface.  If you wanted to consume an existing SqlGeography instance, you could implement IGeographySink in your own class, and then invoke the SqlGeography's Populate() instance method, passing in your object as the parameter.  The Populate() method takes care of calling the appropriate IGeographySink methods within your class.

In this case, I'm not starting with an existing SqlGeography instance, so my code will need to call the methods of the SqlGeographyBuilder in the correct order:

IGeographySink

After EndGeography() has been invoked, the new instance is available via the ConstructedGeography property of the SqlGeographyBuilder class. 

Simple enough, right?  Yeah, I'm still a little lost myself...  But, here's some code to help demonstrate what's going on!

First, let's look at the ASCII data.  A single zip code's boundary might be defined as:

      1469      -0.824662148292608E+02       0.413848583827499E+02
      -0.824602851767940E+02       0.413864290595145E+02
      -0.824610630000000E+02       0.413860590000000E+02
      -0.824685900000000E+02       0.413841470000000E+02
      -0.824686034536111E+02       0.413843846804627E+02
      -0.824605990000000E+02       0.413863160000000E+02
      -0.824602851767940E+02       0.413864290595145E+02
END

 

The very first line happens to contain an identifier (maps to a second file that lists the actual USPS zip code).  The coordinate listed in the first line is not actually part of the boundary, but rather appears to be the population center of that area.  The actual boundary begins with the second line, and continues until you encounter the "END".  Also, in case you couldn't tell, coordinates in this data are in Longitude-Latitude order.

Since a Zip Code is a polygon, and since we are working with SqlGeography, we must be aware of ring ordering.  That is, the exterior ring of a polygon must be defined in a counter-clockwise order so that as you "walk the ring", the interior is always to your left.  If you reverse the order, then SqlGeography assumes that you're trying to define a polygon containing the entire world except for the small area inside of the polygon.

Well, in this case, the order of the points of the Zip Code boundary is defined in clockwise order... so, we must be aware of this and call into the SqlGeographyBuilder in the opposite order (so the last point defined in the ASCII data is the first point used while building our new instance). 

To accomplish this, I simply parse the Lat/Long coordinates as "double" types, and then push them onto a stack.  Then, I pop the stack and call into the Builder API with each point.  At the end, I obtain the new SqlGeography instance from the ConstructedGeography property. 

(Note: This is demonstrative code - some things should probably be cleaned up/refactored/error handled... You have been warned)

  public SqlGeography ParseAsGeography(string zipcode_points)
{
    StringReader sr = new StringReader(zipcode_points);
    string line = sr.ReadLine();

    Stack<double[]> Points = new Stack<double[]>();

    while (line != null  && line != "END")
    {
        if (line != String.Empty)
        {
            Points.Push(ParseLatLngValues(line));
        }

        line = sr.ReadLine();
    }

    return CreateGeography(Points);
}

  private
  double[] ParseLatLngValues(string line)
{
    //      -0.838170700000000E+02       0.409367390000000E+02double[] ret = newdouble[2];

    string lng = System.Text.RegularExpressions.Regex
.Matches(line, "\\S+")[0].Value; string lat = System.Text.RegularExpressions.Regex
.Matches(line, "\\S+")[1].Value; double.TryParse(lat, out ret[0]); double.TryParse(lng, out ret[1]); return ret; }

  private SqlGeography CreateGeography(Stack<double[]> points)
{
    SqlGeographyBuilder builder = new SqlGeographyBuilder();
    builder.SetSrid(4326);
    builder.BeginGeography(OpenGisGeographyType.Polygon);

    double[] point = points.Pop();

    builder.BeginFigure(point[0], point[1]);

    while (points.Count > 0)
    {
        point = points.Pop();
        builder.AddLine(point[0], point[1]);
    }

    builder.EndFigure();
    builder.EndGeography();

    return builder.ConstructedGeography;
}

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

SQL Server 2008: Spatial Data, Part 6

SQL Server 2008: Spatial Data, Part 7

kick it on DotNetKicks.com

Pingbacks and trackbacks (1)+

A View Inside My Head | All posts tagged 'spatial'

A View Inside My Head

Jason's Random Thoughts of Interest

NAVIGATION - SEARCH

Entity Framework Spatial: A Real World Example

Background

From the Wikipedia article, Leadership in Energy and Environmental Design (LEED) is described as “an internationally recognized green building certification system, providing third-party verification that a building or community was designed and built using strategies intended to improve performance in metrics such as energy savings, water efficiency, CO2 emissions reduction, improved indoor environmental quality, and stewardship of resources and sensitivity to their impacts.”

In my own words, LEED is a certification system that awards points for following certain environmentally-friendly practices when constructing a building. In the end, a building can be qualify for one of four different levels of certifications, based on the number of points: Certified, Silver, Gold, Platinum. There are often tax benefits associated with having a LEED certification, and many new government buildings (especially Federal) are required to be LEED certified.

Two points in particular (out of of 100, or so) from the LEED checklist are related to geospatial data. One point is awarded if at least 20% of the building materials (by cost) used in construction were manufactured within 500 miles of the job site. A second point is awarded if 10% of the raw materials of those building materials were extracted, harvested, or recovered within 500 miles of the job site.

As a window glass manufacturer, Tempuri Glass is often asked to provide data about its products that are being considered for use in construction. Tempuri Glass may have a certain advantage over its competitors if it can quickly show that its products would count towards these two points for any arbitrary job site.

Data

Tempuri is a simple organization, making only a single type of product (Soda Lime glass) that is then cut into different sizes per order. Therefore, regardless of how many different sized glass panes are produced by a given facility, the ingredients for that glass is the same. The formulas used will be different between facilities, though, since the raw ingredients will be sourced from different locations, and adjustments may need to be made to the ratios due to environmental factors (things like: elevation, temperature, humidity, etc).

So, for our data model, we just need to know where each facility is, and then the formula used to make the glass at that facility (including the ingredients of that formula and the location where they were harvested from).

a.EF_Diagram

Within the data store, the [Geocode] columns of the Facility and FormulaComponent tables use the SQL Server geography type. This is useful for the large-scale/real-world distance calculations that Tempuri Glass needs to perform, since the way that you calculate distance on an sphere or ellipsoid (like the Earth) is vastly different than on a flat map.

In the Entity Framework model (using the June 2011 CTP), the SQL Server geography types are mapped as the new System.Data.Spatial.DbGeography type. This makes the geospatial data a first class citizen of our data model, and not just a castable opaque BLOB, as was the case in the past.

Geospatial data can take on many forms, including Points, Line Strings, Polygons, and collections of these shapes. Even though it’s not apparent from the data model, our [Geocode] data will contain only Points (i.e., a single Latitude/Longitude pair). Likewise, a job site will be specified as a single Point, though there is no hard requirement for this because distance can still be calculated between a Polygon and a Point with no coding change required.

Facility Sample Data

FacilityID

FacilityName

City

State

Geocode

1

Greenfield, IA

Greenfield

IA

POINT (-94.4547843933106 41.3151755156904)

2

Spring Green, WI

Spring Green

WI

POINT (-90.053981 43.17431)

3

Tomah, WI

Tomah

WI

POINT (-90.477058 43.989319)

4

Fremont, IN

Fremont

IN

POINT (-84.9314403533936 41.7186070559443)

5

Fargo, ND

Fargo

ND

POINT (-96.8667125701904 46.8985894795683)

6

Waxahachie, TX

Waxahachie

TX

POINT (-96.8427014350891 32.4424403136322)

7

Hood River, OR

Hood River

OR

POINT (-121.51526927948 45.630620334868)

8

Vinton, VA

Vinton

VA

POINT (-79.863876 37.263329)

9

Casa Grande, AZ

Casa Grande

AZ

POINT (-111.78155422210693 32.882073958767954)

10

Mountain Top, PA

Mountain Top

PA

POINT (-75.896477 41.141327)

11

Winlock, WA

Winlock

WA

POINT (-122.926218509674 46.5449155194259)

12

Durant, OK

Durant

OK

POINT (-96.4133548736572 34.0001619910696)

13

Mooresville, NC

Mooresville

NC

POINT (-80.7865476608277 35.6316281732984)

 

FormulaComponent Sample Data

FormulaComponentID

Name

Percentage

SourceLocation

Geocode

14

Limestone

13

Genola, UT

POINT (-111.808204650879 40.0098667779887)

1

Silica Sand

75

Houck, AZ

POINT (-109.241695404053 35.2062151838369)

27

Soda Ash

12

Trona, CA

POINT (-117.311668395996 35.6955040738332)

15

Limestone

13

Genola, UT

POINT (-111.808204650879 40.0098667779887)

2

Silica Sand

75

Houck, AZ

POINT (-109.241695404053 35.2062151838369)

28

Soda Ash

12

Trona, CA

POINT (-117.311668395996 35.6955040738332)

16

Limestone

13

Chicago, IL

POINT (-87.6176834106445 41.5738476278005)

3

Silica Sand

75

Overton, NV

POINT (-114.4313621521 36.5146030619859)

29

Soda Ash

12

Green River, WY

POINT (-109.448783397675 41.5090754257687)

 

Spatial Querying Algorithm

Input: Job Site Latitude/Longitude

Steps:

A. Query for closest facility to Job Site within 500 miles:

  1. Calculate the distance between the job site and each facility.
  2. Filter the list of facilities to just those where distance < 500 miles.
  3. Order the list of facilities by distance in ascending order.
  4. The first element (if any) will be the closest facility, and also signifies that the product qualifies as being manufactured within 500 miles

B. If there is a facility within 500 miles, then sum the percentage of formula components that were sourced from within 500 miles of the Job Site:

  1. Calculate the distance between the job site and each of the facility’s formula components
  2. Filter the list of formula components to just those where distance < 500 miles
  3. Sum the Percentages

Output: Boolean of whether the product qualifies; Percentage of the product’s ingredients that qualifies.

Implementation

Before we can calculate distance using an instance method of the DbGeography type, we need to actually create an instance to represent the Job Site. DbGeography is immutable and does not have a constructor, so instead, a static method must be called to create a new object. There are a number of these factory methods available to create specific kinds of shapes (Point, Line String, Polygon, etc) given different kinds of input (text, byte arrays).

For simplicity, let’s use the .Parse() method, which accepts Well-Known Text (WKT) as input, and assumes a Spatial Reference ID of 4326 (the same coordinate system that GPS and internet mapping sites use).

Note: WKT uses a (Longitude, Latitude) ordering for points, which adheres to the same concept as (X, Y) ordering for Cartesian coordinates.

private static DbGeography CreatePoint(double latitude, double longitude)
{
return DbGeography.Parse(String.Format("POINT({1} {0})", latitude, longitude));
}



The first spatial query, written as a LINQ expression, finds the closest qualifying facility. Since SRID 4326 uses meters as the unit of measure, we need to convert 500 miles into meters within the predicate:

private Facility GetNearestFacilityToJobsite(DbGeography jobsite)
{
    var q1 = from f in context.Facilities
    let distance = f.Geocode.Distance(jobsite)
    where distance < 500 * 1609.344
    orderby distance
    select f;

return q1.FirstOrDefault();
}



Assuming that a facility was returned, a second LINQ expression can be used to find the sum of Percentage from qualifying Formula Components:

private decimal SumQualifyingPercentages(Facility nearestFacility, DbGeography jobsite)
{
var q2 = from fc in nearestFacility.Formula.FormulaComponents
where fc.Geocode.Distance(jobsite) < 500 * 1609.344
select fc;

return q2.Sum(c => c.Percentage.GetValueOrDefault(0));
}



Finally, putting all of the parts together (using a Tuple<> for the output):

private Tuple<bool, decimal> GetResults(double latitude, double longitude)
{
DbGeography jobsite = CreatePoint(latitude, longitude);
Facility nearestFacility = GetNearestFacilityToJobsite(jobsite);

if (nearestFacility != null)
{
return new Tuple<bool,decimal>(true, SumQualifyingPercentages(nearestFacility, jobsite));
}

return new Tuple<bool, decimal>(false, 0);
}

private void PerformQuery()
{
double latitude = 47.63752;
double longitude = -122.13343;

var results = GetResults(latitude, longitude);
}

Entity Framework Spatial: DbGeography and DbGeometry Members

DbGeography Static Property Return Type DbGeometry Static Property Return Type
DbGeography.DefaultSrid int DbGeometry.DefaultSrid int
       
DbGeography Static Method Return Type DbGeometry Static Method Return Type
DbGeography.FromBinary(byte[] geographyBinary, int srid) DbGeography DbGeometry.FromBinary(byte[] geometryBinary, int srid) DbGeometry
DbGeography.FromGml(string geographyMarkup, int srid) DbGeography DbGeometry.FromGml(string geometryMarkup, int srid) DbGeometry
DbGeography.FromText(string geographyText, int srid) DbGeography DbGeometry.FromText(string geometryText, int srid) DbGeometry
DbGeography.GeographyCollectionFromBinary(byte[] geographyBinary, int srid) DbGeography DbGeometry.GeometryCollectionFromBinary(byte[] geometryBinary, int srid) DbGeometry
DbGeography.GeographyCollectionFromText(string geographyText, int srid) DbGeography DbGeometry.GeometryCollectionFromText(string geometryText, int srid) DbGeometry
DbGeography.LineFromBinary(byte[] geographyBinary, int srid) DbGeography DbGeometry.LineFromBinary(byte[] geometryBinary, int srid) DbGeometry
DbGeography.LineFromText(string geographyText, int srid) DbGeography DbGeometry.LineFromText(string geometryText, int srid) DbGeometry
DbGeography.MultilineFromBinary(byte[] geographyBinary, int srid) DbGeography DbGeometry.MultilineFromBinary(byte[] geometryBinary, int srid) DbGeometry
DbGeography.MultilineFromText(string geographyText, int srid) DbGeography DbGeometry.MultilineFromText(string geometryText, int srid) DbGeometry
DbGeography.MultipointFromBinary(byte[] geographyBinary, int srid) DbGeography DbGeometry.MultipointFromBinary(byte[] geometryBinary, int srid) DbGeometry
DbGeography.MultipointFromText(string geographyText, int srid) DbGeography DbGeometry.MultipointFromText(string geometryText, int srid) DbGeometry
DbGeography.MultipolygonFromBinary(byte[] geographyBinary, int srid) DbGeography DbGeometry.MultipolygonFromBinary(byte[] geometryBinary, int srid) DbGeometry
DbGeography.MultipolygonFromText(string geographyText, int srid) DbGeography DbGeometry.MultipolygonFromText(string geometryText, int srid) DbGeometry
DbGeography.Parse(string geographyText) DbGeography DbGeometry.Parse(string geometryText) DbGeometry
DbGeography.PointFromBinary(byte[] geographyBinary, int srid) DbGeography DbGeometry.PointFromBinary(byte[] geometryBinary, int srid) DbGeometry
DbGeography.PointFromText(string geographyText, int srid) DbGeography DbGeometry.PointFromText(string geometryText, int srid) DbGeometry
DbGeography.PolygonFromBinary(byte[] geographyBinary, int srid) DbGeography DbGeometry.PolygonFromBinary(byte[] geometryBinary, int srid) DbGeometry
DbGeography.PolygonFromText(string geographyText, int srid) DbGeography DbGeometry.PolygonFromText(string geometryText, int srid) DbGeometry
       
DbGeography Instance Property Return Type DbGeometry Instance Property Return Type
g.Area double? g.Area double?
    g.Boundary DbGeometry
    g.Centroid DbGeometry
    g.ConvexHull DbGeometry
g.Dimension int g.Dimension int
g.EndPoint DbGeography g.EndPoint DbGeometry
    g.Envelope DbGeometry
    g.ExteriorRing DbGeometry
g.GeometryType string g.GeometryType  
g.IsClosed bool? g.IsClosed bool?
g.IsEmpty bool g.IsEmpty bool
    g.IsRing bool?
    g.IsSimple bool
    g.IsValid bool
g.Latitude double?    
g.Length double? g.Length double?
g.Longitude double?    
g.M double? g.M double?
g.NumGeometries double? g.NumGeometries int?
    g.NumInteriorRing int?
g.NumPoints int? g.NumPoints int?
    g.PointOnSurface DbGeometry
g.ProviderValue object g.ProviderValue object
g.Srid int g.Srid int
g.StartPoint DbGeography g.StartPoint DbGeometry
g.WellKnownValue WellKnownValue DbGeographyWellKnownValue g.WellKnownValue DbGeometryWellKnownValue
g.WellKnownValue.Srid int g.WellKnownValue.Srid int
g.WellKnownValue.WellKnownBinary byte[] g.WellKnownValue.WellKnownBinary byte[]
g.WellKnownValue.WellKnownText string g.WellKnownValue.WellKnownText string
    g.X double?
    g.Y double?
g.Z double? g.Z double?
       
DbGeography Instance Method Return Type DbGeometry Instance Method Return Type
g.AsBinary() byte[] g.AsBinary() byte[]
g.AsGml() string g.AsGml() string
g.AsText() string g.AsText() string
g.Buffer(double distance) DbGeography g.Buffer(double distance) DbGeometry
    g.Contains(DbGeometry other) bool
    g.Crosses(DbGeometry other) bool
g.Difference(DbGeography other) DbGeography g.Difference(DbGeometry other) DbGeometry
g.Disjoint(DbGeography other) bool g.Disjoint(DbGeometry other) bool
g.Distance(DbGeography other) double g.DistanceDbGeometry other) double
g.GeometryN(int index) DbGeography DbGeography g.GeometryN(int index) DbGeometry
    g.InteriorRingN(int index) DbGeometry
g.Intersection(DbGeography other) DbGeography g.Intersection(DbGeometry other) DbGeometry
g.Intersects(DbGeography other) bool g.Intersects(DbGeometry other) bool
    g.Overlaps(DbGeometry other) bool
g.PointN(int index) DbGeography g.PointN(int index) DbGeometry
    g.Relate(DbGeometry other, string matrix) bool
g.SpatialEquals(DbGeography other) bool g.SpatialEquals(DbGeometry other) bool
g.SymmetricDifference(DbGeography other) DbGeography g.SymmetricDifference(DbGeometry other) DbGeometry
    g.Touches(DbGeometry other) bool
g.Union(DbGeography other) DbGeography g.Union(DbGeometry other) DbGeometry
    g.Within(DbGeometry other) bool

Entity Framework Spatial: First Look

Today, I began to look into the Entity Framework June 2011 CTP which includes first class support for Spatial data types.  The ADO.NET team created an abstraction layer, based on the OGC simple features specification, with a goal being to support spatial implementations from multiple vendors.  As you would expect, SQL Server 2008 Spatial is supported out of the box.

For some reason, I was expecting a lot of work to be done on the client-side within their abstraction data type.  I was pleasantly surprised to see EF pass the heavy lifting to the underlying data store as part of the query.

For instance, I have a table in my database called Facility with a Geography column named [Geocode].  This field contains a point (single latitude/longitude pair) identifying the location of the Facility.  Even though this would normally be represented in client-side code as a SqlGeography type, EF wraps it in the new DbGeography type (i.e., the abstraction data type for ellipsoidal data).

My first query was a LINQ expression to return a list of all facilities that are within 500 miles of a given location:

var q = from f in context.Facilities
let p = DbGeography.Parse("POINT(-83 45)")
where f.Geocode.Distance(p) < 500 * 1609.344
select new { f.FacilityID, wkt=f.Geocode.AsText() };


A couple things about this query:

  1. The default SRID of 4326 is used.  This spatial reference system uses meters for distance, so my predicate needs to convert the 500 miles into meters.  Like the SqlGeography.Parse() method, DbGeography.Parse() will default to 4326.
  2. The return type is an anonymous type.  I wanted to see how the DbGeography type's .AsText() method was executed (i.e., would it be expressed in the resulting query, or would it be handled client side, etc).

When executed, the LINQ expression above generates the following TSQL:

SELECT [Extent1].[FacilityID] AS [FacilityID],
 [Extent1].[Geocode].STAsText() AS [C1]
FROM [dbo].[Facility] AS [Extent1]
WHERE ([Extent1].[Geocode].STDistance(geography::Parse(N'POINT(-83 45)'))) < cast(804672 as float(53))


As you can see, the DbGeography.AsText() was translated into the STAsText() method in the query.  And, as you might expect, the predicate's DbGeography.Distance() was properly translated into STDistance() in the TSQL WHERE clause.

The other thing that I was worried about was not having access to the actual SqlGeography type returned from the database.  I was surprised to see that EF's DbGeography has a property called ProviderValue that returns the native type that the provider supports!

Including SQL Server Spatial Types in a Castle ActiveRecord Entity

This is just a quick post to capture the outcome of about an hour’s worth of trial and error for me.

Here’s a SQL Server table that contained data that my application needed:

SQLTable

Of particular interest is the [Geocode] field, which holds a Geography instance (POINT) representing the location of a Facility.  The [Geocode_Lat] and [Geocode_Lon] fields were from legacy requirements, and would be redundant when all is said and done.

Trying to include a Geography field in my ActiveRecord entity proved to be a challenge.  NHibernate has a Spatial extension, but setting it up and trying to use it proved to be somewhat of a nightmare (read: I couldn’t get it to work).  In reality, I wasn’t looking to use ActiveRecord to do spatial querying, but rather, was just looking to have the data passed between the database and my application.

SQL Server 2008 saves Geography and Geometry columns as binary data.  This isn’t WKB, but rather, an internal serialization of the data.

SpatialColumn

To have the data pass through ActiveRecord, we need to configure the entity class to treat this data as a byte[] array.  The application would then need to deserialize the bytes into a SqlGeography instance.  Likewise, when updating the entity, the application would need to serialize the SqlGeography instance to an array of bytes.

My Entity class defines the [Geocode] property as:

[Property(SqlType = "geography", ColumnType = "BinaryBlob")]
public byte[] Geocode { get; set; }

Then, I created a couple Extension Methods to handle the serialization/deserialization:

public static class ExtensionMethods
{
    public static SqlGeography AsGeography(this byte[] bytes)
    {
        var geo = new SqlGeography();
        using (var stream = new System.IO.MemoryStream(bytes))
        {
            using (var rdr = new System.IO.BinaryReader(stream))
            {
                geo.Read(rdr);
            }
        }

        return geo;
    }

    public static byte[] AsByteArray(this SqlGeography geography)
    {
        if (geography == null)
            return null;

        using (var stream = new System.IO.MemoryStream())
        {
            using (var writer = new System.IO.BinaryWriter(stream))
            {
                geography.Write(writer);
                return stream.ToArray();
            }
        }
    }
}

And, finally, code to use property.  Here is a function that updates the geocode for a particular entity (notice the AsByteArray() extension method):

public static void SetGeocode(int FacilityID, double? Geocode_Lat, double? Geocode_Lon)
{
    Facility f = Facility.TryFind(FacilityID);

    if (f != null)
    {
        f.Geocode_Lat = Geocode_Lat;
        f.Geocode_Lon = Geocode_Lon;
        f.Geocode = SqlGeography.Point(Geocode_Lat.GetValueOrDefault(),
                                       Geocode_Lon.GetValueOrDefault(), 4326).AsByteArray();

        f.SaveAndFlush();
    }
}

Disclaimer: The method described here satisfied my immediate needs, so I didn’t spend additional time exploring other options.  The code is provided AS IS, with no expressed warranty or guarantee.  There probably are far better ways to include spatial data in ActiveRecord entities, and as such, I’d be interested in hearing about your own experiences.

Spatial Data and the Entity Framework

July 28, 2011 Note: This is an outdated post.  Recently, the ADO.NET team has released a CTP with Spatial support as a first class citizen of the Entity Framework!!!  See the following posts that I wrote as I explored the new API:

http://www.jasonfollas.com/blog/archive/2011/07/20/entity-framework-spatial-first-look.aspx

http://www.jasonfollas.com/blog/archive/2011/07/21/entity-framework-spatial-dbgeography-members.aspx

http://www.jasonfollas.com/blog/archive/2011/07/27/entity-framework-spatial-a-real-world-example.aspx


The Entity Framework does not support using User Defined Types (at least in the SQLCLR sense of the term) as properties of an entity. Yesterday, Julie Lerman contacted me to see if we could find a workaround to this current limitation, particularly for the SQL Server Spatial Types (Geometry and Geography).

Whenever I hear of someone wanting to use Spatial data in their application, my first thought is always “what do they want to do with the data once they have it?”  This is because most of the time (in my limited observation), an application does not need the spatial data itself, but rather, it just needs to use that data in the predicate of a query (i.e., the query results contain no spatial information).  For example, an application might want all zipcodes that are within 50 km of a point, but the application doesn’t need the actual shapes that define each zip code.

But, assuming that the developer knows what they are doing and has a legitimate reason to include a spatial type in the results, then how can they use the Entity Framework to get the spatial data into their application?  That was our quest.

Entity Framework Primitive Types

Admittedly, I know very little about EF.  So, my approach to this problem spent a lot of time using .NET Reflector to try to understand what the EF designer was doing behind the scenes (this also proved to be a a good way to understand EF better!).  The first thing that I wanted to figure out is how EF determines which primitive type to use for each SQL Server type. 

I downloaded and imported the States data from the US Census Data for SQL Server 2008 project on Codeplex.  Then, I used the Entity Data Model Designer in VS2010 to generate a model based on my database which resulted in an entity without the geometry property.  Looking at the XML for the .edmx file, I saw the following:

<!--Errors Found During Generation:warning 6005: The data type 'geometry' is not supported; the column 'geom' in table 'Spatial.dbo.State' was excluded.--> <EntityTypeName="State">  <Key>  <PropertyRef Name="StateID"/>  </Key>  <Property Name="StateID" Type="int" Nullable="false"/>  <Property Name="StateName" Type="nvarchar" Nullable="false" MaxLength="50"/> </EntityType>

 

I don’t believe that EF is hating on “geometry” specifically (the 6005 warning).  Rather, I think that if the SQL Server type cannot be mapped to a .NET type from the BCL, then it simply does not know how to handle it.  Certainly, they don’t want to try to map to a type that is not included in the .NET Framework itself (as would be the case for the Spatial data types).

But, what is EF using to determine the mappings?

I looked long and hard, but couldn’t quite figure out the mechanism that gets invoked when the model is generated.  But, I think the key might lie in the Microsoft.VisualStudio.Data.Providers.SqlServer.SqlMappedObjectConverter.GetFrameworkTypeFromNativeType() method:

// Disassembly by Reflector
protected override Type GetFrameworkTypeFromNativeType(string nativeType)
{
switch (this.GetProviderTypeFromNativeType(nativeType))
{
case 0:
return typeof(long);

case 1:
case 7:
case 0x13:
case 0x15:
return typeof(byte[]);

case 2:
return typeof(bool);

case 3:
case 10:
case 11:
case 12:
case 0x12:
case 0x16:
return typeof(string);

case 4:
case 15:
case 0x1f:
case 0x21:
return typeof(DateTime);

case 5:
case 9:
case 0x11:
return typeof(decimal);

case 6:
return typeof(double);

case 8:
return typeof(int);

case 13:
return typeof(float);

case 14:
return typeof(Guid);

case 0x10:
return typeof(short);

case 20:
return typeof(byte);

case 0x20:
return typeof(TimeSpan);

case 0x22:
return typeof(DateTimeOffset);
}
return typeof(object);
}

 

For SQL Server, the Native Types come from the System.Data.SqlDbType enumeration:

// Disassembly by Reflector
public enum SqlDbType
{
BigInt = 0,
Binary = 1,
Bit = 2,
Char = 3,
Date = 0x1f,
DateTime = 4,
DateTime2 = 0x21,
DateTimeOffset = 0x22,
Decimal = 5,
Float = 6,
Image = 7,
Int = 8,
Money = 9,
NChar = 10,
NText = 11,
NVarChar = 12,
Real = 13,
SmallDateTime = 15,
SmallInt = 0x10,
SmallMoney = 0x11,
Structured = 30,
Text = 0x12,
Time = 0x20,
Timestamp = 0x13,
TinyInt = 20,
Udt = 0x1d,
UniqueIdentifier = 14,
VarBinary = 0x15,
VarChar = 0x16,
Variant = 0x17,
Xml = 0x19
}

 

My conclusion here was that if the SQL Server type could only be mapped to System.Object in the BCL (using the GetFrameworkTypeFromNativeType() method), then EF will not support using that field as a property of the entity.  This coincides with the fact that to ADO.NET, the Geometry (and Geography) type is a User Defined Type (0x1d).

UPDATE: After all of this, I discovered that in System.Data.Entity.dll, there is a method that is probably a better candidate for what is actually used: System.Data.SqlClient.SqlProviderManifest.GetEdmType().  This method contains a similar switch{} as the code listed above, only it is EDM-specific instead of returning BCL types.  Feel free to examine it using Reflector if you're curious about its contents.

The Workaround

Having figured out that piece of the puzzle, I was left with trying to figure out a workaround.  If ADO.NET was unable to map a Geometry to a type in the BCL, then could we cast the Geometry as something that would be mappable?

SQL Server serializes spatial objects to binary when it saves the data in a table (documented here: http://msdn.microsoft.com/en-us/library/ee320529.aspx):

EF-Spatial-1

This binary data can be used to deserialize (“rehydrate”) the object in .NET code, which is exactly what SQL Server does when it needs to use the spatial objects.  So, we just need to find a way for EF to pull these down as a byte array.

Looking back at the GetFrameworkTypeFromNativeType function from above, it appears that EF will likely recognize Binary, Image, Timestamp, and Varbinary all as SQL Server types that need to map to byte arrays.  Perfect!

So, by creating a view in SQL Server that casts the Geometry column as a Varbinary(MAX), EF would recognize it as a type that could be mapped as an entity’s property.

CREATE VIEW vStates
AS SELECT StateID
, StateName
, CAST(geom AS VARBINARY(MAX)) AS geom
FROM dbo.State


 

Note: Julie had come up with this same solution at the same time, as our emails crossed paths reporting to one another.

Regenerating the EF model (using this view instead of the table) proved my assumption: the “geom” column now appeared as a Binary property of the vStates entity.

However, we’re not quite done yet.  The point of this exercise was to get an instance of the spatial type to use in our .NET application.  To do that, the Read(BinaryReader) instance method on SqlGeometry (or SqlGeography) must be invoked (using a MemoryStream as the intermediate between the byte[] and the BinaryReader).

The entire logic to retrieve the contents of the table and instantiate one of the Spatial types is as follows:

var entities = new SpatialEntities();
var vStates = entities.vStates;

// pull one of the entities from the collection
var geo2 = vStates.ToArray()[16];
var sqlGeom = new Microsoft.SqlServer.Types.SqlGeometry();

// Deserialize the bytes to rehydrate this Geometry instance
using (var stream = new System.IO.MemoryStream(geo2.geom))
{
using (var rdr = new System.IO.BinaryReader(stream))
{
sqlGeom.Read(rdr);
}
}

// Now let's prove that we have it. Dump WKT to Debug.
System.Diagnostics.Debug.Write(sqlGeom.ToString());

 

Output:

GEOMETRYCOLLECTION (LINESTRING (-99.530670166015625 39.132522583007812, -99.530670166015625 39.13250732421875), LINESTRING (-99.791290283203125 39.131988525390625, -99.791290283203125 39.131973266601562), …

So it worked!

Finally, an extension method would make this code a bit more general purpose:

public static class Extension
{
    public static Microsoft.SqlServer.Types.SqlGeometry AsSqlGeometry(thisbyte[] binary)
    {
        var ret = new Microsoft.SqlServer.Types.SqlGeometry();

        using (var stream = new System.IO.MemoryStream(binary))
        {
            using (var rdr = new System.IO.BinaryReader(stream))
            {
                ret.Read(rdr);
            }
        }

        return ret;
    }
}

 

The test code above then becomes a bit more readable after the refactoring:

var entities = new SpatialEntities();
var vStates = entities.vStates;

// pull one of the entities from the collection
var geo2 = vStates.ToArray()[16];
var sqlGeom = geo2.geom.AsSqlGeometry();

// Now let's prove that we have it. Dump WKT to Debug.
System.Diagnostics.Debug.Write(sqlGeom.ToString());

 

Helpful information: