Geospatial Databases: PostGIS

A geospatial database is basically a very handy tool for storing data that has a spatial component. As other (relational) databases, it stores data in tabular format, where each table is structured in:

  • Rows represent each occurence of the data we're storing.
  • Columns represent the attributes associated to each row.

What makes a geospatial database geospatial is that in addition to numbers, text, arrays and booleans, it can store and operate with a special data type, that represent the shape and location of objects in the surface of Earth.

As you may know already, one can ask questions to a database, using a language called SQL (Structured Query Language). Using SQL, we could filter, aggregate and order our data, but also transform it or obtain calculated columns. The kind of questions we can ask a geospatial database are something like:

  • Which houses are closer than 100m from a Metro station?
  • How many accidents happened in this segment of the street in the last year?
  • Where are the 10 closest grocery stores from this location?
  • What's the tree species distribution like in this park?

Using a database (instead of files) to store data has many advantages, as it allows storing and querying bigger volumes of data in a more organized, structured and performant way.

PostGIS

PostGIS is an Open Source extension that adds geospatial capabilities (data types, functions and operators) to a PostgreSQL database.

PostGIS is the foundation for many other projects, such as CARTO.

Take a look at this amazing talk from Paul Ramsey, one of the main contributors to the PostGIS project. It can get a bit too technical, but it's a really great introduction to PostGIS and geospatial databases in general.

Resources

Other useful links:

Exercise