Shared CMIP6 Tables

The tables in our share are relatively straightforward, but two pieces of background knowledge will help them make sense. If you haven’t already, please read our explainer on CMIP6 itself and our tutorial on using GEOGRAPHY data types in Snowflake.

The Projections Table

The CMIP6_PROJECTIONS_BY_TIME_AND_PLACE table is the main data table. It’s fairly denormalized, so it contains both the values you’re interested in and some extra information about the place and time they refer to. Besides the columns listed below, it has a PLACE_ID column for joining to the CMIP6_PLACES table. The columns are as follows:

Name	Description
SOURCE_ID	Corresponds directly to the CMIP6 source_id or, in other words, which lab and model this data came from. See the CMIP6 explainer for more information.
EXPERIMENT_ID	Corresponds directly to the CMIP6 experiment_id which, for all practical purposes, means "which future scenario we're assuming happens". For example, experiment_id="ssp585" means Shared Socioeconomic Pathway 5 (SSP5) crossed with Representative Concentration Pathway 8.5 (RCP 8.5), commonly described as the "worst-case scenario". See the CMIP6 explainer for more information.
YEAR	The year this measurement is referring to.
MONTH	The month in that year this measurement is referring to.
TEMP_C	The average temperature across that entire month (day and night) in degrees Celsius. For more information, see our variables explainer.
TEMP_F	The average temperature across that entire month (day and night) in degrees Fahrenheit. For more information, see our variables explainer.
SPECIFIC_HUMIDITY	The average specific humidity across that entire month (day and night) expressed as a unitless ratio. For more information, see our variables explainer.
RELATIVE_HUMIDITY	The average relative humidity across that entire month (day and night) expressed as a percentage. For more information, see our variables explainer.
SNOW_AREA_PCT	The average percent of the land in that given cell that is covered in snow. For more information, see our variables explainer.
PRECIPITATION_MM_PER_DAY	The average amount of precipitation that falls per day in that area, expressed in millimeters per day. For more information, see our variables explainer.
PRECIPITATION_IN_PER_DAY	The average amount of precipitation that falls per day in that area, expressed in inches per day. For more information, see our variables explainer.
CELL_AREA_M_SQ	The actual surface area of that grid cell, expressed in square meters. Mostly useful if you have to create weighted averages across more than one cell, so that you can weight slightly smaller cells less than slightly bigger cells (remember that not every 1 degree x 1 degree "square" on Earth is actually square, nor is it the same size, because the Earth is actually round). See our geographies explainer for more details.
CENTER_LAT	The latitude of the point on Earth that describes the center of the polygon this data describes, mostly for convenience and performance optimization. If you know ahead of time you're only dealing with a subset of the planet (say, "Europe" or "California") then your queries will run much faster if you limit them to roughly the latitude and longitude "neighborhood" you care about first using this column, rather than relying solely on more expensive calculations directly on the geo_bounds column. See our geographies explainer for more details.
CENTER_LON	The longitude of the point on Earth that describes the center of the polygon this data describes, mostly for convenience and performance optimization. If you know ahead of time you're only dealing with a subset of the planet (say, "Europe" or "California") then your queries will run much faster if you limit them to roughly the latitude and longitude "neighborhood" you care about first using this column, rather than relying solely on more expensive calculations directly on the geo_bounds column. See our geographies explainer for more details.
HEMISPHERE	For convenience, "north" for Northern Hemisphere and "south" for Southern Hemisphere. Useful for quickly separating summer temperatures from winter temperatures, for example, without having to do much more expensive calculations directly on the geo_bounds column.
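
To make the last few columns concrete, here is a minimal Python sketch of an area-weighted average temperature over a latitude/longitude box. It uses an in-memory SQLite database as a stand-in for the share, so the rows, the "model_a" source name, and the bounding box are all invented for illustration; only the table and column names come from the table above. It also assumes longitudes run from -180 to 180, which you should check against the real data.

```python
import sqlite3

# In-memory stand-in for the shared table. All rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE CMIP6_PROJECTIONS_BY_TIME_AND_PLACE (
        SOURCE_ID TEXT, EXPERIMENT_ID TEXT, YEAR INTEGER, MONTH INTEGER,
        TEMP_C REAL, CELL_AREA_M_SQ REAL, CENTER_LAT REAL, CENTER_LON REAL
    )
    """
)
rows = [
    ("model_a", "ssp585", 2050, 7, 30.0, 1.0e10, 36.5, -119.5),
    ("model_a", "ssp585", 2050, 7, 20.0, 3.0e10, 37.5, -120.5),
    ("model_a", "ssp585", 2050, 7, 10.0, 2.0e10, 48.5, 2.5),  # outside the box below
]
conn.executemany(
    "INSERT INTO CMIP6_PROJECTIONS_BY_TIME_AND_PLACE VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Cheap pre-filter on CENTER_LAT / CENTER_LON, then weight each cell by its area.
QUERY = """
SELECT SUM(TEMP_C * CELL_AREA_M_SQ) / SUM(CELL_AREA_M_SQ) AS AVG_TEMP_C
FROM CMIP6_PROJECTIONS_BY_TIME_AND_PLACE
WHERE CENTER_LAT BETWEEN :min_lat AND :max_lat
  AND CENTER_LON BETWEEN :min_lon AND :max_lon
  AND SOURCE_ID = :source_id
  AND EXPERIMENT_ID = :experiment_id
  AND YEAR = :year
  AND MONTH = :month
"""
params = {
    "min_lat": 32.0, "max_lat": 42.0,      # hypothetical "neighborhood" box
    "min_lon": -125.0, "max_lon": -114.0,
    "source_id": "model_a", "experiment_id": "ssp585",
    "year": 2050, "month": 7,
}
weighted_avg_temp_c = conn.execute(QUERY, params).fetchone()[0]
print(weighted_avg_temp_c)
```

The SELECT itself should carry over to Snowflake with little change, though the bind-parameter syntax depends on the client you use.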

The Sources Table

The CMIP6_SOURCE_AND_EXPERIMENT table describes where we got all this data from and what its licensing terms are. The licensing terms are especially important; most of this data is licensed under the Creative Commons Attribution-ShareAlike license, which means that if you recombine or reuse this data elsewhere, you have to attribute it to the original source.

You’ll notice that many of these columns are array types, but they only contain one element. That’s because we source this data from many, many different files and, theoretically, those files could have, for example, different licensing terms. That said, in practice, they basically never do.

The columns are as follows:

Name	Description
SOURCE_ID	Corresponds directly to the CMIP6 source_id or, in other words, which lab and model this data came from. See the CMIP6 explainer for more information.
EXPERIMENT_ID	Corresponds directly to the CMIP6 experiment_id which, for all practical purposes, means "which future scenario we're assuming happens". For example, experiment_id="ssp585" means Shared Socioeconomic Pathway 5 (SSP5) crossed with Representative Concentration Pathway 8.5 (RCP 8.5), commonly described as the "worst-case scenario". See the CMIP6 explainer for more information.
NOMINAL_RESOLUTION	Describes the approximate size of the geographic polygons in this dataset, with values like "100 km" or "500 km". Don't take this number too literally; it's only an approximation, and the polygons differ slightly in size because the Earth is curved. See our geographies explainer for more details.
LICENSE_DESCRIPTIONS	The licensing language, verbatim, as it came out of the source files.
EXTERNAL_DATASET_IDS	The CMIP identifiers of the files that went into building up this dataset.
CONTACT_EMAILS	The contact information, verbatim, as it came out of the source files.
SOURCE_URLS	The list of actual, raw files that went into building up this data.
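
Since the array columns usually hold a single distinct value, a small helper can collapse them for display. This is a sketch under assumptions: unwrap_single is a hypothetical name, the license string is a placeholder, and whether your client hands you an array as JSON text or as a native list depends on the connector, so the helper accepts both.

```python
import json


def unwrap_single(value):
    """Collapse an array column to its one distinct element, if there is only one.

    Accepts either JSON text or a Python list. Returns the lone element when all
    entries agree, otherwise the de-duplicated list (order preserved).
    """
    items = json.loads(value) if isinstance(value, str) else list(value)
    distinct = list(dict.fromkeys(items))  # drop repeats, keep first-seen order
    return distinct[0] if len(distinct) == 1 else distinct


# Placeholder value standing in for a LICENSE_DESCRIPTIONS cell.
license_text = unwrap_single('["example license text"]')
print(license_text)
```

If the helper ever returns a list instead of a single value, that is your cue that the underlying files really did disagree, and you should read each entry.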

The Places Table

The CMIP6_PLACES table is mainly just here to hold the geographic bounds of the places on Earth the data is referring to. Join to it using the PLACE_ID column.

The columns are as follows:

Name	Description
PLACE_ID	Used for joining to the PLACE_ID column in the CMIP6_PROJECTIONS_BY_TIME_AND_PLACE table.
CENTER_GEO	A GIS-style geography describing the center point of this place, for convenience.
GEO_BOUNDS	The actual GIS-style polygon that describes the "square" on Earth this data is for. See our geographies explainer for more details.
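
As a rough illustration of the PLACE_ID join, the Python sketch below again uses an in-memory SQLite database in place of the share. The rows are invented, PLACE_ID is assumed to be an integer, and the geographies are stored as plain WKT text because SQLite has no GEOGRAPHY type; only the table and column names come from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CMIP6_PLACES (PLACE_ID INTEGER, CENTER_GEO TEXT, GEO_BOUNDS TEXT);
    CREATE TABLE CMIP6_PROJECTIONS_BY_TIME_AND_PLACE (
        PLACE_ID INTEGER, YEAR INTEGER, MONTH INTEGER, TEMP_C REAL
    );
    -- Made-up rows: one place, and two projections (only one of which matches it).
    INSERT INTO CMIP6_PLACES VALUES
        (1, 'POINT(-119.5 36.5)',
            'POLYGON((-120 36, -119 36, -119 37, -120 37, -120 36))');
    INSERT INTO CMIP6_PROJECTIONS_BY_TIME_AND_PLACE VALUES
        (1, 2050, 7, 30.0),
        (2, 2050, 7, 12.0);
    """
)

# Attach each projection row to the polygon it describes.
JOIN_QUERY = """
SELECT proj.YEAR, proj.MONTH, proj.TEMP_C, places.GEO_BOUNDS
FROM CMIP6_PROJECTIONS_BY_TIME_AND_PLACE AS proj
JOIN CMIP6_PLACES AS places
  ON places.PLACE_ID = proj.PLACE_ID
"""
joined = conn.execute(JOIN_QUERY).fetchall()
for year, month, temp_c, geo_bounds in joined:
    print(year, month, temp_c, geo_bounds)
```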

The US States Table

The US_STATES table is a convenience table for when you want to do analysis by US states. It contains the geographic shapes of each state, which you can join to the geo_bounds column of CMIP6_PLACES if you like.

The columns are as follows:

Name	Description
STATE_CODE	The commonly-used abbreviation for this state, like "CA" or "MA".
NAME	The full name of this state, for convenience.
GEO_BOUNDS	The actual GIS polygon that describes the shape of this state. See our geographies explainer for more details.
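
In Snowflake you would most likely match cells to states with a geospatial predicate such as ST_INTERSECTS on the two GEO_BOUNDS columns. To show the idea without a database, here is a pure-Python stand-in that assigns a cell's center point to a state using a planar ray-casting test. It is a sketch under loud assumptions: the function names are hypothetical, the "CO" ring is a crude rectangle rather than a real state outline, and treating latitude and longitude as flat coordinates ignores the Earth's curvature.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a closed ring of (lon, lat) vertices?

    The ring must repeat its first vertex at the end. Coordinates are treated
    as planar, which is only an approximation on a sphere.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips the state
    return inside


# Crude rectangle standing in for a state's GEO_BOUNDS (illustrative only).
STATE_RINGS = {
    "CO": [(-109.0, 37.0), (-102.0, 37.0), (-102.0, 41.0), (-109.0, 41.0), (-109.0, 37.0)],
}


def state_for_center(center_lat, center_lon):
    """Return the STATE_CODE whose ring contains the cell center, or None."""
    for state_code, ring in STATE_RINGS.items():
        if point_in_polygon(center_lon, center_lat, ring):
            return state_code
    return None


print(state_for_center(39.5, -105.5))
print(state_for_center(36.5, -119.5))
```

Matching on the center point alone will miss cells that only partly overlap a state, which is one reason a true polygon intersection in the database is the better tool for real analysis.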
