Shared CMIP6 Tables

The tables in our share are relatively straightforward, but they assume two pieces of background knowledge. If you haven’t already, please read our explainer on CMIP6 itself and a tutorial on using GEOGRAPHY data types in Snowflake before continuing.

The Projections Table

The CMIP6_PROJECTIONS_BY_TIME_AND_PLACE table is the main data table. It’s fairly denormalized, so it contains both the data you’re interested in and some extra information about the place and time it refers to. The columns are as follows:

Name | Description
SOURCE_ID | Corresponds directly to the CMIP6 source_id or, in other words, which lab and model this data came from. See the CMIP6 explainer for more information.
EXPERIMENT_ID | Corresponds directly to the CMIP6 experiment_id which, for all practical purposes, means "which future scenario we're assuming happens". For example, experiment_id="ssp585" means Shared Socioeconomic Pathway 5 (SSP5) crossed with Representative Concentration Pathway 8.5 (RCP 8.5), commonly known as the "worst case scenario". See the CMIP6 explainer for more information.
YEAR | The year this measurement refers to.
MONTH | The month in that year this measurement refers to.
TEMP_C | The average temperature across that entire month (day and night) in degrees Celsius. For more information, see our variables explainer.
TEMP_F | The average temperature across that entire month (day and night) in degrees Fahrenheit. For more information, see our variables explainer.
SPECIFIC_HUMIDITY | The average specific humidity across that entire month (day and night) expressed as a unitless ratio. For more information, see our variables explainer.
RELATIVE_HUMIDITY | The average relative humidity across that entire month (day and night) expressed as a percentage. For more information, see our variables explainer.
SNOW_AREA_PCT | The average percentage of the land in that grid cell that is covered in snow. For more information, see our variables explainer.
PRECIPITATION_MM_PER_DAY | The average amount of precipitation that falls per day in that area, expressed in millimeters per day. For more information, see our variables explainer.
PRECIPITATION_IN_PER_DAY | The average amount of precipitation that falls per day in that area, expressed in inches per day. For more information, see our variables explainer.
CELL_AREA_M_SQ | The actual surface area of that grid cell, expressed in square meters. Mostly useful if you have to create weighted averages across more than one cell, so that you can weight smaller cells less than bigger cells (remember that a 1 degree x 1 degree "square" on Earth is not actually square, nor are all such cells the same size, because the Earth is round). See our geographies explainer for more details.
CENTER_LAT | The latitude of the center point of the polygon this data describes, mostly for convenience and performance optimization. If you know ahead of time you're only dealing with a subset of the planet (say, "Europe" or "California") then your queries will run much faster if you first limit them to roughly the latitude and longitude "neighborhood" you care about using this column, rather than relying solely on more expensive calculations directly on the geo_bounds column. See our geographies explainer for more details.
CENTER_LON | The longitude of the center point of the polygon this data describes, mostly for convenience and performance optimization. If you know ahead of time you're only dealing with a subset of the planet (say, "Europe" or "California") then your queries will run much faster if you first limit them to roughly the latitude and longitude "neighborhood" you care about using this column, rather than relying solely on more expensive calculations directly on the geo_bounds column. See our geographies explainer for more details.
HEMISPHERE | For convenience, "north" for Northern Hemisphere and "south" for Southern Hemisphere. Useful for quickly separating summer temperatures from winter temperatures, for example, without having to do much more expensive calculations directly on the geo_bounds column.
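
As a rough illustration of how CELL_AREA_M_SQ, CENTER_LAT, and CENTER_LON can work together, the sketch below first narrows the rows to a latitude/longitude box and then computes an area-weighted average temperature. It uses Python's built-in sqlite3 module as a local stand-in for Snowflake so that it runs anywhere. The rows, the "demo-model" source ID, and the bounding box are all made up for illustration and are not real CMIP6 values. The SELECT statement uses only standard SQL, so something similar should work against the shared table, but treat that as an assumption rather than a tested query.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake so the sketch runs locally.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE CMIP6_PROJECTIONS_BY_TIME_AND_PLACE (
        SOURCE_ID TEXT, EXPERIMENT_ID TEXT, YEAR INTEGER, MONTH INTEGER,
        TEMP_C REAL, CELL_AREA_M_SQ REAL, CENTER_LAT REAL, CENTER_LON REAL
    )
    """
)

# Made-up rows for illustration only; these are NOT real CMIP6 values.
rows = [
    ("demo-model", "ssp585", 2050, 7, 20.0, 1.0e10, 36.5, -119.5),
    ("demo-model", "ssp585", 2050, 7, 30.0, 3.0e10, 37.5, -120.5),
    ("demo-model", "ssp585", 2050, 7, 10.0, 2.0e10, 48.5, 2.5),  # outside the box
]
conn.executemany(
    "INSERT INTO CMIP6_PROJECTIONS_BY_TIME_AND_PLACE VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Cheap numeric filter on the center point first, then weight each cell's
# temperature by its surface area so that larger cells count for more.
QUERY = """
SELECT SUM(TEMP_C * CELL_AREA_M_SQ) / SUM(CELL_AREA_M_SQ) AS AREA_WEIGHTED_TEMP_C
FROM CMIP6_PROJECTIONS_BY_TIME_AND_PLACE
WHERE CENTER_LAT BETWEEN :lat_min AND :lat_max
  AND CENTER_LON BETWEEN :lon_min AND :lon_max
  AND SOURCE_ID = :source_id
  AND EXPERIMENT_ID = :experiment_id
  AND YEAR = :year
  AND MONTH = :month
"""

# Hypothetical bounding box and filters, chosen only to match the demo rows.
params = {
    "lat_min": 32.0, "lat_max": 42.0,
    "lon_min": -125.0, "lon_max": -114.0,
    "source_id": "demo-model", "experiment_id": "ssp585",
    "year": 2050, "month": 7,
}

(weighted_temp_c,) = conn.execute(QUERY, params).fetchone()
print(weighted_temp_c)
```

With these demo rows, the two cells inside the box have areas in a 1:3 ratio, so the weighted result is 27.5, whereas an unweighted average of the same two cells would be 25.0. Filtering on a single SOURCE_ID and EXPERIMENT_ID keeps the average from mixing different models or scenarios.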

The Sources Table

The CMIP6_SOURCE_AND_EXPERIMENT table describes where we got all this data from and what its licensing terms are. The licensing terms are especially important: most of this data is licensed under the Creative Commons Attribution-ShareAlike license, which means that if you recombine or reuse this data elsewhere, you have to attribute it to the original source and share what you build from it under the same license terms.

You’ll notice that many of these columns are array types, even though each array usually contains only one element. That’s because we source this data from many, many different files and, theoretically, those files could have, for example, different licensing terms. In practice, they almost never do.
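
If you want to treat one of these array columns as a single value while still noticing the rare case where the source files disagree, a small helper along the following lines may be useful. This is only a sketch: the function name only_value is hypothetical, and it assumes the array has already been loaded into a Python list of strings.

```python
def only_value(values):
    """Return the single distinct value in an array column.

    Raises ValueError if the array is empty or if the source files
    disagreed and the array holds more than one distinct value.
    """
    distinct = set(values)
    if len(distinct) != 1:
        raise ValueError(
            f"expected exactly one distinct value, found {len(distinct)}"
        )
    return distinct.pop()


# Placeholder text for illustration; not a real license string.
license_descriptions = ["example license text"]
print(only_value(license_descriptions))
```

Failing loudly on a mismatch, rather than silently taking the first element, means you find out if a dataset ever does combine files with different terms.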

The columns are as follows:

Name | Description
SOURCE_ID | Corresponds directly to the CMIP6 source_id or, in other words, which lab and model this data came from. See the CMIP6 explainer for more information.
EXPERIMENT_ID | Corresponds directly to the CMIP6 experiment_id which, for all practical purposes, means "which future scenario we're assuming happens". For example, experiment_id="ssp585" means Shared Socioeconomic Pathway 5 (SSP5) crossed with Representative Concentration Pathway 8.5 (RCP 8.5), commonly known as the "worst case scenario". See the CMIP6 explainer for more information.
NOMINAL_RESOLUTION | Describes roughly how large the geographic polygons in this dataset are. Will be values like "100 km" or "500 km". Don't take this number too literally; it's just an approximation, and every polygon will be a slightly different size because the Earth is curved. See our geographies explainer for more details.
LICENSE_DESCRIPTIONS | The licensing language, verbatim, as it came out of the source files.
EXTERNAL_DATASET_IDS | The CMIP identifiers of the files that went into building up this dataset.
CONTACT_EMAILS | The contact information, verbatim, as it came out of the source files.
SOURCE_URLS | The list of actual, raw files that went into building up this data.

The Places Table

The CMIP6_PLACES table mainly exists to hold the geographic bounds of the places on Earth that the data refers to. Join to it using the PLACE_ID column.

The columns are as follows:

Name | Description
PLACE_ID | Used for joining to the PLACE_ID column in the CMIP6_PROJECTIONS_BY_TIME_AND_PLACE table.
CENTER_GEO | A GIS-style geography describing the center point of this place, for convenience.
GEO_BOUNDS | The actual GIS-style polygon that describes the "square" on Earth this data is for. See our geographies explainer for more details.
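
The join on PLACE_ID might look something like the sketch below. As before, Python's built-in sqlite3 module stands in for Snowflake so the example runs locally, and the rows are made up for illustration. Two things here are assumptions rather than facts about the share: PLACE_ID is modeled as an integer, and GEO_BOUNDS is stored as plain polygon text because SQLite has no GEOGRAPHY type.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake so the sketch runs locally.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumption: PLACE_ID as INTEGER and GEO_BOUNDS as text are stand-ins
    -- for whatever types the shared tables actually use.
    CREATE TABLE CMIP6_PLACES (PLACE_ID INTEGER, GEO_BOUNDS TEXT);
    CREATE TABLE CMIP6_PROJECTIONS_BY_TIME_AND_PLACE (
        PLACE_ID INTEGER, YEAR INTEGER, MONTH INTEGER, TEMP_C REAL
    );
    """
)

# Made-up rows for illustration only; these are NOT real CMIP6 values.
conn.execute(
    "INSERT INTO CMIP6_PLACES VALUES (?, ?)",
    (1, "POLYGON((-120 36, -119 36, -119 37, -120 37, -120 36))"),
)
conn.executemany(
    "INSERT INTO CMIP6_PROJECTIONS_BY_TIME_AND_PLACE VALUES (?, ?, ?, ?)",
    [(1, 2050, 7, 24.5), (1, 2050, 8, 25.5)],
)

# Attach each projection row to the polygon it describes via PLACE_ID.
JOIN_QUERY = """
SELECT proj.YEAR, proj.MONTH, proj.TEMP_C, place.GEO_BOUNDS
FROM CMIP6_PROJECTIONS_BY_TIME_AND_PLACE AS proj
JOIN CMIP6_PLACES AS place
  ON place.PLACE_ID = proj.PLACE_ID
ORDER BY proj.YEAR, proj.MONTH
"""

joined = conn.execute(JOIN_QUERY).fetchall()
for row in joined:
    print(row)
```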

The US States Table

The US_STATES table is a convenience table for when you want to do analysis by US states. It contains the geographic shapes of each state, which you can join to the geo_bounds column of CMIP6_PLACES if you like.

The columns are as follows:

Name | Description
STATE_CODE | The commonly-used abbreviation for this state, like "CA" or "MA".
NAME | The full name of this state, for convenience.
GEO_BOUNDS | The actual GIS polygon that describes the shape of this state. See our geographies explainer for more details.