Download Data using OPeNDAP

Written by Minh Phan

The Open-source Project for a Network Data Access Protocol (OPeNDAP) develops client/server software of the same name that lets scientists share data easily over the internet (EarthData). Given the OPeNDAP URL of a dataset server, we can access the data and stream it directly and seamlessly to our local machine. This tutorial is based on this notebook from the Copernicus Marine Data, with some modifications to reflect the workflow and requirements outlined in the previous notebooks.
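
As a quick illustration of the idea, pointing xarray at a public OPeNDAP endpoint that requires no authentication is a one-liner (the URL below is a hypothetical placeholder, not a real server):

import xarray as xr

# hypothetical OPeNDAP endpoint; substitute a real dataset URL
url = 'https://example.com/thredds/dodsC/some_dataset'
ds = xr.open_dataset(url)  # streams metadata only; values stay on the server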

Import necessary libraries

import xarray as xr
import getpass
from pydap.client import open_url
from pydap.cas.get_cookies import setup_session
USERNAME = 'mphan'  # change to your Copernicus Marine username
PASSWORD = getpass.getpass('Enter your password: ')
Enter your password:  ········
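
If you would rather not type the password interactively (for example, in scripted runs), one common alternative is to read credentials from environment variables. The variable names below are our own assumption, not a Copernicus convention:

import os

# hypothetical variable names; export them in your shell beforehand
USERNAME = os.environ.get('CMEMS_USERNAME', USERNAME)
PASSWORD = os.environ.get('CMEMS_PASSWORD', PASSWORD)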

For this notebook, we want to stream data from the Copernicus Marine Environment Monitoring Service's Global Ocean Physics Analysis product.

# change your Dataset ID accordingly
DATASET_ID = 'cmems_mod_glo_phy_my_0.083_P1D-m'

In the function below, we utilize a PydapDataStore, an xarray store object used for accessing OPeNDAP datasets. For this dataset, you need to log in with the username and password provided above. Note that not all OPeNDAP datasets require the same steps, so you should look up the appropriate access method for your data source.

def copernicusmarine_datastore(dataset, username, password):
    """Authenticate against the CMEMS CAS server and return a PydapDataStore."""
    cas_url = 'https://cmems-cas.cls.fr/cas/login'
    session = setup_session(cas_url, username, password)
    session.cookies.set("CASTGC", session.cookies.get_dict()['CASTGC'])
    database = ['my', 'nrt']  # 'my' = multi-year (reanalysis), 'nrt' = near-real-time
    url = f'https://{database[0]}.cmems-du.eu/thredds/dodsC/{dataset}'
    try:
        # needs PyDAP >= v3.3.0, see https://github.com/pydap/pydap/pull/223/commits
        data_store = xr.backends.PydapDataStore(open_url(url, session=session, user_charset='utf-8'))
    except Exception:
        # fall back to the near-real-time server if the dataset is not on 'my'
        url = f'https://{database[1]}.cmems-du.eu/thredds/dodsC/{dataset}'
        data_store = xr.backends.PydapDataStore(open_url(url, session=session, user_charset='utf-8'))
    return data_store

data_store = copernicusmarine_datastore(DATASET_ID, USERNAME, PASSWORD)

When you open a dataset through a PydapDataStore, only the "shell" of the dataset (its metadata: dimensions, coordinates, and attributes) is streamed to your local machine. The data values themselves remain on the remote server, so any processing beyond slicing and inspecting forces xarray to download them, which can consume a lot of memory. Transfer speeds are also slow, so proceed with caution.

DS = xr.open_dataset(data_store)
DS
<xarray.Dataset>
Dimensions:    (longitude: 4320, latitude: 2041, depth: 50, time: 10227)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.9 -179.8 ... 179.8 179.8 179.9
  * latitude   (latitude) float32 -80.0 -79.92 -79.83 ... 89.83 89.92 90.0
  * depth      (depth) float32 0.494 1.541 2.646 ... 5.275e+03 5.728e+03
  * time       (time) datetime64[ns] 1993-01-01T12:00:00 ... 2020-12-31T12:00:00
Data variables:
    mlotst     (time, latitude, longitude) float32 ...
    zos        (time, latitude, longitude) float32 ...
    bottomT    (time, latitude, longitude) float32 ...
    sithick    (time, latitude, longitude) float32 ...
    siconc     (time, latitude, longitude) float32 ...
    usi        (time, latitude, longitude) float32 ...
    vsi        (time, latitude, longitude) float32 ...
    thetao     (time, depth, latitude, longitude) float32 ...
    so         (time, depth, latitude, longitude) float32 ...
    uo         (time, depth, latitude, longitude) float32 ...
    vo         (time, depth, latitude, longitude) float32 ...
Attributes: (12/24)
    title:              daily mean fields from Global Ocean Physics Analysis ...
    easting:            longitude
    northing:           latitude
    history:            2022/05/25 21:54:07 MERCATOR OCEAN Netcdf creation
    source:             MERCATOR GLORYS12V1
    institution:        MERCATOR OCEAN
    ...                 ...
    longitude_min:      -180.0
    longitude_max:      179.91667
    latitude_min:       -80.0
    latitude_max:       90.0
    z_min:              0.494025
    z_max:              5727.917
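
To see why slicing first matters, you can ask xarray how large the dataset would be if fully loaded; nbytes is computed from shapes and dtypes alone, so this does not trigger any download. For this dataset it comes out to tens of terabytes:

# in-memory size if every variable were downloaded (no data is transferred here)
print(f'{DS.nbytes / 1e12:.1f} TB')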

We can slice out the portion of data we need. For consistency with the other notebooks, let's get salinity data from January-February 2003 at the shallowest depth level, within our region of interest (60-80 deg E, 5-25 deg N).

DISCLAIMER: DO NOT LOAD THE DATASET IN ITS ENTIRETY BEFORE SLICING UNLESS YOU WANT TO OVERFLOW YOUR MEMORY. The actual values are only downloaded when you access them (for example, when computing, plotting, or exporting), so slice first to keep the request small.

DS_sliced = DS['so'].isel(depth=0).sel(longitude=slice(60, 80), latitude=slice(5, 25), time=slice('2003-01', '2003-02'))
DS_sliced
<xarray.DataArray 'so' (time: 59, latitude: 241, longitude: 241)>
array([[[35.399944, 35.389263, ..., 33.65581 , 33.680225],
        [35.425884, 35.407574, ..., 33.587147, 33.597828],
        ...,
        [36.52913 , 36.526077, ...,       nan,       nan],
        [36.552017, 36.53981 , ...,       nan,       nan]],

       [[35.384686, 35.380108, ..., 33.7611  , 33.78399 ],
        [35.387737, 35.38316 , ..., 33.9259  , 34.067814],
        ...,
        [36.52913 , 36.527603, ...,       nan,       nan],
        [36.545914, 36.54744 , ...,       nan,       nan]],

       ...,

       [[35.245827, 35.22599 , ..., 33.957947, 33.991516],
        [35.254982, 35.24125 , ..., 33.898434, 33.935055],
        ...,
        [36.530655, 36.52913 , ...,       nan,       nan],
        [36.552017, 36.57033 , ...,       nan,       nan]],

       [[35.242775, 35.221413, ..., 34.06018 , 34.083073],
        [35.26261 , 35.245827, ..., 34.0083  , 34.040344],
        ...,
        [36.562702, 36.57491 , ...,       nan,       nan],
        [36.590168, 36.602375, ...,       nan,       nan]]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 60.0 60.08 60.17 60.25 ... 79.83 79.92 80.0
  * latitude   (latitude) float32 5.0 5.083 5.167 5.25 ... 24.83 24.92 25.0
    depth      float32 0.494
  * time       (time) datetime64[ns] 2003-01-01T12:00:00 ... 2003-02-28T12:00:00
Attributes:
    long_name:      Salinity
    standard_name:  sea_water_salinity
    units:          1e-3
    unit_long:      Practical Salinity Unit
    valid_min:      1
    valid_max:      28336
    cell_methods:   area: mean
    _ChunkSizes:    [1, 7, 341, 720]
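
Before going further, it is worth confirming that the subset is small enough to transfer comfortably; at 59 × 241 × 241 float32 values, it is only around 14 MB:

# in-memory size of the sliced subset
print(f'{DS_sliced.nbytes / 1e6:.1f} MB')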

You can drop depth since it is now a scalar coordinate left over from the isel(depth=0) selection. Make sure to note the sampling depth (0.494 m) in your attributes when you work with the data, though!

DS_sliced = DS_sliced.drop_vars('depth')  # 'drop' is deprecated in recent xarray versions
DS_sliced
<xarray.DataArray 'so' (time: 59, latitude: 241, longitude: 241)>
array([[[35.399944, 35.389263, ..., 33.65581 , 33.680225],
        [35.425884, 35.407574, ..., 33.587147, 33.597828],
        ...,
        [36.52913 , 36.526077, ...,       nan,       nan],
        [36.552017, 36.53981 , ...,       nan,       nan]],

       [[35.384686, 35.380108, ..., 33.7611  , 33.78399 ],
        [35.387737, 35.38316 , ..., 33.9259  , 34.067814],
        ...,
        [36.52913 , 36.527603, ...,       nan,       nan],
        [36.545914, 36.54744 , ...,       nan,       nan]],

       ...,

       [[35.245827, 35.22599 , ..., 33.957947, 33.991516],
        [35.254982, 35.24125 , ..., 33.898434, 33.935055],
        ...,
        [36.530655, 36.52913 , ...,       nan,       nan],
        [36.552017, 36.57033 , ...,       nan,       nan]],

       [[35.242775, 35.221413, ..., 34.06018 , 34.083073],
        [35.26261 , 35.245827, ..., 34.0083  , 34.040344],
        ...,
        [36.562702, 36.57491 , ...,       nan,       nan],
        [36.590168, 36.602375, ...,       nan,       nan]]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 60.0 60.08 60.17 60.25 ... 79.83 79.92 80.0
  * latitude   (latitude) float32 5.0 5.083 5.167 5.25 ... 24.83 24.92 25.0
  * time       (time) datetime64[ns] 2003-01-01T12:00:00 ... 2003-02-28T12:00:00
Attributes:
    long_name:      Salinity
    standard_name:  sea_water_salinity
    units:          1e-3
    unit_long:      Practical Salinity Unit
    valid_min:      1
    valid_max:      28336
    cell_methods:   area: mean
    _ChunkSizes:    [1, 7, 341, 720]

You can also pull a single time step out of the full dataset; as before, the repr below only shows metadata, and nothing has been downloaded yet.

DS.isel(time=0)
<xarray.Dataset>
Dimensions:    (longitude: 4320, latitude: 2041, depth: 50)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.9 -179.8 ... 179.8 179.8 179.9
  * latitude   (latitude) float32 -80.0 -79.92 -79.83 ... 89.83 89.92 90.0
  * depth      (depth) float32 0.494 1.541 2.646 ... 5.275e+03 5.728e+03
    time       datetime64[ns] 1993-01-01T12:00:00
Data variables:
    mlotst     (latitude, longitude) float32 ...
    zos        (latitude, longitude) float32 ...
    bottomT    (latitude, longitude) float32 ...
    sithick    (latitude, longitude) float32 ...
    siconc     (latitude, longitude) float32 ...
    usi        (latitude, longitude) float32 ...
    vsi        (latitude, longitude) float32 ...
    thetao     (depth, latitude, longitude) float32 ...
    so         (depth, latitude, longitude) float32 ...
    uo         (depth, latitude, longitude) float32 ...
    vo         (depth, latitude, longitude) float32 ...
Attributes: (12/24)
    title:              daily mean fields from Global Ocean Physics Analysis ...
    easting:            longitude
    northing:           latitude
    history:            2022/05/25 21:54:07 MERCATOR OCEAN Netcdf creation
    source:             MERCATOR GLORYS12V1
    institution:        MERCATOR OCEAN
    ...                 ...
    longitude_min:      -180.0
    longitude_max:      179.91667
    latitude_min:       -80.0
    latitude_max:       90.0
    z_min:              0.494025
    z_max:              5727.917
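
Label-based selection works with timestamps as well; for example, method='nearest' picks the time step closest to a given date, so you do not need to know the exact 12:00 stamp (the date below is arbitrary):

# select the time step nearest to an arbitrary calendar date
DS.sel(time='2003-01-15', method='nearest')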

Finally, we can export this subset to a netCDF file to combine with the rest of the data later. Writing the file is what triggers the actual download, so this step may take a while.

DS_sliced.to_netcdf('demonstrated data/salinity_at_0_49m.nc')
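
As a sanity check, you can reopen the file you just wrote; this read is fast since the data now lives on disk:

# round-trip check on the exported file
local = xr.open_dataarray('demonstrated data/salinity_at_0_49m.nc')
local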