Download Chlorophyll data from Copernicus

Author

Eli Holmes (NOAA) and Minh Phan (UW Varanasi intern)

Copernicus is the Earth Observation component of the European Union’s space programme (copernicus.eu). Copernicus Marine data have been served through the MOTU service, “a Web Server allowing to handle and extract oceanographic huge volumes of data, creating the connection between heterogeneous data providers and end-users” (help.marine.copernicus.eu). More information on MOTU is available on the Copernicus Marine help site. This notebook uses the copernicusmarine Python package (the Copernicus Marine Toolbox). You will need to register for a Copernicus Marine account to access the data and run this notebook.

We will download the daily Level-3 (non-gap-filled) data and the daily Level-4 (gap-filled) data.

Daily Level-3 chlorophyll dataset id: cmems_obs-oc_glo_bgc-plankton_my_l3-multi-4km_P1D
Daily Level-4 chlorophyll dataset id: cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D
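The two ids share a pattern: a processing-level token (l3 or l4), an optional "gapfree" marker, and a trailing ISO 8601 period (P1D = one day; P1M, mentioned later, = one month). The hypothetical helper below pulls those pieces out with regular expressions. The naming convention is inferred from the ids in this notebook only, not from an official Copernicus specification, so treat it as a convenience sketch.

```python
import re


def describe_dataset_id(dataset_id):
    """Heuristically extract processing level, time step and gap-filling from a dataset id.

    Assumption: ids look like the ones in this notebook, e.g.
    'cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D'.
    """
    level = re.search(r"_(l\d)-", dataset_id)          # e.g. '_l3-' or '_l4-'
    period = re.search(r"_(P\d+[DM])$", dataset_id)    # ISO 8601 period at the end
    return {
        "level": level.group(1).upper() if level else None,
        "period": period.group(1) if period else None,
        "gap_free": "gapfree" in dataset_id,
    }


print(describe_dataset_id("cmems_obs-oc_glo_bgc-plankton_my_l3-multi-4km_P1D"))
```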

Authenticate

Run this once. Afterwards the authentication file will be saved to your home directory.

import copernicusmarine
copernicusmarine.login()  # prompts for your Copernicus Marine username and password

Import necessary libraries

import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os, glob, subprocess
import copernicusmarine

Download data

This example downloads one day of data. The main command is copernicusmarine.subset(). The two datasets used in this notebook are cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D (Level 4) and cmems_obs-oc_glo_bgc-plankton_my_l3-multi-4km_P1D (Level 3).

# Get help like so
help(copernicusmarine.subset)

copernicusmarine.subset(
    dataset_id = "cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D",
    variables = None,  # None = all variables in the dataset
    start_datetime = "2023-08-21T00:00:00",
    end_datetime = "2023-08-21T00:00:00",
    minimum_longitude = 60,
    maximum_longitude = 80,
    minimum_latitude = 5,
    maximum_latitude = 25,
    output_directory = 'data/motu/cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D',
    output_filename = '20230821.nc',  # include the extension so the path below matches
    force_download = True,  # skip the interactive confirmation prompt
)

ds = xr.open_dataset('data/motu/cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D/20230821.nc')
ds['CHL'].sel(time="2023-08-21").plot()
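To request the same day and region from both datasets without retyping every argument, the parameters can be collected in a dict and unpacked into copernicusmarine.subset(). The helper subset_kwargs below is hypothetical; it only builds the arguments already used in the call above (same output folder layout and a YYYYMMDD.nc filename) and checks that the bounding box is ordered min < max.

```python
def subset_kwargs(dataset_id, day, bbox, out_root="data/motu"):
    """Build keyword arguments for copernicusmarine.subset() covering one day.

    day:  'YYYY-MM-DD'
    bbox: (lon_min, lon_max, lat_min, lat_max) in degrees
    """
    lon_min, lon_max, lat_min, lat_max = bbox
    if lon_min >= lon_max or lat_min >= lat_max:
        raise ValueError("bbox must be (lon_min, lon_max, lat_min, lat_max) with min < max")
    stamp = f"{day}T00:00:00"  # same start and end -> a single daily time step
    return {
        "dataset_id": dataset_id,
        "start_datetime": stamp,
        "end_datetime": stamp,
        "minimum_longitude": lon_min,
        "maximum_longitude": lon_max,
        "minimum_latitude": lat_min,
        "maximum_latitude": lat_max,
        "output_directory": f"{out_root}/{dataset_id}",
        "output_filename": day.replace("-", "") + ".nc",
    }


# Usage (needs copernicusmarine and a login, so it is left commented out):
# for ds_id in ["cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D",
#               "cmems_obs-oc_glo_bgc-plankton_my_l3-multi-4km_P1D"]:
#     copernicusmarine.subset(**subset_kwargs(ds_id, "2023-08-21", (60, 80, 5, 25)),
#                             force_download=True)
```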

Load the download function

See the bottom of the notebook for a copy of this function. By default it saves the files in ~/shared/data/copernicus, in a folder named after the dataset id.
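The resulting layout is one netCDF file per month, named YYYYMM.nc, inside a folder per dataset. The hypothetical helper below reproduces that path so later cells can locate a given month; it assumes the default root above (which is /home/jovyan/shared/data/copernicus when the user is jovyan).

```python
import os

# Assumed default root, matching the `path` default of download_copernicus()
DEFAULT_ROOT = os.path.expanduser("~/shared/data/copernicus")


def monthly_file_path(dataset_id, year, month, root=DEFAULT_ROOT):
    """Return <root>/<dataset_id>/YYYYMM.nc for one month of a dataset."""
    return os.path.join(root, dataset_id, f"{year}{month:02d}.nc")


print(monthly_file_path("cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D", 1997, 10))
```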

%run -i "~/indian-ocean-zarr/notebooks/functions.py"

Download the products

Download the Level-4 gap-free product, then load one monthly netCDF file as a test.

download_copernicus(
    "cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D", 
    "1997-10-01", "2024-06-30"
)

ds = xr.open_dataset('~/shared/data/copernicus/cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D/199710.nc')
ds
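download_copernicus() issues one request per month. It builds the months with pd.date_range(date_start, date_end, freq="ME"), so a month is included only if its last day falls on or before date_end. The stdlib sketch below (hypothetical name month_windows) lists the same (first day, last day) windows, which makes it easy to see how many requests a date range produces before starting a long download.

```python
import calendar
from datetime import date


def month_windows(date_start, date_end):
    """Return (first_day, last_day) ISO strings for each month whose last day
    lies between date_start and date_end (inclusive), like freq="ME" month ends."""
    start = date.fromisoformat(date_start)
    end = date.fromisoformat(date_end)
    windows = []
    year, month = start.year, start.month
    while True:
        month_end = date(year, month, calendar.monthrange(year, month)[1])
        if month_end > end:
            break  # this month is not complete within the range, so stop
        windows.append((date(year, month, 1).isoformat(), month_end.isoformat()))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return windows


w = month_windows("1997-10-01", "2024-06-30")
print(len(w), w[0], w[-1])
```

For the range used above this gives 321 monthly requests, from ('1997-10-01', '1997-10-31') to ('2024-06-01', '2024-06-30').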

Next, download the Level-3 data, which is not gap-filled and therefore has many gaps.

download_copernicus(
    "cmems_obs-oc_glo_bgc-plankton_my_l3-multi-4km_P1D", 
    "1997-10-01", "2024-06-30", 
    vars=['CHL', 'CHL_uncertainty', 'flags']
)

ds = xr.open_dataset('~/shared/data/copernicus/cmems_obs-oc_glo_bgc-plankton_my_l3-multi-4km_P1D/199710.nc')
ds
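One quick way to see the difference between the two products is the fraction of missing (NaN) cells. The sketch below uses plain numpy on an array such as ds['CHL'].values, shaped (time, lat, lon); with xarray, ds['CHL'].isnull().mean() gives the overall value directly. Land cells are assumed to be NaN in both products, so the gap due to clouds and other missing observations is roughly the Level-3 fraction minus the Level-4 fraction.

```python
import numpy as np


def missing_fraction(values):
    """Fraction of NaN cells in an array (e.g. ds['CHL'].values)."""
    arr = np.asarray(values, dtype=float)
    return float(np.isnan(arr).mean())


def missing_fraction_per_time(values):
    """NaN fraction for each time step of a (time, lat, lon) array."""
    arr = np.asarray(values, dtype=float)
    return np.isnan(arr).reshape(arr.shape[0], -1).mean(axis=1)


# Example on a tiny synthetic array: 1 of 4 cells missing at t=0, all missing at t=1
demo = np.ones((2, 2, 2))
demo[0, 0, 0] = np.nan
demo[1] = np.nan
print(missing_fraction(demo), missing_fraction_per_time(demo))
```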

The cmems_obs-oc_glo_bgc-plankton_my_l4-multi-4km_P1M product has many variables (plankton types), and each month would be 3.8 GB, so I download only the variables that are also in the gap-free product.
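To put the 3.8 GB figure in context, a rough estimate assuming it applies to every month of the same 1997-10 to 2024-06 range used above:

```python
def months_between(start_year, start_month, end_year, end_month):
    """Inclusive count of calendar months from start to end."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


GB_PER_MONTH = 3.8  # per-month size quoted in the text for the all-variables product
n = months_between(1997, 10, 2024, 6)
total_gb = GB_PER_MONTH * n
print(f"{n} months x {GB_PER_MONTH} GB = {total_gb:.1f} GB (~{total_gb / 1000:.2f} TB)")
```

That is 321 months and roughly 1.2 TB, which is why restricting the variable list matters.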

The data will be processed in the 02-data-processing.ipynb notebook.

Combine data
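Before combining the monthly files, it may help to confirm that every expected month was downloaded. The hypothetical helpers below compare a list of filenames (for example os.listdir(folder)) with the expected YYYYMM names; only files matching YYYYMM.nc count as present. Combining itself is left as a commented suggestion using xr.open_mfdataset, which needs dask installed.

```python
import re


def expected_months(start_year, start_month, end_year, end_month):
    """List 'YYYYMM' strings from start to end, inclusive."""
    y, m = start_year, start_month
    out = []
    while (y, m) <= (end_year, end_month):
        out.append(f"{y}{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def find_missing_months(filenames, expected):
    """Return the sorted 'YYYYMM' names in expected with no matching YYYYMM.nc file."""
    present = {name[:-3] for name in filenames if re.fullmatch(r"\d{6}\.nc", name)}
    return sorted(set(expected) - present)


# Usage sketch (folder path follows the default layout of download_copernicus()):
# folder = os.path.expanduser("~/shared/data/copernicus/cmems_obs-oc_glo_bgc-plankton_my_l3-multi-4km_P1D")
# missing = find_missing_months(os.listdir(folder), expected_months(1997, 10, 2024, 6))
# if not missing:
#     ds = xr.open_mfdataset(sorted(glob.glob(f"{folder}/*.nc")), combine="by_coords")
```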

Download function

import os
import pandas as pd
import copernicusmarine

def download_copernicus(dataset, date_start, date_end, vars=None, lat1=-12, lat2=32, lon1=42, lon2=102, path='/home/jovyan/shared/data/copernicus'):
    """
    Download a Copernicus Marine dataset one month at a time into {path}/{dataset}/YYYYMM.nc.

    dataset: dataset_id, example cmems_obs-oc_glo_bgc-plankton_my_l4-gapfree-multi-4km_P1D
    date_start: formatted as YYYY-MM-DD or numpy.datetime64
    date_end: formatted as YYYY-MM-DD or numpy.datetime64; only months whose
        last day falls on or before date_end are downloaded
    vars: list of Copernicus variables to write, example ['CHL']; None writes all variables
    lat1, lat2, lon1, lon2: bounding box in degrees
    path: root folder; each dataset gets its own subfolder
    """
    path_folder = f'{path}/{dataset}'
    os.makedirs(path_folder, exist_ok=True)

    # Month-end timestamps; freq="ME" needs pandas >= 2.2 (use "M" on older versions)
    months = pd.date_range(date_start, date_end, freq="ME")
    for month in months:
        start_date = f'{month.year}-{month.month:02d}-01'
        end_date = month.strftime('%Y-%m-%d')
        export_file = f'{month.year}{month.month:02d}.nc'

        copernicusmarine.subset(
            dataset_id = dataset,
            variables = vars,
            start_datetime = start_date,
            end_datetime = end_date,
            minimum_longitude = lon1,
            maximum_longitude = lon2,
            minimum_latitude = lat1,
            maximum_latitude = lat2,
            output_directory = path_folder,
            output_filename = export_file,
            force_download = True,         # skip the interactive confirmation prompt
            overwrite_output_data = True   # replace an existing file of the same name
        )
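As written, download_copernicus() fetches every month again on each run. For a range spanning 1997 to 2024, a resumable loop that skips months already on disk may be more convenient. The sketch below is hypothetical: fetch is any callable you supply (for example a thin wrapper around copernicusmarine.subset) that writes path_folder/filename, and it is passed in so the skip logic can be tested without network access.

```python
import os


def download_missing(months, path_folder, fetch):
    """Call fetch(start_date, end_date, filename) only for months whose file is absent.

    months: iterable of (start_date, end_date) ISO strings, e.g. ('1997-10-01', '1997-10-31')
    fetch:  caller-supplied function that writes path_folder/filename
    Returns the filenames that were fetched on this run.
    """
    os.makedirs(path_folder, exist_ok=True)
    fetched = []
    for start, end in months:
        filename = start[:4] + start[5:7] + ".nc"  # 'YYYY-MM-DD' -> 'YYYYMM.nc'
        if os.path.exists(os.path.join(path_folder, filename)):
            continue  # already downloaded on an earlier run
        fetch(start, end, filename)
        fetched.append(filename)
    return fetched
```

Skipping by filename assumes an existing file is complete; a download interrupted mid-write would need to be deleted by hand before rerunning.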