How Interactive Oceans Displays Large Datasets

This notebook walks through the process we use to display some of the large datasets from the Ocean Observatories Initiative (OOI).

import cmocean
from dask.utils import memory_repr
import matplotlib.pyplot as plt
import hvplot.xarray

from ooi_harvester.models import OOIDataset

Get Data

We will request CTD data from the Axial Base Shallow Profiler.

desired_parameters = ['time', 'seawater_pressure', 'seawater_temperature']

ctd = OOIDataset("RS03AXPS-SF03A-2A-CTDPFA302-streamed-ctdpf_sbe43_sample")[desired_parameters]

ctd

<RS03AXPS-SF03A-2A-CTDPFA302-streamed-ctdpf_sbe43_sample: 59.9 GB>
Dimensions: (time)
Data variables: 
    seawater_pressure
    seawater_temperature
    time

As the output above shows, the full dataset has a total size of 59.9 GB.

start_dt, end_dt = "2020-01-01", "2021-01-01"

%%time
ctd_ds = ctd.sel(time=slice(start_dt, end_dt)).dataset

CPU times: user 13.7 s, sys: 6.25 s, total: 19.9 s
Wall time: 23.1 s
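
The time selection above uses label-based slicing. The sketch below illustrates how such a slice behaves in plain xarray, using a small synthetic dataset (the 6-hourly time axis and temperature values are made up for illustration). It assumes that `OOIDataset.sel` follows xarray's semantics, which this notebook does not confirm. In xarray, both endpoints are inclusive, and a date-only end label such as "2021-01-01" covers that entire day.

```python
import numpy as np
import xarray as xr

# Synthetic 6-hourly time axis spanning slightly more than the year 2020.
# These values are illustrative only and are not OOI data.
time = np.arange(
    np.datetime64("2019-12-31T00:00"),
    np.datetime64("2021-01-03T00:00"),
    np.timedelta64(6, "h"),
).astype("datetime64[ns]")

ds = xr.Dataset(
    {"seawater_temperature": ("time", np.linspace(2.0, 4.0, time.size))},
    coords={"time": time},
)

# Label-based slice: both endpoints are inclusive, and the date-only
# end label matches every timestamp on 2021-01-01.
subset = ds.sel(time=slice("2020-01-01", "2021-01-01"))

first = subset.time.values[0]
last = subset.time.values[-1]
print(first, last, subset.sizes["time"])
```

If the final day should be excluded, one option is to end the slice at "2020-12-31" instead.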

ctd_ds

<xarray.Dataset>
Dimensions:               (time: 29403699)
Coordinates:
  * time                  (time) datetime64[ns] 2020-01-01T00:00:00.235197952...
Data variables:
    seawater_pressure     (time) float64 dask.array<chunksize=(11102469,), meta=np.ndarray>
    seawater_temperature  (time) float64 dask.array<chunksize=(11102469,), meta=np.ndarray>

There are about 29.4 million data points within that time range. That is far too many to draw individually in a plot!
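
To get a feel for the scale, compare the number of points with the horizontal resolution of a plot. The following is a rough sketch: the point count comes from the dataset summary above, while the plot width of 1,000 pixels is a hypothetical value chosen only for illustration.

```python
# Number of samples in the selected time range, from the Dataset summary.
n_points = 29_403_699

# Hypothetical plot width in pixels; an assumption, not a value from the notebook.
plot_width_px = 1_000

# Average number of samples that would land in each one-pixel-wide column.
points_per_pixel_column = n_points / plot_width_px
print(f"{points_per_pixel_column:,.0f} points per pixel column")
```

With tens of thousands of samples competing for each pixel column, most points drawn individually would simply overwrite one another.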

We can check the in-memory size of this one-year subset:

print(f"This dataset size is {memory_repr(ctd_ds.nbytes)}")

This dataset size is 673.0 MB
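
The reported size can be cross-checked by hand. The subset holds three 8-byte arrays of equal length: the `datetime64[ns]` time coordinate and the two `float64` variables. The sketch below reproduces the figure with a small hypothetical helper, `format_binary`, which assumes a 1024-based unit conversion; that assumption is consistent with the 673.0 MB shown above.

```python
# Values taken from the Dataset summary above.
n_samples = 29_403_699
n_arrays = 3          # time, seawater_pressure, seawater_temperature
bytes_per_value = 8   # datetime64[ns] and float64 are both 8 bytes wide

total_bytes = n_samples * n_arrays * bytes_per_value


def format_binary(num_bytes):
    """Hypothetical helper: format a byte count using 1024-based units."""
    value = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if value < 1024.0:
            return f"{value:.1f} {unit}"
        value /= 1024.0
    return f"{value:.1f} PB"


print(total_bytes)                 # 705688776
print(format_binary(total_bytes))  # 673.0 MB
```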
