# How Interactive Oceans Displays Large Datasets

This notebook walks through how we display some of the large datasets from the Ocean Observatories Initiative (OOI).

In [None]:
import cmocean
from dask.utils import memory_repr
import matplotlib.pyplot as plt
import hvplot.xarray

from ooi_harvester.models import OOIDataset

## Get Data

We will request **Axial Base Shallow Profiler CTD data**.

In [None]:
desired_parameters = ['time', 'seawater_pressure', 'seawater_temperature']

In [None]:
ctd = OOIDataset("RS03AXPS-SF03A-2A-CTDPFA302-streamed-ctdpf_sbe43_sample")[desired_parameters]

In [None]:
ctd

This dataset has a total size of 52.2 GB.

In [None]:
start_dt, end_dt = "2020-01-01", "2021-01-01"

In [None]:
%%time
ctd_ds = ctd.sel(time=slice(start_dt, end_dt)).dataset

In [None]:
ctd_ds

There are about 29 million data points within that time range, which is a lot to visualize.

We can check the in-memory size of this one-year subset:

In [None]:
print(f"This dataset size is {memory_repr(ctd_ds.nbytes)}")
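As a rough cross-check on the reported size, the figure can be estimated by hand. The sketch below assumes every value is stored in 8 bytes (for example `float64` or `datetime64[ns]`); the actual dtypes in `ctd_ds` may differ, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope estimate of the in-memory size of the one-year subset.
# Assumption: each value occupies 8 bytes; the real dtypes may differ.
n_points = 29_000_000   # approximate point count quoted above
n_variables = 3         # time, seawater_pressure, seawater_temperature
bytes_per_value = 8     # assumed width of each stored value

estimated_bytes = n_points * n_variables * bytes_per_value
print(f"Estimated size: {estimated_bytes / 2**20:.0f} MiB")
```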

## Plotting

Now let's create a depth plot of time, pressure, and temperature. We use **hvPlot** for the plotting, because drawing every one of these points individually with a tool like matplotlib takes a long time.

### hvPlot Process Diagram

<img src="https://hvplot.holoviz.org/assets/diagram.png" height="250" />

To learn more [click here](https://hvplot.holoviz.org/).

### Using matplotlib

```python
fig, ax = plt.subplots()
# Pass ax explicitly so the scatter is drawn on the axes created above
ctd_ds.plot.scatter(x='time', y='seawater_pressure', hue='seawater_temperature', cmap=cmocean.cm.thermal, ax=ax)
ax.invert_yaxis()
ax.set_title('Axial Base Shallow Profiler CTD')

plt.tight_layout()
plt.savefig('ctd-profile.png', dpi=300, bbox_inches='tight', transparent=True)
```


<img src="../_static/ctd-profile.png" height="400" />

For comparison, the plot above was created with matplotlib through xarray's built-in plotting function.

### Using hvPlot

In [None]:
plot_size = (888, 450)

In [None]:
%%time
plot = ctd_ds.hvplot.scatter(
    x='time',
    y='seawater_pressure',
    color='seawater_temperature',
    rasterize=True,
    cmap=cmocean.cm.thermal,
    width=plot_size[0],
    height=plot_size[1],
).options(
    invert_yaxis=True,
    title='Axial Base Shallow Profiler CTD'
)
plot

hvPlot is part of the HoloViz ecosystem of Python visualization tools. Under the hood, it uses HoloViews and Datashader to create the plot. We then take the data behind the hvPlot plot and serialize it to JSON for our frontend visualization engine, Plotly, to render.


The datashaded plot shows the same pattern as the matplotlib plot. For example, around September 2020 there is warmer water at the surface. This suggests that datashading preserves the structure of the data.

#### Datashading Pipeline

<img src="https://datashader.org/assets/images/pipeline2.png" height="200" />

To learn more [click here](https://datashader.org/getting_started/Introduction.html).
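To make the aggregation step of the pipeline more concrete, here is a minimal sketch in plain NumPy. It is not Datashader's implementation. It only illustrates the idea of binning scattered points onto a fixed grid and keeping the mean value per cell. The function name `aggregate_mean`, the synthetic data, and the grid size are all assumptions made for this illustration.

```python
import numpy as np

def aggregate_mean(x, y, values, width, height):
    """Bin scattered points into a (height, width) grid of per-cell means."""
    x_edges = np.linspace(x.min(), x.max(), width + 1)
    y_edges = np.linspace(y.min(), y.max(), height + 1)
    # Sum of values and number of points falling in each cell.
    sums, _, _ = np.histogram2d(y, x, bins=[y_edges, x_edges], weights=values)
    counts, _, _ = np.histogram2d(y, x, bins=[y_edges, x_edges])
    # Cells with no points become NaN instead of raising a division warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)

# Synthetic stand-in for the CTD data (not real observations).
rng = np.random.default_rng(0)
n = 100_000
t = rng.uniform(0.0, 365.0, n)                     # time in days
p = rng.uniform(0.0, 200.0, n)                     # pressure in dbar
temp = 12.0 - 0.04 * p + rng.normal(0.0, 0.1, n)   # colder at depth

grid = aggregate_mean(t, p, temp, width=88, height=45)
print(grid.shape)
```

However many points go in, the output has a fixed number of cells, which is why the aggregated result stays small enough to send to a browser.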

### Extracting underlying dataset

In [None]:
plot_data = plot[()].data

In [None]:
plot_data

The xarray dataset shown above is the resulting aggregated data from the datashading process that we push to the frontend application.
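As an illustration of the serialization step, the sketch below turns a small aggregated grid into a JSON string shaped like a Plotly heatmap trace. The payload layout, the helper name `grid_to_heatmap_json`, and the sample values are hypothetical. The format actually sent to the frontend is not shown in this notebook.

```python
import json
import math

def grid_to_heatmap_json(x, y, z):
    """Serialize a 2D grid to a JSON string shaped like a Plotly heatmap trace."""
    # JSON has no NaN literal, so empty cells are sent as null.
    clean_z = [[None if math.isnan(v) else v for v in row] for row in z]
    trace = {"type": "heatmap", "x": list(x), "y": list(y), "z": clean_z}
    return json.dumps({"data": [trace]})

# Tiny made-up aggregate: 2 pressure bins x 3 time bins (illustrative values only).
x = ["2020-01-01", "2020-05-01", "2020-09-01"]
y = [50.0, 150.0]
z = [[9.8, 10.4, 13.1], [6.2, 6.3, float("nan")]]

payload = grid_to_heatmap_json(x, y, z)
print(payload)
```

Replacing NaN with `null` matters because `json.dumps` would otherwise emit a bare `NaN`, which is not valid JSON and which strict parsers reject.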

That's all. We apply this process to every dataset we serve whenever the requested data is large enough.