Computing and Plotting

Computing and Plotting#

This tutorial includes an overview of the different ways available to compute, select data and plot using the xarray objects that are provided by xcompact3d-toolbox.

The very first step is to import the toolbox and other packages:

import matplotlib.pyplot as plt
import numpy as np

import xcompact3d_toolbox as x3d

Why xarray?#

The data structures are provided by xarray, that introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. It integrates tightly with dask for parallel computing.

The goal here is to speed up the development of customized post-processing applications with the concise interface provided by xarray. Ultimately, we can compute solutions with fewer lines of code and better readability, so we expend less time testing and debugging and more time exploring our datasets and getting insights.

Additionally, xcompact3d-toolbox includes extra functionalities for DataArray and Dataset.

Before going forward, please, take a look at Overview: Why xarray? and Quick overview to understand the motivation to use xarray’s data structures instead of just numpy-like arrays.

Example - Flow around a cylinder#

We can download the example from the online database, the flow around a cylinder in this case. We set cache=True and a local destination where it can be saved in our computer cache_dir="./example/", so there is no need to download it every time the kernel is restarted.

dataset, prm = x3d.tutorial.open_dataset("cylinder", cache=True, cache_dir="./example/")

Notice there is an entire tutorial dedicated to the parameters file. Now, let’s take a look at the dataset:

We got a xarray.Dataset with the variables u (velocity vector), pp (pressure) and epsi (that describes the geometry), their coordinates (x, y, t and i) and some attributes like the xcompact3d_version used to run this simulation, the url where you can find the dataset and others.

We can access each of the variables or coordinates with the dot notation (i.g., snapshot.pp, snapshot.u, snapshot.x) or the dict-like notation (i.g., snapshot["pp"], snapshot["u"], snapshot["x"]).

Once the arrays are wrapped with their coordinates, we can use xarray’s plotting functionality to explore our data with just a few lines of code.

Starting with epsi, that represents the geometry (it is 1 inside the cylinder and 0 outside), we select it from the dataset and then use the method plot:

dataset.epsi.plot()

<matplotlib.collections.QuadMesh at 0x7fb342d763c0>

../_images/4c52df108daefae41e008a9497e50c7e6deec6f91be8317301228bdea567d1c1.png

The array in the example was two-dimensional, in this case .plot() automatically calls xarray.plot.pcolormesh().

There are many options to customize the plots, besides that, xarray plotting functionality is a thin wrapper around the popular matplotlib library.

To improve the figure, let’s set the x-axis of the plot as the coordinate x of our array, same for y. Then let’s use matplotlib to set the axis aspect to equal. Take a look:

ax = dataset.epsi.plot(x="x", y="y")
ax.axes.set_aspect("equal")

../_images/ca753dec6c10599aaa9ebb52cb4a00cfd5cad2164c3fb8c2c5a9a54179014bd7.png

It might be important in the cylinder case to ignore the values that are inside the solid cylinder when plotting or computing any quantity. We can do it by preserving the values of u and pp where epsi is equal to zero, and setting the values to np.NaN otherwise.

xarray.Dataset.where is a handy method for that, take a look:

for var in ["u", "pp"]:
    dataset[var] = dataset[var].where(dataset.epsi == 0, np.nan)

Have you noticed that we are doing this comparison between variables with different dimensions, and it just worked? I mean, epsi is 2D (x, y), pp is 3D (x, y, t) and u is 4D (i, x, y, t). That is because xarray automatically broadcasted the values of epsi to each point at the coordinates t and i.

Another cool feature of xarray is that we can select data based on the actual value of its coordinates, not only on the integer indexes used for selection on numpy-like arrays.

To exemplify, let’s select one position at the same height of the cylinder, but a bit downstream. Note we can get the time evolution for all variables at this specified point:

We can chain the methods, selecting a variable, selecting a point in the domain and doing a plot, all with just one line of code:

dataset.u.sel(x=10.0, y=6.0, method="nearest").plot(x="t", hue="i")

[<matplotlib.lines.Line2D at 0x7fb340a9cb90>,
 <matplotlib.lines.Line2D at 0x7fb340a9ccd0>]

../_images/055f91deea21aa66509b1d0f7d19f245bab37433e7b0b224bc8a10f489947a65.png

Note this time the data was 1D, so the plot was handled internally by xarray.plot.line.

To give you another example, let’s plot the time-averaged (\(60 \le t \le 150\)) vertical velocity profile where \(x=10\):

dataset.u.sel(x=10.0, t=slice(60.0, 150.0)).mean("t").plot(y="y", hue="i")

[<matplotlib.lines.Line2D at 0x7fb340aefc50>,
 <matplotlib.lines.Line2D at 0x7fb340aefd90>]

../_images/6072c0bc7801203c3a39073887077a748dfd3237936b36276d495dc4538af73e.png

As you saw, we can refer to the coordinates by their name when working with xarray, instead of keeping track of their axis number.

To extend this concept, let’s now compute the time evolution of the kinetic energy. It is given by the equation:

\[ k = \int_V \dfrac{u_iu_i}{2} dV. \]

Now the code:

dataset["kinetic_energy"] = ((dataset.u**2.0).sum("i").x3d.simpson("x", "y")) * 0.5
dataset["kinetic_energy"].attrs = {"name": "k", "long_name": "kinetic Energy", "units": "-"}
dataset["kinetic_energy"].plot()

[<matplotlib.lines.Line2D at 0x7fb3409979d0>]

../_images/7e503ea7f7feeddaecf38de831355e4302d317a1642c74cd3b067325e0fa83d0.png

In the code above we:

Solved the equation for the entire time series with a very readable code. A good point is that it worked for the xy planes in the dataset in this example, and all we need to do to run it in a real 3D case is include z at the integration;
Included attributes to describe what we just computed, making our application easier to share and collaborate. As a bonus, they were automatically included in the plot;
Plotted the results.

We can use a quick list comprehension to get the dimensions for the volumetric integration:

V_coords = [dim for dim in dataset.u.coords if dim in "xyz"]
V_coords

['x', 'y']

and rewrite the previous example to make it more robust, now it works for n-dimensional cases:

dataset["kinetic_energy"] = ((dataset.u**2.0).sum("i").x3d.simpson(*V_coords)) * 0.5
dataset["kinetic_energy"].attrs = {"name": "k", "long_name": "kinetic Energy", "units": "-"}

Going back to 2D plots, let’s take the velocity vector u, select it for \(60 \le t \le 150\) (sel), compute a time average (mean) and plot:

g = dataset.u.sel(t=slice(60.0, 150.0)).mean("t").plot(x="x", y="y", row="i", cmap="turbo", rasterized=True)
for ax in g.axes.flat:
    ax.axes.set_aspect("equal")

../_images/7a5ecbd6f2ce55d7cfd734ccf7fc51a9a72dd84d4641a11627c0c466b535f832.png

Do you want to see the time evolution? No problem. Let’s take the velocity vector u, use isel with a slice selecting every 40 points in time (otherwise we would get too many figures), and plot:

g = dataset.u.isel(t=slice(None, None, 40)).plot(x="x", y="y", col="t", row="i", cmap="turbo", rasterized=True)

for ax in g.axes.flat:
    ax.axes.set_aspect("equal")

../_images/1c7fe5bd45f54bd88e2e10a64afc50c5aeab430dfb1a61130303e6b1669ba727.png

To exemplify differentiation and parallel computing capabilities, let’s compute the vorticity for our dataset. We just have one component for this 2D example, it is given by the equation:

\[ \omega_z = \dfrac{\partial u_y}{\partial x} - \dfrac{\partial u_x}{\partial y}. \]

We can use xarray.DataArray.differentiate just out of the box with its second order accurate central differences. However, we can use the 4th order accurate centered scheme available at X3dDataArray.first_derivative.

We start setting the attribute boundary conditions (BC) for the velocity field:

dataset["u"].attrs["BC"] = prm.get_boundary_condition("u")

and then we compute the vorticity:

%%time
dataset["vort"] = dataset.u.sel(i="y").x3d.first_derivative("x") - dataset.u.sel(i="x").x3d.first_derivative("y")

CPU times: user 977 ms, sys: 15.9 ms, total: 993 ms
Wall time: 991 ms

Notice the equation above computed the vorticity for the entire time series in our dataset.

We can use X3dDataArray.pencil_decomp to coarse the velocity array to a dask array, ready for parallel computing (see Using Dask with xarray). Notice that X3dDataArray.pencil_decomp applies chunk=-1 for all coordinates listed in args, which means no decomposition, and 'auto' to the others, delagating to dask the job of finding the optimal distribition. One important point here is that dask considers the dataset in this example so small that the overhead for parallel computing is not worth it. As a result, it returns with just one chunk:

u_chunked = dataset.u.x3d.pencil_decomp("x", "y")
u_chunked

<xarray.DataArray 'u' (i: 2, x: 257, y: 128, t: 201)> Size: 53MB
dask.array<xarray-<this-array>, shape=(2, 257, 128, 201), dtype=float32, chunksize=(2, 257, 128, 201), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 2kB 0.0 0.07812 0.1562 0.2344 ... 19.84 19.92 20.0
  * y        (y) float64 1kB 0.0 0.09375 0.1875 0.2812 ... 11.72 11.81 11.91
  * t        (t) float64 2kB 0.0 0.75 1.5 2.25 3.0 ... 147.8 148.5 149.2 150.0
  * i        (i) <U1 8B 'x' 'y'
Attributes:
    BC:       {'x': {'ncl1': 2, 'ncln': 2, 'npaire': 1}, 'y': {'ncl1': 0, 'nc...

Parallel computing is presented in this tutorial anyway, because X3dDataArray.pencil_decomp returns the arrays with several chunks for datasets in real scale. Each of these chunks will be computed in parallel in multi-core systems.

Just to exemplify, let’s create blocks with 51 points in time, so we can use 4 cores to compute it in parallel:

u_chunked = dataset.u.chunk(chunks={"t": 51})
u_chunked

<xarray.DataArray 'u' (i: 2, x: 257, y: 128, t: 201)> Size: 53MB
dask.array<xarray-<this-array>, shape=(2, 257, 128, 201), dtype=float32, chunksize=(2, 257, 128, 51), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 2kB 0.0 0.07812 0.1562 0.2344 ... 19.84 19.92 20.0
  * y        (y) float64 1kB 0.0 0.09375 0.1875 0.2812 ... 11.72 11.81 11.91
  * t        (t) float64 2kB 0.0 0.75 1.5 2.25 3.0 ... 147.8 148.5 149.2 150.0
  * i        (i) <U1 8B 'x' 'y'
Attributes:
    BC:       {'x': {'ncl1': 2, 'ncln': 2, 'npaire': 1}, 'y': {'ncl1': 0, 'nc...

Now computing the vorticity in parallel:

%%time
dataset["vort"] = (
    u_chunked.sel(i="y").x3d.first_derivative("x") - u_chunked.sel(i="x").x3d.first_derivative("y")
).compute()

CPU times: user 3.53 s, sys: 1.73 s, total: 5.25 s
Wall time: 4.09 s

Again, remember that the dataset in this tutorial is to small that the overhead for parallel computing is not worth it. The wall time was 3 times bigger, but the code is here if you plan to try it on large scale simulations.

As usual, we can set attributes to the array we just computed:

dataset["vort"].attrs = {"name": "wz", "long_name": "Vorticity", "units": "-"}

And plot it:

g = dataset.vort.isel(t=slice(None, None, 10)).plot(
    x="x", y="y", col="t", col_wrap=7, cmap="turbo", rasterized=True, robust=True
)
for ax in g.axes.flat:
    ax.axes.set_aspect("equal")

../_images/ba177c63bc597b9d8985d274bb32a2ef7aa4a2d57ecffdf70f89944bb6a34950.png

The precision can be improved near the geometry if we interpolate the velocity field inside, in this way, we create a continuous function before computing the derivative. For a nice visual effect, let’s select a sample data, making it easier to visualize in 1D. From the velocity vector, we select just the component x, besides, we can specify one value for y and t. See the code:

ux_sample = dataset.u.sel(i="x", t=150.0, y=6.0)

We can plot this sample with np.NaN inside the cylinder, the same data we used in all the previous examples, and we can also fill it with a cubic interpolation. See the results:

ux_sample.plot(label="NaN at the Geometry")
ux_sample.interpolate_na("x", "cubic").plot(label="Interpolated", zorder=-1)
plt.legend()

<matplotlib.legend.Legend at 0x7fb32ea351d0>

../_images/8bb80b65083ae4720445964c85392e292e693c7315f53caa21a1b1e8b9f6bac5.png

Now, a demonstration of the first derivative with and without interpolation:

ux_sample.x3d.first_derivative("x").plot(label="NaN at the Geometry")
ux_sample.interpolate_na("x", "cubic").x3d.first_derivative("x").plot(label="Interpolated", zorder=-1)
plt.ylabel("du/dx")
plt.legend()

<matplotlib.legend.Legend at 0x7fb32c902c10>

../_images/8b80fc0f3c44fea44b2d345f7b74e6a4b6adebff57f27349d30ecc896ab5f624.png

Notice that xarray is built on top of Numpy, so its arrays and datasets are compatibles with many tools of the Numpy/SciPy universe. You can even access a numpy.ndarray object with the property values:

dataset.epsi.values

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

You can use it for backwards compatibility with your previous post-processing tools, in this way, the transition to xcompact3d-toolbox should be easier. It is just not so effective, because we lost track of metadata like the coordinates and attributes, they are key points for data analysis with xarray.

Interactive Visualization#

import hvplot.xarray  # noqa: F401

For an interactive experience launch this tutorial on Binder, the widgets are not responsive when disconnected from a Python application.

All the previous examples where based on matplotlib, but xarray is compatible with more options. One of them is hvPlot (see Gridded Data).

hvPlot is recommended when you are exploring your data and need a bit more interactivity.

To exemplify, let’s reproduce one of the figure we did before, choosing one specific location in our mesh and looking at the time evolution of the velocity there:

dataset.u.sel(x=10.0, y=6.0, method="nearest").hvplot(x="t", by="i")

One key aspect about hvPlot is that when it gets more coordinates than it can handle in a plot, it presents the extra coordinates in widgets. So if we do not select any specific point in the domain and reproduce the same figure above, we will get widgets to select the point where we are looking:

dataset.u.hvplot(x="t", by="i", widget_location="bottom")

Here we reproduce the time evolution of the kinetic energy, this time with hvPlot:

dataset["kinetic_energy"].hvplot()

And one last example, we can see a really nice animation of the vorticity field, here in a Jupyter Notebook, with a very few lines of code:

dataset.vort.sel(t=slice(40, None)).hvplot(
    x="x",
    y="y",
    aspect="equal",
    clim=(-5, 5),
    rasterize=True,
    cmap="turbo",
    widget_type="scrubber",
    widget_location="bottom",
    title="Flow around a Cylinder",
    clabel=r"Vorticity [-]",
)

Note: The selection (sel(t = slice(40, None))) in the block above is not necessary, of course, we can see the animation since the beginning. It was just used to look better at readthedocs.

Computing and Plotting

Contents

Computing and Plotting#

Why xarray?#

Example - Flow around a cylinder#

Interactive Visualization#