groupby keyword #396

Open
aaronspring opened this issue Aug 27, 2022 · 1 comment
Labels
enhancement, feature request

Comments

@aaronspring
Collaborator

aaronspring commented Aug 27, 2022

Code Sample

What do you think about a groupby keyword?

This would make it easy to create plots showing the correlation skill for each initial month, as in this notebook:

https://nbviewer.org/github/mktippett/NMME/blob/master/n34.ipynb

import numpy as np

# correlation as a function of start month
# the means must be computed only over values where both inputs are not missing
def ac_by_start(x, y):
    ok = ~np.isnan(x) & ~np.isnan(y)
    xa = x.where(ok).groupby('S.month') - x.where(ok).groupby('S.month').mean('S')
    ya = y.where(ok).groupby('S.month') - y.where(ok).groupby('S.month').mean('S')
    c = (xa*ya).groupby('S.month').mean('S')/xa.groupby('S.month').std('S')/ya.groupby('S.month').std('S')
    c.attrs['long_name'] = 'correlation'
    c.month.attrs['long_name'] = 'start month'
    # c = xr.corr(x.groupby('S.month'), y.groupby('S.month'), dim='S')
    return c

With the proposed keyword, the function above would reduce to xs.pearson_r(a, b, dim="S", group="month").
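A minimal sketch of what such a keyword might do internally, assuming a datetime-like dimension. The helper name `pearson_r_by_group` is hypothetical and not part of xskillscore, and `xr.corr` stands in for `xs.pearson_r` to keep the example light on dependencies. The data are synthetic.

```python
import numpy as np
import pandas as pd
import xarray as xr


def pearson_r_by_group(a, b, dim, group):
    """Hypothetical helper: Pearson correlation over `dim` for each `dim.dt.<group>`."""
    labels = getattr(a[dim].dt, group)  # e.g. a["S"].dt.month
    values = np.unique(labels.values)
    per_group = [
        # subselect one group, then reduce over `dim` within that group only
        xr.corr(a.where(labels == v, drop=True), b.where(labels == v, drop=True), dim=dim)
        for v in values
    ]
    # stack the per-group results along a new dimension named after the group
    return xr.concat(per_group, pd.Index(values, name=group))


# Synthetic monthly start dates; "S" follows the naming in the issue.
rng = np.random.default_rng(42)
S = pd.date_range("1990-01-01", periods=240, freq="MS")
a = xr.DataArray(rng.normal(size=S.size), coords={"S": S}, dims="S")
b = xr.DataArray(0.7 * a.values + rng.normal(size=S.size), coords={"S": S}, dims="S")

skill = pearson_r_by_group(a, b, dim="S", group="month")
```

The result has a single `month` dimension with one correlation per start month, which is the shape needed for the plot described above.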

Problem description

For many metrics, such as rmse, this doesn't matter much: you can compute the metric over no dimension (dim=[]) and then group manually afterwards. For correlation metrics, however, this cannot be done, because they must be computed over a dimension. I therefore propose adding a groupby keyword for correlation metrics, as used by @mktippett in https://nbviewer.org/github/mktippett/NMME/blob/master/n34.ipynb
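The difference between the two kinds of metric can be illustrated with a small sketch on synthetic data (plain xarray here rather than xskillscore; all variable names are illustrative). The squared error exists pointwise, so it can be grouped after the fact, whereas a correlation needs each group's own mean and standard deviation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data; "S" follows the naming in the issue.
rng = np.random.default_rng(0)
S = pd.date_range("2000-01-01", periods=120, freq="MS")
a = xr.DataArray(rng.normal(size=S.size), coords={"S": S}, dims="S")
b = xr.DataArray(0.5 * a.values + rng.normal(size=S.size), coords={"S": S}, dims="S")

# RMSE: compute the pointwise squared error first, group afterwards.
squared_error = (a - b) ** 2
rmse_by_month = np.sqrt(squared_error.groupby("S.month").mean("S"))

# Subselecting January first and reducing over S gives the same value.
is_jan = a["S"].dt.month == 1
rmse_jan = float(np.sqrt(((a - b).where(is_jan, drop=True) ** 2).mean("S")))

# Correlation: there is no pointwise quantity to group afterwards,
# so the data are subselected first and then reduced over S.
corr_jan = float(xr.corr(a.where(is_jan, drop=True), b.where(is_jan, drop=True), dim="S"))
```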

Applies to metrics:

  • pearson_r
  • spearman_r
  • other metrics that require dim != []

Alternatives

Loop over the months and subselect the data before computing the correlation:

corr = xr.concat([xs.pearson_r(a.sel({dim: a[dim].dt.month == m}), b.sel({dim: a[dim].dt.month == m}), dim=dim) for m in months], "month")
aaronspring added the enhancement and feature request labels on Aug 27, 2022
@raybellwaves
Member

I see no harm in adding
