Question - Is there a way to find variables for smooth components in mgcv::gam? #553
Comments
Hi, yes, this is possible! You can use the following function:
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
set.seed(2) ## simulate some data...
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
#> Gu & Wahba 4 term additive model
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
library(insight)
find_variables(b)
#> $response
#> [1] "y"
#>
#> $conditional
#> [1] "x0" "x1" "x2" "x3"
Created on 2022-04-15 by the reprex package (v2.0.1)
Have a look at the docs to see additional customizations you can do with it:
And also:
library(insight)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
#> Gu & Wahba 4 term additive model
b <- gam(y~x0+s(x1)+s(x2)+x3,data=dat)
find_variables(b)
#> $response
#> [1] "y"
#>
#> $conditional
#> [1] "x0" "x1" "x2" "x3"
find_smooth(b)
#> $smooth_terms
#> [1] "s(x1)" "s(x2)"
find_terms(b)
#> $response
#> [1] "y"
#>
#> $conditional
#> [1] "x0" "s(x1)" "s(x2)" "x3"
Created on 2022-04-15 by the reprex package (v2.0.1)
Hi! I am not sure that answers my question. What I am trying to achieve is returning the variables inside the smooths after finding the smooths. Pseudo-code example:
b <- gam(y~x0+s(x1)+s(x2)+x3+s(x1, x3),data=dat)
smooths <- find_smooth(b)
smooths
#> $smooth_terms
#> [1] "s(x1)" "s(x2)" "s(x1, x3)"
find_vars_from_smooth(smooths)
#> $`s(x1)`
#> [1] "x1"
#>
#> $`s(x2)`
#> [1] "x2"
#>
#> $`s(x1, x3)`
#> [1] "x1" "x3"
OK, then just use clean_names() on the flattened find_smooth() output:
library(insight)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
#> Gu & Wahba 4 term additive model
b <- gam(y~x0+s(x1)+s(x2)+x3,data=dat)
find_smooth(b, flatten = TRUE) |> clean_names()
#> [1] "x1" "x2"
Created on 2022-04-16 by the reprex package (v2.0.1)
Unfortunately, it doesn't work correctly:
library(insight)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
#> Gu & Wahba 4 term additive model
b <- gam(y~x0+s(x1)+s(x2)+x3+s(x1,x2),data=dat)
find_smooth(b, flatten = TRUE) |> clean_names()
#> [1] "x1" "x2" "x1"
The third smooth should return both x1 and x2.
@stefanocoretta It should work now:
library(insight)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
set.seed(2)
dat <- gamSim(1,n=400,dist="normal",scale=2)
#> Gu & Wahba 4 term additive model
b <- gam(y~x0+s(x1)+s(x2)+x3+s(x1,x2), data=dat)
find_smooth(b, flatten = TRUE) |> clean_names()
#> [1] "x1" "x2" "x1, x2"
d <- gam(y~x0+s(x1)+s(x2)+x3+s(x1,x2, k = -1), data=dat)
find_smooth(d, flatten = TRUE) |> clean_names()
#> [1] "x1" "x2" "x1, x2"
Created on 2022-06-07 by the reprex package (v2.0.1)
I'm not super familiar with smooth terms (I think @DominiqueMakowski started using them some time ago), but when is it important to include a variable? E.g. here, what should the last line return?
library(insight)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
set.seed(2)
dat <- gamSim(1,n=400,dist="normal",scale=2)
#> Gu & Wahba 4 term additive model
d <- gam(y~x0+s(x1)+s(x2)+x3+s(x1,by = x2, k = -1), data=dat)
find_smooth(d, flatten = TRUE)
#> [1] "s(x1)" "s(x2)" "s(x1, by = x2, k = -1)"
find_smooth(d, flatten = TRUE) |> clean_names()
#> [1] "x1" "x2" "x1"
Created on 2022-06-07 by the reprex package (v2.0.1)
Hmm, I am not sure what the expected output is in this case. The last line should probably return "x1, x2" or "x1:x2" or something like that.
Looks like none of us are sure about this. Is there anyone in the team who is an expert in GAMs?
Hi! It should return all variables in all cases, and the variables should be separate elements. These are some of the possible scenarios:
s(time)
s(longitude, latitude)
s(longitude, latitude, altitude)
s(time, by = factor)
s(time, duration, by = factor)
s(time, factor, bs = "fs")
s(factor, bs = "re")
s(factor, time, bs = "re")
Each of those should return:
"time"
c("longitude", "latitude")
c("longitude", "latitude", "altitude")
c("time", "factor")
c("time", "duration", "factor")
c("time", "factor")
"factor"
c("factor", "time")
That is the necessary format for the variables to be used in
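As a rough illustration of the mapping described above, here is a minimal base R sketch that parses each smooth-term string as R code and collects the symbols used as arguments, so no regex is needed. The helper name smooth_vars() is hypothetical and not part of insight or mgcv. It assumes every bare symbol inside the call is a data variable, so a specification such as k = n_knots would wrongly pick up n_knots.

```r
# Hypothetical helper (not an insight or mgcv function):
# parse each smooth-term string and collect the symbols used as arguments.
smooth_vars <- function(terms) {
  out <- lapply(terms, function(term) {
    # all.vars() skips function names (s), argument names (by, bs, k),
    # and constants such as "re" or -1; it keeps order of appearance.
    all.vars(parse(text = term))
  })
  names(out) <- terms
  out
}

terms <- c(
  "s(time)",
  "s(longitude, latitude)",
  "s(time, by = factor)",
  "s(time, factor, bs = \"fs\")",
  "s(factor, bs = \"re\")"
)

vars <- smooth_vars(terms)
vars[["s(time, by = factor)"]]
```

The result is a named list with one element per smooth term, so a variable that appears in several smooths is kept once per smooth rather than collapsed.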
@stefanocoretta since
@stefanocoretta There's an example of output in #580
It might do, although it's a bit inelegant because technically the
In order to be able to use the output further I would have to split the output by
But if that means rewriting the code to accept lists, then your current solution will just do! 😄
But then there could be duplicates if there are several calls to s():
d <- gam(y~s(x1)+s(x2)+s(x1,by = x2, k = -1), data=dat)
find_smooth(d, flatten = TRUE) |> clean_names()
Maybe we could return a character vector in
sapply(insight::find_smooth(d, flatten = TRUE), insight::clean_names, simplify = FALSE)
which will give the information @stefanocoretta requested: a named list (named after the smooth terms) whose elements are the variables used.
Correct, they should be duplicated, because to predict stuff you need to know which smooths have which variables (especially when excluding terms while predicting). The mgcv implementation of GAMs is a bit different in structure from most other models. So ideally I would expect:
gam(y ~ fac + s(x) + s(x, by = fac))
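For comparison, the fitted mgcv object itself records which variables each smooth uses, which gives a per-smooth list with duplicates preserved. The sketch below is only an illustration and not how insight does it: smooth_vars_from_fit() is a hypothetical helper, and it relies on the term, by and label fields of the smooth objects stored in the fitted model's smooth element (described in mgcv's smooth.construct documentation, where by is the string "NA" when no by variable is set). With a factor by variable, mgcv stores one smooth per factor level, so the list would then have one entry per level.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
d <- gam(y ~ x0 + s(x1) + s(x2) + x3 + s(x1, by = x2, k = -1), data = dat)

# Hypothetical helper: one list element per smooth, named by its mgcv label.
smooth_vars_from_fit <- function(model) {
  out <- lapply(model$smooth, function(sm) {
    # mgcv stores the by variable as text, using "NA" when there is none
    by_var <- if (identical(sm$by, "NA")) character(0) else sm$by
    c(sm$term, by_var)
  })
  names(out) <- vapply(model$smooth, function(sm) sm$label, character(1))
  out
}

vars <- smooth_vars_from_fit(d)
vars
```

Because the list is keyed by smooth, x1 appears both under the plain smooth and under the smooth with the by variable, which is the duplication needed when excluding terms at prediction time.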
Hello! Thanks for this package, it's so great!
I have a question about GAMs with mgcv.
I wonder if there is a function to programmatically find variables based on smooth term strings (without having to regex the string).
For example, with four smooth terms, I would like to be able to extract the variables in the terms, so that from "s(x0)" I get "x0", and so on (in principle regexing would work, but smooth specifications can get so complicated that it's a bit of a puzzle making sure you indeed get the variable).
Is this possible with insight?