Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Imputor API #69

Merged
merged 19 commits into from
Oct 14, 2020
Merged

Simplify Imputor API #69

merged 19 commits into from
Oct 14, 2020

Conversation

rofinn
Copy link
Member

@rofinn rofinn commented Oct 1, 2020

Attempting to resolve the following issues: #66, #57, #51, #50, #44, #43, #31

  • Drop Context
  • Introduce a separate Threshold type and assert function
  • Update old tests cases to use new type
  • Add deprecations for old context calls
  • Generalize the dimensionality behaviour using a dims keyword similar to the stats functions
    https://github.com/JuliaLang/Statistics.jl/blob/master/src/Statistics.jl#L164 (using special cased :rows and :cols dims to improve readability)
  • Drop in-place calls Might leave this for now
  • Base imputors should dispatch on _impute!(AbstractArray{Union{T, Missing}}, imp)
  • Replace dropobs and dropvars with Impute.filter and a dims keyword
  • Make Chain not an imputor and have it work on Assertions and Imputors
  • Add function for checking if a methods support some input data? This might just be helpful for testing? Gonna leave this for now.

1. Replaced DropObs and DropVars in favour of a more generic Impute.Filter which isn't an imputor and requires `dims` to be specified for non-vector input.
2. Default colwise behaviour for matrices is deprecated in favour of `:` as the default. Folks may want to apply a method across all dimensions depending on the dataset.
3. `_impute!` methods should explicitly operate on `AbstractArray{Union{T, Missing}}`
4. Added `dims` option to the svd and knn imputors
5. Reworked the `knn` imputor a little as the code was overly complicated... may have also found a bug.
6. Added deprecation tests where possible to reduce testing noise
7. Everything should work on n-dimensional arrays now :)

TODO:
- Check on potential KNN bug
- Extend and breakup imputor tests
- Add n-dimensional array tests
- Make `Chain` not an imputor and have it work with Assertions, Imputors and Filter
@rofinn rofinn changed the title Simplify Imputor API [WIP] Simplify Imputor API Oct 1, 2020
@rofinn rofinn changed the title [WIP] Simplify Imputor API WIP: Simplify Imputor API Oct 1, 2020
@rofinn rofinn changed the title WIP: Simplify Imputor API [WIP] Simplify Imputor API Oct 1, 2020
@rofinn rofinn changed the title [WIP] Simplify Imputor API Simplify Imputor API Oct 8, 2020
@rofinn rofinn requested a review from oxinabox October 8, 2020 21:23
@rofinn
Copy link
Member Author

rofinn commented Oct 8, 2020

Alright, this was bigger than expected, but a lot of the LOC changes are tied to removing the context blocks and moving a couple files around (also adding tests and docs). I've left several issues as TODO items to avoid making this bigger.

@oxinabox Would you mind double checking that I'm using DataDeps.jl as expected?

- Cleaned up docstrings for Imputors and functional calls
- Reorganized the docs API section
- Added 2 walkthroughs for SVD explicitly and working with a real spatiotemporal dataset
- Added a DataDep for our test and demo datasets rather than relying on RDatasets and the corresponding dependencies
src/imputors.jl Outdated Show resolved Hide resolved
src/imputors/substitute.jl Outdated Show resolved Hide resolved
src/imputors/svd.jl Outdated Show resolved Hide resolved
src/utils.jl Show resolved Hide resolved
src/utils.jl Show resolved Hide resolved
Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have raised a lot of points ( though less than i suspected given the size of the PR).
But it all makes sense,
so I don't think i will need to review again.
But most of my points should be addressed or explained before you merge this

src/imputors/drop.jl Outdated Show resolved Hide resolved
src/imputors/srs.jl Outdated Show resolved Hide resolved
src/Impute.jl Outdated Show resolved Hide resolved
src/chain.jl Outdated Show resolved Hide resolved
src/chain.jl Outdated Show resolved Hide resolved
src/imputors/standardize.jl Outdated Show resolved Hide resolved
src/imputors/substitute.jl Show resolved Hide resolved
src/imputors/substitute.jl Show resolved Hide resolved
src/imputors/substitute.jl Show resolved Hide resolved
src/imputors/svd.jl Outdated Show resolved Hide resolved
Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@rofinn
Copy link
Member Author

rofinn commented Oct 14, 2020

Alright, I believe I've either addressed, responded or opened an issue for each comment left. Since there don't seem to be any new comments I'm gonna merge this and get started on the opened issues tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants