
Releases: TimSchopf/KeyphraseVectorizers

v0.0.13

02 May 15:58

Fix small document split and stop words bugs

v0.0.12

29 Apr 13:00

This release fixes stop word removal bugs, fixes memory issues for long documents, and adds a build_tokenizer attribute and online update functions.

Solves issues #34, #31, #29, #28, #26, and #6.

Add spacy.Language as valid argument for 'spacy_pipeline'

23 Dec 10:53
a20de03

This release allows an object returned by spacy.load to be reused across many different KeyphraseVectorizer objects. It includes PR #19.
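As a minimal sketch of what this enables, assuming spaCy, the `en_core_web_sm` pipeline, and the `keyphrase_vectorizers` package are installed, one loaded pipeline can be handed to several vectorizers. The helper name `build_shared_vectorizers` is hypothetical, and the imports are deferred so the function can be defined without loading a model.

```python
def build_shared_vectorizers(pipeline_name="en_core_web_sm"):
    """Load one spaCy pipeline and share it between two vectorizers.

    Assumption: both packages and the named pipeline are installed.
    """
    # Deferred imports: nothing heavy is loaded until the function is called.
    import spacy
    from keyphrase_vectorizers import (
        KeyphraseCountVectorizer,
        KeyphraseTfidfVectorizer,
    )

    nlp = spacy.load(pipeline_name)  # loaded once

    # The same spacy.Language object is passed to both vectorizers,
    # instead of each one loading the pipeline from its name.
    count_vectorizer = KeyphraseCountVectorizer(spacy_pipeline=nlp)
    tfidf_vectorizer = KeyphraseTfidfVectorizer(spacy_pipeline=nlp)
    return count_vectorizer, tfidf_vectorizer
```

Sharing the object this way should avoid repeated model loading when many vectorizers are created in one process.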

Custom POS-tagger feature added

19 Jun 14:01

Added the options to use a custom POS-tagger, define custom stop words, and exclude certain spaCy pipeline components. This release solves issues #2 and #7.
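To illustrate the custom POS-tagger option, here is a toy tagger. It assumes, based on the project README as best recalled, that the tagger is a callable taking a list of raw document strings and returning a flat list of (token, POS-tag) tuples; treat that contract as an assumption. The lexicon and the name `toy_pos_tagger` are hypothetical, and a real tagger would wrap a trained model.

```python
import re

# Hypothetical, deliberately tiny lexicon; unknown words default to "NN".
TOY_LEXICON = {
    "the": "DT",
    "a": "DT",
    "is": "VBZ",
    "fast": "JJ",
    "supervised": "JJ",
    "learning": "NN",
}


def toy_pos_tagger(raw_documents):
    """Return a flat list of (token, tag) tuples for all documents."""
    tagged = []
    for document in raw_documents:
        for token in re.findall(r"\w+", document):
            tagged.append((token, TOY_LEXICON.get(token.lower(), "NN")))
    return tagged


# Assumed usage (not executed here, requires keyphrase_vectorizers):
#   KeyphraseCountVectorizer(custom_pos_tagger=toy_pos_tagger)
```

With such a tagger, the POS pattern used for keyphrase extraction would need to match the tag set the tagger emits.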

Higher compatibility with available SpaCy pipelines

18 Jun 19:23

Fixed issues #11 and #10 by removing the default exclusion of certain spaCy pipeline components. This slightly slows down keyphrase extraction. However, it improves compatibility with all available spaCy pipelines, including the ones that use transformers.

Added 'stop_words'=None option

16 May 15:14

Add stopwords download automation

14 Feb 16:11
v0.0.7

Signed-off-by: Tim Schopf <[email protected]>

Change "multiprocessing" parameter to "workers" parameter

12 Feb 14:47
change "multiprocessing" parameter to "workers" parameter

Signed-off-by: Tim Schopf <[email protected]>

Added min_df and max_df parameters, added support for documents that have more than 1,000,000 characters, and limited the maximum keyphrase length to 8 words to prevent memory issues
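The idea behind document-frequency thresholds such as min_df and max_df can be sketched in plain Python. This is not the library's implementation: the function name `filter_by_document_frequency` is hypothetical, and the sketch assumes both thresholds are absolute document counts.

```python
from collections import Counter


def filter_by_document_frequency(doc_keyphrases, min_df=1, max_df=None):
    """Keep keyphrases that occur in at least min_df and at most max_df documents.

    doc_keyphrases: list of per-document keyphrase lists.
    min_df, max_df: absolute document counts (an assumption of this sketch).
    """
    document_frequency = Counter()
    for phrases in doc_keyphrases:
        # Count each phrase once per document, however often it repeats.
        document_frequency.update(set(phrases))

    upper = len(doc_keyphrases) if max_df is None else max_df
    return sorted(
        phrase
        for phrase, count in document_frequency.items()
        if min_df <= count <= upper
    )


docs = [
    ["machine learning", "data"],
    ["machine learning", "model"],
    ["machine learning", "data"],
]
frequent = filter_by_document_frequency(docs, min_df=2)
mid_range = filter_by_document_frequency(docs, min_df=2, max_df=2)
```

Here `min_df=2` drops "model", which appears in only one document, and adding `max_df=2` also drops "machine learning", which appears in all three.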

Increased efficiency of spaCy pipeline for POS tagging

03 Feb 16:25
v0.0.4

v0.0.4, increased efficiency of spaCy pipeline for POS tagging + adde…