
1.15.0

@ngxson ngxson released this 03 Aug 20:34
667dd91

New features

downloadModel()

Downloads a model to the cache without loading it. The intended use case is an application with a "model manager" screen that lets the user:

  • Download a model via downloadModel()
  • List all downloaded models using CacheManager.list()
  • Delete a downloaded model using CacheManager.delete()
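
The three operations above could be combined into a model manager along these lines. This is only a minimal sketch: `ModelStore`, `ModelManager`, `InMemoryStore`, and the `{ name, size }` entry shape are hypothetical names introduced for illustration, not wllama's documented types. In a real application the store would presumably delegate to downloadModel(), CacheManager.list(), and CacheManager.delete(); here an in-memory stand-in keeps the example self-contained.

```typescript
// Hypothetical shape of one cached entry (an assumption, not wllama's type).
interface CachedModel {
  name: string;
  size: number; // bytes
}

// Hypothetical abstraction over the three operations named in the list above.
interface ModelStore {
  download(url: string): Promise<void>;
  list(): Promise<CachedModel[]>;
  remove(name: string): Promise<void>;
}

class ModelManager {
  private store: ModelStore;

  constructor(store: ModelStore) {
    this.store = store;
  }

  // Downloads the model only if it is not cached yet.
  // Returns true when a download actually happened.
  async ensureDownloaded(url: string): Promise<boolean> {
    const cached = await this.store.list();
    if (cached.some((m) => m.name === url)) return false;
    await this.store.download(url);
    return true;
  }

  // Total size of all cached models, e.g. for a "storage used" label.
  async totalBytes(): Promise<number> {
    const cached = await this.store.list();
    return cached.reduce((sum, m) => sum + m.size, 0);
  }

  async remove(name: string): Promise<void> {
    await this.store.remove(name);
  }
}

// In-memory stand-in so the sketch runs without a browser or the library.
// Assumption: entries are keyed by URL and every model is 1024 bytes.
class InMemoryStore implements ModelStore {
  private models = new Map<string, number>();

  async download(url: string): Promise<void> {
    this.models.set(url, 1024);
  }

  async list(): Promise<CachedModel[]> {
    return Array.from(this.models, ([name, size]) => ({ name, size }));
  }

  async remove(name: string): Promise<void> {
    this.models.delete(name);
  }
}
```

Keeping the UI logic behind an interface like this means the model manager screen can be tested without downloading any real model files.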

KV cache reuse in createCompletion

When calling createCompletion, you can pass useCache: true as an option. The call then reuses the KV cache left by the previous createCompletion call. This is equivalent to the cache_prompt option of the llama.cpp server.

wllama.createCompletion(input, {
  useCache: true,
  // ... other options
});

For example:

  • On the first call, the prompt contains 2 messages: user: hello, assistant: hi
  • On the second call, you append one message, so the prompt becomes: user: hello, assistant: hi, user: who are you?

Because the first two messages are already in the KV cache, only the added message (user: who are you?) needs to be evaluated.
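
The idea behind this kind of prompt caching can be illustrated with a common-prefix comparison over token sequences. The sketch below is not wllama's or llama.cpp's actual implementation; the function names and the token IDs are made up for illustration. It only shows why a prompt that extends the previous one leaves a short suffix to evaluate.

```typescript
// Number of leading tokens that `next` shares with the cached sequence.
function commonPrefixLength(cached: number[], next: number[]): number {
  const limit = Math.min(cached.length, next.length);
  let i = 0;
  while (i < limit && cached[i] === next[i]) i++;
  return i;
}

// Tokens that still have to be evaluated when the shared prefix is reused.
function tokensToEvaluate(cached: number[], next: number[]): number[] {
  return next.slice(commonPrefixLength(cached, next));
}

// Made-up token IDs standing in for the two calls in the example above.
const firstCall = [1, 15, 22, 2, 31]; // user: hello, assistant: hi
const secondCall = [...firstCall, 1, 40, 41, 42]; // + user: who are you?

// Only the four appended tokens remain: [1, 40, 41, 42]
const pending = tokensToEvaluate(firstCall, secondCall);
```

If the new prompt diverges from the cached one early (for example, because an earlier message was edited), the shared prefix is short and most of the prompt has to be evaluated again.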

What's Changed

Full Changelog: 1.14.2...1.15.0