kiwix-serve is unable to add ZIM to internal library despite zimcheck passing #640
Comments
Here's the hexdump of the ZIM:
|
@IMayBeABitShy Thank you for the detailed report. We are interested in understanding the case and helping you, but the answer isn't obvious. Give us a bit of time to come back to you. |
Implementing #572 might help here maybe |
Regarding how the PWA detects legacy ZIM listings, it should mean that this is a ZIM that uses namespaces (like A/index.html, -/some_stylesheet.css, I/an_image.webp), as opposed to being a type 1 ZIM (one that has all user content under a C/ namespace, and doesn't distinguish amongst images, stylesheets, HTML, etc.). It doesn't mean that the ZIM is non-conformant, it just means that it adheres to an earlier openZIM spec. Having said that, the ZIMFile object looks like it's missing the It's possible the fields weren't populated at the time you took your snapshot, or that an older version of the PWA displayed the object too soon in console. Might be worth double-checking, though, in the latest PWA (2.7.2+), that those properties are populated (check after the landing page has loaded). |
That is interesting, thank you for the explanation. How is this check performed? This should be a ZIM with the newer namespace behavior (minor version 1, content in
So the "C" namespace is used for content, "X" for the indexes, "W" for well-known entries and "M" for metadata. I've re-checked the output of the PWA version. The article-related fields still aren't populated and I am fairly sure it is fully loaded. Also, the "random article" button took me to a metadata field, so something seems to be wrong with the article index... This should be the entry at |
Hmm, there is a byte set in the ZIM file header that indicates whether the ZIM is type 0 or type 1. It's the Your minorVersion is indeed set to type 1, but the logic that decides whether it is using legacy listing or not is here: https://github.com/kiwix/kiwix-js-windows/blob/main/www/js/lib/zimfile.js#L366 The legacy listing is a kind of fallback, so should work for any ZIM, so it gets used if the app can't find the X/listing... By the way that should be in the X/ namespace (you wrote I'm not sure if labelling the title listing as legacy is an inaccuracy in the PWA, or some problem with the ZIM format, though if it's not readable by Kiwix Serve, it would suggest the latter. By the way, you can (in modern Firefox or Chromium) get a debuggable (unminified) version of the PWA by going to https://kiwix.github.io/kiwix-js-windows/ (ignore the Repo title, it's not Windows-specific). This should allow you to pause on those lines during ZIM loading in case it helps you to debug the format. |
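To make the header discussion concrete, here is a minimal Python sketch that reads the fixed 80-byte ZIM header and reports the minor version. The field layout follows my reading of the openZIM spec; the function and field names (`parse_zim_header`, `uses_new_namespace_scheme`, `mime_list_pos`, ...) are made up for this illustration and are not taken from kiwix-js or libzim.

```python
import struct

# Fixed 80-byte little-endian header, per my reading of the openZIM spec:
# magic(u32) major(u16) minor(u16) uuid(16 bytes) entryCount(u32) clusterCount(u32)
# urlPtrPos(u64) titlePtrPos(u64) clusterPtrPos(u64) mimeListPos(u64)
# mainPage(u32) layoutPage(u32) checksumPos(u64)
HEADER_STRUCT = struct.Struct("<IHH16sIIQQQQIIQ")
ZIM_MAGIC = 0x044D495A  # 72173914

# Hypothetical field names, chosen for this sketch only.
FIELD_NAMES = (
    "magic_number", "major_version", "minor_version", "uuid",
    "entry_count", "cluster_count", "url_ptr_pos", "title_ptr_pos",
    "cluster_ptr_pos", "mime_list_pos", "main_page", "layout_page",
    "checksum_pos",
)


def parse_zim_header(data):
    """Decode the first 80 bytes of a ZIM file into a dict."""
    if len(data) < HEADER_STRUCT.size:
        raise ValueError("need at least 80 bytes of header")
    header = dict(zip(FIELD_NAMES, HEADER_STRUCT.unpack_from(data, 0)))
    if header["magic_number"] != ZIM_MAGIC:
        raise ValueError("not a ZIM file (bad magic number)")
    return header


def uses_new_namespace_scheme(header):
    """Assumption: minor version >= 1 means user content lives under C/."""
    return header["minor_version"] >= 1
```

To try it on a real file, read the first 80 bytes with `open(path, "rb").read(80)` and pass them to `parse_zim_header`. Note that the minor version only describes the namespace scheme; whether a reader ends up using the legacy title listing is decided separately.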
About that: the specification always uses
Great, thank you. I'll look into it. |
It looks like that might be the problem... The PWA at least assumes the title listing is in the X/ namespace, based on the OpenZIM spec, and we certainly find it there in most ZIMs. However, as we use a fallback if we don't find the X/titleOrdered/v0 or /v1 listings, this would not show up as an obvious error. In the listing you provide above, I notice that the |
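The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the actual kiwix-js logic: the helper name `select_title_listings` is hypothetical, and the full listing paths (`listing/titleOrdered/v0` and `listing/titleOrdered/v1` inside the X namespace) are my understanding of the openZIM spec.

```python
# Listing paths inside the X namespace, as I understand the openZIM spec.
TITLE_LISTING_V0 = "listing/titleOrdered/v0"
TITLE_LISTING_V1 = "listing/titleOrdered/v1"


def select_title_listings(x_paths):
    """Decide which title listing a reader would use.

    x_paths: set of entry paths found in the X namespace.
    Hypothetical helper: returns the X/ listings that exist, or signals
    that the reader must fall back to the legacy title pointer list
    referenced by titlePtrPos in the header.
    """
    found = [p for p in (TITLE_LISTING_V1, TITLE_LISTING_V0) if p in x_paths]
    if found:
        return {"source": "X", "paths": ["X/" + p for p in found]}
    # Neither listing is readable: silently use the legacy listing.
    return {"source": "legacy", "paths": []}
```

Because the fallback returns a usable result rather than raising, a ZIM whose X/ listings are missing or unreadable loads without any visible error, which matches the behaviour reported in this thread.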
Sorry, I may have phrased that poorly. The entry is inside the
But it seems like the next function:
seems to receive
Yes, I think so too. |
I think I may have found the problem. The v1 title index is in a compressed cluster. In kiwix JS, this results in a problem in the following code section:
Where |
Ah yes, that rings a bell! Basically, I'm not sure if it's permissible to compress |
It's unfortunately not. I've just missed the following line in the spec:
|
Using
It is kind of consistent with openzim/libzim#822, where you asked if we could remove the constraint on mimelistPos being 80. Are you sure you are using the right ZIM file (and not a patched version of zimcheck)? |
I assure you, I most definitely did not do anything that involved writing or modifying C/C++ code ;) The ZIM file above used 2KiB of reserved space for the mimetypelist at offset 80 (thus the many zero bytes). The problem here is that the hexdumps created with It's most likely the compressed title indexes, but I am still checking if that is also the problem with kiwix-serve and not only for the PWA. |
Can you upload the ZIM file here? (You may have to rename it to |
Sure, here you go: test.zip |
I confirm the PWA can't populate FYI In the PWA, you can access what it thinks is the title list by pressing a space in the search field. And you can access the full URL list, including namespaces, by typing space + / (space followed by /) as shown below: |
It is missing the |
But the ZIM is still incorrectly formatted if |
Yes. The cluster should not be compressed.
No, libzim is nice here (https://github.com/openzim/libzim/blob/main/src/fileimpl.cpp#L250-L254). |
OK, thanks for confirming -- I think libzim and the KJS backend are in accordance, then (except for the former requiring M/Counter). KJS backend falls back from |
Should we not check this in zimcheck? |
Yes |
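A check along these lines could look like the sketch below. It is only an illustration in Python, not zimcheck's implementation: the helper names are hypothetical, and the meaning of the cluster information byte (low four bits select the compression, bit 0x10 marks an extended cluster) is my reading of the openZIM spec.

```python
# Compression codes stored in the low 4 bits of a cluster's first byte,
# per my reading of the openZIM spec (2 and 3 are obsolete).
COMPRESSION_NAMES = {
    0: "none", 1: "none", 2: "zlib", 3: "bzip2", 4: "lzma2", 5: "zstd",
}


def describe_cluster(info_byte):
    """Decode the cluster information byte (hypothetical helper)."""
    code = info_byte & 0x0F
    name = COMPRESSION_NAMES.get(code, "unknown")
    return {
        "compression": name,
        "compressed": name != "none",
        "extended": bool(info_byte & 0x10),  # 64-bit blob offsets
    }


def listing_cluster_ok(info_byte):
    """A title listing cluster passes only if it is stored uncompressed."""
    return not describe_cluster(info_byte)["compressed"]
```

A checker would locate the cluster that holds each X/ title listing, read its first byte, and report an error when `listing_cluster_ok` returns `False`.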
I can confirm the issue with Regarding the bug with the PWA/kiwix-js: Unfortunately, this seems to remain even when the X/ namespace remains properly uncompressed. However, this is an unrelated issue and I should hopefully be able to debug that myself. Once again, thank you for your helpful comments and the quick fix. |
Hello again,
As I've mentioned on the Slack channel, I've encountered a potential bug when trying to serve a ZIM created by a custom ZIM writer library. I had initially assumed that this was a bug in my library, but zimcheck passes and both the kiwix-desktop appimage and the PWA are able to read the ZIM file.
ZIM creation
The ZIM file has been created using this library, more specifically this file.
Zimcheck
kiwix-serve
kiwix-serve is unable to open the file but does not give a specific error. It works fine with askubuntu.com_en_all_2022-11.zim.
kiwix-desktop (app image)
The recent app image (invoked via ./kiwix-desktop_x86_64_2.3.1-4.appimage /tmp/test.zim) is capable of reading the ZIM without any problems.
kiwix-desktop (from apt)
This is where it gets interesting: using the old version installed via the package manager fails:
However, if we set $ZIM_DIRENTLOOKUPCACHE=1, it works without issues. Of course, this is the old, outdated package from the package manager and this may be entirely unrelated, but it provided the most interesting debug output so far.
kiwix PWA
As mentioned, the kiwix PWA is capable of reading the ZIM without issue. Here is the console log:
Not sure why it believes this is a legacy ZIM DirListing (I think I followed the standard). Expanding the collapsed object:
Additional info:
ZIM header and metadata:
System