Versioned according to Semantic Versioning.
2.71.0 - 2024-11-20
Changed:
- Rewrite
ocrd_utils.logging
, #1288- Handle only
''
as the root logger disableLogging
: Remove handlers from root and all configured loggers- Do not do any module-level modification of the log config
- Handle only
Fixed:
- Typo in processing_worker log message, #1293
- Call
initLogging
at the right time inocrd_network
, #1292 make docs
fixed with absolute path to location, #1273
2.70.0 - 2024-10-10
Added:
ocrd network client workflow run
: Add--print-status
flag to periodically print the job status, #1277- Processing Server:
DELETE /mets_server_zombies
to kill any renegade METS servers, #1277 - No more zombie METS Server by properly shutting them down, #1284
OCRD_NETWORK_RABBITMQ_HEARBEAT
to allow overriding the heartbeat behavior of RabbitMQ, #1285
Changed:
- significantly more detailed logging for the METS Server and Processing Server, #1284
- Only import
ocrd_network
in src/ocrd/decorators/init.py once needed, #1289 - Automate release via GitHub Actions, #1290
Fixed:
ocrd/core-cuda-torch
: Install torchvision as well, #1286- Processing Server: remove shut down METS servers from deployer's cache, #1287
- typos, #1274
2.69.0 - 2024-09-30
Fixed:
- tests: ensure
ocrd_utils.config
gets reset whenever changing it globally ocrd.cli.workspace
: consistently pass on--mets-server-url
and--backup
ocrd.cli.workspace
: makelist-page
work w/ METS Serverocrd.cli.validate "tasks"
: pass on--mets-server-url
lib.bash
: fixerrexit
handling- actually apply CLI
--log-filename
, and show in--help
- adapt to Pillow changes
ocrd workspace clone
: do pass on--file-grp
(for download filtering)OcrdMetsServer.add_file
: pass onforce
kwargWorkspace.reload_mets
: handle ClientSideOcrdMets as wellOcrdMets.get_physical_pages
: coverreturn_divs
w/ofor_fileIds
andfor_pageIds
disableLogging
: also re-instate root logger to Python defaultsOcrdExif
: handle multi-frame TIFFs gracefully inidentify
callout, #1276
Changed:
run_processor
: be robust ifocrd_tool
is missingsteps
PcGtsType.PageType.id
viamake_xml_id
: replace/
with_
ClientSideOcrdMets
: use same logger name prefix as METS ServerProcessor.zip_input_files
: when--page-id
yields empty list, just log instead of raise
Added:
OcrdPage
: newPageType.get_ReadingOrderGroups()
to retrieve recursive RO as dict- METS Server: export and delegate
physical_pages
- ocrd.cli.workspace
server
: add subcommandsreload
andsave
- processor CLI: delegate
--resolve-resource
, too OcrdConfig.reset_defaults
to reset config variables to their defaultsocrd_utils.scale_coordinates
for resizing images
2.68.0 - 2024-08-23
Changed:
- ocrd_network: Use
ocrd-all-tool.json
bundled by core instead of download from website, #1257, #1260 - 🔥
ocrd network client processing processor
renamedocrd network client processing run
, #1269 ocrd network client processing run
supports blocking behavior with--block
by polling job status, #1265, #1269
Added:
ocrd network client workflow run
Run, optionally blocking, a workflow on the processing server, #1265, #1269ocrd network client workflow check-status
to get the status of a workflow job, #1269ocrd network client processing check-status
to get the status of a processing (processor) job, #1269ocrd network client discovery processors
to list the processors deployed in the processing server, #1269ocrd network client discovery processor
to get theocrd-tool.json
of a deployed processor, #1269ocrd network client processing check-log
to retrieve the log data for a processing job, #1269- Environment variables
OCRD_NETWORK_CLIENT_POLLING_SLEEP
andOCRD_NETWORK_CLIENT_POLLING_TIMEOUT
to control polling interval and timeout forocrd network client {processing processor,workflow run
, #1269 ocrd workspace clone
/Resolver.workspace_from_url
: withclobber_mets=False
, raise a FileExistsError for existing mets.xml on disk, #563, #1268ocrd workspace find --download
: print the the correct, up-to-date field, notNone
, #1202, #1266
Fixed:
- Sanitize
self.imageFilename
for thepcGtsId
to ensure it is a validxml:id
, #1271
2.67.2 - 2024-07-19
Fixed:
- Run
multiprocessing.set_start_method('fork')
only for OSX, #1261 - Broken PyPI release, #1262
2.67.1 - 2024-07-17
Fixed:
- Build and tests fixed, no functional changes from #1258
2.67.0 - 2024-07-16
Changed:
- Additional docker base images with preinstalled tensorflow 1 (
core-cuda-tf1
), tensorflow 2 (core-cuda-tf2
) and torch (core-cuda-torch
), #1239 - Resource Manager: Skip instead of raise an exception download if target file already exists (unless
--overwrite
), #1246 - Resource Manager: Try to use bundled
ocrd-all-tool.json
if available, #1250, OCR-D/all#444
Added:
ocrd process
does support-U/--mets-server
, #1243
Fixed:
ocrd process
-derived tasks are not run in a temporary directory when not called from within workspace, #1243- regression from #1238 where processors failed that had required parameters, #1255, #1256
- METS Server: Unlink UDS sockert file if it exists before startup, #1244
- Resource Manager: Do not create zero-size files for failing downloads, #1201, #1246
- Workspace.add_file: Allow multiple processors to create file group folders simultaneously, #1203, #1253
- Resource Manager: Do not try to run
--dump-json
for known non-processorsocrd-{cis-data,import,make}
, #1218, #1249 - Resource Manager: Properly handle copying of directories, #1237, #1248
- bashlib: regression in parsing JSON from introducing parameter preset files, #1258
Removed:
- Defaults for
-I/--input-file-grp
/-O/--output-file-grp
, #1256, #274
2.66.1 - 2024-06-26
Fixed:
- GHA Docker: build docker.io first, then tag ghcr.io
2.66.0 - 2024-06-07
Fixed:
OcrdFile.url
can now be removed properly, #1226, #1227ocrd workspace find --undo-download
: Only remove file refs if it's an actual download, #1150, #1235ocrd workspace find --undo-download
: When--keep-files
is not set, remove file from disk, #1150, #1235OCRD_LOGGING_DEBUG
: Normalize/lowercase boolean values, #1230, #1231Workspace.download_file
: UseOcrd.local_filename
if set but not already present in the FS, #1149, #1228
Changed:
- Install ocrd with
pip --editable
inside Docker, #1225, OCR-D/ocrd_all#416 - Reduce log spam in ocrd_network, #1222
- CI: Stop testing for 3.7, #1207, #1221
Added:
- Separate docker versions for tensorflow v1, tensorflow v2 and torch, #1186
- Processing server can serve as a proxy for METS Server TCP requests, forwarding to UDS, #1220
ocrd workspace clean
to remove "untracked", i.e. not METS-referenced, files, #1150, #1236-p
now supports parameter preset resources in addition to raw JSON and absolute/relative paths to JSON files, #930, #969, #1238
2.65.0 - 2024-05-03
Fixed:
- bashlib processors will download on-demand, like pythonic processors do, #1216, #1217
Changed:
- Replace
distutils
which equivalents fromshutil
for compatibility with python 3.12+, #1219 - CI: Updated GitHub actions, #1206
- CI: Fixed scrutinizer, #1217
Added:
- Integration tests for
ocrd_network
, #1184
2.64.1 - 2024-04-22
Fixed:
- Broken PyPI release
2.64.0 - 2024-04-22
Removed:
- Support for Python
<=
3.7, #1207
Fixed:
- remove duplicate description of
OCRD_DOWNLOAD_TIMEOUT
in--help
, #1204 - Use
importlib_metadata
shim for 3.9+, #1210, OCR-D/ocrd_froc#10
2.63.3 - 2024-03-07
Added:
make uninstall-workaround
compantion tomake install-workaround
, #119
Fixed:
OcrdMets.add_file
: fix finding existing el_pagediv, #1199
2.63.2 - 2024-03-05
Fixed:
- Missed incrementing version
2.63.1 - 2024-03-05
Fixed:
OcrdMets
bug that produced invalid caches, #1192, #1195, #1193
2.63.0 - 2024-02-12
Fixed:
- Reduce logging level of spammy log statements to
DEBUG
in workspace, #1181 - Clean up lxml code, #1188
Changed:
- 🔥
OcrdFile.local_filename
returns/acceptsstr
after unpopular change toPath
from #1079, #1182, #1167 WorkspaceValidator
: more efficiency by doing all page checks in the same loop, #1071
Added:
OcrdMets.get_physical_pages
to search for/change/generate ranges for page-specificmets:div
attributes beyond to@ID
, #821, #1063
2.62.0 - 2024-01-30
Added:
- Basic integration test for
ocrd_network
, #1164 ocrd-tool.json
forocrd-dummy
now in repo root, for ocrd_all'smake ocrd-all-tool.json
, #1168
Fixed:
- METS Server: UDS sockets are removed on process exit, #117
Changed:
- replace license-incompatible sparkline library with a simpler implementation, #1176
- remove all pkg_resources calls with modern alternatives, no more run-time setuptools dependency, #1174
2.61.2 - 2024-01-24
Fixed:
- another regression to docker deployment (requirements.txt missing), #1173
2.61.1 - 2024-01-23
Fixed:
- deps-cuda: add workaround for keras-team/tf-keras#62, #1169
- fix regression docker deployment, #1172
2.61.0 - 2024-01-23
Changed:
- 🔥 simplify the project layout and distribution policy, #1166
- In the future there will be only one distribution
ocrd
- The previous separate distributions of the
ocrd_utils
,ocrd_models
,ocrd_modelfactory
,ocrd_validators
andocrd_network
are all part ofocrd
now - Nothing needs to be changed in code using OCR-D/core, the package structure and API is the same as before
- Until the next major release, we will continue to provide distributions for
ocrd_utils
etc. that contain the same code asocrd
- Using
ocrd_utils
etc. as distributions inrequirements.txt
orinstall_requires
is now deprecated - Once we release v3.0.0, these distributions will be depublished
- In the future there will be only one distribution
2.60.3 - 2024-01-10
Fixed:
make install-dev
working withsetuptools>=64
again, #1163
2.60.2 - 2024-01-09
Fixed:
- Log level downgraded from DEBUG to INFO in logging.conf, #1161
- log OAI check as
DEBUG
notINFO
, #1160
2.60.1 - 2023-12-15
Fixed:
- Docker: copy
.git
during build, sosetuptools_scm
can determine version number, #1159
2.60.0 - 2023-12-15
Fixed:
ocrd workspace list-page
now works in workspaces with non-page-specific files, #1148, #1151
Changed:
cli.workspace.WorkspaceCtx
andResolver.resolve_mets_arguments
now have defaults formets_server_url
,mets_basename
andautomatic_backup
, slub/mets-mods2tei#68, #1156- 🔥 switch to
pyproject.toml
, derive version from git, separate build from install, #1065
2.59.1 - 2023-12-05
Fixed:
- Chunking algorithm for
ocrd workspace list-page
now handles edge cases properly, #1145 - Avoid deadlocks in
ocrd_network
if processing workers not deployed, #1125, #1142
2.59.0 - 2023-11-27
Changed:
- Change web API paths to avoid any potential URL segment clashes, #1136, OCR-D/spec#250
GET /
->GET /info
POST /
->POST /run
/{job-id}
->/job/{job-id}
/{job-id}/log
->/log/{job-id}
Fixed:
- WorkspaceBagger: do not overwrite files in case of filename conflict, #1129, #1137
- Update apidocs to include
ocrd_network
, #1131
Added:
ocrd workspace update-page
to set attributes on themets:div
of a page, #1133, #1134ocrd workspace list-page
now has configurable output format and optional partitioning of the page list, #1140, #1141ocrd zip bag
,ocrd workspace merge
,ocrd workspace clone
now support whitelisting/blacklisting file groups, #356, #383, #506, #582, #1138, #1139- workflow endpoint supports storing and deduplicating workflows, #1143
Removed:
OcrdMets
: remove Unused__exit__
method,.#1130 #1132
Fixed:
- bashlib: regression introduced in v2.58.0 breaking non-mets-server calls, #1128
2.58.0 - 2023-10-20
Fixed:
helpers.run_cli
: Handle bothint
andstr
log levels, #1121- bashlib: typo
ocrd_argv
->ocrd__argv
, #1122, #1123 processing workers: pass log level as string andHandle logging of bashlib workers separately, #1123 #1127initLogging
at the right time,ocrd workspace bulk-add
now supports-U/--mets-server-url
, #1126- bashlib: Support
-U
as alias for--mets-server-url
, #1126
Added:
- METS server:
POST /reload
to reload METS from disk, #1123, #1124
2.57.2 - 2023-10-18
Fixed:
- bashlib: remove vestigial
--log-filename
option from #1105, #1120
2.57.1 - 2023-10-18
Fixed:
- Docker deployment process, no functional change
2.57.0 - 2023-10-18
Fixed:
- running a processor as a worker no longer dumps
ocrd-tool.json
and messed up logging, #1116
Changed:
- logging: With
ocrd_logging.conf
(e.g. in Docker), log all messagesDEBUG
and up and log toocrd.log
, #1117
2.56.0 - 2023-10-13
Changed:
- A separate logging dir tree structure for the modules (processing servers, processing workers, processor servers, mets servers, processing jobs). Configurable with env variable
OCRD_NETWORK_LOGS_ROOT_DIR
, #1111- Processing job-level logging - each job is logged into a separate file with format
{job_id}.log
- Processing job-level logging file paths are added to the Job models and preserved in the database.
- The
ocrd_network
logging is based on the format provided inocrd_utils
- Processing job-level logging - each job is logged into a separate file with format
- Support env variable
OCRD_NETWORK_SOCKETS_ROOT_DIR
for setting the root directory for METS server sockets, #1111 - An endpoint
/job/{id}/log
for getting the log file of a processing job of a processor, #1111
2.55.2 - 2023-10-12
Fixed:
OcrdAgentModel
:_type
must betype
, pydantic/pydantic#6797, #1114
2.55.1 - 2023-10-12
Changed:
ocrd workspace bulk-add
distinguishes betweenurl
andlocal_filename
, supporting both, #1086, #1079, #1113
2.55.0 - 2023-10-11
Added:
/workflow
endpoint that can handleocrd process
workflows and distribute jobs page-wise across workers, #1083, #1105, #1108, #1109
Changed:
- METS Server: Make sockets world-readable and -writable, #1098, #1099
- METS Server: Implement find_files support for
local_filename
andurl
, #1100 - Logging: consistent logger names derived from
ocrd.
, #1101 - Logging: consistent logging across the packages, including
ocrd_network
, #1101 ..
page range operator: allow single-page ranges, #1106, #1107
2.54.0 - 2023-09-12
Added:
- METS Server: providing concurrent additional access to the METS file for parallel processing, #966
- Web API: Cache jobs in the processing server with an optional callback once processed, #1069
- Web API: Lock pages output file groups of a workspace to prevent simultaneous non-additive access to workspaces, #1069
- Web API: Support job dependency for caching complete fully-deterministic workflows, #1069
- Web API: Processing server will start all ready requests, not just the first one, #1069
- Web API: Workers will create on demand, retry attempts configurable via
OCRD_NETWORK_WORKER_QUEUE_CONNECT_ATTEMPTS
, #1093 ocrd_utils.config
to collect all configuration based on environment variables in one place, #1081- Processor CLI: Warn if a given page ID cannot be found in METS, #1088, #1089
Changed:
- Processors now have
worker
andserver
subcommands, with separate --help, for starting processing worker/processor server, #1087 - Move
tf_disable_interactive_logs
(to silence keras/tensorflow print statements) toocrd_utils.logging
and do not call on module-level, #1090, #1091 - 🔥 We do now properly distinguish between original and local-file FLocat, original URL will not be changed for downloads anymore, #323, #1079
- 🔥 logging has been streamlined to be better usable as a library, #1080
2.53.0 - 2023-08-21
Fixed:
WorkspaceValidator
: make the check for consistency ofpc:Page[@pcGtsId]
andmets:file[@ID]
optional with themets_fileid_page_pcgtsid
skip flag, #1066ocrd resmgr download
: usebasedir
as an arg and not a kwarg, #1078
Changed:
WorkspaceValidator
: Download files temporarily/on-demand, #1066ocrd-* --version
now prints only the version of the processor without noise or core version, #1068
Added
- Environment variables to control optional retries and timeouts for downloading files:
OCRD_DOWNLOAD_RETRIES
: Number of times to retry failed attempts for downloads of workspace files. #1073OCRD_DOWNLOAD_TIMEOUT
: Timeout in seconds for connecting or reading (comma-separated) when downloading. #1073
- Environment variables used throughout core are now documented in README and
ocrd --help
, #1073 - Web API:
--create-queue
option to on-demand create RabbitMQ for processing workers, #1075 - Web API:
--queue-connec-attempts
to retry connection to RabbitMQ in case server is not yet running, #1075
2.52.0 - 2023-06-26
Added:
make deps-cuda
: Makefile target to set up a working CUDA installation, both for native and Dockerfile.cuda, #1055- Implementation of the Standalone Processor Server module, #1030
ocrd_utils.guess_media_type
to consistently try to determine media type from a file name, #1045
Changed:
- Refactoring the Network CLI, all network module CLI are in
ocrd_network
now, #1030 - The Processing Server uses
ocrd-all-tool.json
file, removing local processor install dependencies, #1030 - Overall improvement and refactoring of the
ocrd_network
package, #1030 - Optionally skip deployment of mongodb and rabbitmq to make external usage/management possible, #1048
page_from_file
now also accepts a (str
) file path in addition toOcrdFile
, #1045- packaging: install/uninstall in correct build order, use
python -m build
instead ofpython setup.py sdist bdist_wheel
, #1051
Removed:
- Obsolete travis CI configuration removed, #1056
- Support for end-of-life python versions 3.5 and 3.6, #1057
Fixed:
- Makefile
FIND_VERSION
macro: usegrep -E
instead ofgrep -P
for macos compatibility, #1060 ocrd resmgr
: detect HTTP errors as such and don't try to continue if HTTP >= 400, #1062PageValidator
: Ensure TextLine has coordinates when checking for Baseline containment, #1049
2.51.0 - 2023-06-07
Changed:
core cuda
Docker: CUDA base image working again, based onocrd/core
notnvidia/cuda
in a separateDockerfile.cuda
, #1041core-cuda
Docker: adopt #1008 (venv under /usr/local, as in ocrd_all, instead of dist-packages), #1041core-cuda
Docker: use conda (micromamba) for CUDA toolkit, and nvidia-pyindex for CUDA libs – instead of nvidia/cuda base image, #1041- more robust workaround for shapely#1598, #1041
Removed:
- Revert #882 (fastentrypoints) as it enforces deps versions at runtime
- Drop
ocrd_utils.package_resources
and usepkg_resources.*
directly, #1041 ocrd resmgr
: Drop redundant (processor-provided) entries in the centralresource_list.yml
.
2.50.0 - 2023-04-24
Added:
- 🔥
ocrd_network
: Components related to OCR-D Web API, #974
Changed:
bashlib
: support file input from multiple file groups, #1027, #1031
Fixed:
- Don't output default docstrings for bashlib processors, #1026
2.49.0 - 2023-03-24
Changed:
- 🔥 (for now: also) publish Docker images to ghcr.io, not docker.io, #997
ocrd resmgr
: eynollah models now provided by eynollah itself, qurator-spk/eynollah#91
2.48.1 - 2023-03-22
Changed:
make docker-cuda
: Support CUDA 11.3 not 11.2, #1020
2.48.0 - 2023-03-22
Changed:
- 🔥 ocrd.run_processor / ocrd.processor.get_processor: rm unnecessary ocrd_tool kwarg #998, #1009
- chdir into workspace directory for both cached and uncached
get_processor
, #972, 987 - 🔥 new CUDA base image 20.04, support CUDA runtime 11/12, not 10, #1014
Fixed:
make install
: do not update opencv-python-headless or numpy for python<= 3.6
, #1014
2.47.4 - 2023-03-16
Changed:
resmgr
:ocrd-typegroups-classifier
resources now listed decentrally, #1011, OCR-D/ocrd_typegroups_classifier#15
2.47.3 - 2023-03-15
Fixed:
- Docker: reintroduce
python3-pip
because why not, #1004
2.47.2 - 2023-03-15
Fixed:
- Docker: Use
pip3
notpip
, #986 make install
: Speed up opencv built for (now unsupported) python<= 3.6
, #986, OCR-D/ocrd_calamari#72
Added:
- CI/CD: GH action to deploy docker images to ghcr.io, #986
2.47.1 - 2023-03-15
Fixed:
- Docker: install
python3-venv
, do not installpython3-pip
, #1003, #1004, OCR-D/ocrd_all#352
2.47.0 - 2023-03-15
Fixed:
ocrd resmgr
: handle namespaces packages gracefully for Python<=
3.6, #917, #985ocrd resmgr
: guess media type withfiletype.py
in addition toMIME_TO_EXT
, #991OcrdMets
: Insertmets:agent
in a schema-compliant way, #976, #977ocrd_cli_wrap_processor
: remove unnecessary ocrd_tool kwarg, #998, #999- Docker base image builds again, except CUDA, #986
Added:
ocrd resmgr
: support Google Drive links, #993, #992
Changed:
- Use the new
importlib.resources.files
API, #995, #996 ocrd resmgr
: resources forocrd_anybaseocr
removed from central list, provided by the project, #989, OCR-D/ocrd_anybaseocr#100
2.46.0 - 2023-02-16
Changed:
WorkspaceValidator
: anOcrdFile
without a pageId is not an error, but a document-wide file, #485, #979WorkspaceBackupManager
: add snapshot on init if enabled, #981- 🔥 end-of-life for python 3.6, test from 3.7 to 3.11, #956
- 🔥 update base image to Ubuntu 20.04, #956
Fixed:
bashlib
: Handle empty list of input files, #978OcrdMets.find_files
: don't override the@LOCTYPE
of file candidates, #980ocrd resmgr
: replace libmagic with simple lookup by suffix, #982, #984
Added:
helpers
:get_cached_processor
to get instances of a processor in preparation for #974, #972
2.45.1 - 2023-01-20
Fixed:
ocrd resmgr
: insert new entries first, so dedup works as expected, #971
2.45.0 - 2022-12-13
Fixed:
ocrd resmgr download --overwrite
now works properly for both directories and files, #690, #797ocrd resmgr
:archive
resources can now also be ZIP files and reference files and folders in them, #967ocrd-dummy
: can now be used to create PAGE-XML for images without copying, #803, #814
2.44.0 - 2022-12-08
Added:
ocrd zip update
command to update checksums for an OCRD-ZIP after changing it, #363, #951
Removed:
ocrd zip bag
does no longer support the long-broken--in-place
option, #964, #363
2.43.0 - 2022-12-01
Added:
OcrdMets.refresh_caches
to update caches after changes to XML outside ofOcrdMets
, #957, #960
2.42.1 - 2022-11-30
Fixed:
- Regressions from PR #875, #957, #958
- Missing import in
ocrd workspace merge
, #956
2.42.0 - 2022-11-23
Fixed:
- Symlinks in workspaces are properly resolved now, #802, #954
Added:
- Optional caching of access to METS, configured via environment variable
OCRD_METS_CACHING
, #875 - CPU and memory profiling , configured via environment variable
OCRD_PROFILE
andOCRD_PROFILE_FILE
, #678
Changed:
ocrd workspace find
: supports comma-separated regexes, ranges and literal values for--page-id
, #955ocrd workspace find
: ranges are generated with last number in string, #955
2.41.0 - 2022-11-09
Fixed:
ocrd workspace list-installed
should not create spurious entries formoduledir
files, #940OcrdResourceManager.download
does not need to querysize
via HTTPContent-Length
in most cases, #924, #939make install
: Reinstall shapely to work around shapely/shapely#1598, #947
Changed:
ocrd workspace bulk-add
: Generate file_id consistent with conventions from filename if no--file-id
given, #943
2.40.0 - 2022-10-25
Fixed:
- Downloading to
moduledir
should not create subdirectory, #934 - Reduce logging noise in Ocrd{ResourceManager,Mets,Workspace}, #933, #916
- Allow downloading resources explicitly to
module
location, #932
Changed:
- bashlib processors support
--profile{-file}
and--dump-module-dir
, #929
2.39.0 - 2022-10-23
Fixed:
ocrd resmgr download '*'
working again, #904, #908, #909- Resource manager respects
moduledir
correctly, #904 moduledir
now be able to handle namespace packages properly, #917- processing with
--overwrite
does not create duplicates any more, #861 - bashlib:
ocrd validate tasks
call now supports non-standard METS name, #925
Added:
- Processors have a
--dump-module-dir
to print their implementation-specific module directory to STDOUT, #904 ocrd workspace merge
: support--force
to overwrite mets:file with clashing IDs, #926
Changed:
ocrd_utils.make_file_id
: only fall back to output fileGrp + (page)ID instead of page counter, #861OcrdWorkspace.add_file
: when ID already exists, remove (with overwrite) or fail instead of reusing- Workspace.merge: delegate
force
to eachadd_file
2.38.0 - 2022-08-14
Fixed:
ocrd zip
: Properly respectOcrd-Mets
, #899ocrd workspace merge
: missing arguments, #896ocrd resmgr download
: Support dynamic discovery, #901
Added:
- Processors support profiling with
--profile
and--profile-file
, #878, bertsky#4
Removed:
ocrd zip
: remove support for obsoleteOcrd-Manifestation-Depth
, #902, OCR-D/spec#182
2.37.0 - 2022-08-03
Added:
ocrd resmgr
: Resources of processors can be described in theocrd-tool.json
, #800
2.36.0 - 2022-07-18
Fixed:
ocrd_utils.generate_range
:maxsplits
should be 1, not 2, #880- Typos in CHANGELOG, README and code comments, #890
Changed:
- Consistently use snake_case but continue to support CamelCase for kwargs and CLI options, #874, #862
- Update to spec to 3.19.0, introducing greater flexibility in describing parameters, #872, #848, OCR-D/spec#206
ocrd workspace merge
: support mappingfile_id
andpage_id
in addition tofile_grp
, #886, #888ocrd workspace merge
: rebaseOcrdFile.url
to target workspace, #887, #888- Replace
resource_filename
et al from pkg_resources with faster alternatives, #881, #882
2.35.0 - 2022-06-02
Changed:
- OCRD-ZIP: Drop
Ocrd-Manifestation-Depth
and disallowfetch.txt
, OCR-D/spec#182 - Parameters can now be described with most JSON-Schema constructs, OCR-D/spec#206, #848
2.34.0 - 2022-05-20
Added:
ocrd log
now accepts-
argument to read messages from STDIN, #852, #870
Changed:
ocrd_utils.safe_filename
: replace with_
instead of.
and retain pre-existing_
, #858, #859OcrdMets.find_files
: allow pageId regex, precompile all regexes, #855, #856
Fixed:
ocrd resmgr list-available
: handle processors not in resource list gracefully, #854, #865ocrd resmgr
: do not try to parse strings as dates, #867, #869ocrd workspace bulk-add
: use 1-based counter, #864
2.33.0 - 2022-05-03
Fixed:
ocrd workspace remove-group
: Pass on--recursive
toremove_file_group
, #831, #832ocrd workspace bulk-add
: handle unset file_id properly, #812, #846io.BufferedReader
filename attribute should bename
notfilename
, #838, #839
Changed:
OcrdWorkspace.image_from_*
: support passing explicit AlternativeImage filename, #845
Removed:
make asset-server
feature no longer used, #843python3-pip
dependency is redundant, #813
2.32.0 - 2022-03-30
Fixed:
ocrd zip bag
:-I
is not required, #828, #829
Changed:
OcrdExif
: fallback to PIL if ImageMagick'sidentify
is not available, #796, #676OcrdWorkspace.image_from_*
: Avoid false warning when recropping, #820, #687
2.31.0 - 2022-03-20
Changed:
make cuda-ubuntu
installs all CUDA versions, #704, OCR-D/ocrd_all#270ocrd resmgr
: updated models for ocrd-anybaseocr-{tiseg,layout-analysis}, #819, OCR-D/ocrd_anybaseocr#89
Fixed:
- Error message erroneously referenced
mets:file/@ID
insteadmets:fileGrp/@USE
, #823 - Consistently use kwargs/args in
OcrdWorkspace.save_image_file
, #822 - Missing arg for log message in WorkspaceValidator, #811
2.30.0 - 2022-02-01
Changed:
- Images processed by OCR-D can now be up to 40,000 by 40,000 pixels, #735, #768
OcrdExif
: get pixel density metadata from ImageMagick'sidentify
, not PIL, #676- Refactor parsing of
--mets
/--mets-basename
/--working-dir
to reduce ambiguities, #693, #696 - bashlib: implify (i.e. remove) build process, #742, #785
ocrd workspace bulk-add
: Make bulk-add more flexible and (hopefully) user-friendly, #641, #754, #769, #776
Fixed:
- PAGE validation: handle
pc:ImageRegion
as well, #781 - bashlib: pass on parameters for task validation, #784
<processor> --list-resources
: List only the relevant type of resource (directory or file), #750, #777
2.29.0 - 2021-12-08
Changed:
ocrd_utils.make_file_id
: combine with output fileGrp if input has pageId, but don't extract numbers, #744OcrdMets.add_file
:mets:fileGrp/@USE
must be validxs:ID
, #746
Added:
ocrd ocrd-tool
: wraplist-resources
andshow-resource
fromProcessor
- bashlib
ocrd__parse_argv
: add--list-resources
and--show-resource
, #751 ocrd bashlib
: wrapinput-files
fromProcessor
andmake_file_id
- bashlib
ocrd__wrap
: offerocrd__files
andocrd__input_file
, #571
2.28.0 - 2021-11-30
Added:
- Store parameterization of processors in METS for provenance, #747
ocrd workspace find --download
: Add a--wait
option to wait between downloads, #745- bashlib: Check fileGrps when parsing CLI args, #743, OCR-D/ocrd_olena#76
- Dockerfile: Install
time
to have/usr/bin/time
in the image, #748, OCR-D/ocrd_all#271
Fixed:
ocrd-dummy
: Also set pcGtsId, v0.0.2, #739
2.27.0 - 2021-11-09
Fixed:
- remove dependency on six, #732
ocrd workspace remove-group
: handle files not in subdir gracefully, #734ocrd resmgr
: fix "reference before assignment" issue #689, #733OcrdWorkspace.remove_file
: handle empty regexes, #725
Changed:
ocrd workspace rename-group
will now also rename filenames andmets:file/@ID
, #736
2.26.1 - 2021-10-14
Fixed:
resmgr
: Correct URL for tesseract configs
2.26.0 - 2021-09-20
Added:
ocrd_utils
: functions for scaling images, #707
Changed:
OcrdFile
: should only ever be instantiated in the context ofOcrdMets
, #324, #714- Logging outputs to
STDERR
notSTDOUT
, OCR-D/spec#183, #713, #667
Fixed:
ocrd workspace merge
: handlefile_grp
parameter, #715ocrd workspace merge
: explicit --copy-files was --no-copy-files, #715ocrd resmgr
: Fix tesseract URLs, #721
2.25.1 - 2021-06-30
Fixed:
ocrd_page
: fallback forid
if none of the attributes are set, #683
2.25.0 - 2021-06-30
Added:
ocrd_page
: Universal attributeid
to get eitherid
,imageFilename
orpcGtsId
, #683, #682ocrd_page
: functionparseTree
andwith_etree
kwarg toworkspace.page_from_*
to access PAGE with etree API, #699, #313
Fixed:
- Version-independent URL of METS XSD, #695, #694
- Recrop if deskewed after cropping, #688
2.24.0 - 2021-04-27
Changed:
workspace.image_from_page
will return the AlternativeImage with most features matched, not the last one, #686crop_image
: Ensures that masked areas do not influence the median forfill='background'
, #686
Added:
ocrd resmgr
: modeldefault
for eynollah, #668
2.23.2 - 2021-03-10
Added:
ocrd resmgr
: new modeldefault-2021-03-09
for sbb_binarization, #681
2.23.1 - 2021-03-07
Added:
configs
resource forocrd-tesserocr-recognize
, #680
Changed:
- Stop testing python 3.5, start testing python 3.9
ocrd resmgr
: skip redundantcontent-length
request ifsize
is known
2.23.0 - 2021-02-26
Changed:
- The
--page-id
/-g
option now accepts value ranges with the..
operator, #672
Added:
ocrd workspace merge
to merge two workspaces, #670, #673- Two experimental calmari models
c1_fraktur19-1
andc1_latin-script-hist-3
, #675
2.22.4 - 2021-02-17
Fixed:
OcrdPage
: never output parsing results to STDOUT, #665, #372- improved docstrings throughout (ht @bertsky), #664
resmgr --location cwd
will download to$PWD
, not$PWD/ocrd-resources/<executable>
, #671
2.22.3 - 2021-01-27
Changed:
- resmgr: do not download on-demand when encountering unresolveable file parameters
Fixed:
- resmgr: bugs in
--location cwd
resolving
2.22.2 - 2021-01-26
Changed:
- resmgr/ocrd_calamari: disable pre-1.0 model
- resmgr/ocrd_calamri: Rename resource
qurator-gt4hist{,ocr}-1.0
2.22.1 - 2021-01-26
Fixed:
mkdir -p $HOME/.config/ocrd
2.22.0 - 2021-01-26
Added:
- Implement file resource algorithm from OCR-D/spec#169, #559
- New CLI
ocrd resmgr
to download/browse processor resources, #559 Workspace.rename_file_group
with CLIocrd workspace rename-group
to rename file groups, #646- PAGE API method
get_AllTextLines
- resources for kraken
Changed:
ocrd workspace add
: guess--mimetype
if not provided, #658ocrd workspace add
: warn if--page-id
not provided, #659
Fixed:
run_cli
: don't reference undefined vars in error handler, #651name
of resources mustn't contain slash/
2.21.0 - 2020-11-27
Changed:
prune_ReadingOrder
: also remove if RO contains only empty groups, #648- Don't restrict version range on
numpy
, #642 run_cli
: measure both wall time and CPU time, #647
Fixed:
TaskSequence.parse
do not hard-code class to instantiate, #649
2.20.2 - 2020-11-20
Fixed:
ocrd workspace
: workspace directory should always be absolute, #644
2.20.1 - 2020-11-17
Fixed:
Workspace.find_files
: Ignoremets:file
w/omets:FLocat
, #640- Re-crop after rotation to avoid coordinate inconsistencies (and get tighter crops), #640
- Handle missing
@orientation
by using the next-upper-level@orientation
, #640
2.20.0 - 2020-11-03
Fixed:
- Use
chmod
, notfchmod
to support Windows, #636 ht @b2m - Do not capture processor output in
run_cli
, #592, #638
Changed:
- Record version information in
pg:MetadataItem
, #637 - New method
Processor.zip_input_files
to uniformly handle files from multiple input file groups, #635 - Auto-invalidate derived
pg:AlternativeImage
images when changing coords, #639
2.19.0 - 2020-10-23
Changed:
- CUDA base image is now nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04, #629 ht @Witiko
- Convert 16-/32-bit images to 8-bit because PIL poorly supports the former, #627
Fixed:
- Permission of existing
mets.xml
should not change, #403, #625 - Handle
pg:AlternativeImage
withoutcomments
, #633
2.18.1 - 2020-10-21
Fixed:
- default ocrd_logging.conf had a typo, #590, #628
- default ocrd_logging.conf mirrors builtin logging config, #630
--log-level
override mechanism works with logging config file, #626, #631
2.18.0 - 2020-10-12
Changed:
- 🔥
OcrdMets.find_files
is now a generator, #614 - Print docstrings if available for
--help
, #623 - parameter values can be arrays, OCR-D/spec#174
2.17.2 - 2020-10-07
Fixed:
- As a workaround for tensorflow compatibility, require
numpy < 1.19.0
, #620
2.17.1 - 2020-10-05
Fixed:
ocrd workspace remove-group
: don't fail on non-existing files, #618
Changed:
- media type <-> extension mapping for
text/plain
, #612
2.17.0 - 2020-09-23
Fixed:
ocrd_utils.image
handles 16/32-bit images correctly, #606, python-pillow/Pillow#4925- OcrdPage: Most elements should be hashable, i.e. usable in sets, maps etc., #610
ocrd_utils.make_file_id
: Ensure produced IDs are syntactically valid xs:ID, #616
Changed:
- Calling a processor w/o any arguments: show help and exit 1, #586, #615
- 🔥
Workspace.add_file
requires page_id kwarg (which can be None), #560 - Reorganized the loggign to be more consistent and well-behaved, #599
2.16.3 - 2020-09-09
Fixed:
- Prune empty reading order when serializing, #602
2.16.2 - 2020-09-08
Fixed:
- handle empty reading order in PAGE gracefully, #600
2.16.1 - 2020-09-08
Fixed:
ocrd_utils
on module-level, disable logging less than CRITICAL, call initLogging explicitly in CLI, #594
2.16.0 - 2020-09-03
Fixed:
ocrd process
: better-readable output on failure, #583ocrd log
no longer produces "Logging errors" for multi-arg call, #588ocrd workspace remove-group
: remove empty file groups on disk, #584, #569
Added:
METS_URL
can also be an OAI-PMH GetRecord request, ht @m3ssman, #581- Additional docker base image for Nvidia CUDA, ht @sepastian, #452
2.15.0 - 2020-08-28
Changed:
- 🔥 Finish deprecations on workspace CLI begun in 2.11.0, #578
--mets-basename
is deprecated now, use--mets
and--directory
- Deprecated arguments are hidden in
--help
- some internal function name changes
2.14.0 - 2020-08-22
Fixed:
make_file_id
ID incrementation algorithm failed under certain conditions, #570
Changed:
- ValidationReport is now in ocrd_models, not ocrd_validators, #573
run_cli
now returnsexitcode, stdout, stderr
, not justexit_code
, #574ocrd-dummy
will create PAGE-XML for images on-the-fly, #574
Added:
- processors can
self.add_metadata(pcgts)
to add a self-describingpg:MetadataItem
, #574
2.13.2 - 2020-08-13
Fixed:
- workspace: Possible source of "too many open files" closed, #564
2.13.1 - 2020-08-07
Changed:
assert_file_grp_cardinality
accepts a thirdmsg
parameter
2.13.0 - 2020-08-04
Changed:
Processor.input_files
can handle images mixed with PAGE-XML in a file group, OCR-D/spec#164, #554
2.12.7 - 2020-07-31
Changed:
ocrd process
also supports -P
, #552
2.12.6 - 2020-07-28
Fixed:
- Vulnerabilities in Pillow < 7.2.0, #550
2.12.5 - 2020-07-25
Changed:
setOverrideLoglevel
now accepts asilent
parameter, #548
Fixed:
- No more logging output interfering with
--dump-json
et al. #540
2.12.4 - 2020-07-22
Fixed:
- logging no longer interferes with
--dump-json
/--help
/--version
, #540, #546
2.12.3 - 2020-07-23
Fixed:
ocrd workspace validate
: properly check whether AlternativeImage are available locally, #450, 543ocrd workspace validate
: reduceINFO
log messages
2.12.2 - 2020-07-22
Fixed:
ocrd_validators
: adapt ocrd-tool schema to include OCR-D/spec#152
2.12.1 - 2020-07-21
Fixed:
ocrd process
: first task was ignored, #529, #542
2.12.0 - 2020-07-21
Changed:
- Refactoring of
ocrd_utils.__init__
into thematic submodules, #536 - validation of file groups downgraded to notice, allow PRE fileGrp/USE prefix, #541
- BaseProcessor: 🔥 show help if no METS was specified, OCR-D/spec#156, #438, #503
Fixed:
- bashlib: Don't set
-x
inocrd__minversion
, #535 - bashlib:
ocrd__minversion
logic was broken
2.11.0 - 2020-07-13
Fixed:
- OcrdFile now has
__eq__
implementation to allow for==
/!=
comparisons, #532 Workspace.image_from_page
: Respect the order of feature annotation onpage:AlternativeImage
, #525
Changed:
- processors:
-p
is now repeatable and the referenced JSON may contain comments, #514, #533 ocrd workspace validate
:METS_URL
argument now optional as redundant to--directory
/--mets-basename
, #518ocrd workspace clone
:WORKSPACE_DIR
argument now optional as redundant to--directory
, #518ocrd workspace init
:DIRECTORY
argument now optional as redundant to--directory
, #518ocrd workpace *
: Pass--mets-basename
on to resolver, #518
Added:
- processors:
-P/--parameter-override
to override individual key-value pair of the parameter JSON, #533 - utils:
make_file_id
to generate newmets:file/@ID
from existing OcrdFiles, #530 - utils:
assert_file_grp_cardinality
to assert the correct number of comma-separated fileGrps were passed, #530
2.10.5 - 2020-07-11
Fixed:
- Blacklist PIL versions with PNG issues, #527
ocrd workspace validate
: Allow skippingpage_xsd
andmets_xsd
, #531- Fix import of
xlink
XSD inmets
XSD, #531
2.10.4 - 2020-06-17
Added:
bashlib
: support --overwrite flag, #522
2.10.3 - 2020-06-16
Fixed:
- Regression in
ocrd workspace add
that prevented files from being copied, #519
2.10.2 - 2020-06-14
Fixed:
- bashlib: Typo, #516
2.10.1 - 2020-06-13
Changed:
- bashlib: Make
input-file-grp
andoutput-file-grp
mandatory, #512 - bashlib: Add a function
ocrd__minversion
that will check whetherocrd --version
is new enough for the processor., #512
Fixed:
- Re-introduce
ocrd__raise
, #511 - Move XSD into root package of
ocrd_validators
, #513
2.10.0 - 2020-06-11
Fixed:
--help
: Improve formatting of parameters, document--overwrite
, ht @bersky
Changed:
Workspace.remove_file
: Optionalpage_recursive
parameter to remove images linked in PAGE as well, #434, #471Workspace.remove_file
: Optionalpage_same_gropup
parameter to remove only those images linked in PAGE that are in the same group as the PAGE-XMLWorkspace.remove_file_gropup
: The samepage_recursive
andpage_same_gropup
parameters asWorkspace.remove_file
WorkspaceValidator.check_file_grp
now accepts apage_id
parameter and will not raise an error if an existing output file group is targeted but for pages that aren't in that group, #471ocrd_cli_wrap_processor
: Takepage_id
into account when doingWorkspaceValidator.check_file_grp
run_cli
accepts anoverwrite
parameter to pass on to processor calls, #471blocked by OCR-D/spec#121Task.validate
: set implicit input/output groups from ocrd-tool.json, #471ocrd process
: support --overwrite and pass on to processor calls, #471TaskSequencec.validate_tasks
: Check output file groups are not in METS unless overwrite for every task, ht @berskyocrd workspace add
/ocrd workspace bulk-add
support--ignore
Added:
- Workspace: Optional
overwrite_mode
that setsforce
for all operations OcrdPage
:get_AllAlternativeImagePaths
to list allpc:AlternativeImage/@filename
referenced in a PcGts, #434, #471ocrd workspace bulk-add
to add many files at once to a workspace, #428OcrdMets.add_file
:ignore
parameter to optionally disable looking for existing files, #428
2.9.0 - 2020-06-09
Changed:
OcrdMets.add_file
now validates file ID syntax, #447
Added:
ocrd log
, CLI to OCR-D's logging mechanism, #472- XML Schema validation of PAGE-XML and METS, #470
2.8.3 - 2020-06-08
Fixed:
- workspace.remove_file: fix for list-valued results, #507
Changed:
- workspace prune-files CLI: support filtering (like workspace find), #507
- workspace CLI: update help strings (documenting regex support), #507
2.8.2 - 2020-06-08
Changed:
- bashlib: check bash version to be >= 4.4, #445, OCR-D/ocrd_olena#30
ocrd workspace add
supports-C
/--check-file-exists
to validate thatFNAME
is an existing local file, #495OcrdFile
constructor acceptsID
parametermodel_factory.page_from_image
now sets the@pcGtsId
attribute tot the file's@ID
, #378WorkspaceValidator
: checkpc:PcGts/@pcGtsId
==mets:file/@ID
, #378OcrdFile
constructor: removed long-obsoleteinstance
parameterOcrdFile
constructor: acceptspageId
parameter- METS: reorder elements according to schema in empty METS, #487
2.8.1 - 2020-06-06
Changed:
OcrdMets.remove_file
now supports all the options ofOcrdMets.find_files
, #497, #458OcrdMets.remove_file_group
now supports theUSE
param being a regex,, #497, #458
Added:
OcrdMets.remove_one_file
: remove a singleOcrdFile
, either directly or by ID, #497, #458
2.8.0 - 2020-06-04
Added:
ocrd-dummy
, a minimal processor that copies input to output, #468- OcrdPage:
get_AllRegions
: retrieve all regions, sorted by document or reading order, #479 - OcrdPage:
sort_AllIndexed
: sort all children by@index
in-place - OcrdPage:
clear_AllIndexed
: clear all@index
children - OcrdPage:
extend_AllIndexed
: Add elements with incrementing@index
- OcrdPage: Replace empty reading order groups with equivalent
RegionRef
on export - OcrdPage:
get_UnorderedGroupChildren
: get reading order elements of anUnorderedGroup
Changed:
- OcrdPage:
get_AllIndexed
: allow filtering by child type - OcrdPage:
get_AllIndexed
: index_sort parameter to enable/disable sorting
2.7.1 - 2020-05-27
Fixed:
ocrd workspace find
: UseOcrdMets.get_physical_pages
method, fix #491
2.7.0 - 2020-05-27
Changed:
- 🔥
Workspace.image_from_page
no longer treatsPrintSpace
as functionally equivalent toBorder
, #490 OcrdMets.get_physical_pages
method companion toOcrdMets.physical_pages
property, #484- OcrdMets/ocrd workspace find: Search for multiple pages by fileid, ht @bertsky, #463, #484
- PAGE validation: respect reading order in consistency checks, #442
2.6.1 - 2020-05-14
Added:
- OcrdPage: new method
get_AllIndexed
for OrderedGroup and OrderedGroupIndexed that lists their children, sorted by index, #478
Changed:
- Improved log message for profiling processors, #477
Fixed:
- Search for
--page-id
now orders of magnitutde faster, ht @bertsky, #481 - Not all generateDS types were exported by ocrd_page, now they are, #480
2.6.0 - 2020-05-12
Fixed:
- image files no longer cached in workspace, #446, #448
Added
- Many mets:file search fields support regex now, #458, #448
2.5.3 - 2020-04-30
Fixed:
- OcrdPage: hacks to make XML namespace output consistent, #474
2.5.2 - 2020-04-29
Fixed:
- logging: format is spec-conformant again, #466
Added:
- ocrd_page: generateDS-generated code has
__hash__
method now, #443
2.5.1 - 2020-04-23
Fixed:
- logging: disable propagation for loggers with handler, #463
2.5.0 - 2020-04-23
Changed:
- Logging configuration via configuration file, not script, #460 (HT @M3ssman @bertsky)
Added:
- Execution of processors is tracked with logger 'orcd.process.profile', #461
2.4.4 - 2020-03-17
Fixed:
- #437 broke simple types in PAGE model, (really) revert 3a0a3a8, #451
2.4.3 - 2020-03-12
Fixed:
- bashlib: ocrd-tool.json-related errors no longer lead to silent exit, #456
- #437 broke simple types in PAGE model, revert 3a0a3a8, #451
2.4.2 - 2020-02-20
Fixed:
- JSON strings longer than OS-allowed filename size crash fixed, #444
2.4.1 - 2020-02-19
Changed:
- Updated PAGE bindings generated using generateDS, #437
2.4.0 - 2020-02-17
Fixed:
- concatenation of PAGE elements was based on
@index
starting with1
, #430 - CLI:
--help
should work for processors w/oinput_file_grp
, #440 - CLI:
--version
more robust, #433
Changed:
- processor
--help
lists parameter enum values if available, #427 - 🔥
workspace.save_image_file
takesmimetype
instead offormat
, #441
2.3.1 - 2020-01-23
Changed:
- More expressive coordination validation, #418
ocrd workspace init
shortcut forocrd workspace init .
, #419ocrd workspace validate
shortcut forocrd workspace validate mets.xml
, #419ocrd workspace clone <METS_URL>
shortcut forocrd workspace <METS_URL> .
, #419- workspace validation: check of dimensions in image and PAGE only with
--download
, #423 - behavior of
clobber_mets=False
changed to silently skip re-downloading METS instead of making a fuss, #425
Fixed:
- regression from #419 for
ocrd workspace validate
- regression from #384 for task sequence validation, #424
2.3.0 - 2020-01-21
Changed:
- layout of --help, #411
- OcrdMets: Removing the last file for a physical page will remove the physical page
Fixed:
mets_basename
was not respected inocrd workspace init
, #415, #417- typos, #414
Added:
- OcrdMets.remove_physical_page to remove
structMap[@TYPE="physical"]
entries ocrd validate
CLI to validate task groups, PAGE XML and processor parameters, #245, #384
2.2.2 - 2020-01-16
Added:
- Validation of input/output file groups before running a processor/task sequence, #392
- Improved
--help
for both python and bashlib processors, #402, #408
Fixed:
- bashlib: Calling bashlib processor w/o parameters, #381, #400
- bashlib: syntax error, regression from 2d89c22ae3. #410
2.2.1 - 2020-01-14
Fixed:
- OcrdExif: PNG metadata extraction was broken, #395, #396
- Remove the trivial and error-prone image caching feature in resolver, #399
- When creating files with workspace.add_file, single-component file paths (i.e. just the basename of a file) were treated as directories, #404
- When downloading files to a workspace, check first whether those files to be added already exist on disk and are within the workspace directory., #404
2.2.0 - 2020-01-10
Fixed:
- PIL.Image.open'ed files weren't closed, #390
- resolver: if mets_url is relative path, resolve before anything else, #319, #397
- Resolver.workspace_from_url: Create dst_dir before resolve for py
<=
3.5, #330, #393 - fix help string for -m/--mets, fix #263, #391
Changed:
- downgrade filegrp syntax errors to warnings, #364, #389
2.1.3 - 2020-01-08
Changed:
- bagit-profile matches changes from spec v3.4.2 (metadata dir)
2.1.2 - 2020-01-06
Changed:
- have save_mets use UTF-8 encoding for byte-serialization (no entities), #388
Fixed:
- regression from #374, #387
2.1.1 - 2020-01-02
Added:
- PAGE validator: coordinate self-validity and mutual consistency, #374
Fixed:
- Add more related mime types and fix image/jpeg, #382
2.1.0 - 2019-12-20
Added:
- Workspace validation will check cardinality of images per file is 1, #243, OCR-D/spec#132
Changed:
- bashlib will no longer warn about "non-conformant" file group names, #365
- Invalid
file:/
URL will now raise exceptions, #373 - image_from_*: increase tolerance for size mismatch after rotation to 2px, #371
2.0.2 - 2019-12-02
Changed:
ocrd process
: Validate parameters when validating a task- Dockerfile: Revert to Ubuntu 18.04 for LTS compatibility, #344
- Parameter validation: Raise exception for unknown parameters
ocrd ocrd-tool validate
: Raise exception for unknown keys in JSON
2.0.1 - 2019-11-26
Fixed:
- METS
CREATEDATE
date format now ISO8601, #360 ocrd workspace find
allow outputting file group, #359- processor decorator:
--version
should succeed independent of parameters, #358
Changed:
ocrd process
uses the ocrd-tool.json of the tools to check whether output file group necessary, #296- Dockerfile: Revert to Ubuntu 18.04 for LTS compatibility, #344
- pixel density warnings downgraded further to "notice", #361
2.0.0 - 2019-11-05
Changed:
- image_from_page etc: allow filling with background or transparency
- 🔥 API changes, #311, #327
- Dockerfile: Omit
ENTRYPOINT
, OCR-D/spec#130, #340 - Relax pixel density validation errors to warnings, OCR-D/spec#129, #339
1.0.1 - 2019-10-25
Fixed:
- Add
dimension
to workspace validation skip list, #329 - Update ocrd-tool.json schema to spec 3.3.0 (no output_file_grp, no syntax restriction on content-type)
- PAGE XML output references xsi:schemaLocation, #331
- Update Pillow to 6.2.0
Changed:
ocrd process
: task validation takes processor's ocrd-tool into account, #296
1.0.0 - 2019-10-18
- Workspace validation: Validate that files mentioned in pc:Page/@imageFilename exist in METS and on FS, #309
ocrd ocrd-tool parse-params
has the string-or-filepath logic for -p/--parameter as for the CLI
1.0.0b19 - 2019-09-10
- image_from_page etc: allow filtering by feature (@comments), #294
1.0.0b18 - 2019-09-06
Changed:
-m/--mets
is not required anymore, #301ocrd workspace prune-files
: Throw on error removing non-existent file-p/--parameter
argument accepts raw JSON as well now, #239
Fixed:
- OcrdFile: Default fileGrp to
TEMP
- OcrdFile: Accept url constructor arg
- Workspace: Simplify file download code, add extensions to files
- Processor:
chdir
to workspace directory on init so relative files resolve properly - typos in docstrings
- README: 'module' -> 'package'
- workspace.image_from_page etc: logic with rotation/angle
- Adapted test suite to OCR-D/assets now with file extensions
Added:
- utils:
MIME_TO_EXT
to map mime types to preferred extension - Validation of imageHeight/imageWidth in PAGE vs. actual image height/width, #229
1.0.0b17 - 2019-08-21
Fixed:
- Require
Pillow == 5.4.1
throughout
1.0.0b16 - 2019-08-21
Added:
- many utility methods for image manipulation and coordinate handling, #268, OCR-D/ocrd_tesserocr#49
bbox_from_points
bbox_from_xywh
bbox_from_polygon
coordinates_for_segment
coordinates_of_segment
crop_image
membername
image_from_polygon
points_from_bbox
points_from_polygon
points_from_xywh
polygon_from_bbox
polygon_from_x0y0x1y1
polygon_from_xywh
polygon_mask
rotate_coordinates
xywh_from_bbox
- Spec-conformant handling of AlternativeImage, OCR-D/spec#116, OCR-D/ocrd_tesserocr#33, #284
Changed:
- workspace bagger will create files with extension
save_mets
is atomic now, #278, #285
1.0.0b15 - 2019-08-14
Fixed:
- regression in namespace handling of PAGE output, #277
1.0.0b14 - 2019-08-14
Fixed:
- METS is serialized as Unicode instead of character entities, #279
1.0.0b13 - 2019-08-13
Added:
ocrd workspace remove
to remove files, #275, #245ocrd workspace remove-group
to remove file groups, #275, #245ocrd workspace prune-files
1.0.0b12 - 2019-08-08
Fixed:
- Regression with ocrd_page data types, #269
- Segfault issue with Pillow >= 6.0.0, #270
1.0.0b11 - 2019-07-29
Changed:
- Improve pixel density logic in OcrdExif, #256, #37, OCR-D/ocrd_tesserocr#54
- 🔥 stop supporting python
<= 3.4
- Support only 2019-07-15 PAGE version
1.0.0b10 - 2019-06-25
Fixed:
- Handle TIFF ResolutionUnit not being set #250
- bashlib:
--mets-file
should be--mets
Changed:
- missing required parameters should raise exception, fix #244 #247
1.0.0b9 - 2019-05-20
Changed:
- export additional region types from generated code, #241
Added:
points_from_y0x0y1x1
for inverted x/y pairs
Added:
ocrd workspace list-page
to list all page IDs
Changed:
- Extended page with TextStyle for Page, , PRImA-Research-Lab/PAGE-XML#8
1.0.0b6 - 2019-03-19
ocrd workspace set-id
case in argument error- fix DeprecationWarning for PyYAML 5.1+
- use headless opencv
1.0.0b1 - 2019-02-27
First beta of 1.0.0
Changed:
- 🔥 Drop Python2 support
- 🔥 Refactored project into 5 modules with little dependencies each
- Implement 3.2.0 of the spec
Removed:
- 🔥 Move factory methods from OcrdPage and OcrdExif to new module
ocrd.model_factory
- Factor out XML constants to
ocrd.constants.xml
- 🔥 BaseProcessor.add_output_file removed
0.15.2 - 2019-01-07
Added:
- workspace validator: make url tests (java, non-http) checks skippable
0.15.1 - 2019-01-04
Fixed:
resolver: fall back to "mets.xml" as basename if not provided
Added:
- PageValidator: Check consistency of PAGE according to spec 3.0.0, #223
Changed:
- 🔥 Change validators to use a single static
validate
method where applicable, #224
0.14.0 - 2018-12-13
Changed:
- 🔥 Use mets:structMap[@TYPE="PHYSICAL"] instead of GROUPID to group by page, #221, #222, OCR-D/spec#81
- 🔥 Rename group_id -> page_id, throughout, OCR-D/spec#101
- 🔥 CLI: Rename --group-id -> --page-id (
-g
)
0.13.3 - 2018-12-05
Fixed:
ocrd workspace clone
fixed to have the pre-0.13.1 behavior- Missing metsHdr made adding agents fail, #218
0.13.2 - 2018-11-29
Changed:
- various fixes to workspace and OCRD-ZIP validation, #217
0.13.1 - 2018-11-26
Fixed:
- Relative files in workspace were resolved to target, not source dir of METS, #215
0.13.0 - 2018-11-23
Changed:
- Adapt bagit profile to spec v2.6.2: allow fetch.txt
- Adapt bagit profile to spec v2.6.3: Tag-Files-Allowed
Removed:
- Code related to generating Open API / Swagger definitions, #210
0.12.0 - 2018-11-22
Changed:
- Depend on pillow >= 5.3.0 to mitigate python-pillow/Pillow#2926
Added:
- backup functionality for
ocrd workspace
, #204
0.11.0 - 2018-11-21
Changed:
ocrd process
interface change, #205
Fixed:
- Grayscale images can be handled by workspace correctly now, #211
0.10.0 - 2018-11-15
Added:
- OCRD-ZIP implementation based on OCR-D/spec#70, #207
- CLI:
ocrd zip
to bag, spill and validate OCRD-ZIP
Fixed:
- Adapted tests to work with bagit-based structure, OCR-D/assets#18
- Read
VERSION
constant from setup.py, #209
0.9.0 - 2018-10-30
Fixed:
- CLI:
ocrd process
, #199
Changed:
- Implement changes from spec v2.5.0: ocrd-tool must have in/output groups
0.8.8 - 2018-10-24
Fixed:
KeyError
because ocrd_tool not saved on processor instance, #192- another
KeyError
becausepnginfo
isn't set for all PNG by exiftool, #194 - support 1-bit bitonal images, #196
0.8.7 - 2018-10-23
Changed:
- Calling
workspace.save_mets()
will now save processor information in the header, #147, #191
0.8.6 - 2018-10-19
Changed:
- Update ocrd-tool schema to spec v2.4.0
Fixed:
- Calls to
click.argument
had too many arguments, #186
0.8.5 - 2018-09-26
Changed:
- Logger setup according to v2.3.0 of the spec, #185
0.8.4 - 2018-09-21
Removed:
- remove
workspace_from_folder
, #180
Changed:
- Creating METS from scratch will set creator agent and creation date, #147
Fixed:
- Schema regression in 0.8.3 (description)
0.8.3 - 2018-09-20
Changed:
- Make logging logic spec-compliant according to #173
Removed:
prefer_symlink
mechanism, #179
0.8.2 - 2018-09-04
Fixed:
- Handle missing parameters in ocrd-tool.json like empty array #160
- cli:
ocrd workspace validate
#175
0.8.1 - 2018-08-31
Removed:
merge_ocr_txt.py
script to be reborn as its own project, #170
0.8.0 - 2018-08-22
Fixed:
- file://-URL will not be "downloaded" if already present, #165
Changed:
- Logging can be configured with a file
ocrd_logging.py
in either $PWD, $HOME or /etc, #164 Processor.add_output_file
deprecated, #166
0.7.5 - 2019-08-16
Fixed:
- bashlib: invalid parsers caused cryptic error, #161
0.7.4 - 2019-08-15
Fixed:
- Sort coordinates clockwise, #159, OCR-D/assets#11
0.7.3 - 2019-08-14
Changed:
- Missing
description
for parameters in schema, #155
0.7.2 - 2019-08-01
Fixed
- Regression in XML pretty printing, #152
0.7.1 - 2019-07-25
Changed:
- Remove dependency on
xmllint
command line tool, #72, #151 - Remove dependency on
exiftool
, #71, #150
0.7.0 - 2018-07-25
Changed:
- Behavior of workspace generation from URL changes (#149)
- when cloning from a
file://
URL and no directory is given do NOT create a temporary directory but reuse the existing directory - When not providing
mets_basename
, assume the last URL path segment to be the METS basename instead of the fixed stringmets.xml
- when cloning from a
- incorporate changes to ocrd_tool schema from spec/v2.2.1
0.6.0 - 2018-07-23
Changed:
- Harmonized URL handling according to https://ocr-d.github.io/cli#urlfile-convention / spec v2.2.0
0.5.0 - 2018-07-19
Fixed:
- EXIF supports JFIF metadata, #141
- typo: repository -> workspace
Changed:
- sh wrapper: set
working_dir
always, default to$PWD
- Drop all caching, #132, #143, #126, #100, #92, #45, #85
Added:
- CLI:
--force
option forworkspace add
to replace existing ID, #134
0.4.4 - 2018-07-03
Fixes:
- CLI:
-k
onworkspace find
for non-existent fields, #133 - CLI: Persist downloads in METS, #136
- CLI:
workspace find --download
will download to subdir of fileGrp, #137
Added:
OcrdFile
has getterfileGrp
for theUSE
attribute of parentmets:fileGrp
, #139
0.4.3 - 2018-06-27
Fixed:
- bash 4.3 compat fixes, #131
ocrd workspace find
default was wrong, #130
0.4.2 - 2018-06-25
Added:
- CLI:
-k
onworkspace find
repeatable
Changed:
- Simplified bashlib
0.4.1 - 2018-06-22
Changed:
- CLI: rename
workspace create
->workspace init
and align with clone syntax
0.4.0 - 2018-06-18
Changed:
- Caching is disabled by default
Added:
- CLI:
ocrd ocrd-tool version
to show version - API:
OcrdMets
getters/setters forunique_identifier
- CLI:
ocrd workspace set-id MODS_IDENTIFIER_PURL
to set mods:identifier - CLI:
ocrd workspace get-id
to get mods:identifier - CLI:
ocrd workspace clone
syntax changed - CLI:
ocrd workspace clone/find
support downloading files - CLI:
ocrd workspace find -L
for local files - CLI:
ocrd workspace find -i ID
to search by ID
0.3.2 - 2018-06-19
Fixed:
- CLI:
cord ocrd-tool tool parse-params
validate as well as merges with default - Parameter validation: Check whether parameter is
required: true
0.3.1 - 2018-06-18
Added
- CLI support
--version
Fixed:
- CLI: validate of ocrd-tool.json didn't parse JSON, #115
Changed:
mets_file_id
in utils is nowconcat_padded
and more flexible
0.3.0 - 2018-06-18
Added
- find_files supports
ID
Changed
- Adding a file with an existing
ID
will raise an exception, #110
Fixed
- Wrapped tools support
-J
/--dump-json
GROUPID
was not passed when adding files
Removed
--output-mets
, spec v2.0.0
0.2.5 - 2018-06-18
Added
- Bash library
- CLI command
ocrd-tool
- CLI command
workspace find
- CLI command
workspace list-group
- CLI command
workspace create
- CLI
workspace
flag --no-cache - bash library to build wrappers
- CLI:
--version
global flag Changed - Downloading a file will set the URL in the METS Fixed
- run_processor will save_mets after process
0.2.4 - 2018-06-04
Fixed
GT
is a valid category for metsfileGrp@USE
Changed:
- Adapted to spec v1.2.0
0.2.3 - 2018-05-17
Fixed:
- Adapted to spec v1.1.5
- Ensure python 2.7 backwards compatibility
Changed:
- ocrd_tool must be passed to
Processor
constructor
0.2.2 - 2018-05-15
Added:
- points_from_x0y0x1y1 util
Fixed:
- EXIF tags for PNG
0.2.1 - 2018-05-08
Added
- CLI:
ocrd workspace {add,clone,pack,unpack,validate}
, #75
0.2.0 - 2018-05-08
Changed:
- Use 2018 PAGE namespace http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15, PRImA-Research-Lab/PAGE-XML#4
0.1.0 - 2018-04-26
Changed:
- ocrd_page API based on XSD
- rename/fix coordinate code
0.0.7 - 2018-04-18
Changed
- Adapted schemas to v1.1.1 of OCR-D/spec
0.0.6 - 2018-04-18
Changed
- Support parameters in run_processor
0.0.5 - 2018-04-18
Changed
- Adapted schemas to v1.1.0 of OCR-D/spec
0.0.4 - 2018-04-18
Changed
- Adapted click wrapper in decorators to v1.0.0 of OCR-D/spec
0.0.2 - 2018-04-17
Fixed
- setup.py install_requires mirrors requirements.txt
0.0.1 - 2018-04-17
Initial Release