From: chrysos349 <chrysos349@users.noreply.github.com>
To: ml@inbox.vuxu.org
Subject: Re: [PR PATCH] [Updated] tesseract-ocr: update to 5.3.3
Date: Fri, 29 Dec 2023 11:00:49 +0100 [thread overview]
Message-ID: <20231229100049.M1oztlEY2hx5crpGKvwtBUhiXgatMKQvwoql4B9tZcQ@z> (raw)
In-Reply-To: <gh-mailinglist-notifications-41a7ca26-5023-4802-975b-f1789d68868e-void-packages-46124@inbox.vuxu.org>
[-- Attachment #1: Type: text/plain, Size: 853 bytes --]
There is an updated pull request by chrysos349 against master on the void-packages repository
https://github.com/chrysos349/void-packages tesseract-ocr
https://github.com/void-linux/void-packages/pull/46124
tesseract-ocr: update to 5.3.3
@newbluemoon
`ccextractor` was updated to the latest version, because it had to be rebuilt for tesseract-ocr-5.3.2 anyway.
@Piraty
`arcan` was revbumped for tesseract-ocr-5.3.2 .
#### Testing the changes
- I tested the changes in this PR: **YES**
#### Local build testing
- I built this PR locally for my native architecture, (x86_64)
- I built this PR locally for these architectures (if supported. mark crossbuilds):
- i686
- aarch64
- armv7l
- x86_64-musl
- armv6l-musl
- aarch64-musl
A patch file from https://github.com/void-linux/void-packages/pull/46124.patch is attached
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: github-pr-tesseract-ocr-46124.patch --]
[-- Type: text/x-diff, Size: 20702 bytes --]
From 48af9e27c35aca3c9e505844155c40ba7852a800 Mon Sep 17 00:00:00 2001
From: chrysos349 <chrysostom349@gmail.com>
Date: Tue, 19 Sep 2023 04:03:59 +0300
Subject: [PATCH 1/4] leptonica: update to 1.84.0
---
common/shlibs | 2 +-
.../patches/fix-flaky-test-on-i686.patch | 70 -------------------
srcpkgs/leptonica/template | 24 +++++--
3 files changed, 20 insertions(+), 76 deletions(-)
delete mode 100644 srcpkgs/leptonica/patches/fix-flaky-test-on-i686.patch
diff --git a/common/shlibs b/common/shlibs
index c9d59ef3b97ca..950f5f3cf76aa 100644
--- a/common/shlibs
+++ b/common/shlibs
@@ -2294,7 +2294,7 @@ libOkteta3Gui.so.0 okteta-0.26.0_1
libhttp_parser.so.2.9 http-parser-2.9.0_1
libmaa.so.4 libmaa-1.4.2_1
libcodeblocks.so.0 codeblocks-13.12_1
-liblept.so.5 leptonica-1.73_1
+libleptonica.so.6 leptonica-1.84.0_1
libtesseract.so.4 tesseract-ocr-4.0.0_1
libffmpegthumbnailer.so.4 ffmpegthumbnailer-2.0.10_1
libopenraw.so.7 libopenraw-0.1.0_1
diff --git a/srcpkgs/leptonica/patches/fix-flaky-test-on-i686.patch b/srcpkgs/leptonica/patches/fix-flaky-test-on-i686.patch
deleted file mode 100644
index bec1a2482f414..0000000000000
--- a/srcpkgs/leptonica/patches/fix-flaky-test-on-i686.patch
+++ /dev/null
@@ -1,70 +0,0 @@
-From ea2bb8c9cf61d3eba2589cfaac05f59a33b4110d Mon Sep 17 00:00:00 2001
-From: danblooomberg <dan.bloomberg@gmail.com>
-Date: Sun, 14 Nov 2021 14:52:24 -0800
-Subject: [PATCH] Fix flaky hash_reg test on i686 * The sets that are generated
- from *SelectRange() functions can depend on the platform, resulting in
- intersection sizes that differ by 1. * So, loosen the comparison to allow a
- difference of 1.
-
----
- prog/hash_reg.c | 12 ++++++------
- 1 file changed, 6 insertions(+), 6 deletions(-)
-
-diff --git a/prog/hash_reg.c b/prog/hash_reg.c
-index 8b408d6d..3414ba90 100644
---- a/prog/hash_reg.c
-+++ b/prog/hash_reg.c
-@@ -100,7 +100,7 @@ L_REGPARAMS *rp;
- sarrayIntersectionByAset(sa1, sa2, &sa3);
- c1 = sarrayGetCount(sa3);
- sarrayDestroy(&sa3);
-- regTestCompareValues(rp, string_intersection, c1, 0); /* 2 */
-+ regTestCompareValues(rp, string_intersection, c1, 1); /* 2 */
- if (rp->display) lept_stderr(" aset: intersection size = %d\n", c1);
- sarrayUnionByAset(sa1, sa2, &sa3);
- c1 = sarrayGetCount(sa3);
-@@ -123,7 +123,7 @@ L_REGPARAMS *rp;
- sarrayIntersectionByHmap(sa1, sa2, &sa3);
- c1 = sarrayGetCount(sa3);
- sarrayDestroy(&sa3);
-- regTestCompareValues(rp, string_intersection, c1, 0); /* 6 */
-+ regTestCompareValues(rp, string_intersection, c1, 1); /* 6 */
- if (rp->display) lept_stderr(" hmap: intersection size = %d\n", c1);
- sarrayUnionByHmap(sa1, sa2, &sa3);
- c1 = sarrayGetCount(sa3);
-@@ -160,7 +160,7 @@ L_REGPARAMS *rp;
- ptaIntersectionByAset(pta1, pta2, &pta3);
- c1 = ptaGetCount(pta3);
- ptaDestroy(&pta3);
-- regTestCompareValues(rp, pta_intersection, c1, 0); /* 10 */
-+ regTestCompareValues(rp, pta_intersection, c1, 1); /* 10 */
- if (rp->display) lept_stderr(" aset: intersection size = %d\n", c1);
- ptaUnionByAset(pta1, pta2, &pta3);
- c1 = ptaGetCount(pta3);
-@@ -182,7 +182,7 @@ L_REGPARAMS *rp;
- ptaIntersectionByHmap(pta1, pta2, &pta3);
- c1 = ptaGetCount(pta3);
- ptaDestroy(&pta3);
-- regTestCompareValues(rp, pta_intersection, c1, 0); /* 14 */
-+ regTestCompareValues(rp, pta_intersection, c1, 1); /* 14 */
- if (rp->display) lept_stderr(" hmap: intersection size = %d\n", c1);
- ptaUnionByHmap(pta1, pta2, &pta3);
- c1 = ptaGetCount(pta3);
-@@ -220,7 +220,7 @@ L_REGPARAMS *rp;
- l_dnaIntersectionByAset(da1, da2, &da3);
- c1 = l_dnaGetCount(da3);
- l_dnaDestroy(&da3);
-- regTestCompareValues(rp, da_intersection, c1, 0); /* 18 */
-+ regTestCompareValues(rp, da_intersection, c1, 1); /* 18 */
- if (rp->display) lept_stderr(" aset: intersection size = %d\n", c1);
- l_dnaUnionByAset(da1, da2, &da3);
- c1 = l_dnaGetCount(da3);
-@@ -242,7 +242,7 @@ L_REGPARAMS *rp;
- l_dnaIntersectionByHmap(da1, da2, &da3);
- c1 = l_dnaGetCount(da3);
- l_dnaDestroy(&da3);
-- regTestCompareValues(rp, da_intersection, c1, 0); /* 22 */
-+ regTestCompareValues(rp, da_intersection, c1, 1); /* 22 */
- if (rp->display) lept_stderr(" hmap: intersection size = %d\n", c1);
- l_dnaUnionByHmap(da1, da2, &da3);
- c1 = l_dnaGetCount(da3);
diff --git a/srcpkgs/leptonica/template b/srcpkgs/leptonica/template
index 17256b7b157b4..f2c5766415c56 100644
--- a/srcpkgs/leptonica/template
+++ b/srcpkgs/leptonica/template
@@ -1,9 +1,9 @@
# Template file for 'leptonica'
pkgname=leptonica
-version=1.82.0
-revision=2
+version=1.84.0
+revision=1
build_style=gnu-configure
-hostmakedepends="pkg-config"
+hostmakedepends="pkg-config automake libtool"
makedepends="libopenjpeg2-devel libwebp-devel"
checkdepends="which gnuplot"
short_desc="Image processing and analysis library"
@@ -11,8 +11,21 @@ maintainer="Orphaned <orphan@voidlinux.org>"
license="BSD-2-Clause"
homepage="http://leptonica.org/"
changelog="http://leptonica.org/source/version-notes.html"
-distfiles="http://leptonica.org/source/${pkgname}-${version}.tar.gz"
-checksum=155302ee914668c27b6fe3ca9ff2da63b245f6d62f3061c8f27563774b8ae2d6
+distfiles="https://github.com/DanBloomberg/leptonica/archive/${version}.tar.gz"
+checksum=440e6bb1b11e385310b31fab2505c9b0e0835a42f2fc985c2f79c81a8684ff98
+
+pre_check() {
+ # disable failing tests
+ vsed -i prog/Makefile.am \
+ -e "s/boxa3_reg//" \
+ -e "s/projection_reg//" \
+ -e "s/rankhisto_reg//" \
+ -e "s/rankbin_reg//"
+}
+
+pre_configure() {
+ ./autogen.sh
+}
post_install() {
vdoc moller52.jpg
@@ -28,6 +41,7 @@ leptonica-devel_package() {
vmove usr/lib/cmake
vmove usr/lib/pkgconfig
vmove "usr/lib/*.so"
+ vmove "usr/lib/*.a"
vdoc style-guide.txt
}
}
From badf01d4cecdcfc8273a526d9b46c833932ac756 Mon Sep 17 00:00:00 2001
From: chrysos349 <chrysostom349@gmail.com>
Date: Tue, 19 Sep 2023 04:07:50 +0300
Subject: [PATCH 2/4] tesseract-ocr: update to 5.3.3
---
common/shlibs | 2 +-
.../{tesseract-ocr-kur => tesseract-ocr-kmr} | 0
srcpkgs/tesseract-ocr-kur_ara | 1 -
srcpkgs/tesseract-ocr/files/COPYING | 14 ------
.../tesseract-ocr/patches/disable-neon.patch | 14 ++++++
.../tesseract-ocr/patches/musl-sys-time.patch | 17 +++----
srcpkgs/tesseract-ocr/template | 45 +++++++------------
7 files changed, 41 insertions(+), 52 deletions(-)
rename srcpkgs/{tesseract-ocr-kur => tesseract-ocr-kmr} (100%)
delete mode 120000 srcpkgs/tesseract-ocr-kur_ara
delete mode 100644 srcpkgs/tesseract-ocr/files/COPYING
create mode 100644 srcpkgs/tesseract-ocr/patches/disable-neon.patch
diff --git a/common/shlibs b/common/shlibs
index 950f5f3cf76aa..1de39e0bfa84c 100644
--- a/common/shlibs
+++ b/common/shlibs
@@ -2295,7 +2295,7 @@ libhttp_parser.so.2.9 http-parser-2.9.0_1
libmaa.so.4 libmaa-1.4.2_1
libcodeblocks.so.0 codeblocks-13.12_1
libleptonica.so.6 leptonica-1.84.0_1
-libtesseract.so.4 tesseract-ocr-4.0.0_1
+libtesseract.so.5 tesseract-ocr-5.3.3_1
libffmpegthumbnailer.so.4 ffmpegthumbnailer-2.0.10_1
libopenraw.so.7 libopenraw-0.1.0_1
libopenrawgnome.so.7 libopenraw-0.1.0_1
diff --git a/srcpkgs/tesseract-ocr-kur b/srcpkgs/tesseract-ocr-kmr
similarity index 100%
rename from srcpkgs/tesseract-ocr-kur
rename to srcpkgs/tesseract-ocr-kmr
diff --git a/srcpkgs/tesseract-ocr-kur_ara b/srcpkgs/tesseract-ocr-kur_ara
deleted file mode 120000
index 79bcf15f05ba5..0000000000000
--- a/srcpkgs/tesseract-ocr-kur_ara
+++ /dev/null
@@ -1 +0,0 @@
-tesseract-ocr
\ No newline at end of file
diff --git a/srcpkgs/tesseract-ocr/files/COPYING b/srcpkgs/tesseract-ocr/files/COPYING
deleted file mode 100644
index 11e05af425fc8..0000000000000
--- a/srcpkgs/tesseract-ocr/files/COPYING
+++ /dev/null
@@ -1,14 +0,0 @@
-This repository contains language data for Tesseract Open Source
-OCR Engine. All data in the repository are licensed under the Apache
-License:
-
-** Licensed under the Apache License, Version 2.0 (the "License");
-** you may not use this file except in compliance with the License.
-** You may obtain a copy of the License at
-** http://www.apache.org/licenses/LICENSE-2.0
-** Unless required by applicable law or agreed to in writing, software
-** distributed under the License is distributed on an "AS IS" BASIS,
-** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-** See the License for the specific language governing permissions and
-** limitations under the License.
-
diff --git a/srcpkgs/tesseract-ocr/patches/disable-neon.patch b/srcpkgs/tesseract-ocr/patches/disable-neon.patch
new file mode 100644
index 0000000000000..d491ef1e47b81
--- /dev/null
+++ b/srcpkgs/tesseract-ocr/patches/disable-neon.patch
@@ -0,0 +1,14 @@
+--- a/configure.ac
++++ b/configure.ac
+@@ -177,6 +177,11 @@
+ AC_DEFINE([HAVE_NEON], [1], [Enable NEON instructions])
+ ;;
+
++ arm|armv7l)
++
++ AC_MSG_WARN([No compiler options for $host_cpu])
++ ;;
++
+ arm*)
+
+ AX_CHECK_COMPILE_FLAG([-mfpu=neon], [neon=true], [neon=false], [$WERROR])
diff --git a/srcpkgs/tesseract-ocr/patches/musl-sys-time.patch b/srcpkgs/tesseract-ocr/patches/musl-sys-time.patch
index 9c6337d188639..5c75864248fe8 100644
--- a/srcpkgs/tesseract-ocr/patches/musl-sys-time.patch
+++ b/srcpkgs/tesseract-ocr/patches/musl-sys-time.patch
@@ -1,12 +1,13 @@
---- a/src/ccutil/ocrclass.h 2019-07-07 14:34:08.000000000 +0200
-+++ b/src/ccutil/ocrclass.h 2019-07-08 10:47:15.347415888 +0200
-@@ -31,6 +31,9 @@
- #ifdef _WIN32
- #include <winsock2.h> // for timeval
- #endif
+--- a/include/tesseract/ocrclass.h
++++ b/include/tesseract/ocrclass.h
+@@ -29,6 +29,10 @@
+
+ #include <chrono>
+ #include <ctime>
+#ifndef __GLIBC__
+#include <sys/time.h>
+#endif
++
+
+ namespace tesseract {
- /**********************************************************************
- * EANYCODE_CHAR
diff --git a/srcpkgs/tesseract-ocr/template b/srcpkgs/tesseract-ocr/template
index de6df3a768d31..49b4045888324 100644
--- a/srcpkgs/tesseract-ocr/template
+++ b/srcpkgs/tesseract-ocr/template
@@ -1,14 +1,15 @@
# Template file for 'tesseract-ocr'
pkgname=tesseract-ocr
-version=4.1.1
-revision=9
-_tessdataver=4.0.0
+version=5.3.3
+revision=1
+_tessdataver=4.1.0
create_wrksrc=yes
build_style=gnu-configure
configure_args="LIBLEPT_HEADERSDIR=${XBPS_CROSS_BASE}/usr/include $(vopt_enable openmp)"
make_build_args="all training"
hostmakedepends="automake libtool pkg-config leptonica libxslt asciidoc"
-makedepends="cairo-devel pango-devel leptonica-devel $(vopt_if openmp libgomp-devel) icu-devel"
+makedepends="cairo-devel pango-devel leptonica-devel $(vopt_if openmp libgomp-devel) icu-devel
+ libarchive-devel libcurl-devel"
short_desc="Tesseract Open Source OCR engine"
maintainer="Orphaned <orphan@voidlinux.org>"
license="Apache-2.0"
@@ -16,13 +17,15 @@ homepage="https://github.com/tesseract-ocr/tesseract"
distfiles="
https://github.com/tesseract-ocr/tesseract/archive/${version}.tar.gz>${pkgname}-${version}.tar.gz
https://github.com/tesseract-ocr/tessdata/archive/${_tessdataver}.tar.gz>tessdata-${_tessdataver}.tar.gz"
-checksum="2a66ff0d8595bff8f04032165e6c936389b1e5727c3ce5a27b3e059d218db1cb
- 38c637d3a1763f6c3d32e8f1d979f045668676ec5feb8ee1869ee77cedd31b08"
+checksum="dc4329f85f41191b2d813b71b528ba6047745813474e583ccce8795ff2ff5681
+ 990fffb9b7a9b52dc9a2d053a9ef6852ca2b72bd8dfb22988b0b990a700fd3c7"
build_options="openmp"
build_options_default="openmp"
desc_option_openmp="Enable Open MP (gomp)"
+disable_parallel_build=yes # fails to build otherwise
+
# Create a package for one specific language $1
pkg_lang() {
local f script lang=$1
@@ -46,8 +49,8 @@ pkg_lang() {
post_extract() {
mv tesseract-${version}/* .
+ rm -rf tessdata-${_tessdataver}/{tessconfigs,configs,pdf.ttf}
mv tessdata-${_tessdataver}/* ${wrksrc}/tessdata
- rmdir tessdata-${_tessdataver}
}
pre_configure() {
NOCONFIGURE=1 ./autogen.sh
@@ -62,7 +65,6 @@ post_install() {
mv ${DESTDIR}/usr/share/man/man1/tesseract{,-ocr}.1
vdoc ChangeLog
vdoc README.md
- vlicense ${FILESDIR}/COPYING LICENSE-tessdata
# Move the pseudo languges "equ" (math / equation detection) and
# "osd" (orientation and script detection) to the main package
for lang in equ osd; do
@@ -79,13 +81,6 @@ tesseract-ocr-tools_package() {
vmkdir usr/share/tesseract
vmkdir usr/share/man/man1
vmkdir usr/share/man/man5
- # Copy shell scripts
- for f in language-specific.sh tesstrain.sh tesstrain_utils.sh; do
- if [ -e ${wrksrc}/training/${f} ]; then
- cp -a ${wrksrc}/training/${f} \
- ${PKGDESTDIR}/usr/share/tesseract
- fi
- done
# Move tool manual pages
for f in ambiguous_words cntraining combine_tessdata \
dawg2wordlist mftraining shapeclustering unicharambigs \
@@ -99,7 +94,8 @@ tesseract-ocr-tools_package() {
}
}
tesseract-ocr-devel_package() {
- depends="${sourcepkg}>=${version}_${revision}"
+ depends="${sourcepkg}>=${version}_${revision} leptonica-devel
+ libarchive-devel libcurl-devel"
short_desc+=" - development files"
pkg_install() {
vmove usr/include/tesseract
@@ -129,7 +125,7 @@ tesseract-ocr-all_package() {
for lang in afr amh ara asm aze aze_cyrl bel ben bod bos bre bul cat ceb \
ces chi_sim chi_tra chr cos cym dan deu div dzo ell eng enm epo est eus fao \
fas fil fin fra frk frm fry gla gle glg grc guj hat heb hin hrv hun hye iku ind isl ita \
- ita_old jav jpn kan kat kat_old kaz khm kir kor kur kur_ara lao lat lav lit ltz mal mar \
+ ita_old jav jpn kan kat kat_old kaz khm kir kmr kor lao lat lav lit ltz mal mar \
mkd mlt mon mri msa mya nep nld nor oci ori pan pol por que pus ron rus san sin slk slv \
snd spa spa_old sqi srp srp_latn sun swa swe syr tam tat tel tgk tgl tha tir ton tur \
uig ukr urd uzb uzb_cyrl vie yid yor \
@@ -576,23 +572,16 @@ tesseract-ocr-kir_package() {
$(pkg_lang ${pkgname#tesseract-ocr-})
}
}
-tesseract-ocr-kor_package() {
- depends="${sourcepkg}>=${version}_${revision}"
- short_desc+=" - Korean language data"
- pkg_install() {
- $(pkg_lang ${pkgname#tesseract-ocr-})
- }
-}
-tesseract-ocr-kur_package() {
+tesseract-ocr-kmr_package() {
depends="${sourcepkg}>=${version}_${revision}"
- short_desc+=" - Kurdish language data"
+ short_desc+=" - Kurmanji (Kurdish - Latin Script) language data"
pkg_install() {
$(pkg_lang ${pkgname#tesseract-ocr-})
}
}
-tesseract-ocr-kur_ara_package() {
+tesseract-ocr-kor_package() {
depends="${sourcepkg}>=${version}_${revision}"
- short_desc+=" - Kurdish (Arabic) language data"
+ short_desc+=" - Korean language data"
pkg_install() {
$(pkg_lang ${pkgname#tesseract-ocr-})
}
From 060437b926fdb0d61fffae99583263479115bde1 Mon Sep 17 00:00:00 2001
From: chrysos349 <chrysostom349@gmail.com>
Date: Tue, 19 Sep 2023 04:09:24 +0300
Subject: [PATCH 3/4] arcan: revbump for tesseract-5.3.3
---
srcpkgs/arcan/template | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/srcpkgs/arcan/template b/srcpkgs/arcan/template
index efd0afe10576d..ff9091f90ebb1 100644
--- a/srcpkgs/arcan/template
+++ b/srcpkgs/arcan/template
@@ -2,7 +2,7 @@
# !! keep synced with: acfgfs aclip aloadimage
pkgname=arcan
version=0.6.2.1
-revision=1
+revision=2
create_wrksrc=yes
build_wrksrc=arcan/src
build_style=cmake
@@ -27,7 +27,7 @@ homepage="https://arcan-fe.com/"
_versionOpenal=0.5.4
distfiles="https://github.com/letoram/arcan/archive/${version}.tar.gz
https://github.com/letoram/openal/archive/${_versionOpenal}.tar.gz>openal_arcan.${_versionOpenal}.tar.gz"
-checksum="7bf083412bc61555472877313c13116431a0a36fccbf142f97559db43b4a1475
+checksum="30900dd80dfa272e6cc3343d50e9d2748eb06d97c78a8e87a743abd475638deb
3a50a87c05b67c466a868cc77f8dc7f9cfc9466aeeafcd823daca0d108c504da"
export CMAKE_GENERATOR="Unix Makefiles"
From b255f2a51877fd0f3691828e14f0b21ac4b65139 Mon Sep 17 00:00:00 2001
From: chrysos349 <chrysostom349@gmail.com>
Date: Tue, 19 Sep 2023 04:11:27 +0300
Subject: [PATCH 4/4] ccextractor: revbump for tesseract-5.3.3
---
srcpkgs/ccextractor/patches/fix-ocr.patch | 106 ++++++++++++++++++++++
srcpkgs/ccextractor/template | 7 +-
2 files changed, 112 insertions(+), 1 deletion(-)
create mode 100644 srcpkgs/ccextractor/patches/fix-ocr.patch
diff --git a/srcpkgs/ccextractor/patches/fix-ocr.patch b/srcpkgs/ccextractor/patches/fix-ocr.patch
new file mode 100644
index 0000000000000..2681c60aa414e
--- /dev/null
+++ b/srcpkgs/ccextractor/patches/fix-ocr.patch
@@ -0,0 +1,106 @@
+--- a/src/lib_ccx/hardsubx.c
++++ b/src/lib_ccx/hardsubx.c
+@@ -221,7 +221,7 @@
+ char *pars_values = strdup("/dev/null");
+ char *tessdata_path = NULL;
+
+- char *lang = options->ocrlang;
++ char *lang = (char *)options->ocrlang;
+ if (!lang)
+ lang = "eng"; // English is default language
+
+@@ -245,7 +245,7 @@
+
+ int ret = -1;
+
+- if (!strncmp("4.", TessVersion(), 2))
++ if (!strncmp("4.", TessVersion(), 2) || !strncmp("5.", TessVersion(), 2))
+ {
+ char tess_path[1024];
+ if (ccx_options.ocr_oem < 0)
+--- a/src/lib_ccx/ocr.c
++++ b/src/lib_ccx/ocr.c
+@@ -97,36 +97,22 @@
+ char *probe_tessdata_location(const char *lang)
+ {
+ int ret = 0;
+- char *tessdata_dir_path = getenv("TESSDATA_PREFIX");
+
+- ret = search_language_pack(tessdata_dir_path, lang);
+- if (!ret)
+- return tessdata_dir_path;
+-
+- tessdata_dir_path = "./";
+- ret = search_language_pack(tessdata_dir_path, lang);
+- if (!ret)
+- return tessdata_dir_path;
+-
+- tessdata_dir_path = "/usr/share/";
+- ret = search_language_pack(tessdata_dir_path, lang);
+- if (!ret)
+- return tessdata_dir_path;
+-
+- tessdata_dir_path = "/usr/local/share/";
+- ret = search_language_pack(tessdata_dir_path, lang);
+- if (!ret)
+- return tessdata_dir_path;
+-
+- tessdata_dir_path = "/usr/share/tesseract-ocr/";
+- ret = search_language_pack(tessdata_dir_path, lang);
+- if (!ret)
+- return tessdata_dir_path;
+-
+- tessdata_dir_path = "/usr/share/tesseract-ocr/4.00/";
+- ret = search_language_pack(tessdata_dir_path, lang);
+- if (!ret)
+- return tessdata_dir_path;
++ const char *paths[] = {
++ getenv("TESSDATA_PREFIX"),
++ "./",
++ "/usr/share/",
++ "/usr/local/share/",
++ "/usr/share/tesseract-ocr/",
++ "/usr/share/tesseract-ocr/4.00/",
++ "/usr/share/tesseract-ocr/5/",
++ "/usr/share/tesseract/"};
++
++ for (int i = 0; i < sizeof(paths) / sizeof(paths[0]); i++)
++ {
++ if (!search_language_pack(paths[i], lang))
++ return (char *)paths[i];
++ }
+
+ return NULL;
+ }
+@@ -174,7 +160,7 @@
+ char *pars_values = strdup("tess.log");
+
+ ctx->api = TessBaseAPICreate();
+- if (!strncmp("4.", TessVersion(), 2))
++ if (!strncmp("4.", TessVersion(), 2) || !strncmp("5.", TessVersion(), 2))
+ {
+ char tess_path[1024];
+ snprintf(tess_path, 1024, "%s%s%s", tessdata_path, "/", "tessdata");
+@@ -331,6 +317,11 @@
+ }
+
+ BOX *crop_points = ignore_alpha_at_edge(copy->alpha, copy->data, w, h, color_pix, &color_pix_out);
++
++ l_int32 x, y, _w, _h;
++
++ boxGetGeometry(crop_points, &x, &y, &_w, &_h);
++
+ // Converting image to grayscale for OCR to avoid issues with transparency
+ cpix_gs = pixConvertRGBToGray(cpix, 0.0, 0.0, 0.0);
+
+@@ -426,8 +417,8 @@
+ {
+ for (int j = x1; j <= x2; j++)
+ {
+- if (copy->data[(crop_points->y + i) * w + (crop_points->x + j)] != firstpixel)
+- histogram[copy->data[(crop_points->y + i) * w + (crop_points->x + j)]]++;
++ if (copy->data[(y + i) * w + (x + j)] != firstpixel)
++ histogram[copy->data[(y + i) * w + (x + j)]]++;
+ }
+ }
+ /* sorted in increasing order of intensity */
diff --git a/srcpkgs/ccextractor/template b/srcpkgs/ccextractor/template
index 9abcd82852b27..64e57a2e4afc9 100644
--- a/srcpkgs/ccextractor/template
+++ b/srcpkgs/ccextractor/template
@@ -1,7 +1,7 @@
# Template file for 'ccextractor'
pkgname=ccextractor
version=0.93
-revision=1
+revision=2
build_wrksrc="linux"
build_style=gnu-configure
configure_args="--enable-ocr --enable-hardsubx"
@@ -16,7 +16,12 @@ distfiles="https://github.com/CCExtractor/${pkgname}/archive/v${version}.tar.gz"
checksum=0e66d3e360db1b02a88271af11313ca4c9bbda1b03728e264a44c4c9f77192e3
CFLAGS="-I${XBPS_CROSS_BASE}/usr/include/tesseract -DPNG_POWERPC_VSX_OPT=0 -fcommon"
+if [ "$CROSS_BUILD" ]; then
+ hostmakedepends+=" tesseract-ocr-devel"
+fi
+
pre_configure() {
+ ln -sf libleptonica.so ${XBPS_CROSS_BASE}/usr/lib/liblept.so
sed -i -e "s/tesseract --version/tesseract-ocr --version/g" configure.ac
./autogen.sh
}
next prev parent reply other threads:[~2023-12-29 10:00 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-19 1:19 [PR PATCH] tesseract-ocr: update to 5.3.2 chrysos349
2023-09-19 1:42 ` chrysos349
2023-09-19 3:42 ` [PR PATCH] [Updated] " chrysos349
2023-09-19 3:49 ` chrysos349
2023-09-19 4:08 ` newbluemoon
2023-09-19 14:41 ` chrysos349
2023-09-19 15:18 ` newbluemoon
2023-12-19 1:46 ` github-actions
2023-12-27 0:04 ` chrysos349
2023-12-27 18:24 ` Piraty
2023-12-28 0:45 ` [PR REVIEW] " Piraty
2023-12-29 2:19 ` [PR PATCH] [Updated] " chrysos349
2023-12-29 2:19 ` [PR REVIEW] " chrysos349
2023-12-29 2:21 ` chrysos349
2023-12-29 10:00 ` chrysos349 [this message]
2023-12-31 1:10 ` [PR REVIEW] tesseract-ocr: update to 5.3.3 Piraty
2023-12-31 1:12 ` Piraty
2023-12-31 3:17 ` [PR PATCH] [Updated] " chrysos349
2023-12-31 3:18 ` [PR REVIEW] " chrysos349
2024-01-07 22:25 ` [PR PATCH] [Merged]: " Piraty
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231229100049.M1oztlEY2hx5crpGKvwtBUhiXgatMKQvwoql4B9tZcQ@z \
--to=chrysos349@users.noreply.github.com \
--cc=ml@inbox.vuxu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).