Github messages for voidlinux
From: chrysos349 <chrysos349@users.noreply.github.com>
To: ml@inbox.vuxu.org
Subject: Re: [PR PATCH] [Updated] tesseract-ocr: update to 5.3.3
Date: Fri, 29 Dec 2023 11:00:49 +0100
Message-ID: <20231229100049.M1oztlEY2hx5crpGKvwtBUhiXgatMKQvwoql4B9tZcQ@z>
In-Reply-To: <gh-mailinglist-notifications-41a7ca26-5023-4802-975b-f1789d68868e-void-packages-46124@inbox.vuxu.org>

[-- Attachment #1: Type: text/plain, Size: 853 bytes --]

There is an updated pull request by chrysos349 against master on the void-packages repository

https://github.com/chrysos349/void-packages tesseract-ocr
https://github.com/void-linux/void-packages/pull/46124

tesseract-ocr: update to 5.3.3
@newbluemoon
`ccextractor` was updated to the latest version, because it had to be rebuilt for tesseract-ocr-5.3.2 anyway.

@Piraty 
`arcan` was revbumped for tesseract-ocr-5.3.2.
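
For readers unfamiliar with the term, a revbump only increments the `revision=` line of a template so that the package is rebuilt against the new library. A minimal sketch of that edit follows; `bump_revision` is a hypothetical helper written for illustration, not part of xbps-src or of this PR, and it assumes the template has a single line of the form `revision=N`.

```shell
#!/bin/sh
# Hypothetical helper (not an xbps-src command): read a template on stdin
# and print it with the "revision=N" line changed to "revision=N+1".
bump_revision() {
	awk -F= '
		/^revision=[0-9]+$/ { print "revision=" ($2 + 1); next }
		{ print }
	'
}

# Example input shaped like the top of the arcan template in this PR.
printf 'pkgname=arcan\nversion=0.6.2.1\nrevision=1\n' | bump_revision
```

All other lines pass through unchanged, so the output differs from the input only in the revision number.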

#### Testing the changes
- I tested the changes in this PR: **YES**

#### Local build testing
- I built this PR locally for my native architecture (x86_64)
- I built this PR locally for these architectures (if supported; mark crossbuilds):
  - i686
  - aarch64
  - armv7l
  - x86_64-musl
  - armv6l-musl
  - aarch64-musl
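
The list above does not say which builds were cross-builds or how they were invoked. As a rough sketch, assuming the usual `./xbps-src -a <arch> pkg <pkgname>` form for cross targets, the ARM builds could be driven as below. `print_cross_builds` is a hypothetical helper that only prints the commands (a dry run), and the choice of targets is an assumption.

```shell
#!/bin/sh
# Dry run: print one xbps-src invocation per cross target; nothing is built.
# Assumption: cross targets are selected with `-a <arch>`.
print_cross_builds() {
	pkg=$1
	shift
	for arch in "$@"; do
		printf './xbps-src -a %s pkg %s\n' "$arch" "$pkg"
	done
}

print_cross_builds tesseract-ocr aarch64 armv7l armv6l-musl aarch64-musl
```

On an x86_64 host, i686 and x86_64-musl are commonly built in a separate native masterdir rather than cross-compiled, so they are left out of this sketch.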

A patch file from https://github.com/void-linux/void-packages/pull/46124.patch is attached

[-- Attachment #2: github-pr-tesseract-ocr-46124.patch --]
[-- Type: text/x-diff, Size: 20702 bytes --]

From 48af9e27c35aca3c9e505844155c40ba7852a800 Mon Sep 17 00:00:00 2001
From: chrysos349 <chrysostom349@gmail.com>
Date: Tue, 19 Sep 2023 04:03:59 +0300
Subject: [PATCH 1/4] leptonica: update to 1.84.0

---
 common/shlibs                                 |  2 +-
 .../patches/fix-flaky-test-on-i686.patch      | 70 -------------------
 srcpkgs/leptonica/template                    | 24 +++++--
 3 files changed, 20 insertions(+), 76 deletions(-)
 delete mode 100644 srcpkgs/leptonica/patches/fix-flaky-test-on-i686.patch

diff --git a/common/shlibs b/common/shlibs
index c9d59ef3b97ca..950f5f3cf76aa 100644
--- a/common/shlibs
+++ b/common/shlibs
@@ -2294,7 +2294,7 @@ libOkteta3Gui.so.0 okteta-0.26.0_1
 libhttp_parser.so.2.9 http-parser-2.9.0_1
 libmaa.so.4 libmaa-1.4.2_1
 libcodeblocks.so.0 codeblocks-13.12_1
-liblept.so.5 leptonica-1.73_1
+libleptonica.so.6 leptonica-1.84.0_1
 libtesseract.so.4 tesseract-ocr-4.0.0_1
 libffmpegthumbnailer.so.4 ffmpegthumbnailer-2.0.10_1
 libopenraw.so.7 libopenraw-0.1.0_1
diff --git a/srcpkgs/leptonica/patches/fix-flaky-test-on-i686.patch b/srcpkgs/leptonica/patches/fix-flaky-test-on-i686.patch
deleted file mode 100644
index bec1a2482f414..0000000000000
--- a/srcpkgs/leptonica/patches/fix-flaky-test-on-i686.patch
+++ /dev/null
@@ -1,70 +0,0 @@
-From ea2bb8c9cf61d3eba2589cfaac05f59a33b4110d Mon Sep 17 00:00:00 2001
-From: danblooomberg <dan.bloomberg@gmail.com>
-Date: Sun, 14 Nov 2021 14:52:24 -0800
-Subject: [PATCH] Fix flaky hash_reg test on i686 * The sets that are generated
- from *SelectRange() functions can depend on   the platform, resulting in
- intersection sizes that differ by 1. * So, loosen the comparison to allow a
- difference of 1.
-
----
- prog/hash_reg.c | 12 ++++++------
- 1 file changed, 6 insertions(+), 6 deletions(-)
-
-diff --git a/prog/hash_reg.c b/prog/hash_reg.c
-index 8b408d6d..3414ba90 100644
---- a/prog/hash_reg.c
-+++ b/prog/hash_reg.c
-@@ -100,7 +100,7 @@ L_REGPARAMS  *rp;
-     sarrayIntersectionByAset(sa1, sa2, &sa3);
-     c1 = sarrayGetCount(sa3);
-     sarrayDestroy(&sa3);
--    regTestCompareValues(rp, string_intersection, c1, 0);  /* 2 */
-+    regTestCompareValues(rp, string_intersection, c1, 1);  /* 2 */
-     if (rp->display) lept_stderr("  aset: intersection size = %d\n", c1);
-     sarrayUnionByAset(sa1, sa2, &sa3);
-     c1 = sarrayGetCount(sa3);
-@@ -123,7 +123,7 @@ L_REGPARAMS  *rp;
-     sarrayIntersectionByHmap(sa1, sa2, &sa3);
-     c1 = sarrayGetCount(sa3);
-     sarrayDestroy(&sa3);
--    regTestCompareValues(rp, string_intersection, c1, 0);  /* 6 */
-+    regTestCompareValues(rp, string_intersection, c1, 1);  /* 6 */
-     if (rp->display) lept_stderr("  hmap: intersection size = %d\n", c1);
-     sarrayUnionByHmap(sa1, sa2, &sa3);
-     c1 = sarrayGetCount(sa3);
-@@ -160,7 +160,7 @@ L_REGPARAMS  *rp;
-     ptaIntersectionByAset(pta1, pta2, &pta3);
-     c1 = ptaGetCount(pta3);
-     ptaDestroy(&pta3);
--    regTestCompareValues(rp, pta_intersection, c1, 0);  /* 10 */
-+    regTestCompareValues(rp, pta_intersection, c1, 1);  /* 10 */
-     if (rp->display) lept_stderr("  aset: intersection size = %d\n", c1);
-     ptaUnionByAset(pta1, pta2, &pta3);
-     c1 = ptaGetCount(pta3);
-@@ -182,7 +182,7 @@ L_REGPARAMS  *rp;
-     ptaIntersectionByHmap(pta1, pta2, &pta3);
-     c1 = ptaGetCount(pta3);
-     ptaDestroy(&pta3);
--    regTestCompareValues(rp, pta_intersection, c1, 0);  /* 14 */
-+    regTestCompareValues(rp, pta_intersection, c1, 1);  /* 14 */
-     if (rp->display) lept_stderr("  hmap: intersection size = %d\n", c1);
-     ptaUnionByHmap(pta1, pta2, &pta3);
-     c1 = ptaGetCount(pta3);
-@@ -220,7 +220,7 @@ L_REGPARAMS  *rp;
-     l_dnaIntersectionByAset(da1, da2, &da3);
-     c1 = l_dnaGetCount(da3);
-     l_dnaDestroy(&da3);
--    regTestCompareValues(rp, da_intersection, c1, 0);  /* 18 */
-+    regTestCompareValues(rp, da_intersection, c1, 1);  /* 18 */
-     if (rp->display) lept_stderr("  aset: intersection size = %d\n", c1);
-     l_dnaUnionByAset(da1, da2, &da3);
-     c1 = l_dnaGetCount(da3);
-@@ -242,7 +242,7 @@ L_REGPARAMS  *rp;
-     l_dnaIntersectionByHmap(da1, da2, &da3);
-     c1 = l_dnaGetCount(da3);
-     l_dnaDestroy(&da3);
--    regTestCompareValues(rp, da_intersection, c1, 0);  /* 22 */
-+    regTestCompareValues(rp, da_intersection, c1, 1);  /* 22 */
-     if (rp->display) lept_stderr("  hmap: intersection size = %d\n", c1);
-     l_dnaUnionByHmap(da1, da2, &da3);
-     c1 = l_dnaGetCount(da3);
diff --git a/srcpkgs/leptonica/template b/srcpkgs/leptonica/template
index 17256b7b157b4..f2c5766415c56 100644
--- a/srcpkgs/leptonica/template
+++ b/srcpkgs/leptonica/template
@@ -1,9 +1,9 @@
 # Template file for 'leptonica'
 pkgname=leptonica
-version=1.82.0
-revision=2
+version=1.84.0
+revision=1
 build_style=gnu-configure
-hostmakedepends="pkg-config"
+hostmakedepends="pkg-config automake libtool"
 makedepends="libopenjpeg2-devel libwebp-devel"
 checkdepends="which gnuplot"
 short_desc="Image processing and analysis library"
@@ -11,8 +11,21 @@ maintainer="Orphaned <orphan@voidlinux.org>"
 license="BSD-2-Clause"
 homepage="http://leptonica.org/"
 changelog="http://leptonica.org/source/version-notes.html"
-distfiles="http://leptonica.org/source/${pkgname}-${version}.tar.gz"
-checksum=155302ee914668c27b6fe3ca9ff2da63b245f6d62f3061c8f27563774b8ae2d6
+distfiles="https://github.com/DanBloomberg/leptonica/archive/${version}.tar.gz"
+checksum=440e6bb1b11e385310b31fab2505c9b0e0835a42f2fc985c2f79c81a8684ff98
+
+pre_check() {
+	# disable failing tests
+	vsed -i prog/Makefile.am \
+		-e "s/boxa3_reg//" \
+		-e "s/projection_reg//" \
+		-e "s/rankhisto_reg//" \
+		-e "s/rankbin_reg//"
+}
+
+pre_configure() {
+	./autogen.sh
+}
 
 post_install() {
 	vdoc moller52.jpg
@@ -28,6 +41,7 @@ leptonica-devel_package() {
 		vmove usr/lib/cmake
 		vmove usr/lib/pkgconfig
 		vmove "usr/lib/*.so"
+		vmove "usr/lib/*.a"
 		vdoc style-guide.txt
 	}
 }

From badf01d4cecdcfc8273a526d9b46c833932ac756 Mon Sep 17 00:00:00 2001
From: chrysos349 <chrysostom349@gmail.com>
Date: Tue, 19 Sep 2023 04:07:50 +0300
Subject: [PATCH 2/4] tesseract-ocr: update to 5.3.3

---
 common/shlibs                                 |  2 +-
 .../{tesseract-ocr-kur => tesseract-ocr-kmr}  |  0
 srcpkgs/tesseract-ocr-kur_ara                 |  1 -
 srcpkgs/tesseract-ocr/files/COPYING           | 14 ------
 .../tesseract-ocr/patches/disable-neon.patch  | 14 ++++++
 .../tesseract-ocr/patches/musl-sys-time.patch | 17 +++----
 srcpkgs/tesseract-ocr/template                | 45 +++++++------------
 7 files changed, 41 insertions(+), 52 deletions(-)
 rename srcpkgs/{tesseract-ocr-kur => tesseract-ocr-kmr} (100%)
 delete mode 120000 srcpkgs/tesseract-ocr-kur_ara
 delete mode 100644 srcpkgs/tesseract-ocr/files/COPYING
 create mode 100644 srcpkgs/tesseract-ocr/patches/disable-neon.patch

diff --git a/common/shlibs b/common/shlibs
index 950f5f3cf76aa..1de39e0bfa84c 100644
--- a/common/shlibs
+++ b/common/shlibs
@@ -2295,7 +2295,7 @@ libhttp_parser.so.2.9 http-parser-2.9.0_1
 libmaa.so.4 libmaa-1.4.2_1
 libcodeblocks.so.0 codeblocks-13.12_1
 libleptonica.so.6 leptonica-1.84.0_1
-libtesseract.so.4 tesseract-ocr-4.0.0_1
+libtesseract.so.5 tesseract-ocr-5.3.3_1
 libffmpegthumbnailer.so.4 ffmpegthumbnailer-2.0.10_1
 libopenraw.so.7 libopenraw-0.1.0_1
 libopenrawgnome.so.7 libopenraw-0.1.0_1
diff --git a/srcpkgs/tesseract-ocr-kur b/srcpkgs/tesseract-ocr-kmr
similarity index 100%
rename from srcpkgs/tesseract-ocr-kur
rename to srcpkgs/tesseract-ocr-kmr
diff --git a/srcpkgs/tesseract-ocr-kur_ara b/srcpkgs/tesseract-ocr-kur_ara
deleted file mode 120000
index 79bcf15f05ba5..0000000000000
--- a/srcpkgs/tesseract-ocr-kur_ara
+++ /dev/null
@@ -1 +0,0 @@
-tesseract-ocr
\ No newline at end of file
diff --git a/srcpkgs/tesseract-ocr/files/COPYING b/srcpkgs/tesseract-ocr/files/COPYING
deleted file mode 100644
index 11e05af425fc8..0000000000000
--- a/srcpkgs/tesseract-ocr/files/COPYING
+++ /dev/null
@@ -1,14 +0,0 @@
-This repository contains language data for Tesseract Open Source
-OCR Engine. All data in the repository are licensed under the Apache 
-License:
-
-** Licensed under the Apache License, Version 2.0 (the "License");
-** you may not use this file except in compliance with the License.
-** You may obtain a copy of the License at
-** http://www.apache.org/licenses/LICENSE-2.0
-** Unless required by applicable law or agreed to in writing, software
-** distributed under the License is distributed on an "AS IS" BASIS,
-** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-** See the License for the specific language governing permissions and
-** limitations under the License.
-
diff --git a/srcpkgs/tesseract-ocr/patches/disable-neon.patch b/srcpkgs/tesseract-ocr/patches/disable-neon.patch
new file mode 100644
index 0000000000000..d491ef1e47b81
--- /dev/null
+++ b/srcpkgs/tesseract-ocr/patches/disable-neon.patch
@@ -0,0 +1,14 @@
+--- a/configure.ac
++++ b/configure.ac
+@@ -177,6 +177,11 @@
+     AC_DEFINE([HAVE_NEON], [1], [Enable NEON instructions])
+     ;;
+ 
++  arm|armv7l)
++
++    AC_MSG_WARN([No compiler options for $host_cpu])
++    ;;
++
+   arm*)
+ 
+     AX_CHECK_COMPILE_FLAG([-mfpu=neon], [neon=true], [neon=false], [$WERROR])
diff --git a/srcpkgs/tesseract-ocr/patches/musl-sys-time.patch b/srcpkgs/tesseract-ocr/patches/musl-sys-time.patch
index 9c6337d188639..5c75864248fe8 100644
--- a/srcpkgs/tesseract-ocr/patches/musl-sys-time.patch
+++ b/srcpkgs/tesseract-ocr/patches/musl-sys-time.patch
@@ -1,12 +1,13 @@
---- a/src/ccutil/ocrclass.h	2019-07-07 14:34:08.000000000 +0200
-+++ b/src/ccutil/ocrclass.h	2019-07-08 10:47:15.347415888 +0200
-@@ -31,6 +31,9 @@
- #ifdef _WIN32
- #include <winsock2.h> // for timeval
- #endif
+--- a/include/tesseract/ocrclass.h
++++ b/include/tesseract/ocrclass.h
+@@ -29,6 +29,10 @@
+ 
+ #include <chrono>
+ #include <ctime>
 +#ifndef __GLIBC__
 +#include <sys/time.h>
 +#endif
++
+ 
+ namespace tesseract {
  
- /**********************************************************************
-  * EANYCODE_CHAR
diff --git a/srcpkgs/tesseract-ocr/template b/srcpkgs/tesseract-ocr/template
index de6df3a768d31..49b4045888324 100644
--- a/srcpkgs/tesseract-ocr/template
+++ b/srcpkgs/tesseract-ocr/template
@@ -1,14 +1,15 @@
 # Template file for 'tesseract-ocr'
 pkgname=tesseract-ocr
-version=4.1.1
-revision=9
-_tessdataver=4.0.0
+version=5.3.3
+revision=1
+_tessdataver=4.1.0
 create_wrksrc=yes
 build_style=gnu-configure
 configure_args="LIBLEPT_HEADERSDIR=${XBPS_CROSS_BASE}/usr/include $(vopt_enable openmp)"
 make_build_args="all training"
 hostmakedepends="automake libtool pkg-config leptonica libxslt asciidoc"
-makedepends="cairo-devel pango-devel leptonica-devel $(vopt_if openmp libgomp-devel) icu-devel"
+makedepends="cairo-devel pango-devel leptonica-devel $(vopt_if openmp libgomp-devel) icu-devel
+ libarchive-devel libcurl-devel"
 short_desc="Tesseract Open Source OCR engine"
 maintainer="Orphaned <orphan@voidlinux.org>"
 license="Apache-2.0"
@@ -16,13 +17,15 @@ homepage="https://github.com/tesseract-ocr/tesseract"
 distfiles="
  https://github.com/tesseract-ocr/tesseract/archive/${version}.tar.gz>${pkgname}-${version}.tar.gz
  https://github.com/tesseract-ocr/tessdata/archive/${_tessdataver}.tar.gz>tessdata-${_tessdataver}.tar.gz"
-checksum="2a66ff0d8595bff8f04032165e6c936389b1e5727c3ce5a27b3e059d218db1cb
- 38c637d3a1763f6c3d32e8f1d979f045668676ec5feb8ee1869ee77cedd31b08"
+checksum="dc4329f85f41191b2d813b71b528ba6047745813474e583ccce8795ff2ff5681
+ 990fffb9b7a9b52dc9a2d053a9ef6852ca2b72bd8dfb22988b0b990a700fd3c7"
 
 build_options="openmp"
 build_options_default="openmp"
 desc_option_openmp="Enable Open MP (gomp)"
 
+disable_parallel_build=yes # fails to build otherwise
+
 # Create a package for one specific language $1
 pkg_lang() {
 	local f script lang=$1
@@ -46,8 +49,8 @@ pkg_lang() {
 
 post_extract() {
 	mv tesseract-${version}/* .
+	rm -rf tessdata-${_tessdataver}/{tessconfigs,configs,pdf.ttf}
 	mv tessdata-${_tessdataver}/* ${wrksrc}/tessdata
-	rmdir tessdata-${_tessdataver}
 }
 pre_configure() {
 	NOCONFIGURE=1 ./autogen.sh
@@ -62,7 +65,6 @@ post_install() {
 	mv ${DESTDIR}/usr/share/man/man1/tesseract{,-ocr}.1
 	vdoc ChangeLog
 	vdoc README.md
-	vlicense ${FILESDIR}/COPYING LICENSE-tessdata
 	# Move the pseudo languges "equ" (math / equation detection) and
 	# "osd" (orientation and script detection) to the main package
 	for lang in equ osd; do
@@ -79,13 +81,6 @@ tesseract-ocr-tools_package() {
 		vmkdir usr/share/tesseract
 		vmkdir usr/share/man/man1
 		vmkdir usr/share/man/man5
-		# Copy shell scripts
-		for f in language-specific.sh tesstrain.sh tesstrain_utils.sh; do
-			if [ -e ${wrksrc}/training/${f} ]; then
-				cp -a ${wrksrc}/training/${f} \
-					${PKGDESTDIR}/usr/share/tesseract
-			fi
-		done
 		# Move tool manual pages
 		for f in ambiguous_words cntraining combine_tessdata \
 			dawg2wordlist mftraining shapeclustering unicharambigs \
@@ -99,7 +94,8 @@ tesseract-ocr-tools_package() {
 	}
 }
 tesseract-ocr-devel_package() {
-	depends="${sourcepkg}>=${version}_${revision}"
+	depends="${sourcepkg}>=${version}_${revision} leptonica-devel
+	 libarchive-devel libcurl-devel"
 	short_desc+=" - development files"
 	pkg_install() {
 		vmove usr/include/tesseract
@@ -129,7 +125,7 @@ tesseract-ocr-all_package() {
 	for lang in afr amh ara asm aze aze_cyrl bel ben bod bos bre bul cat ceb \
 		ces chi_sim chi_tra chr cos cym dan deu div dzo ell eng enm epo est eus fao \
 		fas fil fin fra frk frm fry gla gle glg grc guj hat heb hin hrv hun hye iku ind isl ita \
-		ita_old jav jpn kan kat kat_old kaz khm kir kor kur kur_ara lao lat lav lit ltz mal mar \
+		ita_old jav jpn kan kat kat_old kaz khm kir kmr kor lao lat lav lit ltz mal mar \
 		mkd mlt mon mri msa mya nep nld nor oci ori pan pol por que pus ron rus san sin slk slv \
 		snd spa spa_old sqi srp srp_latn sun swa swe syr tam tat tel tgk tgl tha tir ton tur \
 		uig ukr urd uzb uzb_cyrl vie yid yor \
@@ -576,23 +572,16 @@ tesseract-ocr-kir_package() {
 		$(pkg_lang ${pkgname#tesseract-ocr-})
 	}
 }
-tesseract-ocr-kor_package() {
-	depends="${sourcepkg}>=${version}_${revision}"
-	short_desc+=" - Korean language data"
-	pkg_install() {
-		$(pkg_lang ${pkgname#tesseract-ocr-})
-	}
-}
-tesseract-ocr-kur_package() {
+tesseract-ocr-kmr_package() {
 	depends="${sourcepkg}>=${version}_${revision}"
-	short_desc+=" - Kurdish language data"
+	short_desc+=" - Kurmanji (Kurdish - Latin Script) language data"
 	pkg_install() {
 		$(pkg_lang ${pkgname#tesseract-ocr-})
 	}
 }
-tesseract-ocr-kur_ara_package() {
+tesseract-ocr-kor_package() {
 	depends="${sourcepkg}>=${version}_${revision}"
-	short_desc+=" - Kurdish (Arabic) language data"
+	short_desc+=" - Korean language data"
 	pkg_install() {
 		$(pkg_lang ${pkgname#tesseract-ocr-})
 	}

From 060437b926fdb0d61fffae99583263479115bde1 Mon Sep 17 00:00:00 2001
From: chrysos349 <chrysostom349@gmail.com>
Date: Tue, 19 Sep 2023 04:09:24 +0300
Subject: [PATCH 3/4] arcan: revbump for tesseract-5.3.3

---
 srcpkgs/arcan/template | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/srcpkgs/arcan/template b/srcpkgs/arcan/template
index efd0afe10576d..ff9091f90ebb1 100644
--- a/srcpkgs/arcan/template
+++ b/srcpkgs/arcan/template
@@ -2,7 +2,7 @@
 # !! keep synced with: acfgfs aclip aloadimage
 pkgname=arcan
 version=0.6.2.1
-revision=1
+revision=2
 create_wrksrc=yes
 build_wrksrc=arcan/src
 build_style=cmake
@@ -27,7 +27,7 @@ homepage="https://arcan-fe.com/"
 _versionOpenal=0.5.4
 distfiles="https://github.com/letoram/arcan/archive/${version}.tar.gz
  https://github.com/letoram/openal/archive/${_versionOpenal}.tar.gz>openal_arcan.${_versionOpenal}.tar.gz"
-checksum="7bf083412bc61555472877313c13116431a0a36fccbf142f97559db43b4a1475
+checksum="30900dd80dfa272e6cc3343d50e9d2748eb06d97c78a8e87a743abd475638deb
  3a50a87c05b67c466a868cc77f8dc7f9cfc9466aeeafcd823daca0d108c504da"
 
 export CMAKE_GENERATOR="Unix Makefiles"

From b255f2a51877fd0f3691828e14f0b21ac4b65139 Mon Sep 17 00:00:00 2001
From: chrysos349 <chrysostom349@gmail.com>
Date: Tue, 19 Sep 2023 04:11:27 +0300
Subject: [PATCH 4/4] ccextractor: revbump for tesseract-5.3.3

---
 srcpkgs/ccextractor/patches/fix-ocr.patch | 106 ++++++++++++++++++++++
 srcpkgs/ccextractor/template              |   7 +-
 2 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 srcpkgs/ccextractor/patches/fix-ocr.patch

diff --git a/srcpkgs/ccextractor/patches/fix-ocr.patch b/srcpkgs/ccextractor/patches/fix-ocr.patch
new file mode 100644
index 0000000000000..2681c60aa414e
--- /dev/null
+++ b/srcpkgs/ccextractor/patches/fix-ocr.patch
@@ -0,0 +1,106 @@
+--- a/src/lib_ccx/hardsubx.c
++++ b/src/lib_ccx/hardsubx.c
+@@ -221,7 +221,7 @@
+ 	char *pars_values = strdup("/dev/null");
+ 	char *tessdata_path = NULL;
+ 
+-	char *lang = options->ocrlang;
++	char *lang = (char *)options->ocrlang;
+ 	if (!lang)
+ 		lang = "eng"; // English is default language
+ 
+@@ -245,7 +245,7 @@
+ 
+ 	int ret = -1;
+ 
+-	if (!strncmp("4.", TessVersion(), 2))
++	if (!strncmp("4.", TessVersion(), 2) || !strncmp("5.", TessVersion(), 2))
+ 	{
+ 		char tess_path[1024];
+ 		if (ccx_options.ocr_oem < 0)
+--- a/src/lib_ccx/ocr.c
++++ b/src/lib_ccx/ocr.c
+@@ -97,36 +97,22 @@
+ char *probe_tessdata_location(const char *lang)
+ {
+ 	int ret = 0;
+-	char *tessdata_dir_path = getenv("TESSDATA_PREFIX");
+ 
+-	ret = search_language_pack(tessdata_dir_path, lang);
+-	if (!ret)
+-		return tessdata_dir_path;
+-
+-	tessdata_dir_path = "./";
+-	ret = search_language_pack(tessdata_dir_path, lang);
+-	if (!ret)
+-		return tessdata_dir_path;
+-
+-	tessdata_dir_path = "/usr/share/";
+-	ret = search_language_pack(tessdata_dir_path, lang);
+-	if (!ret)
+-		return tessdata_dir_path;
+-
+-	tessdata_dir_path = "/usr/local/share/";
+-	ret = search_language_pack(tessdata_dir_path, lang);
+-	if (!ret)
+-		return tessdata_dir_path;
+-
+-	tessdata_dir_path = "/usr/share/tesseract-ocr/";
+-	ret = search_language_pack(tessdata_dir_path, lang);
+-	if (!ret)
+-		return tessdata_dir_path;
+-
+-	tessdata_dir_path = "/usr/share/tesseract-ocr/4.00/";
+-	ret = search_language_pack(tessdata_dir_path, lang);
+-	if (!ret)
+-		return tessdata_dir_path;
++	const char *paths[] = {
++	    getenv("TESSDATA_PREFIX"),
++	    "./",
++	    "/usr/share/",
++	    "/usr/local/share/",
++	    "/usr/share/tesseract-ocr/",
++	    "/usr/share/tesseract-ocr/4.00/",
++	    "/usr/share/tesseract-ocr/5/",
++	    "/usr/share/tesseract/"};
++
++	for (int i = 0; i < sizeof(paths) / sizeof(paths[0]); i++)
++	{
++		if (!search_language_pack(paths[i], lang))
++			return (char *)paths[i];
++	}
+ 
+ 	return NULL;
+ }
+@@ -174,7 +160,7 @@
+ 	char *pars_values = strdup("tess.log");
+ 
+ 	ctx->api = TessBaseAPICreate();
+-	if (!strncmp("4.", TessVersion(), 2))
++	if (!strncmp("4.", TessVersion(), 2) || !strncmp("5.", TessVersion(), 2))
+ 	{
+ 		char tess_path[1024];
+ 		snprintf(tess_path, 1024, "%s%s%s", tessdata_path, "/", "tessdata");
+@@ -331,6 +317,11 @@
+ 	}
+ 
+ 	BOX *crop_points = ignore_alpha_at_edge(copy->alpha, copy->data, w, h, color_pix, &color_pix_out);
++
++	l_int32 x, y, _w, _h;
++
++	boxGetGeometry(crop_points, &x, &y, &_w, &_h);
++
+ 	// Converting image to grayscale for OCR to avoid issues with transparency
+ 	cpix_gs = pixConvertRGBToGray(cpix, 0.0, 0.0, 0.0);
+ 
+@@ -426,8 +417,8 @@
+ 				{
+ 					for (int j = x1; j <= x2; j++)
+ 					{
+-						if (copy->data[(crop_points->y + i) * w + (crop_points->x + j)] != firstpixel)
+-							histogram[copy->data[(crop_points->y + i) * w + (crop_points->x + j)]]++;
++						if (copy->data[(y + i) * w + (x + j)] != firstpixel)
++							histogram[copy->data[(y + i) * w + (x + j)]]++;
+ 					}
+ 				}
+ 				/* sorted in increasing order of intensity */
diff --git a/srcpkgs/ccextractor/template b/srcpkgs/ccextractor/template
index 9abcd82852b27..64e57a2e4afc9 100644
--- a/srcpkgs/ccextractor/template
+++ b/srcpkgs/ccextractor/template
@@ -1,7 +1,7 @@
 # Template file for 'ccextractor'
 pkgname=ccextractor
 version=0.93
-revision=1
+revision=2
 build_wrksrc="linux"
 build_style=gnu-configure
 configure_args="--enable-ocr --enable-hardsubx"
@@ -16,7 +16,12 @@ distfiles="https://github.com/CCExtractor/${pkgname}/archive/v${version}.tar.gz"
 checksum=0e66d3e360db1b02a88271af11313ca4c9bbda1b03728e264a44c4c9f77192e3
 CFLAGS="-I${XBPS_CROSS_BASE}/usr/include/tesseract -DPNG_POWERPC_VSX_OPT=0 -fcommon"
 
+if [ "$CROSS_BUILD" ]; then
+	hostmakedepends+=" tesseract-ocr-devel"
+fi
+
 pre_configure() {
+	ln -sf libleptonica.so ${XBPS_CROSS_BASE}/usr/lib/liblept.so
 	sed -i -e "s/tesseract --version/tesseract-ocr --version/g" configure.ac
 	./autogen.sh
 }

Thread overview: 20+ messages
2023-09-19  1:19 [PR PATCH] tesseract-ocr: update to 5.3.2 chrysos349
2023-09-19  1:42 ` chrysos349
2023-09-19  3:42 ` [PR PATCH] [Updated] " chrysos349
2023-09-19  3:49 ` chrysos349
2023-09-19  4:08 ` newbluemoon
2023-09-19 14:41 ` chrysos349
2023-09-19 15:18 ` newbluemoon
2023-12-19  1:46 ` github-actions
2023-12-27  0:04 ` chrysos349
2023-12-27 18:24 ` Piraty
2023-12-28  0:45 ` [PR REVIEW] " Piraty
2023-12-29  2:19 ` [PR PATCH] [Updated] " chrysos349
2023-12-29  2:19 ` [PR REVIEW] " chrysos349
2023-12-29  2:21 ` chrysos349
2023-12-29 10:00 ` chrysos349 [this message]
2023-12-31  1:10 ` [PR REVIEW] tesseract-ocr: update to 5.3.3 Piraty
2023-12-31  1:12 ` Piraty
2023-12-31  3:17 ` [PR PATCH] [Updated] " chrysos349
2023-12-31  3:18 ` [PR REVIEW] " chrysos349
2024-01-07 22:25 ` [PR PATCH] [Merged]: " Piraty
