library Tesseract - OCR

C/H 2019. 1. 24. 08:30

Tesseract is an OCR library that can be trained to recognize any number of fonts, and it can also recognize Unicode characters.

Install

# Install
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

# edit sources.list
sudo vi /etc/apt/sources.list

If apt cannot find the packages, the universe repository is probably not enabled. Copy the first line "deb http://archive.ubuntu.com/ubuntu bionic main" and paste it on the next line, replacing main with universe as shown below.
If you are using a different release of Ubuntu, replace bionic with the respective release name.

deb http://archive.ubuntu.com/ubuntu bionic universe

# macOS (Homebrew)
brew install tesseract

Windows: use the installer from Tesseract at UB Mannheim.
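After installing, it can be handy to confirm from Python that the binary is actually reachable. The following is a minimal sketch, not part of the original post: the helper name `tesseract_version` is hypothetical, and it assumes the executable is called `tesseract` and is on PATH.

```python
import shutil
import subprocess


def tesseract_version(binary="tesseract"):
    """Return the first line of `<binary> --version`, or None if not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


# Usage:
#   print(tesseract_version() or "tesseract not found on PATH")
```

If this returns None on Windows, the installation directory most likely still needs to be added to PATH.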

PATH

# Linux / macOS
export TESSDATA_PREFIX=/usr/local/share/

# Windows (adjust the path to your installation directory)
setx TESSDATA_PREFIX "C:\Program Files\tesseract OCR"
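To check that the variable points somewhere useful, one option is to list the language files found under it. This is a rough sketch under assumptions: the helper `list_languages` is hypothetical, and because the expected location differs between Tesseract versions (the tessdata directory itself or its parent), it simply looks in both.

```python
import os
from pathlib import Path


def list_languages(prefix=None):
    """Return sorted language codes found as *.traineddata under the prefix."""
    prefix = prefix or os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return []
    root = Path(prefix)
    # Assumption: the variable points either at the tessdata directory
    # itself or at its parent, so both locations are checked.
    candidates = [root, root / "tessdata"]
    found = set()
    for directory in candidates:
        if directory.is_dir():
            found.update(p.stem for p in directory.glob("*.traineddata"))
    return sorted(found)


# Usage:
#   print(list_languages())   # reads TESSDATA_PREFIX from the environment
```

An empty list suggests that the variable is unset, misspelled, or pointing at a directory without any .traineddata files.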

Running Tesseract

Tesseract is used from the command line as follows.

tesseract imagename outputbase [-l lang] [--psm pagesegmode] [configfile...]

# Read myscan.png and save the result to out.txt
tesseract myscan.png out

# Set the language(s)
tesseract myscan.png out -l deu
tesseract myscan.png out -l eng+deu

# Tesseract also includes an hOCR mode, which produces a special HTML file with the coordinates of each word. This can be used to create a searchable PDF with a tool such as Hocr2PDF. To use it, pass the 'hocr' config option:
tesseract myscan.png out hocr
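The word coordinates in the hOCR file can be read back with the Python standard library. The sketch below is an illustration, not the author's method: the names `HocrWords` and `parse_hocr` are hypothetical, and it assumes each word is a `span` with class `ocrx_word` whose `title` attribute contains `bbox x0 y0 x1 y1`, with no further spans nested inside a word.

```python
import re
from html.parser import HTMLParser


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bounding box of the word currently being read
        self._text = []     # text fragments of the word currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no span is nested inside a word span.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr(markup):
    """Return a list of (word, bbox) tuples from hOCR markup."""
    parser = HocrWords()
    parser.feed(markup)
    parser.close()
    return parser.words
```

Depending on the Tesseract version, the output file is written as out.hocr or out.html; read whichever exists and pass its contents to `parse_hocr`.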

# Save a searchable PDF
tesseract myscan.png out pdf
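The same commands can be driven from Python by building the argument list and handing it to `subprocess`. This is a minimal sketch under assumptions: `build_command` and `run_ocr` are hypothetical helper names, and the `tesseract` binary is assumed to be on PATH.

```python
import subprocess


def build_command(image, outputbase, langs=None, config=None):
    """Build the argv list for: tesseract image outputbase [-l lang] [config]."""
    cmd = ["tesseract", str(image), str(outputbase)]
    if langs:
        # Multiple languages are joined with '+', e.g. eng+deu.
        cmd += ["-l", "+".join(langs)]
    if config:
        # Config file name such as 'hocr' or 'pdf'.
        cmd.append(config)
    return cmd


def run_ocr(image, outputbase, langs=None, config=None):
    """Run tesseract and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(image, outputbase, langs, config), check=True)


# Usage:
#   run_ocr("myscan.png", "out", langs=["eng", "deu"], config="pdf")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.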

Trained data for other languages is available in the Tessdata repository.

Numpy

Required when training Tesseract to recognize other characters or fonts.

pip install numpy
