Optical Character Recognition made easy: Learn how to perform OCR with OpenCV, Tesseract, and Python
Optical Character Recognition (OCR) is a simple concept but is hard in practice: Create a piece of software that accepts an input image, have that software automatically recognize the text in the image, and then convert it to machine-encoded text (i.e., a “string” data type).
For example, if I were to present the following image (top) to my OCR algorithm, I would expect it to detect the text, recognize the text, and then encode the text as a string variable, as shown in the output image (bottom).
But despite being such an intuitive concept, OCR is incredibly hard. The field of computer vision has existed for over 50 years (with mechanical OCR machines dating back over 100 years), but we still have not “solved” OCR and created an off-the-shelf OCR system that works in nearly any situation.
And worse, trying to code custom software that can perform OCR is even harder:
My brand-new book, OCR with OpenCV, Tesseract, and Python, is for developers, students, researchers, and hobbyists just like you who want to learn how to successfully apply Optical Character Recognition to your work, research, and projects.
Regardless of your current experience level with computer vision and OCR, after reading this book, you will be armed with the knowledge necessary to tackle your own OCR projects. Just keep reading to learn more.
OCR with OpenCV, Tesseract, and Python will teach you how to successfully apply Optical Character Recognition to your work, projects, and research.
You will learn via practical, hands-on projects (with lots of code) so you can not only develop your own OCR projects, but feel confident while doing so.
Inside the book we will focus on:
Currently, I have 35+ chapters planned out, with more to come!
Since we’ll be covering so many OCR techniques in-depth, I’ve decided to break the book down into three volumes called “bundles”.
I’ve included a short breakdown of the three bundles below:
The “Intro to OCR” Bundle is right for you if:
- You are new to the world of OCR and Computer Vision
- You are just testing the OCR waters
- You are on a budget
Inside this bundle you will learn the fundamentals of Optical Character Recognition using Tesseract, OpenCV, and Python. And while this is the lowest tier bundle, you’ll still be getting a great education with a lot of hands-on experience.
That said, for a more in-depth treatment of OCR, I would recommend either the “OCR Practitioner” Bundle or “OCR Expert” Bundle.
My Recommendation: The “Intro to OCR” Bundle is a great first step toward applying OCR to real-world projects. You’ll learn the fundamentals of OCR and Tesseract, empowering you to apply OCR to your own projects.
That said, if you are going with this bundle because you’re new to the world of computer vision and OCR, then you should absolutely look at the Practical Python and OpenCV and PyImageSearch Gurus add-ons. Both of these can be used to help you level-up your computer vision skills quickly (and be more successful when applying OCR).
The “OCR Practitioner” Bundle builds on the previous bundle and includes every chapter in the “Intro to OCR” Bundle. This bundle is geared toward more advanced OCR algorithms, techniques, and use cases, including deep learning, image/document alignment, OCR in real-time video streams, OCR with GPUs, cloud-based OCR APIs, and more!
My Recommendation: The “OCR Practitioner” Bundle gives you the best bang for your buck. You should choose this bundle if you want a super in-depth treatment of OCR, but cannot afford the “OCR Expert” Bundle.
If you’re new to computer vision and deep learning, I highly suggest you also get the PyImageSearch Gurus and/or Deep Learning for Computer Vision with Python add-ons — both of these resources will teach you computer vision and deep learning quickly (ensuring you get more value out of your purchase of the OCR book).
The “OCR Expert” Bundle includes everything from both the “Intro to OCR” Bundle and “OCR Practitioner” Bundle.
This bundle also includes:
- All bonus chapters from stretch goals during the IndieGoGo campaign (including chapters that are authored after the campaign has ended).
- A physical, printed edition of all three volumes of OCR with OpenCV, Tesseract, and Python — this is the only bundle that includes a hardcopy edition.
- Access to my private community forums for additional help and support. You’ll get faster, more detailed answers to your questions, and you’ll be able to better connect with me and other readers. (Again, the other two bundles do not have access to these forums.)
My Recommendation: You should go with the “OCR Expert” Bundle if (1) you want to study OCR in-depth and (2) you want additional help and support along the way. When it comes to learning Optical Character Recognition, you just can’t beat this bundle! Be sure to look at the additional add-ons below to boost your computer vision/deep learning skills before you get started.
The “OCR Expert” Bundle includes a Certificate of Completion. To receive the certificate, you will need to complete all lessons and quizzes associated with the text.
After successfully completing all lessons/quizzes, you will receive your certificate and be able to embed it directly on your LinkedIn profile, thereby demonstrating your Optical Character Recognition skills.
The primary focus of this book is Tesseract, the world’s most popular open source OCR engine. Simply put, if you’re interested in learning how to apply OCR to your own projects, you need to learn how to operate the Tesseract OCR engine.
We’ll be utilizing the Python programming language in this book. Python is an extremely easy language to learn. It also has easy-to-use libraries and packages that allow us to seamlessly interact with our computer vision, deep learning, and OCR APIs.
When we interact with Tesseract via Python, we’ll use PyTesseract. The PyTesseract package interfaces with Tesseract, making it easy to OCR images using Python.
For computer vision and image processing, we’ll be using OpenCV, the de facto standard library for image processing. You’ll find OpenCV easy to use, especially with the hands-on projects covered in the text.
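To give a feel for how PyTesseract and OpenCV fit together, here is a minimal sketch. It assumes the Tesseract binary, the `pytesseract` package, and OpenCV's Python bindings are installed. The helper names `ocr_image` and `normalize_ocr_text` are hypothetical and chosen for illustration; they are not taken from the book's code.

```python
import re


def normalize_ocr_text(raw):
    """Collapse runs of whitespace and drop empty lines from raw OCR output."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_image(image_path):
    """Load an image with OpenCV and OCR it with Tesseract via pytesseract."""
    # Imported lazily so normalize_ocr_text can be used without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        # cv2.imread returns None instead of raising when the file is unreadable.
        raise FileNotFoundError(f"could not read image: {image_path}")

    # OpenCV loads images in BGR channel order; convert to RGB before OCR.
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # image_to_string runs Tesseract and returns the recognized text as a str.
    return normalize_ocr_text(pytesseract.image_to_string(rgb))
```

Calling `ocr_image` on the path of a scanned page or screenshot would return the recognized text as a plain Python string. Real-world images usually need more preprocessing than a color conversion before Tesseract produces clean output.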
When training our own custom deep learning OCR models, we’ll be using Keras and TensorFlow 2.0+. Using Keras and TensorFlow 2 is the fastest, easiest way to go from idea to experimentation, to result.
You’ll also learn how to use cloud-based OCR APIs, including Amazon Rekognition, Microsoft Cognitive Services, and the Google Vision API.
OCR with OpenCV, Tesseract, and Python is more than just a book — it’s your complete training guide to mastering Optical Character Recognition.
After going through the text and code, you will have the confidence to successfully apply OCR to your own projects, work, and research. I guarantee it.
So, why buy OCR with OpenCV, Tesseract, and Python?
Why not some other book or course?
Well, to start, you won’t be able to find high-quality OCR books that are authored with practitioners and coders in mind. That type of book simply doesn’t exist (until now).
Secondly, when it comes to computer vision, deep learning, and OpenCV, my name and the PyImageSearch brand have become synonymous with super high-quality tutorials and guides.
Here are 5 reasons why this book is better than any other OCR training resource available today:
This book is for developers, students, researchers, and hobbyists who want to learn how to successfully code Optical Character Recognition projects (and have at least some programming/scripting experience).
This book assumes that you have prior programming experience (e.g., you know what a variable, function, loop, etc. are). It does not make any assumptions on your previous experience with computer vision or deep learning.
That said, having some experience in both computer vision and deep learning can be very helpful while working through the material (especially in the more advanced chapters).
If you have little-to-no experience with computer vision and deep learning, don’t worry — this book is still for you; but I would highly recommend you select an IndieGoGo Perk add-on that includes one or more of my previous books and courses:
These resources will help you master computer vision and deep learning (and build a strong foundation for when you study OCR).
This book isn’t just for beginners — there are advanced concepts, algorithms, and techniques covered as well:
I’m Adrian Rosebrock, a Ph.D., entrepreneur, and educator who has spent his entire adult life studying (and teaching) computer vision and deep learning.
Over the past 6 years, I have:
- Started the PyImageSearch.com blog and published over 370 free tutorials and articles aimed at teaching computer vision, deep learning, and OpenCV.
- Authored my first book, Practical Python and OpenCV, which has been featured on the official OpenCV.org website.
- Created the PyImageSearch Gurus course, an actionable, real-world course on computer vision and OpenCV. This course is the most comprehensive computer vision education online today, covering 13 modules broken out into 168 lessons with over 2,161 pages of content.
- Authored Deep Learning for Computer Vision with Python, the most in-depth computer vision and deep learning book available today, including super practical walkthroughs, hands-on tutorials (with lots of code), and a no-nonsense teaching style that will help you master computer vision and deep learning.
- Published Raspberry Pi for Computer Vision, which covers embedded computer vision and deep learning on devices such as the Raspberry Pi, Google Coral, Movidius NCS, and NVIDIA Jetson Nano.
- Answered 50,000+ emails and helped 10,000s of developers, researchers, and students just like you learn the ropes of computer vision and deep learning.
Teaching computer vision and deep learning is my passion, and I want to pass this passion on to you.
If learning how to successfully apply OCR to your own projects, work, and research sounds interesting, I hope you’ll consider helping me bring this book to life.
Students of mine have gone on to change their careers to CV/DL practitioners, land high paying jobs, publish novel research papers, and win academic research grants. Are you ready to join them?
See you on the other side!
—Adrian Rosebrock
During this IndieGoGo campaign, you will be able to claim your copy of OCR with OpenCV, Tesseract, and Python at a substantial discount (compared to when it is formally published in early 2021).
To celebrate the launch of this campaign, I’m also offering 25% OFF my existing super popular books and courses.
I highly recommend that you choose at least one of these add-ons to make the most out of OCR with OpenCV, Tesseract, and Python. The figure below describes what topics are covered in each add-on. I’ve included descriptions of each add-on as well.
OCR with OpenCV, Tesseract, and Python utilizes deep learning inside the “OCR Practitioner” Bundle and “OCR Expert” Bundle. If you back either of these perks, you should absolutely choose the Deep Learning for Computer Vision with Python add-on as well.
Inside this deep learning book, you’ll find:
- Super practical walkthroughs that present solutions to actual, real-world image understanding problems, including image classification, object detection, and image segmentation
- Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well
- A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition
My Recommendation: You should go with this add-on if you have any interest in studying deep learning. This book has helped 1,000s of developers, researchers, and students master deep learning for computer vision. The knowledge in this book can be directly applied to your computer vision and deep learning projects that leverage OCR.
And best of all, OCR with OpenCV, Tesseract, and Python is essentially FREE once you factor in the price of the deep learning book.
Practical Python and OpenCV is your guaranteed quick-start guide to learning the fundamentals of computer vision and image processing using OpenCV and Python.
My Recommendation: You should choose the Practical Python and OpenCV add-on if you have zero (or minimal) experience with computer vision and OpenCV and want to learn the basics in less than a single weekend.
The PyImageSearch Gurus Course is similar to a college-level survey course on computer vision but is far more detailed and much more hands-on and practical.
Inside the PyImageSearch Gurus course, you’ll find:
- An actionable, real-world course on computer vision and OpenCV
- The most comprehensive computer vision education online today, including 13 modules broken out into 168 lessons with over 2,161 pages of content
- A community of like-minded developers, researchers, and students — just like you — who are eager to study computer vision
My Recommendation: You should choose the PyImageSearch Gurus course add-on if you want to study computer vision in-depth, enabling you to better develop OCR applications. You’ll be getting a GREAT deal by going with this add-on: the OCR with OpenCV, Tesseract, and Python book is essentially FREE once you factor in the price of the Gurus course.
Raspberry Pi for Computer Vision (RPi4CV) allows you to bring computer vision and deep learning to embedded devices, including the:
- Raspberry Pi
- Intel Movidius NCS
- Google Coral
- NVIDIA Jetson Nano
Inside, you’ll find over 40 projects (including 60+ chapters) on embedded Computer Vision and Deep Learning. A handful of highlighted projects include:
- Traffic counting and vehicle speed detection
- Real-time face recognition and building a smart classroom attendance system
- Automatic hand gesture recognition
- Daytime and nighttime wildlife monitoring
- Security applications using computer vision
- Deep Learning classification, object detection, and instance segmentation on resource-constrained devices
- …and much more!
My Recommendation: You should grab the Raspberry Pi for Computer Vision add-on if you’re interested in using embedded devices such as the RPi, Google Coral, Movidius NCS, and Jetson Nano for computer vision and deep learning. Combining the knowledge you’ll gain from RPi4CV with OCR with OpenCV, Tesseract, and Python will allow you to perform OCR on embedded devices.
This add-on includes every single book/course I’ve ever written, including the four add-ons I’ve detailed above:
- Deep Learning for Computer Vision with Python
- Practical Python and OpenCV
- PyImageSearch Gurus course
- Raspberry Pi for Computer Vision
Not only will you be getting the OCR with OpenCV, Tesseract, and Python book, but you’ll also be getting ~30% OFF the normal list price of these four resources.
My Recommendation: You should choose this add-on if you want access to my entire library of books and courses. You’ll be getting ~30% OFF the normal list price — and you’ll be getting the most complete, comprehensive computer vision and deep learning education available today. The knowledge gained from these books and courses will better help you build computer vision, deep learning, and OCR applications.
IndieGoGo supports payment via credit and debit card. If you wish to order using a credit or debit card, feel free to use this campaign page.
Otherwise, if you instead want to purchase using PayPal, I created a separate checkout page on PyImageSearch’s website so you can still take advantage of the discount.
And don’t worry, both the IndieGoGo page and the PayPal page give you the same discount on the OCR book and add-ons.
Provided this IndieGoGo campaign is successfully funded, I intend to complete and release the “Intro to OCR” Bundle by January 2021. The “OCR Practitioner” Bundle will be released in February 2021. And finally, the “OCR Expert” Bundle will be released in March 2021.
I have included a timeline of important events below:
Currently, over 85% of all examples in the book have been coded up. I have drafted ~45% of all chapters.
You can use this link to download a PDF that contains the table of contents and a few sample chapters. I think you will be impressed with the quality of text thus far.
Given my 6+ years of experience authoring blog posts, tutorials, books, and courses, I am extremely comfortable with my writing abilities. I’m confident that I can deliver the three bundles by the proposed late-winter/early-spring deadlines.
Like many IndieGoGo projects, OCR with OpenCV, Tesseract, and Python is squarely in the “alpha phase”. I’ve coded up ~85% of all examples in the book and have drafted ~45% of all chapters. I have a clear path to finishing up the text and creating a high-quality publishable work.
With focused writing time (which I will of course dedicate to this project), I can typically draft 5-8 chapters in a single week. Given my experience and expertise in this area, I believe many of the risks and challenges have already been mitigated.
Launch timing: Every project has inherent risks and potential unforeseen circumstances that cause delays in a launch. That said, with nearly half of the text already drafted, I don’t expect any significant hiccups along the way. Even if something tragic were to happen to me personally, my team could easily finish the text and release it on time. And if there is any type of delay in the publishing process, I will absolutely keep you in the loop at all times.
Experience: Over the past six years running the PyImageSearch blog, my team and I have authored 350+ free tutorials, a 275-page book, a 900+ page book, a 1,200+ page book, and a 2,100+ page course. I have no doubt that we’ll be able to deliver a high-quality book that will teach you how to successfully and confidently apply OCR to your own projects.