Building a command line version of Apple's Live Text using Google Cloud

I’ve updated the code on GitHub so that it creates an overlay image showing where the text was found when you pass the -o option.

In the latest update to iOS and iPadOS, the camera can now auto-detect text in pictures (yes, this has been in Android for a while).

This is a handy feature that I’d like on my Mac as well. But it’s not out yet, so let’s have some fun and build a little tool to do the same thing!

We’re going to use a couple of Google Cloud services to build a simple command line application that takes an image file and returns any text inside it.
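To make the idea concrete, here is a rough sketch of the core of such a tool, written against the Vision API's public REST endpoint using only the Python standard library. This is an illustration under stated assumptions, not the code from the stream: Python is a choice made here, the function names (`build_request`, `extract_text`, `detect_text`) are hypothetical, and it assumes you authenticate with an API key rather than a service account.

```python
import base64
import json
import urllib.request

# Public REST endpoint for the Vision API's batch annotate method.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in a TEXT_DETECTION request body."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the full detected text, or an empty string if none was found."""
    results = response.get("responses", [])
    if not results:
        return ""
    annotations = results[0].get("textAnnotations", [])
    # The first annotation holds the whole block of text; the rest are single words.
    return annotations[0]["description"] if annotations else ""


def detect_text(path: str, api_key: str) -> str:
    """Send one local image file to the API and return the text inside it.

    Assumption: an API key is used for authentication (hypothetical setup).
    """
    with open(path, "rb") as image_file:
        body = json.dumps(build_request(image_file.read())).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_text(json.load(reply))
```

Calling `detect_text("screenshot.png", api_key)` with a valid key would send the image and return whatever text the service finds. Keeping the request building and response parsing in separate functions makes both easy to check without touching the network.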

You can watch the live stream on demand 👆 and there’s more detail below…

Research

The “pre-work” done for this stream was simply research. I checked out the Vision API product page and the pricing. The service is very inexpensive: the first 1,000 requests per month are free, and after that it’s only $1.50 per 1,000 requests. That’s completely reasonable for any uses I might have.
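As a quick back-of-the-envelope check on those numbers, the sketch below estimates a monthly bill. It assumes the two figures quoted above are the whole story and that usage past the free tier is billed proportionally; the real pricing page has more tiers and may round differently, so treat this as an approximation. The function name is hypothetical.

```python
FREE_REQUESTS_PER_MONTH = 1_000
PRICE_PER_THOUSAND = 1.50  # USD, the figure quoted in this post


def estimated_monthly_cost(requests: int) -> float:
    """Estimate the monthly cost in USD for a number of requests.

    Assumption: requests beyond the free tier are billed proportionally.
    """
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return billable / 1_000 * PRICE_PER_THOUSAND


print(estimated_monthly_cost(500))    # 0.0
print(estimated_monthly_cost(3_000))  # 3.0
```

Even at a few thousand images a month, a personal tool like this costs a few dollars at most.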

One example of the Vision API I saw indicated that images should be pulled from a Google Cloud Storage bucket. I made sure to create a new bucket, and in fact a new project, before starting the stream.
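For reference, a request that points at an object in a bucket instead of sending the image bytes inline might look like the sketch below. The helper name and the bucket path in the usage note are hypothetical; the request shape follows the Vision API's REST format, where the image is given as a `gs://` URI.

```python
def build_gcs_request(gcs_uri: str) -> dict:
    """Build a TEXT_DETECTION request for an image stored in Cloud Storage."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI starting with gs://")
    return {
        "requests": [
            {
                # The service fetches the image itself from the bucket.
                "image": {"source": {"imageUri": gcs_uri}},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }
```

For example, `build_gcs_request("gs://my-example-bucket/receipt.png")` produces a body that can be posted to the same `images:annotate` endpoint used for local files. The account making the call needs read access to the object.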

I also put together some of the plumbing code, so as not to bore everyone…more 😉.

References

The code for the stream is available at https://github.com/marknca/tiny-cloud-projects
Detect text in images from the Google Cloud docs
Detect handwritten text in a local file from the Google Cloud docs
Apple’s docs on their Live Text feature

What’s Next?

On previous streams, I’ve worked through my thought process or even the UI of various cloud services, but I’ve never done one where I’m live coding. I wasn’t sure what to expect.

I’m generally happy with it. Sure, I made some silly mistakes (not including a core module, missing a semicolon, etc.) but that’s expected any time you’re coding.

Overall, I liked sharing my thought process and general experience with a new cloud service. What did you think?

Let me know on Twitter, where I’m @marknca.
