Roboflow improves computer vision with PaliGemma 2

Roboflow was launched in 2020 with the goal of improving computer vision, which enables machines and computers to perceive and interpret images, videos, and camera feeds, similar to human vision.

To help accomplish its goal, Roboflow created a new set of tools to establish a quality computer vision workflow using PaliGemma, Gemma’s vision-language model (VLM), as one of its core models. PaliGemma 2 is now an essential component in Roboflow’s tool set, and is one of the more widely adopted models on its platform. This has driven Roboflow to significantly contribute to the model’s development.

The challenge

The Roboflow founders originally worked on their own computer vision applications, aiming to improve how developers apply computer vision to their problems. During development, the team found it frustrating to build and deploy computer vision models and the apps built on them. The process lacked clear structure, relied on too much trial and error, and required them to code on the fly and use their own training data. Sharing work between teams and organizations posed challenges too, as there were no agreed-upon strategies or techniques for computer vision development. While computer vision has nearly endless potential use cases, the number of people who could work with it was comparatively small.

PaliGemma ranks as both the fastest and most cost-efficient model in Roboflow’s optical character recognition testing.

The solution

The Roboflow team set out to simplify and codify the process of creating computer vision applications by building a developer workflow and toolset. Roboflow now offers a comprehensive suite of options for computer vision applications, including pre-made building blocks for ready-to-deploy solutions and advanced tools to create and train your own vision models.

An essential asset in Roboflow’s toolbox is PaliGemma 2 3B. Offering industry-leading accuracy, speed, performance, and unique features, PaliGemma is one of the models Roboflow’s customers prefer. One of those unique features is that PaliGemma can be trained and run locally with proprietary data, enabling developers to create bespoke and private solutions without having to share their data outside of their company. This is one of the things that truly sets PaliGemma apart from other VLMs, according to Roboflow Marketing Lead Trevor Lynn: “Open VLMs are a total breakthrough for building multimodal applications for enterprises.”

Beyond the tools and workflows, Roboflow pursues its mission to “make the world programmable” by offering developers free educational resources. Roboflow’s blog features detailed walkthroughs on working with PaliGemma and other VLMs, and its developers consistently share detailed tutorials on channels like X and YouTube, helping improve the world of computer vision for all developers—even those outside of Roboflow’s ecosystem.

The impact

Today, more than one million engineers use Roboflow’s toolsets, which help industry leaders run their businesses more efficiently and save valuable time and resources. For example, BNSF Railway, the largest freight railway in the United States, used Roboflow to build computer vision solutions like real-time inventory monitoring, improving safety inspections.

“Achieving positive results using AI in a lab environment is easy, but the real challenge comes when scaling the solution across a network like ours without disrupting day-to-day operations. Our partnership with Roboflow is allowing us to do just that.”

— Asim Ghanchi, AVP of Technology, BNSF Railway

175k pre-trained models available

1M developer users

575M images labeled using Roboflow

What’s next

Roboflow continues to expand its portfolio of tools and resources available to developers by offering new products and extensive updates to existing ones. Recently, the team launched the ability to label and review data for multimodal vision models using Roboflow Annotate, and also began releasing multimodal models for developers to download, edit, and train with.

These initiatives further Roboflow's commitment to advancing computer vision and empowering developers to build innovative solutions with models like PaliGemma. When asked about the future of computer vision, Roboflow CEO Joseph Nelson said, “I believe visual AI is a foundational technology that will transform every industry. Similar to how humans primarily experience the world with our sense of sight, the same will be true for computers and software in our lifetimes.”