Conversation with Ramon Navarro

Ramon Navarro is CTO and co-founder of Nuclia, a startup that provides RAG on top of unstructured data, bringing AI capabilities to documents, media, and text. Ramon was previously CTO at Onna and co-founder/CTO at Iskra.cat, where he developed OSS frameworks to scale data and knowledge management infrastructure. He has also taught at Barcelona Tech University (UPC) and is a member of the Plone Foundation.

This is going to be a conversation about Data, AI and Nuclia, I promise. But I cannot help starting our talk with a question about your other passion (music), which is in fact my passion too. For almost four years (2014-2018) you were a diatonic accordion player at torreta.cat. I am really curious to know the details.

Good catch, music has been my passion since I was 10 years old, together with computers and math. Torreta was one of the first semi-professional bands that played traditional music with a modern approach, designed to help people enjoy traditional dances without any previous experience.

So cool. And are you still playing? 

Yes! Currently, I play in three bands: BaumaFolk, Giravolt, and my newest and most professional project, Groovin’ CAT. In Groovin’ CAT, I blend electronic and acoustic music and have begun incorporating some AI elements. All three bands aim to get people dancing. It’s part of a cultural movement that brings together people of all ages and from everywhere to dance easy, fun, traditional dances.

Love it. In fact, this was not planned, but I am going to ask you the same question I recently asked Marc Planagumà (aka DJ Kram), who also suffered one of these interviews. :) So, here goes the question: what do you think about AI tools such as Udio or Suno, which are being used to “generate music”? I believe they could boost creativity… but at the same time they are getting criticism, and some record labels, including Sony and Universal, have filed lawsuits against them arguing they “steal music to spit out similar work”.

There is a lot of fear in the music industry regarding these technologies. I’ve been part of many discussions about copyrights, and it seems we aren’t fully aware that we constantly borrow music, notes, and harmonies from what we hear. This is more of a legal issue, which isn’t my expertise. On the other hand, it’s clear that you can use AI services to write and compose music based on other songs (using tools like MIDI and Magenta). You can generate sounds by specifying what you want, create chord progressions, and even write songs using language models, allowing you to do everything needed to create music. For some existing music, this could serve as a replacement. However, music has two key elements that AI cannot provide: emotion and performance. These require a musician’s connection with the audience, whether through streaming or live performances. In summary, we need to learn how to use these tools to make our lives easier and, as artists, focus on the emotions, personal connections, and energy we want to convey.

After this little digression, let’s get into the meat and potatoes. First of all, you studied Computer Science at Universitat Politècnica de Catalunya (UPC). Did you feel a calling to become a programmer?

Yes, when I was 10 years old, I used to write BASIC code on an old Amstrad with my dad so we could play games and have fun together. Even before starting university, I was already contributing to Debian back in 1996!

Nice. And your first work experience was also at UPC, right?

For me, the university is a place of knowledge, and I loved both sharing and learning, so it was a perfect match. I was very fortunate to have two mentors at the university, Sebas and Lluis, who guided me and allowed me to grow quickly within an amazing group of people. I worked on developing shared file systems, knowledge management applications, and infrastructure to manage computers. I also taught Python development and introduced the first Linux systems. Looking back, it feels like ages ago.

You have also spent part of your career teaching, programming on your own, and even writing a book about application development using Mono and Gtk. You seem to be a very curious person.

I often feel like I don’t know anything, which drives my desire to constantly learn more. However, this curiosity can be a problem when you need to focus on long-term research. I attempted to pursue a PhD and even published my first research article on open source as a potential research subject. However, my curiosity became a hindrance. Even my beloved professor once told me that I was too curious to pursue a PhD and questioned why I wanted to do it. He was right; the following week, I started my first company!

Funny, indeed. When did your programming skills intersect with the world of Data?

I’ve always been closely involved with the data world, managing large databases, knowledge bases, and infrastructure. My first major data-intensive project was building a large intranet for a world-class industrial company in 2014. This project required us to manage millions of documents, ensuring their security and processing.

In 2016, you joined Onna as CTO. Can you briefly explain what Onna offers? 

Onna was an eDiscovery platform that allowed you to connect all your enterprise data providers (such as Gmail, Drive, Slack, SharePoint, etc.), collect all the data your company had in those repositories, and provide a centralized search capability across all the data. It also offered specialized legal features like legal hold and data segmentation. Onna was recently bought by Reveal, a large eDiscovery provider in the US.

What were the main data-related challenges you had to deal with? 

The amount of data a single employee holds across all their source providers, multiplied by the number of people in the company, makes the volume to gather and process massive. That is one of the big challenges, but the biggest one is providing a real-time search experience over all of that data. Traditional eDiscovery platforms were designed for batch querying, so those systems focus more on gathering. We wanted to deliver a knowledge-management-grade search experience connected to the eDiscovery features. The data volume for large corporations was enormous: consider that Google Drive and Dropbox only offer limited search capabilities because it is not worth the cost to them, and then imagine what it means to collect all that data and make it searchable!

The universe of “unstructured data” has significantly evolved during the last few years. From your perspective, what have been the main changes since your Onna days?

When dealing with unstructured data, the first step is to extract meaningful information from it. This process has evolved significantly and continues to change. Nowadays, using document screenshots with models like Florence-2 or visual LLMs is effective, and extracting JSON from text is a straightforward task with LLMs. In the past, we had to train numerous visual label and layout detectors for this purpose. Also, for video and audio, speech-to-text (STT) capabilities are now nearly perfect in most available languages.
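To make that contrast concrete, here is a minimal sketch of structured JSON extraction from text with an LLM. It assumes the OpenAI Python client; the model name and the invoice fields are illustrative assumptions, not Nuclia's actual pipeline.

```python
# Minimal sketch: structured extraction from free text with an LLM.
# Assumes the openai Python client (>=1.0); the model name and the
# target fields are illustrative, not Nuclia's actual pipeline.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_invoice_fields(text: str) -> dict:
    """Ask the model to return only a JSON object with the fields we care about."""
    prompt = (
        "Extract the vendor, total_amount and currency from the text below. "
        "Answer with a JSON object only.\n\n" + text
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # forces valid JSON output
    )
    return json.loads(resp.choices[0].message.content)

print(extract_invoice_fields("Invoice from ACME Corp, total due: 1,250.00 EUR"))
```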

In 2019, you started your role as co-founder and CTO of Nuclia, a company offering RAG-as-a-Service. Before getting into the specifics of Nuclia, I would like to get your own definition of RAG and why it is important.

Retrieval Augmented Generation (RAG) combines three state-of-the-art technologies in language models. First, it transforms information into a set of features, making the data discoverable and ready to be used by a language model. Second, it allocates pieces of knowledge across a large number of elements via semantic or hybrid search using vectors, text, and graphs. Third, it generates new information from the obtained data. In summary, RAG generates new data from an existing set of data. Unlike fine-tuning, which has a lower capacity for acquiring new knowledge and does not require a database, RAG excels in understanding the retrieved information and adapting to user queries. For automated tasks, RAG enables the connection of unstructured data to BI or structured sources.
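As a minimal illustration of those three stages, the sketch below embeds a tiny corpus, retrieves by cosine similarity, and builds the augmented prompt that would go to the generator. The sentence-transformers model and the corpus are assumptions made for the example, not Nuclia's implementation.

```python
# Minimal RAG sketch: embed, retrieve, then generate from the retrieved context.
# Assumes the sentence-transformers library; the corpus and model name are
# illustrative placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "NucliaDB stores text, vectors and a knowledge graph for each resource.",
    "The processor converts documents, audio and video into indexable text.",
    "The predict engine routes requests to the configured language model.",
]
corpus_vecs = model.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Semantic retrieval: cosine similarity between the query and corpus vectors."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = corpus_vecs @ q            # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [corpus[i] for i in top]

def build_prompt(query: str) -> str:
    """Augment the user question with retrieved context before generation."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# The prompt would then be sent to any LLM for the generation step.
print(build_prompt("What does the processor do?"))
```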

Could you provide some details about the product and services Nuclia offers and the main benefits of your solutions?

Nuclia provides an API with the three components required to deliver the best RAG for different use cases. You can use them as an end-to-end experience so you can focus on building your product on top:

- The processor, which adapts unstructured data to be ready for RAG and enriches the extracted information with synthetic data, text, a knowledge graph, and vectors.

- NucliaDB, an open source data framework designed to connect your data with multiple indexing strategies (vector search, knowledge graph, full text) and deliver multiple RAG strategies such as multi-modal, hierarchical text-block connections, agentic, and more (a toy illustration of the multi-index idea follows this list).

- Our predict engine, which connects all available language models (from OpenAI, Google, Anthropic, and Mistral to Hugging Face OSS models) with your data, providing traceability, accuracy metrics, and unified cost management.
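To illustrate the multi-index idea in the abstract, here is a toy hybrid search that blends keyword and vector scores over resources that also carry graph relations. The data model, the scoring, and all names are invented for the example; this is not NucliaDB's actual implementation.

```python
# Toy illustration of "multiple indexing strategies": the same resource is
# indexed for full-text, vector and graph lookups, and a hybrid query merges
# keyword and semantic scores. Purely illustrative, not NucliaDB's data model.
from dataclasses import dataclass, field

@dataclass
class Resource:
    rid: str
    text: str
    vector: list[float]                 # embedding of the text
    relations: list[tuple[str, str]] = field(default_factory=list)  # (predicate, object)

def keyword_score(query: str, r: Resource) -> float:
    terms = query.lower().split()
    return sum(t in r.text.lower() for t in terms) / len(terms)

def vector_score(qvec: list[float], r: Resource) -> float:
    return sum(a * b for a, b in zip(qvec, r.vector))  # assumes normalized vectors

def hybrid_search(query: str, qvec: list[float], resources: list[Resource], alpha: float = 0.5):
    """Blend keyword and semantic scores; alpha balances the two strategies."""
    scored = [
        (alpha * keyword_score(query, r) + (1 - alpha) * vector_score(qvec, r), r.rid)
        for r in resources
    ]
    return sorted(scored, reverse=True)

docs = [
    Resource("doc-1", "Nuclia processes unstructured data", [1.0, 0.0], [("isA", "platform")]),
    Resource("doc-2", "Traditional dances from Catalonia", [0.0, 1.0]),
]
print(hybrid_search("unstructured data", [1.0, 0.0], docs))
```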

I would like to know how much your product evolved after the success and deployment of large language models (LLMs). My understanding is that the rise of LLMs boosted the vector-based approach, where documents and queries are encoded as vectors or embeddings.

Before ChatGPT appeared, it was really difficult to explain what Nuclia was capable of. ChatGPT significantly helped in explaining the technology behind Nuclia. We were fully prepared with our stack back then, capable of delivering span-based answers using models trained with the SQuAD dataset. Our first HNSW implementation in Rust dates back to the early months of 2021. Once the power of these technologies was clearly demonstrated, it greatly boosted the business of semantic search and, later, LLMs.
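For readers unfamiliar with span-based answering, this is roughly what an extractive QA model trained on SQuAD does: it points at the span of the context that answers the question. The snippet uses the Hugging Face pipeline with an illustrative model choice, not Nuclia's own stack.

```python
# Sketch of span-based (extractive) QA with a SQuAD-trained model.
# The model choice is illustrative.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = (
    "Nuclia's first HNSW implementation in Rust dates back to the early "
    "months of 2021, before ChatGPT made semantic search easy to explain."
)
result = qa(question="When was the first HNSW implementation written?", context=context)
print(result["answer"], result["score"])  # the extracted span and its confidence
```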

Which type of companies / industries could get more value from Nuclia? 

Any company that wants to provide answers, generate synthetic data, or connect unstructured data to structured systems (like BI/SQL) will find Nuclia extremely valuable. It allows them to focus on their product’s value proposition and deliver fast, scalable results. We have clients building chatbots, clients who want tooling so their commercial agents can answer faster, clients who want to inject blood-analysis results into a database and generate reports from them, and clients analyzing videos to detect activity from the voice track and the captioning of the frames... unstructured data is everywhere, from a simple text to a video.

How does your solution deal with image, video, or audio formats? I recently saw that Netflix has developed its own approach based on contrastive learning.

Our approach is to convert all incoming data into language-based information and then transform it into vectors. Currently, our input comes from textual or JSON queries, and we are focused on that. Since NucliaDB supports multiple vector representations for the same information, we plan to include the ability to store vectors from visual and audio models in the future, allowing us to query them based on the input data.
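Here is a toy sketch of the "multiple vector representations per resource" idea: each resource keeps named vector sets, and a query is compared only against the set matching its modality. The layout, names, and dimensions are invented for illustration and do not reflect NucliaDB's actual storage.

```python
# Toy sketch: each resource keeps several named vector sets (text, visual, ...)
# and a query is routed to the set that matches the input modality.
# Purely illustrative; not NucliaDB's storage layout.
import numpy as np

resource_vectors = {
    "doc-1": {
        "text":   np.random.rand(384),   # e.g. a sentence-embedding of the transcript
        "visual": np.random.rand(512),   # e.g. an image-model embedding of key frames
    },
}

def search(query_vec: np.ndarray, vectorset: str, top_k: int = 5):
    """Compare the query only against the vector set of the same modality."""
    scores = []
    for rid, sets in resource_vectors.items():
        if vectorset in sets:
            v = sets[vectorset]
            score = float(query_vec @ v / (np.linalg.norm(query_vec) * np.linalg.norm(v)))
            scores.append((score, rid))
    return sorted(scores, reverse=True)[:top_k]

print(search(np.random.rand(384), "text"))
```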

As CTO of Nuclia, which are your main responsibilities and priorities? 

I lead the engineering, scientific, and product teams, and contribute to the technical sales teams to ensure our enterprise clients have the best experience and get the most out of our technology. At our stage, the connection between these responsibilities allows us to move quickly and adapt easily to new requirements. I am fortunate to have an excellent team that makes delivering Nuclia to our clients easier, more stable, and more scalable. This support enables me to manage my tasks effectively and still find time to code every week.

What is the stack you use for Data Management?

We are an event-sourcing-based company with our own scheduler, multi-cloud provider support, and private cloud deployments. The fact that we have our own open source database allows us to own our entire stack and know how to scale it.
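As a rough sketch of the event-sourcing pattern mentioned here (not Nuclia's actual implementation), state is rebuilt by replaying an append-only stream of events; the event names and fields are invented for illustration.

```python
# Minimal event-sourcing sketch: ingestion steps are recorded as an append-only
# stream of events, and the current state of a resource is rebuilt by replaying
# them. Event names and fields are invented for illustration.
from dataclasses import dataclass

@dataclass
class Event:
    resource_id: str
    kind: str        # e.g. "uploaded", "processed", "indexed"
    payload: dict

log: list[Event] = []          # in a real system: a durable, ordered stream

def append(event: Event) -> None:
    log.append(event)

def resource_state(resource_id: str) -> dict:
    """Replay the stream to reconstruct what we currently know about a resource."""
    state: dict = {}
    for e in log:
        if e.resource_id == resource_id:
            state[e.kind] = e.payload
    return state

append(Event("doc-1", "uploaded", {"filename": "report.pdf"}))
append(Event("doc-1", "processed", {"paragraphs": 42}))
print(resource_state("doc-1"))
```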

What’s the role of synthetic data in your offering?

We use synthetic data in lots of different ways: we provide the ability to extract synthetic data at ingestion time and store it in a separate field for later use, connect it to an external notification system, or use it to fine-tune an adapter for your open source LLM.
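A minimal sketch of the "extract at ingestion, store in a separate field" pattern; `summarize` is a stand-in for any LLM call and the field names are invented for the example.

```python
# Sketch: generate synthetic data (here, a summary) at ingestion time and keep
# it in a separate field next to the extracted text. `summarize` stands in for
# a real LLM call; field names are illustrative.
def summarize(text: str) -> str:
    # Placeholder for a real LLM call (any provider or a local model).
    return text[:120] + "..."

def ingest(resource_id: str, raw_text: str, store: dict) -> None:
    store[resource_id] = {
        "extracted_text": raw_text,                 # what the processor pulled out
        "synthetic_summary": summarize(raw_text),   # generated at ingestion time
    }

store: dict = {}
ingest("doc-1", "Quarterly results show revenue growth of 14% driven by new enterprise deals. " * 3, store)
print(store["doc-1"]["synthetic_summary"])  # available later for search, notifications or fine-tuning
```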

I recently read an article by Ben Evans about how Apple seems to be approaching Generative AI and LLMs. According to Evans, Apple considers generative AI, and ChatGPT itself, a commodity technology that is most useful when “1) Is embedded in a system that gives it broader context about the user (which might be search, social, a device OS, or a vertical application and 2) Unbundled into individual features (ditto), which are inherently easier to run as small power-efficient models on small power-efficient devices on the edge (paid for by users, not your capex budget) - which is just as well, because… 3) This stuff will never work for the mass-market if we have marginal cost every time the user presses ‘OK’ and we need a fleet of new nuclear power-stations to run it all”. This is an interesting angle on the debate about “A SuperIntelligence Solution to rule them all”. What do you think about it?

I agree with Apple’s concept of developing robust reasoning engines that can understand RAG (Retrieval-Augmented Generation) and connect them to specific use cases. This approach is more sustainable, secure, and scalable. While the idea of general intelligence is intriguing, I am uncertain about its usefulness and reliability. Like any significant advancement in human history, we need to find a balanced approach that truly benefits us. In my opinion, the major challenge lies in creating reliable models focused on reasoning and language understanding, with memory and knowledge served as a sidecar with RAG ensuring security, transparency, and traceability.

Looking ahead, how do you see RAG evolving in the next 3-5 years? And Artificial Intelligence? 

I foresee RAG becoming the dynamic memory and real-time environment that serves as the source of truth for facts used in reasoning. In physical world interactions, it will help understand the surroundings; in knowledge management, it will handle information within a product or company; in medicine, it will manage human medical records; and in law and government, it will oversee active rules and laws.

Currently, at Nuclia, RAG can connect an SQL database with a set of documents to enhance the reasoning process using any useful information. In a few weeks, RAG will enable the creation of step-by-step models that define where and how to implement RAG. In a few months, security and routing will be integrated into the process to control access to data. Ultimately, these models—comprising multiple specialized models—will combine with the data to create an AI agent capable of interacting with the real world, incorporating new data, and being specific to certain domains.
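A rough sketch of what combining an SQL source with retrieved document chunks into one RAG prompt can look like; the table, the retrieval stub, and the prompt format are illustrative assumptions, not Nuclia's implementation.

```python
# Sketch: merge structured rows from SQL with retrieved document chunks into a
# single augmented prompt. Table, columns and the retrieval stub are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EMEA", 120.0), ("APAC", 95.5)])

def retrieve_chunks(question: str) -> list[str]:
    # Placeholder for semantic retrieval over the document index.
    return ["The EMEA region launched two new products in Q2."]

def build_prompt(question: str) -> str:
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    sql_context = "\n".join(f"{region}: {amount}" for region, amount in rows)
    doc_context = "\n".join(retrieve_chunks(question))
    return (
        f"Structured data:\n{sql_context}\n\n"
        f"Documents:\n{doc_context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("Why did EMEA outperform APAC?"))  # the prompt then goes to the LLM
```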

What comes later, beyond one year, is impossible to predict. The investments in AI are enormous, and we are only at the beginning of this journey.
