The first time we read the documentation on conversational analytics, several thoughts came to mind. Naturally, your initial reaction might be either excitement or skepticism.

“Conversational Analytics relies on your LookML models to understand how to query your data.” According to Google, if the LookML expert hasn't properly constructed the semantic layer, the responses generated by Gemini could be inaccurate.

Diagram illustrating how a "semantic layer" and "quality data" feed into "Gen AI," represented by a happy emoji, symbolizing that structured data and clear definitions are essential for effective AI-generated insights.


Moreover, Google explicitly states that “as an early-stage technology, Gemini in Looker can generate output that seems plausible but is factually incorrect.” This means end users should not automatically trust the results provided by Gemini.

For this reason, it’s crucial to validate the results, but with whom? The most obvious answer is, of course, the data analyst.

So, you might wonder: what's the point of having a conversational analytics tool if, in the end, a data analyst needs to verify the results (and will probably run the query themselves anyway)? AI should be seen as a co-pilot, not an auto-pilot: it is here to work WITH us, not to replace us. Generative AI is powerful, but its real value lies in supporting and boosting human decision-making.

Testing with a BigQuery table

We decided to start with a simple test:

→ We connected a table from BigQuery (without using any semantic layer) to see what kind of information and analysis it could provide. The table contained aggregated data on traffic to Astrafy's website.

First, it suggested some relevant questions! This aligns with the documentation, which mentions the “suggested questions” feature. However, depending on a company's data maturity, something as basic as the number of unique sessions should already be displayed in a global traffic dashboard built by a data analyst. For those unfamiliar with traffic data, a “session” corresponds to a user's visit to a website.

A sample prompt list titled "What questions can I ask?" with example queries for data analysis, such as finding unique sessions, user IDs, top sub-continents by user count, and channel grouping with the highest engagement.

Since this tool is designed for business users and not technical experts, we decided to ask questions from a non-technical perspective.

What is the number of sessions in the last 3 months compared to the previous 3 months?

A session trend analysis displaying the number of website sessions in the last three months (12,300) compared to the previous three months (10,541). The calculation details include date ranges and data columns used, highlighting steps taken for clarity and transparency in trend comparison.

Where are the problems?

We immediately ran into issues with these business-related questions. The tool defaulted to a bar chart, which wasn't appropriate, so we had to switch to a table. The bigger problem, however, lay in the calculation logic. We asked for the number of sessions, but instead of a clear answer, it calculated the “sum of date_date” (date_date is the field holding the date of a session). What if we had duplicate rows? As a data analyst, your instinct would be to use a distinct count of the session field rather than summing a date field, which makes no sense in this context.

In fact, in this table, the session isn't even the primary key. This is a clear red flag: business users might get results, but if they dig deeper into the calculations, they'll find mistakes.

After running our own accurate query, it turns out that the answer given by Conversational Analytics was incorrect. And the question wasn’t that difficult. Here’s the correct result for the last three months:

SQL query result showing a distinct count of sessions over the last three months, yielding 10,570 sessions. This count is derived from the session_id field in the dm_sessions_pages table, with a date filter applied to the last three months.
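For reference, this is a sketch of the kind of query we ran ourselves. The table and field names (dm_sessions_pages, session_id, date_date) come from our screenshots; adapt them to your own schema, and note the distinct count on the session identifier rather than any aggregation of the date field:

```sql
-- Distinct count of sessions over the last three months.
-- Table and field names are taken from our example dataset;
-- they are illustrative, not a universal schema.
SELECT
  COUNT(DISTINCT session_id) AS nb_sessions
FROM
  `dm_sessions_pages`
WHERE
  date_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH);
```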

Testing with a LookML model

→ Next, we tested using a LookML model connected to the same table. In this model, each field is documented with descriptions, and we added predefined measures to facilitate the visualization process. This provides additional context for generating more accurate and meaningful insights.
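To give an idea of what this looks like, here is a minimal LookML sketch with documented fields and a predefined session measure. The field names and descriptions are illustrative examples, not our exact production model:

```lookml
# Illustrative sketch of a documented LookML view.
# Names (dm_sessions_pages, session_id, date_date) follow our
# earlier example table; adapt them to your own schema.
view: dm_sessions_pages {
  sql_table_name: `analytics.dm_sessions_pages` ;;

  dimension: session_id {
    description: "Unique identifier of a website session."
    type: string
    sql: ${TABLE}.session_id ;;
  }

  dimension_group: session {
    description: "Date on which the session took place."
    type: time
    timeframes: [date, week, month]
    sql: ${TABLE}.date_date ;;
  }

  measure: nb_sessions {
    description: "Distinct count of sessions."
    type: count_distinct
    sql: ${session_id} ;;
  }
}
```

With descriptions on every field and a predefined `nb_sessions` measure, the model itself tells Gemini that counting sessions means a distinct count, instead of leaving it to guess an aggregation.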

We asked the same question, phrased the same way, to keep things consistent.

What is the number of sessions in the last 3 months compared to the previous 3 months?

A session trend analysis shows the number of sessions from August to October 2024, grouped as the "First Period," totaling 8,500 sessions. However, data for the previous three-month period is missing, affecting the trend comparison. Detailed calculation notes indicate filtering and grouping settings used in the query.

The result was better, but still not quite what we wanted. Instead of using the same date ranges, it selected data from 2024-08-01 to 2024-10-31, and it couldn't calculate the previous three months as we had requested. That said, the result for the selected date range was correct:

SQL query result showing a distinct session count for dates between August 1 and October 31, 2024, with a total of 8,646 sessions. This query filters by date to calculate session totals over a specified timeframe.
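The comparison we actually wanted can be expressed as two aligned three-month windows in a single query. This is a sketch under the same assumed table and field names as before, not the query Gemini generated:

```sql
-- Sessions in the last three months vs. the three months before.
-- Table and field names are illustrative assumptions.
SELECT
  CASE
    WHEN date_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)
      THEN 'last_3_months'
    ELSE 'previous_3_months'
  END AS period,
  COUNT(DISTINCT session_id) AS nb_sessions
FROM
  `dm_sessions_pages`
WHERE
  date_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH)
GROUP BY
  period;
```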

Semantic Layers to the rescue

Semantic layers translate data structured for computers into language that humans can understand. It's that simple, and it's exactly what an LLM needs to perform well. Throwing raw datasets and tables at an LLM has proven to be a terrible idea that leads to poor results (see the simple demo above).

Strong data modeling, with a semantic layer that thoroughly defines your metrics, is the only way forward to enable generative AI on your data. Strong data-modeling foundations coupled with a semantic layer will support all your data initiatives, not only the AI ones: your BI dashboards will benefit from the semantic layer too. Gen AI, however, will accelerate the adoption of semantic layers, and for the better. With a well-built data model that defines business terms and semantics, your LLMs and AI tools will be able to share data-related information correctly.

 A flowchart showing the process of querying through a semantic model. The steps include defining metrics in the semantic model, sending a query from the client (e.g., BI tool), translating it to SQL by the semantic query engine, retrieving data from the data warehouse, and delivering results back to the client.

Benefits for business users:

  • More trust in data: the same results, every time.

  • Freedom to choose: Use any integrated analytics tool and let the semantic layer get the data for you.

  • More data equality: everyone, technical or not, has access to metrics.

Benefits for the data team:

  • Improved governance: Metrics are defined centrally in code, and changes are logged

  • See metric lineage: Anticipate downstream impact with a central view

  • Increased efficiency: Spend less time re-defining and troubleshooting metrics

Gemini in Looker

One point of confusion for many users is that this feature is currently available only in Looker Studio. But what about Looker itself? According to the documentation on Gemini in Looker, the AI assistant is available exclusively for Looker Core, and its functionality is more focused on generating visualization charts or assisting with LookML modeling, rather than providing conversational analytics. This means that, for now, users need to rely on extensions like the Looker Explore Assistant to achieve something similar to Conversational Analytics, as it is not yet natively integrated into Looker.

A table listing Gemini's features in Looker, detailing where users can access each feature (e.g., for Looker visualizations, modeling parameters, data queries), and instructions for enabling these functions. The table specifies areas in Looker Studio and Google Cloud Core where these features are available and configurable.
https://cloud.google.com/gemini/docs/looker/overview?hl=en

However, Google Cloud is deploying Gemini across all its products, and Looker is no exception. While the feature is still in its early stages, its future looks bright, and everyone should embrace it rather than fight it. It will continue to evolve at a fast pace, and we will keep you updated via articles and LinkedIn posts as it does.

Conclusion

In our quick and simple test of the Conversational Analytics feature in Looker, it's clear that the tool is still a work in progress. There's a lot of room for improvement before it can be used independently and reliably for day-to-day business analysis.

If you want to use it effectively, your LookML models need to be specifically designed for such tasks, avoiding unnecessary complexity so that non-technical users can ask questions and get the most accurate results. However, even then, errors can occur if a question is phrased differently than expected.

This is why Conversational Analytics must be used cautiously, and why data analysts will always be needed to validate results with a simple SQL query 😉.

Before jumping into AI, you have to get your data right. Without solid data pipelines, even the best AI solutions won't deliver (garbage in, garbage out). This is our focus at Astrafy: building the foundations so AI can truly thrive.

Thank you

If you enjoyed reading this article, stay tuned as we regularly publish articles on Looker. Follow Astrafy on LinkedIn, Medium, and YouTube to be notified of the next article.

If you are looking for support on Modern Data Stack or Google Cloud solutions, feel free to reach out to us at sales@astrafy.io.