No More Tableau Downtime: Metadata API for Proactive Data Health

In today's world, organizations rely heavily on data-driven decisions. When we build dashboards and reports, everyone expects the numbers shown in them to be correct and up to date. Based on these numbers, insights are drawn and actions are taken. If for any unforeseen reason a dashboard is broken or the numbers are wrong, it turns into a fire-fight to fix everything. If the issues are not fixed in time, it damages the trust placed in the data team and their solutions.
But why would dashboards break or show wrong numbers? If the dashboard was built correctly in the first place, then 99% of the time the issue comes from the data that feeds the dashboards – from the data warehouse. Some possible scenarios are:
- A few ETL pipelines failed, so the new data is not yet in
- A table is replaced with another new one
- Some columns in the table are dropped or renamed
- Schemas in the data warehouse have changed
- And many more.
There is still a chance that the issue is on the Tableau side, but in my experience, most of the time it is due to some changes in the data warehouse. Even when we know the root cause, it's not always straightforward to start working on the fix. There is no central place where you can check which Tableau data sources rely on specific tables. If you have the Tableau Data Management add-on, it could help, but from what I know, it's hard to find dependencies of custom SQL queries used in data sources.
Nevertheless, the add-on is quite expensive and most companies don't have it. The real pain begins when you have to go through all the data sources manually to start fixing it. On top of that, you have users on your head impatiently waiting for a quick fix. The fix itself might not be difficult; it would just be time-consuming.
What if we could anticipate these issues and identify impacted data sources before anyone notices a problem? Wouldn't that just be great? Well, now there is a way with the Tableau Metadata API. The Metadata API uses GraphQL, a query language for APIs that returns only the data that you're interested in. For more info on what's possible with GraphQL, check out GraphQL.org.
In this blog post, I'll show you how to connect to the Tableau Metadata API using the Tableau Server Client (TSC) Python library to proactively identify data sources that use specific tables, so that you can act fast before any issues arise. Once you know which Tableau data sources are affected by a specific table, you can make some updates yourself or alert the owners of those data sources about the upcoming changes.
Connecting to the Tableau Metadata API
Let's connect to the Tableau Server using TSC. We need to import all the libraries we would need for the exercise!
### Import all required libraries
import tableauserverclient as t
import pandas as pd
import json
import ast
import re
In order to connect to the Metadata API, you will first have to create a personal access token in your Tableau account settings. Then update the <token-name> and <token-value> placeholders below with the token you just created, and update <site-id> with your Tableau site. If the connection is established successfully, then "Connected" will be printed in the output window.
### Connect to Tableau server using personal access token
tableau_auth = t.PersonalAccessTokenAuth("<token-name>", "<token-value>",
                                         site_id="<site-id>")
server = t.Server("https://<your-tableau-server-url>", use_server_version=True)
with server.auth.sign_in(tableau_auth):
    print("Connected")
Now let's get a list of all the data sources published on your site. There are many attributes you can fetch, but for the current use case, let's keep it simple and only get the id, name, and owner contact information for every data source. This will be our master list to which we will add all the other information.
############### Get all the list of data sources on your Site
all_datasources_query = """ {
publishedDatasources {
name
id
owner {
name
email
}
}
}"""
with server.auth.sign_in(tableau_auth):
    result = server.metadata.query(
        all_datasources_query
    )
As I want this blog to focus on how to identify which data sources are affected by a particular table, I won't go into the nuances of the Metadata API. To better understand how the query works, you can refer to the very detailed Tableau Metadata API documentation.
One thing to note is that the Metadata API returns the data in JSON format. Depending on what you query, you'll end up with multiple nested JSON lists, and it can get quite tricky to convert this into a pandas dataframe. For the above metadata query, you will end up with a result that looks like the one below (this is mock data, just to give you an idea of what the output looks like):
{
"data": {
"publishedDatasources": [
{
"name": "Sales Performance DataSource",
"id": "f3b1a2c4-1234-5678-9abc-1234567890ab",
"owner": {
"name": "Alice Johnson",
"email": "[email protected]"
}
},
{
"name": "Customer Orders DataSource",
"id": "a4d2b3c5-2345-6789-abcd-2345678901bc",
"owner": {
"name": "Bob Smith",
"email": "[email protected]"
}
},
{
"name": "Product Returns and Profitability",
"id": "c5e3d4f6-3456-789a-bcde-3456789012cd",
"owner": {
"name": "Alice Johnson",
"email": "[email protected]"
}
},
{
"name": "Customer Segmentation Analysis",
"id": "d6f4e5a7-4567-89ab-cdef-4567890123de",
"owner": {
"name": "Charlie Lee",
"email": "[email protected]"
}
},
{
"name": "Regional Sales Trends (Custom SQL)",
"id": "e7a5f6b8-5678-9abc-def0-5678901234ef",
"owner": {
"name": "Bob Smith",
"email": "[email protected]"
}
}
]
}
}
We need to convert this JSON response into a dataframe so that it is easy to work with. Notice that we need to extract the name and email of the owner from inside the nested owner object.
### We need to convert the response into dataframe for easy data manipulation
col_names = result['data']['publishedDatasources'][0].keys()
master_df = pd.DataFrame(columns=col_names)
for i in result['data']['publishedDatasources']:
    tmp_dt = {k: v for k, v in i.items()}
    master_df = pd.concat([master_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])
# Extract the owner name and email from the owner object
master_df['owner_name'] = master_df['owner'].apply(lambda x: x.get('name') if isinstance(x, dict) else None)
master_df['owner_email'] = master_df['owner'].apply(lambda x: x.get('email') if isinstance(x, dict) else None)
master_df.reset_index(inplace=True)
master_df.drop(['index','owner'], axis=1, inplace=True)
print('There are ', master_df.shape[0] , ' datasources in your site')
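As a side note, pandas can also flatten the nested owner object in a single call. Below is a minimal alternative sketch that should give the same columns as the loop above; alt_df is just an illustrative name, and it assumes the same result variable returned by the query:
# Alternative: flatten the nested JSON response in one step with pandas.json_normalize
alt_df = pd.json_normalize(result['data']['publishedDatasources'])
# owner.name / owner.email come out as flat columns, so just rename them
alt_df = alt_df.rename(columns={'owner.name': 'owner_name', 'owner.email': 'owner_email'})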
This is how the structure of master_df looks:
Now that we have the master list ready, we can go ahead and start getting the names of the tables embedded in the data sources. If you are an avid Tableau user, you know that there are two ways of selecting tables in a Tableau data source – one is to directly choose the tables and establish a relationship between them, and the other is to use a custom SQL query with one or more tables to achieve a new resultant table. Therefore, we need to address both cases.
Processing custom SQL query tables
Below is the query to get the list of all custom SQL queries used on the site along with their data sources. Notice that I have restricted the list to the first 500 custom SQL queries. In case there are more in your org, you will have to use an offset to get the next set of results. There is also the option of using the cursor method when you want to fetch a large list of results (see here). For the sake of simplicity, I just use the offset method, as I know there are fewer than 500 custom SQL queries used on the site (a minimal sketch of the offset approach follows the query results below).
# Get the data sources and the table names from all the custom sql queries used on your Site
custom_table_query = """ {
customSQLTablesConnection(first: 500){
nodes {
id
name
downstreamDatasources {
name
}
query
}
}
}
"""
with server.auth.sign_in(tableau_auth):
    custom_table_query_result = server.metadata.query(
        custom_table_query
    )
Based on our mock data, this is how the result would look:
{
"data": {
"customSQLTablesConnection": {
"nodes": [
{
"id": "csql-1234",
"name": "RegionalSales_CustomSQL",
"downstreamDatasources": [
{
"name": "Regional Sales Trends (Custom SQL)"
}
],
"query": "SELECT r.region_name, SUM(s.sales_amount) AS total_sales FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id GROUP BY r.region_name"
},
{
"id": "csql-5678",
"name": "ProfitabilityAnalysis_CustomSQL",
"downstreamDatasources": [
{
"name": "Product Returns and Profitability"
}
],
"query": "SELECT p.product_category, SUM(s.profit) AS total_profit FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Products p ON s.product_id = p.product_id GROUP BY p.product_category"
},
{
"id": "csql-9101",
"name": "CustomerSegmentation_CustomSQL",
"downstreamDatasources": [
{
"name": "Customer Segmentation Analysis"
}
],
"query": "SELECT c.customer_id, c.location, COUNT(o.order_id) AS total_orders FROM ecommerce.sales_data.Customers c JOIN ecommerce.sales_data.Orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.location"
},
{
"id": "csql-3141",
"name": "CustomerOrders_CustomSQL",
"downstreamDatasources": [
{
"name": "Customer Orders DataSource"
}
],
"query": "SELECT o.order_id, o.customer_id, o.order_date, o.sales_amount FROM ecommerce.sales_data.Orders o WHERE o.order_status = 'Completed'"
},
{
"id": "csql-3142",
"name": "CustomerProfiles_CustomSQL",
"downstreamDatasources": [
{
"name": "Customer Orders DataSource"
}
],
"query": "SELECT c.customer_id, c.customer_name, c.segment, c.location FROM ecommerce.sales_data.Customers c WHERE c.active_flag = 1"
},
{
"id": "csql-3143",
"name": "CustomerReturns_CustomSQL",
"downstreamDatasources": [
{
"name": "Customer Orders DataSource"
}
],
"query": "SELECT r.return_id, r.order_id, r.return_reason FROM ecommerce.sales_data.Returns r"
}
]
}
}
}
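If your site does have more than 500 custom SQL queries, you can page through the results instead of making a single call. The snippet below is a minimal sketch of the offset approach mentioned above; it assumes customSQLTablesConnection accepts an offset argument alongside first (switch to the cursor method if your server version does not), and page_size / all_nodes are just illustrative names.
# Sketch: page through custom SQL tables using first/offset (assumed supported; see the pagination docs)
page_size = 500
offset = 0
all_nodes = []
with server.auth.sign_in(tableau_auth):
    while True:
        paged_query = """ {
          customSQLTablesConnection(first: """ + str(page_size) + """, offset: """ + str(offset) + """){
            nodes {
              id
              name
              downstreamDatasources {
                name
              }
              query
            }
          }
        }"""
        page_result = server.metadata.query(paged_query)
        nodes = page_result['data']['customSQLTablesConnection']['nodes']
        all_nodes.extend(nodes)
        if len(nodes) < page_size:  # last page reached
            break
        offset += page_size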
Just like before, when we created the master list of data sources, here too we have nested JSON for the downstream data sources, from which we only need to extract the "name" part. In the "query" column, the entire custom SQL is dumped. Using a regex pattern, we can easily search for the names of the tables used in the query.
We know that the table names always come after a FROM or a JOIN clause and they usually follow the format database.schema.table. The database part is sometimes optional and most of the time not used. There were a few queries I came across that used this format, and for those I ended up getting only the database and schema names, not the full table name. Once we have extracted the names of the data sources and the names of the tables, we need to merge the rows per data source, as there can be multiple custom SQL queries used in a single data source.
### Convert the custom sql response into dataframe
col_names = custom_table_query_result['data']['customSQLTablesConnection']['nodes'][0].keys()
cs_df = pd.DataFrame(columns=col_names)
for i in custom_table_query_result['data']['customSQLTablesConnection']['nodes']:
    tmp_dt = {k: v for k, v in i.items()}
    cs_df = pd.concat([cs_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])
# Extract the data source name where the custom sql query was used
cs_df['data_source'] = cs_df.downstreamDatasources.apply(lambda x: x[0]['name'] if x and 'name' in x[0] else None)
cs_df.reset_index(inplace=True)
cs_df.drop(['index','downstreamDatasources'], axis=1,inplace=True)
### We need to extract the table names from the sql query. We know the table name comes after FROM or JOIN clause
# Note that the name of the table can be of the format <database>.<schema>.<table>
# Depending on the format of how table is called, you will have to modify the regex expression
def extract_tables(sql):
    # Regex to match database.schema.table or schema.table, avoiding aliases
    pattern = r'(?:FROM|JOIN)\s+((?:\[\w+\]|\w+)\.(?:\[\w+\]|\w+)(?:\.(?:\[\w+\]|\w+))?)\b'
    matches = re.findall(pattern, sql, re.IGNORECASE)
    return list(set(matches))  # Unique table names
cs_df['customSQLTables'] = cs_df['query'].apply(extract_tables)
cs_df = cs_df[['data_source','customSQLTables']]
# We need to merge datasources as there can be multiple custom sqls used in the same data source
cs_df = cs_df.groupby('data_source', as_index=False).agg({
    'customSQLTables': lambda x: list(set(item for sublist in x for item in sublist))  # Flatten & make unique
})
print('There are ', cs_df.shape[0], 'datasources with custom sqls used in it')
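To sanity-check the regex, you can run extract_tables on one of the mock custom SQL queries shown earlier; with that input it should return the two joined tables:
# Quick check of extract_tables on one of the mock custom SQL queries
sample_sql = ("SELECT r.region_name, SUM(s.sales_amount) AS total_sales "
              "FROM ecommerce.sales_data.Sales s "
              "JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id "
              "GROUP BY r.region_name")
print(extract_tables(sample_sql))
# ['ecommerce.sales_data.Sales', 'ecommerce.sales_data.Regions'] (order may vary)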
After we perform all the above operations, this is how the structure of cs_df would look:

Processing of regular tables in data sources
Now we need to get the list of all the regular tables used in a data source that are not part of custom SQL. There are two ways to go about it: either use the publishedDatasources object and check its upstreamTables, or use databaseTables and check the data sources downstream of them (a minimal sketch of this alternative is shown below for reference). I'll go with the first method because I want the results at a data source level (basically, I want some code ready to reuse when I want to check a specific data source in further detail). Here again, for the sake of simplicity, instead of going for pagination I loop through each data source to make sure I have everything. We get the upstreamTables nested inside the fields object, so that will have to be cleaned out.
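For reference, a minimal sketch of the table-first alternative could look like the query below. The field names (databaseTables, fullName, downstreamDatasources) reflect my reading of the Metadata API schema rather than anything used later in this walkthrough, so verify them in your server's GraphiQL explorer first.
# Hypothetical alternative: start from the tables and list the data sources downstream of them
tables_first_query = """ {
  databaseTables {
    name
    schema
    fullName
    downstreamDatasources {
      name
    }
  }
}"""
with server.auth.sign_in(tableau_auth):
    tables_first_result = server.metadata.query(tables_first_query)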
############### Get the data sources with the regular table names used in your site
### It's best to extract the tables information for every data source and then merge the results.
# Since we only get the table information nested under fields, in case there are hundreds of fields
# used in a single data source, we will hit the response limits and will not be able to retrieve all the data.
data_source_list = master_df.name.tolist()
col_names = ['name', 'id', 'extractLastUpdateTime', 'fields']
ds_df = pd.DataFrame(columns=col_names)
with server.auth.sign_in(tableau_auth):
    for ds_name in data_source_list:
        query = """ {
          publishedDatasources (filter: { name: \"""" + ds_name + """\" }) {
            name
            id
            extractLastUpdateTime
            fields {
              name
              upstreamTables {
                name
              }
            }
          }
        } """
        ds_name_result = server.metadata.query(
            query
        )
        for i in ds_name_result['data']['publishedDatasources']:
            tmp_dt = {k: v for k, v in i.items() if k != 'fields'}
            # Keep the nested fields object as a JSON string for now; we will flatten it later
            tmp_dt['fields'] = json.dumps(i['fields'])
            ds_df = pd.concat([ds_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])
ds_df.reset_index(inplace=True)
This is how the structure of ds_df would look:

We now need to flatten out the fields object and extract the field names as well as the table names out of it. As the table names will repeat multiple times, we will have to deduplicate them to keep only the unique ones.
# Function to extract the values of fields and upstream tables in json lists
def extract_values(json_list, key):
    values = []
    for item in json_list:
        values.append(item[key])
    return values
ds_df["fields"] = ds_df["fields"].apply(ast.literal_eval)
ds_df['field_names'] = ds_df.apply(lambda x: extract_values(x['fields'],'name'), axis=1)
ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_values(x['fields'],'upstreamTables'), axis=1)
# Function to extract the unique table names
def extract_upstreamTable_values(table_list):
    values = set()
    for inner_list in table_list:
        for item in inner_list:
            if 'name' in item:
                values.add(item['name'])
    return list(values)
ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_upstreamTable_values(x['upstreamTables']), axis=1)
ds_df.drop(["index","fields"], axis=1, inplace=True)
Once we perform all the above operations, the final structure of ds_df would look something like this:

We have all the pieces and now we just have to merge them together:
###### Join all the data together
master_data = pd.merge(master_df, ds_df, how="left", on=["name","id"])
master_data = pd.merge(master_data, cs_df, how="left", left_on="name", right_on="data_source")
# Save the results to analyse further
master_data.to_excel("Tableau Data Sources with Tables.xlsx", index=False)
This is our final master_data:

Table-level impact analysis
Let's say there were some schema changes on the "Sales" table and you want to know which data sources will be impacted. Then you can simply write a small function that checks if the table is present in either of the two columns – upstreamTables or customSQLTables – as below.
def filter_rows_with_table(df, col1, col2, target_table):
    """
    Filters rows in df where target_table is part of any value in either col1 or col2 (supports partial match).
    Returns full rows (all columns retained).
    """
    return df[
        df.apply(
            lambda row:
                (isinstance(row[col1], list) and any(target_table in item for item in row[col1])) or
                (isinstance(row[col2], list) and any(target_table in item for item in row[col2])),
            axis=1
        )
    ]
# As an example
filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')
Below is the output. You can see that 3 data sources will be impacted by this change. You can also alert the data source owners, Alice and Bob, in advance so they can start working on a fix before anything breaks on the Tableau dashboards.

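One thing to keep in mind is that the function does a partial, substring match, so searching for "Sales" would also flag tables like Sales_Targets if they existed. If that ever gets noisy, a stricter variant (a hypothetical helper, not part of the repository code) could compare only the last part of the fully qualified name:
# Hypothetical stricter variant: exact match on the last part of database.schema.table
def filter_rows_with_exact_table(df, col1, col2, target_table):
    def has_exact(cell):
        return isinstance(cell, list) and any(
            item.split('.')[-1].lower() == target_table.lower() for item in cell
        )
    return df[df.apply(lambda row: has_exact(row[col1]) or has_exact(row[col2]), axis=1)]

filter_rows_with_exact_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')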
You can check out the complete version of the code in my GitHub repository here.
This is just one of the use cases of the Tableau Metadata API. You can also extract the field names used in custom SQL queries and add them to the dataset to get a field-level impact analysis. One can also monitor stale data sources with the extractLastUpdateTime field to see whether they have issues or should be archived if they are no longer being used. We can also use the dashboards object to fetch information at a dashboard level.
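As a quick illustration of the stale data source idea, here is a minimal sketch that flags data sources whose extract has not been refreshed in the last 30 days. It assumes master_data still holds the extractLastUpdateTime column we pulled earlier, and the 30-day threshold is just an example:
# Flag potentially stale data sources: extracts not refreshed in the last 30 days (threshold is illustrative)
master_data['extractLastUpdateTime'] = pd.to_datetime(
    master_data['extractLastUpdateTime'], errors='coerce', utc=True
)
cutoff = pd.Timestamp.now(tz='UTC') - pd.Timedelta(days=30)
stale_ds = master_data[master_data['extractLastUpdateTime'] < cutoff]
print(stale_ds[['name', 'owner_name', 'extractLastUpdateTime']])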
Final thoughts
If you have come this far, kudos to you. This is just one use case of automating Tableau data management. It's time to reflect on your own work and think about which of those other tasks you could automate to make your life easier. I hope this mini-project served as an enjoyable learning experience in understanding the power of the Tableau Metadata API. If you liked reading this, you might also like another one of my blog posts on Tableau, about some of the challenges I faced along the way.
Also check out my previous blog, where I explored building an interactive, database-powered app with Python and SQLite.
Before you go …
Follow me so you don't miss any new posts I write in the future; you will find more of my articles on my profile page. You can also connect with me on LinkedIn!