In this blog post I show you the M-Python function that I use to export data from Power BI to csv files (Export Python).
Why Python?
I prefer it to R mostly because I don’t have to create the csv file(names) in advance before exporting data to them. This is particularly important for scenarios where I want to append data to an existing file. The key for this task is NOT to use the append option that Python offers, because M scripts will be executed multiple times and repeated appends would create a total mess in my file. Instead, I write the content to append into a new file and use the import-from-folder method to stitch all csvs back together. Therefore I have to dynamically create a new filename for each export. So when the M-Python scripts are executed repeatedly, the newly created file will just be overwritten, which doesn’t do any harm.
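To illustrate the idea outside of Power BI, here is a minimal Python sketch (folder and filenames are made up for the example): each run writes its own uniquely named file in overwrite mode, and a final import-from-folder step stitches everything back together.

```python
import glob
import os
import tempfile

import pandas as pd

folder = tempfile.mkdtemp()

# Simulate two separate script executions; each writes its own file
# in overwrite mode ('w') instead of appending to a shared file.
for run, rows in enumerate([{"a": [1, 2]}, {"a": [3, 4]}]):
    pd.DataFrame(rows).to_csv(
        os.path.join(folder, "export_{}.csv".format(run)), index=False
    )

# "Import from folder": read every csv and stitch them back together.
stitched = pd.concat(
    (pd.read_csv(f) for f in sorted(glob.glob(os.path.join(folder, "*.csv")))),
    ignore_index=True,
)
print(stitched["a"].tolist())  # [1, 2, 3, 4]
```

Because every run overwrites its own file, repeated executions are harmless, which is exactly why the append mode is avoided.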
Caveats
- As with R, date formats will not be recognized correctly when the file is imported into other engines, so my function transforms all columns to text before passing the data frame to the Python script.
- Whole numbers (like 10) will be converted to decimals (10.0) by the Python engine. When you import this file back to Power BI and transform the column to number, this decimal place will disappear again. But if you want to use the data somewhere else, you should be aware of this.
- Leading zeroes (“0001”) will be deleted (“1”). I haven’t found a way around this yet, apart from prefixing the value with a string like a single quote (') and removing that when re-importing.
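The leading-zero caveat is easy to reproduce in plain pandas (a sketch with made-up values): the zeroes survive the export itself, but a default re-import parses the column as numbers, while forcing text on import preserves them.

```python
import io

import pandas as pd

df = pd.DataFrame({"code": ["0001", "0042"]})
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Default import: pandas infers an integer column, the zeroes are gone.
buffer.seek(0)
as_number = pd.read_csv(buffer)
print(as_number["code"].tolist())  # [1, 42]

# Importing the column as text preserves the leading zeroes.
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype=str)
print(as_text["code"].tolist())  # ['0001', '0042']
```

This is analogous to explicitly re-typing the column as text when re-importing into Power Query.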
Function code
let func =
  (SOURCE as table, PATH as text, optional mode as text, optional index as logical, optional header as logical, optional quoting as text, optional chunksize as number, optional decimal_as_point as text) =>
  let
    Source = Table.TransformColumnTypes(SOURCE, List.Transform(Table.ColumnNames(SOURCE), each {_, type text})),
    Path = PATH,
    index = if index = null then "None" else "True",
    header = if header = null then "True" else "False",
    quoting = if quoting = null then "csv.QUOTE_ALL" else quoting,
    chunksize = if chunksize = null then "None" else Number.ToText(chunksize),
    //Edit by RonaiBertalan (https://gist.github.com/RonaiBertalan):
    decimal_as_point = if decimal_as_point = null then "'.'" else "','",
    CleanedPath = Text.Replace(Path, "\", "/"),
    Custom1 = Python.Execute("# 'dataset' holds the input data for this script#(lf)#(lf)import csv#(lf)import pandas as pd#(lf)path = r'" & CleanedPath & "'#(lf)#(lf)dataset.to_csv(path, mode = 'w', index = " & index & ", header = " & header & ", quoting = " & quoting & ", chunksize = " & chunksize & ", decimal = " & decimal_as_point & ")", [dataset=Source])
  in
    Custom1,
documentation = [
  Documentation.Name = " Table.ExportCsvPython.pq ",
  Documentation.Description = " Exports table to csv using Python-script. ",
  Documentation.LongDescription = " Exports table to csv using Python-script. Defaults to: index-None, header-True, quoting-csv.QUOTE_ALL, chunksize-None. ",
  Documentation.Category = " Table ",
  Documentation.Source = " ",
  Documentation.Version = " 1.1: QUOTE_ALL ",
  Documentation.Author = " Imke Feldmann: www.TheBIccountant.com. ",
  Documentation.Examples = {[Description = " ",
    Code = " ",
    Result = " "]}]
in
  Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))
I’ve included some optional parameters, with the following default values:
- index: None, which means that no additional index column will be created during export. Any value you pass in here will cause an index column to be created.
- header: True, which means that by default a header row will be written. Again, any value passed to this parameter makes the header disappear.
- quoting: csv.QUOTE_ALL, but you can use the other quoting options as well (see the docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html).
- chunksize: None, but you can pass any number to chunk up the export process.
Also, the backslashes in your file path will automatically be converted from “\” to “/”, so you don’t have to take care of this manually.
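For reference, the parameter defaults above map onto pandas’ to_csv call roughly like this plain-Python sketch (the sample frame is made up; note that the M function passes None for index, which pandas treats as false):

```python
import csv
import io

import pandas as pd

# Made-up sample frame; the real function receives the Power BI table
# with all columns already converted to text.
dataset = pd.DataFrame({"a": ["1", "2"], "b": ["x", "y"]})

buffer = io.StringIO()
# Mirrors the defaults: no index column, header row on, quote every
# field (csv.QUOTE_ALL since v1.1), no chunked writing.
dataset.to_csv(buffer, mode="w", index=False, header=True,
               quoting=csv.QUOTE_ALL, chunksize=None)
print(buffer.getvalue().splitlines())  # ['"a","b"', '"1","x"', '"2","y"']
```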
At the time of writing, these scripts can be refreshed in the service with the personal gateway, but not yet with the enterprise gateway.
Wondering how to apply the function? Please check out this video on how to export data from Power BI to csv using the Python script:
Enjoy & stay queryious 😉
Thank you for this awesome function to write CSVs! I just want to add that it took me some time to figure out that the input to the path parameter should look like this: C:\Data\filename.csv
Also, I think it can’t work with Anaconda, because pandas does not have the rights to call numpy
– “Missing required dependencies {0}”.format(missing_dependencies) –
so I had to install standalone Python 3.7 and install pandas. Sincerely: Bertalan
Thanks for the heads up!
Thanks… where is the code? I don’t see any M or Python code on this page.
The code is a link to a Gist on GitHub. You might try a different browser to show it on the page. But just in case, here is the direct link: https://gist.github.com/ImkeF/9e30713789aa50e7e52c4a161af624a7
see on line no 15
I tried this calling a view on my Azure SQL server, but didn’t get through. It gave me: “Formula.Firewall: Query ‘Invoked Function’ (step ‘Source’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.”
So I copy-pasted the view code into a DirectQuery in PBI, but it still gave the same error. I’m assuming it could be because my view’s source code has multiple sub-queries, unpivots, etc.
Just wondering if these considerations were made during the design?
I usually disable the security settings for those applications. Not sure how it works with DirectQuery though.
Whenever we try to put the function code above into the Python script editor in Power BI, we get a syntax error in “PythonScriptWrapper.PY”, line 20:
let func =
^
Does anybody know how to solve this?
Thanks in advance!
Thanks for your reply! However, in the video they use the M code with a blank query/table. In our situation we have a table that is the result of several Power Query queries, and because of that we can’t select it as “SOURCE” when we need to fill in the parameters. Do you have any idea how to fix this?
Not sure if I understood your problem, but I added a video that describes the process: https://youtu.be/GXa9FA3xYAU
Hope this helps?
When we put the M-Python code into the Advanced Editor it doesn’t work, because the table we want to export is derived from numerous other queries. So when we use the M-Python code we get a syntax error.
If my understanding is correct, an intermediate table would be a good solution in that case:
Create a table that consolidates what shall be exported “from numerous other queries” to one table before feeding it to the script.
/Imke
Thanks for replying! If I make an intermediate table using DAX, then it isn’t possible to take that table into the Power Query Editor, right?
Yes, once you’re in DAX, there is no way back to Power Query.
But if your table is not too large, you could run the Python script from a visual instead (check out the documentation about the current sampling limits!). You can only use Python in there. A very simple implementation without any parameters would look like this:
import csv
import pandas as pd
path = r'C:/YourFilePathGoesHere/TestExport.csv'
dataset.to_csv(path)
Where “dataset” is referencing the fields that you’ve dragged into the visual.
But again, sampling might occur!!
When we use this as a Python visual it works perfectly in the Desktop version, but in the Power BI service we get the error that it can’t find our path location. Could you please upload a video or explain how we can get this working in the Power BI service? We keep getting this error.
I have a simple table in Power BI and I’m trying to export its data to Excel using this code. I have set the privacy level to Public and still receive the error “Formula.Firewall: Query ‘Invoked Function’ (step ‘Source’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.” Could you please tell me how to resolve this?
Hello, this function works very well for me, but I have one issue: a column contains the text 12/5-1 that I want to keep as text, but when opening the .csv this text is automatically converted into the date 12/05/2001. How can I solve this?
Hi… I am getting the below error after pasting the code into the Advanced Editor:
An error occurred in the ‘’ query. Expression.Error: The name ‘Python.Execute’ wasn’t recognized. Make sure it’s spelled correctly.
Does anyone know how to solve this?
Thanks in advance!
I get this error when I try to run the query:
Formula.Firewall: Query ‘Invoked Function’ (step ‘Source’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
I searched the error and it suggested setting up a staging query which I did but it still does not work. Any ideas?
This is really useful information, and I got the export to csv functionality working. How would we take this script and customize it to export to Excel (xlsx)? I see there is a function in the Pandas library, but I’m struggling to get the syntax right in the Power BI editor.
Any help would be very much appreciated. Thank you.
Hi Riley,
I don’t have a xlsx-script at hand, so would have to google it as well.
Are you sure that export to Excel is what you need? How about a Power Query in Excel that imports the csv files? (Then you could do some consolidation as well.)
/Imke
Hi Imke,
I thought about that as well, but that actually increases the complexity of what I’m trying to do. My goal is to run a number of queries in Power BI for populating mail-enabled security groups locally (the syncing of the csv files to AD groups will still take place with a PowerShell script). Some of the query-based security groups require exceptions, and we’re managing that with a small PowerApp that writes to a CDS entity.
For example, we have a query that pulls all doctoral students from an Oracle database. We have a couple of student examiners that need to be on that list as well. Since they are an exception to the query, we have them added to the CDS entity. The PBI queries bring the CDS data together with the Oracle data to output the finalized CSV files. Lastly, I want an Excel workbook that has all the data appended back together for use as a data source within PowerApps. This is why I’m trying to get the Python script to export to xlsx. I know I could do it with another PQ and point it at the folder containing all the CSV files, however; this process already has a lot of moving parts, and I would prefer to not setup yet another scheduled report in the PBI service.
Understood – can understand that you want to have it in Excel.
You might want to check out this link: https://datatofish.com/export-dataframe-to-excel/
/Imke
I figured it out late yesterday, and that site actually helped me as I was searching for an answer. The main thing that I was missing was that I also needed to have the Openpyxl Python library installed. Once I did that, the to_excel function worked as expected.
Thanks again for the help on this one, and I hope this helps someone else.
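For anyone landing here with the same question, a hypothetical minimal version of the Excel export discussed above could look like this (assuming pandas plus the openpyxl library are installed; path and sample data are made up):

```python
import os
import tempfile

import pandas as pd

# Made-up sample data standing in for the consolidated query output.
dataset = pd.DataFrame({"group": ["doctoral", "examiner"], "count": [120, 2]})

path = os.path.join(tempfile.mkdtemp(), "TestExport.xlsx")
dataset.to_excel(path, index=False)  # raises ImportError if openpyxl is missing

roundtrip = pd.read_excel(path)
print(roundtrip.equals(dataset))  # True if the round trip preserved the data
```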
Thanks for the feedback!
Thank you very much for the useful code snippet. I would very much like to use this in Power BI to automatically back-up some (very large) data tables as the last part of the queries that create them. Then, instead of pulling all data from our API, I would like to load the locally saved data into Power Query again, and merge it with the newest data pulled from the API. So, in effect an incremental data refresh instead of loading everything every time. Could you please guide me in the right direction? Cheers, Mike
Hi Mike,
please check if this suits your needs:
/Imke
Does this export any amount of rows, or does it have the normal Power BI export limit?
It doesn’t have the Power BI limits, but is only limited by the capacity of Python.
Another fast way for ad hoc exports is to use DAX Studio:
Hi Imke,
Have you ever tried to get this to work from the service using scheduled refresh?
It seems there are several issues besides the need for the personal gateway to be set up on the same machine. Python is currently not supported via the enterprise gateway.
Cheers
Pat
Hi Patrick,
I have been running it in the service just for testing purposes.
Yes, the limitation with regards to enterprise gateway is not very encouraging, unfortunately.
/Imke
Thanks Imke,
FYI, we have an active case with MS PBI support regarding the loss of credentials to the Python data source via the personal gateway. Desktop runs the script locally and writes the csv fine. Looks like they are chasing something that is out of sync between the service and the local machine via the gateway.
Cheers
Pat
Thanks for the warning, Pat!
Hi Imke,
Thanks for the code, I’ve been using it successfully for a couple of months now.
Now I’ve come to a point where the exported csv file is getting quite big.
Is there a way to split the output into several CSVs based on row count?
Thanks again,
sunnysideup
Hi sunnysideup,
I haven’t tried that.
One could create a nested table in Power Query and add a column that calls this function, or try to find a Python script to do the split.
It depends on the actual bottleneck that the increasing size causes.
/Imke
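A possible Python-side split (just a sketch; the sample data and the 3-row limit are made up) would slice the data frame and write one numbered csv per slice:

```python
import math
import os
import tempfile

import pandas as pd

dataset = pd.DataFrame({"a": range(7)})  # made-up sample data
rows_per_file = 3
folder = tempfile.mkdtemp()

# Write the frame out in numbered csv files of at most rows_per_file rows.
n_files = math.ceil(len(dataset) / rows_per_file)
for i in range(n_files):
    part = dataset.iloc[i * rows_per_file:(i + 1) * rows_per_file]
    part.to_csv(os.path.join(folder, "export_part{}.csv".format(i)), index=False)

print(sorted(os.listdir(folder)))  # 3 files for the 7 rows
```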
Hi All,
Is it possible to use a SharePoint folder as the path without mapping the SharePoint as a drive? Any advice on how to do it correctly?
Thanks in advance
Hi,
Is it possible to save the file to OneDrive?
Hi, can this code be added to Power Query in Excel?
What steps should I follow?
Thanks
Hi, maybe this will help: the “Power BI Exporter” add-on seems to do the job without the need to use Python.
It is very helpful
An error occurred in the ‘Query1’ query. DataSource.Error: ADO.NET: A problem occurred while processing your Python script.
Here are the technical details: [Expression.Error] We cannot convert a value of type List to type Text.
Details:
DataSourceKind=Python
DataSourcePath=Python
Message=A problem occurred while processing your Python script.
Here are the technical details: [Expression.Error] We cannot convert a value of type List to type Text.
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonUnexpectedException
My dataset has lots of null values and one column contains lists. Can we export it while ignoring the data types?
Can we convert all Power Query steps to Python code, so I can use it in another program?
Hi there, the chunk part does not work as it’s supposed to. I got the following error when invoking the function:
An error occurred in the ‘Python Extract To Text’ query. Expression.Error: We cannot apply operator & to types Text and Number.
Details:
Operator=&
Left=# 'dataset' holds the input data for this script
import csv
import pandas as pd
path = r'C:/Users/236473/Downloads'
dataset.to_csv(path, sep = '|', index = False, chunksize = '
Any idea?