Simplest Way to Run Python Scripts on PC?

paulrockliffe

Original Poster:

15,998 posts

234 months

Monday 3rd June
I have a PySpark Notebook sat running in a Microsoft Fabric trial environment. It was dead simple to set up, just a bit of Google and ChatGPT, and no messing with Python as it all just runs natively in a Notebook. It's nothing complicated: it calls an API on a schedule to grab some numbers and punts both a .parquet and a .csv file to a folder.
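For context, the pure-Python equivalent of what the Notebook does is roughly the sketch below, as I understand it. The endpoint URL and the shape of the response are made-up placeholders:

import requests
import pandas as pd

# Hypothetical endpoint, standing in for whatever the job actually calls
URL = "https://api.example.com/numbers"

def grab_and_save():
    response = requests.get(URL, timeout=30)
    response.raise_for_status()
    # Assumes the API returns a list of JSON records
    df = pd.DataFrame(response.json())
    df.to_csv("numbers.csv", index=False)
    df.to_parquet("numbers.parquet", index=False)  # needs pyarrow installed

if __name__ == "__main__":
    grab_and_save()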

Unfortunately Microsoft look like they're going to take my free toys away from me, and I want to quickly replicate the process on my PC in the short term so I'm not left with any gaps in the data.

What's the bestish approach for me, a person who doesn't know Python and is allergic to command lines after being badly scared by a Linux server some years ago? Key features are a) a Notebook-type IDE so I can see what's going on when I inevitably have to debug the code when I move it over, and b) an easy way to schedule the Notebook runs.

Thanks!

theboss

7,122 posts

226 months

Monday 3rd June
I'd defer to any actual developers round here, but this might be a useful starting reference:

https://code.visualstudio.com/docs/python/python-q...

ATG

21,361 posts

279 months

Monday 3rd June
If you want to run Python, you download Python from python.org.

But ...

It sounds like you actually need to have Spark running on your PC, and you then want to use its Python API to tell Spark to do stuff, and that's not the same thing at all as getting a Python script to run on your PC.

paulrockliffe

Original Poster:

15,998 posts

234 months

Monday 3rd June
I'm using PySpark because that's what works easily in Fabric, but what I'm doing is so basic that it doesn't need to be Spark; I can just ask ChatGPT to port the code to pure Python. I have Python installed, but I don't want to use the command line with it. I'm sure I could work that out, but it's just another layer of stuff I'd need to learn and I'd rather ride my bike while it's not raining. It's really inconvenient that Microsoft look like they've stopped just rolling the Fabric trial over!

So, something like Jupyter Notebook and Jupyter Scheduler: can I make it all work just by running those on top of my Python install, or is there a better way?
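If the scheduler part turns out to be a pain, I gather a plain Python loop can do the timing instead of a notebook; something like this sketch, where the module name is a placeholder for wherever the fetch function actually lives:

import time
from datetime import datetime

from myjob import grab_and_save  # hypothetical module holding the fetch logic

INTERVAL_SECONDS = 60 * 60  # run hourly; adjust to taste

while True:
    print(f"Running job at {datetime.now():%Y-%m-%d %H:%M:%S}")
    try:
        grab_and_save()
    except Exception as exc:
        # Don't let one failed run kill the whole schedule
        print(f"Run failed: {exc}")
    time.sleep(INTERVAL_SECONDS)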

Flippin' Kipper

638 posts

186 months

Monday 3rd June
PyCharm and a virtual environment; you can then run your code using the venv, either via the PyCharm CLI or via a play button.
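If you've not used venvs before, it's only a couple of commands on Windows, and PyCharm can also create one for you when you set up a project:

python -m venv .venv           # create the virtual environment
.venv\Scripts\activate         # activate it (Windows; on Linux/Mac it's source .venv/bin/activate)
pip install requests pandas    # install whatever the script needs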

ATG

21,361 posts

279 months

Monday 3rd June
Jupyter is a good solution. PyCharm is a nice IDE. If you're used to using notebooks, I'd start with Jupyter. If you don't care about producing a Parquet file, you certainly don't need to go anywhere near Spark just to scrape a website; generating Parquet files outside Spark with Python was pretty shonky last time I looked into it.
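That said, if you did still want Parquet, I believe recent pandas can write it via the pyarrow engine; a minimal sketch, assuming pandas and pyarrow are installed:

import pandas as pd

# Toy data standing in for the real numbers
df = pd.DataFrame({"ticker": ["ABC", "XYZ"], "price": [1.23, 4.56]})
df.to_parquet("numbers.parquet", engine="pyarrow", index=False)

# Reading it back is just as simple
print(pd.read_parquet("numbers.parquet"))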

paulrockliffe

Original Poster:

15,998 posts

234 months

Monday 3rd June
Thanks. I tried to take a look at Jupyter earlier, but the website wouldn't load, so I went back to my actual work.

I don't need Parquet files; that was just for learning some stuff that's more specific to Microsoft Fabric than anything else. They work really well there, as I understand things.

Anyway, I need .csv or .json files for backup, and I'm going to bounce my data source away from Fabric and onto an Azure SQL Database, as I have a free one of those that will do this job fine. I'm going to need to work that part out as well, but the number one priority is making sure I don't have any gaps in my data first.
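For when I get to that part, the usual route from Python to an Azure SQL Database seems to be SQLAlchemy over the pyodbc driver; here's a sketch with all the connection details as placeholders (it also needs the Microsoft ODBC Driver for SQL Server installed locally):

import urllib.parse

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection details; substitute the real server, database and login
params = urllib.parse.quote_plus(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=yourserver.database.windows.net;"
    "DATABASE=yourdb;UID=youruser;PWD=yourpassword"
)
engine = create_engine(f"mssql+pyodbc:///?odbc_connect={params}")

df = pd.read_csv("numbers.csv")
# Appends each run's rows, creating the table on first run
df.to_sql("numbers", engine, if_exists="append", index=False)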

ATG

21,361 posts

279 months

Monday 3rd June
If your target is a database floating around in an Azure cloud it might also be worth seeing if there is a simple way of running your web scraper in Azure as well. I haven't tried monkeying around with Azure, so I don't know. Come to think of it, the Msft Fabric thing you mentioned is probably doing just that.

The cloud providers tend to sucker you in by giving you free access to their platforms, and then, just when you've got something simple but useful working, they start trying to bill you for it, or they take all the toys away. And because you've actually been playing in their proprietary world, even if using open source frameworks like Spark, it is a royal pain in the arse to migrate to some other cloud framework.

For what very little it is worth, if you're running your screen scraper locally, I'd store its output in a local database too and maybe back that up to some dead simple cloud file server on the basis that moving between cloud file storage solutions in order to avoid fees is generally easy, whereas migrating between databases is more of a pain in the arse. But I'm a developer/tinkerer so my threshold tolerance for running local stuff may be higher than yours, and you may need database access to your data from more than one location, etc., etc.

The more proprietary technology you use from a vendor, the more of a hostage you become to that vendor. The more stuff you decide to do for yourself, the bigger the up-front and maintenance burdens you take on, but you gain flexibility and the cost is your time, not cash. Where to strike the best balance varies from person to person and problem to problem.
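To give a concrete idea of how little "a local database" has to mean here: SQLite ships with Python, so storing each run's output locally is only a few lines (table and file names made up for illustration), and the single .db file is trivial to copy off to whatever cloud file storage you back up to:

import sqlite3

import pandas as pd

# SQLite needs no server; the whole database is one local file
conn = sqlite3.connect("numbers.db")

df = pd.read_csv("numbers.csv")
# Appends each run's rows, creating the table on first run
df.to_sql("numbers", conn, if_exists="append", index=False)
conn.close()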

paulrockliffe

Original Poster:

15,998 posts

234 months

Thursday 6th June
ATG said:
If your target is a database floating around in an Azure cloud it might also be worth seeing if there is a simple way of running your web scraper in Azure as well. I haven't tried monkeying around with Azure, so I don't know. Come to think of it, the Msft Fabric thing you mentioned is probably doing just that.

The cloud providers tend to sucker you in by giving you free access to their platforms, and then, just when you've got something simple but useful working, they start trying to bill you for it, or they take all the toys away. And because you've actually been playing in their proprietary world, even if using open source frameworks like Spark, it is a royal pain in the arse to migrate to some other cloud framework.

For what very little it is worth, if you're running your screen scraper locally, I'd store its output in a local database too and maybe back that up to some dead simple cloud file server on the basis that moving between cloud file storage solutions in order to avoid fees is generally easy, whereas migrating between databases is more of a pain in the arse. But I'm a developer/tinkerer so my threshold tolerance for running local stuff may be higher than yours, and you may need database access to your data from more than one location, etc., etc.

The more proprietary technology you use from a vendor, the more of a hostage you become to that vendor. The more stuff you decide to do for yourself, the bigger the up-front and maintenance burdens you take on, but you gain flexibility and the cost is your time, not cash. Where to strike the best balance varies from person to person and problem to problem.
Thanks, yeah, I'm using PySpark Notebooks in Fabric because it's the Azure thingy that works best. Fabric is Microsoft rolling lots of Azure things into a wider version of Power BI, covering everything from source data through to the reports users access in one easy-to-manage environment.

The problem I have is that this is a learning thing and I'm on the Microsoft Developer Programme, which gives me a full M365 tenant to play in but is restricted in that I can't buy anything in the tenant. There isn't a free Azure option for grabbing the data, not one that's always free anyway. I did my learning in Fabric knowing I'd have to move it eventually, and that day has arrived!

In the meantime, Microsoft ran an always-free Azure SQL Server offer, which has enough capacity for this, so I have a free place in Azure to put my data that works with my free Power BI setup. I'm just missing a free method of getting the data there in the first place.

I have an UnRaid server with a SQL server running on it, so whatever I do with this, once I get it punting the data to the Azure SQL server, I'll have it punt the data over to UnRaid too. That was my plan before the free Azure SQL Database thing was possible, but the Azure setup is a little simpler for feeding the data into Power BI, as it doesn't need the Power BI Gateway setting up. Though now I think about it, I think I've already done that...

Not sure what I'll do if I lose my developer tenant; I guess I'll run it all in Power BI Desktop and then finally be forced to learn R or something.

Anyway, Jupyter.org finally loaded up and getting Python and Jupyter set up was trivial, so I've started looking at converting my code. What I didn't realise is that Jupyter Notebooks isn't an application as such, it's a browser thing, so rather than starting on my PC I might as well get it set up on my server now rather than later, as it looks a lot simpler than I was anticipating.
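For anyone else doing the same, running it on a server looks to be just a matter of the launch flags, something like:

jupyter notebook --no-browser --ip=0.0.0.0 --port=8888

Then browse to http://<server-ip>:8888 from the PC and paste in the token it prints. Worth thinking about authentication before exposing it beyond the LAN, though.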