The Data Scientist’s Toolbox (and other knick-knacks)

Although I started with Google’s “Crash Course in Python”, I’d recommend this course as a first foray into the topic. It sets you up with many of the key tools you need to start exploring and version-controlling your work, including the applications you’ll need to learn Python and R.

This Johns Hopkins University course is included with Coursera Plus, which, alongside my water bill and my cat, is probably among the best value-for-money items in my life (both my wonderful husband and the Economist magazine represent exceptional value as well, but come with a significantly higher outlay).

The course is a little older and uses an automated voice-over, but this is excellent for listening at double speed. Week 1 covers general definitions, processes, and troubleshooting methods – nothing particularly groundbreaking but good to start with and easy to get through. Week 2’s exploration of R and RStudio is well-paced and insightful. It really helped set things up, and I wish there were a JupyterLab version of this (maybe there is, but I haven’t found it yet). It’s still quick, easy, and intuitive to get through, and I encounter no bugs.

Week 3 is significantly more intense, and either the course is outdated or somehow my setup isn’t quite right. It takes me quite some time to resolve the issues around the SVN directory, and my tutor guides me through downloading VisualSVN Server. We both feel like this should have been more straightforward, but we solve it and move on.

Markdown in week 4 also presents opportunities to troubleshoot when RStudio returns error messages to let me know that LaTeX hasn’t been installed. This isn’t just a simple pip install and takes some time to resolve, this time by installing basic-miktex-23.4-x64. Finally, I knit my first R Markdown file into a PDF and move on.

We then delve into a cursory but refreshing overview of stats techniques and move on to experimental design. More detail would have been welcome, but the course does offer a few amusing links and comments. I loved the FiveThirtyEight playground designed to allow punters to hack their way to a desired p-value. It might sound awful to academics, but after 20 years of requests to cherry-pick data for executive presentations, I just have to smile.
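For anyone curious why hacking your way to a p-value works so easily, here is a minimal sketch in Python. It is not the FiveThirtyEight tool, just a simulation I put together under stated assumptions: every "study" compares two groups of pure random noise across 20 unrelated metrics (the metric count, group size and function names are all made up for illustration), and then reports only the smallest p-value. With 20 independent tests at the 0.05 threshold, the chance of at least one false "find" is 1 - 0.95^20, or roughly 64%.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(group_a, group_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 in both groups.
    """
    n = len(group_a)
    z = (fmean(group_a) - fmean(group_b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def best_p_value(rng, n_metrics=20, n_per_group=30):
    """Test many unrelated metrics on pure noise and keep only the smallest p-value."""
    p_values = []
    for _ in range(n_metrics):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(two_sided_p(a, b))
    return min(p_values)


def share_of_false_finds(n_studies=500, seed=0):
    """Fraction of noise-only studies that still produce a p-value below 0.05."""
    rng = random.Random(seed)
    hits = sum(best_p_value(rng) < 0.05 for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    print(f"Studies with at least one p < 0.05: {share_of_false_finds():.0%}")
```

There is no real effect anywhere in this data, yet most of the simulated studies can still report something "significant" simply by choosing which metric to show.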

Finally, the course finishes off with a peer-graded assessment. The first two items are ludicrously easy – take a screenshot to show you’ve downloaded and installed RStudio and R, and then create a GitHub repo. The third asks you to push a local text file to the GitHub repository. It takes a while to get Git Bash to cooperate. Note to others: never put spaces in your file or folder names (and if you do, don’t forget to wrap the name in double quotes “”). The last stop is to fork a data sharing repository and share the link. Finally, students need to grade 3 other student submissions. Hopefully mine will be done soon, and I’ll be able to sign off my first course!
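The note about spaces in file and folder names can be illustrated with a small Python sketch. It uses the standard library's shlex module, which splits a command line using POSIX shell rules, so it approximates (rather than exactly reproduces) what Bash does inside Git Bash. The file name "my notes.txt" is hypothetical and nothing here actually runs git.

```python
import shlex

# Hypothetical file name containing a space, for illustration only.
unquoted = "git add my notes.txt"
quoted = 'git add "my notes.txt"'

# Without quotes, the shell sees two separate arguments: "my" and "notes.txt".
print(shlex.split(unquoted))

# With double quotes, the file name stays together as a single argument.
print(shlex.split(quoted))

# shlex.quote adds the quoting for you when building a command string.
safe_command = "git add " + shlex.quote("my notes.txt")
print(safe_command)
```

In the unquoted case git would go looking for two files that don't exist, which is likely the kind of error that makes Git Bash seem uncooperative.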

To Jupyter Moons and beyond…

First challenge – 79 moons (at least) on Jupiter inspire a 79-day challenge. My goal is to complete a week of a Coursera or DeepLearning.AI course each day for 79 days (or more!). I will also do my best to redo each lab with a new dataset or different application, and post about that here. I will also summarise how I went (and the problems I encountered) as I go.

First up, shopping spree. I secured myself a gorgeous little Microsoft Surface, which almost feels like an extension of my hands, even more so than my mobile device. I also prime’d a Shokz OpenRun Bone Conduction Sports headphone for listening to lectures on the go. Somehow I need to figure out how to make this work on a daily basis amidst a full-time corporate job and my family (including our very intense 9-month-old infant) and trying to lose all that stubborn baby weight.

A simple but helpful trick I’ve discovered is to block my direct corporate stakeholder calls into 8 calls over 4 hours from 6am to 10am and type meeting notes during the calls religiously (no AI transcripts allowed at our company, unfortunately). I find myself so much more motivated and engaged with shorter meetings, more specific outcomes, and the fact that half my work day is over by 10am. At 10am I hit the gym down the road and am back online to study just after 11. I also do a similarly clustered afternoon session, with some blocks targeting my reporting, management deck updates and call follow-ups. Wherever possible I do a 30-minute study session or call on the stationary exercise bike (my trusty Schwinn). It certainly feels far more productive than before, and I have so much more energy now that I’m learning something I’m passionate about. Finally, I’ve replaced my evening K-drama or wuxia hit with some of the less involved lectures, even if just for an hour. And whilst it’s not Condor Heroes, Andrew Ng’s material is pretty entertaining. This gives me between 2 and 5 hours daily for learning.

Support and study groups are also a key part of getting ahead and staying motivated. I find myself a tutor who just graduated from a top US university in mathematics and machine learning, plus another 3rd-year student to practice business applications with as I go. My fellow DeepLearning.AI students and I also set up a WhatsApp group to share resources and lab case studies. I’m excited to hear the viewpoints of a diverse array of students across countries, ages and industries and hope the group momentum kicks off.

Finally, I’ve approached one of my husband’s friends, who leads a cutting-edge data science team at a fast-growing eCommerce company. He agrees to be a mentor, chatting every couple of weeks about direction, real-life applications, and industry pain points.

Next, start learning. I decided to start by familiarising myself with Python and enrolled in 2 Coursera crash courses on Python. I’ll post about each course in a separate blog. GPT also helps with debugging and explaining my mistakes, but I know I’ll need to understand the concepts to repeat them on my own. Finding that balance is going to be an interesting journey and indicative of times ahead.

Turns out, one needs to set up their environment to work on these courses, and it’s not as straightforward as hoped. Here’s a list of what I’m starting out with:

  • Jupyter notebooks for data projects and labs (NOT the lite version, unless you are in search of more problems than Jay-Z)
  • Git Bash (for talking to the terminal instead of using the CMD prompt)
  • Visual Studio: yet to figure out how exactly this differs from JupyterLab, but will update when I do
  • RStudio: useful for stats. Had a little issue finding the SVN link until my tutor advised downloading VisualSVN Server. No issues connecting with GitHub after this. The other issue with RStudio was that LaTeX wasn’t installed – and it wasn’t obvious to me that the solution was to download basic-miktex-23.4-x64. Anyway, after a frustrating 30 minutes on Stack Overflow, I managed to ‘knit’ a dummy R Markdown file.
  • ChatGPT professional subscription, plus I load my credit card in the API section. Fingers crossed I’ll figure out how to use it soon.
  • A Kaggle account
  • A GitHub account (I connect this with my RStudio) and Git Bash
  • I try to sign up for Amazon SageMaker and Google Cloud but neither is easy to navigate for non-techies. Azure looks very tempting but I think I’ll save the enticing 30-day trial until I can do more with it. All three look like they have some very helpful courses. I’ll do a post on the pros and cons later when I have the basics down.
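Since half the battle with the list above was discovering that something wasn't installed (LaTeX, I'm looking at you), here is a minimal sketch of a Python check that reports which command-line tools can be found on the PATH. The command names in TOOLS are my assumptions about a typical install (for example, pdflatex as the LaTeX engine and Rscript for R), so adjust them to match your own setup. A tool that is installed but not added to the PATH will still show up as missing.

```python
import shutil

# Assumed command names; adjust these to match your own installation.
TOOLS = {
    "Git": "git",
    "Jupyter": "jupyter",
    "R (Rscript)": "Rscript",
    "LaTeX (pdflatex)": "pdflatex",
}


def check_tools(tools=TOOLS):
    """Return a dict mapping each tool label to its path, or None if not on PATH."""
    return {label: shutil.which(command) for label, command in tools.items()}


if __name__ == "__main__":
    for label, path in check_tools().items():
        status = path if path else "NOT FOUND on PATH"
        print(f"{label:18} {status}")
```

Running this from the same terminal you use for coursework (Git Bash, in my case) shows which tools that terminal can actually see.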


PowerPoint Monkey to Machine Learning Magic

They say a journey of a thousand miles starts with a single step. Also, that to build a boat, one should not simply gather wood, but teach their ‘men’ to yearn for the sea.

Whelp, after years of looking at the “learn Python” on my New Year’s resolution list and wistfully looking at that Stanford Machine Learning course on my Coursera board since about 2013(!), I’ve decided to take the plunge. Of course, we are all yearning for the sea of possibilities that comes with GPT, so hopefully, that will provide sufficient inspiration. The stick to that carrot is the doom and gloom forecasts that we will all be out of a job fairly soon (and I most certainly do not intend to do without my weekly Roederer, so retirement isn’t yet an option). Like most, I really feel like I’ve missed the boat until I remind myself of how many times per day I see “Digital Transformation” marketing messages coming from enterprise software companies, juxtaposed with the facts that ERP systems have been around since the ’70s and e-commerce since the late ’90s. I figure it will take some time for businesses to get on board, and the sooner I jump on this wave, the better. Plus, it looks like an exhilarating time to surf!

A little about me: I’m a late-30s marketing professional in a large corporate tech firm that shall remain nameless. I’ve just had my first daughter and live in the beautiful red dot that is Singapore. I’m writing about my journey because I feel like an unlikely candidate for starting out with machine learning and would like to share how I go, how what I learn impacts my life, and what exciting finds I stumble upon along the way! It’s also a great way to commit to getting things done.

So, here we go! Hope others can join me as I launch onto this slippery slope of applied mathematics, Python and infrastructure. And if I manage this as a total comp-sci numpkin, anyone should be able to give it a shot.