If you haven’t had a chance to check out some of the Microsoft Learn courses available over on Docs, drop what you’re doing and bookmark a few that catch your eye. They’re actually pretty decent and kinda fun.
I’m currently working through several courses and documenting my thoughts and experience.
To kick things off, I thought I’d start with one of the high-level (read: non-technical) courses and then move on to some material on Databricks.
This post is a review of the Microsoft Learn Course titled “Understand The Evolving World of Data”.
The total estimated time to complete the course is ~28 minutes.
The course sets out to achieve the following:
- Learn the key factors that are driving changes in data generation, roles, and technologies.
- Compare the differences between on-premises data technologies and cloud data technologies.
- Outline how the role of the data professional is changing in organizations.
- Identify use cases that involve these changes.
Important Takeaways
As I worked through the course (which was all reading until the quiz at the end), a few ideas jumped out that I think are worth highlighting.
The first is an important limitation on scaling an application or service.
Scalability - Once administrators can no longer scale up a server, they can begin to scale out their operations by adding more servers to a “cluster” with load balancers directing network requests to nodes with available capacity.
A limitation of scaling out services by server clustering is that the hardware for each server in the cluster must be identical. Once the server cluster reaches capacity, each node in the cluster must be upgraded.
That could be a pretty chaotic and costly exercise to go through if you had to scale quickly.
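To make the scale-out idea a little more concrete, here is a minimal Python sketch of a load balancer that sends each request to the node with the most spare capacity. The `Node` and `LoadBalancer` classes, the capacity numbers, and the routing rule are all hypothetical simplifications for illustration, not how any particular product (Azure or otherwise) actually balances traffic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical server in the cluster with a fixed request capacity."""
    name: str
    capacity: int
    active: int = 0

    @property
    def available(self) -> int:
        # Spare capacity: how many more requests this node can take right now.
        return self.capacity - self.active


class LoadBalancer:
    """Toy balancer: route each request to the node with the most spare capacity."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = list(nodes)

    def route(self) -> Node:
        # max() returns the first node with the highest spare capacity.
        node = max(self.nodes, key=lambda n: n.available)
        if node.available <= 0:
            # Every node is full: time to scale out (or up).
            raise RuntimeError("cluster at capacity")
        node.active += 1
        return node

    def release(self, node: Node) -> None:
        # Called when a request finishes, freeing one slot on that node.
        node.active -= 1

    def scale_out(self, node: Node) -> None:
        # Scaling out = adding another server behind the balancer.
        self.nodes.append(node)


if __name__ == "__main__":
    lb = LoadBalancer([Node("node-1", capacity=2), Node("node-2", capacity=2)])
    print([lb.route().name for _ in range(4)])
    lb.scale_out(Node("node-3", capacity=2))
    print(lb.route().name)
```

The four requests alternate between `node-1` and `node-2`; a fifth would raise until `node-3` is added. Note the sketch happily mixes node sizes, which glosses over the course’s point that real clusters often need matching hardware.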
Availability - The availability of a system is often expressed as a number of nines (three nines, four nines, or five nines).
This is shorthand for a percentage of uptime (99.9 percent, 99.99 percent, or 99.999 percent).
To calculate system uptime in terms of hours, you simply multiply these percentages by the number of hours in a year (8,760).
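Flipping that around gives the downtime budget: writing the availability target as a fraction A, multiply the share of time the service may be down, 1 - A, by the same 8,760 hours.

```latex
\begin{aligned}
\text{uptime} &= A \times 8{,}760\ \text{h} \\
\text{allowed downtime} &= (1 - A) \times 8{,}760\ \text{h} \\
A = 0.999:\quad (1 - 0.999) \times 8{,}760\ \text{h} &= 8.76\ \text{h}
\end{aligned}
```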
The course alludes to using these as objectives but isn’t super explicit about it, so allow me to be…
Five 9’s Is Costly and Difficult to Achieve
Actually, it’s close to impossible for most of us. Take a look at the numbers below.
To achieve THREE 9’s of availability would mean that you are committing to allowing NO MORE THAN 8.76 hours of downtime for a given service for an entire year.
And I guess if you stop and think about that number, it seems fairly reasonable. Maybe even achievable.
In the name of continuous improvement let’s just say we wanted to get our availability numbers to FOUR 9’s (99.99%).
Looking Closer At the Math
- Per year: 52.56 minutes
- Per month: 4.38 minutes
- Per week: 1.01 minutes
- Per day: 8.64 seconds
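If you want to check these numbers (or run them for five nines), here is a small Python sketch. It assumes a 365-day year of 8,760 hours, a month of one twelfth of that year, and a 7-day week; the function names are just illustrative.

```python
# Allowed downtime for a given number of "nines".
# Assumptions: a 365-day year (8,760 hours), a month as 1/12 of that year,
# and a 7-day week.
SECONDS_PER_DAY = 24 * 60 * 60
PERIODS = {
    "year": 365 * SECONDS_PER_DAY,
    "month": 365 * SECONDS_PER_DAY / 12,
    "week": 7 * SECONDS_PER_DAY,
    "day": SECONDS_PER_DAY,
}


def downtime_budget(nines: int) -> dict[str, float]:
    """Seconds of allowed downtime per period, e.g. nines=4 means 99.99%."""
    unavailable = 10 ** -nines  # 4 nines -> 0.0001 of the time may be down
    return {period: unavailable * seconds for period, seconds in PERIODS.items()}


def format_duration(seconds: float) -> str:
    """Render a duration in the largest unit that keeps the value >= 1."""
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.2f} minutes"
    return f"{seconds:.2f} seconds"


if __name__ == "__main__":
    for n in (3, 4, 5):
        budget = downtime_budget(n)
        summary = ", ".join(f"{p}: {format_duration(s)}" for p, s in budget.items())
        print(f"{n} nines -> {summary}")
```

For four nines this reproduces the list above: 52.56 minutes a year, 4.38 a month, 1.01 a week, and 8.64 seconds a day.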
Most engineering teams won’t even know there is a problem within 5 minutes, let alone fix it inside a 4.38-minute monthly budget. For most services, the effort and cost of reaching that additional “9” puts it firmly in what site reliability practitioners would call an “inappropriate” level of availability.
… also …
Nines Don’t Matter If Users Aren’t Happy
For the record, there are currently 148 of these dashing garments in the world. 💙 Proceeds go to organizations supporting queer and trans in tech. 💖 This shirt has been shown to decrease outage durations by 30%. 🌈 pic.twitter.com/7MSa01W434— Charity Majors (@mipsytipsy) July 17, 2019
Pictured tweet above from Charity Majors (@mipsytipsy)
As Charity and many others have pointed out time after time…
our users don’t care about these numbers.
They are using our service to perform a task. It either delivers or it doesn’t. That’s what your business (and engineering efforts) should focus on delivering and measuring.
The main point: if you set an “inappropriate” level of availability for a service, operational costs stack up, herculean efforts burn out engineers, and whatever it is you are trying to do for your customers will fall short. And your business will fail.
Availability is important, but think long and hard about the ROI of that extra “nine”. It’s quite possible that lowering that number significantly and budgeting for some “appropriate” level of downtime might in fact create a more reliable (socio)technical system in the long run.
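One common way to put a number on “appropriate” downtime is an error budget, the idea popularized by Google’s SRE practices: the gap between 100% and your target is downtime you are allowed to spend. Here is a rough Python sketch, assuming a rolling 30-day window and a single hypothetical 20-minute outage; the window, the incident, and the function names are all made up for illustration.

```python
# Hypothetical error-budget check: how much of the allowed downtime is left?
# Assumption: availability is measured over a rolling 30-day window.
WINDOW_MINUTES = 30 * 24 * 60


def error_budget_minutes(slo: float) -> float:
    """Total minutes of downtime the target allows in one window."""
    return (1 - slo) * WINDOW_MINUTES


def remaining_fraction(slo: float, downtime_minutes: float) -> float:
    """Share of the budget still unspent; negative means the target was missed."""
    budget = error_budget_minutes(slo)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    incident_minutes = 20  # one hypothetical 20-minute outage this window
    for slo in (0.999, 0.9999):
        print(
            f"SLO {slo:.2%}: budget {error_budget_minutes(slo):.2f} min, "
            f"remaining {remaining_fraction(slo, incident_minutes):.0%}"
        )
```

At 99.9% that outage leaves roughly half of the 43.2-minute budget; at 99.99% the same outage blows through the 4.32-minute budget several times over.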
Shifting Job Responsibilities
Your skills need to evolve from managing on-premises database server systems, such as SQL Server, to managing cloud-based data systems. If you’re a SQL Server professional, over time you’ll focus less on SQL Server and more on data in general. You’ll be a data engineer.
I think this is important for all of us to remember, regardless of your role.
As storage and compute get cheaper and easier to use, and as we focus our efforts on data-driven decisions tied to customer expectations or experience, we must find better ways to make sense of the vastness and complexity of our data.
The ways we store, manipulate, and query information are constantly evolving. Focusing less on the tooling and more on the data is a pretty safe bet for all future efforts.
Hints to the answers
As you work through the course, see if you can answer these questions for yourself, and you’ll ace the quiz at the end.
There are a couple of types of data processing frameworks out there. Which one would a data engineer use to ingest data onto cloud data platforms?
One specific data type mentioned can define the schema at query time. Which is it?
There are a couple more questions, but I won’t give you all of the answers.
Take the course for yourself and let me know how you did and which one you’ll be taking next.
### Links and Resources