Machine Learning Best Practices for Public Sector Organizations

AWS has a huge library of fantastic resources. This post highlights the recently released whitepaper walking public sector organizations through machine learning best practices.

“Machine Learning Best Practices for Public Sector Organizations” walks you through the ups and downs of a machine learning practice.

While the title and positioning call out the US Public Sector, this paper is broadly applicable. There are a few resources specific to the US Public Sector, like The National Artificial Intelligence Research and Development Strategic Plan: 2019 Update, but only about 1% of the paper is specific to that audience.

I call out a few more details in the Twitter thread below…

Tweet 1/15

today I'm taking a look at the @awscloud paper 📑, "Machine Learning Best Practices for Public Sector Organizations"

it's available as a PDF from https://d1.awsstatic.com/whitepapers/machine-learning-best-practices-for-public-sector-organizations.pdf

🧵☁️ #cloud #ml

@marknca tweeted at 05-Nov-2021, 12:00

Tweet 2/15

this thread is unrolled at https://t.co/t4UCZEUxNA

you can read yesterday's thread at https://markn.ca/2021/aws-serverless-multi-tier-architectures/

🧵☁️ #cloud #ml

Tweet 3/15

the intro lays out the specific challenges & reqs for US public sector organizations heading down the path of leveraging machine learning

...that's good but the paper is really broadly applicable! don't ignore it just because you're not in the public sector

🧵☁️ #cloud #ml

Tweet 4/15

to that point, the section "Challenges for public sector" should be read as "Challenges for everyone" as the only public sector-specific point is that there are draft guidelines for the use of AI within the US government

more at https://www.nitrd.gov/pubs/National-AI-RD-Strategy-2019.pdf

🧵☁️ #cloud #ml

Tweet 5/15

the majority of the paper is the "best practices" section. for each of the subsections, it calls out the biggest challenges you'll face building out your ML practice

🧵☁️ #cloud #ml

Tweet 6/15

on data ingestion & preparation: there are some practical suggestions and use cases for various @awscloud services

what this section should've said is, "Get ready to plow through a bunch of 💩. Data is always messy and there's a lot of clean up to be done" 🤣

🧵☁️ #cloud #ml

Tweet 7/15

on model training & tuning: the paper provides a really great overview of the practical aspects of this part of the ML pipeline.

it's really well written and consistently links out to other resources so you can learn more

🧵☁️ #cloud #ml

Tweet 8/15

MLOps is a little light but I think that's understandable

ops is a very big rabbit hole

this section does well to explain the issues and links out to references and key services like @awscloud SageMaker Pipelines

🧵☁️ #cloud #ml

Tweet 9/15

management & governance is always a 😴 but it's also critical

if you don't pay attention, you're not going to build a reliable practice

you're not going to understand where the data came from, the restrictions on it, how to get the most from it, etc.

🧵☁️ #cloud #ml

Tweet 10/15

security & compliance is near and dear to my ❤️. the paper does a good job of covering this area.

if you're using mainly managed services, a lot of your focus will be on service configuration & data access...read on for more (of course!)

🧵☁️ #cloud #ml

Tweet 11/15

on cost, the paper highlights the areas where costs may bubble up

I would've liked some more concrete tips about how to cut down on costs

of course, what trade-offs you can make will depend on your situation so it does make sense they didn't dive in too deep

🧵☁️ #cloud #ml

Tweet 12/15

the last best practice area is bias and explainability

this is THE critical topic when it comes to ML, especially in the public sector

we need more resources on this topic. not just about bias in the model but also understanding where the data comes from...

🧵☁️ #cloud #ml

Tweet 13/15

...honestly, I can't overemphasize how critical it is to focus on this in any ML practice. when it comes to public sector projects, where policy could be influenced or set based on results, the risks are even higher

🧵☁️ #cloud #ml

Tweet 14/15

overall, this is a great paper. its biggest weakness is that it's labeled as a public sector paper. that's going to turn away a lot of ppl who should read it

there are other papers on ML from the @awscloud team

like MLOps: Continuous Delivery for Machine Learning on AWS, at https://d1.awsstatic.com/whitepapers/mlops-continuous-delivery-machine-learning-on-aws.pdf

🧵☁️ #cloud #ml

Tweet 15/15

...Model Explainability with AWS Artificial Intelligence and Machine Learning Solutions at https://d1.awsstatic.com/whitepapers/leveraging-model-explainability-with-AWS.pdf

...and of course the ML lens of the @awscloud Well-Architected Framework at https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/machine-learning-lens.html

/🧵☁️ #cloud #ml
