AI Literacy Assessment Matrices: Measuring Where Your Nonprofit Team Actually Stands
Every nonprofit leader has an instinct about who on the team is "good with AI" and who is not. Instinct is a bad foundation for a training plan. This guide turns the DOL's AI Literacy Framework into a measurable assessment matrix you can use to find out where staff actually stand and what to do about it.

Three years into the generative AI era, most nonprofit leadership teams still cannot answer a simple question: how AI-literate is our staff, actually? They have anecdotes. They have a vague sense that some people use ChatGPT a lot and others avoid it. They may have run a training or two and counted attendance. But when a funder, a board member, or an incoming CEO asks for the organization's baseline, most leadership teams have nothing concrete to offer.
This gap is starting to matter in ways it did not before. The U.S. Department of Labor's AI Literacy Framework, released in February 2026, gave the field its first widely recognized scaffold for what AI literacy actually means in a workforce context. The framework defines five foundational content areas and a set of delivery principles, and it implicitly asks every employer, including nonprofits, to be able to say where their workforce sits against those five areas. Funders are starting to ask. Insurance carriers are starting to ask. And boards, having watched a year of headlines about AI mistakes at nonprofits, are starting to ask too.
"How many training hours did we deliver" is not the answer to any of these questions. The right answer is a measurement, not an activity count, and the right measurement is a literacy matrix that captures both the depth of competence and its distribution across roles. This guide is about building that matrix: what to assess, how to score it, how to use the results, and how to keep the assessment honest enough to be worth running again next year.
The starting point is the DOL framework itself. Once you understand the five areas, the matrix builds itself. Once the matrix exists, training plans, hiring decisions, and risk management become a great deal easier than they are today.
The DOL's Five Foundational Areas, Quickly Recapped
The Department of Labor's AI Literacy Framework groups foundational AI competence into five content areas. Each area is a column in your assessment matrix, and each is worth knowing well enough to evaluate on its own. We have covered the framework in depth in our walkthrough of the DOL's five foundational AI literacy areas and in our guide to mapping the framework to specific nonprofit job roles. Here is the short version you need to build a matrix.
1. Understanding AI Principles
A working understanding of how AI systems generate output, what they can and cannot do, and why they sometimes fail. The point is not to make every staff member a technical expert. The point is for them to use AI confidently and appropriately without believing it is magic and without being afraid of it.
2. Exploring AI Uses
Familiarity with practical workplace use cases. Where AI genuinely helps, where it does not, and how to tell the difference. This area is about pattern recognition: knowing which kinds of tasks are good candidates for AI assistance and which are not.
3. Directing AI Effectively
The skill of giving AI systems clear instructions, sufficient context, and useful iterative feedback. This is the area most people think of as "prompting," but it includes more than wording. It also covers what context to provide, what to leave out, and how to course-correct when the first output is wrong.
4. Evaluating AI Outputs
Critical assessment of what the AI produced. Is it accurate, is it complete, does it match the actual request, and is it fit for purpose? Workers who cannot evaluate outputs end up accepting confident-looking nonsense, which is where most reputational damage from AI comes from.
5. Using AI Responsibly
The boundaries of appropriate use: what data is safe to share with which systems, what your organization's policies say, when to disclose AI use, how to avoid harm, and how accountability for AI-assisted work flows back to the human in the loop. This is where most compliance, ethics, and risk concerns live, and where the cost of getting it wrong is highest.
Four Competency Levels That Actually Differentiate
An assessment that uses two levels ("competent or not") collapses too much detail to be useful. An assessment that uses seven or eight levels is too granular for raters to apply consistently. Four levels is the sweet spot used by most validated workforce competency frameworks, and it is what we recommend for nonprofit AI literacy matrices. The levels below build on each other: each one assumes everything the previous level required.
Level 1: Foundational Awareness
Can describe; cannot yet do
The staff member can recognize the basic concept and discuss it in conversation but has not yet applied it to their actual work. They have read about AI, attended an introductory training, or watched a demonstration. They could explain to a colleague at a high level what an AI assistant is and why their organization is interested. They have not yet used one for anything that matters.
Level 2: Supervised Practice
Can perform with guidance
The staff member can complete AI-assisted tasks when given a template, a coach, or a closely supervised workflow. They produce usable outputs but require review and occasional course correction. They know enough to be helpful and not enough to be left alone with consequential work. The bulk of nonprofit staff currently sit here on most of the five framework areas.
Level 3: Independent Practice
Can perform reliably without close supervision
The staff member can complete AI-assisted tasks independently, recognize when something has gone wrong, and recover. They know which tools to reach for, when to escalate, and when to walk away from a workflow that is not going to work. They use AI as a real part of their job, not as an experiment. Most of the value of AI investment in a nonprofit is generated by staff at this level.
Level 4: Leading and Mentoring
Can teach others and shape the organization's practice
The staff member can not only do the work but can teach colleagues to do it, write the playbook for new use cases, advise leadership on policy, and represent the organization's practice externally. Level 4 staff are usually scarce, often informally identified, and disproportionately important. Most organizations need a handful of them across the team, ideally one or two per major function. For more on identifying these people, see our guide to building AI champions.
Building the Matrix: Rows and Columns
The matrix is simple in structure. Columns are the five DOL content areas. Rows are roles, individuals, or teams, depending on how you plan to use the results. Each cell holds a competency level from 1 to 4. A populated matrix gives you at a glance who is strong where, where the team has gaps, and which gaps are common enough to warrant cohort training rather than individual coaching.
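The structure is small enough to live in a spreadsheet, but it is also easy to sketch as a data structure, which helps once you want to compute over it. A minimal illustration in Python (the roles and ratings here are invented for the example, not drawn from any real assessment):

```python
# Minimal sketch of an assessment matrix: columns are the five DOL
# content areas, rows are roles (or individuals), cells are levels 1-4.
AREAS = ["Principles", "Uses", "Directing", "Evaluating", "Responsible"]

# Hypothetical current-state ratings for two rows.
matrix = {
    "Program Staff":  {"Principles": 2, "Uses": 2, "Directing": 1,
                       "Evaluating": 2, "Responsible": 2},
    "Communications": {"Principles": 2, "Uses": 3, "Directing": 3,
                       "Evaluating": 2, "Responsible": 3},
}

# Every row must rate all five areas, or the matrix has holes.
assert all(set(cells) == set(AREAS) for cells in matrix.values())

def weakest_area(matrix):
    """Return the lowest-rated area for each row -- the first place to look."""
    return {row: min(cells, key=cells.get) for row, cells in matrix.items()}

print(weakest_area(matrix))
# → {'Program Staff': 'Directing', 'Communications': 'Principles'}
```

The same dict-of-dicts shape works whether rows are roles or individuals, which is why the two framings in Decision 1 below can share one set of tooling.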
Decision 1: Role-Based or Individual-Based?
A role-based matrix asks: what competency level do we need from anyone holding this role? It is the right frame for hiring, for job description revisions, and for setting baseline expectations. An individual-based matrix asks: where does each specific staff member sit today? It is the right frame for personal development plans, for identifying who can mentor whom, and for spotting individuals who are quietly far ahead or far behind their cohort. Most nonprofits will benefit from doing both, with the role matrix setting the target and the individual matrix showing how close each person is to it.
Decision 2: What Does "Required Level" Mean for Each Role?
Not every role needs Level 3 in every area. A development director who is going to draft funder communications with AI assistance needs a strong "Directing AI Effectively" (likely Level 3) and an even stronger "Evaluating AI Outputs" (Level 3 to 4), but might only need Level 2 in "Understanding AI Principles." A program staff member who uses AI primarily through approved tools with guardrails might need only Level 2 in directing AI and Level 2 to 3 in responsible use. The matrix becomes useful precisely when you stop pretending every role needs the same competence.
- Set the required level per cell, not per person
- Make "Using AI Responsibly" the area where requirements rarely drop below Level 2
- Allow Level 1 in areas that are genuinely irrelevant to the role (rare, but real)
Decision 3: How Granular Should the Role Categories Be?
For a small nonprofit, six to eight role categories usually suffice: leadership, fundraising and development, communications and marketing, program staff, operations and finance, IT and data, board, and frontline volunteers. For a larger organization, finer categories may be warranted, but the cost in maintenance time goes up quickly. As a rule, do not create a role category you will not actually use to make a decision. If two categories would always end up with the same required levels, merge them.
A Sample Required-Level Matrix for a Mid-Size Nonprofit
The table below is illustrative, not prescriptive. It shows what a populated required-level matrix might look like for a hypothetical mid-size human services nonprofit. Use it as a starting point, then adjust based on which AI workflows your organization actually runs and which roles touch them.
| Role | Principles | Uses | Directing | Evaluating | Responsible |
|---|---|---|---|---|---|
| Executive Leadership | 3 | 3 | 2 | 3 | 4 |
| Development Director | 2 | 3 | 3 | 4 | 3 |
| Communications Staff | 2 | 3 | 3 | 3 | 3 |
| Program Staff | 2 | 2 | 2 | 2 | 3 |
| Operations & Finance | 2 | 2 | 2 | 3 | 3 |
| IT & Data | 4 | 3 | 3 | 3 | 4 |
| Board Members | 2 | 2 | 1 | 2 | 3 |
Notice the pattern. "Using AI Responsibly" is never below Level 2, and is at Level 3 or 4 for anyone with consequential authority. "Directing AI Effectively" varies more, because roles that primarily consume AI outputs need less of this than roles that primarily produce them. "Evaluating AI Outputs" is highest for staff producing external-facing content (development, communications), which is where bad output hurts most. Patterns like these emerge naturally once you sit with the framework long enough, and your own organization's pattern will be specific to which workflows matter to you.
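The sample table is easier to act on if it also lives in machine-readable form, so later gap analyses can run against it directly. Here is the same matrix as a Python dict, with a quick sanity check of the "responsible use never drops below Level 2" pattern noted above:

```python
# The sample required-level matrix from the table above, as a dict.
REQUIRED = {
    "Executive Leadership": {"Principles": 3, "Uses": 3, "Directing": 2,
                             "Evaluating": 3, "Responsible": 4},
    "Development Director": {"Principles": 2, "Uses": 3, "Directing": 3,
                             "Evaluating": 4, "Responsible": 3},
    "Communications Staff": {"Principles": 2, "Uses": 3, "Directing": 3,
                             "Evaluating": 3, "Responsible": 3},
    "Program Staff":        {"Principles": 2, "Uses": 2, "Directing": 2,
                             "Evaluating": 2, "Responsible": 3},
    "Operations & Finance": {"Principles": 2, "Uses": 2, "Directing": 2,
                             "Evaluating": 3, "Responsible": 3},
    "IT & Data":            {"Principles": 4, "Uses": 3, "Directing": 3,
                             "Evaluating": 3, "Responsible": 4},
    "Board Members":        {"Principles": 2, "Uses": 2, "Directing": 1,
                             "Evaluating": 2, "Responsible": 3},
}

# Sanity check: "Using AI Responsibly" never sits below Level 2.
assert all(cells["Responsible"] >= 2 for cells in REQUIRED.values())
```

Checks like this are cheap insurance: when you revise required levels in a later year, the assertion catches a revision that accidentally violates a policy you meant to keep.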
How to Actually Measure: Three Assessment Methods
A matrix is only useful if the cells are filled with measurements rather than guesses. Three assessment methods, used in combination, will give you trustworthy data without consuming weeks of staff time.
Method 1: Self-Assessment with Calibration Anchors
Ask staff to rate themselves against the four levels in each of the five areas. Self-assessment is fast and free, but on its own it is unreliable: less confident staff systematically underrate themselves, and a smaller but real number of overconfident staff systematically overrate themselves. The fix is to give each level a behavioral anchor, a concrete description of what someone at that level can do. "I can prompt an AI assistant for help, but I usually need a colleague to fix the output before I can use it" is a Level 2 anchor for Directing AI Effectively. Anchors keep self-assessment honest enough to be useful as a starting point.
Method 2: Performance Tasks
For the areas where self-assessment is least reliable, especially "Directing AI Effectively" and "Evaluating AI Outputs," give staff a small, time-bounded task to perform with an AI tool and a structured rubric to score the result. Examples include drafting a short donor thank-you note using AI, or reviewing a flawed AI-generated meeting summary and identifying the errors. Performance tasks take 30 to 60 minutes per person and yield much more reliable data than any number of survey questions.
Method 3: Supervisor Observation
Supervisors can observe everyday AI use better than any survey can. Build a short structured observation form: "In the past month, did this staff member produce AI-assisted work without supervision? Did they recognize when an AI output was wrong before sending it onward? Did they ask for help when they should have?" Supervisor observation works best for "Using AI Responsibly" and for the upper levels (3 and 4) of every area, which are about behavior under realistic conditions rather than declarative knowledge.
A practical combination for most nonprofits is self-assessment for all five areas, performance tasks for the two areas where doing matters most, and supervisor observation as a calibration check. The whole assessment, done well, takes about two hours per staff member spread over a few weeks. That is a meaningful but manageable investment, and the resulting data is good enough to make real training decisions on.
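One way to turn the three evidence streams into a single cell value is a simple reconciliation rule. The rule below is an illustration of the idea, not part of the DOL framework; any organization should tune it to its own tolerance for self-reported data:

```python
def combine(self_rating, task_score=None, observed=None):
    """Reconcile one cell's evidence into a final level.

    Illustrative rule, not part of the DOL framework:
    - prefer a performance-task score over the self-rating;
    - use supervisor observation as a cap on Levels 3-4, since those
      levels are about behavior under realistic conditions.
    """
    level = task_score if task_score is not None else self_rating
    if observed is not None and level >= 3:
        level = min(level, observed)
    return level

# An overconfident self-rating is pulled down by task and observation data.
print(combine(self_rating=4, task_score=3, observed=2))  # → 2
# With only a self-rating available, it stands as the provisional value.
print(combine(self_rating=2))  # → 2
```

A large disagreement between the self-rating and the final value is itself useful data: it flags exactly the staff whose calibration a supervisor conversation would help most.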
Reading the Matrix: What the Patterns Tell You
A populated matrix should change what you do. Otherwise it is paperwork. Three patterns recur in nonprofit assessments, and each suggests a different response.
Pattern A: The Common Cohort Gap
Multiple roles, or multiple individuals in the same role, sit below the required level in the same area. This is a training problem, not an individual problem, and it deserves a training response. A cohort-based program, ideally taught by an internal Level 4 staff member or an outside facilitator who knows your context, will move the whole group up at once. This is the most common pattern in our experience: organizations that have not formally trained on AI almost always show a uniform Level 2 across most staff in "Evaluating AI Outputs," because the topic rarely surfaces in informal use.
Pattern B: The Hidden Strength
An individual is rated Level 3 or Level 4 in areas where the rest of their role cohort is Level 2. This is your AI champion candidate, and they are often invisible until the assessment surfaces them. Hidden strengths are particularly common in operations and program staff who have been quietly experimenting on their own time. Once identified, they can be tapped for peer training, internal documentation, and policy development, all of which both leverages their skill and develops them further.
Pattern C: The Senior-Junior Inversion
Senior staff are rated below junior staff in "Directing AI Effectively" or "Exploring AI Uses." This is uncomfortable but extremely common, because younger staff have often spent more personal time experimenting with AI tools. The response is not to demote anyone, and certainly not to ignore the data. It is to design training that respects senior staff's experience while honestly covering what they do not yet know, and to use the junior staff with hidden strengths as reverse mentors. Done well, this is one of the highest-leverage uses of the matrix, and it produces durable cultural change.
Turning the Matrix Into a Training Plan
A populated matrix and a list of required levels give you a gap analysis directly: every cell where the current level is below the required level is a training opportunity, and the size of the gap suggests the type of intervention required. Three intervention types cover most situations.
Cohort training is the right response when multiple staff in the same role share the same gap. A two-hour facilitated workshop on "Evaluating AI Outputs," for example, can lift an entire development team from Level 2 to Level 3 in a single session if it is well-designed. The economics of cohort training are excellent: the cost per participant is low, the social reinforcement of learning together helps the material stick, and the shared vocabulary that emerges makes ongoing collaboration easier.
Individual coaching is the right response when one or two people sit unusually low for their role. A one-hour coaching session with a Level 4 internal champion, repeated three or four times over a couple of months, will usually close a single-area gap. This is much cheaper than building a full training for one person, and the personalization works better than a generic class would.
Embedded learning is the right response for areas where staff understand the principle but lack practical experience. Rather than a class, build the learning into the workflow: a shared prompt library that staff add to, a peer-review process for AI-assisted work, a Slack channel where staff share examples and ask questions. This kind of learning compounds because it generates organizational artifacts that outlast any individual training session.
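The first two intervention types can be triaged mechanically from the gap analysis; embedded learning remains a judgment call about where practice, not instruction, is missing. A sketch of that triage in Python, with invented names and an assumed threshold of three shared gaps before cohort training pays off:

```python
from collections import defaultdict

def triage(current, required, cohort_threshold=3):
    """Classify gaps: shared gaps get cohort training, isolated ones coaching.

    `current` and `required` both map person -> area -> level (required
    levels derived from each person's role). The threshold is illustrative.
    """
    gaps = defaultdict(list)  # area -> people below their requirement
    for person, cells in current.items():
        for area, level in cells.items():
            if level < required[person][area]:
                gaps[area].append(person)
    return {area: ("cohort training" if len(people) >= cohort_threshold
                   else "individual coaching")
            for area, people in gaps.items()}

# Hypothetical three-person team, two assessed areas.
required = {"Ana": {"Evaluating": 3, "Directing": 3},
            "Ben": {"Evaluating": 3, "Directing": 2},
            "Cy":  {"Evaluating": 3, "Directing": 2}}
current  = {"Ana": {"Evaluating": 2, "Directing": 3},
            "Ben": {"Evaluating": 2, "Directing": 2},
            "Cy":  {"Evaluating": 2, "Directing": 1}}

print(triage(current, required))
# → {'Evaluating': 'cohort training', 'Directing': 'individual coaching'}
```

The shared "Evaluating" gap matches Pattern A above: everyone in the cohort sits one level low, so one workshop serves three people at once, while the single "Directing" gap goes to a coach.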
For broader guidance on training design that connects to this framework, see the seven delivery principles for nonprofit AI training.
Common Mistakes in Assessment Design
Three failure modes recur in nonprofit literacy assessments, and each is worth avoiding before you start rather than discovering halfway through.
Mistake 1: Using the Matrix for Performance Management
The fastest way to corrupt a literacy matrix is to tie it directly to performance reviews, raises, or layoffs. Once staff understand that a low score has career consequences, the self-assessment becomes a negotiation rather than a measurement, and the data becomes useless. Keep the matrix in a developmental frame: it is for finding gaps and planning training, not for ranking staff. If the matrix and performance management must intersect, do it indirectly: hiring decisions can reference the required levels for a role, and performance conversations can reference behaviors, but the assessment data itself stays in the development workflow.
Mistake 2: Treating the Assessment as One-and-Done
AI literacy is a moving target. The tools change, the use cases evolve, and the expectations on staff shift faster than annual training cycles can keep up. An assessment that runs once a year and never recalibrates the levels themselves will quietly become wrong: a Level 3 in "Directing AI Effectively" in 2026 may not match the expectations of 2027. Plan to re-run the assessment annually, update the level descriptors as the field evolves, and treat the matrix as a living document rather than a static one.
Mistake 3: Confusing Tool Familiarity with Literacy
Many staff "feel literate" because they have used ChatGPT a lot. Heavy use is not the same as literacy, and conflating the two leads to overconfidence in the matrix. The DOL framework's five areas are deliberately not about specific tools: they cover principles, judgment, and responsibility, which a person can lack despite years of casual use. Make sure the level descriptors test for the actual competency, not for tool fluency, and explicitly distinguish "uses AI often" from "uses AI well" in the assessment rubric.
A 90-Day Implementation Plan
The matrix approach can be in place at most nonprofits within a single quarter. The work splits roughly evenly across three months.
In the first month, focus on design. Adapt the DOL framework's five areas to your nonprofit's actual workflows. Define the four competency levels with concrete behavioral anchors that fit your context. Set required levels per role. Build the self-assessment instrument and one or two performance tasks for the highest-leverage areas. Review the design with a small group of staff who can stress-test the language before it goes wider.
In the second month, run the assessment. Send the self-assessment to staff with clear framing: this is for development, not evaluation. Run the performance tasks with staff who agreed to participate. Have supervisors complete observation forms. Collect, score, and aggregate the data into a populated matrix at both role and individual levels. Expect this to take longer than you planned, because some staff will need reminders and some performance tasks will need rescheduling. Build slack into the timeline.
In the third month, act on what you found. Identify the two or three cohort gaps that affect the most staff and design training to close them. Identify the hidden strengths and recruit them into champion roles, peer training, or working groups. Draft a one-page summary of findings and next steps for the board and senior leadership, and a separate one-page summary for staff that respects individual privacy. Schedule the next assessment for twelve months out, with a quick mid-year check-in at six months for the highest-priority gaps.
This timeline is achievable for any nonprofit that takes it seriously. The matrix that results will not be perfect, and that is fine. A useful matrix is far better than no matrix, and the second annual assessment will be cheaper, faster, and more accurate than the first because the level descriptors will have been tested against real data.
What Comes After the First Assessment
The first assessment is the hardest one to run. Subsequent assessments benefit from established level descriptors, calibrated anchors, and a baseline against which to measure change. Three things change in year two and beyond.
The level descriptors themselves should evolve. AI capabilities advance, and the workflows that were impressive in 2026 become routine in 2027. A Level 3 description that talks about "iterating on a prompt to refine the output" may need to be replaced by language about "delegating a multi-step task to an agent and supervising the workflow" by 2027. Keep a change log of descriptor updates so year-over-year comparisons are meaningful, and be willing to admit when a level needs raising or lowering.
The required levels per role should evolve too. Some areas will become baseline expectations for every role, especially "Using AI Responsibly," as compliance environments tighten and constituent expectations rise. Other areas may bifurcate, with some roles needing much higher levels than others as workflows specialize. Revisit the required levels every year alongside the assessment, and document the reasoning behind each change.
The assessment itself can become lighter weight as it matures. Once staff understand the framework, the level descriptors are stable, and the supervisors are calibrated, an annual refresh can take less than half the time of the initial assessment. The asset to invest in is the underlying infrastructure: the rubrics, the performance tasks, the observation forms, and the cohort communications. Build it once well, maintain it lightly, and the matrix becomes a durable management tool rather than a one-time project.
For deeper context on how the DOL framework fits into broader nonprofit workforce development, see our mapping of the framework to nonprofit job roles and our broader guide for nonprofit leaders.
Conclusion
AI literacy at a nonprofit cannot be improved if it cannot be measured, and most nonprofits do not measure it. The DOL's AI Literacy Framework provides the scaffolding for a measurement instrument that is rigorous enough to be useful and simple enough to be maintained. A four-level competency scale across the five framework areas, populated at both the role and individual levels, gives leadership a real view of where the team stands and where to invest.
The matrix is not a destination. It is a tool for ongoing decisions about training, hiring, policy, and risk. Organizations that build it well will know what their workforce can do, will know what they need to teach, and will be able to answer hard questions from funders, regulators, and boards with concrete data rather than anecdotes. Organizations that skip it will keep running ad-hoc trainings, hoping for the best, and discovering gaps the hard way.
The investment to build the matrix is modest: a few weeks of careful design, a couple of months of measurement, and an ongoing commitment to keeping it current. The return is the ability to actually manage AI capability across the team, which in 2026 has stopped being optional. The framework exists. The methods are well-understood. The only question left is whether your organization will use them.
Build Your Nonprofit's AI Literacy Matrix
We help nonprofits adapt the DOL framework into a working assessment matrix, run the first round of measurement, and translate the results into a practical training plan. Most engagements deliver a populated matrix and a draft training plan within a quarter.
