
    How to Choose AI Tools That Don't Perpetuate Systemic Inequities

    AI tools promise efficiency and scale, but they can also encode and amplify existing inequalities. When nonprofits adopt AI without examining the assumptions and biases embedded in these systems, they risk undermining their own missions. More than half of nonprofits fear AI could harm marginalized communities they serve—yet few have frameworks for evaluating tools through an equity lens. This article provides that framework: a systematic approach to assessing AI vendors, identifying bias risks, and making technology choices that advance rather than compromise your commitment to justice and inclusion.

    Published: February 1, 2026 · 16 min read · AI Ethics & Governance
    Strategic framework for choosing equitable AI tools

    The AI Equity Project's 2025 survey of 850 nonprofits revealed a troubling gap between awareness and action. While 64 percent of organizations reported familiarity with AI bias—up significantly from 44 percent the previous year—only 36 percent were implementing equity practices in their AI adoption. That implementation rate actually dropped from 46 percent the year before. This decline isn't rooted in unwillingness but in limited capacity: staff lack time and process support to translate learning into governance. But capacity constraints don't excuse choosing tools that perpetuate harm. They require more thoughtful approaches that build equity consideration into selection processes themselves.

    The stakes are particularly high for nonprofits serving marginalized communities. AI systems trained on data reflecting historical discrimination will replicate those patterns. Healthcare algorithms have systematically recommended lower levels of care for Black patients than white patients with identical health conditions. Child welfare risk assessment tools have flagged families in low-income neighborhoods as high-risk based on zip code rather than actual safety concerns. Translation tools have reinforced gender stereotypes. Voice recognition systems have failed to understand non-native speakers and speech patterns associated with certain disabilities. These aren't hypothetical risks—they're documented failures affecting real people in exactly the contexts where nonprofits operate.

    Nonprofits adopting existing AI tools are less likely than those building custom solutions to have policies for responsible use: 35 percent versus 69 percent. They're also less likely to have risk mitigation processes. This pattern makes sense—organizations purchasing off-the-shelf tools may assume vendors have addressed bias concerns, while those building systems recognize they must address these issues directly. But purchased tools aren't inherently safer. The vendor's priorities, data sources, and design choices shape what the tool does—and vendors optimizing for broad market appeal may not have considered the specific communities your nonprofit serves.

    This guide provides a framework for equity-centered AI tool selection. We'll examine how to assess vendor transparency and accountability practices, evaluate training data and performance across demographic groups, consider accessibility and inclusion in tool design, and establish ongoing monitoring for bias. The goal isn't to avoid AI—it's to adopt AI thoughtfully, with eyes open to risks and systems in place to detect and address problems. Your mission depends on serving all communities equitably. Your technology choices should reflect that commitment.

    Understanding Where Inequity Enters AI Systems

    Before you can evaluate AI tools for equity, you need to understand how systemic inequities enter these systems in the first place. Bias isn't a bug that careful coding can eliminate—it's woven into AI development at multiple stages, from the data used for training to the assumptions embedded in design decisions to the contexts where tools get deployed. Each stage presents both risks and opportunities for intervention. Understanding this landscape helps you ask better questions of vendors and make more informed decisions.

    Training data represents the foundation of AI behavior. Machine learning systems learn patterns from the data they're trained on—and that data reflects the world as it exists, including historical discrimination and ongoing inequities. If an AI system learns from past hiring decisions, it learns whatever biases shaped those decisions. If it learns from healthcare records, it learns from a system where certain populations have been chronically underserved. If it learns from criminal justice data, it learns from disparate policing and sentencing patterns. The AI doesn't judge these patterns as right or wrong; it simply learns to replicate them.

    Representation gaps compound training data problems. Even well-intentioned data collection often overrepresents some populations and underrepresents others. Facial recognition systems trained primarily on lighter-skinned faces perform poorly on darker skin tones because the training data didn't adequately represent the full range of human faces. Natural language processing systems trained on formal written English struggle with regional dialects, non-standard grammar, or code-switching between languages. For nonprofits serving immigrant communities, rural populations, or linguistic minorities, these representation gaps can make AI tools essentially useless for the communities that need them most.

    How Bias Enters AI Systems

    • Historical data bias: Training data reflects past discrimination—biased hiring, unequal healthcare, discriminatory lending
    • Representation gaps: Underrepresentation of marginalized groups in training datasets leads to poor performance for those populations
    • Design assumptions: Teams lacking diversity make choices that exclude populations they haven't considered
    • Proxy discrimination: Seemingly neutral variables (zip code, name patterns) correlate with protected characteristics
    • Feedback loops: AI decisions create new data that reinforces original biases over time

    Equity Questions at Each Stage

    • Data collection: Where did training data come from? Who is represented and who is missing?
    • Model development: Who built this system? Were diverse perspectives included in design?
    • Testing and validation: Was performance tested across demographic groups? What disparities exist?
    • Deployment context: Is this tool appropriate for your specific communities and use cases?
    • Ongoing monitoring: How will you detect if bias emerges in real-world use?

    Assessing AI Vendors Through an Equity Lens

    Vendor evaluation is your first opportunity to filter out tools likely to perpetuate harm. Research shows that merely 17 percent of AI vendors explicitly commit to complying with all applicable laws—a concerning baseline that suggests equity isn't a priority for most technology providers. Your job is to distinguish vendors who take bias seriously from those who treat it as a marketing afterthought. This requires asking specific questions, requesting documentation, and evaluating not just what vendors claim but what evidence they can provide.

    The IAPP's AI Governance Vendor Report identifies several categories of responsible AI practices, including technical assessments for data quality, robustness, model performance, safety, and fairness, as well as assurance and auditing services that help organizations demonstrate compliance with policies and regulatory requirements. These categories provide a framework for vendor evaluation. You should assess whether vendors have internal processes for bias detection and mitigation, whether they've conducted independent audits, whether they can share performance data disaggregated by demographic groups, and whether they have mechanisms for reporting and addressing bias when it emerges in deployment.
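    One lightweight way to keep these criteria from getting lost during procurement is to record them in a structured scorecard that travels with the evaluation. The sketch below is a minimal, hypothetical example in Python; the criteria, field names, and vendor details are illustrative assumptions you would replace with your own evaluation framework and the evidence each vendor actually provides.

```python
# A minimal, hypothetical vendor equity scorecard. Criteria names and statuses
# are illustrative assumptions, not a standard; adapt them to your own framework
# and record the evidence the vendor actually supplied.

from dataclasses import dataclass, field

@dataclass
class Criterion:
    name: str            # what you are assessing
    evidence: str = ""   # documentation the vendor supplied (or "none provided")
    met: bool = False    # did the evidence satisfy the criterion?

@dataclass
class VendorAssessment:
    vendor: str
    criteria: list = field(default_factory=list)

    def summary(self) -> str:
        met = sum(c.met for c in self.criteria)
        return f"{self.vendor}: {met}/{len(self.criteria)} equity criteria met"

assessment = VendorAssessment(
    vendor="Example Vendor",
    criteria=[
        Criterion("Training data sources documented", "data sheet shared under NDA", True),
        Criterion("Performance disaggregated by demographic group", "none provided", False),
        Criterion("Independent third-party bias audit", "audit summary shared", True),
        Criterion("Defined bias reporting and remediation process", "none provided", False),
    ],
)

print(assessment.summary())
for c in assessment.criteria:
    print(f"- {c.name}: {'met' if c.met else 'NOT met'} ({c.evidence})")
```

    Keeping the evidence column honest matters as much as the pass/fail column: "none provided" next to a criterion is exactly the kind of signal the sections below help you interpret.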

    Training Data Transparency

    Understand what shaped the AI's behavior

    Ask vendors about the datasets used to train their AI systems. What populations are represented? How was data collected, and did affected communities consent? Were known biased datasets excluded? Many vendors can't or won't answer these questions with specificity—that reluctance itself provides important information. Tools trained exclusively on data from affluent, English-speaking, Western populations may not serve diverse communities well, regardless of how sophisticated the underlying technology appears.

    Request documentation about data sourcing, demographic representation, and any known gaps or limitations. Vendors committed to equity will have this information readily available; those who haven't considered these issues will struggle to provide it. Pay particular attention to how training data relates to your specific communities: if you serve populations that are historically underrepresented in technology datasets—rural communities, non-English speakers, people with disabilities, older adults—generic assurances about "diverse data" aren't sufficient. You need evidence of representation for your communities specifically.

    Questions to ask vendors about training data:

    • What datasets were used to train this system? Are they publicly documented?
    • What demographic groups are represented in training data? What groups are underrepresented or missing?
    • Were any datasets excluded due to known bias issues? Which ones and why?
    • How was consent obtained from people whose data trained the model?
    • What languages, dialects, and communication styles are represented beyond standard English?

    Bias Testing and Audit Results

    Evidence of fairness evaluation, not just claims

    A common measure of disparate impact in AI systems is the four-fifths (80 percent) rule: if the selection rate for a protected group is less than 80 percent of the rate for the most favored group, that may indicate adverse impact. Ask vendors whether they've applied this or similar fairness metrics to their systems. Request specific performance data disaggregated by race, gender, age, disability status, and other relevant categories. Vendors who have conducted rigorous bias testing will have this data readily available; those who haven't will offer vague assurances instead of evidence.
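    To make the four-fifths check concrete, the sketch below applies it to a hypothetical export of a tool's decisions using pandas; the column names, groups, and values are assumptions for illustration, not output from any real system.

```python
import pandas as pd

# Hypothetical export of an AI tool's decisions: one row per person.
# 'group' is the demographic category; 'selected' is 1 if the tool
# recommended a favorable outcome. Column names are placeholders.
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,   1,   0,   1,   0,   1,   0,   0],
})

# Selection rate per group
rates = decisions.groupby("group")["selected"].mean()

# Four-fifths (80 percent) rule: compare each group's rate to the
# most favored group's rate.
most_favored = rates.max()
ratios = rates / most_favored

for group, ratio in ratios.items():
    flag = "POTENTIAL ADVERSE IMPACT" if ratio < 0.8 else "ok"
    print(f"group {group}: selection rate {rates[group]:.0%}, "
          f"ratio to most favored {ratio:.2f} -> {flag}")
```

    A vendor's own audit should report something at least this granular; if it doesn't, the check is simple enough to run on exported decision data yourself.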

    Independent third-party AI audits provide unbiased evaluations that help organizations detect hidden biases. Ask whether the vendor has engaged external auditors to assess their systems, and request access to audit findings. Internal testing is valuable but insufficient—organizations naturally have blind spots about their own products. External evaluation by experts in AI fairness provides additional confidence that bias concerns have been genuinely addressed rather than glossed over. Tools like IBM AI Fairness 360, an open-source toolkit providing fairness metrics and bias mitigation algorithms, can be used for independent verification even when vendor audits aren't available.
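    If you want to verify a vendor's numbers yourself, a toolkit-based check might look like the following sketch. It assumes a recent aif360 release and uses a toy decision log; the column names and the numeric group encoding are placeholders you would replace with your own exported data.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy decision log; in practice, export real outcomes from the tool under review.
# 'selected' = 1 is the favorable outcome; 'group' must be numerically encoded
# (here 1 = the historically favored group). These encodings are assumptions.
df = pd.DataFrame({
    "selected": [1, 1, 0, 1, 0, 1, 0, 0],
    "group":    [1, 1, 1, 1, 0, 0, 0, 0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["selected"],
    protected_attribute_names=["group"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{"group": 0}],
    privileged_groups=[{"group": 1}],
)

# Disparate impact is the ratio of favorable-outcome rates (unprivileged / privileged);
# values below 0.8 fall short of the four-fifths threshold.
print("Disparate impact:", metric.disparate_impact())
print("Statistical parity difference:", metric.statistical_parity_difference())
```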

    Questions to ask about bias testing:

    • Has the tool been tested for bias? What fairness metrics were used?
    • Can you share audit results and performance data disaggregated by demographic groups?
    • Have independent third parties audited the system? Can we see those findings?
    • What disparities exist in performance across different populations?
    • How frequently is bias testing conducted as the system evolves?

    Accountability and Remediation Processes

    What happens when bias is discovered

    Contracts with AI vendors should mandate regular audits, bias monitoring, and detailed reporting on model performance and data usage. Beyond contractual requirements, assess whether vendors have genuine accountability mechanisms: how do they receive and investigate bias reports? What's their track record of acknowledging and addressing problems? Have they ever withdrawn or modified a product due to fairness concerns? A vendor who has never found or fixed a bias issue either isn't looking carefully or isn't being honest about what they've found.

    As employers increasingly use AI for screening and decision-making, lawsuits alleging that AI tools disproportionately affect certain groups should serve as a warning. Ask vendors about their legal exposure related to discrimination claims and how they've responded to such challenges. Vendors with robust accountability processes will have clear policies for bias reporting, investigation timelines, remediation procedures, and communication with affected customers. Those without such processes expose you to risks they haven't adequately considered or addressed. For guidance on establishing your own accountability frameworks, see our article on creating an audit trail for AI decisions.

    Questions about accountability:

    • What's the process for reporting suspected bias? Who investigates?
    • Has the vendor ever modified or withdrawn a product due to bias concerns?
    • What legal claims or regulatory actions has the vendor faced related to discrimination?
    • How will you be notified if bias is discovered in the system?
    • What remediation options are available if the tool produces discriminatory outcomes?

    Evaluating Tool Design for Inclusion

    Beyond training data and vendor practices, the design of AI tools themselves can exclude populations or embed discriminatory assumptions. Design decisions about interfaces, languages supported, accessibility features, and assumptions about user context all shape who can benefit from a tool and who is excluded. A technically unbiased algorithm delivered through an inaccessible interface still perpetuates inequity. Evaluating design for inclusion requires examining the tool from the perspective of your most marginalized users, not your most typical ones.

    NTEN focuses on the equitable and skillful use of technology, believing that every staff member, board member, program participant, and community member should have access to training and involvement in intentional decisions about AI adoption. This principle applies to tool design as well: tools should be usable by the full range of people who interact with your organization, not just technologically sophisticated users with reliable internet access and current devices. When tools assume conditions that don't exist for marginalized populations—constant connectivity, high literacy, smartphone ownership, standard English—they exclude exactly the people nonprofits most need to serve.

    Accessibility and Universal Design

    Tools that work for people with varying abilities

    • Screen reader compatibility: Does the interface work with assistive technologies used by people with visual impairments?
    • Multiple interaction modes: Are text, voice, visual, and touch options available for different abilities and preferences?
    • Cognitive accessibility: Is the interface clear enough for users with cognitive disabilities or limited literacy?
    • Device requirements: Does the tool work on older devices or require latest-generation hardware?
    • Connectivity assumptions: Can the tool function with intermittent internet or does it require constant high-speed access?

    Language and Cultural Inclusion

    Support for diverse communities and contexts

    • Language support: What languages are supported? Are they fully functional or just partially translated interfaces?
    • Dialect recognition: Does natural language processing work for regional dialects and non-standard usage?
    • Cultural assumptions: Does the tool assume naming conventions, family structures, or cultural practices that exclude some populations?
    • Content appropriateness: Are AI-generated outputs culturally appropriate for your communities?
    • Localization depth: Beyond translation, is the tool adapted for different cultural contexts and expectations?

    Community-Centered Design Evaluation

    Assess tools from the perspective of those you serve

    The most effective way to evaluate tool design for inclusion is to involve community members in the assessment process. Create opportunities for diverse users—including people with disabilities, non-native English speakers, older adults, and those with limited technology experience—to test tools before adoption. Observe where they struggle, what confuses them, and what assumptions the tool makes that don't match their reality. This participatory evaluation surfaces design barriers that staff testing alone would miss. For guidance on involving communities in technology decisions, see our article on building inclusive AI that serves all communities equitably.

    Participatory evaluation practices:

    • Recruit diverse testers intentionally—don't just accept whoever volunteers
    • Observe actual use rather than just collecting feedback surveys
    • Test under real-world conditions: varied connectivity, different devices, actual time pressures
    • Create safe channels for honest feedback about frustrations and barriers
    • Be willing to reject tools that don't work for your most marginalized users

    Using Equity Frameworks in Tool Selection

    Several organizations have developed frameworks specifically designed to help nonprofits make equity-centered technology decisions. NTEN and Institute for the Future partnered to create the AI Framework for an Equitable World, which helps organizations consider what is necessary for their missions, roles, and community when adopting or integrating AI. This framework was developed through a community-centered process involving dozens of organizations and cross-sector partners. It helps raise critical questions at any stage of decision-making and across any mission or organizational context.

    The framework includes multiple layers: an assessment layer examining inputs, outcomes, and accountability; an impact layer considering individual, organizational, and systemic effects; and an intervention layer addressing design, development, and deployment decisions. Using structured frameworks like this ensures you don't overlook critical equity considerations in the rush to adopt new technology. They transform vague concerns about "doing AI responsibly" into specific, actionable questions that guide purchasing decisions and implementation practices.

    Core Questions from Equity Frameworks

    Embed these questions into every AI tool evaluation

    The AI Equity Project recommends embedding equity questions into every AI decision: Who benefits? Who could be burdened? Whose voice is missing? What risk is possible? These questions may seem simple, but genuinely engaging with them reveals considerations that typical technology evaluations overlook. They shift focus from features and efficiency to impact and justice—exactly the shift nonprofits committed to equity need to make.

    Benefit and Burden Analysis:

    • Which communities will benefit most from this tool?
    • Which populations could be excluded or harmed?
    • Does this tool widen or narrow existing disparities?
    • Who bears the risks if the tool fails or produces errors?

    Voice and Power Analysis:

    • Were affected communities consulted in tool selection?
    • Who makes decisions about how the tool is used?
    • Can community members opt out or appeal AI decisions?
    • How will concerns from marginalized users be heard and addressed?

    Organizations like Ford Foundation, MacArthur Foundation, Patrick J McGovern Foundation, and Emerson Collective have developed frameworks, tools, and resources that can help nonprofits navigate and vet technology vendors. From 2018 to 2023, foundations in the United States allocated an estimated $300 million in grantmaking to AI programs, with one-third earmarked for AI governance and policy efforts. This investment reflects growing recognition that responsible AI adoption requires structural support, not just good intentions. Accessing these resources and frameworks provides additional rigor to your evaluation processes. For organizations new to equity frameworks, see our article on AI policy templates by nonprofit sector.

    Establishing Ongoing Monitoring for Bias

    Selecting equitable tools isn't a one-time decision—it's the beginning of an ongoing responsibility. AI systems can develop bias over time as they encounter new data or edge cases, so initial fairness doesn't guarantee sustained equity. Establishing monitoring systems before deployment ensures you'll detect problems early, when they can be addressed with less harm than if they're discovered after causing significant damage. This proactive approach demonstrates genuine commitment to equity rather than performative concern.

    In healthcare contexts, organizations are confronting the risk that biased training data or unequal digital access could misdiagnose conditions, divert resources toward majority populations, or automate triage rules that deprioritize those already marginalized. To guard against these risks, leading nonprofits audit models for bias, disclose how AI informs decisions, and keep humans involved whenever outcomes affect community participation. These practices apply across sectors, not just healthcare. Any AI tool making decisions that affect people's lives—service allocation, resource prioritization, communication targeting—requires ongoing oversight to ensure equity is maintained.

    Monitoring Practices for Equitable AI

    Systems to detect and address bias in deployed tools

    Effective monitoring requires establishing baseline metrics before AI tools are deployed, then tracking outcomes across demographic groups over time. This disaggregated analysis reveals disparities that aggregate metrics hide. If your AI-powered donor prospecting tool has a 25 percent success rate overall, but only 10 percent for donors of color, aggregate metrics show success while hiding significant inequity. Regular review cycles—at minimum quarterly—catch emerging patterns before they cause substantial harm. For approaches to tracking AI outcomes, see our article on how to measure AI success in nonprofits.
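    As a sketch of what disaggregated review can look like in practice, the snippet below compares a tool's current outcome rates by group against documented pre-deployment baselines and the overall rate; the groups, column names, and numbers are hypothetical placeholders for your own data.

```python
import pandas as pd

# Hypothetical quarterly outcome data for an AI-assisted process, one row per case.
# 'demographic' and 'successful' are placeholder column names.
outcomes = pd.DataFrame({
    "demographic": ["white", "white", "white", "black", "black", "latino", "latino"],
    "successful":  [1,       1,       0,       0,       1,       0,       1],
})

# Hypothetical pre-deployment baseline success rates, documented before rollout.
baseline = {"white": 0.55, "black": 0.50, "latino": 0.52}

current = outcomes.groupby("demographic")["successful"].mean()
overall = outcomes["successful"].mean()

print(f"Overall success rate: {overall:.0%}")
for group, rate in current.items():
    change = rate - baseline[group]      # drift since pre-deployment baseline
    gap = rate - overall                 # disparity hidden by the aggregate number
    print(f"{group:>7}: {rate:.0%} (baseline {baseline[group]:.0%}, "
          f"gap to overall {gap:+.0%}, change {change:+.0%})")
```

    Running this kind of comparison each quarter, and pairing it with the review authority described in the list below, is what turns monitoring from documentation into prevention.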

    Key monitoring practices:

    • Baseline metrics: Document outcomes by demographic group before AI implementation for comparison
    • Disaggregated analysis: Review AI outcomes by race, language, disability, income, and other relevant categories
    • Feedback channels: Create accessible ways for staff and community members to report potential bias
    • Regular review cycles: Schedule quarterly bias reviews with authority to modify or discontinue tools
    • Human oversight: Maintain human review for high-stakes decisions affecting vulnerable populations
    • Documentation: Record all bias incidents and remediation actions for accountability and learning

    The AI Equity Project emphasizes that funders and philanthropic intermediaries should fund governance and learning capacity, not just AI tool integration or pilot adoption. Policy and people need support before platforms. This insight applies to monitoring as well: building internal capacity to detect and respond to bias requires investment that goes beyond tool purchasing. Organizations need staff time for analysis, training to recognize bias indicators, and authority structures that empower quick response when problems are found. Without this infrastructure, monitoring becomes documentation without action—a record of harm rather than its prevention. For guidance on building oversight structures, see our article on building an AI ethics committee for your nonprofit board.

    When to Walk Away from an AI Tool

    Not every AI tool is appropriate for every context, and recognizing when to decline adoption is as important as knowing how to evaluate options. Some tools—regardless of their technical capabilities—carry equity risks that outweigh potential benefits. Others may be appropriate for some organizations but not yours, given the specific communities you serve. Developing clear criteria for rejection protects your organization from adopting technology that undermines your mission.

    Red Flags That Should Stop Adoption

    Warning signs that a tool may perpetuate inequity

    • Vendor can't answer basic bias questions: If vendors can't explain training data sources, testing practices, or fairness metrics, they haven't prioritized equity
    • Known demographic performance gaps: If the tool performs poorly for populations you serve, don't adopt it regardless of overall accuracy
    • No accessibility features: Tools that exclude people with disabilities violate basic inclusion principles
    • High-stakes decisions without human review: Automated decisions about service access or resource allocation require human oversight
    • Community opposition: If the people you serve don't want AI involved in their care, respect that preference
    • Data practices that endanger vulnerable populations: Tools that could expose undocumented immigrants, abuse survivors, or others to harm
    • No remediation pathway: If there's no clear process for addressing bias when discovered, problems will persist

    Walking away from an AI tool isn't failure—it's responsible stewardship. The efficiency gains from AI aren't worth undermining your mission or harming communities. When tool evaluation reveals unacceptable equity risks, document your concerns and communicate them to vendors. This feedback helps shift the market toward more equitable products. Look for alternative tools that better align with your values, consider whether the use case requires AI at all, or accept that some applications should wait until better options exist. Your commitment to equity is more important than keeping pace with technology trends. For perspectives on recognizing AI limitations, see our article on when NOT to use AI in your nonprofit.

    Making Technology Choices That Reflect Your Values

    Choosing AI tools that don't perpetuate systemic inequities requires sustained attention, structured processes, and willingness to prioritize equity over convenience. The framework presented here—assessing vendor transparency, evaluating design for inclusion, applying equity frameworks, establishing ongoing monitoring, and knowing when to walk away—provides a systematic approach to technology decisions that too often happen haphazardly. Each element matters, and skipping steps creates openings for bias to enter your systems undetected.

    The current state of AI equity in the nonprofit sector reflects the gap between awareness and implementation that characterizes so much social change work. Most organizations recognize that AI can harm marginalized communities; far fewer have processes to prevent that harm. Bridging this gap requires moving from general concern to specific action: embedding equity questions in every evaluation, requesting evidence rather than accepting assurances, involving affected communities in decisions, and building monitoring systems that catch problems early. These practices take time and attention that already-stretched organizations struggle to provide—but the alternative is technology adoption that undermines your own mission.

    When choosing to use AI, how do you ensure that choice advances the collective social good rather than creating new problems or exacerbating existing inequities? This question, posed by sector analysts examining philanthropic investment in AI, applies equally to individual nonprofits making tool selection decisions. The answer lies in intentionality: not avoiding AI, but approaching it with clear-eyed assessment of risks and benefits, structures for accountability, and genuine commitment to the communities you serve. Technology should amplify your impact, not compromise your values.

    Start where you are. You don't need comprehensive evaluation frameworks and monitoring systems in place before making any technology decisions. But each decision is an opportunity to build equity consideration into your practices. Ask one more question of the next vendor. Invite one community member to test the next tool under consideration. Review one AI application's outcomes disaggregated by demographic groups. These incremental steps build organizational muscle for equity-centered technology adoption. Over time, what feels like extra work becomes standard practice—the way your organization makes technology decisions. And those decisions, made thoughtfully across thousands of nonprofits, shape whether AI advances justice or perpetuates the very inequities our sector exists to address.

    Need Help Evaluating AI Tools for Equity?

    Ensure your technology choices advance rather than undermine your commitment to justice. Get expert guidance on vendor assessment, equity frameworks, and building organizational capacity for responsible AI adoption that genuinely serves all communities.