A man watches a video of misinformation on the social media platform TikTok on his mobile phone in Hanoi on 6 October 2023. (Photo by Nhac NGUYEN / AFP).


From Principles to Protocols: Embedding Partnerships into Content Moderation Technologies Against Mis/Disinformation


Despite growing calls for collaboration, the technological core of content moderation remains largely a black box. While multi-stakeholder partnerships are increasingly invoked in regional policy discourse, external oversight or shared governance of the underlying moderation technologies remains limited.

INTRODUCTION

In addressing false and harmful online content, “multi-stakeholder partnership” has become a catch-all phrase to describe any kind of collaboration between actors. At the normative level, the 2023 ASEAN Guideline on Management of Government Information in Combating Fake News and Disinformation in the Media suggests the “penta-helix” approach. Coordinated by Southeast Asian governments, it invites business, media, civil society, academia, and governments into partnership to debunk disinformation and amplify counter-narratives.

Institutionally, governments have established formal multi‑stakeholder platforms to shape online content. For example, Malaysia’s Communications and Multimedia Content Forum (CMCF), a self-regulatory body, draws multiple actors to co‑draft and enforce a Content Code and manage public complaints. In the Philippines, the government-led Inter-Agency Council Against Child Pornography (IACACP) coordinates policy rollout, capacity‑building, advocacy, and access blocking to curb online child sexual exploitation.

Partnerships also occur in platform-led initiatives, such as the Priority Flagger (YouTube), Trusted Partner (Meta), and Safety Partners (TikTok) programmes, which essentially provide a fast lane for partner organisations to report problematic content to platforms.

However, while partnerships have begun to materialise at various levels, the technological layer of content moderation remains largely proprietary, i.e. controlled and owned by tech platforms. This article examines the technologies that make up content moderation (particularly against borderline content) and assesses whether each can be grounds for further collaboration or, conversely, whether it remains a site of power struggle.

THE COMPLEX SYSTEM OF CONTENT MODERATION

Content moderation, understood as the assessment of user-generated content (UGC) to determine its appropriateness, comprises a complex and interconnected set of standards, practices, and technological designs. Each of these elements affects moderation outcomes, directly or indirectly.

Zooming in on the technological aspect, policy discourse in Southeast Asia tends to view the social media “algorithm” as a single entity rather than a collection of diverse automated systems. Non-algorithmic elements also risk being overlooked, not least because policymakers tend to focus on the purpose and outcomes of such algorithms rather than their technical possibilities, development life cycles, and ownership or control.

… it might be beneficial for policymakers to explore the notion of “partnership by design” for content moderation’s technical architecture.

The output-driven approach can indeed be helpful for devising normative and strategic guidelines. However, it may come at the expense of tailored, surgical interventions that address specific technical harms, risks, or biases, thereby widening the already sizeable gap between policy intent and implementation.

To understand the technological nuts and bolts relevant for better multi-stakeholder policies, the table below presents a non-exhaustive list of components of the technical system that makes up content moderation (Table 1). This analysis uses the umbrella term “technology” to cover the techniques, platform-deployed tools and systems, proprietary tools, and hybrid enforcement frameworks involved in content moderation.

Table 1: List of Components making up Content Moderation Technical System

Type | Technology | Description
Foundational AI and data processing techniques | Machine Learning (ML) | Algorithms trained to identify patterns and flag content without hard rules.
Foundational AI and data processing techniques | Hashing (Cryptographic) | Detects exact matches of banned files using unique digital fingerprints.
Deep Learning (DL) | Natural Language Processing (NLP) | Analyses human language in text for harmful or policy-violating content.
Deep Learning (DL) | Large Language Models (LLMs) | AI systems that understand and generate human-like text across languages.
Deep Learning (DL) | Video Frame Analysis | AI reviews video images to detect nudity, violence, and other visual harms.
Deep Learning (DL) | Audio Analysis | Scans audio for forbidden content or matches to copyright/terror databases.
Deep Learning (DL) | Deepfake Detection | Identifies AI-generated fake media like videos or voice clones.
Pattern and Behaviour Analysis | Coordinated Inauthentic Behaviour (CIB) Detection | Flags networks of fake or deceptive accounts acting in coordination.
Pattern and Behaviour Analysis | Keyword Filtering | Blocks content with pre-set banned terms.
Pattern and Behaviour Analysis | Hashing (Perceptual) | Identifies visually similar content even if slightly altered.
Human-Centred Frameworks | Human-in-the-Loop (HITL) | Combines AI speed with human judgment and correction.
Human-Centred Frameworks | Community Moderation | Users flag content or add public context (e.g. Community Notes).
Source(s): Various, compiled by the author
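To make the limits of the simpler tools concrete, below is a minimal, illustrative Python sketch of keyword filtering, one of the Table 1 techniques. The banned term and sample posts are hypothetical and do not reflect any platform’s actual rules; the point is that such a rule is precise but context-blind.

```python
# Minimal, illustrative sketch of keyword filtering (Table 1); the banned term
# and sample posts are hypothetical, not any platform's actual rules.
BANNED_TERMS = {"miracle cure"}  # hypothetical pre-set banned term

def keyword_filter(text: str) -> bool:
    """Flag a post if it contains any pre-set banned term (exact substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in BANNED_TERMS)

# Precise but context-blind: the rule cannot tell promotion from debunking,
# and trivial obfuscation slips past it entirely.
print(keyword_filter("Buy this miracle cure today!"))                    # True: intended catch
print(keyword_filter("Fact-check: the 'miracle cure' claim is false."))  # True: flags the debunk too
print(keyword_filter("Buy this m1racle cure today!"))                    # False: obfuscation evades the rule
```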

There are, of course, content moderation elements not primarily driven by technology but by processes or people. For example, there is the Business Process Outsourcing (BPO) model for content moderators, the aforementioned trusted partner programmes, and independent or third-party fact-checkers.

The moderation process is also shaped by several, sometimes competing, written guidelines: national legislation, platform community standards, voluntary codes of conduct, and global norms like the Santa Clara Principles. Moderation outcomes are likewise not immune to government-mandated takedowns or Oversight Board decisions (in the case of Meta platforms). The list of non-technological elements could go on; the bottom line is that the technologies do not exist in a silo but are influenced by (and in turn influence) other elements of content moderation.

Priority Flagger (YouTube), Trusted Partner (Meta), and Safety Partners (TikTok) programmes provide a fast lane for partner organisations to report problematic content to platforms. Photo simulation of social media apps on a smartphone screen captured on 30 November 2023 (Photo by Jonathan Raa / NurPhoto via AFP)

Additionally, the list only includes established examples and deliberately omits experimental moderation concepts, some of which might be promising. For example, Francis Fukuyama proposes the concept of “middleware”: third-party software positioned between users and tech giants that gives users the power to choose how content is curated rather than simply accepting platforms’ default algorithms. This idea has materialised in newer and smaller platforms such as Bluesky, but tremendous buy-in would be required to implement it on more mainstream platforms.
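As a rough illustration of the middleware idea, the hedged sketch below imagines a third-party ranker sitting between a platform’s default feed and the user. The Post structure, scores, and ranker names are hypothetical assumptions, not a description of Bluesky or any real system.

```python
# Hypothetical sketch of "middleware": a third-party ranker sits between the
# platform's default feed and the user, who chooses how content is curated.
# The Post fields, scores, and rankers are illustrative, not a real API.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Post:
    text: str
    engagement_score: float   # what a platform's default ranking might optimise for
    credibility_score: float  # what a fact-checking-oriented middleware might optimise for

Ranker = Callable[[List[Post]], List[Post]]

def platform_default(feed: List[Post]) -> List[Post]:
    """The platform's own curation: maximise engagement."""
    return sorted(feed, key=lambda p: p.engagement_score, reverse=True)

def credibility_first(feed: List[Post]) -> List[Post]:
    """A hypothetical third-party middleware: surface credible content first."""
    return sorted(feed, key=lambda p: p.credibility_score, reverse=True)

def render_feed(feed: List[Post], user_choice: Ranker) -> List[Post]:
    """The user, not the platform, selects which ranker curates the feed."""
    return user_choice(feed)

feed = [Post("Outrage-bait rumour", 0.9, 0.2), Post("Verified explainer", 0.4, 0.9)]
print([p.text for p in render_feed(feed, credibility_first)])  # explainer ranked first
```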

This article focuses on the technological aspects of content moderation. The technologies involved in content moderation are categorised according to two criteria: whether the technology is reliable for borderline/nuanced content, and whether the existing governance model is proprietary or already has some form of collaboration with external actors. These are reflected in the 2×2 matrix below (Table 2):

Table 2: 2-by-2 Matrix based on Two Criteria of Content Moderation

Governance model | Only effective for clearly harmful and illegal content | Also promising for borderline/context-dependent content
Requires external input/partnership (shared governance or input) | Quadrant II: Cryptographic hashing; Perceptual hashing; Deepfake detection | Quadrant I: Community moderation/notes; Human-in-the-Loop (HITL) with fact-checkers
Platform-controlled (proprietary/closed system) | Quadrant III: Machine Learning (ML); Keyword filtering; Audio fingerprinting; Video frame analysis | Quadrant IV: Natural Language Processing (NLP); Large Language Models (LLMs); Internal Human-in-the-Loop (HITL); Manual moderation; Coordinated Inauthentic Behaviour (CIB) detection

Ideally, there should be more technologies in Quadrant I, where multi-stakeholder partnerships in technology can address the more complex borderline and context-dependent content. The technologies listed in Quadrant II represent the low-hanging fruit for multi-stakeholder cooperation, since tools like Content ID or hashing are only reliable in addressing clearly defined and easily identified illegal content; such partnerships therefore do not entail high-level decision-making or complex problem-solving. Quadrant III lists the proprietary technologies, in complete or almost complete control of tech platforms, that have been reliably deployed against easily identified illegal content but not so much against borderline content. Finally, Quadrant IV lists the technologies that have the potential to be effectively deployed for borderline content but are still largely controlled by tech platforms.

INHERENT LIMITATIONS

Several clear observations emerge from the matrix. The most obvious is that despite emerging multi-stakeholder partnerships in fact-checking programmes and in shaping regulations, the technological layer of content moderation is still largely privately controlled by tech platforms (Quadrants III and IV). After all, it is the platforms that have made the substantial infrastructure investments and that maintain those systems.

This arrangement, combined with external factors such as under- or over-regulation and the inherent complexities of content moderation, further incentivises the platforms to maintain proprietary systems rather than engage in broader collective efforts. As a result, there is currently little room for Southeast Asian countries and regional organisations to exert influence over technologies owned and implemented by US- or China-based social media platforms.

The second observation is that the technologies that address easily identified illegal content (Quadrants II and III) are relatively more established than the emerging or experimental technologies for borderline content. Established technologies generally enable a more defined partnership. For example, moderation of terrorist and violent extremist content has long benefited from the hash-sharing database run by the Global Internet Forum to Counter Terrorism (GIFCT), a non-profit consortium founded by Meta, Microsoft, YouTube, and then-Twitter.

The GIFCT initiative converts known terrorist or violent extremist content into unique representations known as hashes. Member companies can then quickly block and remove content matching those hashes without needing to share user data or original files. Misinformation and disinformation, however, often lack a clear and consistent digital signature and constantly shapeshift, whether through text alteration, image manipulation, or misleading narratives that evade standard detection algorithms. Such “lawful but awful” content exploits not only legal limitations but also technological ones.
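A toy sketch of this hash-sharing logic, using plain SHA-256 and hypothetical file contents rather than GIFCT’s actual systems or formats, shows both why members can match content without exchanging originals and why a trivially altered file slips past exact matching.

```python
# Toy sketch of a hash-sharing workflow in the spirit of GIFCT, not its actual
# system: members contribute and check fingerprints, never the files themselves.
# The file contents below are hypothetical placeholders.
import hashlib

def fingerprint(file_bytes: bytes) -> str:
    """Convert known violating content into a unique hash (digital fingerprint)."""
    return hashlib.sha256(file_bytes).hexdigest()

# A consortium-style shared database of hashes contributed by member platforms.
shared_hash_db = {fingerprint(b"known-extremist-video-bytes")}

def check_upload(file_bytes: bytes) -> bool:
    """A member can block an upload that matches a shared hash without ever
    seeing the original file or the user data behind other members' entries."""
    return fingerprint(file_bytes) in shared_hash_db

print(check_upload(b"known-extremist-video-bytes"))     # True: exact re-upload is caught
print(check_upload(b"known-extremist-video-bytes-v2"))  # False: a slight alteration evades exact matching
# This brittleness is one reason shapeshifting mis/disinformation, which rarely
# has a stable digital signature, is poorly served by exact hash matching alone.
```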

Third, technology that has the potential to address borderline content requires substantial human participation in its life cycle. As categorised in Quadrant I, human judgement is a key component in both community notes and Human-in-the-Loop (HITL) with fact-checkers. Community notes harness the collective evaluation of social media users to provide additional context to the original posts. HITL with fact-checkers combines the platforms’ internal HITL system with external input from third-party fact-checkers.
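A hedged sketch of such HITL routing is shown below; the classifier stand-in, thresholds, and queue labels are hypothetical assumptions rather than any platform’s real workflow, but they capture the basic logic of sending borderline cases to external fact-checkers.

```python
# Hedged sketch of Human-in-the-Loop routing with external fact-checkers
# (Quadrant I). The classifier stand-in, thresholds, and queue labels are
# hypothetical, not any platform's real workflow.
def automated_score(post: str) -> float:
    """Stand-in for a platform classifier returning a probability of violation."""
    return 0.55 if "unverified claim" in post else 0.02

def route(post: str, auto_remove_at: float = 0.95, review_band: float = 0.30) -> str:
    score = automated_score(post)
    if score >= auto_remove_at:
        return "auto-remove"                          # clear-cut: the machine acts alone
    if score >= review_band:
        return "queue for third-party fact-checkers"  # borderline: humans decide
    return "leave up"

print(route("breaking: unverified claim about election fraud"))  # routed to fact-checkers
print(route("photos of my lunch"))                               # left up
```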

Such partnerships, however, are currently limited and optional. HITL can easily be achieved with internal moderators within social media platforms instead of third-party fact-checkers (see Quadrant IV). Community notes, which recently re-emerged after being adopted by Meta platforms (in the US market only), are largely limited to ex-post interventions, i.e., providing additional context after mis/disinformation has already been posted, and are deemed too slow in reducing engagement during the early stages of dissemination. This makes the existing interventions in Quadrant I optional and supplementary, rather than essential tools.

Finally, underneath the technologies managed by social media platforms, there is still the matter of government internet-filtering technologies. For some authoritarian regimes, there is no need to moderate content if no objectionable content can be uploaded in the first place. Southeast Asia is no stranger to governments engaging in internet filtering, whether broadly through DNS/IP blocking or surgically through Deep Packet Inspection (DPI). These instruments are traditionally employed against illegal content but have been increasingly used against borderline content.

Social media platforms may have technological prowess and trade secrets, but governments still hold the internet’s “kill switch”. Just as undersea cables can be a site of power struggle between governments, content moderation is becoming an arena of contestation between governments and corporations.

GOING FORWARD

Effective regulatory efforts in the region will only be implementable to the extent that the technological limitations allow. The technologies in Quadrant IV represent the last frontier for multi-stakeholder partnership due to their emergent nature, proprietary ownership, and difficulty in achieving consensus on moderating borderline content.

The policy implications are two-fold. The first relates to the importance of differentiating between regulation for clear-cut illegal/harmful content and regulation for borderline content. Most ASEAN member states still treat both kinds of content under the same legal provisions, with the exception of Singapore, which has the Online Safety (Miscellaneous Amendments) Act for harmful content and the Protection from Online Falsehoods and Manipulation Act (POFMA) for borderline content. While POFMA is not without its weaknesses, it underlines the importance of treating illegal and borderline content under separate legislative frameworks and processes.

Second, there is a need to specify technical regulations for each technological element of content moderation operated by tech platforms. This is trickier to articulate, let alone implement. However, metaphorically, as much as proprietary algorithms constitute a company’s “secret sauce”, such sauces should not be exempt from food safety standards.

Most Southeast Asian countries are also “tech-takers”, rather than “tech-makers”, which significantly reduces their leverage and ability to effectively regulate tech platforms in the overall scheme of things.

Elsewhere in the digital policy landscape, “by design” approaches have emerged. Fundamentally, “by design” means proactively and systematically embedding a given principle in the foundation of a technology and its life cycle, rather than adding it as an afterthought. There is a “secure-by-design” approach in the Philippines’ National Cyber Security Plan 2023–2028, “safety by design” in Indonesia’s Online Child Protection Regulation, and “privacy by design” guidelines set to be developed by Malaysia to supplement its revised Personal Data Protection Act.

In a similar vein, it might be beneficial for policymakers to explore the notion of “partnership by design” for content moderation’s technical architecture. This approach can benefit the moderation of borderline content (such as misinformation and disinformation) by expanding the room in which non-corporate stakeholders can provide direct input to content moderation technologies.

The governments’ part of the job is to spell out what exactly “partnership by design” means and in which technologies they want the partnership to happen. For instance, with the emergence of explainable AI (XAI) in policy discourse, governments could specify that for content moderated by technologies in Quadrant IV (e.g., NLP, LLMs, internal HITL), platforms must be able to demonstrate the specific linguistic cues, contextual elements, or data correlations that led to an automated moderation decision. This has become more imperative given the recent trend of platforms downsizing their Trust and Safety (T&S) teams in favour of AI moderation.
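As a thought experiment only, the sketch below imagines what a minimal “partnership by design” disclosure for a Quadrant IV decision might contain; every field name is a hypothetical assumption rather than an existing standard or any platform’s schema.

```python
# Thought-experiment sketch of a "partnership by design" disclosure for a
# Quadrant IV decision. Field names are hypothetical assumptions, not an
# existing standard or any platform's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModerationExplanation:
    content_id: str
    technology: str                  # e.g. "NLP classifier", "LLM", "internal HITL"
    decision: str                    # e.g. "removed", "demoted", "labelled"
    linguistic_cues: List[str] = field(default_factory=list)     # phrases that triggered the system
    contextual_signals: List[str] = field(default_factory=list)  # e.g. posting patterns, source history
    human_reviewed: bool = False
    appeal_channel: str = "unspecified"

example = ModerationExplanation(
    content_id="post-123",           # hypothetical identifier
    technology="NLP classifier",
    decision="demoted",
    linguistic_cues=["'guaranteed cure'", "urgency framing"],
    contextual_signals=["burst of near-identical posts from new accounts"],
    human_reviewed=False,
    appeal_channel="external fact-checker review",
)
print(example)
```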

The platforms’ part of the job is to lead the partnership and ensure that the standards are implementable and operable across platforms. Learning from the best practices of GIFCT and the Coalition for Content Provenance and Authenticity (C2PA), buy-in from tech platforms is easier to secure when initiatives are platform-led rather than government-led.

Admittedly, this proposition is intended to provoke, rather than prescribe. After all, there is an inherent impasse: social media platforms are understandably reluctant to improve transparency without clear regulatory protection, while regulators struggle to craft and operationalise policies without access to the technical insights and capabilities held by the platforms. Most Southeast Asian countries are also “tech-takers”, rather than “tech-makers”, which significantly reduces their leverage and ability to effectively regulate tech platforms in the overall scheme of things.

Nonetheless, the central point stands: normative agreements and ad hoc programmes between stakeholders are not enough on their own. They need to be translated not just into binding regulations, but also into the binary code that makes up the technical system of content moderation. Multi-stakeholder cooperation needs operationalisation as much as it needs institutionalisation.


This is an adapted version of ISEAS Perspective 2025/55 published on 4 August 2025. The paper and its references can be accessed at this link.

Beltsazar Krisetya is Visiting Fellow at the Media, Technology, and Society (MTS) Programme, ISEAS-Yusof Ishak Institute; PhD Student in Science, Technology, Engineering, and Public Policy (STEaPP) at University College London (UCL); and Researcher (on study leave) at the Centre for Strategic and International Studies (CSIS).