Jessica Dai wrote an article for Reboot titled "The Artificiality of Alignment: How are we actually 'aligning AI with human values'?" (19 August 2023).

« “AI existential risk” (abbreviated “x-risk”)  »

« OpenAI’s ChatGPT, Anthropic’s Claude  »

« It’s these capabilities that those in the “AI Safety” community are concerned … that AI systems will inevitably surpass human-level reasoning skills, beyond “artificial general intelligence” (AGI) to “superintelligence”; that their actions will outpace our ability to comprehend them; that their existence, in the pursuit of their goals, will diminish the value of ours. This transition, the safety community claims, may be rapid and sudden (“foom”). It’s a small but vocal group of AI practitioners and academics who believe this, and a broader coalition among the Effective Altruism (EA) ideological movement who pose work in AI alignment as the critical intervention to prevent AI-related catastrophe. »

« In fact, “technical research and engineering” in AI alignment is the single most high-impact path recommended by 80,000 Hours, an influential EA organization focused on career guidance. In a recent NYT interview, Nick Bostrom — author of Superintelligence and core intellectual architect of effective altruism — defines “alignment” as “ensur[ing] that these increasingly capable A.I. systems we build are aligned with what the people building them are seeking to achieve.” »

« So OpenAI and Anthropic might be trying to conduct research, push the technical envelope, and possibly even build superintelligence, but they’re undeniably also building products — products that carry liability, products that need to sell, products that need to be designed such that they claim and maintain market share. Regardless of how technically impressive, useful, or fun Claude and GPT-x are, they’re ultimately tools (products) with users (customers) who hope to use the tool to accomplish specific, likely-mundane tasks. »

« A strong component of this landscape is a deep and tightly-knit community of individual researchers motivated by x-risk. This community has developed an extensive vocabulary around theories of AI safety and alignment, many first introduced as detailed blog posts in forums like LessWrong and AI Alignment Forum. »

« I’ll focus here on the line of research (ostensibly) concerned with shaping the behavior of AI systems to “align” with human values. »

« my real concern here: the existence of financial incentives means that alignment work often turns into product development in disguise rather than actually making progress on mitigating long-term harms. »

« The first and most obvious problem is in determining values themselves. In other words, “which values”? And whose? »

« “HHH,” … “helpfulness, harmlessness, and honesty” »

« But as it turns out, OpenAI was lobbying for reduced regulation even as they publicly “advocated” for additional governmental involvement; on the other hand, extensive incumbent involvement in designing legislation is a clear path towards regulatory capture. Almost tautologically, OpenAI, Anthropic, and similar startups exist in order to dominate the marketplace of extremely powerful models in the future. »

« I struggle to reconcile the incongruity between the task of building a product that people will buy (under the short-term incentives of the market), and the task of preventing harm in the long term. »

« The emphasis on AI capabilities — the claim that “AI might kill us all if it becomes too powerful” — is a rhetorical sleight-of-hand that ignores all of the other if conditions embedded in that sentence: if we decide to outsource reasoning about consequential decisions — about policy, business strategy, or individual lives — to algorithms. If we decide to give AI systems direct access to resources, and the power and agency to affect the allocation of those resources — the power grid, utilities, computation. All of the AI x-risk scenarios involve a world where we have decided to abdicate responsibility to an algorithm. »

« In healthcare, algorithms could in theory improve clinician decisions, but the organizational structure that shapes AI deployment in practice is complex. »

« The newest models are truly remarkable, and alignment research explores genuinely fascinating technical problems. But if we really are concerned about AI-induced catastrophe, existential or otherwise, we can’t rely on those who stand to gain the most from a future of widespread AI deployments. »

 
