Abstract:Culture is the bedrock of human interaction; it dictates how we perceive and respond to everyday interactions. As the field of human-computer interaction grows via the rise of generative Large Language Models (LLMs), the cultural alignment of these models become an important field of study. This work, using the VSM13 International Survey and Hofstede's cultural dimensions, identifies the cultural alignment of popular LLMs (DeepSeek-V3, V3.1, GPT-5, GPT-4.1, GPT-4, Claude Opus 4, Llama 3.1, and Mistral Large). We then use cultural prompting, or using system prompts to shift the cultural alignment of a model to a desired country, to test the adaptability of these models to other cultures, namely China, France, India, Iran, Japan, and the United States. We find that the majority of the eight LLMs tested favor the United States when the culture is not specified, with varying results when prompted for other cultures. When using cultural prompting, seven of the eight models shifted closer to the expected culture. We find that models had trouble aligning with Japan and China, despite two of the models tested originating with the Chinese company DeepSeek.
Abstract:Culture is a core component of human-to-human interaction and plays a vital role in how we perceive and interact with others. Advancements in the effectiveness of Large Language Models (LLMs) in generating human-sounding text have greatly increased the amount of human-to-computer interaction. As this field grows, the cultural alignment of these human-like agents becomes an important field of study. Our work uses Hofstede's VSM13 international surveys to understand the cultural alignment of the following models: DeepSeek-V3, V3.1, GPT-4, GPT-4.1, GPT-4o, and GPT-5. We use a combination of prompt language and cultural prompting, a strategy that uses a system prompt to shift a model's alignment to reflect a specific country, to align these LLMs with the United States and China. Our results show that DeepSeek-V3, V3.1, and OpenAI's GPT-5 exhibit a close alignment with the survey responses of the United States and do not achieve a strong or soft alignment with China, even when using cultural prompts or changing the prompt language. We also find that GPT-4 exhibits an alignment closer to China when prompted in English, but cultural prompting is effective in shifting this alignment closer to the United States. Other low-cost models, GPT-4o and GPT-4.1, respond to the prompt language used (i.e., English or Simplified Chinese) and cultural prompting strategies to create acceptable alignments with both the United States and China.