How many characters do you need to know?

This really depends on what you mean by “know” and what you intend to do with your knowledge of Chinese characters.

Reading the Newspaper

There is a common factoid bandied about in the Chinese education world: of the tens of thousands of characters out there, one only really needs to know about 2,000 (or 2,500 or 3,000) to read a newspaper. This is 废话 (fei4hua4, “garbage talk”). Firstly, what on earth do you want to read a Chinese newspaper for? Secondly, newspapers are hard. They are full of proper nouns (names of countries, world leaders, and companies), and they either rely heavily on abbreviations or get cute with wordplay. So, frankly, unless “knowing” a character implies a deep knowledge of the morphemes it represents (i.e. a broad knowledge of the words it appears in and/or the ability to interpret it in new contexts), none of the cited numbers of characters is going to be sufficient to read a Chinese newspaper with full comprehension.

Though one often encounters the cited factoid when someone is trying to sell you some newfangled way to learn a bunch of essential hanzi and master Chinese literacy, there is at least a logical basis to the myth.

Frequency Analysis

Frequency analysis of hanzi pretty consistently finds that a “smallish” set of hanzi covers a significant proportion of a given text or corpus. Chinese scholar Zhou Youguang compiled several character frequency studies and found that the 1,000 most common characters have 90% coverage, the 2,400 most common characters have 99% coverage, and so on (see table below).

# Characters    1,000    2,400    3,800    5,200     6,600
% Coverage      90%      99%      99.9%    99.99%    99.999%
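Coverage figures of this kind come from counting how often each character occurs in a corpus and then asking what share of all character tokens the top-ranked characters account for. The following is only a minimal sketch of that idea, not the method used in the studies Zhou compiled. The function names, the toy sample sentence, and the choice to count only the basic CJK Unified Ideographs block as hanzi are all assumptions made for illustration.

```python
from collections import Counter


def is_hanzi(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as hanzi.
    return "\u4e00" <= ch <= "\u9fff"


def coverage_at(text, cutoffs):
    """Share of hanzi tokens in `text` covered by its k most frequent characters."""
    counts = Counter(ch for ch in text if is_hanzi(ch))
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in cutoffs}
    # Token counts ordered from the most to the least frequent character.
    ranked = [n for _, n in counts.most_common()]
    return {k: sum(ranked[:k]) / total for k in cutoffs}


# Toy sample (hypothetical), far too short to say anything about real Chinese.
sample = "我们的老师说我们的中文很好。"
print(coverage_at(sample, [1, 3, 10]))
```

On a real corpus of millions of characters, the same calculation with cutoffs such as 1,000 or 2,400 is what produces rows like the ones in the table.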

So, let’s say you know those 2,400 characters and are reading some text. On average, 1 in 100 characters is going to be unfamiliar. That really does not sound so bad, but again, recognizing a character doesn’t do much good if you don’t know the word it is part of. By way of personal anecdote, though I have a pretty extensive vocabulary (both in terms of raw characters and words), I’m constantly running into things that make me want to check the dictionary.
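To make the “1 in 100” figure concrete, the coverage percentages in the table can be turned into an expected number of unfamiliar characters in a text of a given length. This is a rough sketch: the 1,000-character article length and the function name are assumptions, and treating coverage as a uniform per-character rate ignores the fact that rare characters cluster in particular topics.

```python
def expected_unfamiliar(text_length, coverage):
    """Expected count of unfamiliar characters, treating coverage as a flat rate."""
    return text_length * (1 - coverage)


# Coverage values taken from the table above, keyed by characters known.
table = {1000: 0.90, 2400: 0.99, 3800: 0.999, 5200: 0.9999, 6600: 0.99999}

article_length = 1000  # hypothetical article length, in characters
for known, coverage in table.items():
    unfamiliar = round(expected_unfamiliar(article_length, coverage), 2)
    print(f"{known} characters known: about {unfamiliar} unfamiliar per {article_length}")
```

At 99% coverage that works out to roughly 10 unfamiliar characters in a 1,000-character article, which is enough to interrupt reading several times per page.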

So which exactly are the most frequent characters? It depends on the corpus of texts you look at. I’d recommend the frequency list put together by Jun Da, since its scope far exceeds any reasonable expectations of vocabulary, covering 9,933 hanzi (a handful of characters appear in their traditional form, and some aren’t currently part of modern Chinese).

Lists of Common Characters

Xing Hongbing lists 15 “Common Character Lists” ranging from 1928 to 1985 and 5 “General Purpose Character Lists” created between 1965 and 1987. The most recent of these make up a system of 2,500 + 1,000 + 3,500 characters that pretty much encompasses everything you are likely to run into unless you have a penchant for archaeology. Though construction of the common character lists took frequency into consideration, there is an attempt, much like with Special English, to balance raw frequency against utility. For example, among the 1,000 most frequent characters on Jun Da’s list, there are 7 characters not among the “《现代汉语常用字表》常用字(2500)” [Modern Chinese Commonly Used Characters]. Those seven hanzi (尔、伊、谓、诺、伦、俄、洛) are not particularly obscure, but they aren’t exactly high-priority characters (unless you are from Russia, in which case you need 俄 on the first day) and are reserved for the upper levels of HSK test preparation, if they are taught at all. Nevertheless, all seven of those characters are in the second list: “《现代汉语常用字表》次常用字(1000)” [Modern Chinese Secondary Commonly Used Characters].
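A comparison like the one above boils down to a set difference: take the top N entries of a frequency-ranked list and keep those missing from a reference list. Here is a small sketch of that check. The function name is made up, and the two short lists are toy stand-ins rather than real excerpts from Jun Da’s list or the 2,500-character list.

```python
def missing_from(frequency_ranked, reference_list, top_n):
    """Characters among the top_n most frequent that are absent from reference_list."""
    reference = set(reference_list)
    # Keep frequency order so the result is easy to read against the ranking.
    return [ch for ch in frequency_ranked[:top_n] if ch not in reference]


# Toy data (hypothetical ordering and membership, for illustration only).
toy_ranked = list("的一是不了俄洛")
toy_common = list("的一是不了人在")

print(missing_from(toy_ranked, toy_common, top_n=7))
```

With the real lists loaded from files, calling the function with top_n=1000 would reproduce the kind of check described above.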

The two sets of commonly used characters are completely contained within the “《现代汉语通用字表》(7000)” [Modern Chinese Characters for General Purposes]. This list is actually part of the centralized push to standardize characters in the People’s Republic of China. If you come across the booklet, there are about 10 pages dedicated to listing out common variants of characters that are no longer “permitted” in printing official documents.

Educational Lists

The old HSK came with a fairly comprehensive list of characters and words divided into four levels. The test prep materials listed approximately 800, 800, 700, and 600 characters to master (2,900 total). Unfortunately, the new HSK not only cut the vocabulary range down significantly, it also completely scrapped the idea of a character list separate from the recommended vocabulary words.

Ministry of Education standards for compulsory education (i.e. primary and junior high school) specify that students should recognize 1,600 characters by 2nd grade, 2,500 characters by 4th grade, 3,000 characters by 6th grade, and about 3,500 characters by 9th grade (requirements for writing from memory are lower). As for what those characters are, the MOE published its own list of 3,500 “Common Characters for Chinese Language Courses”, which also follows a 2,500/1,000 character split. This list mainly overlaps with the Common Chinese Character list described above, but there are differences. Interestingly enough, the list highlights 300 characters out of the 2,500 set that should be taught first, but otherwise there are no suggestions as to the order in which the characters should be taught.

Conclusion

I don’t know how many characters you should know, but there is always more to know. If you are curious about the overlaps between the various character lists, I have a handy Excel sheet here.



Sources:

周有光 (1992) 《中国语文纵横谈》,人民教育出版社。
[Zhou Youguang, 1992. Discussing the Length and Breadth of Chinese. People’s Education Press.]
as cited in:
邢红兵(2007)《现代汉字特征分析与计算研究》,商务印书馆。
[Xing Hongbing, 2007. Characteristic Analysis and Computational Research of Chinese Characters. The Commercial Press.]