Table 1
Linguistic features of the text collection (‘Lang.’ is language, ‘NP’ is noun phrases, ‘MultiWD’ is multiwords, ‘Parag.s’ is paragraphs, ‘Sent.’ is sentences, ‘NE’ is named entities, ‘Hanzi’ is Chinese characters).
| Lang. | Token | NP | MultiWD | Parag.s | Sent. | NE | Hanzi |
|---|---|---|---|---|---|---|---|
| English | 2,598,309 | 1,672,577 | 2,376,424 | 272,756 | 597,372 | 1,190,682 | 0 |
| Chinese | 7,480,139 | 1,491,790 | 3,466,453 | 258,213 | 572,185 | 1,268,674 | 21,679,815 |
Table 2
20 most frequently used financial terms.
| Term | Frequency | Term | Frequency |
|---|---|---|---|
| Capital | 9383 | Net Worth | 195 |
| Asset | 3086 | Liability | 141 |
| Liquidity | 1704 | Business Plan | 126 |
| Interest Rate | 1036 | Fixed Asset | 101 |
| Bankruptcy | 616 | Debt Financing | 97 |
| Balance Sheet | 522 | Working Capital | 83 |
| Principal | 382 | Financial Statements | 72 |
| Collateral | 371 | Equity Financing | 64 |
| Depreciation | 368 | Line of Credit | 46 |
| Cash Flow | 209 | Appraisal | 42 |
