I'm mainly trying to read a txt file, count how many times each English word appears, and write the result out to another txt file.
For example: I is apple
I 1
is 1
apple 1
Reading and writing the txt files both work fine.
The problem is splitting the text into words.
I originally tried to split on the space character, but that never worked.
Ideally it should split on whitespace or on special characters.
Could anyone tell me where my program has gone wrong?
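For reference, what the question describes can be done without StreamTokenizer at all, by splitting on any run of non-letter characters so that both whitespace and punctuation act as delimiters. A minimal sketch (the class and method names here are made up for illustration):

```java
import java.util.*;

public class WordCount {
    // Split on any run of characters that are NOT English letters,
    // so spaces and special characters both act as delimiters.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String w : text.split("[^a-zA-Z]+")) {
            if (!w.isEmpty()) {
                Integer n = counts.get(w);
                counts.put(w, n == null ? 1 : n + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "I is apple" -> each word appears once
        for (Map.Entry<String, Integer> e : count("I is apple").entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The regex `[^a-zA-Z]+` collapses consecutive delimiters into one split point; the `isEmpty()` check drops the empty leading token that appears when the text starts with a delimiter.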
import java.io.*;
import java.util.*;
public class java2
{
public static void main (String[] args) throws IOException
{
String fileName = "test1.txt";
BufferedReader bufReader = new BufferedReader (new FileReader (fileName));
StreamTokenizer stToken = new StreamTokenizer (bufReader);
Map<String, Integer> mapWords = new HashMap<String, Integer> ();
Set mapSet = null;
Map.Entry[] mapEntries = null;
Integer numWords = null;
int tokenType = 0;
int i;
// lower-case mode
stToken.lowerCaseMode (true);
// make everything except a-z and A-Z ordinary chars
stToken.ordinaryChars (0, 'A' - 1);
stToken.ordinaryChars ('Z' + 1, 'a' - 1);
stToken.ordinaryChars ('z' + 1, 255);
// treat end of line as significant
stToken.eolIsSignificant (true);
// store every word and its running count in the HashMap
// (note: nextToken() returns an int token type, so the earlier attempt to
// call toString() on it and re-split the result does not compile and is
// not needed; loop until TT_EOF instead of polling bufReader.ready())
while ((tokenType = stToken.nextToken ()) != StreamTokenizer.TT_EOF)
{
if (tokenType == StreamTokenizer.TT_WORD)
{
numWords = mapWords.get (stToken.sval);
mapWords.put (stToken.sval, numWords == null ? 1 : numWords + 1);
}
}
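A side note on the configuration above: ordinaryChars resets every character in the given range to "ordinary", so if a range ever overlaps the letters they stop forming words and come back one character at a time; calling wordChars afterwards restores word parsing. A minimal demonstration of the reset-then-redeclare pattern (the input string and class name are made up):

```java
import java.io.*;
import java.util.*;

public class TokenizerConfigDemo {
    // Reset every character to "ordinary", then re-declare letters as word
    // characters and control chars/space as whitespace; anything else
    // (digits, punctuation) now comes back as single-character tokens.
    public static List<String> tokens(String input) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.lowerCaseMode(true);
        st.ordinaryChars(0, 255);
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.whitespaceChars(0, ' ');
        List<String> words = new ArrayList<String>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                words.add(st.sval);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // hyphen, parens and digits are skipped; only letter runs survive
        System.out.println(tokens("credit-rating (AI) 2012"));
    }
}
```

The order of the calls matters: later calls override earlier ones for the same character range, so whitespaceChars must come after the blanket ordinaryChars reset.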
After I made those changes, the run now outputs single letters only.
The original run did split out whole English words,
but after switching to the new txt file it only splits out single letters like a, b, c.
The content of the test1.txt file I'm reading is as follows:
<papers>
<paper>
<title>A corporatecreditratingmodelusingmulti-classsupportvectormachines
with anordinalpairwisepartitioningapproach</title>
<authors>Kyoung-jae Kim a,HyunchulAhn</authors>
<journal>Computers & OperationsResearch</journal>
<year>2012</year>
<vol>39</vol>
<pages>1800-1811</pages>
<abstract>
Predicting corporate credit-rating using statistical and artificial intelligence (AI) techniques has received considerable research attention in the literature. In recent years, multi-class support vector machines (MSVMs) have become a very appealing machine-learning approach due to their good performance. Until now, researchers have proposed a variety of techniques for adapting support vector machines (SVMs) to multi-class classification, since SVMs were originally devised for binary classifica- tion. However, most of them have only focused on classifying samples into nominal categories; thus, the unique characteristic of credit-rating – ordinality – seldom has been considered in the proposed approaches. This study proposes a new type of MSVM classifier (named OMSVM) that is designed to extend the binary SVMs by applying an ordinal pairwise partitioning (OPP) strategy. Our model can efficiently and effectively handle multiple ordinal classes. To validate OMSVM, we applied it to a real-world case of bond rating. We compared the results of our model with those of conventional MSVM approaches and other AI techniques including MDA, MLOGIT, CBR, and ANNs. The results showed that our proposed model improves the performance of classification in comparison to other typical multi-class classification techniques and uses fewer computational resources.
</abstract>
<keywords>
Corporate credit rating Support vector machines Multi-class classification Ordinal pairwise partitioning
</keywords>
<content>
Author: BlueMarken    Time: 2012-6-13 10:44 PM
But when I run it, this is what I get.
It does run, but it prints two warning lines:
Note: java2.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
e 151
a 134
i 128
t 118
s 114
o 113
r 108
n 97
c 79
l 66
p 61
d 49
h 48
m 45
u 41
g 27
f 26
v 26
y 23
w 16
b 10
q 5
k 4
j 3
? 1
x 1
Author: ray215018    Time: 2012-6-14 10:53 PM
I only changed the parts that read and write the txt file,
but my results come out different from yours.
//package sliptstring;
import java.io.*;
import java.util.*;
/**
*
* @author marken
*/
public class SliptString {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
String fileName = "test2.txt";
BufferedReader svalue = new BufferedReader (new FileReader (fileName));
StreamTokenizer st = new StreamTokenizer (svalue);
st.lowerCaseMode(true);
//st.eolIsSignificant (true);
st.ordinaryChar('.');
Map<String, Integer> mapWords = new HashMap<>();
out:
while (true) {
int ttype = st.nextToken();
switch (ttype) {
case StreamTokenizer.TT_EOF:
break out; // this labeled break is needed to exit the while loop correctly
case StreamTokenizer.TT_WORD:
// get the accumulated count from the HashMap
Integer numWords = mapWords.get(st.sval);
numWords = (numWords == null ? 1 : ++numWords);
// store the updated count back
mapWords.put(st.sval, numWords);
}
}
// sort by occurrence count, descending: most frequent first
List<Map.Entry<String, Integer>> mapEntries = new ArrayList<>(mapWords.entrySet());
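The post is cut off here. The remaining sort-and-print step could be finished along these lines (Java 7 style to match the snippet; the comparator and the print loop are a completion I've sketched, not the original poster's code):

```java
import java.util.*;

public class SortDemo {
    // Sort the map entries by count, descending (most frequent first)
    public static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> mapWords) {
        List<Map.Entry<String, Integer>> mapEntries =
                new ArrayList<Map.Entry<String, Integer>>(mapWords.entrySet());
        Collections.sort(mapEntries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                // reversed comparison => descending order
                return b.getValue().compareTo(a.getValue());
            }
        });
        return mapEntries;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<String, Integer>();
        m.put("apple", 1);
        m.put("is", 3);
        m.put("i", 2);
        for (Map.Entry<String, Integer> e : sortByCount(m)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Writing the sorted entries to the output file is then a matter of looping over the returned list with a PrintWriter instead of System.out.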