A Java Function for Converting HTML/URL Character Encodings to Java Strings

By Minidxer | February 25, 2008

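I once wrote this function in C and did not expect to need it again in Java today. The idea is simple: take a string that carries HTML/URL-style character encodings and convert it back into an ordinary string. Concretely, the code in this post decodes HTML decimal numeric character references such as &#12525;, where the digits are a Unicode code point written in decimal. URLs usually use UTF-8, and anyone familiar with character encodings should see at a glance what the function does.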
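To make the relationship between these encodings concrete, here is a minimal sketch (not part of the original post) that shows one character, ロ (U+30ED), in three forms: the decimal code point 12525 used by an HTML numeric reference, the Java string, and the UTF-8 percent-encoded form used in URLs. The class name `EncodingFormsDemo` and its helper methods are hypothetical names chosen for illustration; `URLEncoder` and `URLDecoder` are the standard `java.net` classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: one character in three encoded forms.
final class EncodingFormsDemo {
    // Build a Java string from a Unicode code point (the digits of "&#12525;").
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Percent-encode text for a URL, using UTF-8 bytes.
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Percent-decode URL text, interpreting the bytes as UTF-8.
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String ro = fromCodePoint(12525);                       // U+30ED
        System.out.println(ro.equals("\u30ED"));                // true
        System.out.println(urlEncode(ro));                      // %E3%83%AD
        System.out.println(urlDecode("%E3%83%AD").equals(ro));  // true
    }
}
```

Percent-encoding works on UTF-8 bytes, while a numeric character reference names the code point directly, so the two forms need different decoders.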



Without further ado, here is the code:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Convert HTML numeric character entities (Unicode) to part of a Java String.
     */
    public class UnicodeCeToJavaString {
        // Matches decimal numeric character references such as "&#12525;"
        static final String mbs = "&#(\\d+);";
        static final Pattern pat = Pattern.compile(mbs);

        public static String EncodeCesToChars(String paramStr) {
            StringBuffer sb = new StringBuffer();
            Matcher mat = pat.matcher(paramStr);
            while (mat.find()) {
                String mbChar = getMbCharStr(mat.group(1)); // pass the digit part
                if (mbChar == null) {
                    mbChar = mat.group(); // not decodable: keep the entity text unchanged
                }
                // quoteReplacement keeps '$' and '\' in the decoded text literal
                mat.appendReplacement(sb, Matcher.quoteReplacement(mbChar));
            }
            mat.appendTail(sb);
            return sb.toString();
        }

        /* worker method: handles the "12525" part, a Unicode value written as decimal digits */
        static String getMbCharStr(String digits) {
            try {
                int val = Integer.parseInt(digits);
                if (!Character.isValidCodePoint(val)) {
                    return null; // outside the range U+0000..U+10FFFF
                }
                // easy, because Java strings are Unicode; toChars yields a
                // surrogate pair for code points above U+FFFF
                return new String(Character.toChars(val));
            } catch (NumberFormatException e) {
                return null; // digit run too large for an int
            }
        }
    }
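A numeric reference can name any Unicode code point, including those above U+FFFF, which do not fit in a single Java `char`. The following sketch (an illustration added here, with the hypothetical class name `CodePointDemo`) compares a plain `char` cast with `Character.toChars` to show where the two diverge.

```java
// Hypothetical demo class: char cast versus code-point-aware conversion.
final class CodePointDemo {
    // Narrowing cast: keeps only the low 16 bits of the value.
    static String viaCharCast(int codePoint) {
        return String.valueOf((char) codePoint);
    }

    // Code-point aware: one char, or a surrogate pair above U+FFFF.
    static String viaToChars(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int ro = 12525;     // U+30ED, inside the Basic Multilingual Plane
        int face = 128512;  // U+1F600, outside the Basic Multilingual Plane

        System.out.println(viaCharCast(ro).equals(viaToChars(ro)));      // true
        System.out.println(viaToChars(face).length());                   // 2
        System.out.println(viaCharCast(face).equals(viaToChars(face)));  // false
        System.out.println(viaToChars(face).codePointAt(0) == face);     // true
    }
}
```

For characters in the Basic Multilingual Plane the two approaches agree. For anything beyond it, the cast silently yields a different character, so the code-point-aware conversion is the safer choice when the input may contain emoji or rare CJK characters.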

Comments

  1. Jean - 03/2/2009 at 2:09 pm

    I use this regular expression instead:

    mbs = "&#x([0123456789abcdef]+);"

    Thank you for this sample.
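The pattern in this comment targets the hexadecimal form of numeric references (`&#x30ed;`). With it, the captured digits are presumably meant to be parsed in base 16 rather than base 10. As a rough sketch of how both forms might be handled together, the hypothetical class `NumericRefDecoder` below accepts decimal and hexadecimal references (upper or lower case) and leaves anything it cannot decode untouched. The names are illustrative assumptions, not code from the post or the comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical decoder for both "&#12525;" and "&#x30ed;" style references.
final class NumericRefDecoder {
    // Group 1: hex digits after "&#x"; group 2: decimal digits after "&#".
    private static final Pattern REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|(\\d+));");

    static String decode(String input) {
        Matcher m = REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: leave the reference as written
            try {
                int cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(cp)) {
                    replacement = new String(Character.toChars(cp));
                }
            } catch (NumberFormatException e) {
                // too many digits for an int: keep the original text
            }
            // quoteReplacement prevents '$' and '\' from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#x30ed;").equals("\u30ED"));            // true
        System.out.println(decode("&#x30ED;").equals(decode("&#12525;")));  // true
        System.out.println(decode("cost: &#x24;5"));                        // cost: $5
    }
}
```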
