Issue
I am reading an HTTP response from a Perl page in a Servlet like this:
public String getHTML(String urlToRead) {
    URL url;
    HttpURLConnection conn;
    BufferedReader rd;
    String line;
    String result = "";
    try {
        url = new URL(urlToRead);
        conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Charset", "UTF-8");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        while ((line = rd.readLine()) != null) {
            byte [] b = line.getBytes();
            result += new String(b, "UTF-8");
        }
        rd.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}
I am displaying this result with this code:
response.setContentType("text/plain; charset=UTF-8");
PrintWriter out = new PrintWriter(new OutputStreamWriter(response.getOutputStream(), "UTF-8"), true);
try {
    String query = request.getParameter("query");
    String type = request.getParameter("type");
    String res = getHTML(url);
    out.write(res);
} finally {
    out.close();
}
But the response is still not encoded as UTF-8. What am I doing wrong?
Thanks in advance.
Solution
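That call to line.getBytes() looks suspicious. With no argument it encodes the String using the platform default charset, so decoding those bytes as UTF-8 on the next line can corrupt non-ASCII characters whenever that default is not UTF-8. You should probably make it line.getBytes("UTF-8") if you are certain that what is returned is UTF-8 encoded. Additionally, I'm not sure why it is even necessary: the InputStreamReader has already decoded the response as UTF-8, so each line is already a properly decoded String. A typical approach to getting data out of a BufferedReader is to use a StringBuilder and keep appending each String retrieved from readLine to it. The conversion back and forth between String and byte[] is unnecessary.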
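To see why the round trip is risky, here is a minimal sketch. The class and method names (CharsetRoundTripSketch, roundTrip) are made up for illustration, and ISO-8859-1 is only a stand-in for a platform default charset that is not UTF-8; it is an assumption, not necessarily what your server uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTripSketch {

    // Mimics the loop in the question: encode a String with one charset,
    // then decode the resulting bytes as UTF-8.
    public static String roundTrip(String text, Charset encodeWith) {
        byte[] b = text.getBytes(encodeWith);
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", one non-ASCII character

        // Encoding and decoding with the same charset is lossless (and pointless).
        String same = roundTrip(original, StandardCharsets.UTF_8);

        // Encoding with a different charset (assumed stand-in for the platform
        // default) and decoding as UTF-8 does not give the original text back.
        String mixed = roundTrip(original, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip intact: " + same.equals(original));
        System.out.println("Mixed round trip intact: " + mixed.equals(original));
    }
}
```

Pure ASCII text would survive both calls, which is why this kind of bug often goes unnoticed until accented or non-Latin characters show up.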
Change result into a StringBuilder (returning result.toString() at the end of the method) and do this:
while ((line = rd.readLine()) != null) {
    result.append(line);
}
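Put together, the reading side might look like the sketch below. It is not a drop-in replacement for getHTML: the class and method names (Utf8ReadSketch, readAll) are hypothetical, and it reads from any InputStream so it can be tried without a network connection. In the servlet you would pass conn.getInputStream(). One deliberate difference from the loop above: readLine strips line terminators, so this version appends '\n' after each line; drop that append if you want the lines joined as in the original.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8ReadSketch {

    // Reads the whole stream as UTF-8 text. Decoding happens exactly once,
    // inside the InputStreamReader; no String/byte[] round trip is needed.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder result = new StringBuilder();
        // try-with-resources closes the reader even if readLine throws
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) {
                // readLine drops the line terminator, so put one back
                result.append(line).append('\n');
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulated UTF-8 response body containing non-ASCII characters
        byte[] body = "gr\u00fc\u00df dich\nsecond line"
                .getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(body));
        System.out.println(text.equals("gr\u00fc\u00df dich\nsecond line\n"));
    }
}
```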
Answered By - laz
Answer Checked By - Dawn Plyler (JavaFixing Volunteer)