ced_enc_detect.Rd
Detect the charset encoding of a character or raw vector.
ced_enc_detect(x, enc_hint = NULL, lang_hint = NULL)
Argument | Description |
---|---|
x | Raw or character vector. |
enc_hint | Character vector with encoding hint. |
lang_hint | Character vector with language code hint. |
Character vector with suggested encodings.
# detect character vector with ASCII strings
ascii <- "I can eat glass and it doesn't hurt me."
ced_enc_detect(ascii)
#> [1] "US-ASCII"
#> [1] "US-ASCII"

utf8 <- "下午好"
print(utf8)
#> [1] "下午好"
ced_enc_detect(utf8)
#> [1] "UTF-8"
#> [1] "UTF-8"

# path to examples
ex_path <- system.file("test.txt", package = "ced")
ex_txt <- read.dcf(ex_path, all = TRUE)

# russian text
print(ex_txt[["France"]])
#> NULL
ced_enc_detect(ex_txt[["Russian"]])
#> [1] "UTF-8"
#> [1] "IBM866"
#> [1] "windows-1251"
#> [1] "KOI8-R"

#> [1] "我能吞下玻璃而不伤身体。"
ced_enc_detect(ex_txt[["Chinese"]])
#> [1] "UTF-8"
#> [1] "GB2312"

#> [1] "나는 유리를 먹을 수 있어요. 그래도 아프지 않아요"
ced_enc_detect(ex_txt[["Korean"]])
#> [1] "UTF-8"
#> [1] "EUC-KR"
#> [1] "ISO-2022-KR"

#> [1] "私はガラスを食べられます。それは私を傷つけません。"
ced_enc_detect(ex_txt[["Japanese"]])
#> [1] "UTF-8"
#> [1] "Shift_JIS"
#> [1] "ISO-2022-JP"

# \donttest{
# detect encoding of the web pages content
if (require("curl")) {
  detect_enc_url <- function(u) ced_enc_detect(curl_fetch_memory(u)$content)
  detect_enc_url("https://www.corriere.it")
  detect_enc_url("https://www.vk.com")
  detect_enc_url("https://www.qq.com")
  detect_enc_url("https://kakaku.com")
  detect_enc_url("https://etoland.co.kr")
}
#>
#> [1] "EUC-KR"
# }
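A common follow-up to detection is converting the bytes to UTF-8. The sketch below is one way that might look, using base R's `iconv()`. The helper `decode_bytes()` and its `fallback` argument are hypothetical and not part of ced. It also assumes that `ced_enc_detect()` returns a single name (or `NA`) for a raw vector, and that the platform's `iconv()` may not recognise every name ced reports.

```r
# Hypothetical helper (not part of ced): decode raw bytes to UTF-8 using a
# detected encoding name, falling back to a default when the guess is
# missing or is a name the platform's iconv() does not support.
decode_bytes <- function(bytes, encoding = NA_character_, fallback = "latin1") {
  encoding <- encoding[1]
  from <- if (is.na(encoding)) fallback else encoding
  tryCatch(
    iconv(list(bytes), from = from, to = "UTF-8"),
    # iconv() signals an error for unsupported conversions
    error = function(e) iconv(list(bytes), from = fallback, to = "UTF-8")
  )
}

# Build sample bytes in a legacy encoding with base R only.
text <- "Gr\u00fc\u00dfe aus K\u00f6ln"
bytes <- iconv(text, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]

# Ask ced for a guess only if the package is installed.
guess <- if (requireNamespace("ced", quietly = TRUE)) {
  ced::ced_enc_detect(bytes)
} else {
  NA_character_
}

decoded <- decode_bytes(bytes, guess)
```

Detection on short inputs can be unreliable, so `decoded` is only as good as the guess. Keeping the raw bytes around allows retrying with a different encoding.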