R bindings for the uchardet library, the encoding detector library from Mozilla. It takes a sequence of bytes in an unknown character encoding, with no additional information, attempts to determine the encoding of the text, and returns encoding names in an iconv-compatible format.
To install the package from CRAN, run install.packages("uchardet").
You can also install the development version with the install_gitlab() function, which the remotes package provides.
This package contains compiled code, so you need Rtools to install it on Windows.
Installation from source requires the uchardet library and headers. On Linux or OSX, the configure script tries to find them with pkg-config or in the system include/library paths. You can define the include and library paths with configure variables such as UCHARDET_LIBS.
If the uchardet system library is not found, it will be compiled from source. You can force compilation of the built-in library with the --with-builtin-uchardet configure argument.
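One way to pass the configure argument from within R is the configure.args parameter of install.packages(). The sketch below assumes a source install from the default repository; the wrapper name install_uchardet_builtin is hypothetical and only serves to keep the call from running until you invoke it.

```r
# Hypothetical wrapper: install uchardet from source and ask the configure
# script to build the bundled uchardet library instead of a system copy.
install_uchardet_builtin <- function(...) {
  utils::install.packages(
    "uchardet",
    type = "source",                             # configure only runs for source installs
    configure.args = "--with-builtin-uchardet",  # flag named in the text above
    ...
  )
}

# Usage (not run here, since it downloads and compiles the package):
# install_uchardet_builtin()
```

Configure variables can be supplied the same way through the configure.vars parameter of install.packages().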
    # load packages
    library(uchardet)

    # detect string encoding
    ascii <- "Hello, useR!"
    print(ascii)
    #> [1] "Hello, useR!"
    detect_str_enc(ascii)
    #> [1] "ASCII"
    utf8 <- "\u4e0b\u5348\u597d"
    print(utf8)
    #> [1] "下午好"
    detect_str_enc(utf8)
    #> [1] "UTF-8"

    # detect raw vector encoding
    detect_raw_enc(charToRaw(ascii))
    #> [1] "ASCII"
    detect_raw_enc(charToRaw(utf8))
    #> [1] "UTF-8"

    # detect file encoding
    ascii_file <- tempfile()
    writeLines(ascii, ascii_file)
    detect_file_enc(ascii_file)
    #> /tmp/Rtmp9uhkaX/file131a598ce4d4
    #>                          "ASCII"
    utf8_file <- tempfile()
    writeLines(utf8, utf8_file)
    detect_file_enc(utf8_file)
    #> /tmp/Rtmp9uhkaX/file131a4c730707
    #>                          "UTF-8"

    # detect URL contents encoding
    detect_url_enc(c("https://www.w3.org/", "https://zh.wikipedia.org/"))
    #> https://www.w3.org/ https://zh.wikipedia.org/
    #>             "ASCII"                   "UTF-8"
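Because the detected names are iconv-compatible, a natural next step is to feed them to iconv(). The following is a minimal sketch, not part of the package: read_as_utf8 is a hypothetical helper, and it assumes that the detector returns NA when it cannot determine an encoding. The detector is a parameter so that the helper can be tried with any function of the same shape.

```r
# Hypothetical helper: read a text file and convert its lines to UTF-8,
# using the encoding name reported by a detector function.
# `detect` defaults to uchardet::detect_file_enc and is only evaluated
# when the caller does not supply a replacement.
read_as_utf8 <- function(path, detect = uchardet::detect_file_enc) {
  # The detector returns a named character vector; keep the bare name.
  enc <- unname(detect(path))[1]
  lines <- readLines(path, warn = FALSE)
  if (is.na(enc)) {
    # Assumption: NA means detection failed, so return the lines unchanged.
    return(lines)
  }
  iconv(lines, from = enc, to = "UTF-8")
}

# Usage, guarded so the sketch also runs where uchardet is not installed.
if (requireNamespace("uchardet", quietly = TRUE)) {
  path <- tempfile()
  writeLines("\u4e0b\u5348\u597d", path, useBytes = TRUE)
  print(read_as_utf8(path))
}
```

Detection on very short inputs can be unreliable, so it is worth checking the result before overwriting any data.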
Use the bug.report(package = "uchardet") command to go to the page for bug report submissions.
Before reporting a bug or submitting an issue, please do the following:
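- check the news for the current version, for example with the news(package = "uchardet", Version == packageVersion("uchardet")) command;
- make sure that the error occurs in a function from the uchardet package, not from other packages.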
Please attach the traceback() and sessionInfo() output to the bug report. It may save a lot of time.
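A short sketch of how that output might be collected follows. The file name is a hypothetical choice, and traceback() is only informative when called right after the error in the same interactive session.

```r
# Call this right after the error at the interactive prompt; it prints the
# call stack of the last uncaught error (or a notice if there is none).
traceback()

# Collect the session details as plain text for pasting into the report.
session_lines <- capture.output(print(sessionInfo()))

# Hypothetical file name; any writable location works.
report_file <- file.path(tempdir(), "uchardet-session-info.txt")
writeLines(session_lines, report_file)
```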
The uchardet package is distributed under the GPLv2 license.