DATE: 2021-03-01
AUTHOR: John L. Godlee
A colleague was having trouble constructing an API call in R to the Microsoft Azure Translator. They had lots of household survey responses in Portuguese that they wanted to translate into English for analysis. The Microsoft Azure documentation has examples[1] of how to call the API using C#, Go, Java, Node.js and Python, but nothing for R. There are some R packages for using Azure Translator that already exist:
1: https://docs.microsoft.com/en-us/azure/cognitive-services/translator/quickstart-translator
* translateR[2]
* AzureCognitive[3]
* mstranslator[4]

But because Azure Translator uses a conventional RESTful API, it's also possible to call it directly with the {httr} package[5].
2: https://github.com/ChristopherLucas/translateR
3: https://github.com/Azure/AzureCognitive
4: https://github.com/chainsawriot/mstranslator
5: https://github.com/r-lib/httr
Using Azure Translator requires setting up an Azure account to get an API key. There's more documentation on that here[6]. At the time of writing (March 2021) there is a free tier for Azure Translator, which offers up to 2 million characters of translation per month along with a few other features.
6: https://docs.microsoft.com/en-us/azure/cognitive-services/translator/quickstart-translator
In R, first load the packages and create some example text to translate: two short passages, each in English and Portuguese:
```
# Packages
library(httr)

# Create some example Portuguese (used Google Translate)
engl <- "This is some test text. The big train had black smoke."
port <- "Este é um texto de teste. O grande trem tinha fumaça preta."
engl2 <- "Cardboard boxes are easy to flatten"
port2 <- "Caixas de papelão são fáceis de achatar"
```
Then define the key, endpoint and parameters for the API call. The key, endpoint and location can be retrieved from your Azure portal.
```
key <- "XXX"                                           # API key (placeholder)
endp <- "https://api.cognitive.microsofttranslator.com" # endpoint
location <- "global"                                   # resource location
path <- "translate"                                    # API operation
apiv <- "3.0"                                          # API version
to_lang <- "en"                                        # target language
```
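Rather than pasting the key into the script, it can be kept out of version control by reading it from an environment variable. This is a minimal sketch: `get_translator_key()` and the variable name `AZURE_TRANSLATOR_KEY` are hypothetical choices of mine, not anything Azure requires.

```r
# Hypothetical helper: read the Translator key from an environment variable
# instead of hard-coding it. The name AZURE_TRANSLATOR_KEY is an assumption.
get_translator_key <- function(var = "AZURE_TRANSLATOR_KEY") {
  key <- Sys.getenv(var)  # returns "" when the variable is not set
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  key
}

# e.g. add AZURE_TRANSLATOR_KEY=<your key> to ~/.Renviron, restart R, then:
# key <- get_translator_key()
```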
Create the headers:
```
heads <- c(
  "Ocp-Apim-Subscription-Key" = key,
  "Ocp-Apim-Subscription-Region" = location,
  "Content-type" = "application/json"
)
```
This is the part that took me some trial and error to figure out: using nested lists to build a JSON-like structure that can then be converted to JSON for the API request. The Azure documentation states that the request body should follow this structure:
```
[
  { "Text" : "Hello, what is your name?" },
  { "Text" : "My name is John" }
]
```
So in R, that's an unnamed list containing one list per string, each with a single element named "Text" that holds the string to translate:
```
input <- list(port, port2)

# Wrap each string in a list with a single element named "Text"
input_list <- lapply(input, function(x) {
  list("Text" = x)
})
```
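To check that these nested lists really serialise to the structure in the Azure documentation, they can be converted with {jsonlite} before sending anything. As far as I can tell from the {httr} source, `encode = "json"` serialises the body with `jsonlite::toJSON()` and `auto_unbox = TRUE`; this sketch assumes that, and uses short made-up strings.

```r
library(jsonlite)

# Same structure as input_list, with short example strings
example_list <- lapply(list("Bom dia", "Obrigado"), function(x) list("Text" = x))

# With auto_unbox = TRUE, length-one vectors become JSON scalars
body_json <- toJSON(example_list, auto_unbox = TRUE)
body_json
#> [{"Text":"Bom dia"},{"Text":"Obrigado"}]

# Without it, each string is wrapped in a JSON array instead
toJSON(example_list)
#> [{"Text":["Bom dia"]},{"Text":["Obrigado"]}]
```

The second form doesn't match the `{ "Text" : "..." }` objects the documentation describes, which is why the unboxing matters.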
Then send the request using httr::POST():
```
result <- POST(
  endp,
  path = path,
  query = list(
    `api-version` = apiv,
    to = to_lang
  ),
  body = input_list,
  encode = "json",                 # serialise the nested lists to JSON
  add_headers(.headers = heads)
)
```
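With lots of survey responses, they probably won't all fit into one request, because the Translator API limits the number of array elements and the total characters per call. Below is a minimal sketch of splitting a character vector into batches. `batch_texts()` is a hypothetical helper, and the default limits (1000 elements, 50000 characters) are my reading of the v3 documentation, so check the current limits before relying on them.

```r
# Hypothetical helper: split texts into batches that stay under
# assumed per-request limits on element count and total characters
batch_texts <- function(texts, max_items = 1000, max_chars = 50000) {
  batches <- list()
  current <- character(0)
  current_chars <- 0
  for (txt in texts) {
    n <- nchar(txt)
    if (n > max_chars) stop("A single text is longer than max_chars")
    # Start a new batch if adding this text would break either limit
    if (length(current) == max_items || current_chars + n > max_chars) {
      batches[[length(batches) + 1]] <- current
      current <- character(0)
      current_chars <- 0
    }
    current <- c(current, txt)
    current_chars <- current_chars + n
  }
  if (length(current) > 0) batches[[length(batches) + 1]] <- current
  batches
}

# Each batch could then go through the same POST() call as above, e.g.
# (`responses` is a hypothetical character vector of survey answers):
# parsed <- lapply(batch_texts(responses), function(b) {
#   res <- POST(endp, path = path,
#     query = list(`api-version` = apiv, to = to_lang),
#     body = lapply(b, function(x) list("Text" = x)),
#     encode = "json", add_headers(.headers = heads))
#   stop_for_status(res)
#   content(res, as = "parsed")
# })
# result_parse <- unlist(parsed, recursive = FALSE)
```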
The response body is returned as a JSON string like the one below, so R needs to parse it into a similarly nested list structure:
```
[
  {
    "detectedLanguage" : { "language" : "pt", "score" : 1.0 },
    "translations" : [
      { "text" : "This is a test text. The big train had black smoke.", "to" : "en" }
    ]
  },
  {
    "detectedLanguage" : { "language" : "pt", "score" : 1.0 },
    "translations" : [
      { "text" : "Cardboard boxes are easy to flatten", "to" : "en" }
    ]
  }
]
```
```
result_parse <- content(result, as = "parsed")
```
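```
[[1]]
[[1]]$detectedLanguage
[[1]]$detectedLanguage$language
[1] "pt"

[[1]]$detectedLanguage$score
[1] 1

[[1]]$translations
[[1]]$translations[[1]]
[[1]]$translations[[1]]$text
[1] "This is a test text. The big train had black smoke."

[[1]]$translations[[1]]$to
[1] "en"

[[2]]
[[2]]$detectedLanguage
[[2]]$detectedLanguage$language
[1] "pt"

[[2]]$detectedLanguage$score
[1] 1

[[2]]$translations
[[2]]$translations[[1]]
[[2]]$translations[[1]]$text
[1] "Cardboard boxes are easy to flatten"

[[2]]$translations[[1]]$to
[1] "en"
```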
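To experiment with the parsing step without spending API quota, a saved response can be parsed straight from a string. This sketch assumes that `content(as = "parsed")` hands JSON to `jsonlite::fromJSON()` with `simplifyVector = FALSE` (which is how I read the {httr} source), so the result has the same nested-list shape as `result_parse`. The string is a shortened copy of the response shown above.

```r
library(jsonlite)

# A shortened copy of the API response, stored as a string
response_json <- '[{"detectedLanguage":{"language":"pt","score":1.0},
  "translations":[{"text":"Cardboard boxes are easy to flatten","to":"en"}]}]'

# simplifyVector = FALSE keeps every JSON array and object as a list
parsed <- fromJSON(response_json, simplifyVector = FALSE)

parsed[[1]]$detectedLanguage$language
parsed[[1]]$translations[[1]]$text
```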
From there it's straightforward to convert the parsed list into whatever data structure you want. In my case I wanted a data frame:
```
# One row per input string, then bind the rows together
result_df <- do.call(rbind, lapply(result_parse, function(x) {
  data.frame(
    from_lang_det = x$detectedLanguage$language,
    from_lang_score = x$detectedLanguage$score,
    to_lang = x$translations[[1]]$to,
    trans = x$translations[[1]]$text
  )
}))
```
```
| from_lang_det | from_lang_score | to_lang | trans                   |
|---------------|-----------------|---------|-------------------------|
| pt            | 1               | en      | This is a test text ... |
| pt            | 1               | en      | Cardboard boxes are ... |
```
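For thousands of survey responses, binding one-row data frames with `do.call(rbind, ...)` can get slow. One alternative, sketched here with a hypothetical `parsed_to_df()` helper, is to build each column with `vapply()` and create the data frame once; `vapply()` also errors if a field is missing or has an unexpected type.

```r
# Hypothetical alternative: build each column with vapply(), then one data.frame
parsed_to_df <- function(parsed) {
  data.frame(
    from_lang_det = vapply(parsed, function(x) x$detectedLanguage$language, character(1)),
    # as.numeric() in case a score is parsed as an integer
    from_lang_score = vapply(parsed, function(x) as.numeric(x$detectedLanguage$score), numeric(1)),
    to_lang = vapply(parsed, function(x) x$translations[[1]]$to, character(1)),
    trans = vapply(parsed, function(x) x$translations[[1]]$text, character(1)),
    stringsAsFactors = FALSE
  )
}

# result_df <- parsed_to_df(result_parse)
```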