Find and extract source text that must be translated.
Usage
find_source(
  path = ".",
  encoding = "UTF-8",
  verbose = getOption("transltr.verbose", TRUE),
  tr = translator(),
  interface = NULL
)
find_source_in_files(
  paths = character(),
  encoding = "UTF-8",
  verbose = getOption("transltr.verbose", TRUE),
  algorithm = algorithms(),
  interface = NULL
)
Arguments
- path
A non-empty and non-NA character string. A path to a directory containing R source scripts. All subdirectories are searched. Files that do not have a .R or .Rprofile extension are skipped.
- encoding
A non-empty and non-NA character string. The source character encoding. In almost all cases, this should be UTF-8. Other encodings are internally re-encoded to UTF-8 for portability.
- verbose
A non-NA logical value. Should progress information be reported?
- tr
A Translator object.
- interface
A name, a call object, or NULL. A reference to an alternative (custom) function used to translate text. If a call object is passed to interface, it must be a call to operator ::. Calls to method Translator$translate() are ignored and calls to interface are extracted instead. See Details below.
- paths
A character vector of non-empty and non-NA values. A set of paths to R source scripts that must be searched.
- algorithm
A non-empty and non-NA character string equal to "sha1" or "utf8". The algorithm to use when hashing source information for identification purposes.
Value
find_source() returns an R6 object of class Translator. If an existing Translator object is passed to tr, it is modified in place and returned.
find_source_in_files() returns a list of Text objects. It may contain duplicated elements, depending on the extracted contents.
Details
find_source() and find_source_in_files() look for calls to method Translator$translate() in R scripts and convert them to Text objects. The former further sets these resulting objects into a Translator object. See argument tr.
find_source() and find_source_in_files() work on a purely lexical basis. The source code is parsed but never evaluated (aside from extracted literal character vectors).
- The underlying Translator object is never evaluated and does not need to exist (placeholders may be used in the source code).
- Only literal character vectors can be passed to arguments of method Translator$translate().
Interfaces
In some cases, it may not be desirable to call method Translator$translate() directly. A custom function wrapping (interfacing) this method may always be used as long as it has the same signature as method Translator$translate(). In other words, it must minimally have two formal arguments: ... and source_lang.
Custom interfaces must be passed to find_source() and find_source_in_files() for extraction purposes. Since these functions work on a lexical basis, interfaces can be placeholders in the source code (non-existent bindings) at the time these functions are called. However, they must be bound to a function (ultimately) calling Translator$translate() at runtime.
Custom interfaces are passed to find_source() and find_source_in_files() as name or call objects in a variety of ways. The most straightforward way is to use base::quote(). See Examples below.
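As a rough illustration of the requirements above, the sketch below defines a wrapper and builds the two kinds of references that interface accepts. The wrapper name translate(), the placeholder tr, and the default value "en" for source_lang are assumptions made for this example only. Nothing here calls transltr, so tr does not need to exist when the code is run.

```r
# Hypothetical wrapper around Translator$translate(). It exposes the two
# formal arguments required of an interface: ... and source_lang.
# `tr` is a placeholder: it is only looked up when translate() is called.
translate <- function(..., source_lang = "en") {
    tr$translate(..., source_lang = source_lang)
}

# A name object referring to translate().
iface_name <- quote(translate)

# A call object. It must be a call to operator `::`.
iface_call <- quote(transltr::translate)

is.name(iface_name)
is.call(iface_call)
identical(iface_call[[1L]], quote(`::`))
names(formals(translate))
```

Either iface_name or iface_call could then be passed to argument interface. Because matching is lexical, the two forms refer to different functions as far as extraction is concerned.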
Methodology
find_source() and find_source_in_files() go through these steps to extract source text from a single R script.
1. It is read with text_read() and re-encoded to UTF-8 if necessary.
2. It is parsed with parse() and underlying tokens are extracted from parsed expressions with utils::getParseData().
3. Each expression (expr) token is converted to language objects with str2lang(). Parsing errors and invalid expressions are silently skipped.
4. Valid call objects stemming from step 3 are filtered with is_source().
5. Calls to method Translator$translate() or to interface stemming from step 4 are coerced to Text objects with as_text().
These steps are repeated for each R script. find_source() further merges all resulting Text objects into a coherent set with merge_texts() (identical source code is merged into single Text entities).
Extracted character vectors are always normalized for consistency (at step 5). See normalize() for more information.
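The steps above can be approximated with base R alone. The sketch below is not the package's implementation: is_translate_call() is a hypothetical stand-in for is_source(), the final vapply() is a simplified stand-in for as_text(), and step 1 is replaced by an in-memory character vector. Normalization and hashing are left out.

```r
# Stand-in for step 1: hypothetical script contents held in memory.
src <- c(
    "tr$translate('Hello, world!')",
    "x <- 1 + 1",
    "tr$translate('Farewell, world!')")

# Step 2: parse (never evaluate) and extract the underlying tokens.
exprs  <- parse(text = src, keep.source = TRUE)
tokens <- utils::getParseData(exprs, includeText = TRUE)

# Step 3: convert the text of each expr token to a language object.
# Text that cannot be converted is skipped (NULL is returned).
expr_text <- tokens$text[tokens$token == "expr"]
langs <- lapply(expr_text, function(txt) {
    tryCatch(str2lang(txt), error = function(e) NULL)
})

# Step 4 (hypothetical stand-in for is_source()):
# keep calls of the form <object>$translate(...).
is_translate_call <- function(x) {
    is.call(x) &&
        is.call(x[[1L]]) &&
        identical(x[[1L]][[1L]], quote(`$`)) &&
        identical(x[[1L]][[3L]], quote(translate))
}
calls <- Filter(is_translate_call, langs)

# Step 5 (simplified stand-in for as_text()):
# keep the literal string passed as the first argument.
found <- vapply(calls, function(cl) cl[[2L]], NA_character_)
found
```

Note that tr is never defined: the script is only inspected as text and language objects, which is why placeholders are acceptable in the source code.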
Limitations
The current version of transltr can only handle literal character vectors. This means it cannot resolve non-trivial expressions that depend on a state. All values passed to argument ... of method Translator$translate() must yield character vectors (trivially).
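A small base R sketch may help show why this limitation exists. Seen lexically, a literal string is already a character vector, while an expression such as paste() is an unevaluated call whose value depends on the state of the session. The variable name below is hypothetical and never needs to exist.

```r
# Unevaluated calls, as they would appear after parsing a script.
literal  <- quote(tr$translate("Hello, world!"))
computed <- quote(tr$translate(paste("Hello,", name)))

# The first argument of each call, without evaluating anything.
literal_arg  <- literal[[2L]]
computed_arg <- computed[[2L]]

# A literal is a character vector that can be extracted as is.
is.character(literal_arg)

# A computed value is a call. Its result is unknown until runtime.
is.call(computed_arg)
```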
Examples
# Create a directory containing dummy R scripts for illustration purposes.
temp_dir <- file.path(tempdir(TRUE), "find-source")
temp_files <- file.path(temp_dir, c("ex-script-1.R", "ex-script-2.R"))
dir.create(temp_dir, showWarnings = FALSE, recursive = TRUE)
cat(
    "tr$translate('Hello, world!')",
    "tr$translate('Farewell, world!')",
    sep = "\n",
    file = temp_files[[1L]])
cat(
    "tr$translate('Hello, world!')",
    "tr$translate('Farewell, world!')",
    sep = "\n",
    file = temp_files[[2L]])
# Extract calls to method Translator$translate().
find_source(temp_dir)
find_source_in_files(temp_files)
# Use custom functions.
# For illustration purposes, assume the package
# exports a hypothetical translate() function.
cat(
    "translate('Hello, world!')",
    "transltr::translate('Farewell, world!')",
    sep = "\n",
    file = temp_files[[1L]])
cat(
    "translate('Hello, world!')",
    "transltr::translate('Farewell, world!')",
    sep = "\n",
    file = temp_files[[2L]])
# Extract calls to translate() and transltr::translate().
# Since find_source() and find_source_in_files() work on
# a lexical basis, these are always considered to be two
# distinct functions. They also don't need to exist in the
# R session calling find_source() and find_source_in_files().
find_source(temp_dir, interface = quote(translate))
find_source_in_files(temp_files, interface = quote(transltr::translate))