Rein-IT

Searching for information: methods and approaches.
About 10-15 years ago, the main task of any university was not to teach future specialists specific skills, since approaches and working methods change constantly, but to teach them how to search for and process information among a vast amount of data. This remains true today, but the amount of information clutter has grown dramatically.
Below, I will describe my approaches to information retrieval.
Google
In most cases, Google returns results in the language of your query.
Google ranks results based on your past queries, sorting them by presumed relevance to you. Use Incognito mode to avoid this.
Google takes your location into account when searching, at least as inferred from your IP address. Use proxy servers to search from another location.
When searching, enter not a question but part of the expected answer.
Google matches different grammatical cases and verb forms of a word by itself. There is no need to manually rephrase your query.
Letter case (lowercase or uppercase) in your query does not matter.
Google interprets spaces between words as a logical "AND". This means you will get results that contain each word of the phrase somewhere on the page, not only the exact phrase. To narrow down the search, enclose the phrases you want to find in double quotes. You can replace words you are unsure of with an asterisk (*).
Now, more details:
Google Dorks
- Combine Google Dorks for more precise queries.
- Google Dorks library for learning: [link to exploit-db.com]
- Google indexes the following file types: [link]
Below is an example of information retrieval using popular Google Dorks.
Suppose we want to gather email addresses or find any information about suppliers for universities in Costa Rica. We know that most universities use the ac.cr domain.
To search across all subdomains of ac.cr, the search query would be:
site:*.ac.cr
At the same time, we know that structured information is nowadays commonly stored in spreadsheets. Therefore, we will add the file type "XLSX" to our query, the keywords "e-mail" and "email", and the Spanish phrases "correo electrónico" (email) and "NOMBRE PROVEEDOR" (supplier name).
As a result, we will get the following query (note the double quotes, which Google needs to treat the words as an exact phrase):
site:*.ac.cr filetype:xlsx email e-mail "correo electrónico" "NOMBRE PROVEEDOR"
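As a rough illustration, the same query can be assembled in Python. The helper name `build_dork` and its parameters are my own invention for this sketch, not part of any Google tool. It only builds the text that you would paste into the search box, wrapping multi-word keywords in double quotes so that they are searched as exact phrases.

```python
def build_dork(site, filetype, keywords):
    """Join a site: filter, a filetype: filter and keywords into one query.

    Hypothetical helper for illustration; it only builds a string.
    """
    parts = [f"site:{site}", f"filetype:{filetype}"]
    for keyword in keywords:
        # Multi-word keywords are wrapped in double quotes so that
        # Google treats them as an exact phrase.
        if " " in keyword:
            parts.append(f'"{keyword}"')
        else:
            parts.append(keyword)
    return " ".join(parts)


query = build_dork(
    "*.ac.cr",
    "xlsx",
    ["email", "e-mail", "correo electrónico", "NOMBRE PROVEEDOR"],
)
print(query)
# site:*.ac.cr filetype:xlsx email e-mail "correo electrónico" "NOMBRE PROVEEDOR"
```

Keeping the keywords in a list makes it easy to try the same search against another domain or file type by changing a single argument.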
If we want to find up-to-date personal information for the current year, we can use, for example, a query like:
site:*.ac.cr filetype:xlsx Cédula 2023
If some links do not open, click the three dots next to the search result and select "View cache" to open the copy saved by Google.
To exclude certain words from the search results, put a minus sign directly before the word or before the name of the subdomain you want to leave out.
For example:
site:*.ac.cr filetype:xlsx Cédula -Formulario -Hoja1 -ucr.ac.cr
To limit results to the years 2022-2023, write the range with two dots and no quotes:
site:*.ac.cr filetype:xlsx Cédula 2022..2023 -Formulario -Hoja1 -ucr.ac.cr
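The refinements above (exclusions and a year range) can also be sketched in Python. The function names `refine_query` and `search_url` are hypothetical and exist only in this example. The one external fact relied on is that Google's search page takes the query in the `q` URL parameter.

```python
from urllib.parse import urlencode


def refine_query(base_query, exclude=(), year_from=None, year_to=None):
    """Append a number range and exclusions to an existing query string."""
    parts = [base_query]
    if year_from is not None and year_to is not None:
        # Two dots between numbers ask Google for any number in the range.
        parts.append(f"{year_from}..{year_to}")
    # A leading minus sign excludes a word (or a domain name) from the results.
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for the query (q is the query parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = refine_query(
    "site:*.ac.cr filetype:xlsx Cédula",
    exclude=["Formulario", "Hoja1", "ucr.ac.cr"],
    year_from=2022,
    year_to=2023,
)
print(query)
# site:*.ac.cr filetype:xlsx Cédula 2022..2023 -Formulario -Hoja1 -ucr.ac.cr
print(search_url(query))
```

`urlencode` takes care of escaping spaces, colons and accented letters, so the resulting link can be saved or shared as is.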
You can also use specialized search services such as:
- [link to intelx.io]: helps with crafting search queries and has its own data archive. With a paid subscription, it also offers dark web search.
- [link to dedigger.com]: searches for files and documents on Google Drive. The search covers only publicly available resources.
It often happens that we find many documents in different formats, including scans and images, and there is so much information that a simple Ctrl+F search yields nothing and only leads to procrastination.
In that case, I recommend using the service from GOOGLE:
To start, create a workspace, add all the necessary documents, and wait for them to load and process.
Advantages:
- Large files are split into smaller parts.
- Text in images is converted to text.
- Audio files are also converted to text.
- Video files are also converted to text.
This system can search for synonyms, decipher abbreviations, and most importantly, search across all the loaded documents. You can go to a specific document and search within it only. You can copy, highlight relevant information with colors, and create direct links to specific documents.
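To make the idea of searching across all documents with synonyms more concrete, here is a very small local sketch in Python. It is not how the Google service works internally and does no OCR or transcription. It assumes the text has already been extracted, and the function name `search_documents`, the sample file names, and the synonym list are all made up for this example.

```python
def search_documents(documents, term, synonyms=None):
    """Return (document name, line number, line) for every matching line.

    documents: dict mapping a document name to its plain text.
    synonyms: optional dict mapping a term to alternative spellings.
    """
    # Search for the term itself plus any synonyms supplied by the caller.
    needles = {term.lower()}
    needles.update(s.lower() for s in (synonyms or {}).get(term, []))

    hits = []
    for name, text in documents.items():
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if any(needle in lowered for needle in needles):
                hits.append((name, number, line.strip()))
    return hits


# Made-up sample data, standing in for text already extracted from files.
docs = {
    "suppliers.txt": "Nombre proveedor: ACME\nCorreo electrónico: info@example.com",
    "notes.txt": "Contact e-mail is missing\nNothing else here",
}
results = search_documents(
    docs, "email", {"email": ["e-mail", "correo electrónico"]}
)
for hit in results:
    print(hit)
# ('suppliers.txt', 2, 'Correo electrónico: info@example.com')
# ('notes.txt', 1, 'Contact e-mail is missing')
```

Even this naive version shows why a synonym list matters: neither sample document contains the literal word "email", yet both are found.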
To check word frequency, spam score, and other text statistics, you can use the following online tools: