Searching for information: methods and approaches.
About 10-15 years ago, the main task of any university was not to teach future specialists specific skills, since approaches and working methods change constantly, but to teach them how to search for and process information in a vast amount of data. This still holds today, but the amount of information noise has grown dramatically.
Below, I will describe my approaches to information retrieval.
In most cases, Google returns results in the language of your query.
Google personalizes results based on your past queries and sorts them by presumed relevance to you. To reduce this effect, use Incognito mode.
Google takes your location into account, determined at least from your IP address. To see results as they appear from another region, use a proxy server.
When searching, enter not a question but a part of the expected answer.
Google also matches other grammatical forms of a word, such as noun cases and verb forms, so there is no need to rephrase the query manually.
Letter case (lowercase or uppercase) in the query does not matter.
Google treats spaces between words as a logical "AND", so the results include pages where all the words appear anywhere on the page, not only pages with the exact phrase. To narrow the search, enclose the phrase you want to find in double quotes. Inside a quoted phrase, you can replace a word you are unsure of with an asterisk (*).
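The difference between the implicit "AND", a quoted phrase, and the asterisk can be illustrated with a small local simulation. This is only a sketch of the matching semantics described above, not Google's actual algorithm; the function names and the sample text are hypothetical, and the asterisk is assumed here to stand for exactly one word.

```python
import re

def matches_all_words(text: str, query: str) -> bool:
    """Implicit AND: every query word occurs somewhere in the text, in any order."""
    words_in_text = set(re.findall(r"\w+", text.lower()))
    return all(word in words_in_text for word in re.findall(r"\w+", query.lower()))

def matches_phrase(text: str, phrase: str) -> bool:
    """Quoted phrase: words must be adjacent and in order; '*' stands for one word."""
    parts = [r"\w+" if token == "*" else re.escape(token)
             for token in phrase.lower().split()]
    pattern = r"\b" + r"\W+".join(parts) + r"\b"
    return re.search(pattern, text.lower()) is not None

page = "The supplier list was sent by email to the university"

print(matches_all_words(page, "university supplier"))  # True: both words are present
print(matches_phrase(page, "university supplier"))     # False: not adjacent in this order
print(matches_phrase(page, "sent * email"))            # True: '*' matches "by"
```

Both functions lowercase the text and the query first, which mirrors the point that letter case does not matter.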
Now, more details:
- Combine Google Dorks for more precise queries.
- Google Dorks library for learning: [link to exploit-db.com]
- Google indexes the following file types: [link]
Below is an example of information retrieval using popular Google Dorks.
Suppose we want to gather email addresses or find any information about suppliers for universities in Costa Rica. We know that most universities use the ac.cr domain.
To search across all subdomains of ac.cr, the search query would be:
site:*.ac.cr
Structured information is nowadays usually stored in spreadsheets. Therefore, we will add the file type xlsx to our query, along with the keywords "e-mail" and "email" and the Spanish phrases "correo electrónico" (email) and "NOMBRE PROVEEDOR" (supplier name).
As a result, we will get the following query:
site:*.ac.cr filetype:xlsx email e-mail "correo electrónico" "NOMBRE PROVEEDOR"
If we want to find up-to-date personal information for the current year, we can use a query that includes, for example, the word Cédula (the national ID card) and the year:
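When many variations of such a query are needed, it can help to assemble them programmatically. The following is a minimal sketch, assuming a hypothetical helper named build_dork; it only builds the query string (wrapping multi-word terms in double quotes so they are searched as phrases) and does not send anything to Google.

```python
def build_dork(site: str, filetype: str, terms: list[str]) -> str:
    """Join the site: and filetype: operators with keywords into one query string.

    Terms that contain a space are wrapped in double quotes so that
    they are treated as exact phrases.
    """
    parts = [f"site:{site}", f"filetype:{filetype}"]
    for term in terms:
        parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)

query = build_dork(
    "*.ac.cr",
    "xlsx",
    ["email", "e-mail", "correo electrónico", "NOMBRE PROVEEDOR"],
)
print(query)
```

Changing the site or filetype argument (for example, to pdf) gives the same search against other domains or document types.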
site:*.ac.cr filetype:xlsx Cédula 2023
If some links do not open, click the three dots next to the search result and select "View cache" to open the copy of the page saved by Google.
To exclude certain terms from the search results, put a minus sign directly before the term or before the subdomain name.
For example: site:*.ac.cr filetype:xlsx Cédula -Formulario -Hoja1 -ucr.ac.cr
To limit results to the years 2022-2023:
site:*.ac.cr filetype:xlsx Cédula 2022..2023 -Formulario -Hoja1 -ucr.ac.cr
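The exclusions and the year range can be appended in the same mechanical way, and the finished query can be turned into a search link. This is a sketch under stated assumptions: with_filters and to_search_url are hypothetical helper names, the range is written without quotes as 2022..2023, and the link uses the common https://www.google.com/search?q=... form.

```python
from urllib.parse import urlencode

def with_filters(base_query: str, excluded: list[str],
                 year_from: int, year_to: int) -> str:
    """Append a number range and minus-prefixed exclusions to a query."""
    parts = [base_query, f"{year_from}..{year_to}"]
    parts += [f"-{term}" for term in excluded]
    return " ".join(parts)

def to_search_url(query: str) -> str:
    """Percent-encode the query (spaces, ':', '*', accents) into a search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = with_filters(
    "site:*.ac.cr filetype:xlsx Cédula",
    ["Formulario", "Hoja1", "ucr.ac.cr"],
    2022,
    2023,
)
print(query)
print(to_search_url(query))
```

Encoding the query with urlencode avoids broken links when the query contains spaces or accented characters such as "é".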
You can also use specialized search types such as:
- [link to intelx.io]: helps with crafting search queries and has its own data archive. With a paid subscription, dark web search is also available.
- [link to dedigger.com]: searches for files and documents on Google Drive. The search covers only publicly available resources.
It often happens that we find many documents in different formats, including scans and images, and there is so much material that a simple Ctrl+F search yields nothing and only wastes time.
In that case, I recommend using the service from Google:
To start, create a workspace, add all the necessary documents, and wait for them to load and process.
- Large files are split into smaller parts.
- Text in images is converted to text.
- Audio files are also converted to text.
- Video files are also converted to text.
This system can search for synonyms, expand abbreviations, and, most importantly, search across all the loaded documents at once. You can also open a specific document and search within it only. You can copy text, highlight relevant passages in different colors, and create direct links to specific documents.
To analyze word frequency, spam evaluation, and other text statistics, you can use the following online tools:
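For a quick local check alongside such tools, basic word frequency can also be counted with a few lines of Python. This is only a rough sketch: the function name word_frequencies and the sample sentence are made up for illustration, and no stop-word filtering or spam scoring is attempted.

```python
import re
from collections import Counter

def word_frequencies(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most frequent words, ignoring letter case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(top_n)

sample = "Email the supplier. The supplier replies by email, and the email is archived."

for word, count in word_frequencies(sample, 3):
    print(word, count)
```

A very high count for a single keyword relative to the text length is the kind of signal that text-statistics tools usually report in more detail.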