For data lakes and unstructured data, we use object storage when possible and HDFS when downstream processing requires it.
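As a minimal sketch of that landing step, assuming an S3-compatible object store and the boto3 client, raw files can be dropped into a date-partitioned bucket layout (the bucket and key names below are hypothetical placeholders):

```python
# Minimal sketch: landing a raw file in S3-compatible object storage.
# Bucket name and key layout are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")  # credentials resolved from the environment

s3.upload_file(
    Filename="events-2024-01-01.json",
    Bucket="example-data-lake",                # hypothetical bucket
    Key="raw/events/2024/01/01/events.json",   # date-partitioned layout
)
```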
For relational data, columnar formats like ORC or Parquet paired with a query engine such as Presto, or a managed solution like BigQuery, work wonders; PostgreSQL with columnar storage is another option.
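To illustrate the columnar workflow, here is a small sketch using pandas with the pyarrow engine; the file name and column values are illustrative only:

```python
# Minimal sketch: persisting a table as Parquet, the columnar format
# a Presto/Trino or BigQuery external table can then query in place.
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "country": ["DE", "US", "FR"],
    "revenue": [12.5, 40.0, 7.25],
})

# Column-oriented storage with snappy compression via pyarrow.
df.to_parquet("revenue.parquet", engine="pyarrow", compression="snappy")

# Reading back only the columns a query needs is what makes the
# columnar layout fast for analytical workloads.
subset = pd.read_parquet("revenue.parquet", columns=["country", "revenue"])
print(subset)
```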
For orchestration, Apache Airflow on-premises and AWS Lambda with event triggers or GCP Dataflow in the cloud are our tools of choice.
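For illustration, a minimal Airflow 2.x DAG with a single daily task might look like the following; the DAG id and the task body are hypothetical placeholders:

```python
# Minimal sketch of an Airflow DAG: one daily task.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_load():
    # Placeholder for the actual extract/load logic.
    print("pulling yesterday's partition...")


with DAG(
    dag_id="example_daily_etl",       # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow >= 2.4 syntax
    catchup=False,
) as dag:
    PythonOperator(
        task_id="extract_and_load",
        python_callable=extract_and_load,
    )
```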
The processing itself is handled by various tools, from Spark for big data to TensorFlow for deep neural networks.
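As a sketch of a typical Spark job, here is a PySpark aggregation over a Parquet dataset; the input path is a hypothetical placeholder:

```python
# Minimal PySpark sketch: an aggregation over a Parquet dataset.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("revenue-by-country").getOrCreate()

# Hypothetical path into the object-storage data lake.
events = spark.read.parquet("s3a://example-data-lake/raw/events/")

(events
    .groupBy("country")
    .agg(F.sum("revenue").alias("total_revenue"))
    .orderBy(F.desc("total_revenue"))
    .show())

spark.stop()
```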
We work mainly in the Python and Linux ecosystems and have extensive experience with relevant tools.
We use a variety of business intelligence (BI) and data visualization tools; which ones fit best depends on the characteristics of the data being analyzed and the goals of the BI project. The data visualization software we use to create graphical representations of data, and to better understand and communicate insights, includes Tableau, Qlik, and Power BI.