SaFi Tech Debts Summary Draft

All tech debts have now been migrated to a Kanban board: https://safibank.atlassian.net/jira/software/projects/STD/boards/43

Please list any tech debts you discover in the project so we can make sure they are on track.

Tech Debt Name

Description

Questioner

Priority

Status

Labels

Related Cards / Tickets

Owner

Decision

Controller should run on IO threads

All blocking controller methods (or the controller class itself) must be annotated to run on the IO thread pool with @ExecuteOn(TaskExecutors.IO). Currently, not all controllers follow this practice.

HIGH

Closed

BACKEND COMMONS

Back Office: SM-1982 - annotate audit-log-manager controller methods with @ExecuteOn Done

Card: SM-3260 - All MS controller should use @ExecuteOn(TaskExecutors.IO) Done

Ktlint: SM-2002

Péter Cseh (Unlicensed)

euronet-gateway and loan-manager need to fix their ktlint rule violations first.

Set up Git hooks for our repo

We need to set up pre-push Git hooks in our repo to make sure all tests pass every time we push code.

MEDIUM

Deleted

COMMONS DEVOPS

Daily or weekly testing (on-hold, might be included in our testing practices)

In addition to unit tests, we need daily or weekly test runs to ensure existing functionality is not broken.

These can be functional tests or integration tests.

ON-HOLD

On hold

TEST

Ask QA to confirm whether it is done.

Define Unified Error Code and Error Message for Restful API Endpoints

When an error occurs (for example, due to invalid parameter values), we simply return the default Micronaut errors to the caller. These messages are neither readable nor user-friendly. We propose defining an error structure like the following for HTTP 400 responses:

{
  "code": "SAFI_INVALID_PARAMETER_VALUE",
  "offendingFieldName": "customerId",
  "message": "'customerId' cannot be blank"
}

We can discuss the detailed structure.

HIGH

Blocked

BACKEND

SM-3262 - All Cards MS to use unified error code and message Blocked Task

Jan Görig (Unlicensed)

Proposal: Global Error Handling

Ardhi told me he was told not to work on it because no agreement has been reached yet.

Could the PIC arrange a meeting and make sure the TLs agree on it?
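A minimal Kotlin sketch of the proposed 400 response body, assuming the three fields from the example JSON. `ErrorResponse`, `validateNotBlank` and the constant are hypothetical names, and the final structure is still open for discussion.

```kotlin
// Hypothetical error code constant, taken from the example payload.
const val INVALID_PARAMETER_VALUE = "SAFI_INVALID_PARAMETER_VALUE"

// Proposed body for HTTP 400 responses; field names mirror the example JSON.
data class ErrorResponse(
    val code: String,
    val offendingFieldName: String?,
    val message: String,
)

// Returns an ErrorResponse when the value is null or blank, otherwise null.
fun validateNotBlank(fieldName: String, value: String?): ErrorResponse? =
    if (value.isNullOrBlank()) {
        ErrorResponse(
            code = INVALID_PARAMETER_VALUE,
            offendingFieldName = fieldName,
            message = "'$fieldName' cannot be blank",
        )
    } else {
        null
    }
```

A controller would then return such a body with status 400 instead of the default Micronaut error; wiring it into Micronaut's error handling is not shown here.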

Inconsistent HTTP response status codes when a resource is successfully created with POST

For RESTful endpoints that create a resource, some modules return 200 on success and others return 201. We should be consistent.

LOW

Open

BACKEND

Yuetong Yang (Unlicensed)

Write a tutorial about the REST API.

Configurable Kafka topic names

Hardcoded Kafka topic names like @Topic("audit.log.message.temp") should be replaced with configurable application.yml properties.

MEDIUM

Open

BACKEND

zhenghong.li (Unlicensed)

Global Exception Handler

Exceptions should be handled in a central place, with normal business logic simply throwing exceptions when necessary. We can mimic Spring Boot's @ControllerAdvice.

LOW

Open

BACKEND

Yuetong Yang (Unlicensed)
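The @ControllerAdvice idea boils down to one function that translates exception types into responses, so business code only throws. This is a framework-free sketch: in Micronaut it would sit behind an exception handler, and the exception classes, error codes and the 422 choice for generic business errors are all hypothetical.

```kotlin
// Hypothetical business exceptions thrown by service code.
open class BusinessException(message: String) : RuntimeException(message)
class InvalidParameterException(val field: String, message: String) : BusinessException(message)
class ResourceNotFoundException(message: String) : BusinessException(message)

// What the central handler sends back to the caller.
data class HttpError(val status: Int, val code: String, val message: String)

// The single place where exceptions become HTTP responses; subclasses are matched first.
fun handle(e: Throwable): HttpError = when (e) {
    is InvalidParameterException ->
        HttpError(400, "SAFI_INVALID_PARAMETER_VALUE", e.message ?: "Invalid '${e.field}'")
    is ResourceNotFoundException ->
        HttpError(404, "SAFI_RESOURCE_NOT_FOUND", e.message ?: "Not found")
    is BusinessException ->
        HttpError(422, "SAFI_BUSINESS_ERROR", e.message ?: "Business rule violated")
    // Unexpected failures: do not leak internal details to the caller.
    else -> HttpError(500, "SAFI_INTERNAL_ERROR", "Internal server error")
}

// Business logic throws instead of building responses itself.
fun findCustomer(id: String): String {
    if (id.isBlank()) throw InvalidParameterException("customerId", "'customerId' cannot be blank")
    throw ResourceNotFoundException("Customer $id not found")
}
```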

Different ways to launch dependent Docker containers when running unit test cases

Some containers are launched from a docker-compose file using DockerComposeContainer, while others are started with a component-specific container implementation, e.g. KafkaContainer. For example:

@Container
var temporal = DockerComposeContainer(File("src/test/resources/docker-compose-test.yml"))
    .waitingFor("temporal_1", Wait.forLogMessage(".*Default namespace default registration complete.*\\n", 1))
    .withLocalCompose(false)

@Container
val kafka: KafkaContainer =
    KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1"))
        .withEmbeddedZookeeper()
        .waitingFor(Wait.forListeningPort())

Maybe we should stick to one approach and move this part into a common library.

LOW

Closed

BACKEND

The common squad already has the same setup.

Setup integration test & E2E test strategy for CI/CD

Currently, CI runs integration tests together with unit tests before deployment. We need to change this so that integration and E2E tests run after deployment.

MEDIUM

Open

DEVOPS

kai.hu (Unlicensed) Yuetong Yang (Unlicensed)

More tests to cover corner cases

We need more tests that validate both positive and negative user input.

MEDIUM

Open

DEVOPS

zhenghong.li (Unlicensed)

Add more logs

Log at every key point in the code path with enough detail that we can find the root cause of any issue from the logs in the production environment.

LOW

Deleted

DEVOPS

Add business-level metrics for monitoring purposes

By business-level metrics, we mean things like the number of requests per RESTful API endpoint, the number of logged-in customers per day, the number of transactions per minute, and the total number of transactions so far. These metrics will help us better understand our system and provide facts for troubleshooting and analysis. We can expose such metrics to Prometheus and then create Grafana dashboards.

LOW

Open

DEVOPS

zhenghong.li (Unlicensed)

User def44

Backend logic should not ignore exception paths and care only about the happy paths

In the existing code base, exception paths have been ignored.

Deleted

DEVOPS

RESTful API versioning

Current RESTful API endpoints are not versioned, so when we upgrade an API (either its signature or its implementation), the customer's app may no longer work with the new version.

HIGH

Closed

DEVOPS

In addition, we need to consider how to migrate the customer's app smoothly to new API versions.

Two possible approaches:

  1. Declare the version in the URI path.

  2. Put the version in an HTTP header. The downside is that this would no longer work if we adopt gRPC in the future.

Back Office: SM-2803 - Document Restful API endpoint versioning strategy Done

SM-3406 - Design REST API versioning approach Resolved

Juraj Macháč (Unlicensed)

REST API versioning - major version upgrades

REST API versioning approach
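Both options can be sketched in a few lines of framework-free Kotlin. This assumes the major version is a plain integer, that the path form wins when both are present, and that unversioned requests default to v1 so existing apps keep working; the `X-API-Version` header name is hypothetical.

```kotlin
// Hypothetical header name for option 2; no name has been agreed on.
const val VERSION_HEADER = "X-API-Version"
const val DEFAULT_VERSION = 1

// Matches a leading /v<number> segment, e.g. /v2/customers.
val PATH_VERSION = Regex("^/v(\\d+)(/|$)")

// Option 1: read the major version from the URI path.
fun versionFromPath(path: String): Int? =
    PATH_VERSION.find(path)?.groupValues?.get(1)?.toInt()

// Option 2: read the major version from a request header (case-insensitive lookup).
fun versionFromHeader(headers: Map<String, String>): Int? =
    headers.entries.firstOrNull { it.key.equals(VERSION_HEADER, ignoreCase = true) }
        ?.value?.trim()?.toIntOrNull()

// Unversioned requests fall back to the default so existing apps keep working.
fun resolveVersion(path: String, headers: Map<String, String>): Int =
    versionFromPath(path) ?: versionFromHeader(headers) ?: DEFAULT_VERSION
```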

Code quality & security strategy

We need to set up SonarQube for backend services and add the JaCoCo plugin for test coverage. Later we can set a coverage threshold so that the build fails if coverage falls below it.

HIGH

Open

DEVOPS

Back Office: SM-2802 - Add audit-log-manager to SonarQube for vulnerability scanning Done (Note: the SonarQube URL is http://34.124.144.240:8080/; contact Andre Laksmana to get admin access there first.)

Yuetong Yang (Unlicensed)
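For the coverage-threshold part, Gradle's JaCoCo plugin can fail the build through `jacocoTestCoverageVerification`. A sketch for a module's `build.gradle.kts`, assuming the Gradle Kotlin DSL with the java plugin already applied and an illustrative 80% minimum (no threshold has been agreed yet):

```kotlin
plugins {
    jacoco // add to the module's existing plugins block
}

// Produce the coverage report after the tests; SonarQube consumes the XML report.
tasks.jacocoTestReport {
    dependsOn(tasks.test)
    reports {
        xml.required.set(true)
    }
}

// Fail the build when coverage (JaCoCo's default counter: instructions) drops below the minimum.
tasks.jacocoTestCoverageVerification {
    violationRules {
        rule {
            limit {
                minimum = "0.80".toBigDecimal() // illustrative value only
            }
        }
    }
}

// Run the verification as part of `gradle check`.
tasks.check {
    dependsOn(tasks.jacocoTestCoverageVerification)
}
```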

Detailed Design

Currently, we only have product function descriptions and workflows in Confluence pages; we still need to write detailed design documents for each module of every squad.

Deleted

DEVOPS

Make sure configuration files are excluded from the generated API client libs and handwritten client libs

Neither the generated API clients nor the handwritten clients should include configuration files.

MEDIUM

Open

DEVOPS

zhenghong.li (Unlicensed)
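One way to enforce this at build time is to exclude the files when the client library's jar is assembled. A sketch for the client module's `build.gradle.kts`, assuming the Gradle Kotlin DSL with the java plugin; the file-name patterns are assumptions about which configuration files exist.

```kotlin
// Keep runtime configuration out of the published client jar (patterns are assumptions).
tasks.jar {
    exclude("application*.yml", "application*.yaml", "bootstrap*.yml", "logback*.xml")
}
```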

Unified Micronaut version

Some modules use 3.4.1 while others use 3.4.2 or 3.5.2; moreover, 3.4.1 has some bugs.

MEDIUM

Closed

DEVOPS

The version has now been updated to 3.7.0, but squads are not forced to upgrade.

Unified Time

We should use the same time zone for all of our applications. UTC or UTC + 8?

LOW

Closed

DEVOPS

Timestamp representation

Unified BigDecimal scale and rounding policy

Use 2 decimal places with round-up for money (currencies like JPY don't need decimals), and 4 decimal places with round-up for percentages.

Deleted

DEVOPS
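A sketch of this policy with `java.math.BigDecimal`. Two assumptions: "round up" is read as `RoundingMode.HALF_UP` (swap in `RoundingMode.UP` if it means "always away from zero"), and the money scale comes from `java.util.Currency`, which gives 2 for PHP and 0 for JPY but would also give 3 for currencies such as BHD.

```kotlin
import java.math.BigDecimal
import java.math.RoundingMode
import java.util.Currency

// Assumption: "round up" means HALF_UP.
val ROUNDING: RoundingMode = RoundingMode.HALF_UP

const val PERCENTAGE_SCALE = 4

// Money uses the currency's minor units: 2 for PHP, 0 for JPY.
fun roundMoney(amount: BigDecimal, currencyCode: String): BigDecimal {
    // defaultFractionDigits is -1 for pseudo-currencies, so clamp it at 0.
    val digits = Currency.getInstance(currencyCode).defaultFractionDigits.coerceAtLeast(0)
    return amount.setScale(digits, ROUNDING)
}

// Percentages always keep 4 decimal places.
fun roundPercentage(value: BigDecimal): BigDecimal = value.setScale(PERCENTAGE_SCALE, ROUNDING)
```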

Using a cache to store hotspot data

Use Redis or another in-memory database to store hotspot data such as user session information and dictionaries. This improves performance, especially when reads far outnumber updates.

Deleted

DEVOPS

TMCoreClient needs more functions in the common project

TMCoreClient does not have enough functions; we need to add more.

Deleted

DEVOPS

Add /info endpoint

This endpoint should return the following information:

  • build timestamp in the format yyyy-MM-dd HH:mm:ss (UTC)

  • git branch name

  • git commit id

  • release version

This is necessary for developers to troubleshoot bugs.

LOW

Closed

DEVOPS

Back Office: SM-2800 - create endpoint /info for back office modules Done
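A sketch of the /info payload and the UTC timestamp format, using only `java.time`. It assumes the branch, commit ID and version are injected at build time (for example by the CI pipeline); `BuildInfo` and the function names are hypothetical. Micronaut's management module also provides an /info endpoint, which may already cover part of this.

```kotlin
import java.time.Instant
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter

// yyyy-MM-dd HH:mm:ss, always rendered in UTC.
val BUILD_TIMESTAMP_FORMAT: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC)

// Response body for /info (hypothetical field names).
data class BuildInfo(
    val buildTimestamp: String,
    val gitBranch: String,
    val gitCommitId: String,
    val releaseVersion: String,
)

fun formatBuildTimestamp(builtAt: Instant): String = BUILD_TIMESTAMP_FORMAT.format(builtAt)

fun buildInfo(builtAt: Instant, branch: String, commitId: String, version: String): BuildInfo =
    BuildInfo(formatBuildTimestamp(builtAt), branch, commitId, version)
```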

Add JaCoCo

Add this plugin for test coverage. Later we can set a coverage threshold so that the build fails if coverage falls below it.

HIGH

Deleted

DEVOPS

Back Office: SM-2801 - Add jacoco to back office modules for test coverage purpose Done

Yuetong Yang (Unlicensed)

Merged with another topic

Implementation of Dead Letter Queues

Error handling for streaming services.

MEDIUM

Closed

DEVOPS

SM-3407 - Define rules for DLT management Resolved

Juraj Macháč (Unlicensed)

DLQ Management & Error Handling in Kafka
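The core dead-letter rule (retry a bounded number of times, then park the record with its error instead of blocking the consumer) can be sketched without Kafka. This in-memory illustration is not the agreed design: a real consumer would publish each `DeadLetter` to a dedicated topic, and `maxAttempts` is a placeholder value.

```kotlin
// A failed record together with the reason it ended up in the dead letter topic.
data class DeadLetter<T>(val record: T, val attempts: Int, val error: String)

// Retries each record up to maxAttempts, then parks it in the (in-memory) DLT.
class DeadLetterProcessor<T>(
    private val maxAttempts: Int = 3,
    private val handler: (T) -> Unit,
) {
    val deadLetterTopic = mutableListOf<DeadLetter<T>>()

    fun process(record: T) {
        var lastError: Exception? = null
        repeat(maxAttempts) {
            try {
                handler(record)
                return // processed successfully
            } catch (e: Exception) {
                lastError = e
            }
        }
        deadLetterTopic += DeadLetter(record, maxAttempts, lastError?.message ?: "unknown error")
    }
}
```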

Add correlation/trace id in logs

We should include correlation/trace IDs in logs so that related logs are easier to find, even across different services.

MEDIUM

Deleted

DEVOPS

Merged with another topic

Strategy for externally exposed endpoints

We need a way to secure endpoints that we will expose to external merchants.

HIGH

Open

DEVOPS

Norbert Bérci (Unlicensed)

Yuetong Yang (Unlicensed)

Exposed Endpoints & Tyk Mappings

Traffic management

  1. Circuit breaker

  2. Fallback handling

  3. Retry

We want our system to be resilient, but currently we have no mechanism for retrying calls when a network error occurs.

MEDIUM

Open

DEVOPS

SM-3411 - Traffic management strategy To Do

Juraj Macháč (Unlicensed)
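The retry and fallback items can be sketched as one framework-free helper: retry only on `IOException` (standing in for network errors) with exponential backoff, then return a fallback value. Attempt count and delays are assumptions, `retryWithFallback` is a hypothetical name, and the sleep function is injectable so tests run instantly. A circuit breaker is not covered here.

```kotlin
import java.io.IOException

fun <T> retryWithFallback(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 100,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    fallback: (IOException) -> T,
    call: () -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    var lastError: IOException? = null
    for (attempt in 1..maxAttempts) {
        try {
            return call()
        } catch (e: IOException) { // only network-style errors are retried
            lastError = e
            if (attempt < maxAttempts) {
                sleep(delayMs)
                delayMs *= 2 // exponential backoff: 100 ms, 200 ms, 400 ms, ...
            }
        }
    }
    return fallback(lastError!!) // all attempts failed
}
```

Micronaut also ships @Retryable, @CircuitBreaker and @Fallback annotations, which may be preferable to hand-rolled code once a strategy is agreed.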

Logging Strategy

We need logging guidelines / a logging strategy for all services.

MEDIUM

Open

DEVOPS

https://safibank.atlassian.net/jira/software/c/projects/SM/boards/21?modal=detail&selectedIssue=SM-3692

Yuetong Yang (Unlicensed)

Tracing

We need to define a mechanism to track and record how requests flow through all of our services.

MEDIUM

Open

DEVOPS

SM-3095 - Add http calls into Google Firebase Analytics and its x-request-id uuid header Done

Observability in Micronaut

Andre Laksmana (Unlicensed)

Yuetong Yang (Unlicensed)
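The x-request-id approach from SM-3095 can be sketched as: reuse the caller's ID when present, otherwise start a new one, and forward it on every outgoing call. Function names are hypothetical, and a full tracing setup would add spans on top of this.

```kotlin
import java.util.UUID

const val REQUEST_ID_HEADER = "x-request-id"

// Reuse the incoming ID so it stays stable across services; otherwise generate a UUID.
fun resolveRequestId(incomingHeaders: Map<String, String>): String =
    incomingHeaders.entries
        .firstOrNull { it.key.equals(REQUEST_ID_HEADER, ignoreCase = true) && it.value.isNotBlank() }
        ?.value
        ?: UUID.randomUUID().toString()

// Headers to attach to outgoing calls so the next service sees the same ID.
fun outgoingHeaders(requestId: String): Map<String, String> = mapOf(REQUEST_ID_HEADER to requestId)
```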

HTTP/2 support?

How are we going to support HTTP/2 for better performance?

LOW

Open

DEVOPS

Yuetong Yang (Unlicensed)

In memory cache support

If we need to cache something in memory, we might use Micronaut's cache support.

LOW

Open

DEVOPS

Yuetong Yang (Unlicensed)

Rate limit for external API

Do we have a plan or strategy for rate limiting our endpoints?

LOW

Open

DEVOPS

Yuetong Yang (Unlicensed)
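One common strategy is a token bucket per client or API key. A minimal single-bucket sketch with an injectable clock for testing; capacity and refill rate are assumptions. Since external endpoints go through Tyk, its built-in rate limiting may be the simpler option.

```kotlin
// Allows bursts up to `capacity` requests, refilling at `refillPerSecond` tokens per second.
class TokenBucket(
    private val capacity: Double,
    private val refillPerSecond: Double,
    private val nowNanos: () -> Long = System::nanoTime,
) {
    private var tokens = capacity
    private var lastRefill = nowNanos()

    @Synchronized
    fun tryAcquire(): Boolean {
        val now = nowNanos()
        val elapsedSeconds = (now - lastRefill) / 1_000_000_000.0
        tokens = minOf(capacity, tokens + elapsedSeconds * refillPerSecond)
        lastRefill = now
        if (tokens >= 1.0) {
            tokens -= 1.0
            return true
        }
        return false // caller would respond with HTTP 429
    }
}
```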

Service Monitoring / Alert Mechanism

An alerting mechanism, based on the chosen APM, for when any of our services goes down. For New Relic, we have synthetic monitoring rules that check service status at regular intervals.

LOW

Closed

DEVOPS

SMS failover policy

We need a failover policy for SMS.

MEDIUM

Closed

DEVOPS

zhenghong.li (Unlicensed)
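A simple failover policy tries providers in priority order and stops at the first success. This sketch assumes a hypothetical `SmsProvider` abstraction over vendors such as Macrokiosk or Infobip, where `send` throws on failure; what to do when every provider fails is left to the caller.

```kotlin
// Hypothetical abstraction over SMS vendors.
interface SmsProvider {
    val name: String
    fun send(phoneNumber: String, text: String) // throws on failure
}

// provider is null when every provider failed.
data class SmsResult(val provider: String?, val errors: List<String>)

fun sendWithFailover(providers: List<SmsProvider>, phoneNumber: String, text: String): SmsResult {
    val errors = mutableListOf<String>()
    for (provider in providers) {
        try {
            provider.send(phoneNumber, text)
            return SmsResult(provider.name, errors)
        } catch (e: Exception) {
            errors += "${provider.name}: ${e.message}" // record and try the next provider
        }
    }
    return SmsResult(null, errors)
}
```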

Avro amount/date field type change

We need to unify the "bytes" and "string" types used for amounts, as well as the date type.

MEDIUM

Open

DEVOPS

Yuetong Yang (Unlicensed)

Need a guideline.
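To compare the two amount encodings: Avro's decimal logical type on "bytes" stores the unscaled value as big-endian two's complement, while "string" stores a plain decimal. A sketch of both conversions, assuming a fixed scale of 2 that would have to match the schema's declared scale; Avro's Java library also provides `Conversions.DecimalConversion` for the bytes form.

```kotlin
import java.math.BigDecimal
import java.math.BigInteger
import java.math.RoundingMode

// Assumed scale; must match the "scale" in the Avro schema.
const val AMOUNT_SCALE = 2

// "bytes" form: UNNECESSARY throws if the amount has more decimals than the scale allows.
fun amountToBytes(amount: BigDecimal): ByteArray =
    amount.setScale(AMOUNT_SCALE, RoundingMode.UNNECESSARY).unscaledValue().toByteArray()

fun amountFromBytes(bytes: ByteArray): BigDecimal = BigDecimal(BigInteger(bytes), AMOUNT_SCALE)

// "string" form: human-readable, but every consumer must parse and validate it.
fun amountToString(amount: BigDecimal): String =
    amount.setScale(AMOUNT_SCALE, RoundingMode.UNNECESSARY).toPlainString()

fun amountFromString(value: String): BigDecimal =
    BigDecimal(value).setScale(AMOUNT_SCALE, RoundingMode.UNNECESSARY)
```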

Restricting endpoints on the API gateway

We need a configuration page to make sure our microservices are exposed the right way.

MEDIUM

Open

DEVOPS

Merged with another topic

Compilation failure due to MapStruct

Transaction-Processor-Manager uses MapStruct for object conversion. However, if the source object is provided by another service (like transactionStatus), compilation fails after even a small change to the source object, e.g. adding a value to an enum attribute.

Jevan Wu (Unlicensed)

LOW

Open

BACKEND

User ce01f

Management of endpoints exposed to the Internet

  • We have a page that tracks exposed endpoints and their Tyk mappings (Exposed Endpoints & Tyk Mappings); please add yours, if any.

  • For callback endpoints exposed to third-party platforms, please take authentication into account.

MEDIUM

Open

BACKEND

Merged with another topic

Macrokiosk

This SMS vendor performs poorly in delivery speed, API security, and usability. Why don't we choose a vendor such as Twilio?

zhenghong.li (Unlicensed)

MEDIUM

Open

BACKEND

Code snippet

Workflow.await { identityVerified }

We often see this snippet in Temporal workflows in our current code base. A timed wait would be better, with extra logic to handle the timeout.

zhenghong.li (Unlicensed)

MEDIUM

Open

BACKEND

@Hui.zhu

Data Sync Strategy

Currently, we use application-level code to sync data from the database to Kafka. This is error-prone and may cause data inconsistency.

Yuetong Yang (Unlicensed)

MEDIUM

Open

BACKEND

Yuetong Yang (Unlicensed)

Distributed transactions

Some operations, like freezing an account, should be a single transaction across several services.

User bd920

hui.zhu (Unlicensed)

Chaos testing

We may need chaos testing for our system.

Gavin Zhang (Unlicensed)

BACKEND

Gavin Zhang (Unlicensed)

We need a customer onboarding feature that collects each customer's preferred channel (SMS, email, Viber) for receiving messages

Currently, we can send messages/notifications to customers via SMS, email, and Viber, but we don't know which channel a customer prefers, so we may need such a feature.

zhenghong.li (Unlicensed)

MEDIUM

Open

Product/Backend

zhenghong.li (Unlicensed)

We should build a mechanism to handle business exceptions

Currently we only implement the happy path of the business logic. If an exception occurs in a Temporal workflow or another service due to a business error (account blocked, inactive, etc.), we have no mechanism to handle it.

Kobe Wang (Unlicensed)

MEDIUM

Open

BACKEND

Kobe Wang (Unlicensed)

Thought Machine Performance Test

We don't know the real performance and architecture of the Thought Machine core banking system. We currently use a sandbox to handle all requests, and a test that created 10,000 accounts ran slowly. We need to find out why: is the performance issue in our smart contract code, in our settings, or in Thought Machine itself?

Also, how does Thought Machine deal with the Python Global Interpreter Lock when running scheduled jobs for millions of accounts?

Kobe Wang (Unlicensed)

MEDIUM

Open

BACKEND

User 3d5f3

Temporal metrics

Third-party vendor/platform access control, balance, and credential management

We integrate several third-party platforms: Euronet, Paynamics, OneStop, Infobip, Macrokiosk, etc. We need a person/team/role/system to manage the accounts, credentials, and access control for those platforms, as well as API keys/access tokens/…. We also need to set up Prometheus metrics to monitor some of those accounts, because we have to make sure they have enough balance.

zhenghong.li (Unlicensed)

MEDIUM

Open

MGMT

DevOps

User 3d5f3

Database table primary key

Should we use UUIDs as primary keys? Because UUIDs are not ordered, once a table reaches hundreds of millions of rows, I assume the B-tree index on the PK would have to reorganize pages frequently when new rows are inserted. That would slow down inserts and cause heavy IO.

Or do we have another solution for primary keys?

https://www.postgresql.org/message-id/20151222124018.bee10b60b3d9b58d7b3a1839%40potentialtech.com

User bd920

User bd920

A GitHub Action to make sure that the new migration script versions in the current PR are greater than the latest one on the main branch

Flyway has a well-defined naming convention for migration scripts here. A GitHub Action is required to make sure that the new migration script versions in the current PR are greater than the latest one on the main branch.

zhenghong.li (Unlicensed)

MEDIUM

zhenghong.li (Unlicensed)
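The core of such a check could look like this, assuming Flyway's default versioned naming (`V<version>__<description>.sql`, with dots or underscores separating version parts) and that the action passes in the file names on main and those added by the PR (for example from `git diff`). Function names are hypothetical; non-versioned scripts such as repeatable `R__` ones are ignored.

```kotlin
// e.g. V1_2__add_customer_table.sql -> version parts [1, 2]
val MIGRATION_NAME = Regex("^V([0-9]+(?:[._][0-9]+)*)__.+\\.sql$")

fun parseVersion(fileName: String): List<Long>? =
    MIGRATION_NAME.matchEntire(fileName)?.groupValues?.get(1)
        ?.split('.', '_')
        ?.map { it.toLong() }

// Numeric, part-by-part comparison: 1.10 > 1.9; missing parts count as 0.
fun compareVersions(a: List<Long>, b: List<Long>): Int {
    for (i in 0 until maxOf(a.size, b.size)) {
        val diff = a.getOrElse(i) { 0L }.compareTo(b.getOrElse(i) { 0L })
        if (diff != 0) return diff
    }
    return 0
}

// Every script added by the PR must be newer than the newest script on main.
fun newMigrationsAreNewer(mainFiles: List<String>, prFiles: List<String>): Boolean {
    val latestOnMain = mainFiles.mapNotNull(::parseVersion)
        .maxWithOrNull(Comparator { a, b -> compareVersions(a, b) })
        ?: return true
    return prFiles.mapNotNull(::parseVersion).all { compareVersions(it, latestOnMain) > 0 }
}
```

The action would then fail the PR whenever `newMigrationsAreNewer` returns false.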

Localization process/workflow and tools are required.

We need a localization process/workflow and tools. Some recommended tools: Mozilla Pontoon, Weblate, Lokalise.

zhenghong.li (Unlicensed)

zhenghong.li (Unlicensed)