SaFi Bank Space : SaFi Tech Debts Summary Draft

All tech debts have now been migrated to a Kanban board: https://safibank.atlassian.net/jira/software/projects/STD/boards/43

Please list the tech debts you have discovered in the project so we can make sure they are tracked.

Tech Debt Name	Descriptions	Questioner	Priority	Status	Labels	Related Cards / tickets	Owner	Decision
Controller should run on IO threads	All blocking methods in controllers (or the controllers themselves) must be marked to run on IO threads. Currently not all of them follow this practice.		HIGH	Closed	BACKEND COMMONS	Back Office: SM-1982 - annotate audit-log-manager controller methods with @ExecuteOn Done Card: SM-3260 - All MS controller should use @ExecuteOn(TaskExecutors.IO) Done Ktlint: SM-2002	Péter Cseh (Unlicensed)	euronet-gateway and loan-manager need to fix the ktlint rule in advance
~~Need to set up git hooks for our repo~~	~~We need to set up pre-push git hooks for our repo, so that all tests are guaranteed to pass every time we push code.~~		MEDIUM	~~Deleted~~	COMMONS DEVOPS
Daily or weekly testing (on-hold, might be included in our testing practices)	In addition to unit testing, we also need tests that run daily or weekly to ensure that functionality is not broken. These can be called functional tests or integration tests.		ON-HOLD	Onhold	TEST			Ask QA to confirm whether it is done
Define Unified Error Code and Error Message for RESTful API Endpoints	When an error occurs because of invalid parameter values or similar, we just pass the default Micronaut errors on to the caller. These messages are not very readable or user-friendly. We propose defining an error structure like this when we return HTTP 400 to callers: { "code": "SAFI_INVALID_PARAMETER_VALUE", "offendingFieldName": "customerId", "message": "'customerId' cannot be blank" } We can discuss the detailed structure.		HIGH	Blocked	BACKEND	SM-3262 - All Cards MS to use unified error code and message Blocked Task	Jan Görig (Unlicensed)	Proposal: Global Error Handling. Ardhi said he was told not to work on it because no agreement has been reached yet. Can the PIC arrange a meeting to make sure the TLs agree on it?
Inconsistent HTTP response status code when a resource is successfully created with POST	When we implement a RESTful endpoint that creates a resource, some modules return 200 and some return 201 on success. We should be consistent about this.		LOW	Open	BACKEND		Yuetong Yang (Unlicensed)	Write a tutorial about REST APIs
Configurable Kafka topic names	Hardcoded Kafka topic names like `@Topic("audit.log.message.temp")` should be replaced with configurable application.yml properties.		MEDIUM	Open	BACKEND		zhenghong.li (Unlicensed)
Global Exception Handler	Exceptions should be handled in a centralized place, and normal business logic should just throw exceptions when necessary. We can mimic Spring Boot's @ControllerAdvice.		LOW	Open	BACKEND		Yuetong Yang (Unlicensed)
Different ways to launch dependent Docker containers when running unit test cases	Some containers are launched from a Docker Compose file using DockerComposeContainer, while others are started with a component-specific container implementation, e.g. KafkaContainer. Here is the code snippet: `@Container var temporal = DockerComposeContainer(File("src/test/resources/docker-compose-test.yml")).waitingFor("temporal_1", Wait.forLogMessage(".Default namespace default registration complete.\\n", 1)).withLocalCompose(false)` and `@Container val kafka: KafkaContainer = KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1")).withEmbeddedZookeeper().waitingFor(Wait.forListeningPort())`. Maybe we should stick to one approach and make this part a common lib.		LOW	Closed	BACKEND			We have the same setting in the common squad
Set up integration test & E2E test strategy for CI/CD	For now we always run integration tests along with unit tests on CI before deployment. We need to change this approach so that integration tests and E2E tests run after the deployment.		MEDIUM	Open	DEVOPS		kai.hu (Unlicensed) Yuetong Yang (Unlicensed)
More tests to cover corner cases	We need to write more tests to validate both positive and negative user input.		MEDIUM	Open	DEVOPS		zhenghong.li (Unlicensed)
~~Add more logs~~	~~Log at every key position of code path and provide details, so we can find any root cause of an issue by logs in prod env.~~		LOW	~~Deleted~~	DEVOPS
Add business-level metrics for monitoring purposes	By business-level metrics, we mean things like the number of requests per RESTful API endpoint, the number of logged-in customers per day, the number of transactions per minute, and the total number of transactions so far. These metrics will help us better understand our system and provide facts for troubleshooting and analysis. We can expose such metrics to Prometheus and then create Grafana dashboards.		LOW	Open	DEVOPS		zhenghong.li (Unlicensed) User def44
~~Backend logic should not ignore exception paths and only care about the happy paths~~	~~As we can see from the existing code base, exception paths have been ignored.~~			~~Deleted~~	DEVOPS
RESTful API versioning	Current RESTful API endpoints are not versioned, so when we upgrade (either the API signature or the implementation), the customer’s app may not work properly with the new API version.		HIGH	Closed	DEVOPS	In addition, we need to consider how to move the customer’s app smoothly to new API versions. Two possible ways: (1) declare the version in the URI path; (2) put the version in an HTTP header, with the downside that this would no longer work if we use gRPC in the future. Back Office: SM-2803 - Document Restful API endpoint versioning strategy Done SM-3406 - Design REST API versioning approach Resolved	Juraj Macháč (Unlicensed)	REST API versioning - major version upgrades; REST API versioning approach
Code Quality & Security strategy	We need to set up SonarQube for backend services and add the JaCoCo plugin for test coverage. Later we can set a coverage threshold so that the build fails if coverage falls below it.		HIGH	Open	DEVOPS	Back Office: SM-2802 - Add audit-log-manager to SonarQube for vulnerability scanning Done (Note that the SonarQube URL is http://34.124.144.240:8080/, and you can contact Andre Laksmana to make you an admin there first)	Yuetong Yang (Unlicensed)
~~Detail Design~~	~~Now we only have the product function description and workflow in Confluence pages, but we still need to write the detailed design documents for each module of every squad.~~			~~Deleted~~	DEVOPS
Make sure configuration files are excluded in the generated api client libs and hand written client libs	we need to make sure both the generated api clients and hand written clients do not have configuration files included		MEDIUM	Open	DEVOPS		zhenghong.li (Unlicensed)
Unified Micronaut version	Some modules use 3.4.1 and others use 3.4.2 or 3.5.2; moreover, 3.4.1 has some bugs.		MEDIUM	Closed	DEVOPS			The version has now been updated to 3.7.0, but squads are not forced to upgrade.
Unified Time	We should use the same time zone for all of our applications. UTC or UTC + 8?		LOW	Closed	DEVOPS	Timestamp representation
~~Unified BigDecimal decimal places and round-up policy~~	~~Use 2 decimal places and a round-up policy for money (currencies like JPY don’t need decimals), and 4 decimal places and a round-up policy for percentages~~			~~Deleted~~	DEVOPS
~~Using Cache to store hotspot data~~	Use Redis or another in-memory database to store hotspot data, such as user session information and dictionaries. It can improve performance, especially in scenarios where reads far outnumber updates.			~~Deleted~~	DEVOPS
~~TMCoreClient needs more functions in the common project~~	~~TMCoreClient does not have enough functions; we need to add more~~			~~Deleted~~	DEVOPS
Add /info endpoint	This endpoint should return the following information: build timestamp in the format yyyy-MM-dd HH:mm:ss (UTC), git branch name, git commit id, and release version. This is necessary for developers to troubleshoot bugs.		LOW	Closed	DEVOPS	Back Office: SM-2800 - create endpoint /info for back office modules Done
~~Add jacoco~~	~~Add this plugin for test coverage purpose, and later we can set such things as coverage threshold, and if the given coverage is below that threshold, there would be a build error.~~		HIGH	~~Deleted~~	DEVOPS	~~Back Office:~~ SM-2801 - Add jacoco to back office modules for test coverage purpose Done	Yuetong Yang (Unlicensed)	~~Merged with other topic~~
Implementation of Dead Letter Queues	Error handling for streaming services.		MEDIUM	Closed	DEVOPS	SM-3407 - Define rules for DLT management Resolved	Juraj Macháč (Unlicensed)	DLQ Management & Error Handling in Kafka
~~Add correlation/trace id in logs~~	~~We should enable the use of correlation/trace ids within logs so that it is easier to find related logs even across different services~~		MEDIUM	~~Deleted~~	DEVOPS			~~merged with other topic~~
Strategy for externally exposed endpoints	We need a way to secure endpoints that we will expose to external merchants.		HIGH	Open	DEVOPS		Norbert Bérci (Unlicensed) Yuetong Yang (Unlicensed)	Exposed Endpoints & Tyk Mappings
Traffic management: circuit breaker, fallback handling, retry, …	We want our system to be durable, but currently we have no mechanism for retrying calls when a network error occurs.		MEDIUM	Open	DEVOPS	SM-3411 - Traffic management strategy To Do	Juraj Macháč (Unlicensed)
Logging Strategy	We need a logging guideline/strategy that all services follow.		MEDIUM	Open	DEVOPS	https://safibank.atlassian.net/jira/software/c/projects/SM/boards/21?modal=detail&selectedIssue=SM-3692	Yuetong Yang (Unlicensed)
Tracing	We need to define a mechanism to track and record how requests flow through all of our services.		MEDIUM	Open	DEVOPS	SM-3095 - Add http calls into Google Firebase Analytics and its x-request-id uuid header Done Observability in Micronaut	Andre Laksmana (Unlicensed) Yuetong Yang (Unlicensed)
HTTP/2 support?	How are we going to support HTTP/2 for better performance?		LOW	Open	DEVOPS		Yuetong Yang (Unlicensed)
In-memory cache support	If we need to cache something in memory, we might need to use Micronaut's cache support.		LOW	Open	DEVOPS		Yuetong Yang (Unlicensed)
Rate limit for external API	Do we have a plan or strategy for rate limiting our endpoints?		LOW	Open	DEVOPS		Yuetong Yang (Unlicensed)
Service Monitoring / Alert Mechanism	An alerting mechanism for when any of our services goes down, based on the chosen APM. For New Relic, we have synthetic rules that monitor service status at an interval.		LOW	Closed	DEVOPS
SMS failover policy	Need to have a failover policy for SMS		MEDIUM	Closed	DEVOPS		zhenghong.li (Unlicensed)
Avro amount/date field type change	We need to unify the "bytes" and "string" types, and the date type.		MEDIUM	Open	DEVOPS		Yuetong Yang (Unlicensed)	Need a guideline
~~Restriction of the endpoints on the api gateway~~	~~We need to make a configuration page to make sure our micro-services are exposed in the right way~~		MEDIUM	~~Open~~	DEVOPS			~~merged with other topic~~
Compilation failure caused by MapStruct	Transaction-Processor-Manager uses MapStruct for object conversion. However, if the source object is provided by another service (like transactionStatus), compilation fails after a small change is made to the source object, e.g. adding a value to a particular enum attribute.	Jevan Wu (Unlicensed)	LOW	Open	BACKEND		User ce01f
~~Management of endpoints exposed to the Internet~~	~~we have a page to keep track of exposed endpoints and their Tyk mappings:~~ Exposed Endpoints & Tyk Mappings~~, add yours if there are any~~ ~~for callback endpoints exposed to 3rd party platforms, please take authentication into consideration~~		~~MEDIUM~~	~~Open~~	~~BACKEND~~			~~merged with other topic~~
Macrokiosk	This SMS vendor is poor in terms of delivery speed, API security and usability. Why don’t we choose a vendor such as Twilio?	zhenghong.li (Unlicensed)	MEDIUM	Open	BACKEND
Code snippet `Workflow.await { identityVerified }`	In Temporal workflows, we often see this snippet in our current code base. A timed wait would be better, and extra logic is needed to handle the timeout.	zhenghong.li (Unlicensed)	MEDIUM	Open	BACKEND		@Hui.zhu
Data Sync Strategy	Currently we sync data from the database to Kafka in application-level code. This is error-prone and may cause data inconsistency.	Yuetong Yang (Unlicensed)	MEDIUM	Open	BACKEND		Yuetong Yang (Unlicensed)
Distributed transaction	For operations like freezing an account, there should be a single transaction spanning several services.	User bd920					hui.zhu (Unlicensed)
Chaos testing	We may need chaos testing for our system.	Gavin Zhang (Unlicensed)			BACKEND		Gavin Zhang (Unlicensed)
Customer onboarding feature to collect customers' preferred methods/channels (SMS, Email, Viber) for receiving messages	Currently, we are able to send messages/notifications to customers via SMS, Email and Viber, but we don’t know which method/channel a customer prefers, so we may need such a feature.	zhenghong.li (Unlicensed)	MEDIUM	Open	Product/Backend		zhenghong.li (Unlicensed)
We should build a mechanism to handle business exceptions	Right now we are implementing the happy path of all business logic, but if an exception occurs in a Temporal workflow or another service because of a business error (account blocked, inactive, etc.), we don’t have a mechanism to handle it.	Kobe Wang (Unlicensed)	MEDIUM	Open	BACKEND		Kobe Wang (Unlicensed)
Thought Machine Performance Test	We don’t know the real performance and architecture of the Thought Machine core banking system. We currently use a sandbox to handle all requests, and a test that created 10,000 accounts ran slowly; we need to find out why. Is the performance issue in our smart contract code, the settings, or Thought Machine itself? Also, how does Thought Machine deal with the Python global interpreter lock when running scheduled jobs for millions of accounts?	Kobe Wang (Unlicensed)	MEDIUM	Open	BACKEND		User 3d5f3
Temporal metrics
3rd party vendors/platform access control & balance & credential management	We integrate with several 3rd party platforms: Euronet, Paynamics, OneStop, Infobip, Macrokiosk, etc. We need a person/team/role/system to manage the accounts/credentials/access control for those platforms and their API keys/access tokens/…; we also need to set up Prometheus metrics to monitor some of those accounts, because we have to make sure they hold enough balance.	zhenghong.li (Unlicensed)	MEDIUM	Open	MGMT DevOps		User 3d5f3
Database table primary key	Do we need to use UUIDs as primary keys? Because UUIDs are not ordered, once a table reaches hundreds of millions of rows, I assume the B-tree index on the PK would frequently rebuild pages when new rows are inserted. That would slow down inserts and keep IO busy. Or do we have another solution for primary keys? See https://www.postgresql.org/message-id/20151222124018.bee10b60b3d9b58d7b3a1839%40potentialtech.com	User bd920					User bd920
A GitHub Action to make sure that new migration script versions in the current PR are greater than the latest one on the main branch	Flyway has a well-defined naming convention for migration scripts. A GitHub Action is required to make sure that the new migration script versions in the current PR are greater than the latest one on the main branch.	zhenghong.li (Unlicensed)	MEDIUM				zhenghong.li (Unlicensed)
Localization process/workflow and tools are required.	We need a localization process/workflow and tools. Some recommended tools: Mozilla Pontoon, Weblate, Lokalise.	zhenghong.li (Unlicensed)					zhenghong.li (Unlicensed)
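
For the "Define Unified Error Code and Error Message" item, the proposed HTTP 400 body could be modelled roughly as below. This is only a sketch to support the discussion: `ApiError` and `invalidParameter` are hypothetical names, not existing SaFi or Micronaut types, and the field set simply mirrors the JSON proposed in the table.

```kotlin
// Hypothetical types for illustration; nothing here is an existing SaFi or Micronaut API.
data class ApiError(
    val code: String,                 // stable, machine-readable error code
    val offendingFieldName: String?,  // null when the error is not tied to one field
    val message: String               // human-readable explanation
)

// Builds the error body for an invalid request parameter.
fun invalidParameter(fieldName: String, reason: String): ApiError =
    ApiError(
        code = "SAFI_INVALID_PARAMETER_VALUE",
        offendingFieldName = fieldName,
        message = "'$fieldName' $reason"
    )

fun main() {
    val error = invalidParameter("customerId", "cannot be blank")
    println(error)
}
```

A single factory per error code keeps the wording consistent across services; how the object is serialized and wired into a global exception handler is left open until the TLs agree on the structure.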
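
For the "Traffic management" item, the retry part could look roughly like the helper below. It is a minimal sketch, assuming that network errors surface as `IOException` and that a simple exponential backoff is acceptable; `retryOnIoError` is a hypothetical name. Micronaut also ships `@Retryable` and `@CircuitBreaker` annotations, which may turn out to be the better fit once the strategy in SM-3411 is decided.

```kotlin
import java.io.IOException

// Hypothetical helper: retries `call` when it throws IOException,
// doubling the delay between attempts. The last attempt's exception propagates.
fun <T> retryOnIoError(maxAttempts: Int = 3, initialDelayMs: Long = 100, call: () -> T): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    repeat(maxAttempts - 1) {
        try {
            return call()
        } catch (e: IOException) {
            Thread.sleep(delayMs)
            delayMs *= 2
        }
    }
    // Final attempt: no catch, so the caller sees the failure.
    return call()
}

fun main() {
    var calls = 0
    val result = retryOnIoError(maxAttempts = 3, initialDelayMs = 10) {
        calls++
        if (calls < 3) throw IOException("simulated network error") else "ok"
    }
    println("result=$result after $calls calls")
}
```

Only idempotent calls should be wrapped this way; retrying a non-idempotent request could apply the same operation twice.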
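
For the migration script version check, the core comparison could be sketched as follows. It assumes Flyway's default naming for versioned migrations (prefix `V`, separator `__`, suffix `.sql`, version parts separated by `.` or `_`); `parseVersion`, `compareVersions` and `staleMigrations` are hypothetical names, and how the two file lists are collected in the GitHub Action (for example with `git diff`) is left out.

```kotlin
import java.math.BigInteger

// Assumes Flyway's default versioned-migration naming, e.g. V2__add_table.sql or V2_1__fix.sql.
private val versioned = Regex("""^V(\d+(?:[._]\d+)*)__.+\.sql$""")

// Returns the numeric version parts, or null if the file is not a versioned migration.
fun parseVersion(fileName: String): List<BigInteger>? =
    versioned.matchEntire(fileName)
        ?.groupValues?.get(1)
        ?.split('.', '_')
        ?.map { it.toBigInteger() }

// Compares part by part; missing trailing parts count as zero (2 == 2.0).
fun compareVersions(a: List<BigInteger>, b: List<BigInteger>): Int {
    for (i in 0 until maxOf(a.size, b.size)) {
        val c = a.getOrElse(i) { BigInteger.ZERO }.compareTo(b.getOrElse(i) { BigInteger.ZERO })
        if (c != 0) return c
    }
    return 0
}

// Returns the new files whose version is NOT greater than the latest version on main.
fun staleMigrations(mainFiles: List<String>, newFiles: List<String>): List<String> {
    val latestOnMain = mainFiles.mapNotNull { parseVersion(it) }
        .sortedWith(Comparator<List<BigInteger>> { a, b -> compareVersions(a, b) })
        .lastOrNull() ?: return emptyList()
    return newFiles.filter { name ->
        val version = parseVersion(name)
        version != null && compareVersions(version, latestOnMain) <= 0
    }
}

fun main() {
    val onMain = listOf("V1__init.sql", "V2__add_customer.sql", "V2_1__add_index.sql")
    val inPr = listOf("V2__add_account.sql", "V3__add_card.sql")
    println(staleMigrations(onMain, inPr))
}
```

The check would fail the PR whenever `staleMigrations` returns a non-empty list. Repeatable and undo migrations are ignored in this sketch.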