All tech debts have now been migrated to a kanban board; please see https://safibank.atlassian.net/jira/software/projects/STD/boards/43
Please list the tech debts you have discovered in the project so we can make sure they are on track.
Tech Debt Name | Descriptions | Questioner | Priority | Status | Labels | Related Cards / tickets | Owner | Decision |
---|---|---|---|---|---|---|---|---|
Controller should run on IO threads | All blocking methods in controllers (or the controller itself) must be marked to run on the IO thread pool. Currently not all of them follow this practice. | HIGH | Closed | BACKEND COMMONS | Back Office: SM-1982 - annotate audit-log-manager controller methods with @ExecuteOn (Done); Card: SM-3260 - All MS controller should use @ExecuteOn(TaskExecutors.IO) (Done); Ktlint: SM-2002 | euronet-gateway and loan-manager need to fix the ktlint rule in advance | ||
|
| MEDIUM |
| COMMONS | ||||
Daily or weekly testing (on-hold, might be included in our testing practices) | In addition to unit testing, we also need tests that run daily or weekly to ensure that functionality is not broken. These can be called functional tests or integration tests. | ON-HOLD | On hold | TEST | Ask QA to confirm whether this is done | |||
Define Unified Error Code and Error Message for RESTful API Endpoints | When an error occurs due to invalid parameter values or similar, we just return the default Micronaut errors to the caller. These messages are not very readable or user-friendly. We propose defining an error structure like this when we return HTTP 400 to callers: { "code": "SAFI_INVALID_PARAMETER_VALUE", "offendingFieldName": "customerId", "message": "'customerId' cannot be blank" } We can discuss the detailed structure. | HIGH | Blocked | BACKEND | SM-3262 - All Cards MS to use unified error code and message (Blocked, Task) | Proposal: Global Error Handling. Ardhi told me that he was told not to work on it since no agreement has been reached yet. Can the PIC arrange a meeting and make sure the TLs agree on it? | ||
Inconsistent HTTP response status code when a resource is successfully created with POST | When we implement a RESTful endpoint to create a resource, some modules return 200 and some return 201 on success. We should be consistent on this. | LOW | Open | BACKEND | write a tutorial about REST APIs | |||
Configurable Kafka topic names | Hardcoded kafka topic names like | MEDIUM | Open | BACKEND | ||||
Global Exception Handler | Exceptions should be handled in a centralized place, and normal business logic should just throw exceptions when necessary. We can mimic Spring Boot's @ControllerAdvice | LOW | Open | BACKEND | ||||
Different ways to launch dependent Docker containers when running unit test cases | Some containers are launched from a docker-compose file using DockerComposeContainer, while others are started using a component-specific Container implementation, e.g. KafkaContainer. Here is the code snippet: @Container var temporal = DockerComposeContainer(File("src/test/resources/docker-compose-test.yml")) .waitingFor("temporal_1", Wait.forLogMessage(".*Default namespace default registration complete.*\\n", 1)) .withLocalCompose(false) @Container val kafka: KafkaContainer = KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1")) .withEmbeddedZookeeper() .waitingFor(Wait.forListeningPort()) Maybe we should stick to one approach and make this part a common lib | LOW | Closed | BACKEND | we have the same setting in the common squad | |||
Set up integration test & E2E test strategy for CI/CD | Currently we always run integration tests (itest) along with unit tests before deploying on CI. We need to change this approach so that itest and E2E tests run after the deployment. | MEDIUM | Open | DEVOPS | ||||
More tests to cover corner cases | We need to write more tests that validate both positive and negative user input | MEDIUM | Open | DEVOPS | ||||
|
| LOW |
| DEVOPS | ||||
Add business-level metrics for monitoring purposes | By business-level metrics, we mean things like the number of requests per RESTful API endpoint, the number of logged-in customers per day, the number of transactions per minute, and total transactions so far. These metrics will help us better understand our system and provide facts for troubleshooting and analysis. We can expose such metrics to Prometheus and then create Grafana dashboards. | LOW | Open | DEVOPS | ||||
|
|
| DEVOPS | |||||
RESTful API versioning | Current RESTful API endpoints are not versioned, so when we upgrade (either the API signature or the implementation), the customer’s app may not work correctly with the new version of the API | HIGH | Closed | DEVOPS | In addition, we need to consider how to move the customer’s app smoothly to the new version of the APIs. 2 possible ways:
Back Office: SM-2803 - Document Restful API endpoint versioning strategy Done SM-3406 - Design REST API versioning approach Resolved | |||
Code Quality & Security strategy | We need to set up SonarQube for backend services and add the JaCoCo plugin for test coverage. Later we can set a coverage threshold so that the build fails if coverage is below that threshold. | HIGH | Open | DEVOPS | Back Office: SM-2802 - Add audit-log-manager to SonarQube for vulnerability scanning Done (Note that the SonarQube URL is http://34.124.144.240:8080/, and you can contact Andre Laksmana to make you an admin there first) | |||
|
|
| DEVOPS | |||||
Make sure configuration files are excluded from the generated API client libs and hand-written client libs | We need to make sure that neither the generated API clients nor the hand-written clients include configuration files | MEDIUM | Open | DEVOPS | ||||
Unified Micronaut version | Some modules use 3.4.1 and others use 3.4.2 or 3.5.2; moreover, 3.4.1 has some bugs. | MEDIUM | Closed | DEVOPS | The version has now been updated to 3.7.0, but squads are not forced to upgrade. | |||
Unified Time | We should use the same time zone for all of our applications. UTC or UTC + 8? | LOW | Closed | DEVOPS | ||||
|
|
| DEVOPS | |||||
|
|
| DEVOPS | |||||
|
|
| DEVOPS | |||||
Add /info endpoint | This endpoint should return the following information
Necessary for developers to troubleshoot bugs | LOW | Closed | DEVOPS | Back Office: SM-2800 - create endpoint /info for back office modules Done | |||
|
| HIGH |
| DEVOPS |
|
| ||
Implementation of Dead Letter Queues | Error handling for streaming services. | MEDIUM | Closed | DEVOPS | SM-3407 - Define rules for DLT management Resolved | |||
|
| MEDIUM |
| DEVOPS |
| |||
Strategy for externally exposed endpoints | We need a way to secure endpoints that we will expose to external merchants. | HIGH | Open | DEVOPS | ||||
Traffic management | We want our system to be durable, but currently we have no mechanism for retrying calls when a network error occurs. | MEDIUM | Open | DEVOPS | SM-3411 - Traffic management strategy To Do | |||
Logging Strategy | We need logging guidance / a logging strategy that all services follow | MEDIUM | Open | DEVOPS | ||||
Tracing | We need to define a mechanism to track and record how requests flow through all of our services | MEDIUM | Open | DEVOPS | SM-3095 - Add http calls into Google Firebase Analytics and its x-request-id uuid header Done | |||
HTTP/2 support? | How are we going to support HTTP/2 for better performance? | LOW | Open | DEVOPS | ||||
In-memory cache support | If we need to cache something in memory, we might use Micronaut's cache support | LOW | Open | DEVOPS | ||||
Rate limit for external API | Do we have a plan or strategy for rate limiting our endpoints? | LOW | Open | DEVOPS | ||||
Service Monitoring / Alert Mechanism | An alerting mechanism for when any of our services goes down, based on the chosen APM. For New Relic, we have synthetic rules that monitor service status at an interval | LOW | Closed | DEVOPS | ||||
SMS failover policy | Need to have a failover policy for SMS | MEDIUM | Closed | DEVOPS | ||||
Avro amount/date field type change | We need to unify the "bytes" and "string" types and the date type | MEDIUM | Open | DEVOPS | need guideline | |||
|
| MEDIUM |
| DEVOPS |
| |||
Compilation failure due to using MapStruct | Transaction-Processor-Manager uses MapStruct for object conversion. However, if the source object is provided by another service (like transactionStatus), compilation fails after a small change is made to the source object, e.g. adding a value to a particular enum attribute | LOW | Open | BACKEND | ||||
|
|
|
|
|
| |||
Macrokiosk | This SMS vendor is poor in terms of delivery speed, API security, and API usability. Why don’t we choose a vendor such as Twilio? | MEDIUM | Open | BACKEND | ||||
Code snippet | For Temporal workflows, we often see this kind of snippet in our current code base. A timed wait would be better, and extra logic is needed to handle that case | MEDIUM | Open | BACKEND | @Hui.zhu | |||
Data Sync Strategy | Currently we use application-level code to sync data from the database to Kafka. It is error-prone and may cause data inconsistency. | MEDIUM | Open | BACKEND | ||||
Distributed transaction | For operations like freezing an account, a single transaction should span several services | |||||||
Chaos testing | We may need chaos testing for our system | BACKEND | ||||||
We need a customer onboarding feature to collect customers' preferred methods/channels (SMS, Email, Viber) for receiving messages | Currently, we are able to send messages/notifications to customers via SMS, Email and Viber. But we don’t know which method/channel a customer prefers, so we may need such a feature. | MEDIUM | Open | Product/Backend | ||||
We should build a mechanism to handle business exceptions | We are currently implementing only the happy path of all business logic. If an exception occurs in a Temporal workflow or another service because of a business error (account blocked, inactive, etc.), we don’t have a mechanism to handle it. | MEDIUM | Open | BACKEND | ||||
Thought Machine Performance Test | We don’t know the real performance and architecture of the Thought Machine core banking system. We are currently using a sandbox to handle all requests, and in a test that created 10,000 accounts the process was slow. We need to find out the reason: is it our smart contract code, the settings, or Thought Machine itself that has a performance issue? Also, how does Thought Machine deal with the Python global interpreter lock when running scheduled jobs for millions of accounts? | MEDIUM | Open | BACKEND | ||||
Temporal metrics | ||||||||
3rd party vendors/platform access control & balance & credential management | We integrate several 3rd party platforms: Euronet, Paynamics, OneStop, Infobip, Macrokiosk, etc. We need a person/team/role/system to manage the accounts/credentials/access control for those platforms and the API keys/access tokens/…; we also need to set up Prometheus metrics to monitor some of those accounts, because we have to make sure that they hold enough balance/money. | MEDIUM | Open | MGMT DevOps | ||||
Database table primary key | Do we need to use UUID as the primary key? Because UUIDs are not ordered, once a table reaches hundreds of millions of rows, I assume the B-tree index on the PK would have to reorganize pages frequently when inserting new rows. That would slow down inserts and keep IO busy. Or do we have another solution for the primary key? https://www.postgresql.org/message-id/20151222124018.bee10b60b3d9b58d7b3a1839%40potentialtech.com | |||||||
A GitHub Action to make sure that new migration script versions in the current PR are greater than the latest one on the main branch | Flyway has a well-defined naming convention for migration scripts. A GitHub Action is required to make sure that the new migration script versions in the current PR are greater than the latest one on the main branch | MEDIUM | ||||||
Localization process/workflow and tools are required. | We need a localization process/workflow and tools. Some recommended tools: Mozilla Pontoon, Weblate, Locaklize. |
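
For the "Define Unified Error Code and Error Message" item above, the proposed HTTP 400 body could be modelled roughly as follows. This is only a sketch to support the discussion: the names `ApiError`, `blankFieldError` and `validateNotBlank` are hypothetical, and the final structure is still to be agreed by the TLs.

```kotlin
// Hypothetical model of the proposed error body; field names follow the
// JSON example in the table, everything else is an assumption.
data class ApiError(
    val code: String,
    val offendingFieldName: String?,
    val message: String,
)

// Hypothetical helper: builds the error for a blank required field.
fun blankFieldError(fieldName: String): ApiError =
    ApiError(
        code = "SAFI_INVALID_PARAMETER_VALUE",
        offendingFieldName = fieldName,
        message = "'$fieldName' cannot be blank",
    )

// Hypothetical validation helper: returns null when the value is acceptable,
// otherwise the error that would be serialized into the HTTP 400 response.
fun validateNotBlank(fieldName: String, value: String?): ApiError? =
    if (value.isNullOrBlank()) blankFieldError(fieldName) else null

fun main() {
    // A blank customerId produces the error from the proposal.
    println(validateNotBlank("customerId", ""))
}
```

A centralized exception handler (see the "Global Exception Handler" item) would be a natural place to turn such an object into the response body, so individual controllers would not need to build it themselves.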
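
For the migration script version check above, the comparison itself could look roughly like the sketch below. It assumes versioned migrations named `V<version>__<description>.sql`, with version parts separated by dots or single underscores. The function names are hypothetical, and listing the files on main and in the PR is left to the GitHub Action step that would call this logic.

```kotlin
// Extracts the numeric version parts from a versioned migration file name,
// e.g. "V1_2__add_table.sql" -> [1, 2]. Returns null for any other file.
fun parseVersion(fileName: String): List<Long>? {
    val match = Regex("""^V(\d+(?:[._]\d+)*)__.+\.sql$""").find(fileName) ?: return null
    return match.groupValues[1].split('.', '_').map { it.toLong() }
}

// Compares two versions part by part; missing parts are treated as 0
// (an assumption of this sketch, so "1" and "1.0" compare as equal).
fun compareVersions(a: List<Long>, b: List<Long>): Int {
    for (i in 0 until maxOf(a.size, b.size)) {
        val x = a.getOrElse(i) { 0L }
        val y = b.getOrElse(i) { 0L }
        if (x != y) return x.compareTo(y)
    }
    return 0
}

// Returns the new migration files in the PR whose version is NOT greater
// than the latest version on main. An empty result means the check passes.
fun offendingMigrations(mainFiles: List<String>, prFiles: List<String>): List<String> {
    val latestOnMain = mainFiles
        .mapNotNull { parseVersion(it) }
        .maxWithOrNull(Comparator<List<Long>> { a, b -> compareVersions(a, b) })
        ?: return emptyList()
    return prFiles.filter { name ->
        val version = parseVersion(name)
        version != null && compareVersions(version, latestOnMain) <= 0
    }
}

fun main() {
    val onMain = listOf("V1__init.sql", "V2__add_accounts.sql")
    val newInPr = listOf("V2__add_cards.sql", "V3__add_limits.sql")
    println(offendingMigrations(onMain, newInPr)) // prints [V2__add_cards.sql]
}
```

The action would fail the build whenever the returned list is not empty.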