Improvements to the datasets


We are making a number of changes and updates to our public and investor data export datasets. The updates fall into two main areas:

  • Adjusting and adding data fields to better reflect recent changes to our business processes.
  • Standardizing the data and fixing some known issues.

Changes will include:

  • The second application in a series of duplicates will use the previous application's values for LoanApplicationStartedDate, ApplicationSignedHour and ApplicationSignedWeekday.
  • Previous duplicate application data will be excluded from the following fields: NoOfPreviousApplications, AmountOfPreviousApplications, NoOfPreviousLoans, AmountOfPreviousLoans, PreviousRepayments, PreviousEarlyRepayments.
  • Data fields that show percentages (ratios, interest rates, etc.) will all use the same number format. For example, 20% will be shown as 0.20 in every data field that contains a percentage.
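
To illustrate the unified percentage format, here is a minimal Python sketch of how an importing script might normalise values on its own side. It assumes, purely for illustration, that older exports may contain strings such as "20%" while newer ones contain fractions such as "0.20". The function name to_fraction is hypothetical and not part of any export tooling.

```python
def to_fraction(value: str) -> float:
    """Return a percentage field as a fraction, e.g. '20%' -> 0.2.

    Assumption: a trailing '%' marks the old percent notation; anything
    else is taken to be a fraction already and is parsed unchanged.
    """
    text = value.strip()
    if text.endswith("%"):
        # Drop the percent sign and scale down to a fraction.
        return float(text[:-1].strip()) / 100
    return float(text)
```

A bare number such as "20" cannot be told apart from a fraction, so this sketch deliberately treats it as already normalised.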

New fields include:

  • ScoringDate, ModelVersion, EL and Rating will be added to the Investments list, with the same meaning as they currently have in the Loan Dataset.
  • CancelledWithin1Month, in both the Loan Dataset and the Investments list, will show all loans cancelled or repaid by borrowers within 1 month of issuing.
  • Released funds will be shown in a separate row in both Portfolio Cashflow and Monthly overview.
  • Loan cancellations will also be separated into 3 new columns:
    • EarlyRepaidWithin14Days – the loan was repaid by the borrower within 14 days
    • PostFundingCancellation – the loan was released after our analyses
    • IdCancellation – the borrower did not pass the required ID checks

4 responses to “Improvements to the datasets”

  1. Taavi, when will it start? Will it be announced at least a day before? Have you mentioned all the changes? Any improvements in other datasets (e.g. future cashflow)?
    I hope the documentation will be updated at the same time and that all changed/new fields will be identified (e.g. in bold for a few weeks)?

    Sorry if too many questions :) But thanks in advance for comments!

    • Certain fixes within data fields, such as the updated logic for duplicate loans, we expect to deliver within this week. The rest of the updates are expected within the next 2-3 weeks.

  2. Can you please shorten the field names to a maximum of 30 characters? At the moment it is not possible to (automatically) import the reports into an Oracle database. It is annoying to change the field names all the time…

    For example, these fields in the investment list:
    Employment_Duration_Current_Employer 36
    InDebt14Day_PrincipalProportion 31
    InDebt21Day_PrincipalProportion 31
    InDebt30Day_PrincipalProportion 31
    InDebt60Day_PrincipalProportion 31

    Also, please do not use spaces in names; the Oracle import fails.
    Problematic fields in the loan dataset:
    1D FromFirstPayment
    14D FromFirstPayment
    30D FromFirstPayment
    60D FromFirstPayment
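
Until the names change, an import-side workaround could look like the following Python sketch. It only enforces the two constraints mentioned in this comment (no spaces, at most 30 characters) and keeps the results unique. The function name oracle_safe_names and the underscore/suffix scheme are assumptions made for illustration.

```python
def oracle_safe_names(names, max_len=30):
    """Map each original field name to a space-free name of at most max_len chars."""
    mapping = {}
    used = set()
    for name in names:
        # Replace spaces with underscores, then truncate to the length limit.
        candidate = name.strip().replace(" ", "_")[:max_len]
        base, n = candidate, 1
        # If truncation produced a name already taken, add a numeric suffix.
        while candidate.upper() in used:
            suffix = f"_{n}"
            candidate = base[: max_len - len(suffix)] + suffix
            n += 1
        used.add(candidate.upper())
        mapping[name] = candidate
    return mapping
```

For example, oracle_safe_names(["1D FromFirstPayment"]) maps the field to 1D_FromFirstPayment. Names that start with a digit may still need quoting or a prefix on the database side; the sketch does not handle that.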

  3. If a loan gets rescheduled, sometimes the monthly payday is switched. Could we have this info (with the new monthly payday) in the historic data file?